After syllable-level checks pass, assembled syllable sequences are looked up against the dictionary. Unknown words get correction suggestions generated via the SymSpell algorithm, ranked by edit distance and frequency.
How It Works
Syllable Assembly
After syllable validation, valid syllables are assembled into potential words:
syllables = ["မြန်", "မာ", "နိုင်", "ငံ"]
# Assembled to: ["မြန်မာ", "နိုင်ငံ"]
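One way to picture the assembly step is a greedy longest-match over the syllable list. The sketch below is an illustration only: `assemble_words`, the in-memory `dictionary` set, and the `max_syllables` limit are hypothetical names, and the library's actual segmentation strategy may differ.

```python
def assemble_words(syllables, dictionary, max_syllables=4):
    """Greedily join syllables into the longest dictionary words (illustrative only)."""
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first and shrink until there is a dictionary hit.
        for size in range(min(max_syllables, len(syllables) - i), 0, -1):
            candidate = "".join(syllables[i:i + size])
            if size == 1 or candidate in dictionary:
                # A lone syllable is kept as-is so that later stages can flag it.
                words.append(candidate)
                i += size
                break
    return words


dictionary = {"မြန်မာ", "နိုင်ငံ"}  # hypothetical stand-in for the real dictionary
print(assemble_words(["မြန်", "မာ", "နိုင်", "ငံ"], dictionary))
```

Greedy matching is simple but can pick a long word that blocks a better split later in the sequence, which is why this should be read as a mental model and not as the library's algorithm.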
Dictionary Lookup
Assembled words are checked against the word dictionary:
"မြန်မာ" → Valid (in dictionary)
"နိုင်ငံ" → Valid (in dictionary)
"xyz" → Invalid (not in dictionary)
Suggestion Generation
For invalid words, SymSpell generates suggestions with O(1) average-time hash lookups, so the cost does not grow with dictionary size:
"နိူင်ငံ" → Suggestions: ["နိုင်ငံ"] (edit distance 1)
SymSpell Algorithm
mySpellChecker uses the Symmetric Delete algorithm for fast suggestions:
Traditional Approach (Slow)
For each dictionary word:
    Calculate edit distance to input
    If distance ≤ max_distance:
        Add to suggestions
# Complexity: O(n * m) where n=dictionary size, m=word length
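The traditional approach above can be sketched in a few lines of Python. This is a generic brute-force scan, not code from mySpellChecker; `edit_distance` and `brute_force_suggestions` are hypothetical helper names, and the distance used is plain Levenshtein over code points.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def brute_force_suggestions(word, dictionary, max_distance=2):
    """Scan every dictionary entry; the work grows linearly with dictionary size."""
    scored = [(edit_distance(word, entry), entry) for entry in dictionary]
    return sorted((d, e) for d, e in scored if d <= max_distance)
```

Every lookup touches every dictionary entry, which is the cost the symmetric delete index is designed to avoid.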
SymSpell Approach (Fast)
Pre-compute all delete variants of dictionary words
Store in hash table
For lookup:
    Generate delete variants of input
    Look up in hash table
    Return matches
# Complexity: O(1) average lookup
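The steps above can be sketched as a small delete index. This is a minimal illustration of the symmetric delete idea, not the library's `SymSpell` class: the function names are hypothetical, strings are treated as sequences of code points, and optimizations such as prefix truncation are left out.

```python
from collections import defaultdict
from itertools import combinations


def delete_variants(word: str, max_distance: int) -> set[str]:
    """Return the word plus every string made by deleting up to max_distance characters."""
    variants = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        for keep in combinations(range(len(word)), len(word) - k):
            variants.add("".join(word[i] for i in keep))
    return variants


def build_delete_index(dictionary, max_distance=2):
    """Map each delete variant to the dictionary words that produce it."""
    index = defaultdict(set)
    for word in dictionary:
        for variant in delete_variants(word, max_distance):
            index[variant].add(word)
    return index


def lookup_candidates(word, index, max_distance=2):
    """Collect dictionary words that share at least one delete variant with the input."""
    candidates = set()
    for variant in delete_variants(word, max_distance):
        candidates |= index.get(variant, set())
    return candidates


index = build_delete_index({"နိုင်ငံ"}, max_distance=1)
print(lookup_candidates("နိူင်ငံ", index, max_distance=1))
```

Sharing a delete variant only bounds the true distance by twice `max_distance`, so a complete implementation would still verify each candidate with a real edit-distance check before returning it.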
Why It’s Fast
| Operation | Traditional | SymSpell |
|---|---|---|
| Single lookup | O(n × m) | O(1) |
| Scales with dictionary size | Slow (linear) | Very Fast (constant) |
Configuration
Enable Word Validation
from myspellchecker import SpellChecker
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.constants import ValidationLevel
# Create spell checker
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(provider=provider)
# Word-level validation (which includes syllable validation) is selected per check
text = "မြန်မာနိုင်ငံ"
result = checker.check(text, level=ValidationLevel.WORD)
Suggestion Settings
from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig
from myspellchecker.providers import SQLiteProvider
config = SpellCheckerConfig(
    # Maximum suggestions per error
    max_suggestions=10,
    # Maximum edit distance for suggestions
    max_edit_distance=2,
    # Include phonetically similar suggestions
    use_phonetic=True,
)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)
SymSpell Configuration
from myspellchecker.algorithms.symspell import SymSpell
from myspellchecker.providers import SQLiteProvider
provider = SQLiteProvider("dictionary.db")
symspell = SymSpell(
    provider,
    max_edit_distance=2,  # Max edit distance for suggestions
    prefix_length=10,     # Prefix length for optimization (default: 10)
    count_threshold=1,    # Min frequency threshold
)
symspell.build_index(["syllable", "word"]) # Build the index
Word Error Types
Unknown Word
Word not found in dictionary:
result = checker.check("အသစ်စက်စက်")
# Error: WordError for unrecognized compound
Misspelled Word
Word is close to a valid dictionary entry:
result = checker.check("နိူင်ငံ") # Typo
# Error: WordError with suggestion "နိုင်ငံ"
Compound Error
Multiple syllable errors that combine into an invalid word:
result = checker.check("မယ်နမာ") # Multiple errors
# Error: WordError with suggestions based on similar compounds
Morphological Synthesis
Before generating errors, word validation checks whether an out-of-vocabulary (OOV)
word is a productive formation from known dictionary words. This suppresses false
positives on valid compounds and reduplications.
Reduplication Validation
Myanmar creates valid words through reduplication (repeating syllables for emphasis):
# These OOV words are accepted as valid reduplications:
"ကောင်းကောင်း" # AA: ကောင်း + ကောင်း ("well", from "good")
"သေသေချာချာ" # AABB: သေ + သေ + ချာ + ချာ ("carefully")
Supported patterns: AA, AABB, ABAB, RHYME (known pairs).
Safeguards: base must be in dictionary, frequency >= 5, POS must be V/ADJ/ADV/N.
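The pattern check and safeguards can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function names, the `lexicon` mapping of base word to `(frequency, pos)`, and the simplification that A and B are single syllables. The RHYME pattern is omitted because it depends on a list of known pairs.

```python
ALLOWED_POS = {"V", "ADJ", "ADV", "N"}


def reduplication_pattern(syllables):
    """Return "AA", "AABB", "ABAB", or None for a list of syllables (illustrative only)."""
    if len(syllables) == 2 and syllables[0] == syllables[1]:
        return "AA"
    if len(syllables) == 4:
        a, b, c, d = syllables
        if a == b and c == d and a != c:
            return "AABB"
        if a == c and b == d and a != b:
            return "ABAB"
    return None


def is_valid_reduplication(syllables, lexicon, min_base_frequency=5):
    """Accept a pattern only when its base passes the dictionary, frequency, and POS checks."""
    pattern = reduplication_pattern(syllables)
    if pattern is None:
        return False
    # AA repeats a one-syllable base; AABB and ABAB are built from the two-syllable base "AB".
    base = syllables[0] if pattern == "AA" else "".join(dict.fromkeys(syllables))
    entry = lexicon.get(base)  # (frequency, pos), or None when the base is unknown
    if entry is None:
        return False
    frequency, pos = entry
    return frequency >= min_base_frequency and pos in ALLOWED_POS


print(reduplication_pattern(["ကောင်း", "ကောင်း"]))
```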
Compound Word Synthesis
Myanmar forms compounds by joining morphemes:
# These OOV words are accepted as valid compounds:
"ကျောင်းသား" # N+N: ကျောင်း (school) + သား (child) = "student"
"စားသောက်" # V+V: စား (eat) + သောက် (drink) = "eating and drinking"
Uses dynamic programming to find optimal splits. Allowed patterns:
N+N, V+V, N+V, V+N, ADJ+N. Blocked: P+P, P+N, N+P.
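A split search of this kind can be sketched with memoized recursion, which is top-down dynamic programming. The sketch below makes several assumptions that are not confirmed by the source: `lexicon` maps a morpheme to `(frequency, pos)`, "optimal" is taken to mean the fewest parts, split points are tried at every character position instead of at syllable boundaries, and the input is assumed to be an OOV word.

```python
from functools import lru_cache

ALLOWED_PAIRS = {("N", "N"), ("V", "V"), ("N", "V"), ("V", "N"), ("ADJ", "N")}


def split_compound(word, lexicon, min_morpheme_frequency=10, max_parts=4):
    """Return the valid split with the fewest parts, or None (illustrative only)."""

    @lru_cache(maxsize=None)
    def best_from(start, previous_pos, parts_left):
        if start == len(word):
            return ()  # consumed the whole word
        if parts_left == 0:
            return None  # ran out of allowed parts before the end
        best = None
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            frequency, pos = lexicon.get(piece, (0, None))
            if frequency < min_morpheme_frequency:
                continue  # unknown or too rare to act as a morpheme
            if previous_pos is not None and (previous_pos, pos) not in ALLOWED_PAIRS:
                continue  # adjacent POS pair is not an allowed pattern
            rest = best_from(end, pos, parts_left - 1)
            if rest is not None and (best is None or len(rest) + 1 < len(best)):
                best = (piece,) + rest
        return best

    result = best_from(0, None, max_parts)
    # A one-part result means the word itself is a lexicon entry, not a synthesized compound.
    return list(result) if result and len(result) >= 2 else None


lexicon = {"ကျောင်း": (500, "N"), "သား": (800, "N")}  # hypothetical frequencies
print(split_compound("ကျောင်းသား", lexicon))
```

Because the allowed-pattern check looks at adjacent pairs, the memo key includes the previous part's POS; keeping only one best split per position could discard a split that a later morpheme needs.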
Morpheme-Level Suggestions
When a compound word has a typo in one morpheme, the suggestion engine
corrects that specific morpheme instead of suggesting unrelated words:
# Input: "ကျောင်းသာ" (typo: သာ should be သား)
# Morpheme strategy detects: ကျောင်း is valid, သာ is not
# Corrects: သာ → သား via SymSpell
# Suggests: "ကျောင်းသား"
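The morpheme strategy can be pictured as "find the one unknown part, fix it, and rebuild the word". The sketch below assumes the input has already been segmented into morphemes and uses a brute-force edit-distance scan in place of SymSpell; `morpheme_suggestions` and `edit_distance` are hypothetical helpers, not library functions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def morpheme_suggestions(morphemes, dictionary, max_distance=1):
    """Correct the single unknown morpheme and rebuild the compound (illustrative only)."""
    unknown = [i for i, m in enumerate(morphemes) if m not in dictionary]
    if len(unknown) != 1:
        return []  # nothing to fix, or too many errors for this strategy
    position = unknown[0]
    fixes = sorted((edit_distance(morphemes[position], entry), entry) for entry in dictionary)
    return [
        "".join(morphemes[:position] + [fix] + morphemes[position + 1:])
        for distance, fix in fixes
        if distance <= max_distance
    ]


print(morpheme_suggestions(["ကျောင်း", "သာ"], {"ကျောင်း", "သား"}))
```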
Configuration
Enable/disable morphological synthesis in ValidationConfig. Tune algorithm parameters
in the dedicated CompoundResolverConfig and ReduplicationConfig:
from myspellchecker.core.config import SpellCheckerConfig, ValidationConfig
from myspellchecker.core.config.algorithm_configs import (
    CompoundResolverConfig,
    ReduplicationConfig,
)
config = SpellCheckerConfig(
    validation=ValidationConfig(
        use_reduplication_validation=True,  # Default: True
        use_compound_synthesis=True,  # Default: True
    ),
    # Algorithm-level tuning for compound resolution
    compound_resolver=CompoundResolverConfig(
        min_morpheme_frequency=10,  # Min frequency per morpheme
        max_parts=4,  # Max compound parts
    ),
    # Algorithm-level tuning for reduplication detection
    reduplication=ReduplicationConfig(
        min_base_frequency=5,  # Min base word frequency
    ),
)
Suggestion Ranking
The DefaultRanker scores suggestions using a multi-factor formula where lower scores indicate better suggestions:
score = (edit_distance × plausibility) - freq_bonus - phonetic_bonus
- nasal_bonus - same_nasal_bonus - pos_bonus - span_bonus
The base score starts at the edit distance, optionally scaled by a plausibility multiplier derived from Myanmar-weighted substitution costs (e.g., aspirated pairs and medial confusions get lower costs). Then several bonuses are subtracted:
| Factor | Effect | Description |
|---|---|---|
| Frequency bonus | Up to configurable ceiling | Asymptotic bonus based on corpus frequency |
| Phonetic bonus | Configurable weight | Rewards phonetically similar suggestions |
| Nasal bonus | Fixed weight | Rewards nasal variant matches (န် / ံ) |
| Same nasal bonus | Fixed weight | Rewards same nasal ending as input |
| POS fit bonus | Configurable weight | Rewards grammatically fitting suggestions (via POS bigrams) |
| Span bonus | Length-scaled | Prefers suggestions matching the error span length |
All weights are configurable via RankerConfig. Alternative rankers (FrequencyFirstRanker, PhoneticFirstRanker, EditDistanceOnlyRanker) emphasize different factors. See Suggestion Ranking for the full algorithm details.
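To make the shape of the formula concrete, here is a reduced sketch that keeps only the plausibility multiplier, the frequency bonus, and the phonetic bonus. The `Candidate` class, the weight values, and the exact bonus curve are assumptions made for this example; they are not the `DefaultRanker` implementation or its defaults.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    term: str
    edit_distance: int
    frequency: int
    plausibility: float = 1.0   # below 1.0 for common Myanmar confusions (assumed scale)
    phonetic_match: bool = False


def score(c, freq_ceiling=0.5, freq_halfway=1000, phonetic_weight=0.3):
    """Lower is better. All weights here are hypothetical."""
    # Asymptotic bonus: approaches freq_ceiling as frequency grows but never reaches it.
    freq_bonus = freq_ceiling * c.frequency / (c.frequency + freq_halfway)
    phonetic_bonus = phonetic_weight if c.phonetic_match else 0.0
    return c.edit_distance * c.plausibility - freq_bonus - phonetic_bonus


def rank(candidates):
    """Sort candidates from best (lowest score) to worst."""
    return sorted(candidates, key=score)
```

Capping the frequency bonus below 1.0 means a very common word cannot outrank a candidate that is a full edit closer unless other bonuses also apply, which is one plausible reason for using an asymptotic curve.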
Frequency-Based Ranking
# Higher frequency words rank higher
"နိုင်ငံ" (freq: 50000) → Higher rank
"နှိုင်ငံ" (freq: 100) → Lower rank
Edit Distance Ranking
# Lower edit distance ranks higher
"နိုင်ငံ" (distance: 1) → Higher rank
"နိမ်ငံ" (distance: 2) → Lower rank
Performance
| Metric | Value |
|---|---|
| Speed | Fast |
| Lookup Complexity | O(1) average |
| Suggestion Generation | O(k) where k = candidates |
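Word validation is fast thanks to SymSpell’s pre-computed delete index. The trade-off is memory: the index grows with dictionary size and with max_edit_distance, because each additional edit level stores more delete variants per word.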
API Reference
Using SpellChecker for Word Validation
from myspellchecker import SpellChecker
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.constants import ValidationLevel
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(provider=provider)
# Validate text at word level
result = checker.check("မြန်မာနိုင်ငံ", level=ValidationLevel.WORD)
# Check for word-level errors
for error in result.errors:
    print(f"Error: {error.text}")
    print(f"Suggestions: {error.suggestions}")
    print(f"Confidence: {error.confidence}")
# Check if text is valid
print(f"Has errors: {result.has_errors}")
Note: Direct instantiation of WordValidator requires a DI container setup.
For most use cases, use SpellChecker.check() with level=ValidationLevel.WORD.
SymSpell Interface
from myspellchecker.algorithms.symspell import SymSpell
from myspellchecker.providers import SQLiteProvider
# Initialize with provider
provider = SQLiteProvider("dictionary.db")
symspell = SymSpell(provider, max_edit_distance=2)
# Build index before lookups
symspell.build_index(["word"])
# Lookup word
suggestions = symspell.lookup("နိူင်ငံ", level="word", max_suggestions=5)
for suggestion in suggestions:
    print(f"{suggestion.term}: distance={suggestion.edit_distance}, freq={suggestion.frequency}")
# Compound lookup (for multi-word)
results = symspell.lookup_compound("နိူင် ငံ", max_edit_distance=2)
Common Patterns
Custom Word List
from myspellchecker import SpellChecker
from myspellchecker.providers import MemoryProvider
# Use MemoryProvider to add custom words
provider = MemoryProvider()
# Add domain-specific words
custom_words = ["အိုင်တီ", "ဆော့ဖ်ဝဲ", "ဒေတာဘေ့စ်"]
for word in custom_words:
    provider.add_word(word, frequency=1000)
# Create checker with custom provider
checker = SpellChecker(provider=provider)
Ignore Unknown Words
def check_with_ignore_list(text: str, ignore_words: set) -> list:
    """Check text, ignoring specified words."""
    result = checker.check(text)
    return [
        error for error in result.errors
        if error.text not in ignore_words
    ]
# Usage
ignore = {"အိုင်တီ", "API", "HTTP"}
errors = check_with_ignore_list("API ကို သုံး", ignore)
Get Top Suggestions Only
def get_best_suggestion(word: str) -> str | None:
    """Get the single best suggestion for a word."""
    result = checker.check(word)
    if result.has_errors and result.errors[0].suggestions:
        return result.errors[0].suggestions[0]
    return None
Troubleshooting
Issue: Valid words marked as errors
Cause: Word not in dictionary
Solution: Add to dictionary:
myspellchecker build --input new_words.txt --output dictionary.db --incremental
Issue: Poor suggestions
Cause: Low corpus frequency or missing similar words
Solution: Improve corpus quality or adjust settings:
from myspellchecker.core.config import SpellCheckerConfig
config = SpellCheckerConfig(
    max_edit_distance=3,  # Allow more distance
    use_phonetic=True,  # Enable phonetic matching
)
Issue: Slow suggestion generation
Cause: Large edit distance or dictionary
Solution: Reduce max_edit_distance:
from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig
from myspellchecker.providers import SQLiteProvider
config = SpellCheckerConfig(max_edit_distance=1)  # Faster
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)