Skip to main content
Traditional spell checking compares an input word against every dictionary entry, which is far too slow for real-time use. SymSpell avoids this by pre-computing delete variants at build time, turning each lookup into a constant-time hash table hit. This page walks through the algorithm with Myanmar language examples.

The Problem: Traditional Edit Distance

Traditional spell checking calculates the Levenshtein edit distance between an input word and every word in the dictionary:
Input: "မနမာ" (misspelled)

Compare with dictionary:
  "မြန်မာ" → edit distance = 2
  "မြန်" → edit distance = 3
  "မာန" → edit distance = 2
  ... (repeat for 100,000+ words)
Problem: This is O(N × M) where N = dictionary size, M = word length. For a 100,000-word dictionary, this means 100,000 comparisons per lookup.

The Solution: Symmetric Delete

SymSpell’s key insight: Instead of comparing words directly, pre-compute all possible deletions.

Why “Symmetric”?

If we delete characters from both the dictionary word AND the misspelled word, they’ll meet in the middle:
Dictionary word: "မြန်မာ" (Myanmar)
                      ↓ delete "ြ"
                  "မန်မာ"
                      ↑ delete "န"
Misspelled word:  "မနမာ"
Both reach the same intermediate form through deletions!

How It Works

1

Indexing (Build Time)

For each word in the dictionary, generate all possible deletions up to max_edit_distance (typically 2):
Dictionary word: "မြန်မာ" (Myanmar)

Generate deletes (distance 1):
  Delete position 0: "ြန်မာ"   (remove မ)
  Delete position 1: "မန်မာ"   (remove ြ)
  Delete position 2: "မြမာ"    (remove န်)
  Delete position 3: "မြန်မ"   (remove ာ)
  Delete position 4: "မြန်မာ"  (remove invisible char)

Generate deletes (distance 2):
  "န်မာ", "ြမာ", "မမာ", "မြမ", ...
Store in hash map:
Index = {
  "မန်မာ": ["မြန်မာ"],
  "ြန်မာ": ["မြန်မာ"],
  "မြမာ": ["မြန်မာ"],
  "န်မာ": ["မြန်မာ", "ကန်မာ"],  // Multiple words can map here
  ...
}
2

Lookup (Query Time)

When checking a misspelled word, generate its deletions too:
Input: "မနမာ" (misspelled - missing ြ and န်)

Generate deletes:
  Distance 0: "မနမာ" (the word itself)
  Distance 1: "နမာ", "မမာ", "မနာ", "မနမ"
  Distance 2: "မာ", "နာ", "မမ", ...
3

Match

Look up each deletion in the pre-built index:
Lookup "မနမာ" → Not found
Lookup "နမာ"  → Not found
Lookup "မမာ"  → Found! Points to ["မြန်မာ"]
Result: Found candidate “မြန်မာ” in O(1) hash lookup!

Visual Example: Syllable Correction

Misspelled syllable: "ကျေင်း" (wrong vowel: ေ instead of ော)
Correct syllable:    "ကျောင်း" (school)

Index contains:
┌──────────────────────────────────────────────────────┐
│  Deletion          →  Original Words                 │
├──────────────────────────────────────────────────────┤
"ကျင်း"          →  ["ကျောင်း", "ကျင်း"]          │
"ကျောင်"         →  ["ကျောင်း"]                    │
"ကျေင်း"         →  ["ကျောင်း"]  ← Direct match!   │
└──────────────────────────────────────────────────────┘

Lookup process:
1. Check "ကျေင်း" in index → Found! Candidate: "ကျောင်း"
2. Verify edit distance: "ကျေင်း""ကျောင်း" = 1 (vowel substitution)
3. Return suggestion: "ကျောင်း"

Visual Example: Word Correction

Misspelled word: "မယ်နမာ" (incorrect spelling of Myanmar)
Correct word:    "မြန်မာ"

Step 1: Generate deletions from "မယ်နမာ"
┌──────────────────────────────────────┐
│  Distance 1 deletions:               │
"ယ်နမာ", "မနမာ", "မယ်မာ", "မယ်နာ"
├──────────────────────────────────────┤
│  Distance 2 deletions:               │
"နမာ", "ယ်မာ", "မမာ", "မနာ", ...    │
└──────────────────────────────────────┘

Step 2: Lookup in index
┌─────────────────────────────────────────────────┐
"မနမာ"Not in index
"နမာ"Not in index
"မမာ"  → Found! → ["မြန်မာ"]                   │
└─────────────────────────────────────────────────┘

Step 3: Verify and rank
  Candidate: "မြန်မာ"
  Edit distance from "မယ်နမာ": 3
  Frequency: 50,000 (very common)
  → High confidence suggestion

Myanmar-Specific Considerations

Character Clusters

Myanmar characters often form clusters (consonant + medials + vowels). SymSpell treats each Unicode code point as a unit:
Word: "ကြောင်" (cat)

Unicode breakdown:
  က (U+1000) - Ka
  ြ (U+103C) - Medial Ra
  ေ (U+1031) - Vowel E
  ာ (U+102C) - Vowel Aa
  င (U+1004) - Nga
  ် (U+103A) - Asat

Deletions are per code point:
  "ြောင်" (delete က)
  "ကောင်" (delete ြ)
  "ကြာင်"  (delete ေ, distance 1)
  "ကြေင်" (delete ာ, distance 1)
  ...

Common Myanmar Typos SymSpell Catches

Error TypeMisspelledCorrectEdit Distance
Missing medial”ကောင်""ကြောင်”1
Wrong medial”ကျောင်""ကြောင်”2 (substitute)
Missing asat”မြနမာ""မြန်မာ”1
Missing tone”ကြီ""ကြီး”1
Extra medial”ကျောင့်""ကျောင်း”1

Syllable vs Word Level

mySpellChecker applies SymSpell at two levels: 1. Syllable Level (faster, catches 90% of errors):
# Syllable correction
"ကျေင်း""ကျောင်း"  # Wrong vowel fix (school)
"မြန""မြန်"          # Missing asat
2. Word Level (slower, for complex errors):
# Word correction
"မယ်နမာ""မြန်မာ"    # Multiple character errors
"ကျေးဇူ""ကျေးဇူး"    # Missing visarga

Performance Characteristics

Time Complexity

OperationTraditionalSymSpell
Single lookupO(N × M)O(1) average
Build index-O(V × L^d)
Where:
  • V = vocabulary size
  • L = average term length
  • d = max edit distance

Space Complexity

SymSpell trades memory for speed:
Dictionary size: 100,000 words
Average deletions per word: ~50 (at distance 2)
Index size: ~5 million entries

Memory usage: ~50-300 MB (varies by dictionary size and max_edit_distance)

Benchmark: Myanmar Dictionary

Dictionary SizeIndex Build TimeLookup TimeMemory
10,000 words0.5s<1ms~15MB
50,000 words2s<1ms~50MB
100,000 words5s<1ms~50-100MB
Source docstring notes: “Typical Myanmar corpus (100K terms, d=2): ~50-100MB index.” Memory grows with max_edit_distance — values above 2 cause exponential growth.

Configuration

SpellCheckerConfig Options

from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig, SymSpellConfig

config = SpellCheckerConfig(
    # Maximum edit distance for suggestions
    max_edit_distance=2,  # Default: 2 (covers 99% of typos)

    # SymSpell-specific settings
    symspell=SymSpellConfig(
        prefix_length=10,  # Optimize index size (default)
    )
)

checker = SpellChecker(config=config)

Edit Distance Guidelines

DistanceCoverageSpeedUse Case
1~85%FastestReal-time typing
2~99%FastDefault, recommended
3~99.9%SlowerBatch processing

Prefix Length

The prefix_length parameter optimizes memory by only indexing the first N characters:
# prefix_length=10 (default in SymSpell class)
# Only generates deletions for first N characters
# Reduces index size by ~40% with minimal accuracy loss

# For Myanmar (shorter words), 5-7 is optimal
config = SpellCheckerConfig(
    symspell=SymSpellConfig(prefix_length=5)
)

Implementation Details

Index Structure

# Internal index structure
# Maps delete variations to sets of (original_term, edit_distance) tuples
self._deletes: Dict[str, Set[Tuple[str, int]]] = {
    "မန်မာ": {
        ("မြန်မာ", 1),  # original term, deletion distance
    },
    "ကောင်": {
        ("ကြောင်", 1),
        ("ကောင်း", 1),
    },
    ...
}

Lookup Algorithm

def lookup(word: str, max_distance: int = 2) -> list[Suggestion]:
    candidates = set()

    # Check the word itself
    if word in dictionary:
        candidates.add(word)

    # Generate and check deletions
    for deletion in generate_deletes(word, max_distance):
        if deletion in index:
            for original in index[deletion]:
                # Verify actual edit distance
                dist = edit_distance(word, original)
                if dist <= max_distance:
                    candidates.add((original, dist))

    # Rank using multi-factor scoring (edit distance, frequency,
    # phonetic similarity, syllable match, etc.) via SuggestionRanker
    return ranker.rank(candidates)

Suggestion Dataclass

Each suggestion returned by lookup() includes:
FieldTypeDescription
termstrThe suggested correction
edit_distanceintDamerau-Levenshtein distance from input
frequencyintCorpus frequency of the term
phonetic_scorefloatPhonetic similarity score (0.0-1.0)
syllable_distancefloat | NoneMyanmar syllable-aware weighted distance
weighted_distancefloat | NoneMyanmar-weighted edit distance using substitution costs
is_nasal_variantboolTrue if nasal ending difference only (န်↔ံ)
has_same_nasal_endingboolTrue if same nasal consonant as input

Additional Methods

MethodDescription
build_index(levels)Build delete index for specified levels (["syllable"], ["word"], or both)
lookup_compound(term)Compound word segmentation with suggestions
compute_frequency_denominator(provider, level, percentile)Static method to compute optimal frequency denominator from corpus statistics

Comparison with Other Algorithms

AlgorithmLookup SpeedIndex SizeAccuracy
Brute ForceO(N×M)Small100%
BK-TreeO(log N)Medium100%
SymSpellO(1)Large100%
Norvig’sO(26^d × M)Small100%
SymSpell is the fastest for dictionary-based spell checking, making it ideal for real-time applications.

See Also