Skip to main content
mySpellChecker uses the SymSpell (Symmetric Delete) algorithm for fast spelling correction. This document explains how SymSpell works with Myanmar language examples.

The Problem: Traditional Edit Distance

Traditional spell checking calculates the Levenshtein edit distance between an input word and every word in the dictionary:
Input: "မနမာ" (misspelled)

Compare with dictionary:
  "မြန်မာ" → edit distance = 2
  "မြန်" → edit distance = 3
  "မာန" → edit distance = 2
  ... (repeat for 100,000+ words)
Problem: This is O(N × M) where N = dictionary size, M = word length. For a 100,000-word dictionary, this means 100,000 comparisons per lookup.

The Solution: Symmetric Delete

SymSpell’s key insight: Instead of comparing words directly, pre-compute all possible deletions.

Why “Symmetric”?

If we delete characters from both the dictionary word AND the misspelled word, they’ll meet in the middle:
Dictionary word: "မြန်မာ" (Myanmar)
                      ↓ delete "ြ"
                  "မန်မာ"
                      ↑ delete "န"
Misspelled word:  "မနမာ"
Both reach the same intermediate form through deletions!

How It Works

1

Indexing (Build Time)

For each word in the dictionary, generate all possible deletions up to max_edit_distance (typically 2):
Dictionary word: "မြန်မာ" (Myanmar)

Generate deletes (distance 1):
  Delete position 0: "ြန်မာ"   (remove မ)
  Delete position 1: "မန်မာ"   (remove ြ)
  Delete position 2: "မြမာ"    (remove န်)
  Delete position 3: "မြန်မ"   (remove ာ)
  Delete position 4: "မြန်မာ"  (remove invisible char)

Generate deletes (distance 2):
  "န်မာ", "ြမာ", "မမာ", "မြမ", ...
Store in hash map:
Index = {
  "မန်မာ": ["မြန်မာ"],
  "ြန်မာ": ["မြန်မာ"],
  "မြမာ": ["မြန်မာ"],
  "န်မာ": ["မြန်မာ", "ကန်မာ"],  // Multiple words can map here
  ...
}
2

Lookup (Query Time)

When checking a misspelled word, generate its deletions too:
Input: "မနမာ" (misspelled - missing ြ and န်)

Generate deletes:
  Distance 0: "မနမာ" (the word itself)
  Distance 1: "နမာ", "မမာ", "မနာ", "မနမ"
  Distance 2: "မာ", "နာ", "မမ", ...
3

Match

Look up each deletion in the pre-built index:
Lookup "မနမာ" → Not found
Lookup "နမာ"  → Not found
Lookup "မမာ"  → Found! Points to ["မြန်မာ"]
Result: Found candidate “မြန်မာ” in O(1) hash lookup!

Visual Example: Syllable Correction

Misspelled syllable: "ကျေင်း" (wrong vowel: ေ instead of ော)
Correct syllable:    "ကျောင်း" (school)

Index contains:
┌──────────────────────────────────────────────────────┐
│  Deletion          →  Original Words                 │
├──────────────────────────────────────────────────────┤
"ကျင်း"          →  ["ကျောင်း", "ကျင်း"]          │
"ကျောင်"         →  ["ကျောင်း"]                    │
"ကျေင်း"         →  ["ကျောင်း"]  ← Direct match!   │
└──────────────────────────────────────────────────────┘

Lookup process:
1. Check "ကျေင်း" in index → Found! Candidate: "ကျောင်း"
2. Verify edit distance: "ကျေင်း""ကျောင်း" = 1 (vowel substitution)
3. Return suggestion: "ကျောင်း"

Visual Example: Word Correction

Misspelled word: "မယ်နမာ" (incorrect spelling of Myanmar)
Correct word:    "မြန်မာ"

Step 1: Generate deletions from "မယ်နမာ"
┌──────────────────────────────────────┐
│  Distance 1 deletions:               │
"ယ်နမာ", "မနမာ", "မယ်မာ", "မယ်နာ"
├──────────────────────────────────────┤
│  Distance 2 deletions:               │
"နမာ", "ယ်မာ", "မမာ", "မနာ", ...    │
└──────────────────────────────────────┘

Step 2: Lookup in index
┌─────────────────────────────────────────────────┐
"မနမာ"Not in index
"နမာ"Not in index
"မမာ"  → Found! → ["မြန်မာ"]                   │
└─────────────────────────────────────────────────┘

Step 3: Verify and rank
  Candidate: "မြန်မာ"
  Edit distance from "မယ်နမာ": 3
  Frequency: 50,000 (very common)
  → High confidence suggestion

Myanmar-Specific Considerations

Character Clusters

Myanmar characters often form clusters (consonant + medials + vowels). SymSpell treats each Unicode code point as a unit:
Word: "ကြောင်" (cat)

Unicode breakdown:
  က (U+1000) - Ka
  ြ (U+103C) - Medial Ra
  ေ (U+1031) - Vowel E
  ာ (U+102C) - Vowel Aa
  င (U+1004) - Nga
  ် (U+103A) - Asat

Deletions are per code point:
  "ြောင်" (delete က)
  "ကောင်" (delete ြ)
  "ကြင်"  (delete ေ + ာ cluster)
  ...

Common Myanmar Typos SymSpell Catches

Error TypeMisspelledCorrectEdit Distance
Missing medial”ကောင်""ကြောင်”1
Wrong medial”ကျောင်""ကြောင်”2 (substitute)
Missing asat”မြန်မာ""မြန်မာ”1
Missing tone”ကြီ""ကြီး”1
Extra medial”ကျောင့်""ကျောင်း”1

Syllable vs Word Level

mySpellChecker applies SymSpell at two levels: 1. Syllable Level (faster, catches 90% of errors):
# Syllable correction
"ကျေင်း""ကျောင်း"  # Wrong vowel fix (school)
"မြန""မြန်"          # Missing asat
2. Word Level (slower, for complex errors):
# Word correction
"မယ်နမာ""မြန်မာ"    # Multiple character errors
"ကျေးဇူ""ကျေးဇူး"    # Missing visarga

Performance Characteristics

Time Complexity

OperationTraditionalSymSpell
Single lookupO(N × M)O(1) average
Build index-O(N × M × D!)
Where:
  • N = dictionary size
  • M = average word length
  • D = max edit distance

Space Complexity

SymSpell trades memory for speed:
Dictionary size: 100,000 words
Average deletions per word: ~50 (at distance 2)
Index size: ~5 million entries

Memory usage: ~200-500 MB (depending on word lengths)

Benchmark: Myanmar Dictionary

Dictionary SizeIndex Build TimeLookup TimeMemory
10,000 words0.5s<1ms50MB
50,000 words2s<1ms150MB
100,000 words5s<1ms300MB

Configuration

SpellCheckerConfig Options

from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig

config = SpellCheckerConfig(
    # Maximum edit distance for suggestions
    max_edit_distance=2,  # Default: 2 (covers 99% of typos)

    # SymSpell-specific settings
    symspell=SymSpellConfig(
        prefix_length=10,  # Optimize index size (default)
    )
)

checker = SpellChecker(config=config)

Edit Distance Guidelines

DistanceCoverageSpeedUse Case
1~85%FastestReal-time typing
2~99%FastDefault, recommended
3~99.9%SlowerBatch processing

Prefix Length

The prefix_length parameter optimizes memory by only indexing the first N characters:
# prefix_length=10 (default in SymSpell class)
# Only generates deletions for first N characters
# Reduces index size by ~40% with minimal accuracy loss

# For Myanmar (shorter words), 5-7 is optimal
config = SpellCheckerConfig(
    symspell=SymSpellConfig(prefix_length=5)
)

Implementation Details

Index Structure

# Internal index structure
# Maps delete variations to sets of (original_term, edit_distance) tuples
self._deletes: Dict[str, Set[Tuple[str, int]]] = {
    "မန်မာ": {
        ("မြန်မာ", 1),  # original term, deletion distance
    },
    "ကောင်": {
        ("ကြောင်", 1),
        ("ကောင်း", 1),
    },
    ...
}

Lookup Algorithm

def lookup(word: str, max_distance: int = 2) -> list[Suggestion]:
    candidates = set()

    # Check the word itself
    if word in dictionary:
        candidates.add(word)

    # Generate and check deletions
    for deletion in generate_deletes(word, max_distance):
        if deletion in index:
            for original in index[deletion]:
                # Verify actual edit distance
                dist = edit_distance(word, original)
                if dist <= max_distance:
                    candidates.add((original, dist))

    # Rank by distance, then frequency
    return sorted(candidates, key=lambda x: (x[1], -frequency[x[0]]))

Comparison with Other Algorithms

AlgorithmLookup SpeedIndex SizeAccuracy
Brute ForceO(N×M)Small100%
BK-TreeO(log N)Medium100%
SymSpellO(1)Large100%
Norvig’sO(26^d × M)Small100%
SymSpell is the fastest for dictionary-based spell checking, making it ideal for real-time applications.

See Also