SpellChecker uses a mixin-based decomposition to organize detection and suggestion logic into focused modules while preserving a single public API surface:
    SpellChecker (core/spellchecker.py)
    ├── PreNormalizationDetectorsMixin
    │   ├── 11 pre-normalization detectors
    │   └── Run before text normalization
    └── PostNormalizationDetectorsMixin
        ├── 38 ordered detectors (via detection_registry.py)
        └── Particle confusion, medial confusion, compound typos, etc.
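The following sketch illustrates how the mixins compose. A single class inherits from both detector mixins and runs the two groups on either side of a normalization step. Everything in it is hypothetical and exists only to show the pattern: the `check`, `_normalize`, `_run_pre_norm_detectors`, and `_run_post_norm_detectors` methods and the toy double-space rule are not taken from the project.

```python
from typing import List, Tuple

# Hypothetical error record: (detector name, start offset, end offset).
Error = Tuple[str, int, int]


class PreNormalizationDetectorsMixin:
    """Illustrative stand-in: detectors that see the raw, un-normalized text."""

    def _run_pre_norm_detectors(self, text: str) -> List[Error]:
        errors: List[Error] = []
        # Toy rule (hypothetical): a doubled space that normalization would erase.
        idx = text.find("  ")
        if idx != -1:
            errors.append(("_detect_double_space", idx, idx + 2))
        return errors


class PostNormalizationDetectorsMixin:
    """Illustrative stand-in: detectors that only ever see normalized text."""

    def _run_post_norm_detectors(self, text: str) -> List[Error]:
        return []  # the registry-ordered detectors would run here


class SpellChecker(PreNormalizationDetectorsMixin, PostNormalizationDetectorsMixin):
    """One public entry point; detection logic lives in the inherited mixins."""

    def _normalize(self, text: str) -> str:
        # Hypothetical normalization: collapse runs of whitespace to one space.
        return " ".join(text.split())

    def check(self, text: str) -> List[Error]:
        errors = self._run_pre_norm_detectors(text)  # before normalization
        normalized = self._normalize(text)
        errors.extend(self._run_post_norm_detectors(normalized))  # after
        return errors
```

For example, `SpellChecker().check("a  b")` returns `[("_detect_double_space", 1, 3)]`. The same rule applied after normalization would see `"a b"` and find nothing, which shows why some checks may need to run before the text is normalized.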
The post-normalization detection pipeline is controlled by an ordered registry in core/detection_registry.py. Each entry maps to a _detect_* method inherited from detector mixins:
    POST_NORM_DETECTOR_SEQUENCE = (
        # Stacking and structural errors (run first)
        "_detect_broken_stacking",  # Asat→virama in Pali words
        "_detect_missing_stacking",  # Missing Pali/Sanskrit virama stacking
        "_detect_missing_asat",  # Missing asat on normalized text
        "_detect_missing_visarga_suffix",  # Missing visarga in clause-linker suffixes
        "_detect_missing_visarga_in_compound",  # Missing visarga inside compound words
        # Medial and particle confusion
        "_detect_medial_confusion",  # Medial ya-pin/ya-yit confusion
        "_detect_colloquial_contractions",  # Colloquial contraction detection
        "_detect_particle_confusion",  # Particle confusion (ကိ/ကု → ကို)
        "_detect_compound_confusion_typos",  # Compound confusion (ha-htoe + aspirated)
        "_detect_suffix_confusion_typos",  # Suffix confusion on invalid compounds
        # Token repair and frequency-based correction
        "_detect_invalid_token_with_strong_candidates",  # Invalid token repair via strong DB candidates
        "_detect_frequency_dominant_valid_variants",  # Valid-token variant correction via frequency + semantic
        "_detect_broken_compound_morpheme",  # Broken compound morpheme (ed-1 variant)
        "_detect_missegmented_confusable",  # Confusable errors hidden by segmentation
        # Particle and diacritic errors
        "_detect_ha_htoe_particle_typos",  # Ha-htoe particle confusion (မာ → မှာ)
        "_detect_aukmyit_confusion",  # Aukmyit confusion (ထည် → ထည့်)
        "_detect_extra_aukmyit_confusion",  # Extra aukmyit (ပြော့ → ပြော)
        "_detect_sequential_particle_confusion",  # Sequential particle (တော် → တော့)
        "_detect_particle_misuse",  # Particle misuse via verb-frame (ကို → မှ/မှာ/တွင်)
        # Context-aware detectors
        "_detect_homophone_left_context",  # Homophone left-context (ဖက် → ဖတ်)
        "_detect_collocation_errors",  # Collocation error (wrong word partner)
        "_detect_semantic_agent_implausibility",  # Non-human subject implausibility
        "_detect_merged_classifier_mismatch",  # Merged NUM+classifier mismatch
        # Sentence-level detectors
        "_detect_dangling_particles",  # Dangling sentence-end particles
        "_detect_sentence_structure_issues",  # Dangling word, missing conjunction
        "_detect_tense_mismatch",  # Temporal adverb vs particle mismatch
        "_detect_formal_yi_in_colloquial_context",  # Verb+၏ in colloquial context
        "_detect_negation_sfp_mismatch",  # Negation pattern mismatch
        "_detect_merged_sfp_conjunction",  # Merged SFP + conjunction
        "_detect_missing_visarga",  # Missing visarga (း) via frequency ratio
        # Register and style
        "_detect_register_mixing",  # Formal/colloquial register mixing
        "_detect_informal_with_honorific",  # Informal particle + honorific
        "_detect_informal_h_after_completive",  # Terse ဟ after completive
        # Post-processing detectors
        "_detect_vowel_after_asat",  # Vowel after asat (ကျွန်ုတော် → ကျွန်တော်)
        "_detect_missing_diacritic_in_compound",  # Missing anusvara/dot-below
        "_detect_unknown_compound_segments",  # Unknown freq=0 compound segments
        "_detect_broken_compound_space",  # Space inside compound word
        "_detect_punctuation_errors",  # Punctuation errors (lowest priority)
    )
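One way a registry of method names like this can be dispatched is sketched below. Each name is resolved with `getattr` once, at construction time, and the resulting bound methods are called in tuple order. This is an assumption about the mechanism, not the project's code: `DETECTOR_SEQUENCE`, `DemoDetectors`, `Checker`, and `run_post_norm` are hypothetical, and the three-entry tuple stands in for the full 38-name registry.

```python
from typing import Callable, List, Tuple

Error = Tuple[str, int, int]  # hypothetical (detector, start, end) record

# Shortened, hypothetical stand-in for POST_NORM_DETECTOR_SEQUENCE.
DETECTOR_SEQUENCE: Tuple[str, ...] = (
    "_detect_broken_stacking",
    "_detect_colloquial_contractions",
    "_detect_punctuation_errors",
)


class DemoDetectors:
    """Hypothetical mixin supplying the _detect_* methods named above."""

    def _detect_broken_stacking(self, text: str) -> List[Error]:
        return []

    def _detect_colloquial_contractions(self, text: str) -> List[Error]:
        return []

    def _detect_punctuation_errors(self, text: str) -> List[Error]:
        # Toy rule, illustrative only: flag every ASCII period.
        return [("_detect_punctuation_errors", i, i + 1)
                for i, ch in enumerate(text) if ch == "."]


class Checker(DemoDetectors):
    def __init__(self, sequence: Tuple[str, ...] = DETECTOR_SEQUENCE) -> None:
        # Resolve every name up front so a misspelled registry entry
        # fails at construction time instead of partway through a check.
        self._detectors: List[Callable[[str], List[Error]]] = []
        for name in sequence:
            method = getattr(self, name, None)
            if not callable(method):
                raise AttributeError(f"registry entry {name!r} has no matching method")
            self._detectors.append(method)

    def run_post_norm(self, text: str) -> List[Error]:
        errors: List[Error] = []
        for detect in self._detectors:  # strictly in registry order
            errors.extend(detect(text))
        return errors
```

With this design, `Checker().run_post_norm("a.b.")` returns `[("_detect_punctuation_errors", 1, 2), ("_detect_punctuation_errors", 3, 4)]`, and `Checker(("_detect_missing",))` raises `AttributeError` immediately.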
Ordering is intentional. For example, _detect_broken_stacking must run before _detect_colloquial_contractions so that stacking errors are not claimed as colloquial variants.
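The effect of ordering can be sketched by assuming a "first claim wins" rule. Under that rule, once a detector reports a span, later detectors cannot report an overlapping span. The source does not specify how claiming works, so `run_with_claims` and both toy detectors below are hypothetical. They show only why swapping two detectors that fire on the same text changes which error is reported.

```python
from typing import Callable, List, Set, Tuple

Error = Tuple[str, int, int]  # (detector, start, end) as character offsets
Detector = Callable[[str], List[Error]]


def run_with_claims(text: str, detectors: List[Detector]) -> List[Error]:
    """Hypothetical arbitration: the first detector to report a span owns it."""
    claimed: Set[int] = set()
    accepted: List[Error] = []
    for detect in detectors:
        for name, start, end in detect(text):
            span = set(range(start, end))
            if span & claimed:
                continue  # overlaps a span owned by an earlier detector
            claimed |= span
            accepted.append((name, start, end))
    return accepted


# Toy detectors (hypothetical) that both fire on offsets 0-3.
def broken_stacking(text: str) -> List[Error]:
    return [("_detect_broken_stacking", 0, 3)]


def colloquial_contractions(text: str) -> List[Error]:
    return [("_detect_colloquial_contractions", 0, 3)]
```

`run_with_claims("abcd", [broken_stacking, colloquial_contractions])` keeps only the stacking error. Reversing the list keeps only the colloquial one, which is the misclassification the registry order is meant to prevent.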