src/myspellchecker/rules/ and can be customized for specific use cases.
Rule Files Overview
| File | Purpose |
|---|---|
| `particles.yaml` | Linguistic particles with POS tags |
| `typo_corrections.yaml` | Common typo patterns |
| `morphology.yaml` | Suffix/prefix patterns |
| `morphotactics.yaml` | Morpheme combination rules |
| `grammar_rules.yaml` | Grammar validation rules |
| `aspects.yaml` | Verb aspect markers |
| `compounds.yaml` | Compound word patterns |
| `classifiers.yaml` | Numeral classifiers |
| `negation.yaml` | Negation patterns |
| `register.yaml` | Formal/colloquial mappings |
| `tone_rules.yaml` | Tone mark rules |
| `homophones.yaml` | Homophone pairs |
| `orthographic_corrections.yaml` | Orthographic correction patterns |
| `rerank_rules.yaml` | Suggestion re-ranking rules |
| `ambiguous_words.yaml` | Multi-POS words |
| `pos_inference.yaml` | POS inference patterns |
| `pronouns.yaml` | Pronoun definitions |
| `collocations.yaml` | Collocation error detection (wrong word near context words) |
| `compound_confusion.yaml` | Compound word confusion patterns (ha-htoe, aspiration, consonant, suffix) |
| `confusable_pairs.yaml` | Curated confusable word pairs for real-word error detection |
| `confusion_matrix.yaml` | Data-driven character substitution costs from corpus analysis |
| `corruption_weights.yaml` | Synthetic error generation weights for training |
| `detector_confidences.yaml` | Confidence scores for text-level detectors by error type |
| `homophone_confusion.yaml` | Context-dependent homophone disambiguation rules |
| `medial_confusion.yaml` | Medial ya-pin / ya-yit confusion correction patterns |
| `medial_swap_pairs.yaml` | Medial consonant swap pairs for candidate generation |
| `semantic_rules.yaml` | Semantic validation (agent implausibility, classifier agreement) |
| `stacking_pairs.yaml` | Valid consonant stacking pairs via virama (U+1039) |
| `tense_markers.yaml` | Temporal adverb and sentence-final particle tense mismatch rules |
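The `stacking_pairs.yaml` row describes valid stacks joined by the virama (U+1039). As a minimal sketch of how such a list might be applied, assuming the allowed pairs have been loaded into a set of (upper, lower) consonant tuples, the helpers below extract the stacked pairs from a word and report any that are not allowed. The function names and the one-entry allow-list are hypothetical, not taken from the package.

```python
VIRAMA = "\u1039"  # Myanmar sign virama, used to stack consonants


def stacked_pairs(word: str) -> list[tuple[str, str]]:
    """Return the (upper, lower) consonant pairs joined by a virama in word."""
    pairs = []
    for i, ch in enumerate(word):
        # A stack needs a character on both sides of the virama.
        if ch == VIRAMA and 0 < i < len(word) - 1:
            pairs.append((word[i - 1], word[i + 1]))
    return pairs


def invalid_stacks(word: str, allowed: set[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the stacked pairs in word that are missing from the allow-list."""
    return [pair for pair in stacked_pairs(word) if pair not in allowed]


# Hypothetical allow-list with one entry: MA (U+1019) stacked over BA (U+1018).
ALLOWED = {("\u1019", "\u1018")}

word = "\u1000\u1019\u1039\u1018\u102C"  # ကမ္ဘာ ("world")
print(stacked_pairs(word))
print(invalid_stacks(word, ALLOWED))
```

For ကမ္ဘာ the only stack is မ over ဘ, which the sample allow-list accepts, so `invalid_stacks` returns an empty list.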
File Structure
All rule files follow a common structure:

Particles (particles.yaml)
Defines Myanmar linguistic particles organized by syntactic function.

Structure
POS Tags
| Tag | Description | Example |
|---|---|---|
| `P_PAST` | Past tense | ခဲ့ |
| `P_FUT` | Future tense | မယ်, မည် |
| `P_PROG` | Progressive | နေ |
| `P_PERF` | Perfective | ပြီ |
| `P_SUBJ` | Subject marker | က |
| `P_OBJ` | Object marker | ကို |
| `P_LOC` | Locative | မှာ, တွင် |
| `P_SENT` | Sentence ending | တယ်, သည် |
| `P_POSS` | Possessive | ရဲ့, ၏ |
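Read row by row, the table above is a lookup from particle to tag. A minimal sketch, assuming the entries have been flattened into a dict (the real nesting in `particles.yaml` is not reproduced here) and that the input is already segmented into tokens; `tag_particles` is a hypothetical helper.

```python
from typing import Optional

# Particle -> POS tag, copied from the table above.
PARTICLE_TAGS = {
    "ခဲ့": "P_PAST",
    "မယ်": "P_FUT", "မည်": "P_FUT",
    "နေ": "P_PROG",
    "ပြီ": "P_PERF",
    "က": "P_SUBJ",
    "ကို": "P_OBJ",
    "မှာ": "P_LOC", "တွင်": "P_LOC",
    "တယ်": "P_SENT", "သည်": "P_SENT",
    "ရဲ့": "P_POSS", "၏": "P_POSS",
}


def tag_particles(tokens: list[str]) -> list[tuple[str, Optional[str]]]:
    """Pair each token with its particle tag, or None if it is not a listed particle."""
    return [(tok, PARTICLE_TAGS.get(tok)) for tok in tokens]


print(tag_particles(["သူ", "က", "သွား", "ခဲ့", "တယ်"]))
```

A flat dict assumes one tag per particle; words with several possible tags would need a list of tags per key.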
Formality Levels
- `colloquial` - Spoken/informal
- `neutral` - Both formal and informal
- `formal` - Written/formal
- `polite` - Respectful register
- `literary` - Literary style
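A sketch of the five levels as a Python enum with a compatibility check. Only the rule that `neutral` entries fit both formal and informal text comes from the list above; treating every other level as matching only itself is an assumption, and `fits_register` is a hypothetical name.

```python
from enum import Enum


class Formality(Enum):
    COLLOQUIAL = "colloquial"
    NEUTRAL = "neutral"
    FORMAL = "formal"
    POLITE = "polite"
    LITERARY = "literary"


def fits_register(level: Formality, target: Formality) -> bool:
    """True if an entry at `level` is acceptable in text written at `target`.

    Assumption: neutral entries fit formal and colloquial text; every other
    level fits only text of the same level.
    """
    if level is target:
        return True
    return level is Formality.NEUTRAL and target in (Formality.FORMAL, Formality.COLLOQUIAL)


print(fits_register(Formality("neutral"), Formality.FORMAL))
```

Parsing the level string with `Formality("neutral")` also rejects unknown values with a `ValueError`.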
Typo Corrections (typo_corrections.yaml)
Defines common Myanmar typo patterns with corrections.

Structure
Error Types
| Type | Description |
|---|---|
| `missing_ha_htoe` | Missing ှ modifier |
| `character_confusion` | Similar-looking characters |
| `ya_pin_ra_yit` | ျ vs ြ confusion |
| `missing_asat` | Missing ် marker |
| `tone_mark_error` | Wrong or missing tone mark |
| `visual_similar` | OCR-type errors |
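For `ya_pin_ra_yit` errors, one way to propose corrections is to swap the medial and keep the variants that are real words. A minimal sketch, assuming the dictionary lookup happens elsewhere; it hardcodes the single ျ (U+103B) / ြ (U+103C) pair rather than reading `medial_swap_pairs.yaml`, and `medial_swap_candidates` is a hypothetical name.

```python
YA_PIN = "\u103B"  # ျ, medial ya
YA_YIT = "\u103C"  # ြ, medial ra
SWAP = {YA_PIN: YA_YIT, YA_YIT: YA_PIN}


def medial_swap_candidates(word: str) -> list[str]:
    """Return every variant of word with exactly one ya-pin/ya-yit medial swapped."""
    candidates = []
    for i, ch in enumerate(word):
        if ch in SWAP:
            candidates.append(word[:i] + SWAP[ch] + word[i + 1:])
    return candidates


print(medial_swap_candidates("ကျ"))  # one candidate, with ြ in place of ျ
```

Because each variant swaps a single medial, a word containing two medials yields two candidates, which a dictionary filter can then narrow down.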
Context Types
- `after_noun` - Follows a noun
- `after_verb` - Follows a verb
- `context_dependent` - Requires context analysis
- `standalone` - Independent of context
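The four context types act as a gate that decides whether a correction may fire at a given position. A minimal sketch, assuming a hypothetical `TypoRule` record and that the preceding token's POS tag is available as `"N"` or `"V"` (placeholder tag names, not the package's tag set).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TypoRule:
    wrong: str
    correct: str
    context: str  # one of: after_noun, after_verb, context_dependent, standalone


def rule_applies(rule: TypoRule, prev_pos: Optional[str]) -> Optional[bool]:
    """Decide whether a rule may fire, given the POS tag of the preceding token.

    Returns None for context_dependent rules, which are left to a separate
    context-analysis step in this sketch.
    """
    if rule.context == "standalone":
        return True
    if rule.context == "after_noun":
        return prev_pos == "N"
    if rule.context == "after_verb":
        return prev_pos == "V"
    if rule.context == "context_dependent":
        return None
    raise ValueError(f"unknown context type: {rule.context}")
```

Returning `None` rather than `False` keeps "not applicable here" distinct from "needs more analysis".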
Morphology (morphology.yaml)
Defines suffix and prefix patterns for POS inference.

Structure
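Affix patterns let the checker guess a POS tag for words missing from the dictionary. A minimal sketch of longest-affix matching; the entry shape (affix mapped to a tag) is an assumption about the file, and the three affixes are illustrative: verb + မှု or ခြင်း forms an abstract noun, and အ + verb forms a noun (as in လုပ် "to work" and အလုပ် "work").

```python
from typing import Optional

# Illustrative affixes, not copied from morphology.yaml.
SUFFIX_POS = {"မှု": "N", "ခြင်း": "N"}
PREFIX_POS = {"အ": "N"}


def infer_pos(word: str) -> Optional[str]:
    """Guess a POS tag from affixes, trying longer suffixes first, then prefixes."""
    for suffix in sorted(SUFFIX_POS, key=len, reverse=True):
        # Require a non-empty stem so the bare affix is not tagged.
        if word.endswith(suffix) and len(word) > len(suffix):
            return SUFFIX_POS[suffix]
    for prefix in sorted(PREFIX_POS, key=len, reverse=True):
        if word.startswith(prefix) and len(word) > len(prefix):
            return PREFIX_POS[prefix]
    return None


print(infer_pos("ရေးခြင်း"), infer_pos("အလုပ်"), infer_pos("သွား"))
```

The prefix heuristic is deliberately checked last because many words begin with အ without being derived nouns.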
Aspects (aspects.yaml)
Defines verb aspect markers.

Structure
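Aspect markers follow the verb, so in unsegmented text they can be split off as suffixes. A minimal sketch using the three aspect markers from the particle table (ခဲ့, နေ, ပြီ); `split_aspect` is hypothetical, and a real implementation would need to guard against verbs that merely end in the same characters.

```python
from typing import Optional

# Aspect markers and tags taken from the particle table.
ASPECT_MARKERS = {"ခဲ့": "P_PAST", "နေ": "P_PROG", "ပြီ": "P_PERF"}


def split_aspect(word: str) -> tuple[str, Optional[str]]:
    """Split one trailing aspect marker off a verb; return (stem, tag or None)."""
    for marker in sorted(ASPECT_MARKERS, key=len, reverse=True):
        if word.endswith(marker) and len(word) > len(marker):
            return word[: -len(marker)], ASPECT_MARKERS[marker]
    return word, None


print(split_aspect("သွားခဲ့"))
```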
Classifiers (classifiers.yaml)
Defines numeral classifiers for counting.

Structure
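Myanmar counts with the pattern noun + numeral + classifier (for example လူ ၃ ယောက်, "three people"). A minimal sketch that finds such runs in segmented text and flags numerals with no classifier after them; the three classifiers are illustrative rather than read from `classifiers.yaml`, and the function names are hypothetical.

```python
MYANMAR_DIGITS = {chr(c) for c in range(0x1040, 0x104A)}  # ၀ through ၉
# Illustrative classifiers: ယောက် (people), ကောင် (animals), ခု (general items).
CLASSIFIERS = {"ယောက်", "ကောင်", "ခု"}


def is_numeral(token: str) -> bool:
    return bool(token) and all(ch in MYANMAR_DIGITS for ch in token)


def counted_phrases(tokens: list[str]) -> list[tuple[str, str, str]]:
    """Return (noun, numeral, classifier) triples found in a token list."""
    found = []
    for i in range(1, len(tokens) - 1):
        if is_numeral(tokens[i]) and tokens[i + 1] in CLASSIFIERS:
            found.append((tokens[i - 1], tokens[i], tokens[i + 1]))
    return found


def unclassified_numerals(tokens: list[str]) -> list[int]:
    """Indices of numerals not followed by a known classifier (a possible error)."""
    return [
        i for i, tok in enumerate(tokens)
        if is_numeral(tok) and (i + 1 >= len(tokens) or tokens[i + 1] not in CLASSIFIERS)
    ]


print(counted_phrases(["လူ", "၃", "ယောက်"]))
```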
Register (register.yaml)
Maps formal and colloquial equivalents.

Structure
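The particle table already pairs colloquial and formal forms (တယ်/သည်, မယ်/မည်, မှာ/တွင်, ရဲ့/၏). A minimal sketch of token-level conversion between the two registers, assuming segmented input; a word-for-word swap ignores context, and `convert_register` is hypothetical.

```python
# Colloquial -> formal pairs taken from the particle table.
COLLOQUIAL_TO_FORMAL = {
    "တယ်": "သည်",  # sentence ending
    "မယ်": "မည်",  # future
    "မှာ": "တွင်",  # locative
    "ရဲ့": "၏",  # possessive
}
FORMAL_TO_COLLOQUIAL = {v: k for k, v in COLLOQUIAL_TO_FORMAL.items()}


def convert_register(tokens: list[str], to_formal: bool = True) -> list[str]:
    """Replace each mapped particle with its counterpart; leave other tokens unchanged."""
    table = COLLOQUIAL_TO_FORMAL if to_formal else FORMAL_TO_COLLOQUIAL
    return [table.get(tok, tok) for tok in tokens]


print(convert_register(["သူ", "သွား", "တယ်"]))
```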
Negation (negation.yaml)
Defines negation patterns.

Structure
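Colloquial negation prefixes the verb with မ and closes the clause with ဘူး (as in မသွားဘူး, "not going"), so a negated clause that ends in an affirmative particle such as တယ် is suspect. A minimal sketch, assuming the segmenter emits မ as its own token; the rule and the function name are illustrative, not taken from `negation.yaml`.

```python
NEG_PREFIX = "မ"  # negative prefix attached to the verb
# Colloquial affirmative endings from the particle table; a well-formed
# colloquial negated clause ends in ဘူး instead.
AFFIRMATIVE_FINALS = {"တယ်", "မယ်"}


def negation_mismatch(tokens: list[str]) -> bool:
    """Flag a clause that contains မ-negation but ends in an affirmative particle."""
    if not tokens or NEG_PREFIX not in tokens:
        return False
    return tokens[-1] in AFFIRMATIVE_FINALS


print(negation_mismatch(["သူ", "မ", "သွား", "တယ်"]))
```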
Homophones (homophones.yaml)
Defines homophone pairs for context checking.

Structure
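Choosing between homophones usually comes down to which spelling fits the surrounding words. A minimal sketch that scores each candidate by its overlap with a set of associated context words; the data layout is an assumption, and the placeholder strings stand in for real Myanmar pairs.

```python
def pick_homophone(candidates: list[str], context: list[str],
                   context_words: dict[str, set[str]]) -> str:
    """Return the candidate whose context words overlap most with the sentence."""
    ctx = set(context)
    # max() returns the first candidate among equal scores, so with no
    # evidence the first (original) spelling is kept.
    return max(candidates, key=lambda c: len(context_words.get(c, set()) & ctx))


# Placeholder data: real entries would be Myanmar words.
CONTEXT_WORDS = {"spelling_a": {"ctx1"}, "spelling_b": {"ctx2", "ctx3"}}
print(pick_homophone(["spelling_a", "spelling_b"], ["ctx2", "other"], CONTEXT_WORDS))
```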
Compounds (compounds.yaml)
Defines compound word formations.

Structure
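Compound patterns let the checker accept adjacent words that form a known compound, such as ကျောင်း ("school") + သား, giving ကျောင်းသား ("student"). A minimal sketch of greedy longest-match merging over segmented tokens, with a hypothetical one-entry compound set.

```python
def merge_compounds(tokens: list[str], compounds: set[str], max_parts: int = 3) -> list[str]:
    """Greedily join adjacent tokens whose concatenation is a known compound.

    Longer joins are tried first; tokens that start no compound pass through.
    """
    merged = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_parts, len(tokens) - i), 1, -1):
            candidate = "".join(tokens[i:i + n])
            if candidate in compounds:
                merged.append(candidate)
                i += n
                break
        else:  # no compound starts at position i
            merged.append(tokens[i])
            i += 1
    return merged


print(merge_compounds(["ကျောင်း", "သား", "တွေ"], {"ကျောင်းသား"}))
```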
Custom Configuration
Loading Custom Rules
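As a hedged sketch of how custom rules can be layered over the packaged defaults, assume the packaged file and a user's file have each been parsed into dicts (for example with PyYAML's `yaml.safe_load`). A recursive merge then lets the custom file override scalar settings and append list entries. `merge_rules` is a hypothetical helper, not the library's API.

```python
import copy
from typing import Any


def merge_rules(base: dict[str, Any], custom: dict[str, Any]) -> dict[str, Any]:
    """Return base updated with custom, without mutating either input.

    Nested dicts merge key by key, lists are extended with the custom
    entries, and any other custom value replaces the base value.
    """
    result = copy.deepcopy(base)
    for key, value in custom.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_rules(result[key], value)
        elif isinstance(value, list) and isinstance(result.get(key), list):
            result[key] = result[key] + copy.deepcopy(value)
        else:
            result[key] = copy.deepcopy(value)
    return result
```

Extending lists rather than replacing them means a custom file only needs to contain the entries it adds.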
Extending Rules
Add custom entries by creating additional YAML files:

Schema Validation
Rule files are validated against JSON schemas in `src/myspellchecker/schemas/`. Each YAML rule file has its own schema (29 files), and `_common.schema.json` holds shared definitions, for 30 schema files in total:
- `_common.schema.json`, shared definitions referenced by other schemas
- `ambiguous_words.schema.json`
- `aspects.schema.json`
- `classifiers.schema.json`
- `collocations.schema.json`
- `compound_confusion.schema.json`
- `compounds.schema.json`
- `confusable_pairs.schema.json`
- `confusion_matrix.schema.json`
- `corruption_weights.schema.json`
- `detector_confidences.schema.json`
- `grammar_rules.schema.json`
- `homophone_confusion.schema.json`
- `homophones.schema.json`
- `medial_confusion.schema.json`
- `medial_swap_pairs.schema.json`
- `morphology.schema.json`
- `morphotactics.schema.json`
- `negation.schema.json`
- `orthographic_corrections.schema.json`
- `particles.schema.json`
- `pos_inference.schema.json`
- `pronouns.schema.json`
- `register.schema.json`
- `rerank_rules.schema.json`
- `semantic_rules.schema.json`
- `stacking_pairs.schema.json`
- `tense_markers.schema.json`
- `tone_rules.schema.json`
- `typo_corrections.schema.json`
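The schema names mirror the rule names, so `particles.yaml` pairs with `particles.schema.json`. A minimal sketch that reports rule files lacking a schema, assuming the directory layout described above; validating a file's content against its schema could then be done with a JSON Schema library such as the third-party `jsonschema` package, which is not shown here. Both function names are hypothetical.

```python
from pathlib import Path


def schema_for(rule_file: Path, schema_dir: Path) -> Path:
    """Map e.g. rules/particles.yaml to schemas/particles.schema.json."""
    return schema_dir / f"{rule_file.stem}.schema.json"


def rules_without_schema(rules_dir: Path, schema_dir: Path) -> list[str]:
    """Names of YAML rule files in rules_dir that have no matching schema file."""
    return sorted(
        p.name for p in rules_dir.glob("*.yaml")
        if not schema_for(p, schema_dir).is_file()
    )


# Example call against the source tree (paths as given above):
# rules_without_schema(Path("src/myspellchecker/rules"), Path("src/myspellchecker/schemas"))
```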
Best Practices
- Confidence scores: Use 0.9 or higher for high-certainty rules, 0.7-0.9 for moderate-certainty rules, and below 0.7 for context-dependent rules
- Context constraints: Always specify context when rules are position-dependent
- Examples: Include examples for documentation and testing
- Version control: Update `metadata.last_updated` when modifying rules
- Testing: Test rule changes with representative corpus data
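The confidence and version-control practices can be checked mechanically. A minimal sketch, assuming confidences are on a 0-1 scale and each parsed rule file is a dict with a top-level `metadata` mapping; both function names are hypothetical.

```python
from typing import Any


def confidence_band(score: float) -> str:
    """Bucket a confidence score using the thresholds in the list above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence out of range: {score}")
    if score >= 0.9:
        return "high"
    if score >= 0.7:
        return "moderate"
    return "context-dependent"


def has_last_updated(rule_doc: dict[str, Any]) -> bool:
    """True if the parsed rule file carries a non-empty metadata.last_updated."""
    metadata = rule_doc.get("metadata")
    return isinstance(metadata, dict) and bool(metadata.get("last_updated"))
```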
See Also
- Grammar Checkers - Grammar validation
- Morphology Analysis - Word structure
- Configuration - Configuration guide