mySpellChecker uses YAML configuration files to define linguistic rules for the Myanmar language. These files are located in src/myspellchecker/rules/ and can be customized for specific use cases.
Rule Files Overview
| File | Purpose |
|---|---|
| particles.yaml | Linguistic particles with POS tags |
| typo_corrections.yaml | Common typo patterns |
| morphology.yaml | Suffix/prefix patterns |
| morphotactics.yaml | Morpheme combination rules |
| grammar_rules.yaml | Grammar validation rules |
| aspects.yaml | Verb aspect markers |
| compounds.yaml | Compound word patterns |
| classifiers.yaml | Numeral classifiers |
| negation.yaml | Negation patterns |
| register.yaml | Formal/colloquial mappings |
| tone_rules.yaml | Tone mark rules |
| homophones.yaml | Homophone pairs |
| orthographic_corrections.yaml | Orthographic correction patterns |
| rerank_rules.yaml | Suggestion re-ranking rules |
| ambiguous_words.yaml | Multi-POS words |
| pos_inference.yaml | POS inference patterns |
| pronouns.yaml | Pronoun definitions |
| collocations.yaml | Collocation error detection (wrong word near context words) |
| compound_confusion.yaml | Compound word confusion patterns (ha-htoe, aspiration, consonant, suffix) |
| confusable_pairs.yaml | Curated confusable word pairs for real-word error detection |
| confusion_matrix.yaml | Data-driven character substitution costs from corpus analysis |
| corruption_weights.yaml | Synthetic error generation weights for training |
| detector_confidences.yaml | Confidence scores for text-level detectors by error type |
| homophone_confusion.yaml | Context-dependent homophone disambiguation rules |
| medial_confusion.yaml | Medial ya-pin / ya-yit confusion correction patterns |
| medial_swap_pairs.yaml | Medial consonant swap pairs for candidate generation |
| semantic_rules.yaml | Semantic validation (agent implausibility, classifier agreement) |
| stacking_pairs.yaml | Valid consonant stacking pairs via virama (U+1039) |
| tense_markers.yaml | Temporal adverb and sentence-final particle tense mismatch rules |
File Structure
All rule files follow a common structure:
version: "1.0.0"
category: "category_name"
description: "Description of the rule file"
metadata:
  created_date: "2025-12-30"
  last_updated: "2025-12-30"
  total_entries: 100
  source: "Source description"
# Main content section
rules:
  - ...
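The common header can be checked once a rule file has been parsed into a dictionary. The following is a minimal sketch, assuming all four header keys are expected in every file (the JSON schemas may be more lenient); `missing_header_keys` is a hypothetical helper, not part of mySpellChecker.

```python
# Hypothetical helper for illustration; not part of the mySpellChecker API.
# Assumption: every rule file carries these four top-level header keys.
REQUIRED_KEYS = ("version", "category", "description", "metadata")


def missing_header_keys(rule_file: dict) -> list:
    """Return the common header keys that are absent from a parsed rule file."""
    return [key for key in REQUIRED_KEYS if key not in rule_file]


# A dict shaped like the parsed YAML skeleton (values are placeholders).
parsed = {
    "version": "1.0.0",
    "category": "category_name",
    "description": "Description of the rule file",
    "metadata": {"last_updated": "2025-12-30", "total_entries": 100},
    "rules": [],
}

print(missing_header_keys(parsed))
print(missing_header_keys({"rules": []}))
```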
Particles (particles.yaml)
Defines Myanmar linguistic particles organized by syntactic function.
Structure
particles:
  verbs:
    tense:
      - particle: "ခဲ့"
        pos_tag: "P_PAST"
        type: "past_tense"
        meaning: "Past tense marker"
        formality: "neutral"
        examples:
          - correct: "သွားခဲ့တယ်"
            translation: "went"
        confidence: 0.98
    aspect:
      - particle: "နေ"
        pos_tag: "P_PROG"
        type: "progressive"
        meaning: "Progressive aspect"
        formality: "neutral"
        confidence: 0.98
| Tag | Description | Example |
|---|---|---|
| P_PAST | Past tense | ခဲ့ |
| P_FUT | Future tense | မယ်, မည် |
| P_PROG | Progressive | နေ |
| P_PERF | Perfective | ပြီ |
| P_SUBJ | Subject marker | က |
| P_OBJ | Object marker | ကို |
| P_LOC | Locative | မှာ, တွင် |
| P_SENT | Sentence ending | တယ်, သည် |
| P_POSS | Possessive | ရဲ့, ၏ |
The formality field takes one of the following values:
- colloquial - Spoken/informal
- neutral - Both formal and informal
- formal - Written/formal
- polite - Respectful register
- literary - Literary style
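Because particles are nested by syntactic function, a lookup by POS tag needs to walk the groups first. The sketch below works on a dictionary shaped like the parsed particles section; `iter_entries` and `index_by_tag` are hypothetical names used only for illustration, and the nesting depth is not assumed to be fixed.

```python
# Illustrative only: these helpers are not part of mySpellChecker.
def iter_entries(node):
    """Yield every entry dict under a nested group mapping of any depth."""
    if isinstance(node, list):
        yield from node
    elif isinstance(node, dict):
        for child in node.values():
            yield from iter_entries(child)


def index_by_tag(particles: dict) -> dict:
    """Map each pos_tag to the list of particles that carry it."""
    index = {}
    for entry in iter_entries(particles):
        index.setdefault(entry["pos_tag"], []).append(entry["particle"])
    return index


# Data mirrors the particles.yaml excerpt, reduced to the fields used here.
PARTICLES = {
    "verbs": {
        "tense": [{"particle": "ခဲ့", "pos_tag": "P_PAST", "formality": "neutral"}],
        "aspect": [{"particle": "နေ", "pos_tag": "P_PROG", "formality": "neutral"}],
    }
}

index = index_by_tag(PARTICLES)
print(sorted(index))
```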
Typo Corrections (typo_corrections.yaml)
Defines common Myanmar typo patterns with corrections.
Structure
corrections:
  particles:
    - incorrect: "မာ"
      correct: "မှာ"
      error_type: "missing_ha_htoe"
      context: "after_noun"
      excluded_pos: ["ADJ"]
      meaning: "Location particle"
      confidence: 0.92
      examples:
        incorrect: "အိမ်မာ ရှိတယ်"
        correct: "အိမ်မှာ ရှိတယ်"
  medial_confusions:
    - incorrect: "ကျောင်း"
      correct: "ကြောင်း"
      error_type: "ya_pin_ra_yit"
      context: "after_verb"
      pos_constraint:
        preceding: ["V"]
      confidence: 0.95
Error Types
| Type | Description |
|---|---|
| missing_ha_htoe | Missing ှ modifier |
| character_confusion | Similar looking characters |
| ya_pin_ra_yit | ျ vs ြ confusion |
| missing_asat | Missing ် marker |
| tone_mark_error | Wrong or missing tone mark |
| visual_similar | OCR-type errors |
Context Types
- after_noun - Follows a noun
- after_verb - Follows a verb
- context_dependent - Requires context analysis
- standalone - Independent of context
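To illustrate how a context value can gate a correction, here is a minimal sketch. It assumes that after_noun and after_verb can be reduced to the POS of the preceding token, and it treats any other context as unconstrained; how the library actually evaluates context, excluded_pos, and pos_constraint is not specified here. `suggest` and `CONTEXT_PREV_POS` are hypothetical names.

```python
# Assumption for this sketch: a context name maps to the POS of the previous token.
CONTEXT_PREV_POS = {"after_noun": "N", "after_verb": "V"}

# Entries mirror the typo_corrections.yaml excerpt, reduced to the fields used here.
RULES = [
    {"incorrect": "မာ", "correct": "မှာ", "context": "after_noun", "confidence": 0.92},
    {"incorrect": "ကျောင်း", "correct": "ကြောင်း", "context": "after_verb", "confidence": 0.95},
]


def suggest(token, prev_pos, rules, min_confidence=0.9):
    """Return the first correction whose context and confidence allow it, else None."""
    for rule in rules:
        if rule["incorrect"] != token or rule["confidence"] < min_confidence:
            continue
        required = CONTEXT_PREV_POS.get(rule["context"])
        # Contexts outside the mapping are treated as unconstrained in this sketch.
        if required is None or required == prev_pos:
            return rule["correct"]
    return None


print(suggest("မာ", "N", RULES))  # token follows a noun
print(suggest("မာ", "V", RULES))  # context does not match, so no suggestion
```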
Morphology (morphology.yaml)
Defines suffix and prefix patterns for POS inference.
Structure
suffixes:
  verb_suffixes:
    - suffix: "ခဲ့"
      pos: "V"
      meaning: "past tense"
      confidence: 0.9
    - suffix: "သည်"
      pos: "P_SENT"
      meaning: "formal sentence ending"
      confidence: 0.95
  noun_suffixes:
    - suffix: "များ"
      pos: "N"
      meaning: "plural"
      confidence: 0.9
  adverb_suffixes:
    - suffix: "စွာ"
      pos: "ADV"
      meaning: "manner"
      confidence: 0.85
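Suffix entries like these can drive a simple POS guess. The sketch below picks the longest matching suffix and breaks ties by confidence; that tie-breaking policy is an assumption of this example, not a documented behavior of the library, and `infer_pos` is a hypothetical name.

```python
# Data mirrors the morphology.yaml excerpt, reduced to the fields used here.
SUFFIXES = {
    "verb_suffixes": [
        {"suffix": "ခဲ့", "pos": "V", "confidence": 0.9},
        {"suffix": "သည်", "pos": "P_SENT", "confidence": 0.95},
    ],
    "noun_suffixes": [{"suffix": "များ", "pos": "N", "confidence": 0.9}],
    "adverb_suffixes": [{"suffix": "စွာ", "pos": "ADV", "confidence": 0.85}],
}


def infer_pos(word, suffixes):
    """Guess (pos, confidence) from the longest suffix that ends the word."""
    candidates = [
        rule
        for group in suffixes.values()
        for rule in group
        # Require a non-empty stem so a bare suffix is not tagged by itself.
        if word.endswith(rule["suffix"]) and len(word) > len(rule["suffix"])
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda r: (len(r["suffix"]), r["confidence"]))
    return best["pos"], best["confidence"]


print(infer_pos("သွားခဲ့", SUFFIXES))
```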
Aspects (aspects.yaml)
Defines verb aspect markers.
Structure
markers:
  - marker: "ပြီ"
    category: "completion"
    description: "Action completed"
    can_combine: false
    register: "neutral"
    is_final: true
  - marker: "နေ"
    category: "progressive"
    description: "Ongoing action"
    can_combine: true
    register: "neutral"
    is_final: false
combinations:
  - sequence: ["ပြီး", "သွား"]
    description: "Completed and went"
invalid_sequences:
  - sequence: ["ပြီ", "ပြီ"]
    reason: "Duplicate completion marker"
typos:
  - incorrect: "ပရီ"
    correct: "ပြီ"
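The invalid_sequences entries lend themselves to a sliding-window check over a list of markers. This is a sketch under the assumption that markers have already been segmented into a list; `find_invalid` is a hypothetical helper, not a library function.

```python
COMPLETION = "ပြီ"

# Mirrors the invalid_sequences excerpt from aspects.yaml.
INVALID_SEQUENCES = [
    {"sequence": [COMPLETION, COMPLETION], "reason": "Duplicate completion marker"},
]


def find_invalid(markers, invalid_sequences):
    """Return (start_index, reason) for every invalid sequence found in markers."""
    hits = []
    for rule in invalid_sequences:
        seq = rule["sequence"]
        width = len(seq)
        for start in range(len(markers) - width + 1):
            if markers[start:start + width] == seq:
                hits.append((start, rule["reason"]))
    return hits


print(find_invalid(["နေ", COMPLETION, COMPLETION], INVALID_SEQUENCES))
```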
Classifiers (classifiers.yaml)
Defines numeral classifiers for counting.
Structure
classifiers:
  people:
    - word: "ယောက်"
      description: "For people"
      examples: ["လူ", "ကလေး", "လူကြီး"]
  animals:
    - word: "ကောင်"
      description: "For animals"
      examples: ["ခွေး", "ကြောင်", "ငါး"]
  flat_objects:
    - word: "ရွက်"
      description: "For flat objects"
      examples: ["စာရွက်", "အရွက်"]
  round_objects:
    - word: "လုံး"
      description: "For round objects"
      examples: ["ပန်းသီး", "ဘောလုံး"]
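For classifier agreement checks it can help to invert this structure into a noun-to-classifier lookup. The sketch below assumes the examples lists contain nouns that take the given classifier; `noun_to_classifier` is a hypothetical helper.

```python
# Mirrors two categories from the classifiers.yaml excerpt.
CLASSIFIERS = {
    "people": [{"word": "ယောက်", "examples": ["လူ", "ကလေး", "လူကြီး"]}],
    "animals": [{"word": "ကောင်", "examples": ["ခွေး", "ကြောင်", "ငါး"]}],
}


def noun_to_classifier(classifiers):
    """Invert category -> entries into a flat noun -> classifier word mapping."""
    lookup = {}
    for entries in classifiers.values():
        for entry in entries:
            for noun in entry["examples"]:
                lookup[noun] = entry["word"]
    return lookup


lookup = noun_to_classifier(CLASSIFIERS)
print(lookup.get("ခွေး"))
```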
Register (register.yaml)
Maps formal and colloquial equivalents.
Structure
register_pairs:
  - formal: "သည်"
    colloquial: "တယ်"
    category: "sentence_ending"
  - formal: "၏"
    colloquial: "ရဲ့"
    category: "possessive"
  - formal: "တွင်"
    colloquial: "မှာ"
    category: "locative"
formal_words:
  - "သည်"
  - "၏"
  - "နှင့်"
colloquial_words:
  - "တယ်"
  - "ရဲ့"
  - "နဲ့"
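One use of the formal_words and colloquial_words lists is flagging sentences that mix registers. The following sketch assumes the sentence is already tokenized; `registers_used` and `is_mixed` are hypothetical names and the mixing rule is only an illustration.

```python
# Mirrors formal_words and colloquial_words from the register.yaml excerpt.
FORMAL = {"သည်", "၏", "နှင့်"}
COLLOQUIAL = {"တယ်", "ရဲ့", "နဲ့"}


def registers_used(tokens, formal=FORMAL, colloquial=COLLOQUIAL):
    """Return the set of registers ('formal', 'colloquial') seen in tokens."""
    found = set()
    for token in tokens:
        if token in formal:
            found.add("formal")
        if token in colloquial:
            found.add("colloquial")
    return found


def is_mixed(tokens):
    """True when both formal and colloquial markers occur in the same token list."""
    return len(registers_used(tokens)) == 2


print(is_mixed(["၏", "တယ်"]))
```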
Negation (negation.yaml)
Defines negation patterns.
Structure
prefix: "မ"
endings:
  ဘူး:
    type: "standard_negative"
    description: "Colloquial negative ending"
    register: "colloquial"
  ပါဘူး:
    type: "polite_negative"
    description: "Polite negative ending"
    register: "polite"
  နဲ့:
    type: "prohibition"
    description: "Don't! (prohibition)"
    register: "colloquial"
  ပါ:
    type: "formal_negative"
    description: "Formal negative ending"
    register: "formal"
typo_map:
  ဘူ: "ဘူး"
  ဘုး: "ဘူး"
auxiliaries:
  ချင်:
    meaning: "want to"
  နိုင်:
    meaning: "can"
  ရ:
    meaning: "able to"
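The typo_map can be applied to words that carry the negation prefix. This is a deliberately naive sketch: it treats any word starting with the prefix as negated, which will misfire on ordinary words that happen to begin with the same letter. `fix_negative_ending` is a hypothetical helper.

```python
# Mirrors prefix and typo_map from the negation.yaml excerpt.
PREFIX = "မ"
TYPO_MAP = {"ဘူ": "ဘူး", "ဘုး": "ဘူး"}


def fix_negative_ending(word, prefix=PREFIX, typo_map=TYPO_MAP):
    """Replace a mistyped negative ending on a word that starts with the prefix."""
    if not word.startswith(prefix):
        return word
    for wrong, right in typo_map.items():
        if word.endswith(wrong):
            return word[: len(word) - len(wrong)] + right
    return word


print(fix_negative_ending("မသွားဘူ"))
```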
Homophones (homophones.yaml)
Defines homophone pairs for context checking.
Structure
homophones:
  # Simple map: word → list of homophones
  "ကား": ["ကာ"]  # car vs protect/shield
  "ကာ": ["ကား"]
  "ကျောင်း": ["ကြောင်း"]  # school vs reason
  "ကြောင်း": ["ကျောင်း"]
  "ကံ": ["ကန်", "ကင်"]  # luck vs kick vs (rare)
  "ကန်": ["ကံ", "ကင်"]
Context disambiguation is handled automatically via N-gram probabilities at the strategy level, so no per-entry disambiguation context is needed in the YAML.
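The excerpt lists most pairs in both directions. When editing the map by hand, a quick check for one-way links can catch omissions. Whether the real file requires symmetry is an assumption of this sketch, and `one_way_links` is a hypothetical helper.

```python
def one_way_links(homophones):
    """Return (word, other) pairs where other does not list word back."""
    return [
        (word, other)
        for word, others in homophones.items()
        for other in others
        if word not in homophones.get(other, [])
    ]


# Mirrors the homophones.yaml excerpt.
HOMOPHONES = {
    "ကား": ["ကာ"],
    "ကာ": ["ကား"],
    "ကျောင်း": ["ကြောင်း"],
    "ကြောင်း": ["ကျောင်း"],
    "ကံ": ["ကန်", "ကင်"],
    "ကန်": ["ကံ", "ကင်"],
}

for word, other in one_way_links(HOMOPHONES):
    print(f"{other} has no link back to {word}")
```

On the excerpt, this reports the two links to ကင်, which has no entry of its own there; the excerpt is presumably abridged.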
Compounds (compounds.yaml)
Defines compound word formations.
Structure
prefixes:
  - prefix: "အ"
    type: "nominalization"
    description: "Noun-forming prefix"
suffixes:
  - suffix: "သူ"
    type: "agent"
    description: "Person who does X"
noun_compounds:
  - components: ["ပန်း", "ခြံ"]
    compound: "ပန်းခြံ"
    meaning: "flower garden"
verb_compounds:
  - components: ["စား", "သောက်"]
    compound: "စားသောက်"
    meaning: "dine/eat and drink"
reduplication:
  - base: "ဖြေး"
    reduplicated: "ဖြေးဖြေး"
    meaning: "slowly"
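In the entries shown, each compound is the plain concatenation of its components. A small consistency check built on that observation can catch copy-paste mistakes; it assumes concatenation holds for every entry, which may not be true of the full file. `inconsistent_compounds` is a hypothetical helper.

```python
def inconsistent_compounds(entries):
    """Return compounds that are not the concatenation of their components."""
    return [
        entry["compound"]
        for entry in entries
        if "".join(entry["components"]) != entry["compound"]
    ]


# Mirrors noun_compounds and verb_compounds from the compounds.yaml excerpt.
ENTRIES = [
    {"components": ["ပန်း", "ခြံ"], "compound": "ပန်းခြံ"},
    {"components": ["စား", "သောက်"], "compound": "စားသောက်"},
]

print(inconsistent_compounds(ENTRIES))
```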
Custom Configuration
Loading Custom Rules
from myspellchecker.grammar.config import GrammarRuleConfig
# Load from custom YAML file path
config = GrammarRuleConfig(config_path="/path/to/custom/grammar_rules.yaml")
# Access rule data
particles = config.particle_tags
morphology = config.morphology_config
aspects = config.aspects_config
Extending Rules
Add custom entries by creating additional YAML files:
# custom_particles.yaml
version: "1.0.0"
category: "particles"
particles:
  custom:
    - particle: "မိ"
      pos_tag: "P_CUSTOM"
      type: "custom_type"
      meaning: "Custom particle"
      confidence: 0.80
Schema Validation
Rule files are validated against JSON schemas in src/myspellchecker/schemas/. Each YAML rule file has its own schema file, and _common.schema.json holds shared definitions, for 30 schema files in total:
- _common.schema.json (shared definitions referenced by other schemas)
- ambiguous_words.schema.json
- aspects.schema.json
- classifiers.schema.json
- collocations.schema.json
- compound_confusion.schema.json
- compounds.schema.json
- confusable_pairs.schema.json
- confusion_matrix.schema.json
- corruption_weights.schema.json
- detector_confidences.schema.json
- grammar_rules.schema.json
- homophone_confusion.schema.json
- homophones.schema.json
- medial_confusion.schema.json
- medial_swap_pairs.schema.json
- morphology.schema.json
- morphotactics.schema.json
- negation.schema.json
- orthographic_corrections.schema.json
- particles.schema.json
- pos_inference.schema.json
- pronouns.schema.json
- register.schema.json
- rerank_rules.schema.json
- semantic_rules.schema.json
- stacking_pairs.schema.json
- tense_markers.schema.json
- tone_rules.schema.json
- typo_corrections.schema.json
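The naming convention makes it straightforward to locate the schema for a given rule file. A minimal sketch using only the standard library follows; `schema_path_for` is a hypothetical helper and the path is relative to the repository root.

```python
from pathlib import Path

# Directory named in this section, relative to the repository root.
SCHEMA_DIR = Path("src/myspellchecker/schemas")


def schema_path_for(rule_file):
    """Map a rule file name such as particles.yaml to its schema file path."""
    return SCHEMA_DIR / (Path(rule_file).stem + ".schema.json")


print(schema_path_for("particles.yaml").as_posix())
```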
Best Practices
- Confidence scores: Use 0.9 or higher for high-certainty rules, 0.7 up to 0.9 for moderate certainty, and below 0.7 for context-dependent rules
- Context constraints: Always specify context when rules are position-dependent
- Examples: Include examples for documentation and testing
- Version control: Update metadata.last_updated when modifying rules
- Testing: Test rule changes with representative corpus data
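The confidence guideline can be turned into a small helper for reviewing new entries. This sketch treats exactly 0.9 as high certainty, which is one reading of the ranges given here; `confidence_tier` is a hypothetical name.

```python
def confidence_tier(confidence):
    """Classify a rule confidence into the tiers from the guideline."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence >= 0.9:
        return "high"
    if confidence >= 0.7:
        return "moderate"
    return "context-dependent"


for value in (0.98, 0.85, 0.5):
    print(value, confidence_tier(value))
```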