Compound Resolution
The `CompoundResolver` validates out-of-vocabulary (OOV) words by splitting them into known dictionary morphemes, using dynamic programming to find the optimal segmentation.
Common Compound Patterns
| Pattern | Example | Meaning |
|---|---|---|
| N+N | ကျောင်း + သား → ကျောင်းသား | student |
| V+V | စား + သောက် → စားသောက် | eat and drink |
| ADJ+N | ကောင်း + ကျိုး → ကောင်းကျိုး | benefit |
| N+V | လက် + ခံ → လက်ခံ | accept |
| ADV+V | သေ + ချာ → သေချာ | careful |
How It Works
- Segment the OOV word into syllables
- Use dynamic programming to find optimal splits into dictionary morphemes
- Look up POS tags for each morpheme
- Validate the POS pattern against allowed compound patterns (from morphotactics.yaml)
- Score based on morpheme frequencies and pattern bonuses
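The steps above can be sketched as a small dynamic program. This is an illustration under stated assumptions, not the library's implementation: the function name `split_compound`, the lexicon layout (morpheme mapped to a POS tag and frequency), the toy frequencies, and the sum-of-log-frequency score are all hypothetical, and syllable segmentation is assumed to have happened already. Pattern bonuses are omitted.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

Lexicon = Dict[str, Tuple[str, int]]  # morpheme -> (POS tag, corpus frequency)


def split_compound(
    syllables: List[str],
    lexicon: Lexicon,
    allowed_patterns: Set[Tuple[str, ...]],
    min_morpheme_frequency: int = 10,
    max_parts: int = 4,
) -> Optional[List[str]]:
    """Return the best-scoring split into known morphemes, or None."""
    n = len(syllables)
    # best[i][k] = (score, parts): best split of syllables[:i] into k parts.
    best: List[Dict[int, Tuple[float, List[str]]]] = [dict() for _ in range(n + 1)]
    best[0][0] = (0.0, [])
    for i in range(1, n + 1):
        for j in range(i):
            piece = "".join(syllables[j:i])
            entry = lexicon.get(piece)
            if entry is None or entry[1] < min_morpheme_frequency:
                continue  # unknown or too rare to count as a morpheme
            gain = math.log(entry[1])  # assumed score: sum of log frequencies
            for k, (score, parts) in best[j].items():
                if k + 1 > max_parts:
                    continue
                candidate = (score + gain, parts + [piece])
                if k + 1 not in best[i] or candidate[0] > best[i][k + 1][0]:
                    best[i][k + 1] = candidate
    # Keep only full splits with at least two parts and an allowed POS pattern.
    # Simplification: only the top-scoring split per part count is checked.
    valid = []
    for k, (score, parts) in best[n].items():
        pattern = tuple(lexicon[p][0] for p in parts)
        if k >= 2 and pattern in allowed_patterns:
            valid.append((score, parts))
    if not valid:
        return None
    return max(valid, key=lambda item: item[0])[1]


# Toy data based on the N+N example in the table; frequencies are made up.
toy_lexicon = {"ကျောင်း": ("N", 5000), "သား": ("N", 8000)}
print(split_compound(["ကျောင်း", "သား"], toy_lexicon, {("N", "N")}))
```

Tracking the best split per part count keeps the `max_parts` limit exact while the table stays small, since compounds rarely exceed a handful of syllables.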
Usage
Configuration
Parameters
| Parameter | Default | Description |
|---|---|---|
| min_morpheme_frequency | 10 | Minimum corpus frequency for each morpheme |
| max_parts | 4 | Maximum number of parts in a compound |
| cache_size | 1024 | LRU cache entries for resolved compounds |
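To make the defaults concrete, here is a minimal sketch of how such settings could be grouped and how `cache_size` might bound an LRU cache. The class name `CompoundSettings` and the helper `with_cache` are hypothetical and are not taken from the library; only the three parameter names and their defaults come from the table above.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Callable


@dataclass(frozen=True)
class CompoundSettings:  # hypothetical container, not the library's class
    min_morpheme_frequency: int = 10
    max_parts: int = 4
    cache_size: int = 1024

    def __post_init__(self) -> None:
        if self.min_morpheme_frequency < 0:
            raise ValueError("min_morpheme_frequency must be non-negative")
        if self.max_parts < 2:
            raise ValueError("a compound needs at least two parts")
        if self.cache_size < 0:
            raise ValueError("cache_size must be non-negative")


def with_cache(resolve: Callable[[str], object], settings: CompoundSettings):
    """Wrap a resolver so repeated words are served from an LRU cache."""
    return lru_cache(maxsize=settings.cache_size)(resolve)
```

Caching pays off because the same OOV word tends to recur many times within a document.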
Morphotactic Rules
Compound POS patterns are defined in `rules/morphotactics.yaml`.
CompoundSplit Result
Reduplication
The `ReduplicationEngine` validates OOV words formed by reduplicating known dictionary words, a productive morphological process in Myanmar.
Reduplication Patterns
| Pattern | Structure | Example | Meaning |
|---|---|---|---|
| AA | Syllable repeats | ကောင်းကောင်း | “well” (from ကောင်း “good”) |
| AABB | Each syllable doubles | သေသေချာချာ | “very carefully” |
| ABAB | Whole word repeats | ခဏခဏ | “frequently” |
| RHYME | Known rhyme pairs | From grammar patterns | Fixed expressions |
How It Works
- Check against known rhyme reduplication patterns (fast path)
- Segment into syllables and detect the reduplication pattern
- Extract the base word from the pattern
- Validate: base must be in dictionary with sufficient frequency
- Check POS: only V, ADJ, ADV, N can productively reduplicate
Usage
Configuration
ReduplicationResult
Integration with Word Validation
Both engines are integrated into the word validation pipeline. When `WordValidator` encounters an OOV word, it checks compound resolution and reduplication before flagging a spelling error.
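The fallback order can be sketched as a short chain of checks. The function `classify_word` and its return labels are hypothetical and only illustrate the order described above; the real `WordValidator` interface is not reproduced here, and the two checks are passed in as plain callables.

```python
from typing import Callable, Iterable, Set, Tuple

Check = Tuple[str, Callable[[str], bool]]


def classify_word(word: str, dictionary: Set[str], fallbacks: Iterable[Check]) -> str:
    """Label a word, trying each OOV fallback before reporting an error."""
    if word in dictionary:
        return "known"
    for label, is_valid in fallbacks:
        if is_valid(word):
            return label  # first fallback that accepts the word wins
    return "spelling_error"


# Stand-in checks; a real pipeline would call the two engines here.
fallbacks = [
    ("compound", lambda w: w == "ကျောင်းသား"),
    ("reduplication", lambda w: w == "ကောင်းကောင်း"),
]
print(classify_word("ကောင်းကောင်း", {"ကောင်း"}, fallbacks))
```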
See Also
- Word Validation — Dictionary + SymSpell suggestions
- Morphology Analysis — Word structure analysis
- Morpheme Suggestions — Morpheme-level correction
- Grammar Checkers — Compound word grammar checker