့, း) change the meaning of words completely. However, these marks are frequently omitted or misused in informal typing.
The Tone Disambiguator module uses context to infer the correct tonal spelling when multiple valid dictionary words share the same base form.
The Problem
Consider the word သာ:

- သာ (Low tone): meaning “merely/pleasant”.
- သား (High tone): meaning “son”.
Similarly, consider ငါ:

- ငါ (Low tone): Pronoun “I/Me”.
- ငါး (High tone): “Five” or “Fish”.
A dictionary lookup alone accepts ငါ as correct even in the sentence ငါ ကောင် (intended: ငါး ကောင်, “five animals”), because ငါ is a valid word.
Solution: Context-Aware Disambiguation
mySpellChecker includes a rule-based disambiguator that looks at a 3-word window around ambiguous terms to decide the likely intended meaning.
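To make the window idea concrete, here is a minimal sketch of how a rule-based lookup of this kind could work. It is not mySpellChecker's implementation: the `CLUES` table and the `disambiguate` function are hypothetical names invented for illustration, and the clue words are borrowed from the ငါ / ငါး example.

```python
from typing import Dict, List, Set

# Hypothetical clue table (illustration only): base form -> candidate -> clue words.
CLUES: Dict[str, Dict[str, Set[str]]] = {
    "ငါ": {
        "ငါ": {"ငါ့", "က", "ကို"},
        "ငါး": {"ကောင်", "ရေ", "ကြော်"},
    },
}


def disambiguate(words: List[str], index: int, window: int = 3) -> str:
    """Return the candidate whose clue words occur most often near words[index]."""
    word = words[index]
    candidates = CLUES.get(word)
    if candidates is None:
        return word  # not a known ambiguous base form

    # Collect up to `window` words on each side, excluding the word itself.
    start = max(0, index - window)
    context = words[start:index] + words[index + 1:index + 1 + window]

    # Score each candidate by how many context words are in its clue set.
    scores = {
        cand: sum(1 for w in context if w in clues)
        for cand, clues in candidates.items()
    }
    best = max(scores, key=scores.get)

    # Keep the typed form unless another candidate strictly outscores it.
    return best if scores[best] > scores.get(word, 0) else word


print(disambiguate(["ငါ", "ကောင်"], 0))
print(disambiguate(["ငါ", "ကို"], 0))
```

In this sketch the typed form wins ties, so a word is only changed when the surrounding context positively supports a different spelling.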
Usage
The disambiguator is available as a utility.

Supported Ambiguity Patterns
The system currently handles several high-frequency ambiguous clusters:

| Word | Interpretation | Context Clues |
|---|---|---|
| ငါ | Pronoun (I/Me) | ငါ့, က, ကို |
| ငါး | Fish / Five | ကောင်, ရေ, ကြော် |
| သား | Son/offspring | သမီး, မိသား, သားသမီး, လင်, မယား, မိဘ, အဖေ, အမေ |
| မ | Female prefix | သမီး, မိန်းမ |
| မ | Negative prefix | ဘူး, ရဘူး |
| ကျ | Fall (verb) | တယ်, သည်, မယ်, ခဲ့, နေ, ပြီ |
| ခု | Unit (counter) | တစ်, နှစ် (preceded by numbers) |
| ခု | Now (temporal) | အခု, ယခု |
Tone Mark Correction
Beyond ambiguous words, the module also detects missing or wrong tone marks for specific high-probability patterns:

- Question Particle: သလာ → သလား (e.g., စားပြီးပြီသလာ → စားပြီးပြီသလား)
- Numbers: သုံ → သုံး (Three)
- Numbers/Wind: လေ → လေး (Four/wind; context-dependent, applied only in numeral contexts)
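The three patterns above differ in how much context they need, which a short sketch can illustrate. Everything here is an assumption made for illustration rather than the module's actual rules: `ALWAYS_FIX`, `QUESTION_SUFFIX`, `NUMERAL_CLUES`, and `correct_tone` are hypothetical names, and treating a following counter word as the "numeral context" is only one plausible way to gate the လေ → လေး fix.

```python
from typing import List

# Assumed unconditional fix: the left-hand form is corrected wherever it appears.
ALWAYS_FIX = {"သုံ": "သုံး"}

# Question particle missing its final tone mark (wrong suffix, corrected suffix).
QUESTION_SUFFIX = ("သလာ", "သလား")

# Hypothetical numeral-context clues: counter words that may follow a number.
NUMERAL_CLUES = {"ခု", "ကောင်"}


def correct_tone(words: List[str], index: int) -> str:
    """Return a corrected spelling for words[index], or the word unchanged."""
    word = words[index]

    if word in ALWAYS_FIX:
        return ALWAYS_FIX[word]

    wrong, right = QUESTION_SUFFIX
    if word.endswith(wrong):
        # Swap only the trailing particle; the rest of the word is untouched.
        return word[: -len(wrong)] + right

    if word == "လေ":
        # Context-gated: only promote to the numeral when a counter follows.
        following = words[index + 1] if index + 1 < len(words) else None
        if following in NUMERAL_CLUES:
            return "လေး"

    return word


print(correct_tone(["စားပြီးပြီသလာ"], 0))
print(correct_tone(["လေ", "ခု"], 0))
```

Gating the last rule matters because လေ is itself a valid word, so an unconditional replacement would introduce errors rather than fix them.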
Configuration
The disambiguator is integrated into the main SpellChecker validation pipeline and configured via ToneConfig:
| Field | Default | Description |
|---|---|---|
| context_window | 3 | Number of words to consider on each side of the ambiguous word. Larger windows provide more context but are slower. |
| min_confidence | 0.2 | Minimum confidence threshold. Suggestions below this are not returned. |
| tone_ambiguous_map | None | Override the default ambiguity patterns with a custom map (loaded from tone_rules.yaml via GrammarRuleConfig). |
| tone_errors_map | None | Override the default tone mark error patterns with a custom map. |
tone_rules.yaml.
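As a rough illustration of how the fields in the table could interact, the sketch below uses a stand-in dataclass. `ToneConfigSketch`, `context_slice`, and `filter_suggestions` are hypothetical names, not the library's `ToneConfig` or its API, and the types given for the two map fields are guesses; only the field names and defaults come from the table.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ToneConfigSketch:
    """Stand-in mirroring the documented fields; not the library's ToneConfig."""

    context_window: int = 3
    min_confidence: float = 0.2
    # Map shapes are assumptions; None means "use the built-in defaults".
    tone_ambiguous_map: Optional[Dict[str, Dict[str, List[str]]]] = None
    tone_errors_map: Optional[Dict[str, str]] = None


def context_slice(words: List[str], index: int, cfg: ToneConfigSketch) -> List[str]:
    """Words within cfg.context_window positions on each side of words[index]."""
    start = max(0, index - cfg.context_window)
    return words[start:index] + words[index + 1:index + 1 + cfg.context_window]


def filter_suggestions(
    scored: List[Tuple[str, float]], cfg: ToneConfigSketch
) -> List[Tuple[str, float]]:
    """Drop suggestions whose confidence is below cfg.min_confidence."""
    return [(word, conf) for word, conf in scored if conf >= cfg.min_confidence]


cfg = ToneConfigSketch()
print(context_slice(list("abcdefghi"), 4, cfg))
print(filter_suggestions([("ငါး", 0.6), ("ငါ", 0.1)], cfg))
```

A wider `context_window` gives each rule more words to match against at the cost of more comparisons per ambiguous word, while raising `min_confidence` trades recall for fewer spurious suggestions.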