Myanmar (Burmese) is a tonal language, and tone marks such as း (visarga, high tone) and ့ (dot below, creaky tone) can completely change a word's meaning. However, these marks are frequently omitted or misused in informal typing. The Tone Disambiguator module uses context to infer the intended tonal spelling when several valid dictionary words share the same base form.
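To make "same base form" concrete, here is a minimal sketch that assumes the base form is simply the word with the two tone marks (U+1037 dot below and U+1038 visarga) stripped out. The helpers `base_form` and `group_by_base_form` are hypothetical and are not part of mySpellChecker.

```python
from collections import defaultdict

# Myanmar tone marks in Unicode: U+1037 MYANMAR SIGN DOT BELOW (့)
# and U+1038 MYANMAR SIGN VISARGA (း).
TONE_MARKS = {"\u1037", "\u1038"}


def base_form(word: str) -> str:
    """Return the word with all tone marks removed (hypothetical helper)."""
    return "".join(ch for ch in word if ch not in TONE_MARKS)


def group_by_base_form(words):
    """Group dictionary words that differ only in their tone marks."""
    groups = defaultdict(list)
    for w in words:
        groups[base_form(w)].append(w)
    # Keep only base forms that have more than one tonal spelling.
    return {base: ws for base, ws in groups.items() if len(ws) > 1}


dictionary = ["သာ", "သား", "ငါ", "ငါး", "ငါ့", "ကောင်"]
print(group_by_base_form(dictionary))
# → {'သာ': ['သာ', 'သား'], 'ငါ': ['ငါ', 'ငါး', 'ငါ့']}
```

Every word in such a group passes a plain dictionary lookup, which is why context is needed to choose among them.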

The Problem

Consider the word သာ:
  1. သာ (Low tone): meaning “merely/pleasant”.
  2. သား (High tone): meaning “son”.
Or ငါ:
  1. ငါ (Low tone): Pronoun “I/Me”.
  2. ငါး (High tone): “Five” or “Fish”.
A standard spell checker will accept ငါ even in the phrase ငါ ကောင် (intended: ငါး ကောင်, “five animals”), because ငါ is itself a valid word.
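The failure is easy to reproduce with a purely dictionary-based check. In this sketch, `LEXICON` is a tiny hypothetical word list and `naive_check` is a hypothetical function that only asks whether each word exists:

```python
# Hypothetical toy lexicon; a real dictionary is far larger.
LEXICON = {"ငါ", "ငါး", "ကောင်", "စား", "မယ်"}


def naive_check(words):
    """Return the indices of words not found in the lexicon."""
    return [i for i, w in enumerate(words) if w not in LEXICON]


# Intended: ငါး ကောင် ("five animals"), typed without the visarga.
print(naive_check(["ငါ", "ကောင်"]))  # → []
```

No error is reported, because each word is valid in isolation; only the neighbouring classifier ကောင် reveals that ငါး was meant.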

Solution: Context-Aware Disambiguation

mySpellChecker includes a rule-based disambiguator that examines the words surrounding an ambiguous term (by default, up to 3 words on each side) to decide which spelling was most likely intended.
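The window idea can be sketched in a few lines. This is a simplified illustration, not the module's actual scoring: it assumes each candidate spelling has a set of clue words (here, the ငါ/ငါး clues from the Supported Ambiguity Patterns table) and scores candidates by counting clue hits inside the window. The names `CANDIDATES`, `context_window`, and `disambiguate` and the confidence formula are all hypothetical, so the numbers will not match the module's own confidence values.

```python
# Hypothetical clue table for one ambiguous base form (not the module's data).
CANDIDATES = {
    "ငါ": {"ငါ့", "က", "ကို"},        # pronoun "I/me"
    "ငါး": {"ကောင်", "ရေ", "ကြော်"},   # "five" / "fish"
}


def context_window(words, i, k=3):
    """Up to k words on each side of position i, excluding words[i]."""
    return words[max(0, i - k):i] + words[i + 1:i + 1 + k]


def disambiguate(words, i, candidates=CANDIDATES, k=3):
    """Pick the candidate whose clue words appear most often in the window.

    Returns (best_candidate, confidence), where confidence is the share of
    all clue hits that support the winner; (None, 0.0) if there are no hits.
    Ties go to the candidate listed first in `candidates`.
    """
    window = context_window(words, i, k)
    scores = {cand: sum(1 for w in window if w in clues)
              for cand, clues in candidates.items()}
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total


print(disambiguate(["ငါ", "ကောင်", "စား", "မယ်"], 0))  # → ('ငါး', 1.0)
```

A larger `k` lets more distant clues vote, which matches the trade-off described for `context_window` in the configuration table.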

Usage

The disambiguator is available as a utility:
```python
from myspellchecker.text.tone import ToneDisambiguator

disambiguator = ToneDisambiguator()

# Check a sentence, given as a list of words
words = ["ငါ", "ကောင်", "စား", "မယ်"]
corrections = disambiguator.check_sentence(words)

# Each result is a tuple: (index, original, correction, confidence)
# Example result: [(0, "ငါ", "ငါး", 0.85)]
```
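Once you have the result tuples, applying them means rewriting the word list. A minimal sketch, assuming the `(index, original, correction, confidence)` shape shown in the comment above; `apply_corrections` is a hypothetical helper, not part of mySpellChecker:

```python
def apply_corrections(words, corrections):
    """Return a new word list with each correction applied.

    A correction is skipped if the word at its index no longer matches
    `original`, so stale results cannot overwrite unrelated words.
    """
    fixed = list(words)  # leave the caller's list untouched
    for index, original, correction, _confidence in corrections:
        if fixed[index] == original:
            fixed[index] = correction
    return fixed


words = ["ငါ", "ကောင်", "စား", "မယ်"]
corrections = [(0, "ငါ", "ငါး", 0.85)]  # the example result from above
print(" ".join(apply_corrections(words, corrections)))  # → ငါး ကောင် စား မယ်
```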

Supported Ambiguity Patterns

The system currently handles several high-frequency ambiguous clusters:
| Word | Interpretation | Context clues |
|---|---|---|
| ငါ | Pronoun (I/Me) | ငါ့, က, ကို |
| ငါး | Fish / Five | ကောင်, ရေ, ကြော် |
| သား | Son/offspring | သမီး, မိသား, သားသမီး, လင်, မယား, မိဘ, အဖေ, အမေ |
|  | Female prefix | သမီး, မိန်းမ |
|  | Negative prefix | ဘူး, ရဘူး |
| ကျ | Fall (verb) | တယ်, သည်, မယ်, ခဲ့, နေ, ပြီ |
| ခု | Unit (counter) | တစ်, နှစ် (preceded by numbers) |
| ခု | Now (temporal) | အခု, ယခု |
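The ခု rows differ from the others: the counter reading depends on a numeral appearing immediately before the word, not just somewhere nearby. A minimal sketch of that distinction, assuming a positional check for the counter clue and a window check for the temporal clues; `NUMERALS`, `TEMPORAL_CLUES`, and `interpret_khu` are hypothetical:

```python
# Hypothetical clue sets based on the table rows for ခု.
NUMERALS = {"တစ်", "နှစ်", "သုံး", "လေး", "ငါး"}  # one to five
TEMPORAL_CLUES = {"အခု", "ယခု"}


def interpret_khu(words, i, k=3):
    """Guess whether words[i] (assumed to be ခု) is a counter or 'now'.

    Positional clue: a numeral immediately before ခု marks a counter.
    Window clue: a temporal word within k words on either side marks 'now'.
    Returns None when neither clue is present.
    """
    if i > 0 and words[i - 1] in NUMERALS:
        return "counter"
    window = words[max(0, i - k):i] + words[i + 1:i + 1 + k]
    if any(w in TEMPORAL_CLUES for w in window):
        return "temporal"
    return None


print(interpret_khu(["ပန်းသီး", "နှစ်", "ခု"], 2))  # → counter
```

A numeral that follows ခု does not trigger the counter reading here, which is the point of treating that clue positionally.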

Tone Mark Correction

Beyond ambiguous words, the module also detects missing or incorrect tone marks in a few specific high-probability patterns:
  • Question particle: သလာ → သလား (e.g., စားပြီးပြီသလာ → စားပြီးပြီသလား)
  • Numbers: သုံ → သုံး (“three”)
  • Numbers/wind: လေ (“wind”) → လေး (“four”), applied only in numeral contexts
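A minimal sketch of these three rules, assuming whitespace-tokenized input and treating a following classifier word (ကောင်, ခု, ယောက်) as the signal for a numeral context. The classifier set, that heuristic, and the function names are assumptions, not the module's implementation:

```python
# Hypothetical rule tables mirroring the patterns listed above.
QUESTION_TYPO, QUESTION_FIX = "သလာ", "သလား"  # matched at the end of a word
EXACT_FIXES = {"သုံ": "သုံး"}                   # "three"
CLASSIFIERS = {"ကောင်", "ခု", "ယောက်"}          # assumed numeral-context signal


def fix_word(word, next_word=None):
    """Apply the first matching tone-mark rule to a single word."""
    if word.endswith(QUESTION_TYPO):
        return word[: -len(QUESTION_TYPO)] + QUESTION_FIX
    if word in EXACT_FIXES:
        return EXACT_FIXES[word]
    # လေ ("wind") becomes လေး ("four") only before a classifier.
    if word == "လေ" and next_word in CLASSIFIERS:
        return "လေး"
    return word


def fix_tone_marks(words):
    """Fix each word, passing the following word along as context."""
    return [
        fix_word(w, words[i + 1] if i + 1 < len(words) else None)
        for i, w in enumerate(words)
    ]


print(fix_tone_marks(["စားပြီးပြီသလာ"]))  # → ['စားပြီးပြီသလား']
print(fix_tone_marks(["လေ", "ကောင်"]))    # → ['လေး', 'ကောင်']
```

Because an already-correct သလား ends in the visarga rather than in သလာ, the suffix rule leaves correct text unchanged.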

Configuration

The disambiguator is integrated into the main SpellChecker validation pipeline and configured via ToneConfig:
```python
from myspellchecker.core.config import ToneConfig

config = ToneConfig(
    context_window=3,      # Words to consider on each side (1-10)
    min_confidence=0.2,    # Minimum confidence to return a suggestion (0.0-1.0)
)
```
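The comments above give each field's valid range. Whether `ToneConfig` itself rejects out-of-range values is not stated here, so the following is only a hypothetical stand-in (`ToneSettings` is an invented name, not part of myspellchecker) showing one way to enforce those ranges with a dataclass:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToneSettings:
    """Hypothetical stand-in for ToneConfig; not the real myspellchecker class."""
    context_window: int = 3      # words on each side, documented range 1-10
    min_confidence: float = 0.2  # documented range 0.0-1.0

    def __post_init__(self):
        # Reject values outside the ranges given in the ToneConfig example.
        if not 1 <= self.context_window <= 10:
            raise ValueError("context_window must be between 1 and 10")
        if not 0.0 <= self.min_confidence <= 1.0:
            raise ValueError("min_confidence must be between 0.0 and 1.0")


try:
    ToneSettings(context_window=0)
except ValueError as err:
    print(err)  # → context_window must be between 1 and 10
```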
| Field | Default | Description |
|---|---|---|
| context_window | 3 | Number of words to consider on each side of the ambiguous word. Larger windows provide more context but are slower. |
| min_confidence | 0.2 | Minimum confidence threshold. Suggestions below this value are not returned. |
| tone_ambiguous_map | None | Overrides the default ambiguity patterns with a custom map (loaded from tone_rules.yaml via GrammarRuleConfig). |
| tone_errors_map | None | Overrides the default tone mark error patterns with a custom map. |
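The `min_confidence` field amounts to a threshold filter over result tuples. A minimal sketch, assuming a confidence exactly equal to the threshold is kept (the description only says values below it are dropped); `filter_by_confidence` is a hypothetical helper, and the second result tuple is invented for illustration:

```python
from typing import List, Tuple

# (index, original, correction, confidence), as in the check_sentence example.
Correction = Tuple[int, str, str, float]


def filter_by_confidence(corrections: List[Correction],
                         min_confidence: float = 0.2) -> List[Correction]:
    """Keep corrections whose confidence is at least min_confidence."""
    return [c for c in corrections if c[3] >= min_confidence]


results = [(0, "ငါ", "ငါး", 0.85), (5, "သာ", "သား", 0.15)]  # second tuple invented
print(filter_by_confidence(results))  # keeps only the first tuple
```

Raising the threshold trades recall for precision: fewer suggestions are returned, but each one is backed by stronger context.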
The default ambiguity patterns and tone mark corrections are hard-coded in the module. To customize them, supply your own mappings in tone_rules.yaml through the YAML-based grammar rules system.