A word can be spelled correctly yet still be wrong. Myanmar homophones like စား (eat) and စာ (letter) pass dictionary checks but produce nonsense when swapped. Catching these errors requires looking beyond the word itself. mySpellChecker uses two complementary strategies: Syntactic Grammar Checking (Layer 2.5) applies rigid POS-based rules, while N-gram Probability (Layer 3) scores how likely a word sequence is given the surrounding context.

Example (Myanmar homophone confusion):
- “သူ စား ချင်တယ်။” (He wants to eat.) Correct.
- “သူ စာ ချင်တယ်။” (He letter wants-to, which is ungrammatical; ချင် requires a preceding verb.) Spelling correct, context incorrect.

Another example:
- “ကျောင်း သား တစ်ယောက်” (A student) Correct.
- “ကျောင်း သာ တစ်ယောက်” (School only one person) Grammatically awkward.
Syntactic Grammar Checking
This layer uses Part-of-Speech (POS) tagging and deterministic rules to catch grammatical errors that statistical models might miss due to data sparsity.

How it Works
- POS Tagging: Every word in the dictionary can optionally have a POS tag (e.g., `N` for Noun, `V` for Verb, `P` for Particle).
- Rule Engine: A set of linguistic rules defines valid and invalid sequences.
Example Rules
- Verb + Particle Agreement:
  - Invalid: `သွား` (Go/Verb) + `ကျောင်း` (School/Noun) → “Go School” (grammatically awkward, missing particle)
  - Correction: `ကျောင်းသွား` or `ကျောင်းကို သွား` → “Go to school”
- Nominalizer Particle `ကြောင်း` vs Noun `ကျောင်း`:
  - `သွားကြောင်း ပြောတယ်` (Said that [he] went). Correct: `ကြောင်း` nominalizes the verb.
  - `သွားကျောင်း ပြောတယ်` is invalid because `ကျောင်း` (school) cannot follow a verb directly.
- Particle Selection (`မှာ` vs `မှ`):
  - `ရုံးမှာ ရှိတယ်` (Is at the office). Correct: `မှာ` indicates location.
  - `ရုံးမှ လာတယ်` (Came from the office). Correct: `မှ` indicates origin.
  - Rule: After a noun, `မှာ` typically means “at/in”; `မှ` typically means “from” or marks a conditional.
- Subject Marker Agreement (`က` vs `ကို`):
  - `သူက စာအုပ်ကို ဖတ်တယ်` (He reads the book). Correct.
  - `သူကို စာအုပ်က ဖတ်တယ်` (The book reads him). Semantically incorrect.
  - Rule: Animate subjects typically take `က`; objects take `ကို`.
- Question Particle Matching:
  - `ဘယ်သူလဲ` (Who is it?). Correct: `ဘယ်` question word + `လဲ` particle.
  - `ဘယ်သူလား` is also valid but carries a different nuance (softer question).
  - `ဘာလဲ` vs `ဘာပဲ` have different meanings: “What?” vs “Whatever”.
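
To make the rule-engine idea concrete, here is a minimal sketch of a POS-based sequence check. It is not mySpellChecker's actual implementation: the lexicon, the tag assigned to `ချင်တယ်`, the two rule tables, and the `check_sequence` function are all hypothetical and exist only for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon: word -> POS tag (N = noun, V = verb, P = particle).
POS_TAGS: Dict[str, str] = {
    "သူ": "N",
    "စား": "V",
    "စာ": "N",
    "ချင်တယ်": "P",
    "ကျောင်း": "N",
    "သွား": "V",
}

# Hypothetical word-level rule: this ending must not follow a noun,
# because it requires a preceding verb.
INVALID_TAG_BEFORE_WORD: Set[Tuple[str, str]] = {("N", "ချင်တယ်")}

# Hypothetical tag-level rule: a noun may not directly follow a verb.
INVALID_TAG_BIGRAMS: Set[Tuple[str, str]] = {("V", "N")}


def check_sequence(words: List[str]) -> List[Tuple[int, str]]:
    """Return (index, message) for each rule violation in a segmented sentence."""
    errors: List[Tuple[int, str]] = []
    for i in range(1, len(words)):
        prev_tag = POS_TAGS.get(words[i - 1])
        cur_tag = POS_TAGS.get(words[i])
        if prev_tag is None or cur_tag is None:
            continue  # POS tags are optional, so untagged words are skipped
        if (prev_tag, words[i]) in INVALID_TAG_BEFORE_WORD:
            # Blame the word before the ending: it should have been a verb.
            errors.append((i - 1, f"{words[i]} requires a preceding verb"))
        elif (prev_tag, cur_tag) in INVALID_TAG_BIGRAMS:
            errors.append((i, f"{cur_tag} cannot directly follow {prev_tag}"))
    return errors


print(check_sequence(["သူ", "စာ", "ချင်တယ်"]))
```

Because the rules are deterministic lookups, they fire even for sequences that never occurred in a training corpus, which is the gap this layer is meant to cover.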
N-gram Probability
mySpellChecker uses N-gram models (Bigrams and Trigrams) to calculate the probability of word sequences.
- Bigram: Probability of Word B following Word A ($P(B \mid A)$).
- Trigram: Probability of Word C following A and B ($P(C \mid A, B)$).
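
As a rough illustration of where such probabilities come from, the sketch below estimates them by maximum likelihood from raw counts. The four-sentence corpus and the function names are hypothetical, and the document does not state how mySpellChecker stores or normalizes its counts.

```python
from collections import Counter
from typing import List

# Tiny hypothetical corpus of pre-segmented sentences.
corpus: List[List[str]] = [
    ["သူ", "စား", "ချင်တယ်"],
    ["သူ", "စား", "တယ်"],
    ["သူ", "စာ", "ဖတ်တယ်"],
    ["ငါ", "စား", "ချင်တယ်"],
]

unigrams: Counter = Counter()
bigrams: Counter = Counter()
trigrams: Counter = Counter()
for sent in corpus:
    unigrams.update(sent)
    bigrams.update(zip(sent, sent[1:]))
    trigrams.update(zip(sent, sent[1:], sent[2:]))


def bigram_prob(a: str, b: str) -> float:
    """Estimate P(b | a) as count(a, b) / count(a)."""
    return bigrams[(a, b)] / unigrams[a] if unigrams[a] else 0.0


def trigram_prob(a: str, b: str, c: str) -> float:
    """Estimate P(c | a, b) as count(a, b, c) / count(a, b)."""
    return trigrams[(a, b, c)] / bigrams[(a, b)] if bigrams[(a, b)] else 0.0


print(bigram_prob("သူ", "စား"), trigram_prob("သူ", "စား", "ချင်တယ်"))
```

Unsmoothed estimates like these return zero for any unseen sequence, which is why additional heuristics are needed for sparse data.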
The Algorithm
- Detection: When the checker encounters a sequence of words, it queries the database for the frequency of that sequence. If $P(B \mid A)$ is below `bigram_threshold`, the word is flagged as suspicious.
- Correction: The system generates candidates for the suspicious word (using SymSpell or Phonetic matching). It then recalculates probabilities for each candidate in the sentence.

Example: Input sentence “သူ စာ ချင်တယ်” (suspicious word: စာ)
- Candidate “စား” (eat): $P(\text{စား} \mid \text{သူ})$ is high, because it is a common verb after a pronoun.
- Candidate “စာ” (letter): $P(\text{စာ} \mid \text{သူ})$ is low, because it is less common as a standalone word.
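
The detect-then-rerank flow can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the probability table and threshold value are invented, candidate generation (SymSpell or phonetic matching) is replaced by a hand-written list, and scoring a candidate by the product of its left and right bigram probabilities is one plausible choice among several.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical bigram probabilities P(word | previous word).
BIGRAM_PROB: Dict[Tuple[str, str], float] = {
    ("သူ", "စား"): 0.05,
    ("စား", "ချင်တယ်"): 0.20,
    ("သူ", "စာ"): 0.0005,
    ("စာ", "ချင်တယ်"): 0.002,
}

BIGRAM_THRESHOLD = 0.001  # stands in for the bigram_threshold setting


def prob(prev: str, word: str) -> float:
    return BIGRAM_PROB.get((prev, word), 0.0)


def detect(words: List[str]) -> List[int]:
    """Indices of words whose bigram with the previous word is below threshold."""
    return [
        i for i in range(1, len(words))
        if prob(words[i - 1], words[i]) < BIGRAM_THRESHOLD
    ]


def context_score(words: List[str], i: int, candidate: str) -> float:
    """Score a candidate at position i using its left and right neighbors."""
    score = 1.0
    if i > 0:
        score *= prob(words[i - 1], candidate)
    if i + 1 < len(words):
        score *= prob(candidate, words[i + 1])
    return score


def best_correction(words: List[str], i: int, candidates: List[str]) -> Optional[str]:
    """Return the candidate that beats the original word in context, if any."""
    ranked = max(candidates, key=lambda c: context_score(words, i, c))
    if context_score(words, i, ranked) > context_score(words, i, words[i]):
        return ranked
    return None


sentence = ["သူ", "စာ", "ချင်တယ်"]
for idx in detect(sentence):
    print(idx, best_correction(sentence, idx, ["စား", "စာ"]))
```

Requiring the winning candidate to beat the original word's own score keeps the checker from replacing a word when no alternative fits the context better.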
Advanced Strategies
The N-gram checker employs several heuristics to handle unseen data and improve accuracy:

1. Backoff Smoothing (Unigram Check)
If a bigram probability is zero (unseen sequence), the checker looks at the unigram frequency of the word.
- If the word is very common globally (high unigram frequency), we assume it is likely correct but used in a novel context. It is not flagged as an error.
- If the word is rare, it is more likely to be a typo.

Example:
- Input: `မြန်မာ ဂီတ` (Myanmar music), a bigram unseen in the corpus
- `ဂီတ` has high unigram frequency (common word for “music”)
- Result: Not flagged as an error; assumed to be a valid novel combination
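
A minimal sketch of this unigram backoff check follows. The counts and the `COMMON_WORD_MIN_FREQ` cutoff are invented for illustration; the document does not say what frequency mySpellChecker treats as "very common".

```python
from typing import Dict, Tuple

# Hypothetical counts; a real system would read these from its n-gram database.
UNIGRAM_FREQ: Dict[str, int] = {"မြန်မာ": 12000, "ဂီတ": 4500, "ဖတ်တတ်": 2}
BIGRAM_FREQ: Dict[Tuple[str, str], int] = {}  # neither example bigram was seen

COMMON_WORD_MIN_FREQ = 1000  # assumed cutoff for "very common globally"


def flag_unseen_bigram(prev: str, word: str) -> bool:
    """Decide whether a word in an unseen bigram should be flagged."""
    if BIGRAM_FREQ.get((prev, word), 0) > 0:
        # Seen bigrams are handled by the threshold check, not by this backoff.
        return False
    # Unseen bigram: back off to the word's own frequency.
    return UNIGRAM_FREQ.get(word, 0) < COMMON_WORD_MIN_FREQ


print(flag_unseen_bigram("မြန်မာ", "ဂီတ"))      # common word, novel context
print(flag_unseen_bigram("စာအုပ်", "ဖတ်တတ်"))  # rare word, likely typo
```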
2. Typo Heuristic
For unseen rare words, the checker searches for “neighbors” (words with Edit Distance = 1) that fit the context with high probability.
- If a neighbor has a high bigram probability ($P(\text{neighbor} \mid \text{previous word})$), we assume the current word is a typo of that neighbor and flag it.

Example:
- Input: `စာအုပ် ဖတ်တတ်` (rare/unseen word `ဖတ်တတ်`)
- Neighbor found: `ဖတ်တယ်` (reads) with Edit Distance = 1
- $P(\text{ဖတ်တယ်} \mid \text{စာအုပ်})$ is high (high bigram probability)
- Result: Flag `ဖတ်တတ်` as a likely typo of `ဖတ်တယ်`
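
The neighbor search can be sketched as below. Several things here are assumptions: edit distance is computed over Unicode code points (a real Myanmar checker might compare syllables or normalized forms instead), neighbors are found by scanning a small vocabulary rather than through SymSpell, and the probability table and `NEIGHBOR_MIN_PROB` cutoff are invented.

```python
from typing import Dict, List, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over Unicode code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]


# Hypothetical vocabulary and bigram probabilities P(word | previous word).
VOCAB: List[str] = ["ဖတ်တယ်", "စာအုပ်", "ရေးတယ်"]
BIGRAM_PROB: Dict[Tuple[str, str], float] = {("စာအုပ်", "ဖတ်တယ်"): 0.12}
NEIGHBOR_MIN_PROB = 0.01  # assumed cutoff for a "high" bigram probability


def likely_typo_of(prev: str, word: str) -> Optional[str]:
    """Return the best distance-1 neighbor that fits the context, if any."""
    best, best_p = None, NEIGHBOR_MIN_PROB
    for cand in VOCAB:
        if edit_distance(word, cand) != 1:
            continue
        p = BIGRAM_PROB.get((prev, cand), 0.0)
        if p >= best_p:
            best, best_p = cand, p
    return best


print(likely_typo_of("စာအုပ်", "ဖတ်တတ်"))
```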
Tone Disambiguation
In the Myanmar language, tone marks ( ့, း) drastically change the meaning of a word. Many spelling errors involve missing or incorrect tone marks (e.g., ငါ vs ငါး).
The `ToneDisambiguator` module uses a specialized context window to resolve these ambiguities.
How it Works
It maintains a list of Ambiguous Groups (e.g., the “Three Tones of Ka”). When it encounters a word from such a group, it checks the surrounding +/- 3 words against a set of context patterns.

Example 1: သံ (Sound/Iron) vs သုံး (Three)
- Input: `သံ ယောက်` (Iron person?)
- Context: `ယောက်` (classifier for people) follows the word.
- Pattern Match: The pattern `("ယောက်", "ခု", "လုံး")` is associated with the number `သုံး` (Three).
- Correction: `သုံး ယောက်` (Three people).

Example 2: ငါ (I/me) vs ငါး (Fish/Five)
- Input: `ငါ ကောင်` (I animal?)
- Context: `ကောင်` (classifier for animals) follows the word.
- Pattern Match: Classifiers for counting animals/fish follow numbers.
- Correction: `ငါး ကောင်` (Five animals/fish).

Example 3: အစ (Beginning) vs အစာ (Food)
- Input: `အစ စား` (Eat beginning?)
- Context: `စား` (eat) follows, and eating requires food (`အစာ`).
- Pattern Match: The verb `စား` collocates with `အစာ` (food), not `အစ` (beginning).
- Correction: `အစာ စား` (Eat food).
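
The window-and-pattern lookup described in this section can be sketched as follows. This does not reproduce the real `ToneDisambiguator` interface: the `AMBIGUOUS` and `CONTEXT_PATTERNS` tables and the `disambiguate` function are hypothetical, and the data is limited to the three examples above.

```python
from typing import Dict, List, Optional, Set

# Hypothetical ambiguous groups: word -> alternatives it is often confused with.
AMBIGUOUS: Dict[str, List[str]] = {
    "သံ": ["သုံး"],
    "ငါ": ["ငါး"],
    "အစ": ["အစာ"],
}

# Hypothetical context patterns: alternative -> context words that support it.
CONTEXT_PATTERNS: Dict[str, Set[str]] = {
    "သုံး": {"ယောက်", "ခု", "လုံး"},
    "ငါး": {"ကောင်"},
    "အစာ": {"စား"},
}

WINDOW = 3  # look at up to 3 words on each side


def disambiguate(words: List[str], i: int) -> Optional[str]:
    """Suggest a replacement for words[i] if the context supports an alternative."""
    window = words[max(0, i - WINDOW):i] + words[i + 1:i + 1 + WINDOW]
    for alt in AMBIGUOUS.get(words[i], []):
        if CONTEXT_PATTERNS.get(alt, set()) & set(window):
            return alt
    return None


print(disambiguate(["သံ", "ယောက်"], 0))
print(disambiguate(["ငါ", "ကောင်"], 0))
print(disambiguate(["အစ", "စား"], 0))
```

A word that is not in any ambiguous group, or whose window contains no supporting pattern, is left unchanged (the function returns `None`).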
Common Myanmar Grammar Errors Detected
| Error Type | Example (Incorrect) | Correction | Rule Applied |
|---|---|---|---|
| Homophone confusion | စာ ချင်တယ် | စား ချင်တယ် | Context: verb pattern |
| Missing particle | ကျောင်း သွား | ကျောင်းကို သွား | Verb requires object marker |
| Wrong particle | ရုံးမှာ လာတယ် | ရုံးမှ လာတယ် | Motion verb needs မှ (from) |
| Tone mark error | သုံ ယောက် | သုံး ယောက် | Classifier context |
| Plural marker | သူတို့ သွားတယ် | သူတို့ သွားကြတယ် | Plural subject agreement |
| Complementizer | သွားကျောင်း | သွားကြောင်း | Verb complement (“that”) |