mySpellChecker includes eight specialized grammar checkers that target common Myanmar grammatical errors. Each checker focuses on a specific grammar domain:
| Checker | Purpose | Error Types |
|---|---|---|
| AspectChecker | Verb aspect markers | Typos, invalid sequences |
| ClassifierChecker | Numeral classifiers | Typos, agreement errors |
| CompoundChecker | Compound words | Typos, malformed compounds |
| MergedWordChecker | Merged word detection | Segmenter merge errors |
| NegationChecker | Negation patterns | Typos, missing endings |
| ParticleChecker | Particle context validation | Particle misuse |
| TenseAgreementChecker | Tense-time agreement | Tense mismatch |
| RegisterChecker | 3-way register detection | Mixed register usage |
AspectChecker
Validates Myanmar verb aspect markers that modify verbs to express temporal, modal, and aspectual meanings.
Aspect Categories
| Category | Markers | Meaning | Example |
|---|---|---|---|
| Completion | ပြီ, ပြီး | Action completed | သွားပြီ (went) |
| Progressive | နေ | Ongoing action | စားနေ (eating) |
| Habitual | တတ် | Habitual action | စားတတ် (eats habitually) |
| Resultative | ထား | Maintained state | ရေးထား (have written) |
| Directional | လာ, သွား | Motion direction | ပြန်လာ (come back) |
| Desiderative | ချင် | Desire/want | လာချင် (want to come) |
| Potential | နိုင်, ရ | Ability/possibility | လုပ်နိုင် (can do) |
| Immediate | လိုက် | Following action | လိုက်သွား (follow and go) |
| Experiential | ဖူး | Past experience | ရေးဖူး (have written before) |
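As a rough illustration, the category table above can be modeled as a plain lookup. This is only a sketch: `ASPECT_CATEGORIES` and `aspect_category` are hypothetical names, not part of mySpellChecker, and the library's own data may cover more markers or use different category labels.

```python
# Illustrative only: a hand-built lookup mirroring the Aspect Categories table.
# ASPECT_CATEGORIES and aspect_category are hypothetical, not library names.
from typing import Optional

ASPECT_CATEGORIES = {
    "ပြီ": "completion",
    "ပြီး": "completion",
    "နေ": "progressive",
    "တတ်": "habitual",
    "ထား": "resultative",
    "လာ": "directional",
    "သွား": "directional",
    "ချင်": "desiderative",
    "နိုင်": "potential",
    "ရ": "potential",
    "လိုက်": "immediate",
    "ဖူး": "experiential",
}


def aspect_category(word: str) -> Optional[str]:
    """Return the table category for a marker, or None if it is not listed."""
    return ASPECT_CATEGORIES.get(word)


print(aspect_category("ပြီ"))   # completion
print(aspect_category("စား"))  # None
```

Note that a word such as သွား is both a main verb and a directional marker, so a bare lookup like this cannot tell the two uses apart without context.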
Usage
from myspellchecker.grammar.checkers.aspect import AspectChecker
checker = AspectChecker()
# Check if word is an aspect marker
checker.is_aspect_marker("ပြီ") # True
checker.is_aspect_marker("စား") # False
# Check for typos
checker.is_aspect_typo("ပရီ") # True (typo for ပြီ)
correction = checker.get_typo_correction("ပရီ") # "ပြီ"
# Get detailed aspect info
info = checker.get_aspect_info("ပြီ")
print(info.category) # "completion"
print(info.description) # "Action completed"
print(info.is_final) # True (typically at phrase end)
# Validate aspect sequences
errors = checker.validate_sequence(["စား", "ပြီး", "သွား"])
for error in errors:
    print(f"{error.text}: {error.reason}")
ClassifierChecker
Validates Myanmar numeral + classifier patterns. Myanmar uses numeral classifiers similar to Chinese/Japanese.
Pattern: Numeral + Classifier + Noun
| Numeral | Classifier | Noun | Meaning |
|---|---|---|---|
| သုံး | ယောက် | (person) | 3 people |
| ငါး | ကောင် | (animal) | 5 animals |
| နှစ် | အုပ် | (book) | 2 books |
| တစ် | လုံး | (round object) | 1 (round object) |
Usage
from myspellchecker.grammar.checkers.classifier import (
    ClassifierChecker,
    get_classifier_checker,
    is_classifier,
    is_numeral,
)
checker = ClassifierChecker()
# Check if word is a numeral
is_numeral("သုံး") # True (three)
is_numeral("၃") # True (digit 3)
# Check if word is a classifier
is_classifier("ယောက်") # True (classifier for people)
is_classifier("လူ") # False (just a noun)
# Get classifier category
category = checker.get_classifier_category("ကောင်") # "animals"
# Check for classifier typos
typo_result = checker.check_classifier_typo("ယေက်")
if typo_result:
    correction, confidence = typo_result
    print(f"Correction: {correction}") # "ယောက်"
# Validate classifier usage
errors = checker.validate_sequence(["သုံး", "ယေက်", "ရှိ"])
for error in errors:
    print(f"{error.word} → {error.suggestion}")
Classifier-Noun Agreement
# Get compatible classifiers for a noun
classifiers = checker.get_compatible_classifiers("ခွေး") # ["ကောင်"]
# Check classifier-noun agreement
error = checker.check_agreement(classifier="ယောက်", noun="ခွေး")
if error:
    print(error.reason) # ခွေး (dog) should use ကောင် not ယောက်
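The agreement check can be pictured as a lookup from noun to expected classifier. The following is a minimal sketch under that assumption; `NOUN_CLASSIFIER` and `agreement_problem` are hypothetical names, the two dictionary entries are illustrative, and the library's actual rule data is not shown in this page.

```python
# Illustrative sketch of classifier-noun agreement as a dictionary lookup.
# NOUN_CLASSIFIER and agreement_problem are hypothetical, not library names.
from typing import Optional

NOUN_CLASSIFIER = {
    "ခွေး": "ကောင်",  # dog -> animal classifier
    "လူ": "ယောက်",   # person -> people classifier
}


def agreement_problem(classifier: str, noun: str) -> Optional[str]:
    """Return a short reason if the classifier does not fit the noun, else None."""
    expected = NOUN_CLASSIFIER.get(noun)
    if expected is None or expected == classifier:
        # Unknown nouns are not flagged: no evidence, no error.
        return None
    return f"{noun} should use {expected} not {classifier}"


print(agreement_problem(classifier="ယောက်", noun="ခွေး"))
```

Returning `None` for unknown nouns keeps the sketch conservative; whether the real checker behaves the same way for out-of-vocabulary nouns is not stated here.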
CompoundChecker
Detects and validates Myanmar compound word formations.
Compound Types
| Type | Pattern | Example | Result |
|---|---|---|---|
| Noun-Noun | N + N | ပန်း + ခြံ | ပန်းခြံ (flower garden) |
| Verb-Verb | V + V | စား + သောက် | စားသောက် (dine) |
| Reduplication | X + X | ဖြေး + ဖြေး | ဖြေးဖြေး (slowly) |
| Affixed | Prefix + Root | အ + လုပ် | အလုပ် (work) |
Usage
from myspellchecker.grammar.checkers.compound import (
    CompoundChecker,
    get_compound_checker,
    is_compound,
    is_reduplication,
)
checker = CompoundChecker()
# Check if word is a recognized compound
is_compound("ပန်းခြံ") # True
# Check for reduplication
is_reduplication("ဖြေးဖြေး") # True
base = checker.get_reduplication_base("ဖြေးဖြေး") # "ဖြေး"
# Detect compound pattern
info = checker.detect_compound_pattern("အလုပ်")
if info:
    print(info.compound_type) # "affixed"
    print(info.components) # ["အ", "လုပ်"]
    print(info.pattern) # "PREFIX(nominalization) + STEM"
Analyze Compounds
# Comprehensive compound analysis
result = checker.analyze_word("ပန်းခြံ")
print(result["is_compound"]) # True
print(result["components"]) # ["ပန်း", "ခြံ"]
print(result["has_prefix"]) # False
print(result["is_reduplication"]) # False
print(result["confidence"]) # 0.95
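For intuition, the X + X reduplication pattern can be tested by comparing the two halves of a word. This is a naive sketch, not the library's implementation: `reduplication_base` is a hypothetical helper, and it compares Unicode code points, so it assumes the text is consistently normalized.

```python
# Naive illustration of X + X reduplication detection by comparing halves.
# reduplication_base is a hypothetical helper, not a mySpellChecker function.
from typing import Optional


def reduplication_base(word: str) -> Optional[str]:
    """Return X if the word is exactly X + X (by code points), else None."""
    half, remainder = divmod(len(word), 2)
    if half == 0 or remainder:
        return None  # empty, single code point, or odd length
    first, second = word[:half], word[half:]
    return first if first == second else None


print(reduplication_base("ဖြေးဖြေး"))  # ဖြေး
print(reduplication_base("ပန်းခြံ"))   # None
```

A real checker would likely also confirm that the base is a known word, since an accidental repeated substring is not necessarily a reduplication.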
MergedWordChecker
Detects words that the segmenter may have incorrectly merged from a particle + verb sequence into a single compound word.
Problem
Myanmar word segmenters sometimes merge adjacent tokens when the concatenation forms a valid dictionary word:
| Input | Intended | Segmented | Issue |
|---|---|---|---|
| သူက စားသောကြောင့် | သူ + က + စား + သောကြောင့် | သူ + ကစား + သောကြောင့် | “က” + “စား” merged into “ကစား” (play) |
Detection Strategy
A merged word is flagged ONLY when ALL conditions hold:
- The word is in the known ambiguous-merge set (e.g., “ကစား”)
- The preceding word is a NOUN or PRONOUN (POS: N, PRON)
- The following word is a clause-linking particle or verb-final marker
This three-way evidence requirement prevents false positives on legitimate uses.
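The three conditions above can be sketched as a single conjunction over a token and its neighbours. This is an illustration only: `AMBIGUOUS_MERGES`, `CLAUSE_LINKERS`, and `find_merged_words` are hypothetical names, the sets contain just the example from this page, and the third condition is approximated with a word list rather than the checker's real marker inventory.

```python
# Illustrative sketch of the three-way evidence rule for merged words.
# All names and data below are assumptions made for this example.
from typing import List

AMBIGUOUS_MERGES = {"ကစား": ("က", "စား")}  # merged form -> intended split
NOMINAL_TAGS = {"N", "PRON"}                # tags named in the rule above
CLAUSE_LINKERS = {"သောကြောင့်"}             # assumed; real list is larger


def find_merged_words(words: List[str], pos_tags: List[str]) -> List[int]:
    """Return indices of words that satisfy all three conditions."""
    if len(words) != len(pos_tags):
        raise ValueError("words and pos_tags must have the same length")
    hits = []
    # A candidate needs both a preceding and a following token.
    for i in range(1, len(words) - 1):
        if (
            words[i] in AMBIGUOUS_MERGES
            and pos_tags[i - 1] in NOMINAL_TAGS
            and words[i + 1] in CLAUSE_LINKERS
        ):
            hits.append(i)
    return hits


# Tag names other than N/PRON are placeholders.
print(find_merged_words(["သူ", "ကစား", "သောကြောင့်"], ["PRON", "V", "PART"]))  # [1]
```

Because every condition must hold, a sentence where ကစား follows an adverb, or where nothing clause-linking follows it, is left alone.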
Configuration
The checker reports a conservative confidence of 0.80 because the detection is heuristic and cannot be certain without semantic understanding.
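from myspellchecker.grammar.checkers.merged_word import MergedWordChecker
checker = MergedWordChecker()
# Segmenter output from the example above; the tag for the final particle is illustrative
words = ["သူ", "ကစား", "သောကြောင့်"]
pos_tags = ["PRON", "V", "PPM"]
errors = checker.validate_sequence(words, pos_tags)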
NegationChecker
Validates Myanmar negation patterns, which pair the prefix မ before the verb with one of a small set of endings.
Negation Patterns
| Pattern | Structure | Example | Meaning |
|---|---|---|---|
| Standard | မ + verb + ဘူး | မသွားဘူး | don’t go |
| Polite | မ + verb + ပါဘူး | မသွားပါဘူး | politely don’t go |
| Prohibition | မ + verb + နဲ့ | မလုပ်နဲ့ | Don’t do! |
| Formal | မ + verb + ပါ | မရှိပါ | doesn’t exist (formal) |
Usage
from myspellchecker.grammar.checkers.negation import (
    NegationChecker,
    get_negation_checker,
    is_negative_ending,
)
checker = NegationChecker()
# Check for negation prefix
checker.starts_with_negation("မသွား") # True
checker.starts_with_negation("သွား") # False
# Check negative endings
is_negative_ending("ဘူး") # True
is_negative_ending("တယ်") # False
# Check for ending typos
typo_result = checker.check_ending_typo("ဘူ")
if typo_result:
    correction, confidence = typo_result
    print(f"Correction: {correction}") # "ဘူး"
# Validate negation patterns
errors = checker.validate_sequence(["မ", "သွား", "ဘူ"])
for error in errors:
    print(f"{error.word} → {error.suggestion}")
Detect Negation Patterns
# Detect negation pattern starting at a given index
pattern = checker.detect_negation_pattern(["မ", "သွား", "ဘူး"], 0)
if pattern:
    print(pattern.pattern_type) # "standard_negative"
    print(pattern.verb) # "သွား"
    print(pattern.ending) # "ဘူး"
    print(pattern.register) # "colloquial"
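To make the pattern table concrete, the sketch below classifies a pre-segmented မ + verb + ending triple by its ending. It is an assumption-laden illustration: `ENDING_TO_PATTERN` and `classify_negation` are hypothetical names, the labels are the row names from the table rather than the library's `pattern_type` values, and the real checker may accept other segmentations.

```python
# Illustrative classification of negation patterns by their ending.
# ENDING_TO_PATTERN and classify_negation are hypothetical names.
from typing import List, Optional, Tuple

ENDING_TO_PATTERN = {
    "ဘူး": "standard",
    "ပါဘူး": "polite",
    "နဲ့": "prohibition",
    "ပါ": "formal",
}


def classify_negation(
    words: List[str], index: int = 0
) -> Optional[Tuple[str, str, str]]:
    """Return (pattern, verb, ending) for a 'မ verb ending' triple at index."""
    if index < 0 or index + 2 >= len(words):
        return None  # not enough tokens for prefix + verb + ending
    if words[index] != "မ":
        return None
    verb, ending = words[index + 1], words[index + 2]
    pattern = ENDING_TO_PATTERN.get(ending)
    if pattern is None:
        return None  # unknown or misspelled ending
    return pattern, verb, ending


print(classify_negation(["မ", "သွား", "ဘူး"]))
print(classify_negation(["မ", "သွား", "ဘူ"]))  # None: truncated ending
```

A misspelled ending simply fails the lookup here; suggesting the nearest valid ending is the job of a typo check such as `check_ending_typo`.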
ParticleChecker
Validates Myanmar particle usage given verb and noun context. Myanmar particles (postpositions) must agree with the verb type and syntactic role of surrounding words.
Common Misuse Patterns
| Pattern | Incorrect | Correct | Explanation |
|---|---|---|---|
| Motion verb + static locative | ကျောင်းမှာ သွားတယ် | ကျောင်းကို သွားတယ် | Use ကို/သို့ with motion verbs |
| Sequential ပြီ where ပြီး needed | စားပြီ သွားတယ် | စားပြီး သွားတယ် | ပြီး links sequential actions |
| Negation + affirmative ending | မသွားတယ် | မသွားဘူး | Negated sentences need ဘူး |
Features
- Particle confusion pair detection from YAML rules (particle_contexts.yaml)
- Verb-particle frame checking: validates verb+particle compatibility
- POS-tag-aware validation with fallback heuristics when tags are unavailable
- Configurable confidence thresholds via ParticleCheckerConfig
Usage
from myspellchecker.grammar.checkers.particle import ParticleChecker
checker = ParticleChecker()
# Validate particle usage in a word sequence
errors = checker.validate_sequence(
    words=["ကျောင်း", "ကို", "ရှိ", "တယ်"],
    pos_tags=["N", "PPM", "V", "SFP"],  # Optional POS tags
)
for error in errors:
    print(f"{error.text}: {error.reason}")
    print(f"  Suggestion: {error.suggestions}")
    print(f"  Confidence: {error.confidence}")
Singleton Access
from myspellchecker.grammar.checkers.particle import get_particle_checker
# Thread-safe singleton (loads YAML once)
checker = get_particle_checker()
YAML Configuration
Particle rules are defined in rules/particle_contexts.yaml:
particle_confusions:
  - particle: "ကို"
    confused_with: "မှာ"
    context: "static_location"
    description: "Static locative used with motion verb"
    confidence: 0.70
verb_particle_frames:
  - verbs: ["သွား", "လာ", "ပြန်"]
    required_particles: ["ကို", "သို့"]
    incompatible_particles: ["မှာ", "တွင်"]
    note: "Motion verbs require directional particles"
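One way to read a `verb_particle_frames` entry is as a filter over adjacent particle + verb pairs. The sketch below hard-codes the frame from the YAML above as a Python dict so it runs without a YAML parser; `FRAMES` and `incompatible_pairs` are hypothetical names, and the adjacency assumption (the particle sits directly before the verb) is a simplification of whatever the checker actually does.

```python
# Illustrative use of a verb-particle frame; data copied from the YAML above.
# FRAMES and incompatible_pairs are hypothetical, not library names.
from typing import List, Tuple

FRAMES = [
    {
        "verbs": ["သွား", "လာ", "ပြန်"],
        "required_particles": ["ကို", "သို့"],
        "incompatible_particles": ["မှာ", "တွင်"],
    }
]


def incompatible_pairs(words: List[str]) -> List[Tuple[int, int]]:
    """Return (particle_index, verb_index) for particles a verb frame rejects."""
    hits = []
    for i in range(len(words) - 1):
        particle, verb = words[i], words[i + 1]
        for frame in FRAMES:
            if verb in frame["verbs"] and particle in frame["incompatible_particles"]:
                hits.append((i, i + 1))
    return hits


print(incompatible_pairs(["ကျောင်း", "မှာ", "သွား", "တယ်"]))  # [(1, 2)]
print(incompatible_pairs(["ကျောင်း", "ကို", "သွား", "တယ်"]))  # []
```

The first call reproduces the "motion verb + static locative" row of the misuse table; the second is the corrected sentence.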
TenseAgreementChecker
Validates that aspectual particles (sentence-final markers) agree with temporal adverbials in Myanmar sentences. When a temporal adverb indicates a specific tense, the sentence-final particle must match.
Examples
| Status | Sentence | Explanation |
|---|---|---|
| Correct | မနေ့က သွားခဲ့တယ် | yesterday + past marker |
| Incorrect | မနေ့က သွားမယ် | yesterday + future marker |
| Correct | မနက်ဖြန် သွားမယ် | tomorrow + future marker |
| Incorrect | မနက်ဖြန် သွားခဲ့တယ် | tomorrow + past marker |
Usage
from myspellchecker.grammar.checkers.tense_agreement import TenseAgreementChecker
checker = TenseAgreementChecker()
# Validate tense-time agreement
errors = checker.validate_sequence(["မနေ့က", "ကျောင်း", "သွား", "မယ်"])
for error in errors:
    print(f"{error.text}: {error.reason}")
    print(f"  Time adverb: {error.time_adverb}")
    print(f"  Detected tense: {error.detected_tense}")
    print(f"  Suggestion: {error.suggestions}")
Checking Individual Words
# Check if a word is a temporal adverb
checker.is_time_adverb("မနေ့က") # True
checker.get_adverb_tense("မနေ့က") # "past"
# Check if a word is an aspect marker
checker.is_aspect_marker("မယ်") # True
checker.get_marker_tense("မယ်") # "future"
YAML Configuration
Tense rules are defined in rules/tense_markers.yaml:
tense_agreement_rules:
  past_time_adverbs: ["မနေ့က", "တုန်းက", "အရင်က"]
  future_time_adverbs: ["မနက်ဖြန်", "နောက်နှစ်", "လာမယ့်"]
  past_aspect_markers: ["ခဲ့တယ်", "ခဲ့သည်"]
  future_aspect_markers: ["မယ်", "မည်"]
  incompatible_pairs:
    - time_class: past
      incompatible_aspects: ["မယ်", "မည်"]
      confidence: 0.80
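The agreement rule can be sketched with the four word lists from the YAML above: find a time adverb, find a tense marker, and report the pair when their tenses differ. This is a simplified illustration; `find_tense_mismatch` and the set names are hypothetical, and it treats past/future symmetrically even though the YAML excerpt only lists an incompatible pair for the past class.

```python
# Illustrative tense-time agreement check using the word lists shown above.
# The set names and find_tense_mismatch are hypothetical, not library names.
from typing import List, Optional, Tuple

PAST_ADVERBS = {"မနေ့က", "တုန်းက", "အရင်က"}
FUTURE_ADVERBS = {"မနက်ဖြန်", "နောက်နှစ်", "လာမယ့်"}
PAST_MARKERS = {"ခဲ့တယ်", "ခဲ့သည်"}
FUTURE_MARKERS = {"မယ်", "မည်"}


def find_tense_mismatch(words: List[str]) -> Optional[Tuple[str, str]]:
    """Return (adverb, marker) when their tenses disagree, else None."""
    # First time adverb in the sentence.
    adverb = next(
        (w for w in words if w in PAST_ADVERBS or w in FUTURE_ADVERBS), None
    )
    # Last tense marker, since markers are sentence-final.
    marker = next(
        (w for w in reversed(words) if w in PAST_MARKERS or w in FUTURE_MARKERS),
        None,
    )
    if adverb is None or marker is None:
        return None
    adverb_is_past = adverb in PAST_ADVERBS
    marker_is_past = marker in PAST_MARKERS
    return (adverb, marker) if adverb_is_past != marker_is_past else None


print(find_tense_mismatch(["မနေ့က", "ကျောင်း", "သွား", "မယ်"]))
print(find_tense_mismatch(["မနက်ဖြန်", "သွား", "မယ်"]))  # None
```

Sentences with no time adverb or no listed marker return `None`, which mirrors the idea that the check only fires when both pieces of evidence are present.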
Configuration
from myspellchecker.core.config import TenseAgreementCheckerConfig
config = TenseAgreementCheckerConfig(
    default_confidence=0.75,  # Default confidence for tense mismatches
    high_confidence=0.85,  # When both adverb and marker are unambiguous
)
checker = TenseAgreementChecker(checker_config=config)
RegisterChecker
Validates register consistency across three tiers: formal, polite, and colloquial. Myanmar has distinct register markers at the sentence-final position, and mixing registers within a sentence is a stylistic error.
Three-Tier Register System
| Register | Sentence-Final Particles | Pronouns | Use Context |
|---|---|---|---|
| Formal | သည်, ၏ | သူသည် | Written prose, news, official documents |
| Polite | ပါတယ်, ပါမယ် | — | Respectful speech, customer service |
| Colloquial | တယ်, မယ် | သူ, ငါ | Casual conversation, informal writing |
Mixing Severity
| Combination | Severity | Confidence |
|---|---|---|
| Formal + Colloquial | High (strong mismatch) | 0.85 |
| Formal + Polite | Low (formality gap) | 0.65 |
| Polite + Colloquial | Medium | 0.75 |
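The severity table is symmetric in its two registers, so it can be modeled as a lookup keyed by an unordered pair. A minimal sketch, assuming the confidences listed above; `MIXING_CONFIDENCE` and `mixing_confidence` are hypothetical names, not library identifiers.

```python
# Illustrative lookup for the Mixing Severity table, keyed by unordered pairs.
# MIXING_CONFIDENCE and mixing_confidence are hypothetical names.
from typing import Optional

MIXING_CONFIDENCE = {
    frozenset({"formal", "colloquial"}): 0.85,
    frozenset({"formal", "polite"}): 0.65,
    frozenset({"polite", "colloquial"}): 0.75,
}


def mixing_confidence(first: str, second: str) -> Optional[float]:
    """Return the table confidence for a register pair, or None if no mismatch."""
    # Identical registers collapse to a one-element set, which is not a key.
    return MIXING_CONFIDENCE.get(frozenset({first, second}))


print(mixing_confidence("colloquial", "formal"))  # 0.85
print(mixing_confidence("formal", "formal"))      # None
```

Using `frozenset` keys means argument order does not matter, which matches how the table lists each combination once.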
Usage
from myspellchecker.grammar.checkers.register import RegisterChecker
checker = RegisterChecker()
# Get register of a word
info = checker.get_register("သည်")
print(info.register) # "formal"
info = checker.get_register("ပါတယ်")
print(info.register) # "polite"
info = checker.get_register("တယ်")
print(info.register) # "colloquial"
# Check register type
checker.is_formal("သည်") # True
checker.is_colloquial("တယ်") # True
checker.is_neutral("စာအုပ်") # True
Detect Sentence Register
# Detect predominant register (now returns 3-way classification)
register, consistency, infos = checker.detect_sentence_register(
    ["သူ", "သည်", "စာအုပ်", "ဖတ်", "တယ်"]
)
print(register) # "mixed"
print(consistency) # 0.5 (50% consistent)
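How the consistency score is computed is not specified here. One plausible reading, which happens to reproduce the 0.5 above, is the share of register-marked words that belong to the most common register. The sketch below implements that reading only; `MARKER_REGISTER` and `sentence_register` are hypothetical names, and the marker list is limited to the particles in the three-tier table.

```python
# Illustrative consistency score: majority share among register-marked words.
# This is an assumed formula, not necessarily what RegisterChecker computes.
from collections import Counter
from typing import List, Tuple

MARKER_REGISTER = {
    "သည်": "formal",
    "၏": "formal",
    "ပါတယ်": "polite",
    "ပါမယ်": "polite",
    "တယ်": "colloquial",
    "မယ်": "colloquial",
}


def sentence_register(words: List[str]) -> Tuple[str, float]:
    """Return (register, consistency) for a segmented sentence."""
    counts = Counter(MARKER_REGISTER[w] for w in words if w in MARKER_REGISTER)
    if not counts:
        return "neutral", 1.0  # no register-marked words at all
    top_register, top_count = counts.most_common(1)[0]
    consistency = top_count / sum(counts.values())
    register = "mixed" if len(counts) > 1 else top_register
    return register, consistency


print(sentence_register(["သူ", "သည်", "စာအုပ်", "ဖတ်", "တယ်"]))  # ('mixed', 0.5)
```

Words such as စာအုပ် carry no register and are ignored, so only the two particles contribute to the score.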
Validate Register Consistency
# Check for mixed register errors
errors = checker.validate_sequence(["သူ", "သည်", "စာအုပ်", "ဖတ်", "တယ်"])
for error in errors:
    print(f"{error.text}: {error.reason}")
    print(f"  Detected: {error.detected_register}")
    print(f"  Expected: {error.expected_register}")
    print(f"  Suggestion: {error.suggestion}")
Configuration
from myspellchecker.core.config import RegisterCheckerConfig
config = RegisterCheckerConfig(
    register_mismatch_confidence=0.85,  # Formal + colloquial mixing
    register_formality_gap_confidence=0.65,  # Formal + polite mixing
)
checker = RegisterChecker(register_config=config)
Integration with SpellChecker
All grammar checkers run automatically when rule-based validation is enabled with use_rule_based_validation=True:
from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig
from myspellchecker.providers import SQLiteProvider
config = SpellCheckerConfig(
    use_rule_based_validation=True  # Enable all grammar checkers
)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)
result = checker.check("သူသည် စာအုပ် ဖတ်တယ်။")
# Grammar errors include all checker types
for error in result.errors:
    if hasattr(error, 'error_type'):
        print(f"Type: {error.error_type}") # aspect_error, register_error, etc.
        print(f"Word: {error.text}")
        print(f"Reason: {error.reason}")
Error Types Summary
| Checker | Error Types |
|---|---|
| AspectChecker | aspect_typo, invalid_sequence, incomplete_aspect |
| ClassifierChecker | typo, agreement, missing, invalid_pattern |
| CompoundChecker | compound_typo, invalid_compound, incomplete_reduplication |
| MergedWordChecker | merged_word |
| NegationChecker | typo, missing_ending, invalid_pattern |
| ParticleChecker | particle_misuse |
| TenseAgreementChecker | tense_mismatch |
| RegisterChecker | register_error |
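When post-processing results, it can help to know which checkers may emit a given error type. The sketch below transcribes the summary table into a dict and inverts it; `ERROR_TYPES` and `checkers_for` are hypothetical names used only for this example.

```python
# Illustrative reverse lookup built from the Error Types Summary table.
# ERROR_TYPES and checkers_for are hypothetical, not library names.
from typing import List

ERROR_TYPES = {
    "AspectChecker": ["aspect_typo", "invalid_sequence", "incomplete_aspect"],
    "ClassifierChecker": ["typo", "agreement", "missing", "invalid_pattern"],
    "CompoundChecker": ["compound_typo", "invalid_compound", "incomplete_reduplication"],
    "MergedWordChecker": ["merged_word"],
    "NegationChecker": ["typo", "missing_ending", "invalid_pattern"],
    "ParticleChecker": ["particle_misuse"],
    "TenseAgreementChecker": ["tense_mismatch"],
    "RegisterChecker": ["register_error"],
}


def checkers_for(error_type: str) -> List[str]:
    """Return the checkers (sorted by name) that list this error type."""
    return sorted(
        name for name, types in ERROR_TYPES.items() if error_type in types
    )


print(checkers_for("typo"))         # ['ClassifierChecker', 'NegationChecker']
print(checkers_for("merged_word"))  # ['MergedWordChecker']
```

As the first call shows, `typo` and `invalid_pattern` appear under more than one checker in the table, so the error type alone does not always identify its source.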
See Also