N-gram context checking catches many errors through statistical probabilities, but some syntactic mistakes, such as conflicting tense markers, missing case particles, and wrong classifier order, require explicit grammatical rules. The Grammar Engine applies YAML-defined POS-sequence rules to flag these patterns.
Overview
from myspellchecker.grammar import SyntacticRuleChecker
from myspellchecker.providers import SQLiteProvider
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SyntacticRuleChecker(provider)
# Check word sequence
corrections = checker.check_sequence(["ကျွန်တော်", "ကျောင်း", "သွားတယ်"])
for idx, error_word, suggestion, confidence in corrections:
    print(f"Position {idx}: '{error_word}' → '{suggestion}' ({confidence:.0%})")
Architecture
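The Grammar Engine coordinates eight specialized checkers. Five of them (AspectChecker, ClassifierChecker, CompoundChecker, NegationChecker, and RegisterChecker) are described under Specialized Checkers.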
Configuration
GrammarEngineConfig
from myspellchecker.core.config import GrammarEngineConfig
config = GrammarEngineConfig(
    # Confidence thresholds
    high_confidence=0.90,
    medium_confidence=0.85,
    default_confidence_threshold=0.80,
    low_confidence_threshold=0.55,
    # Feature-specific thresholds
    exact_match_confidence=0.95,
    context_confidence_threshold=0.65,
    pos_sequence_confidence=0.80,
    verb_particle_confidence=0.75,
    tense_marker_confidence=0.60,
    sentence_final_confidence=0.70,
    question_confidence=0.60,
)
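One way to read the general threshold fields is as cut-offs that sort a correction's confidence into tiers. The sketch below illustrates that reading only: `confidence_tier` and the tier names are hypothetical and not part of myspellchecker, and the library may use the thresholds differently. The numeric values are the ones from the example above.

```python
# Hypothetical helper: maps a confidence score to a tier name using the
# general thresholds from the GrammarEngineConfig example above.
# The tiering logic is an assumption, not the library's documented behavior.
THRESHOLDS = [
    ("high", 0.90),
    ("medium", 0.85),
    ("default", 0.80),
    ("low", 0.55),
]


def confidence_tier(score: float) -> str:
    """Return the first tier whose cut-off the score meets, else 'rejected'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be within [0, 1]")
    for name, cutoff in THRESHOLDS:
        if score >= cutoff:
            return name
    return "rejected"


tiers = [confidence_tier(s) for s in (0.95, 0.87, 0.60, 0.30)]
```

Keeping the cut-offs in one ordered list makes it easy to see that each tier is checked from strictest to loosest.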
GrammarRuleConfig
Load custom grammar rules from YAML:
from myspellchecker.grammar.config import GrammarRuleConfig
config = GrammarRuleConfig(config_path="custom_rules.yml")
Grammar Rules
Rule Types
| Rule Type | Description | Example |
|---|---|---|
| Particle Typos | Common particle mistakes | ဘူ → ဘူး |
| Medial Confusions | Ya-pin vs Ya-yit | ကျောင်း → ကြောင်း |
| POS Sequences | Invalid tag combinations | N-N without particle |
| Verb-Particle | Verb ending agreement | Missing tense marker |
| Sentence Structure | Sentence completeness | Missing final particle |
Particle Typo Detection
# Common particle typos loaded from config
typo_info = config.get_particle_typo("ဘူ")
# Returns: {"correction": "ဘူး", "meaning": "negative ending", "context": "after_verb"}
# Check in context
corrections = checker.check_sequence(["မ", "သွား", "ဘူ"])
# Detects: "ဘူ" → "ဘူး" (missing visarga in negative ending)
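To make the role of the `context` field concrete, here is a minimal standalone sketch of a context-gated typo lookup. It does not use myspellchecker: `PARTICLE_TYPOS`, `TOY_VERBS`, and `find_particle_typos` are hypothetical names, and the tiny verb set stands in for a real POS lookup.

```python
# Toy particle-typo table mirroring the entry shown above.
PARTICLE_TYPOS = {
    "ဘူ": {"correction": "ဘူး", "meaning": "negative ending", "context": "after_verb"},
}
# Assumed stand-in for a POS lookup; a real checker would query a dictionary.
TOY_VERBS = {"သွား"}


def find_particle_typos(words):
    """Return (index, word, correction) for typos whose context constraint holds."""
    found = []
    for i, word in enumerate(words):
        entry = PARTICLE_TYPOS.get(word)
        if entry is None:
            continue
        # "after_verb" entries only fire when the previous token is a verb.
        if entry["context"] == "after_verb" and (i == 0 or words[i - 1] not in TOY_VERBS):
            continue
        found.append((i, word, entry["correction"]))
    return found


hits = find_particle_typos(["မ", "သွား", "ဘူ"])
```

Gating on context keeps the rule from firing on a token that merely looks like the typo but appears in a different position.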
Medial Confusion Detection
Detects common medial character confusions:
# Ya-pin (ျ) vs Ya-yit (ြ) confusion
# ကျောင်း (school) vs ကြောင်း (because)
words = ["သွား", "ကျောင်း"] # After verb "go"
corrections = checker.check_sequence(words)
# Suggests: "ကြောင်း" (because) after verb
Common Confusions:
| Confusion | Characters | Example |
|---|---|---|
| Ya-pin / Ya-yit | ျ / ြ | ကျောင်း / ကြောင်း |
| Wa-swe / Ha-htoe | ွ / ှ | နွေး (warm) / နှေး (slow) |
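Candidate corrections for these confusions can be generated by swapping the two medials in a pair. The following is a small sketch of that idea using Unicode code points (U+103B ya-pin, U+103C ya-yit, U+103D wa-swe, U+103E ha-htoe); `medial_variants` is a hypothetical helper, and the library may generate or rank candidates differently.

```python
# Confusable medial pairs from the table above, as Unicode code points:
# U+103B ya-pin, U+103C ya-yit, U+103D wa-swe, U+103E ha-htoe.
MEDIAL_PAIRS = [("\u103b", "\u103c"), ("\u103d", "\u103e")]


def medial_variants(word: str) -> list[str]:
    """Return one variant per confusable pair present in `word`,
    with both members of that pair swapped for each other."""
    variants = []
    for a, b in MEDIAL_PAIRS:
        if a in word or b in word:
            variants.append(word.translate({ord(a): b, ord(b): a}))
    return variants


# Ya-pin form of the first table example, written with escapes.
candidates = medial_variants("\u1000\u103b\u1031\u102c\u1004\u103a\u1038")
```

A checker would still need context (for example, the preceding POS tag) to decide whether a generated variant is more plausible than the original word.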
POS Sequence Validation
Validates POS tag sequences:
# Invalid: Two nouns without particle
words = ["ကျောင်း", "သား"] # school + son
corrections = checker.check_sequence(words)
# May suggest: "ကျောင်းသား" (student) as compound
# Valid: Noun + Subject marker + Verb
words = ["သူ", "က", "သွားတယ်"] # he + SUBJ + went
corrections = checker.check_sequence(words)
# Returns: [] (no errors)
Tag Sequence Rules:
| Sequence | Validity | Reason |
|---|---|---|
| N + V | Warning | Usually needs particle |
| V + V | Info | Flagged unless it is an auxiliary or serial verb construction (SVC) |
| P_SENT + P_SENT | Error | Double sentence particles |
| N + P_SUBJ | Valid | Subject marking |
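Rules of this kind amount to a lookup on adjacent tag pairs. The sketch below shows one possible shape for such a check; `Finding`, `BIGRAM_RULES`, and `check_tag_sequence` are hypothetical names, and treating unlisted pairs as valid is an assumption rather than documented behavior.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    index: int      # position of the second tag in the offending pair
    severity: str   # "error", "warning", or "info"
    reason: str


# Rules adapted from the table above; pairs not listed are treated as valid.
BIGRAM_RULES = {
    ("N", "V"): ("warning", "Usually needs particle"),
    ("V", "V"): ("info", "Serial verbs; fine for auxiliaries"),
    ("P_SENT", "P_SENT"): ("error", "Double sentence particles"),
}


def check_tag_sequence(tags):
    """Scan adjacent tag pairs and collect a Finding for each rule hit."""
    findings = []
    for i in range(1, len(tags)):
        rule = BIGRAM_RULES.get((tags[i - 1], tags[i]))
        if rule is not None:
            findings.append(Finding(i, *rule))
    return findings


findings = check_tag_sequence(["N", "V", "P_SENT", "P_SENT"])
```

Because each finding carries a severity, a caller can surface errors immediately and treat warnings or info entries as lower-priority hints.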
Verb-Particle Agreement
# Tense markers must follow verbs
words = ["သူ", "ခဲ့"] # he + PAST
corrections = checker.check_sequence(words)
# Flags: "ခဲ့" (past tense) should follow a verb
# Correct usage
words = ["သွား", "ခဲ့", "တယ်"]  # go + PAST + declarative
corrections = checker.check_sequence(words)
# Returns: [] (no errors)
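The constraint illustrated above (a tense marker must directly follow a verb) can be sketched without the library as a check on the previous token's tag. The lexicon, the `P_TENSE` and `PRON` tag names, and `misplaced_tense_markers` are all hypothetical and chosen for illustration only.

```python
# Toy lexicon; words are from the examples above, tags are assumed.
TOY_POS = {"သူ": "PRON", "သွား": "V", "ခဲ့": "P_TENSE", "တယ်": "P_SENT"}


def misplaced_tense_markers(words):
    """Return indices of tense markers that do not directly follow a verb."""
    bad = []
    for i, word in enumerate(words):
        if TOY_POS.get(word) != "P_TENSE":
            continue
        prev_tag = TOY_POS.get(words[i - 1]) if i > 0 else None
        if prev_tag != "V":
            bad.append(i)
    return bad


flagged = misplaced_tense_markers(["သူ", "ခဲ့"])
```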
Sentence Structure Validation
# Missing sentence-final particle
words = ["သူ", "သွား"] # he went
corrections = checker.check_sequence(words)
# Suggests: "သွားတယ်" (adding declarative)
# Question: question word plus question ending
words = ["ဘယ်", "သွား", "မလဲ"]
corrections = checker.check_sequence(words)
# Validates question word with question ending
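A very rough version of the completeness check is to test whether the last token ends in a known sentence-final ending. The sketch below assumes a small, non-exhaustive list of endings taken from examples on this page; `FINAL_ENDINGS` and `has_final_particle` are hypothetical, and a real checker would rely on POS tags rather than suffix matching.

```python
# Sentence endings that appear in examples on this page; not exhaustive.
FINAL_ENDINGS = ("တယ်", "ဘူး", "မယ်", "သည်", "မလဲ")


def has_final_particle(words):
    """True if the last token is, or ends with, a known sentence-final ending."""
    return bool(words) and words[-1].endswith(FINAL_ENDINGS)


complete = has_final_particle(["သူ", "သွားတယ်"])
incomplete = has_final_particle(["သူ", "သွား"])
```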
Specialized Checkers
AspectChecker
Validates aspect markers (completion, continuation):
from myspellchecker.grammar.checkers import AspectChecker
aspect_checker = AspectChecker()
errors = aspect_checker.validate_sequence(["သွား", "ပြိ"])
# Detects: "ပြိ" → "ပြီ" (completion marker typo)
Aspect Markers:
- ပြီ - Completion
- နေ - Continuation
- ခဲ့ - Past
- မယ် - Future
ClassifierChecker
Validates Myanmar numeral classifiers:
from myspellchecker.grammar.checkers import ClassifierChecker
classifier_checker = ClassifierChecker()
errors = classifier_checker.validate_sequence(["တစ်", "ယေက်"])
# Detects: "ယေက်" → "ယောက်" (person classifier)
Common Classifiers:
- ယောက် - People
- ကောင် - Animals
- လုံး - Round objects
- ခု - General objects
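For a closed set like this, a misspelled classifier can be matched to its nearest known form with a string-similarity search. The sketch below uses Python's standard `difflib` as a stand-in; it is not how ClassifierChecker is documented to work, and `suggest_classifier` and the 0.6 cutoff are assumptions.

```python
import difflib

# Classifiers from the list above.
CLASSIFIERS = ["ယောက်", "ကောင်", "လုံး", "ခု"]


def suggest_classifier(token, cutoff=0.6):
    """Return the closest known classifier, or None if the token is already
    valid or nothing is similar enough."""
    if token in CLASSIFIERS:
        return None
    matches = difflib.get_close_matches(token, CLASSIFIERS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


suggestion = suggest_classifier("ယေက်")
```

Similarity here is computed over code points, so a single missing vowel sign in a short word still scores well above the cutoff.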
CompoundChecker
Validates compound words and reduplications:
from myspellchecker.grammar.checkers import CompoundChecker
compound_checker = CompoundChecker()
errors = compound_checker.validate_sequence(["ပန်", "ခြံ"])
# Detects: Missing tone mark → "ပန်းခြံ" (garden)
NegationChecker
Validates negation patterns:
from myspellchecker.grammar.checkers import NegationChecker
negation_checker = NegationChecker()
errors = negation_checker.validate_sequence(["မ", "သွား", "ဘူ"])
# Detects: "ဘူ" → "ဘူး" (negative ending)
Negation Patterns:
- မ...ဘူး - Colloquial negative
- မ...ပါ - Polite negative
- မ... - Literary negative
RegisterChecker
Validates register consistency (formal vs colloquial):
from myspellchecker.grammar.checkers import RegisterChecker
register_checker = RegisterChecker()
errors = register_checker.validate_sequence(["သွားတယ်", "ပါသည်"])
# Warns: Mixed register (colloquial + formal)
Register Types:
- Colloquial: တယ်, ဘူး, မယ်
- Formal: သည်, ပါသည်, မည်, ပါမည်, ၏
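Mixed-register detection can be approximated by collecting which register each token's ending belongs to and warning when more than one is present. This is a simplified sketch using the marker lists above; `registers_used` and `is_mixed_register` are hypothetical helpers, and plain suffix matching may misfire on words that merely end in the same syllable.

```python
# Register markers from the lists above.
COLLOQUIAL = ("တယ်", "ဘူး", "မယ်")
FORMAL = ("သည်", "ပါသည်", "မည်", "ပါမည်", "၏")


def registers_used(words):
    """Return the set of registers whose endings appear in `words`."""
    used = set()
    for word in words:
        if word.endswith(COLLOQUIAL):
            used.add("colloquial")
        if word.endswith(FORMAL):
            used.add("formal")
    return used


def is_mixed_register(words):
    """True when both colloquial and formal endings occur in one sequence."""
    return len(registers_used(words)) > 1


mixed = is_mixed_register(["သွားတယ်", "ပါသည်"])
```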
Integration with SpellChecker
The Grammar Engine integrates automatically via validation strategies when rule-based validation is enabled:
from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig
from myspellchecker.providers import SQLiteProvider
# Enable grammar checking via use_rule_based_validation
config = SpellCheckerConfig(
    use_rule_based_validation=True,  # Enables grammar rules in validation
    use_context_checker=True,  # Context checking includes grammar strategies
)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)
result = checker.check("ကျွန်တော် ကျောင်း သွားတယ်")
# Grammar errors included in result.errors
Note: Grammar engine configuration (GrammarEngineConfig) is managed internally. For advanced customization, use SyntacticRuleChecker directly with a custom config path.
Custom Rules
YAML Configuration
# custom_rules.yml
particle_typos:
  "ဘူ":
    correction: "ဘူး"
    meaning: "negative ending"
    context: "after_verb"
medial_confusions:
  "ကျောင်း":
    correction: "ကြောင်း"
    context: "after_verb"
    meaning: "because"
invalid_sequences:
  - prev: "P_SENT"
    curr: "P_SENT"
    severity: "error"
    message: "Double sentence particles"
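Before handing a rules file to the checker, it can help to sanity-check its shape. The sketch below validates the same structure once it has been parsed into a Python dict (parsing is left out so the example has no YAML dependency). `validate_rules` is hypothetical, and the required keys are inferred from the example above rather than from a documented schema.

```python
# Keys inferred from the custom_rules.yml example; assumed, not a documented schema.
REQUIRED_KEYS = {
    "particle_typos": {"correction", "meaning", "context"},
    "medial_confusions": {"correction", "context", "meaning"},
}
SEQUENCE_KEYS = {"prev", "curr", "severity", "message"}


def validate_rules(rules):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for section, required in REQUIRED_KEYS.items():
        for word, entry in rules.get(section, {}).items():
            missing = required - set(entry)
            if missing:
                problems.append(f"{section}[{word!r}] missing {sorted(missing)}")
    for i, seq in enumerate(rules.get("invalid_sequences", [])):
        missing = SEQUENCE_KEYS - set(seq)
        if missing:
            problems.append(f"invalid_sequences[{i}] missing {sorted(missing)}")
    return problems


example = {
    "invalid_sequences": [
        {"prev": "P_SENT", "curr": "P_SENT", "severity": "error",
         "message": "Double sentence particles"},
    ],
}
problems = validate_rules(example)
```

Returning a list of problems instead of raising on the first one lets all issues in a rules file be reported in a single pass.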
Loading Custom Rules
checker = SyntacticRuleChecker(
    provider=provider,
    config_path="custom_rules.yml",
)
See Also