N-gram context checking catches many errors statistically, but some syntactic mistakes, such as conflicting tense markers, missing case particles, and wrong classifier order, require explicit grammatical rules. The Grammar Engine applies YAML-defined POS-sequence rules to flag these patterns.

Overview

from myspellchecker.grammar import SyntacticRuleChecker
from myspellchecker.providers import SQLiteProvider

provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SyntacticRuleChecker(provider)

# Check word sequence
corrections = checker.check_sequence(["ကျွန်တော်", "ကျောင်း", "သွားတယ်"])
for idx, error_word, suggestion, confidence in corrections:
    print(f"Position {idx}: '{error_word}' → '{suggestion}' ({confidence:.0%})")
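
The loop above unpacks each correction as a four-element tuple (index, error word, suggestion, confidence). As a minimal sketch, assuming that tuple shape holds, a caller might keep only the more confident corrections before showing them. The `filter_corrections` helper and the sample data are hypothetical and are not part of the library.

```python
from typing import List, Tuple

# Shape taken from the loop above: (index, error_word, suggestion, confidence).
Correction = Tuple[int, str, str, float]


def filter_corrections(
    corrections: List[Correction], min_confidence: float = 0.8
) -> List[Correction]:
    """Keep corrections at or above min_confidence, most confident first."""
    kept = [c for c in corrections if c[3] >= min_confidence]
    return sorted(kept, key=lambda c: c[3], reverse=True)


# Hand-written sample data for illustration; not real checker output.
sample = [
    (2, "ဘူ", "ဘူး", 0.95),
    (1, "ကျောင်း", "ကြောင်း", 0.60),
]
high_only = filter_corrections(sample, min_confidence=0.8)
for idx, error_word, suggestion, confidence in high_only:
    print(f"Position {idx}: '{error_word}' -> '{suggestion}' ({confidence:.0%})")
```

Filtering on the caller's side keeps the checker output untouched, so the same list can be reused with a different cut-off.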

Architecture

The Grammar Engine coordinates eight specialized checkers.

[Diagram: SyntacticRuleChecker Architecture]

Configuration

GrammarEngineConfig

from myspellchecker.core.config import GrammarEngineConfig

config = GrammarEngineConfig(
    # Confidence thresholds
    high_confidence=0.90,
    medium_confidence=0.85,
    default_confidence_threshold=0.80,
    low_confidence_threshold=0.55,

    # Feature-specific thresholds
    exact_match_confidence=0.95,
    context_confidence_threshold=0.65,
    pos_sequence_confidence=0.80,
    verb_particle_confidence=0.75,
    tense_marker_confidence=0.60,
    sentence_final_confidence=0.70,
    question_confidence=0.60,
)
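
How the engine applies these thresholds internally is not described here. As a rough sketch, assuming the four general thresholds act as descending cut-offs, a score could be bucketed into tiers as follows. The `confidence_tier` function and the tier names are hypothetical.

```python
# General thresholds copied from the GrammarEngineConfig example above.
# Treating them as ordered cut-offs is an assumption made for illustration.
THRESHOLDS = [
    ("high", 0.90),
    ("medium", 0.85),
    ("default", 0.80),
    ("low", 0.55),
]


def confidence_tier(score: float) -> str:
    """Return the first tier whose cut-off the score meets, else 'rejected'."""
    for name, cutoff in THRESHOLDS:
        if score >= cutoff:
            return name
    return "rejected"


for score in (0.92, 0.60, 0.30):
    print(score, confidence_tier(score))
```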

GrammarRuleConfig

Load custom grammar rules from YAML:
from myspellchecker.grammar.config import GrammarRuleConfig

config = GrammarRuleConfig(config_path="custom_rules.yml")

Grammar Rules

Rule Types

Rule Type | Description | Example
Particle Typos | Common particle mistakes | ဘူ → ဘူး
Medial Confusions | Ya-pin vs Ya-yit | ကျောင်း → ကြောင်း
POS Sequences | Invalid tag combinations | N-N without particle
Verb-Particle | Verb ending agreement | Missing tense marker
Sentence Structure | Sentence completeness | Missing final particle

Particle Typo Detection

# Common particle typos loaded from config
typo_info = config.get_particle_typo("ဘူ")
# Returns: {"correction": "ဘူး", "meaning": "negative ending", "context": "after_verb"}

# Check in context
corrections = checker.check_sequence(["မ", "သွား", "ဘူ"])
# Detects: "ဘူ" → "ဘူး" (missing visarga in negative ending)
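
The typo entry carries a `context` field, which suggests the correction applies only in certain positions. The sketch below shows one way such a context-gated lookup could work, using a plain dictionary and a toy POS lexicon. `find_particle_typos`, `TOY_POS`, and the `P_NEG` tag are hypothetical names used for illustration, not the library's implementation.

```python
from typing import Dict, List, Optional, Tuple

# Entry shape mirrors the dictionary returned by get_particle_typo above.
PARTICLE_TYPOS: Dict[str, Dict[str, str]] = {
    "ဘူ": {"correction": "ဘူး", "meaning": "negative ending", "context": "after_verb"},
}

# Toy POS lexicon for illustration only; the tags here are assumptions.
TOY_POS: Dict[str, str] = {"သွား": "V", "မ": "P_NEG"}


def find_particle_typos(words: List[str]) -> List[Tuple[int, str, str]]:
    """Return (index, typo, correction) for typos whose context requirement is met."""
    found = []
    for i, word in enumerate(words):
        entry = PARTICLE_TYPOS.get(word)
        if entry is None:
            continue
        prev_tag: Optional[str] = TOY_POS.get(words[i - 1]) if i > 0 else None
        if entry["context"] == "after_verb" and prev_tag != "V":
            continue  # context not satisfied, so do not flag
        found.append((i, word, entry["correction"]))
    return found


typos = find_particle_typos(["မ", "သွား", "ဘူ"])
print(typos)
```

Gating on context keeps the rule from firing when the same string appears in a position where it may be legitimate.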

Medial Confusion Detection

Detects common medial character confusions:
# Ya-pin (ျ) vs Ya-yit (ြ) confusion
# ကျောင်း (school) vs ကြောင်း (because)

words = ["သွား", "ကျောင်း"]  # After verb "go"
corrections = checker.check_sequence(words)
# Suggests: "ကြောင်း" (because) after verb
Common Confusions:
Confusion | Characters | Example
Ya-pin / Ya-yit | ျ / ြ | ကျောင်း / ကြောင်း
Wa-swe / Ha-htoe | ွ / ှ | နွေး (warm) / နှေး (slow)
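
Each confusion pair above differs by a single medial code point, so candidate corrections can be generated by swapping one medial at a time. The following is a minimal sketch of that idea; `medial_variants` is a hypothetical helper, and a real checker would still need to validate each candidate against the dictionary and the surrounding context.

```python
from typing import List

# Medial pairs from the table above, written as code points:
# U+103B (ya-pin) / U+103C (ya-yit), U+103D (wa-swe) / U+103E (ha-htoe).
MEDIAL_SWAPS = {
    "\u103b": "\u103c",
    "\u103c": "\u103b",
    "\u103d": "\u103e",
    "\u103e": "\u103d",
}


def medial_variants(word: str) -> List[str]:
    """Return every word formed by swapping exactly one confusable medial."""
    variants = []
    for i, ch in enumerate(word):
        if ch in MEDIAL_SWAPS:
            variants.append(word[:i] + MEDIAL_SWAPS[ch] + word[i + 1:])
    return variants


school = "\u1000\u103b\u1031\u102c\u1004\u103a\u1038"  # ကျောင်း
print(medial_variants(school))
```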

POS Sequence Validation

Validates POS tag sequences:
# Invalid: Two nouns without particle
words = ["ကျောင်း", "သား"]  # school + son
corrections = checker.check_sequence(words)
# May suggest: "ကျောင်းသား" (student) as compound

# Valid: Noun + Subject marker + Verb
words = ["သူ", "က", "သွားတယ်"]  # he + SUBJ + went
corrections = checker.check_sequence(words)
# Returns: [] (no errors)
Tag Sequence Rules:
Sequence | Validity | Reason
N + V | Warning | Usually needs particle
V + V | Info | Except auxiliaries (SVC)
P_SENT + P_SENT | Error | Double sentence particles
N + P_SUBJ | Valid | Subject marking
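
The table above amounts to a lookup keyed on adjacent tag pairs. As an illustrative sketch, assuming tags are already assigned, a scan over a tag sequence might look like this. `scan_tags` and `SEQUENCE_RULES` are hypothetical names, and the auxiliary exception for V + V is not modeled.

```python
from typing import Dict, List, Tuple

# (previous tag, current tag) -> (severity, reason), taken from the table above.
# Pairs not listed here, such as N + P_SUBJ, are treated as valid.
SEQUENCE_RULES: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("N", "V"): ("warning", "Usually needs particle"),
    ("V", "V"): ("info", "Except auxiliaries (SVC)"),
    ("P_SENT", "P_SENT"): ("error", "Double sentence particles"),
}


def scan_tags(tags: List[str]) -> List[Tuple[int, str, str]]:
    """Return (index of second tag, severity, reason) for each flagged adjacent pair."""
    findings = []
    for i in range(1, len(tags)):
        rule = SEQUENCE_RULES.get((tags[i - 1], tags[i]))
        if rule is not None:
            findings.append((i, rule[0], rule[1]))
    return findings


print(scan_tags(["N", "P_SUBJ", "V"]))
print(scan_tags(["N", "V", "P_SENT", "P_SENT"]))
```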

Verb-Particle Agreement

# Tense markers must follow verbs
words = ["သူ", "ခဲ့"]  # he + PAST
corrections = checker.check_sequence(words)
# Flags: "ခဲ့" (past tense) should follow a verb

# Correct usage
words = ["သွား", "ခဲ့", "တယ်"]  # go + PAST + declarative
corrections = checker.check_sequence(words)
# Returns: [] (no errors)

Sentence Structure Validation

# Missing sentence-final particle
words = ["သူ", "သွား"]  # he went
corrections = checker.check_sequence(words)
# Suggests: "သွားတယ်" (adding declarative)

# Question word paired with a question particle
words = ["ဘယ်", "သွား", "မလဲ"]
corrections = checker.check_sequence(words)
# Validates question word with question ending

Specialized Checkers

AspectChecker

Validates aspect markers (completion, continuation):
from myspellchecker.grammar.checkers import AspectChecker

aspect_checker = AspectChecker()
errors = aspect_checker.validate_sequence(["သွား", "ပြိ"])
# Detects: "ပြိ" → "ပြီ" (completion marker typo)
Aspect Markers:
  • ပြီ - Completion
  • နေ - Continuation
  • ခဲ့ - Past
  • မယ် - Future

ClassifierChecker

Validates Myanmar numeral classifiers:
from myspellchecker.grammar.checkers import ClassifierChecker

classifier_checker = ClassifierChecker()
errors = classifier_checker.validate_sequence(["တစ်", "ယေက်"])
# Detects: "ယေက်" → "ယောက်" (person classifier)
Common Classifiers:
  • ယောက် - People
  • ကောင် - Animals
  • လုံး - Round objects
  • ခု - General objects
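
How ClassifierChecker matches a misspelled classifier is not stated here. One simple approach, shown below purely as a sketch, is to compare the token against the known classifiers with the standard library's `difflib` similarity ratio. `suggest_classifier` is a hypothetical helper, and the 0.6 cut-off is `difflib`'s default rather than a value from the library.

```python
import difflib
from typing import Optional

# Classifiers and their categories, copied from the list above.
CLASSIFIERS = {
    "ယောက်": "people",
    "ကောင်": "animals",
    "လုံး": "round objects",
    "ခု": "general objects",
}


def suggest_classifier(token: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the closest known classifier, or None if the token is valid or nothing is close."""
    if token in CLASSIFIERS:
        return None  # already a valid classifier
    matches = difflib.get_close_matches(token, list(CLASSIFIERS), n=1, cutoff=cutoff)
    return matches[0] if matches else None


suggestion = suggest_classifier("ယေက်")
print(suggestion)
```

Character-level similarity treats every code point equally, so a production checker would likely weight vowel signs and medials differently.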

CompoundChecker

Validates compound words and reduplications:
from myspellchecker.grammar.checkers import CompoundChecker

compound_checker = CompoundChecker()
errors = compound_checker.validate_sequence(["ပန်", "ခြံ"])
# Detects: Missing tone mark → "ပန်းခြံ" (garden)

NegationChecker

Validates negation patterns:
from myspellchecker.grammar.checkers import NegationChecker

negation_checker = NegationChecker()
errors = negation_checker.validate_sequence(["မ", "သွား", "ဘူ"])
# Detects: "ဘူ" → "ဘူး" (negative ending)
Negation Patterns:
  • မ...ဘူး - Colloquial negative
  • မ...ပါ - Polite negative
  • မ... - Literary negative

RegisterChecker

Validates register consistency (formal vs colloquial):
from myspellchecker.grammar.checkers import RegisterChecker

register_checker = RegisterChecker()
errors = register_checker.validate_sequence(["သွားတယ်", "ပါသည်"])
# Warns: Mixed register (colloquial + formal)
Register Types:
  • Colloquial: တယ်, ဘူး, မယ်
  • Formal: သည်, ပါသည်, မည်, ပါမည်
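
The register lists above can be turned into a naive consistency check by classifying each token by its ending. The sketch below assumes plain suffix matching is good enough for illustration; `register_of` and `has_mixed_register` are hypothetical helpers, not the RegisterChecker implementation.

```python
from typing import List, Optional, Set

# Sentence endings grouped by register, copied from the list above.
COLLOQUIAL_ENDINGS = ("တယ်", "ဘူး", "မယ်")
FORMAL_ENDINGS = ("သည်", "ပါသည်", "မည်", "ပါမည်")


def register_of(token: str) -> Optional[str]:
    """Classify a token by its ending; None when no listed ending matches."""
    if token.endswith(FORMAL_ENDINGS):
        return "formal"
    if token.endswith(COLLOQUIAL_ENDINGS):
        return "colloquial"
    return None


def has_mixed_register(words: List[str]) -> bool:
    """True when the words contain both colloquial and formal endings."""
    seen: Set[str] = {r for r in map(register_of, words) if r is not None}
    return len(seen) > 1


mixed = has_mixed_register(["သွားတယ်", "ပါသည်"])
consistent = has_mixed_register(["သွား", "ပါသည်"])
print(mixed, consistent)
```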

Integration with SpellChecker

The Grammar Engine integrates automatically via validation strategies when rule-based validation is enabled:
from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig
from myspellchecker.providers import SQLiteProvider

# Enable grammar checking via use_rule_based_validation
config = SpellCheckerConfig(
    use_rule_based_validation=True,  # Enables grammar rules in validation
    use_context_checker=True,         # Context checking includes grammar strategies
)

provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)
result = checker.check("ကျွန်တော် ကျောင်း သွားတယ်")
# Grammar errors included in result.errors
Note: Grammar engine configuration (GrammarEngineConfig) is managed internally. For advanced customization, use SyntacticRuleChecker directly with a custom config path.

Custom Rules

YAML Configuration

# custom_rules.yml
particle_typos:
  "ဘူ":
    correction: "ဘူး"
    meaning: "negative ending"
    context: "after_verb"

medial_confusions:
  "ကျောင်း":
    correction: "ကြောင်း"
    context: "after_verb"
    meaning: "because"

invalid_sequences:
  - prev: "P_SENT"
    curr: "P_SENT"
    severity: "error"
    message: "Double sentence particles"
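
Before handing a rules file to the checker, it can help to sanity-check its structure. The sketch below validates an already-parsed rules dictionary with the same shape as the YAML above. Which fields the library actually requires is an assumption here, as is the set of allowed severities, and `validate_rules` is a hypothetical helper.

```python
from typing import Any, Dict, List

# "error" appears in the YAML above; "warning" and "info" are assumed
# from the Tag Sequence Rules table.
ALLOWED_SEVERITIES = {"error", "warning", "info"}


def validate_rules(rules: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the shape looks fine."""
    problems: List[str] = []
    for section in ("particle_typos", "medial_confusions"):
        for key, entry in rules.get(section, {}).items():
            if "correction" not in entry:
                problems.append(f"{section}[{key!r}] is missing 'correction'")
    for i, seq in enumerate(rules.get("invalid_sequences", [])):
        for field in ("prev", "curr", "severity"):
            if field not in seq:
                problems.append(f"invalid_sequences[{i}] is missing '{field}'")
        severity = seq.get("severity")
        if severity is not None and severity not in ALLOWED_SEVERITIES:
            problems.append(f"invalid_sequences[{i}] has unknown severity {severity!r}")
    return problems


# Parsed equivalent of part of custom_rules.yml, written as a dict
# so the example needs no YAML dependency.
rules = {
    "particle_typos": {
        "ဘူ": {"correction": "ဘူး", "meaning": "negative ending", "context": "after_verb"},
    },
    "invalid_sequences": [
        {"prev": "P_SENT", "curr": "P_SENT", "severity": "error",
         "message": "Double sentence particles"},
    ],
}
print(validate_rules(rules))
```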

Loading Custom Rules

checker = SyntacticRuleChecker(
    provider=provider,
    config_path="custom_rules.yml",
)

See Also