Grammar Engine - mySpellChecker

N-gram context checking catches many errors statistically, but some syntactic mistakes, such as conflicting tense markers, missing case particles, and wrong classifier order, require explicit grammatical rules. The Grammar Engine applies YAML-defined POS-sequence rules to flag these patterns.

Overview

from myspellchecker.grammar import SyntacticRuleChecker
from myspellchecker.providers import SQLiteProvider

provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SyntacticRuleChecker(provider)

# Check word sequence
corrections = checker.check_sequence(["ကျွန်တော်", "ကျောင်း", "သွားတယ်"])
for idx, error_word, suggestion, confidence in corrections:
    print(f"Position {idx}: '{error_word}' → '{suggestion}' ({confidence:.0%})")

Architecture

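The Grammar Engine coordinates eight specialized checkers. Five of them (AspectChecker, ClassifierChecker, CompoundChecker, NegationChecker, and RegisterChecker) are described under Specialized Checkers below.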
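This page does not say how the engine merges output from its checkers. One plausible approach is to run every checker and keep the highest-confidence suggestion at each position. The sketch below shows that idea under stated assumptions. The `Correction` tuple, the stub checkers, and `merge_corrections` are hypothetical and are not part of mySpellChecker.

```python
# Hypothetical sketch of a checker coordinator; not the mySpellChecker API.
from typing import Callable, Dict, List, NamedTuple, Sequence


class Correction(NamedTuple):
    """Assumed result shape, mirroring the (idx, word, suggestion, confidence) tuples above."""
    idx: int
    error_word: str
    suggestion: str
    confidence: float


# A checker is modelled as any callable from a word sequence to corrections.
Checker = Callable[[Sequence[str]], List[Correction]]


def merge_corrections(words: Sequence[str], checkers: Sequence[Checker]) -> List[Correction]:
    """Run every checker and keep the highest-confidence correction per position."""
    best: Dict[int, Correction] = {}
    for check in checkers:
        for corr in check(words):
            current = best.get(corr.idx)
            if current is None or corr.confidence > current.confidence:
                best[corr.idx] = corr
    # Return corrections in sentence order.
    return [best[i] for i in sorted(best)]


def negation_stub(words: Sequence[str]) -> List[Correction]:
    # Toy rule (hypothetical): flag a bare "ဘူ" and suggest "ဘူး".
    return [Correction(i, w, "ဘူး", 0.90) for i, w in enumerate(words) if w == "ဘူ"]


def particle_stub(words: Sequence[str]) -> List[Correction]:
    # A weaker, overlapping toy rule for the same token.
    return [Correction(i, w, "ဘူး", 0.75) for i, w in enumerate(words) if w == "ဘူ"]


if __name__ == "__main__":
    print(merge_corrections(["မ", "သွား", "ဘူ"], [particle_stub, negation_stub]))
```

Keeping one correction per position avoids showing the user duplicate suggestions when several rules fire on the same token.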

Configuration

GrammarEngineConfig

from myspellchecker.core.config import GrammarEngineConfig

config = GrammarEngineConfig(
    # Confidence thresholds
    high_confidence=0.90,
    medium_confidence=0.85,
    default_confidence_threshold=0.80,
    low_confidence_threshold=0.55,

    # Feature-specific thresholds
    exact_match_confidence=0.95,
    context_confidence_threshold=0.65,
    pos_sequence_confidence=0.80,
    verb_particle_confidence=0.75,
    tense_marker_confidence=0.60,
    sentence_final_confidence=0.70,
    question_confidence=0.60,
)
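This page does not document how these thresholds are applied internally. As an illustration only, the sketch below assumes that a correction's confidence is compared against the named thresholds in descending order. `ThresholdSketch` and `bucket` are hypothetical and reuse only the default values shown above.

```python
# Hypothetical sketch of threshold bucketing; the policy is assumed, not the library's.
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdSketch:
    """Stand-in that copies a subset of the GrammarEngineConfig defaults above."""
    high_confidence: float = 0.90
    medium_confidence: float = 0.85
    default_confidence_threshold: float = 0.80
    low_confidence_threshold: float = 0.55


def bucket(confidence: float, cfg: ThresholdSketch = ThresholdSketch()) -> str:
    """Map a confidence score to a coarse label."""
    if confidence >= cfg.high_confidence:
        return "high"
    if confidence >= cfg.medium_confidence:
        return "medium"
    if confidence >= cfg.default_confidence_threshold:
        return "default"
    if confidence >= cfg.low_confidence_threshold:
        return "low"
    return "suppressed"  # below every threshold: assume the correction is dropped


for score in (0.95, 0.87, 0.81, 0.60, 0.40):
    print(score, bucket(score))
```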

GrammarRuleConfig

Load custom grammar rules from YAML:

from myspellchecker.grammar.config import GrammarRuleConfig

config = GrammarRuleConfig(config_path="custom_rules.yml")

Grammar Rules

Rule Types

Rule Type	Description	Example
Particle Typos	Common particle mistakes	ဘူ → ဘူး
Medial Confusions	Ya-pin vs Ya-yit	ကျောင်း → ကြောင်း
POS Sequences	Invalid tag combinations	N-N without particle
Verb-Particle	Verb ending agreement	Missing tense marker
Sentence Structure	Sentence completeness	Missing final particle

Particle Typo Detection

# Common particle typos loaded from config
typo_info = config.get_particle_typo("ဘူ")
# Returns: {"correction": "ဘူး", "meaning": "negative ending", "context": "after_verb"}

# Check in context
corrections = checker.check_sequence(["မ", "သွား", "ဘူ"])
# Detects: "ဘူ" → "ဘူး" (missing visarga in negative ending)
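The `context: "after_verb"` field suggests that a typo entry fires only in a particular position. The sketch below illustrates that idea. It assumes a toy POS lexicon and a dictionary shaped like the `particle_typos` YAML section shown under Custom Rules. `PARTICLE_TYPOS`, `TOY_POS`, the `P_NEG` tag, and `find_particle_typos` are hypothetical.

```python
# Hypothetical sketch of context-aware particle typo lookup; not the library API.

# Data mirroring the particle_typos YAML entry.
PARTICLE_TYPOS = {
    "ဘူ": {"correction": "ဘူး", "meaning": "negative ending", "context": "after_verb"},
}

# Toy POS lexicon; a real checker would query the dictionary provider instead.
TOY_POS = {"မ": "P_NEG", "သွား": "V", "သူ": "N"}


def find_particle_typos(words):
    """Return (idx, word, suggestion) for typos whose context requirement is met."""
    hits = []
    for i, word in enumerate(words):
        entry = PARTICLE_TYPOS.get(word)
        if entry is None:
            continue
        prev_tag = TOY_POS.get(words[i - 1]) if i > 0 else None
        # Only fire "after_verb" rules when the previous token is tagged as a verb.
        if entry["context"] == "after_verb" and prev_tag != "V":
            continue
        hits.append((i, word, entry["correction"]))
    return hits


print(find_particle_typos(["မ", "သွား", "ဘူ"]))  # after a verb: flagged
print(find_particle_typos(["သူ", "ဘူ"]))          # after a noun: not flagged
```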

Medial Confusion Detection

Detects common medial character confusions:

# Ya-pin (ျ) vs Ya-yit (ြ) confusion
# ကျောင်း (school) vs ကြောင်း (because)

words = ["သွား", "ကျောင်း"]  # After verb "go"
corrections = checker.check_sequence(words)
# Suggests: "ကြောင်း" (because) after verb

Common Confusions:

Confusion	Characters	Example
Ya-pin / Ya-yit	ျ / ြ	ကျောင်း / ကြောင်း
Wa-swe / Ha-htoe	ွ / ှ	နွေး (warm) / နှေး (slow)
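Both confusions in the table are single-character swaps, so a checker can generate candidates mechanically and then filter them against a lexicon. The sketch below assumes that approach. `CONFUSABLE_MEDIALS`, `medial_variants`, and the toy `LEXICON` are hypothetical. A real checker would still need context, such as the preceding verb, to choose between two valid words.

```python
# Hypothetical sketch of medial-swap candidate generation; not the library API.

# Each medial maps to its confusable partner from the table above.
CONFUSABLE_MEDIALS = {
    "\u103b": "\u103c", "\u103c": "\u103b",  # ya-pin <-> ya-yit
    "\u103d": "\u103e", "\u103e": "\u103d",  # wa-swe <-> ha-htoe
}

# Toy lexicon standing in for the dictionary database.
LEXICON = {"ကျောင်း", "ကြောင်း", "နွေး", "နှေး"}


def medial_variants(word):
    """Yield every word obtained by swapping exactly one confusable medial."""
    for i, ch in enumerate(word):
        partner = CONFUSABLE_MEDIALS.get(ch)
        if partner is not None:
            yield word[:i] + partner + word[i + 1:]


def valid_medial_alternatives(word):
    """Keep only variants that are real words in the lexicon."""
    return [v for v in medial_variants(word) if v in LEXICON]


print(valid_medial_alternatives("ကျောင်း"))
print(valid_medial_alternatives("နွေး"))
```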

POS Sequence Validation

Validates POS tag sequences:

# Invalid: Two nouns without particle
words = ["ကျောင်း", "သား"]  # school + son
corrections = checker.check_sequence(words)
# May suggest: "ကျောင်းသား" (student) as compound

# Valid: Noun + Subject marker + Verb
words = ["သူ", "က", "သွားတယ်"]  # he + SUBJ + went
corrections = checker.check_sequence(words)
# Returns: [] (no errors)

Tag Sequence Rules:

Sequence	Validity	Reason
N + V	Warning	Usually needs particle
V + V	Info	Flagged unless the second verb is an auxiliary or part of a serial verb construction (SVC)
P_SENT + P_SENT	Error	Double sentence particles
N + P_SUBJ	Valid	Subject marking
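The tag sequence table can be read as a set of bigram rules over adjacent POS tags. The sketch below assumes that tags are already assigned and that rules are keyed by `(prev, curr)` pairs, like the `invalid_sequences` YAML entries under Custom Rules. `SEQUENCE_RULES` and `check_tag_bigrams` are hypothetical, and the sketch omits the auxiliary/SVC exception.

```python
# Hypothetical sketch of POS-bigram rule matching; not the library API.

# (previous tag, current tag) -> (severity, message); mirrors the table above.
SEQUENCE_RULES = {
    ("N", "V"): ("warning", "Usually needs particle"),
    ("V", "V"): ("info", "Check for auxiliary or serial verb construction"),
    ("P_SENT", "P_SENT"): ("error", "Double sentence particles"),
}


def check_tag_bigrams(tags):
    """Return (index, severity, message) for each flagged adjacent tag pair."""
    findings = []
    for i in range(1, len(tags)):
        rule = SEQUENCE_RULES.get((tags[i - 1], tags[i]))
        if rule is not None:
            severity, message = rule
            # Report the position of the second tag in the pair.
            findings.append((i, severity, message))
    return findings


print(check_tag_bigrams(["N", "P_SUBJ", "V", "P_SENT"]))  # no findings
print(check_tag_bigrams(["V", "P_SENT", "P_SENT"]))       # one error at index 2
```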

Verb-Particle Agreement

# Tense markers must follow verbs
words = ["သူ", "ခဲ့"]  # he + PAST
corrections = checker.check_sequence(words)
# Flags: "ခဲ့" (past tense) should follow a verb

# Correct usage
words = ["သွား", "ခဲ့", "တယ်"]  # go + PAST + declarative ("went")
corrections = checker.check_sequence(words)
# Returns: [] (no errors)
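The agreement rule above amounts to checking the left neighbour of every tense marker. The sketch below assumes a toy lexicon in which `ခဲ့` is tagged as a tense marker. It also lets a marker follow another marker, so that chains of markers after a verb still pass. The tag names and `check_marker_placement` are hypothetical.

```python
# Hypothetical sketch of tense-marker placement checking; not the library API.

# Toy tags; a real checker would get POS tags from the provider.
TOY_TAGS = {"သူ": "N", "သွား": "V", "ခဲ့": "TENSE", "တယ်": "P_SENT"}

# Tags allowed immediately before a tense marker.
ALLOWED_BEFORE_TENSE = {"V", "TENSE"}


def check_marker_placement(words):
    """Flag tense markers whose left neighbour is not a verb (or another marker)."""
    errors = []
    for i, word in enumerate(words):
        if TOY_TAGS.get(word) != "TENSE":
            continue
        prev_tag = TOY_TAGS.get(words[i - 1]) if i > 0 else None
        if prev_tag not in ALLOWED_BEFORE_TENSE:
            errors.append((i, word, "tense marker should follow a verb"))
    return errors


print(check_marker_placement(["သူ", "ခဲ့"]))          # flagged at index 1
print(check_marker_placement(["သွား", "ခဲ့", "တယ်"]))  # no errors
```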

Sentence Structure Validation

# Missing sentence-final particle
words = ["သူ", "သွား"]  # he + go (no sentence-final particle)
corrections = checker.check_sequence(words)
# Suggests: "သွားတယ်" (adding declarative)

# Question word paired with a question-final particle
words = ["ဘယ်", "သွား", "မလဲ"]  # where + go + question ending
corrections = checker.check_sequence(words)
# Validates that the question word ဘယ် is matched by the question ending မလဲ
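A completeness check of this kind can be approximated in two steps. First, test whether the last token ends with a known sentence-final particle. Second, for wh-questions, test whether a question word such as ဘယ် is matched by a လဲ ending. The particle inventories and `check_sentence_end` below are assumptions for illustration, not the library's rule set.

```python
# Hypothetical sketch of sentence-ending validation; not the library API.

# Assumed, deliberately small particle inventories.
FINAL_PARTICLES = ("တယ်", "မယ်", "ဘူး", "သည်", "မည်")
QUESTION_WORDS = {"ဘယ်"}
WH_QUESTION_ENDINGS = ("လဲ",)


def check_sentence_end(words):
    """Return a list of human-readable problems with the sentence ending."""
    if not words:
        return []
    last = words[-1]
    problems = []
    if any(w in QUESTION_WORDS for w in words):
        # A wh-question word should be matched by a wh-question ending.
        if not last.endswith(WH_QUESTION_ENDINGS):
            problems.append("question word without a question ending")
    elif not last.endswith(FINAL_PARTICLES):
        problems.append("missing sentence-final particle")
    return problems


print(check_sentence_end(["သူ", "သွား"]))          # missing final particle
print(check_sentence_end(["သူ", "သွားတယ်"]))       # no problems
print(check_sentence_end(["ဘယ်", "သွား", "မလဲ"]))  # no problems
```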

Specialized Checkers

AspectChecker

Validates aspect markers (completion, continuation):

from myspellchecker.grammar.checkers import AspectChecker

aspect_checker = AspectChecker()
errors = aspect_checker.validate_sequence(["သွား", "ပြိ"])
# Detects: "ပြိ" → "ပြီ" (completion marker typo)

Aspect Markers:

ပြီ - Completion
နေ - Continuation
ခဲ့ - Past
မယ် - Future
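Single-character typos such as `ပြိ` for `ပြီ` can be caught by fuzzy-matching a token against this small, closed set of markers. The sketch below uses Python's standard `difflib.get_close_matches` as a stand-in similarity measure. The library's AspectChecker may use a different technique, and `suggest_aspect_marker` is hypothetical.

```python
# Hypothetical sketch of aspect-marker typo suggestion using difflib.
import difflib

# The aspect markers listed above.
ASPECT_MARKERS = ["ပြီ", "နေ", "ခဲ့", "မယ်"]


def suggest_aspect_marker(token, cutoff=0.6):
    """Return the closest aspect marker for a non-marker token, or None."""
    if token in ASPECT_MARKERS:
        return None  # already a valid marker
    matches = difflib.get_close_matches(token, ASPECT_MARKERS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest_aspect_marker("ပြိ"))
print(suggest_aspect_marker("ပြီ"))
```

Because the marker set is tiny, a character-level similarity ratio is cheap. Short tokens, however, produce coarse ratios, so the cutoff would need tuning on real data.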

ClassifierChecker

Validates Myanmar numeral classifiers:

from myspellchecker.grammar.checkers import ClassifierChecker

classifier_checker = ClassifierChecker()
errors = classifier_checker.validate_sequence(["တစ်", "ယေက်"])
# Detects: "ယေက်" → "ယောက်" (person classifier)

Common Classifiers:

ယောက် - People
ကောင် - Animals
လုံး - Round objects
ခု - General objects
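In Myanmar, a counted noun is followed by the numeral and then the classifier (for example လူ သုံး ယောက်, "three people"). The sketch below shows a minimal order check. It assumes a toy numeral set plus the classifiers listed above, and it flags numerals with no classifier after them and classifiers that appear before a numeral. `NUMERALS`, `CLASSIFIERS`, and `check_classifier_order` are hypothetical.

```python
# Hypothetical sketch of numeral + classifier order checking; not the library API.

# Toy inventories; the classifier list mirrors the one above.
NUMERALS = {"တစ်", "နှစ်", "သုံး"}
CLASSIFIERS = {"ယောက်", "ကောင်", "လုံး", "ခု"}


def check_classifier_order(words):
    """Flag numerals lacking a following classifier, and classifiers placed before a numeral."""
    issues = []
    for i, word in enumerate(words):
        nxt = words[i + 1] if i + 1 < len(words) else None
        if word in NUMERALS and nxt not in CLASSIFIERS:
            issues.append((i, "numeral without a following classifier"))
        if word in CLASSIFIERS and nxt in NUMERALS:
            issues.append((i, "classifier placed before its numeral"))
    return issues


print(check_classifier_order(["လူ", "သုံး", "ယောက်"]))  # no issues
print(check_classifier_order(["လူ", "ယောက်", "သုံး"]))  # two issues
```

A misspelled classifier such as ယေက် also surfaces here as "numeral without a following classifier". A fuzzy match against `CLASSIFIERS` could then propose ယောက်.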

CompoundChecker

Validates compound words and reduplications:

from myspellchecker.grammar.checkers import CompoundChecker

compound_checker = CompoundChecker()
errors = compound_checker.validate_sequence(["ပန်", "ခြံ"])
# Detects: Missing tone mark → "ပန်းခြံ" (garden)
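A split compound with a dropped visarga can be recovered by joining adjacent tokens, optionally adding `း` to the first part, and accepting the result only if it is a known compound. The sketch below works under that assumption. The `COMPOUNDS` lexicon and the `repair_compound` helper are hypothetical.

```python
# Hypothetical sketch of compound repair with a visarga fix; not the library API.

VISARGA = "\u1038"  # း

# Toy compound lexicon standing in for the dictionary.
COMPOUNDS = {"ပန်းခြံ", "ကျောင်းသား"}


def repair_compound(first, second):
    """Return a known compound formed from two tokens, trying a visarga repair, else None."""
    candidates = [first + second]
    if not first.endswith(VISARGA):
        # Try restoring a dropped visarga at the end of the first part.
        candidates.append(first + VISARGA + second)
    for cand in candidates:
        if cand in COMPOUNDS:
            return cand
    return None


print(repair_compound("ပန်", "ခြံ"))
print(repair_compound("ကျောင်း", "သား"))
```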

NegationChecker

Validates negation patterns:

from myspellchecker.grammar.checkers import NegationChecker

negation_checker = NegationChecker()
errors = negation_checker.validate_sequence(["မ", "သွား", "ဘူ"])
# Detects: "ဘူ" → "ဘူး" (negative ending)

Negation Patterns:

မ...ဘူး - Colloquial negative
မ...ပါ - Polite negative
မ... - Literary negative
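A negation check can look for the `မ` prefix and then inspect how the clause ends. The sketch below assumes whitespace-separated tokens as in the examples above. It accepts `ဘူး` and `ပါ` as endings, following the pattern list, and reports a likely typo when the ending is `ဘူ`. A bare literary negative with no ending is left unflagged. `NEGATIVE_ENDINGS`, `ENDING_TYPOS`, and `check_negation` are hypothetical.

```python
# Hypothetical sketch of negation-pattern checking; not the library API.

# Endings accepted after a "မ + verb" clause, per the patterns above.
NEGATIVE_ENDINGS = {"ဘူး", "ပါ"}
# Known misspellings of those endings (assumed).
ENDING_TYPOS = {"ဘူ": "ဘူး"}


def check_negation(words):
    """Return (idx, word, suggestion) when a negated clause ends with a known typo."""
    if len(words) < 2 or words[0] != "မ":
        return []  # not a "မ ..." negation in this simplified model
    last = words[-1]
    if last not in NEGATIVE_ENDINGS and last in ENDING_TYPOS:
        return [(len(words) - 1, last, ENDING_TYPOS[last])]
    return []  # valid ending, or bare literary negative


print(check_negation(["မ", "သွား", "ဘူ"]))   # typo at index 2
print(check_negation(["မ", "သွား", "ဘူး"]))  # no errors
```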

RegisterChecker

Validates register consistency (formal vs colloquial):

from myspellchecker.grammar.checkers import RegisterChecker

register_checker = RegisterChecker()
errors = register_checker.validate_sequence(["သွားတယ်", "ပါသည်"])
# Warns: Mixed register (colloquial + formal)

Register Types:

Colloquial: တယ်, ဘူး, မယ်
Formal: သည်, ပါသည်, မည်, ပါမည်, ၏
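Both register lists above consist of endings, so a simple register check can classify each token by suffix and warn when one sentence uses both registers. The suffix tuples, `token_register`, and `detect_register_mix` below are assumptions for illustration. Real text would need care with words that merely happen to end in these syllables.

```python
# Hypothetical sketch of suffix-based register detection; not the library API.

COLLOQUIAL = ("တယ်", "ဘူး", "မယ်")
FORMAL = ("ပါသည်", "သည်", "ပါမည်", "မည်", "၏")


def token_register(token):
    """Classify a token as 'formal', 'colloquial', or None by its ending."""
    if token.endswith(FORMAL):
        return "formal"
    if token.endswith(COLLOQUIAL):
        return "colloquial"
    return None


def detect_register_mix(words):
    """Return the set of registers found if more than one is present, else an empty set."""
    found = {r for r in map(token_register, words) if r is not None}
    return found if len(found) > 1 else set()


print(detect_register_mix(["သွားတယ်", "ပါသည်"]))  # both registers found
print(detect_register_mix(["သွားပါသည်"]))         # consistent: empty set
```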

Integration with SpellChecker

The Grammar Engine integrates automatically via validation strategies when rule-based validation is enabled:

from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig
from myspellchecker.providers import SQLiteProvider

# Enable grammar checking via use_rule_based_validation
config = SpellCheckerConfig(
    use_rule_based_validation=True,  # Enables grammar rules in validation
    use_context_checker=True,         # Context checking includes grammar strategies
)

provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)
result = checker.check("ကျွန်တော် ကျောင်း သွားတယ်")
# Grammar errors included in result.errors

Note: Grammar engine configuration (GrammarEngineConfig) is managed internally. For advanced customization, use SyntacticRuleChecker directly with a custom config path.

Custom Rules

YAML Configuration

# custom_rules.yml
particle_typos:
  "ဘူ":
    correction: "ဘူး"
    meaning: "negative ending"
    context: "after_verb"

medial_confusions:
  "ကျောင်း":
    correction: "ကြောင်း"
    context: "after_verb"
    meaning: "because"

invalid_sequences:
  - prev: "P_SENT"
    curr: "P_SENT"
    severity: "error"
    message: "Double sentence particles"

Loading Custom Rules

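from myspellchecker.grammar import SyntacticRuleChecker
from myspellchecker.providers import SQLiteProvider

provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SyntacticRuleChecker(
    provider=provider,
    config_path="custom_rules.yml",
)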
