Overview - mySpellChecker

Grammar checking validates syntactic correctness using POS tags and rule-based analysis. It operates at Layer 2.5 in the validation pipeline, sitting between word validation and context checking. It catches errors where every word is spelled correctly but the sentence structure is wrong.

Why Grammar Checking?

Myanmar text can have:

Correct spelling but wrong particles: “သူ ကို” vs “သူ က”
Verb-modifier mismatches: Wrong causative or passive markers
Sentence structure errors: Missing required components

Grammar checking catches errors that:

Pass syllable validation (valid syllables)
Pass word validation (valid words)
Fail syntactic rules

How It Works

POS Tagging

Text is tagged with part-of-speech labels:

text = "သူ ကျောင်း သွား သည်"
# Tags: [PRON, N, V, P_SENT]

Rule Application

Grammar rules check tag sequences:

# Rule: V should be followed by P_SENT at sentence end
pattern = r"V P_SENT$"
sequence = "N N V P_SENT"  # Valid

# Rule: Subject particle should follow N
pattern = r"N P_SUBJ"
sequence = "V P_SUBJ"  # Invalid - verb can't have subject particle
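
The two rules above can be tried out with Python's re module. This is a minimal sketch, not the library's implementation: the helper names ends_with_sentence_particle and misplaced_subject_particles are hypothetical, tag sequences are assumed to be plain lists of tag strings, and the allowed_before parameter is an assumption added so the rule can be widened (for example to pronouns) without changing the code.

```python
import re

# Hypothetical helpers for illustration; not part of myspellchecker.
# The (?:^| ) prefix keeps "ADV P_SENT" from matching as "V P_SENT".
SENTENCE_END = re.compile(r"(?:^| )V P_SENT$")


def ends_with_sentence_particle(tags: list[str]) -> bool:
    """Return True if the tag sequence ends with V followed by P_SENT."""
    return SENTENCE_END.search(" ".join(tags)) is not None


def misplaced_subject_particles(tags: list[str], allowed_before=("N",)) -> list[int]:
    """Return indices of P_SUBJ tags not directly preceded by an allowed tag."""
    return [
        i for i, tag in enumerate(tags)
        if tag == "P_SUBJ" and (i == 0 or tags[i - 1] not in allowed_before)
    ]


print(ends_with_sentence_particle(["N", "N", "V", "P_SENT"]))  # valid ending
print(misplaced_subject_particles(["V", "P_SUBJ"]))            # verb + subject particle
print(misplaced_subject_particles(["N", "P_SUBJ", "V", "P_SENT"]))
```

Returning indices rather than a boolean keeps enough information to attach an error to the offending particle.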

Error Generation

Invalid sequences generate grammar errors:

# Error: "V + P_SUBJ" is invalid
# Suggestion: Change P_SUBJ to appropriate particle

Grammar Rules

Particle Rules

Subject Particle (က)

# Valid: Noun (or pronoun) + Subject particle
"သူ က" → Valid

# Invalid: Verb + Subject particle
"သွား က" → Invalid

Object Particle (ကို)

# Valid: Noun + Object particle
"စာအုပ် ကို" → Valid

# Invalid: Adjective + Object particle
"လှ ကို" → Invalid

Location Particles (မှာ, တွင်)

# Valid: Noun + Location particle
"ကျောင်း မှာ" → Valid
"မြို့ တွင်" → Valid

Verb Modifier Rules

Causative Construction

# Valid: V + causative marker
"စား စေ" → Valid (cause to eat)

# Pattern check
if followed_by(V, CAUS) and not is_compatible(V, CAUS):
    error("Verb cannot take causative marker")

Passive Construction

# Valid: V + passive marker (ခံ = undergo/receive)
"ရိုက် ခံ" → Valid (was hit)
"ဆူ ခံ" → Valid (was scolded)

Sentence Structure Rules

# Rule 1: Sentence must end with P_SENT, PUNCT, or P_Q
if not ends_with(sentence, [P_SENT, PUNCT, P_Q]):
    warning("Sentence may be incomplete")

# Rule 3: Missing subject marker after initial noun (3+ word sentences only)
# 2-word "Noun Verb" is valid minimal SOV — rule only fires for longer sentences
if len(words) >= 3 and is_noun(words[0]) and is_verb(words[1]):
    suggest(f"{words[0]}က")  # Suggest adding subject marker

# Rule 4: Question sentences should end with P_Q
if has_question_word(sentence) and not ends_with(sentence, P_Q):
    warning("Question should end with question particle")
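
The pseudocode above relies on helpers such as ends_with and has_question_word that are not defined in this page. The following is a self-contained sketch of the same three rules over (word, tag) pairs. The function name check_structure, the message strings, and the question_words parameter are assumptions made for illustration, and the single question word used in the usage lines is only an example, not the library's word list.

```python
# Hypothetical sketch; tag names follow this page, the helper does not
# come from myspellchecker.
SENTENCE_ENDERS = {"P_SENT", "PUNCT", "P_Q"}


def check_structure(tokens, question_words=frozenset()):
    """Apply the three structure rules to a list of (word, tag) pairs."""
    messages = []
    if not tokens:
        return messages
    words = [word for word, _ in tokens]
    tags = [tag for _, tag in tokens]

    # Rule 1: the sentence should end with a sentence-final tag.
    if tags[-1] not in SENTENCE_ENDERS:
        messages.append("Sentence may be incomplete")

    # Rule 3: initial noun directly followed by a verb, 3+ tokens only.
    if len(tokens) >= 3 and tags[0] == "N" and tags[1] == "V":
        messages.append(f"Consider adding subject marker: {words[0]}က")

    # Rule 4: a question word without a final question particle.
    if any(word in question_words for word in words) and tags[-1] != "P_Q":
        messages.append("Question should end with question particle")

    return messages


tokens = [("သူ", "PRON"), ("ကျောင်း", "N"), ("သွား", "V"), ("သည်", "P_SENT")]
print(check_structure(tokens))
print(check_structure([("ဘာ", "PRON"), ("စား", "V"), ("သည်", "P_SENT")],
                      question_words={"ဘာ"}))
```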

Configuration

Enable Grammar Checking

from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig, POSTaggerConfig
from myspellchecker.providers import SQLiteProvider

config = SpellCheckerConfig(
    use_rule_based_validation=True,
    pos_tagger=POSTaggerConfig(tagger_type="viterbi"),
)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)

Grammar Rule Configuration

Grammar checking is enabled through SpellCheckerConfig; the grammar engine is then initialized automatically:

from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig
from myspellchecker.providers import SQLiteProvider

# Configure grammar checking via SpellCheckerConfig
config = SpellCheckerConfig(
    use_rule_based_validation=True,
    # Grammar engine is automatically initialized
)

provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)

For advanced use, construct a SyntacticRuleChecker directly and pass it a GrammarEngineConfig (provider is the SQLiteProvider created above):

from myspellchecker.grammar.engine import SyntacticRuleChecker
from myspellchecker.core.config.grammar_configs import GrammarEngineConfig

# Create a custom grammar checker with confidence thresholds
grammar_config = GrammarEngineConfig(
    default_confidence_threshold=0.80,
    high_confidence=0.90,
    medium_confidence=0.85,
)

rule_checker = SyntacticRuleChecker(provider, grammar_config=grammar_config)

Built-in Grammar Checkers

The grammar system includes several specialized checkers:

Checker	Description
`AspectChecker`	Validates verb aspect markers
`ClassifierChecker`	Validates numeral classifiers
`CompoundChecker`	Validates compound word patterns
`MergedWordChecker`	Detects incorrectly merged particle+verb sequences
`NegationChecker`	Validates negation patterns
`RegisterChecker`	Detects formal/informal register mixing

YAML Rules Configuration

Grammar rules are defined in YAML files located in src/myspellchecker/rules/:

File	Purpose
`grammar_rules.yaml`	Core syntactic grammar rules
`typo_corrections.yaml`	Common typo patterns and corrections
`particles.yaml`	Particle definitions with POS tags
`classifiers.yaml`	Numeral classifier rules
`register.yaml`	Formal/informal register rules
`compounds.yaml`	Compound word patterns
`aspects.yaml`	Verb aspect rules
`negation.yaml`	Negation pattern rules
`pronouns.yaml`	Pronoun definitions
`homophones.yaml`	Homophone confusion pairs
`pos_inference.yaml`	POS inference patterns
`ambiguous_words.yaml`	Ambiguous word disambiguation
`tone_rules.yaml`	Tone mark validation rules
`morphology.yaml`	Morphological patterns
`morphotactics.yaml`	Morphotactic constraints

Load custom rules via GrammarRuleConfig:

from myspellchecker.grammar.config import GrammarRuleConfig

# Load custom rules from your own YAML files
config = GrammarRuleConfig(
    config_path="/path/to/custom_grammar_rules.yaml",
    typo_path="/path/to/custom_typo_corrections.yaml",
    particles_path="/path/to/particles.yaml",
    pronouns_path="/path/to/pronouns.yaml",
    classifiers_path="/path/to/classifiers.yaml",
    register_path="/path/to/register.yaml",
    homophones_path="/path/to/homophones.yaml",
    compounds_path="/path/to/compounds.yaml",
    aspects_path="/path/to/aspects.yaml",
    pos_inference_path="/path/to/pos_inference.yaml",
    ambiguous_words_path="/path/to/ambiguous_words.yaml",
    tone_rules_path="/path/to/tone_rules.yaml",
    negation_path="/path/to/negation.yaml",
    morphology_path="/path/to/morphology.yaml",
)

GrammarRuleConfig does not have a morphotactics_path parameter. The morphotactics.yaml file is loaded internally by the grammar engine.

All paths are optional. When omitted, the built-in YAML files from src/myspellchecker/rules/ are used.

Error Types and Severity

Severity	Description	Example
`error`	Definite grammatical error	မသွားတယ် (“negation + affirmative ending”) → မသွားဘူး
`warning`	Likely error, may be valid	ပြီနေ (“completed before progressive”), a contradictory aspect sequence
`info`	Style suggestion	ငါသွားပါသည် (“colloquial pronoun + formal ending”), which is register mixing

Error Response

result = checker.check("သွား က")  # Verb with subject particle

for error in result.errors:
    if error.error_type == "grammar_error":
        print(f"Type: {error.error_type}")
        print(f"Text: {error.text}")
        print(f"Suggestions: {error.suggestions}")
        print(f"Confidence: {error.confidence}")

API Reference

SyntacticRuleChecker

from myspellchecker.grammar.engine import SyntacticRuleChecker
from myspellchecker.core.config.grammar_configs import GrammarEngineConfig

# Create checker with provider (required) and optional config
grammar_config = GrammarEngineConfig()
checker = SyntacticRuleChecker(provider, grammar_config=grammar_config)

# Check word sequence for grammar errors (POS tags are looked up internally)
words = ["သူ", "ကျောင်း", "သွား", "သည်"]

corrections = checker.check_sequence(words)

for idx, error_word, suggestion in corrections:
    print(f"Position {idx}: '{error_word}' → '{suggestion}'")

Individual Grammar Checkers

from myspellchecker.grammar.checkers import AspectChecker

# Each checker returns its own specific error type:
#   AspectChecker.validate_sequence(words) -> list[AspectError]
#   ClassifierChecker.validate_sequence(words) -> list[ClassifierError]
#   CompoundChecker.validate_sequence(words) -> list[CompoundError]
#   MergedWordChecker.validate_sequence(words) -> list[MergedWordError]
#   NegationChecker.validate_sequence(words) -> list[NegationError]
#   RegisterChecker.validate_sequence(words) -> list[RegisterError]

# Pre-segmented words to validate
words = ["သူ", "ကျောင်း", "သွား", "သည်"]

aspect_checker = AspectChecker()
errors = aspect_checker.validate_sequence(words)

Common Patterns

Filter by Confidence

from myspellchecker import SpellChecker

def check_with_confidence(checker: SpellChecker, text: str, min_confidence: float = 0.70) -> list:
    """Check grammar, filtering by confidence threshold."""
    result = checker.check(text)

    return [
        e for e in result.errors
        if e.error_type == "grammar_error"
        and e.confidence >= min_confidence
    ]

Grammar-Only Check

from myspellchecker import SpellChecker

def check_grammar_only(text: str, checker: SpellChecker) -> list:
    """Check only grammar, skip spelling."""
    # Grammar checking is integrated - filter grammar errors from result
    result = checker.check(text)
    return [e for e in result.errors if e.error_type == "grammar_error"]

Report Grammar Issues

from myspellchecker import SpellChecker

def generate_grammar_report(checker: SpellChecker, text: str) -> dict:
    """Generate detailed grammar report."""
    result = checker.check(text)

    grammar_errors = [e for e in result.errors if e.error_type == "grammar_error"]

    return {
        "total_errors": len(grammar_errors),
        "by_confidence": {
            "high": len([e for e in grammar_errors if e.confidence >= 0.90]),
            "medium": len([e for e in grammar_errors if 0.70 <= e.confidence < 0.90]),
            "low": len([e for e in grammar_errors if e.confidence < 0.70]),
        },
        "details": [
            {
                "text": e.text,
                "error_type": e.error_type,
                "suggestions": e.suggestions,
                "confidence": e.confidence,
                "position": e.position,
            }
            for e in grammar_errors
        ],
    }

Built-in Rules

POS Sequence Rules

Errors

Pattern	Description	Confidence
P_SENT P_SENT	Double sentence ending particles	0.98
P_PAST P_FUT	Conflicting tense markers	0.98
V P_NEG	Negation prefix after verb (wrong order; should be မ + V)	0.95
P_POSS P_SUBJ	Possessive + subject adjacent (noun missing between them)	0.95
P_POSS P_OBJ	Possessive + object adjacent (noun missing between them)	0.95

Warnings

Pattern	Description	Confidence
P_SUBJ P_OBJ	Subject + object markers adjacent	0.90
P_OBJ P_SUBJ	Object + subject markers adjacent	0.90
P_LOC P_LOC	Multiple location particles	0.85
P V	Particle directly precedes verb	0.85
PPM V	Postpositional marker before verb	0.75
NUM N	Number before noun without classifier	0.75
NUM V	Number before verb without classifier	0.75
N V	Noun + verb without case particle	0.75
ADJ V	Adjective before verb without noun	0.65

Info

Pattern	Description	Confidence
V V	Consecutive verbs (may be serial verb construction)	0.50
P P	Consecutive particles (check compatibility)	0.50
PPM PPM	Consecutive postpositional markers	0.50
PART PART	Consecutive particles	0.50
N N	Consecutive nouns (may be compound noun)	0.40
PART V	Particle before verb (may be auxiliary)	0.40
PPM N	Postpositional marker before noun (may be embedded clause)	0.40
N INT	Noun + interjection (vocative/exclamation)	0.40
INT INT	Consecutive interjections (emphatic)	0.30
INT P	Interjection + particle	0.20
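
One way to picture how these tables are applied is a lookup keyed on adjacent tag pairs. The sketch below uses a handful of the rows above. The SequenceRule type, the RULES dictionary, and the match_rules function are hypothetical names chosen for illustration; the actual rules live in grammar_rules.yaml and may be structured differently.

```python
from typing import NamedTuple


class SequenceRule(NamedTuple):
    severity: str
    description: str
    confidence: float


# A subset of the documented tag-pair rules; the data structure is an assumption.
RULES = {
    ("P_SENT", "P_SENT"): SequenceRule("error", "Double sentence ending particles", 0.98),
    ("V", "P_NEG"): SequenceRule("error", "Negation prefix after verb", 0.95),
    ("P_LOC", "P_LOC"): SequenceRule("warning", "Multiple location particles", 0.85),
    ("N", "V"): SequenceRule("warning", "Noun + verb without case particle", 0.75),
    ("N", "N"): SequenceRule("info", "Consecutive nouns", 0.40),
}


def match_rules(tags, min_confidence=0.0):
    """Return (index, rule) for each adjacent tag pair that matches a rule."""
    hits = []
    for i in range(len(tags) - 1):
        rule = RULES.get((tags[i], tags[i + 1]))
        if rule is not None and rule.confidence >= min_confidence:
            hits.append((i, rule))
    return hits


for index, rule in match_rules(["N", "N", "V", "P_SENT", "P_SENT"]):
    print(index, rule.severity, rule.description, rule.confidence)
```

Passing min_confidence=0.80 to the same call keeps only the double sentence-ending match, which is the effect a higher confidence threshold has on reported results.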

Sentence Boundary Rules

Forbidden Sentence Starts

Particles	Type	Severity	Confidence
`ကို`, `ရဲ့`, `အတွက်`	Object / possessive / benefactive	error	0.90
`မှ`, `ကနေ`, `လို့`, `ဆိုလို့`	Source / causative	error	0.85
`လည်း`, `တော့`, `ပဲ`	Conjunctive	warning	0.70

Forbidden Sentence Ends

Particles	Type	Severity	Confidence
`က`, `ကို`, `နှင့်`, `နဲ့`	Case markers	error	0.90
`ရဲ့`, `၏`, `မှ`, `ကနေ`	Possessive / source	error	0.90
`အတွက်`, `အလို့ငှာ`	Benefactive	warning	0.75
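
The boundary tables can be read as two lookups, one for the first word of a sentence and one for the last. The sketch below copies the particle lists from the tables above. The function check_boundaries, the _table helper, and the tuple layout of the results are assumptions for illustration and are not the library's API.

```python
# Particle lists copied from the tables above; the lookup structure and
# function names are assumptions for illustration.
def _table(*groups):
    """Flatten (particles, severity, confidence) groups into one dict."""
    return {p: (sev, conf) for particles, sev, conf in groups for p in particles}


FORBIDDEN_STARTS = _table(
    (("ကို", "ရဲ့", "အတွက်"), "error", 0.90),
    (("မှ", "ကနေ", "လို့", "ဆိုလို့"), "error", 0.85),
    (("လည်း", "တော့", "ပဲ"), "warning", 0.70),
)
FORBIDDEN_ENDS = _table(
    (("က", "ကို", "နှင့်", "နဲ့"), "error", 0.90),
    (("ရဲ့", "၏", "မှ", "ကနေ"), "error", 0.90),
    (("အတွက်", "အလို့ငှာ"), "warning", 0.75),
)


def check_boundaries(words):
    """Return (position, word, severity, confidence) for boundary violations."""
    findings = []
    if not words:
        return findings
    if words[0] in FORBIDDEN_STARTS:
        findings.append(("start", words[0], *FORBIDDEN_STARTS[words[0]]))
    if words[-1] in FORBIDDEN_ENDS:
        findings.append(("end", words[-1], *FORBIDDEN_ENDS[words[-1]]))
    return findings


print(check_boundaries(["ကို", "စာအုပ်"]))   # sentence starts with an object particle
print(check_boundaries(["သူ", "က"]))         # sentence ends with a case marker
```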

Sentence Completion Rules

Pattern	Required Ending	Severity	Confidence
V$ (verb at sentence end)	`တယ်`, `ပါတယ်`, `သည်`, `ပါသည်`, `မယ်`, `ပါမယ်`, `ပြီ`	warning	0.80
N V (noun before verb)	`က`, `ကို`, `မှာ`, `မှ`, `သည်` (case particle between them)	warning	0.75

Architecture

The Grammar Engine coordinates eight specialized checkers through SyntacticRuleChecker. Each checker handles a specific grammar domain. See Grammar Checkers for details on each one.

Confidence Scoring

Grammar suggestions include confidence scores based on context:

Factor	Weight	Description
Exact match	0.95	Exact pattern match
Verb context	0.90	After verb validation
Noun context	0.85	After noun validation
Default	0.80	No specific context
Context dependent	0.65	Ambiguous context

Confidence scores are included in the GrammarError objects returned by check():

result = checker.check("ကျောင်း သွား က")
for error in result.errors:
    if error.error_type == "grammar_error":
        print(f"Confidence: {error.confidence}")  # e.g. 0.95
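
To make the weight table concrete, the sketch below maps each context factor to its documented weight and compares it against a reporting threshold. The dictionary keys, the function names, and the fallback to the default weight for unknown contexts are assumptions for illustration; the 0.80 default threshold mirrors the default_confidence_threshold value shown earlier on this page.

```python
# Weights taken from the table above; key names are hypothetical.
CONTEXT_CONFIDENCE = {
    "exact_match": 0.95,
    "verb_context": 0.90,
    "noun_context": 0.85,
    "default": 0.80,
    "context_dependent": 0.65,
}


def confidence_for(context: str) -> float:
    """Look up the weight for a context, falling back to the default weight."""
    return CONTEXT_CONFIDENCE.get(context, CONTEXT_CONFIDENCE["default"])


def passes_threshold(context: str, threshold: float = 0.80) -> bool:
    """Return True if a suggestion in this context would meet the threshold."""
    return confidence_for(context) >= threshold


for name in CONTEXT_CONFIDENCE:
    print(name, confidence_for(name), passes_threshold(name))
```

With the 0.80 threshold, only the context-dependent case falls below the cut; raising the threshold to 0.85 would also drop suggestions made with the default weight.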

Troubleshooting

Issue: Too many false positives

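Cause: POS tagging errors or overly strict rules
Solution: Use a more accurate tagger:

from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig, POSTaggerConfig
from myspellchecker.providers import SQLiteProvider

# Use more accurate tagger
config = SpellCheckerConfig(
    pos_tagger=POSTaggerConfig(tagger_type="transformer")
)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)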

Issue: Missing grammar errors

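Cause: Grammar checkers not enabled
Solution: Enable grammar checking in config:

from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig
from myspellchecker.providers import SQLiteProvider

config = SpellCheckerConfig(
    use_rule_based_validation=True,
)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)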

Issue: Slow grammar checking

Cause: Complex rules or many patterns
Solution: Raise confidence thresholds to reduce processing via GrammarEngineConfig:

from myspellchecker.core.config.grammar_configs import GrammarEngineConfig

# Higher thresholds mean fewer (but higher-confidence) grammar checks
grammar_config = GrammarEngineConfig(
    default_confidence_threshold=0.85,  # Only report high-confidence errors
)

Next Steps

POS Tagging - Underlying tagging system
Semantic Checking - AI-powered deep checking
Custom Rules - Create your own rules
