Grammar checking validates syntactic correctness using POS tags and rule-based analysis. It operates at Layer 2.5 in the validation pipeline, sitting between word validation and context checking. It catches errors where every word is spelled correctly but the sentence structure is wrong.

Why Grammar Checking?

Myanmar text can have:
  • Correct spelling but wrong particles: “သူ ကို” vs “သူ က”
  • Verb-modifier mismatches: Wrong causative or passive markers
  • Sentence structure errors: Missing required components
Grammar checking targets errors that:
  • Pass syllable validation (valid syllables)
  • Pass word validation (valid words)
  • Fail syntactic rules

How It Works

1. POS Tagging

Text is tagged with part-of-speech labels:
text = "သူ ကျောင်း သွား သည်"
# Tags: [PRON, N, V, P_SENT]
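The tagging step can be pictured as mapping each space-separated word to a tag. The sketch below uses a hypothetical four-word lexicon (`TOY_LEXICON`) and a hypothetical `toy_tag` helper. The library's own taggers, such as the viterbi and transformer options, are model-based and do not use a fixed lookup table.

```python
# Hypothetical toy lexicon for illustration only; the library's taggers
# (e.g. the viterbi and transformer options) are not a fixed lookup table.
TOY_LEXICON = {
    "သူ": "PRON",
    "ကျောင်း": "N",
    "သွား": "V",
    "သည်": "P_SENT",
}


def toy_tag(text: str) -> list[str]:
    """Split on whitespace and look up each word; unknown words are tagged 'UNK'."""
    return [TOY_LEXICON.get(word, "UNK") for word in text.split()]


print(toy_tag("သူ ကျောင်း သွား သည်"))  # ['PRON', 'N', 'V', 'P_SENT']
```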

2. Rule Application

Grammar rules check tag sequences:
# Rule: V should be followed by P_SENT at sentence end
pattern = r"V P_SENT$"
sequence = "N N V P_SENT"  # Valid

# Rule: Subject particle should follow N
pattern = r"N P_SUBJ"
sequence = "V P_SUBJ"  # Invalid - verb can't have subject particle
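The two rule examples above can be made runnable with Python's `re` module by joining the tags with spaces and searching for each pattern. This is a sketch that assumes rules are plain regexes over a tag string. `RULES` and `apply_rules` are hypothetical names, and the library's real rule format (grammar_rules.yaml) may differ. The `(?:^| )` prefix anchors matches on tag boundaries so that a longer tag ending in `V` is not misread as `V`.

```python
import re

# Each rule: (regex over the space-joined tag string, verdict, note).
# "(?:^| )" anchors on a tag boundary so e.g. "ADV P_SUBJ" is not read as "V P_SUBJ".
RULES = [
    (re.compile(r"(?:^| )V P_SENT$"), "ok", "verb followed by sentence-final particle"),
    (re.compile(r"(?:^| )V P_SUBJ(?= |$)"), "error", "verb cannot take a subject particle"),
]


def apply_rules(tags: list[str]) -> list[tuple[str, str]]:
    """Return (verdict, note) for every rule whose pattern occurs in the tag sequence."""
    sequence = " ".join(tags)
    return [(verdict, note) for pattern, verdict, note in RULES if pattern.search(sequence)]


print(apply_rules(["N", "N", "V", "P_SENT"]))  # [('ok', 'verb followed by sentence-final particle')]
print(apply_rules(["V", "P_SUBJ"]))            # [('error', 'verb cannot take a subject particle')]
```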

3. Error Generation

Invalid sequences generate grammar errors:
# Error: "V + P_SUBJ" is invalid
# Suggestion: Change P_SUBJ to appropriate particle
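As a rough sketch of error generation, each invalid `V + P_SUBJ` pair can be turned into a record that carries its position and the offending word. `GrammarIssue` and `flag_verb_subject_particle` are hypothetical names, and their fields are illustrative only. The library returns its own error objects. Choosing the replacement particle is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class GrammarIssue:
    """Hypothetical error record; field names are illustrative only."""
    position: int
    text: str
    message: str
    suggestions: list[str] = field(default_factory=list)


def flag_verb_subject_particle(words: list[str], tags: list[str]) -> list[GrammarIssue]:
    """Emit one issue for every P_SUBJ that directly follows a V."""
    issues = []
    for i in range(1, len(tags)):
        if tags[i - 1] == "V" and tags[i] == "P_SUBJ":
            # Picking the replacement particle is out of scope for this sketch.
            issues.append(GrammarIssue(i, words[i], "Verb cannot take subject particle"))
    return issues


print(flag_verb_subject_particle(["သွား", "က"], ["V", "P_SUBJ"]))
```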

Grammar Rules

Particle Rules

Subject Particle (က)

# Valid: Noun + Subject particle
"သူ က" → Valid

# Invalid: Verb + Subject particle
"သွား က" → Invalid

Object Particle (ကို)

# Valid: Noun + Object particle
"စာအုပ် ကို" → Valid

# Invalid: Adjective + Object particle
"လှ ကို" → Invalid

Location Particles (မှာ, တွင်)

# Valid: Noun + Location particle
"ကျောင်း မှာ" → Valid
"မြို့ တွင်" → Valid

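The particle examples above reduce to a lookup that maps each particle to the tags allowed directly before it. The sketch builds that table only from the listed examples. `ALLOWED_BEFORE` and `particle_ok` are hypothetical. `PRON` is included because သူ was tagged `PRON` earlier and takes က, and extending `PRON` to the other particles is an assumption.

```python
# Hypothetical table of which preceding tags each particle accepts, built only
# from the examples above. PRON is included because သူ (tagged PRON earlier)
# takes က; extending PRON to the other particles is an assumption.
ALLOWED_BEFORE = {
    "က": {"N", "PRON"},     # subject
    "ကို": {"N", "PRON"},   # object
    "မှာ": {"N", "PRON"},   # location
    "တွင်": {"N", "PRON"},  # location
}


def particle_ok(prev_tag: str, particle: str) -> bool:
    """True if `particle` may follow `prev_tag`; particles not in the table pass."""
    allowed = ALLOWED_BEFORE.get(particle)
    return allowed is None or prev_tag in allowed


print(particle_ok("PRON", "က"))   # True  (သူ က)
print(particle_ok("V", "က"))      # False (သွား က)
print(particle_ok("ADJ", "ကို"))  # False (လှ ကို)
```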
Verb Modifier Rules

Causative Construction

# Valid: V + causative marker
"စား စေ" → Valid (cause to eat)

# Pattern check
if followed_by(V, CAUS) and not is_compatible(V, CAUS):
    error("Verb cannot take causative marker")

Passive Construction

# Valid: V + passive marker (ခံ = undergo/receive)
"ရိုက် ခံ" → Valid (was hit)
"ဆူ ခံ" → Valid (was scolded)

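The `followed_by` / `is_compatible` pseudocode above can be made runnable by scanning for a verb followed by a modifier marker. In this version, compatibility is an explicit set that the caller supplies. `MODIFIER_MARKERS`, `check_verb_modifiers`, and the `incompatible` parameter are hypothetical. Labelling စေ as `CAUS` and ခံ as `PASS` follows the examples above. The sketch ships no real incompatibility data, so with the defaults every pair passes.

```python
# Marker words from the examples above; the CAUS/PASS labels are assumed.
MODIFIER_MARKERS = {"စေ": "CAUS", "ခံ": "PASS"}


def check_verb_modifiers(words, tags, incompatible=frozenset()):
    """Flag V + marker pairs listed in `incompatible`.

    `incompatible` is a hypothetical caller-supplied set of
    (verb, marker_label) pairs; no real linguistic data is included here.
    """
    errors = []
    for i in range(1, len(words)):
        label = MODIFIER_MARKERS.get(words[i])
        if label and tags[i - 1] == "V" and (words[i - 1], label) in incompatible:
            errors.append((i, f"Verb cannot take {label} marker"))
    return errors


print(check_verb_modifiers(["စား", "စေ"], ["V", "CAUS"]))    # []
print(check_verb_modifiers(["ရိုက်", "ခံ"], ["V", "PASS"]))  # []
```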
Sentence Structure Rules

# Rule 1: Sentence must end with P_SENT, P_Q, or PUNCT
if not ends_with(sentence, [P_SENT, PUNCT, P_Q]):
    warning("Sentence may be incomplete")

# Rule 3: Missing subject marker after initial noun (3+ word sentences only)
# 2-word "Noun Verb" is valid minimal SOV; the rule only fires for longer sentences
if len(words) >= 3 and is_noun(words[0]) and is_verb(words[1]):
    suggest(f"{words[0]}က")  # Suggest adding subject marker

# Rule 4: Question sentences should end with P_Q
if has_question_word(sentence) and not ends_with(sentence, [P_Q]):
    warning("Question should end with question particle")
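Rules 1, 3, and 4 can be run directly over parallel word and tag lists. In this sketch, `check_structure` and `SENTENCE_ENDINGS` are hypothetical names. `question_words` is a caller-supplied set because this sketch includes no list of Myanmar question words. Findings are returned as plain `(kind, message)` tuples rather than the library's error objects.

```python
# Runnable version of Rules 1, 3 and 4 over parallel word/tag lists.
SENTENCE_ENDINGS = {"P_SENT", "PUNCT", "P_Q"}


def check_structure(words, tags, question_words=frozenset()):
    """Return (kind, message) findings; question_words is a hypothetical input."""
    findings = []
    # Rule 1: sentence must end with P_SENT, P_Q or PUNCT
    if not tags or tags[-1] not in SENTENCE_ENDINGS:
        findings.append(("warning", "Sentence may be incomplete"))
    # Rule 3: initial Noun + Verb in a 3+ word sentence -> suggest adding က
    if len(words) >= 3 and tags[0] == "N" and tags[1] == "V":
        findings.append(("suggestion", f"{words[0]}က"))
    # Rule 4: a sentence containing a question word should end with P_Q
    if any(w in question_words for w in words) and (not tags or tags[-1] != "P_Q"):
        findings.append(("warning", "Question should end with question particle"))
    return findings


print(check_structure(["သူ", "ကျောင်း", "သွား", "သည်"], ["PRON", "N", "V", "P_SENT"]))  # []
print(check_structure(["ကျောင်း", "သွား", "သည်"], ["N", "V", "P_SENT"]))
```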

Configuration

Enable Grammar Checking

from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig, POSTaggerConfig
from myspellchecker.providers import SQLiteProvider

config = SpellCheckerConfig(
    use_rule_based_validation=True,
    pos_tagger=POSTaggerConfig(tagger_type="viterbi"),
)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)

Grammar Rule Configuration

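Grammar checking is enabled through SpellCheckerConfig, which initializes the grammar engine automatically. Confidence thresholds are tuned separately with GrammarEngineConfig: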
from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig
from myspellchecker.providers import SQLiteProvider

# Configure grammar checking via SpellCheckerConfig
config = SpellCheckerConfig(
    use_rule_based_validation=True,
    # Grammar engine is automatically initialized
)

provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)
For advanced use, access the internal SyntacticRuleChecker:
from myspellchecker.grammar.engine import SyntacticRuleChecker
from myspellchecker.core.config.grammar_configs import GrammarEngineConfig

# Create a custom grammar checker with confidence thresholds
grammar_config = GrammarEngineConfig(
    default_confidence_threshold=0.80,
    high_confidence=0.90,
    medium_confidence=0.85,
)

rule_checker = SyntacticRuleChecker(provider, grammar_config=grammar_config)

Built-in Grammar Checkers

The grammar system includes several specialized checkers:
Checker | Description
AspectChecker | Validates verb aspect markers
ClassifierChecker | Validates numeral classifiers
CompoundChecker | Validates compound word patterns
MergedWordChecker | Detects incorrectly merged particle+verb sequences
NegationChecker | Validates negation patterns
RegisterChecker | Detects formal/informal register mixing

YAML Rules Configuration

Grammar rules are defined in YAML files located in src/myspellchecker/rules/:
File | Purpose
grammar_rules.yaml | Core syntactic grammar rules
typo_corrections.yaml | Common typo patterns and corrections
particles.yaml | Particle definitions with POS tags
classifiers.yaml | Numeral classifier rules
register.yaml | Formal/informal register rules
compounds.yaml | Compound word patterns
aspects.yaml | Verb aspect rules
negation.yaml | Negation pattern rules
pronouns.yaml | Pronoun definitions
homophones.yaml | Homophone confusion pairs
pos_inference.yaml | POS inference patterns
ambiguous_words.yaml | Ambiguous word disambiguation
tone_rules.yaml | Tone mark validation rules
morphology.yaml | Morphological patterns
morphotactics.yaml | Morphotactic constraints
Load custom rules via GrammarRuleConfig:
from myspellchecker.grammar.config import GrammarRuleConfig

# Load custom rules from your own YAML files
config = GrammarRuleConfig(
    config_path="/path/to/custom_grammar_rules.yaml",
    typo_path="/path/to/custom_typo_corrections.yaml",
    particles_path="/path/to/particles.yaml",
    pronouns_path="/path/to/pronouns.yaml",
    classifiers_path="/path/to/classifiers.yaml",
    register_path="/path/to/register.yaml",
    homophones_path="/path/to/homophones.yaml",
    compounds_path="/path/to/compounds.yaml",
    aspects_path="/path/to/aspects.yaml",
    pos_inference_path="/path/to/pos_inference.yaml",
    ambiguous_words_path="/path/to/ambiguous_words.yaml",
    tone_rules_path="/path/to/tone_rules.yaml",
    negation_path="/path/to/negation.yaml",
    morphology_path="/path/to/morphology.yaml",
)
GrammarRuleConfig does not have a morphotactics_path parameter. The morphotactics.yaml file is loaded internally by the grammar engine.
All paths are optional. When omitted, the built-in YAML files from src/myspellchecker/rules/ are used.
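The "override if given, otherwise use the built-in file" behavior can be pictured as a simple lookup. This is a hypothetical sketch. `resolve_rule_paths`, `RULE_FILES`, and the stem-keyed `overrides` dict are not part of the library, whose parameters use names such as `particles_path`. Resolving the built-in directory relative to the repository root is also an assumption. Only a subset of the rule files is listed, for brevity.

```python
from pathlib import Path
from typing import Dict, Optional

# Built-in location named above; resolving it relative to the repo root is assumed.
DEFAULT_RULES_DIR = Path("src/myspellchecker/rules")

# A subset of the rule files listed above, keyed by file stem.
RULE_FILES = ["grammar_rules", "typo_corrections", "particles", "classifiers", "negation"]


def resolve_rule_paths(overrides: Optional[Dict[str, str]] = None,
                       rules_dir: Path = DEFAULT_RULES_DIR) -> Dict[str, Path]:
    """Use an override path when one is given, otherwise the built-in YAML file."""
    overrides = overrides or {}
    return {
        stem: Path(overrides[stem]) if stem in overrides else rules_dir / f"{stem}.yaml"
        for stem in RULE_FILES
    }


paths = resolve_rule_paths({"particles": "/path/to/particles.yaml"})
print(paths["particles"])
print(paths["negation"])
```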

Error Types and Severity

Severity | Description | Example
error | Definite grammatical error | မသွားတယ် (“negation + affirmative ending”) → မသွားဘူး
warning | Likely error, may be valid | ပြီနေ (“completed before progressive”), a contradictory aspect sequence
info | Style suggestion | ငါသွားပါသည် (“colloquial pronoun + formal ending”), which is register mixing
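Because the three severities are ordered, one common use is to keep only findings at or above a chosen level. The sketch assumes each finding is a plain dict with a `"severity"` key. Whether the library's error objects expose severity in this form is not stated here, and `at_least` and `SEVERITY_RANK` are hypothetical names.

```python
# Ordering of the three severities from the table above.
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2}


def at_least(findings, minimum="warning"):
    """Keep findings whose severity is at or above `minimum`.

    Each finding is a hypothetical dict with a "severity" key.
    """
    floor = SEVERITY_RANK[minimum]
    return [f for f in findings if SEVERITY_RANK[f["severity"]] >= floor]


findings = [
    {"severity": "error", "text": "မသွားတယ်"},
    {"severity": "warning", "text": "ပြီနေ"},
    {"severity": "info", "text": "ငါသွားပါသည်"},
]
print([f["severity"] for f in at_least(findings)])           # ['error', 'warning']
print([f["severity"] for f in at_least(findings, "error")])  # ['error']
```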

Error Response

result = checker.check("သွား က")  # Verb with subject particle

for error in result.errors:
    if error.error_type == "grammar_error":
        print(f"Type: {error.error_type}")
        print(f"Text: {error.text}")
        print(f"Suggestions: {error.suggestions}")
        print(f"Confidence: {error.confidence}")

API Reference

SyntacticRuleChecker

from myspellchecker.grammar.engine import SyntacticRuleChecker
from myspellchecker.core.config.grammar_configs import GrammarEngineConfig
from myspellchecker.providers import SQLiteProvider

# Create checker with provider (required) and optional config
provider = SQLiteProvider(database_path="path/to/dictionary.db")
grammar_config = GrammarEngineConfig()
checker = SyntacticRuleChecker(provider, grammar_config=grammar_config)

# Check word sequence for grammar errors (POS tags are looked up internally)
words = ["သူ", "ကျောင်း", "သွား", "သည်"]

corrections = checker.check_sequence(words)

for idx, error_word, suggestion in corrections:
    print(f"Position {idx}: '{error_word}' → '{suggestion}'")
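The `(idx, error_word, suggestion)` tuples can be applied back to the word list. This sketch assumes `idx` indexes into the same `words` list passed to `check_sequence`, and `apply_corrections` is a hypothetical helper. It skips a tuple when the word at `idx` no longer matches `error_word`, which guards against stale or shifted positions.

```python
def apply_corrections(words, corrections):
    """Replace words using (idx, error_word, suggestion) tuples.

    Assumes idx indexes into `words`; a tuple is skipped when the word at
    idx does not match error_word (stale or shifted position).
    """
    fixed = list(words)  # copy so the caller's list is left untouched
    for idx, error_word, suggestion in corrections:
        if 0 <= idx < len(fixed) and fixed[idx] == error_word:
            fixed[idx] = suggestion
    return fixed


print(apply_corrections(["a", "b", "c"], [(1, "b", "B"), (2, "x", "X")]))  # ['a', 'B', 'c']
```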

Individual Grammar Checkers

from myspellchecker.grammar.checkers import AspectChecker

# Each checker returns its own specific error type:
#   AspectChecker.validate_sequence(words) -> list[AspectError]
#   ClassifierChecker.validate_sequence(words) -> list[ClassifierError]
#   CompoundChecker.validate_sequence(words) -> list[CompoundError]
#   MergedWordChecker.validate_sequence(words) -> list[MergedWordError]
#   NegationChecker.validate_sequence(words) -> list[NegationError]
#   RegisterChecker.validate_sequence(words) -> list[RegisterError]

aspect_checker = AspectChecker()
errors = aspect_checker.validate_sequence(words)

Common Patterns

Filter by Confidence

def check_with_confidence(checker: SpellChecker, text: str, min_confidence: float = 0.70) -> list:
    """Check grammar, filtering by confidence threshold."""
    result = checker.check(text)

    return [
        e for e in result.errors
        if e.error_type == "grammar_error"
        and e.confidence >= min_confidence
    ]

Grammar-Only Check

def check_grammar_only(checker: SpellChecker, text: str) -> list:
    """Return only grammar errors; spelling checks still run but are filtered out."""
    # Grammar checking is integrated into check(), so filter its results by type
    result = checker.check(text)
    return [e for e in result.errors if e.error_type == "grammar_error"]

Report Grammar Issues

def generate_grammar_report(checker: SpellChecker, text: str) -> dict:
    """Generate detailed grammar report."""
    result = checker.check(text)

    grammar_errors = [e for e in result.errors if e.error_type == "grammar_error"]

    return {
        "total_errors": len(grammar_errors),
        "by_confidence": {
            "high": len([e for e in grammar_errors if e.confidence >= 0.90]),
            "medium": len([e for e in grammar_errors if 0.70 <= e.confidence < 0.90]),
            "low": len([e for e in grammar_errors if e.confidence < 0.70]),
        },
        "details": [
            {
                "text": e.text,
                "error_type": e.error_type,
                "suggestions": e.suggestions,
                "confidence": e.confidence,
                "position": e.position,
            }
            for e in grammar_errors
        ],
    }

Built-in Rules

POS Sequence Rules

Errors

Pattern | Description | Confidence
P_SENT P_SENT | Double sentence ending particles | 0.98
P_PAST P_FUT | Conflicting tense markers | 0.98
V P_NEG | Negation prefix after verb (wrong order; should be မ + V) | 0.95
P_POSS P_SUBJ | Possessive + subject adjacent (noun missing between them) | 0.95
P_POSS P_OBJ | Possessive + object adjacent (noun missing between them) | 0.95

Warnings

Pattern | Description | Confidence
P_SUBJ P_OBJ | Subject + object markers adjacent | 0.90
P_OBJ P_SUBJ | Object + subject markers adjacent | 0.90
P_LOC P_LOC | Multiple location particles | 0.85
P V | Particle directly precedes verb | 0.85
PPM V | Postpositional marker before verb | 0.75
NUM N | Number before noun without classifier | 0.75
NUM V | Number before verb without classifier | 0.75
N V | Noun + verb without case particle | 0.75
ADJ V | Adjective before verb without noun | 0.65

Info

Pattern | Description | Confidence
V V | Consecutive verbs (may be serial verb construction) | 0.50
P P | Consecutive particles (check compatibility) | 0.50
PPM PPM | Consecutive postpositional markers | 0.50
PART PART | Consecutive particles | 0.50
N N | Consecutive nouns (may be compound noun) | 0.40
PART V | Particle before verb (may be auxiliary) | 0.40
PPM N | Postpositional marker before noun (may be embedded clause) | 0.40
N INT | Noun + interjection (vocative/exclamation) | 0.40
INT INT | Consecutive interjections (emphatic) | 0.30
INT P | Interjection + particle | 0.20
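The POS sequence tables are effectively a lookup keyed on adjacent tag pairs. The sketch copies a few rows from the tables into a dict and scans every adjacent pair, with an optional confidence floor. `POS_BIGRAM_RULES` and `scan_bigrams` are hypothetical names, and the engine's actual matching logic may differ.

```python
# A few rows from the tables above: (left tag, right tag) -> (severity, confidence).
POS_BIGRAM_RULES = {
    ("P_SENT", "P_SENT"): ("error", 0.98),
    ("P_PAST", "P_FUT"): ("error", 0.98),
    ("V", "P_NEG"): ("error", 0.95),
    ("P_SUBJ", "P_OBJ"): ("warning", 0.90),
    ("NUM", "N"): ("warning", 0.75),
    ("N", "V"): ("warning", 0.75),
    ("V", "V"): ("info", 0.50),
    ("N", "N"): ("info", 0.40),
}


def scan_bigrams(tags, min_confidence=0.0):
    """Return (index, severity, confidence) for each adjacent tag pair with a rule."""
    hits = []
    for i in range(len(tags) - 1):
        rule = POS_BIGRAM_RULES.get((tags[i], tags[i + 1]))
        if rule is not None and rule[1] >= min_confidence:
            hits.append((i, *rule))
    return hits


print(scan_bigrams(["PRON", "N", "V", "P_SENT"]))          # [(1, 'warning', 0.75)]
print(scan_bigrams(["N", "V", "V"], min_confidence=0.60))  # [(0, 'warning', 0.75)]
```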

Sentence Boundary Rules

Forbidden Sentence Starts

Particles | Type | Severity | Confidence
ကို, ရဲ့, အတွက် | Object / possessive / benefactive | error | 0.90
မှ, ကနေ, လို့, ဆိုလို့ | Source / causative | error | 0.85
လည်း, တော့, ပဲ | Conjunctive | warning | 0.70

Forbidden Sentence Ends

Particles | Type | Severity | Confidence
က, ကို, နှင့်, နဲ့ | Case markers | error | 0.90
ရဲ့, , မှ, ကနေ | Possessive / source | error | 0.90
အတွက်, အလို့ငှာ | Benefactive | warning | 0.75

Sentence Completion Rules

Pattern | Required Ending | Severity | Confidence
V$ (verb at sentence end) | တယ်, ပါတယ်, သည်, ပါသည်, မယ်, ပါမယ်, ပြီ | warning | 0.80
N V (noun before verb) | က, ကို, မှာ, မှ, သည် (case particle between them) | warning | 0.75
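The boundary tables and the `V$` completion rule only inspect the first and last tokens, so they can be sketched as set lookups. The particles are copied from the tables above. The blank entry in the second "Forbidden Sentence Ends" row is omitted, because its value is not recoverable. `check_boundaries` and `_rows` are hypothetical names. The `N V` completion rule is not covered here, since it is a pairwise check.

```python
def _rows(severity, confidence, particles):
    """Expand one table row into {particle: (severity, confidence)}."""
    return {p: (severity, confidence) for p in particles}


FORBIDDEN_START = {
    **_rows("error", 0.90, ["ကို", "ရဲ့", "အတွက်"]),
    **_rows("error", 0.85, ["မှ", "ကနေ", "လို့", "ဆိုလို့"]),
    **_rows("warning", 0.70, ["လည်း", "တော့", "ပဲ"]),
}
FORBIDDEN_END = {
    **_rows("error", 0.90, ["က", "ကို", "နှင့်", "နဲ့", "ရဲ့", "မှ", "ကနေ"]),
    **_rows("warning", 0.75, ["အတွက်", "အလို့ငှာ"]),
}


def check_boundaries(words, tags):
    """Return (kind, word, severity, confidence) for boundary and V$ rules."""
    if not words:
        return []
    issues = []
    if words[0] in FORBIDDEN_START:
        issues.append(("start", words[0], *FORBIDDEN_START[words[0]]))
    if words[-1] in FORBIDDEN_END:
        issues.append(("end", words[-1], *FORBIDDEN_END[words[-1]]))
    # V$ rule: a bare verb at the end lacks an ending such as တယ် or သည်
    if tags[-1] == "V":
        issues.append(("completion", words[-1], "warning", 0.80))
    return issues


print(check_boundaries(["ကို", "သွား"], ["P_OBJ", "V"]))
```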

Architecture

The Grammar Engine coordinates eight specialized checkers through SyntacticRuleChecker: Aspect, Classifier, Compound, Merged Word, Negation, Particle, Tense Agreement, and Register. Each checker handles a specific grammar domain. See Grammar Checkers for details on each one.
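One way to picture this coordination is a dispatcher that runs every checker on the same word list and merges the findings by position. The sketch is hypothetical. `run_checkers`, the `Checker` signature, and the toy lambdas stand in for the real checker classes, which each return their own error types.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical checker signature: words -> [(position, message)].
Checker = Callable[[List[str]], List[Tuple[int, str]]]


def run_checkers(words: List[str], checkers: Dict[str, Checker]) -> List[Tuple[int, str, str]]:
    """Run every checker on the same words and merge findings, ordered by position."""
    findings = []
    for name, check in checkers.items():
        for position, message in check(words):
            findings.append((position, name, message))
    return sorted(findings)  # sorts by position, then checker name


# Toy stand-ins for two of the real checkers.
toy_checkers = {
    "negation": lambda ws: [(1, "negation + affirmative ending")],
    "register": lambda ws: [(0, "register mixing")],
}
print(run_checkers(["w0", "w1"], toy_checkers))
# [(0, 'register', 'register mixing'), (1, 'negation', 'negation + affirmative ending')]
```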

Confidence Scoring

Grammar suggestions include confidence scores based on context:
Factor | Weight | Description
Exact match | 0.95 | Exact pattern match
Verb context | 0.90 | After verb validation
Noun context | 0.85 | After noun validation
Default | 0.80 | No specific context
Context dependent | 0.65 | Ambiguous context
Confidence scores are included in the GrammarError objects returned by check():
result = checker.check("ကျောင်း သွား က")
for error in result.errors:
    if error.error_type == "grammar_error":
        print(f"Confidence: {error.confidence}")  # e.g. 0.95
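The weights in the confidence table can be combined with a threshold like `default_confidence_threshold=0.80` from the configuration example. The sketch attaches a weight to each candidate suggestion and then drops candidates below the threshold. Comparing weights against that threshold in this way is an assumption. `CONTEXT_WEIGHTS`, its keys, and `score_suggestions` are hypothetical names. Unknown contexts fall back to the Default weight.

```python
# Weights from the confidence table, keyed by hypothetical context names.
CONTEXT_WEIGHTS = {
    "exact_match": 0.95,
    "verb_context": 0.90,
    "noun_context": 0.85,
    "default": 0.80,
    "context_dependent": 0.65,
}


def score_suggestions(candidates, threshold=0.80):
    """Attach a weight to each (suggestion, context) pair; drop those below threshold."""
    scored = [(s, CONTEXT_WEIGHTS.get(ctx, CONTEXT_WEIGHTS["default"])) for s, ctx in candidates]
    return [(s, w) for s, w in scored if w >= threshold]


print(score_suggestions([("a", "exact_match"), ("b", "context_dependent"), ("c", "other")]))
# [('a', 0.95), ('c', 0.8)]
```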

Troubleshooting

Issue: Too many false positives

Cause: POS tagging errors or overly strict rules.
Solution: Use a more accurate POS tagger:
# Use more accurate tagger
config = SpellCheckerConfig(
    pos_tagger=POSTaggerConfig(tagger_type="transformer")
)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)

Issue: Missing grammar errors

Cause: Grammar checking is not enabled.
Solution: Enable rule-based validation in the config:
config = SpellCheckerConfig(
    use_rule_based_validation=True,
)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)

Issue: Slow grammar checking

Cause: Complex rules or many patterns.
Solution: Raise the confidence threshold via GrammarEngineConfig to reduce processing:
from myspellchecker.core.config.grammar_configs import GrammarEngineConfig
from myspellchecker.grammar.engine import SyntacticRuleChecker

# A higher threshold means fewer (but higher-confidence) grammar errors are reported
grammar_config = GrammarEngineConfig(
    default_confidence_threshold=0.85,  # Only report high-confidence errors
)
rule_checker = SyntacticRuleChecker(provider, grammar_config=grammar_config)

Next Steps