Grammar checking validates syntactic correctness using POS tags and rule-based analysis. It operates at Layer 2.5 in the validation pipeline, sitting between word validation and context checking. It catches errors where every word is spelled correctly but the sentence structure is wrong.
Why Grammar Checking?
Myanmar text can have:
- Correct spelling but wrong particles: “သူ ကို” vs “သူ က”
- Verb-modifier mismatches: Wrong causative or passive markers
- Sentence structure errors: Missing required components
Grammar checking catches errors that:
- Pass syllable validation (valid syllables)
- Pass word validation (valid words)
- Fail syntactic rules
How It Works
POS Tagging
Text is tagged with part-of-speech labels:
text = "သူ ကျောင်း သွား သည်"
# Tags: [PRON, N, V, P_SENT]
Rule Application
Grammar rules check tag sequences:
# Rule: V should be followed by P_SENT at sentence end
pattern = r"V P_SENT$"
sequence = "N N V P_SENT" # Valid
# Rule: Subject particle should follow N
pattern = r"N P_SUBJ"
sequence = "V P_SUBJ" # Invalid - verb can't have subject particle
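The pattern checks above can be made runnable with Python's `re` module. This is a minimal sketch, not the library's implementation: `INVALID_PATTERNS`, `find_violations`, and `ends_with_verb_and_particle` are hypothetical names, and the `ADV` tag used in the usage note is only an illustration. The lookarounds keep a pattern such as `V P_SUBJ` from matching inside a longer tag such as `ADV P_SUBJ`.

```python
import re

# Hypothetical rule table: (regex over a space-joined tag string, message).
# A match means the tag sequence violates the rule.
INVALID_PATTERNS = [
    (re.compile(r"(?<!\S)V P_SUBJ(?!\S)"), "verb cannot take a subject particle"),
]


def find_violations(tags):
    """Return the message of every invalid pattern found in the tag sequence."""
    sequence = " ".join(tags)
    return [msg for pattern, msg in INVALID_PATTERNS if pattern.search(sequence)]


def ends_with_verb_and_particle(tags):
    """Return True when the sequence ends with V followed by P_SENT."""
    return re.search(r"(?<!\S)V P_SENT$", " ".join(tags)) is not None
```

For example, `find_violations(["V", "P_SUBJ"])` reports the verb rule, while `find_violations(["ADV", "P_SUBJ"])` returns an empty list because the match must start at a tag boundary.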
Error Generation
Invalid sequences generate grammar errors:
# Error: "V + P_SUBJ" is invalid
# Suggestion: Change P_SUBJ to appropriate particle
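As a rough illustration of this step, the sketch below turns an invalid tag pair into an error record. `GrammarIssue`, `INVALID_PAIRS`, and `generate_issues` are hypothetical names chosen for this example; the library's own error class and fields may differ.

```python
from dataclasses import dataclass, field


@dataclass
class GrammarIssue:
    """Hypothetical error record; not the library's actual error class."""
    position: int  # index of the offending word
    text: str      # the offending word
    message: str
    suggestions: list = field(default_factory=list)


# Illustrative only: invalid tag pairs mapped to (message, suggestion).
INVALID_PAIRS = {
    ("V", "P_SUBJ"): ("V + P_SUBJ is invalid",
                      "Change P_SUBJ to appropriate particle"),
}


def generate_issues(words, tags):
    """Scan adjacent tag pairs and emit one issue per invalid pair."""
    issues = []
    for i in range(len(tags) - 1):
        rule = INVALID_PAIRS.get((tags[i], tags[i + 1]))
        if rule:
            message, suggestion = rule
            # The issue is attached to the second word of the pair (the particle).
            issues.append(GrammarIssue(i + 1, words[i + 1], message, [suggestion]))
    return issues
```

Attaching the issue to the particle rather than the verb is a design choice of this sketch; it matches the suggestion text, which proposes changing the particle.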
Grammar Rules
Particle Rules
Subject Particle (က)
# Valid: Noun + Subject particle
"သူ က" → Valid
# Invalid: Verb + Subject particle
"သွား က" → Invalid
Object Particle (ကို)
# Valid: Noun + Object particle
"စာအုပ် ကို" → Valid
# Invalid: Adjective + Object particle
"လှ ကို" → Invalid
Location Particles (မှာ, တွင်)
# Valid: Noun + Location particle
"ကျောင်း မှာ" → Valid
"မြို့ တွင်" → Valid
Verb Modifier Rules
Causative Construction
# Valid: V + causative marker
"စား စေ" → Valid (cause to eat)
# Pattern check
if followed_by(V, CAUS) and not is_compatible(V, CAUS):
    error("Verb cannot take causative marker")
Passive Construction
# Valid: V + passive marker (ခံ = undergo/receive)
"ရိုက် ခံ" → Valid (was hit)
"ဆူ ခံ" → Valid (was scolded)
Sentence Structure Rules
# Rule 1: Sentence must end with P_SENT, PUNCT, or P_Q
if not ends_with(sentence, [P_SENT, PUNCT, P_Q]):
    warning("Sentence may be incomplete")
# Rule 3: Missing subject marker after initial noun (3+ word sentences only)
# A 2-word "Noun Verb" sentence is valid minimal SOV, so the rule only fires for longer sentences
if len(words) >= 3 and is_noun(words[0]) and is_verb(words[1]):
    suggest(f"{words[0]}က")  # Suggest adding subject marker
# Rule 4: Question sentences should end with P_Q
if has_question_word(sentence) and not ends_with(sentence, [P_Q]):
    warning("Question should end with question particle")
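The rules above are pseudocode. A self-contained version might look like the sketch below, which works on parallel word and tag lists. The function name `check_structure`, the `question_words` parameter, and the returned message strings are assumptions made for illustration; they are not part of the library's API.

```python
SENTENCE_ENDERS = {"P_SENT", "PUNCT", "P_Q"}


def check_structure(words, tags, question_words=frozenset()):
    """Apply the three structure rules to a tagged sentence; return messages."""
    notes = []
    if not tags:
        return notes
    # Rule 1: the last tag should close the sentence.
    if tags[-1] not in SENTENCE_ENDERS:
        notes.append("Sentence may be incomplete")
    # Rule 3: only for 3+ words, an initial noun directly followed by a verb.
    if len(tags) >= 3 and tags[0] == "N" and tags[1] == "V":
        notes.append(f"Consider adding a subject marker: {words[0]}က")
    # Rule 4: a question word calls for a final question particle.
    if any(w in question_words for w in words) and tags[-1] != "P_Q":
        notes.append("Question should end with question particle")
    return notes
```

A two-word `["N", "V"]` sentence only triggers Rule 1 here, which mirrors the note that the minimal "Noun Verb" order is not flagged for a missing subject marker.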
Configuration
Enable Grammar Checking
from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig, POSTaggerConfig
from myspellchecker.providers import SQLiteProvider
config = SpellCheckerConfig(
    use_rule_based_validation=True,
    pos_tagger=POSTaggerConfig(tagger_type="viterbi"),
)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)
Grammar Rule Configuration
Grammar checking is enabled through SpellCheckerConfig; the grammar engine is then initialized automatically:
from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig
from myspellchecker.providers import SQLiteProvider
# Configure grammar checking via SpellCheckerConfig
config = SpellCheckerConfig(
    use_rule_based_validation=True,
    # Grammar engine is automatically initialized
)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)
For advanced use, access the internal SyntacticRuleChecker:
from myspellchecker.grammar.engine import SyntacticRuleChecker
from myspellchecker.core.config.grammar_configs import GrammarEngineConfig
# Create a custom grammar checker with confidence thresholds
grammar_config = GrammarEngineConfig(
    default_confidence_threshold=0.80,
    high_confidence=0.90,
    medium_confidence=0.85,
)
rule_checker = SyntacticRuleChecker(provider, grammar_config=grammar_config)
Built-in Grammar Checkers
The grammar system includes several specialized checkers:
| Checker | Description |
|---|---|
| AspectChecker | Validates verb aspect markers |
| ClassifierChecker | Validates numeral classifiers |
| CompoundChecker | Validates compound word patterns |
| MergedWordChecker | Detects incorrectly merged particle+verb sequences |
| NegationChecker | Validates negation patterns |
| RegisterChecker | Detects formal/informal register mixing |
YAML Rules Configuration
Grammar rules are defined in YAML files located in src/myspellchecker/rules/:
| File | Purpose |
|---|---|
| grammar_rules.yaml | Core syntactic grammar rules |
| typo_corrections.yaml | Common typo patterns and corrections |
| particles.yaml | Particle definitions with POS tags |
| classifiers.yaml | Numeral classifier rules |
| register.yaml | Formal/informal register rules |
| compounds.yaml | Compound word patterns |
| aspects.yaml | Verb aspect rules |
| negation.yaml | Negation pattern rules |
| pronouns.yaml | Pronoun definitions |
| homophones.yaml | Homophone confusion pairs |
| pos_inference.yaml | POS inference patterns |
| ambiguous_words.yaml | Ambiguous word disambiguation |
| tone_rules.yaml | Tone mark validation rules |
| morphology.yaml | Morphological patterns |
| morphotactics.yaml | Morphotactic constraints |
Load custom rules via GrammarRuleConfig:
from myspellchecker.grammar.config import GrammarRuleConfig
# Load custom rules from your own YAML files
config = GrammarRuleConfig(
    config_path="/path/to/custom_grammar_rules.yaml",
    typo_path="/path/to/custom_typo_corrections.yaml",
    particles_path="/path/to/particles.yaml",
    pronouns_path="/path/to/pronouns.yaml",
    classifiers_path="/path/to/classifiers.yaml",
    register_path="/path/to/register.yaml",
    homophones_path="/path/to/homophones.yaml",
    compounds_path="/path/to/compounds.yaml",
    aspects_path="/path/to/aspects.yaml",
    pos_inference_path="/path/to/pos_inference.yaml",
    ambiguous_words_path="/path/to/ambiguous_words.yaml",
    tone_rules_path="/path/to/tone_rules.yaml",
    negation_path="/path/to/negation.yaml",
    morphology_path="/path/to/morphology.yaml",
)
GrammarRuleConfig does not have a morphotactics_path parameter. The morphotactics.yaml file is loaded internally by the grammar engine.
All paths are optional. When omitted, the built-in YAML files from src/myspellchecker/rules/ are used.
Error Types and Severity
| Severity | Description | Example |
|---|---|---|
| error | Definite grammatical error | မသွားတယ် (“negation + affirmative ending”) → မသွားဘူး |
| warning | Likely error, may be valid | ပြီနေ (“completed before progressive”), a contradictory aspect sequence |
| info | Style suggestion | ငါသွားပါသည် (“colloquial pronoun + formal ending”), which is register mixing |
Error Response
result = checker.check("သွား က")  # Verb with subject particle
for error in result.errors:
    if error.error_type == "grammar_error":
        print(f"Type: {error.error_type}")
        print(f"Text: {error.text}")
        print(f"Suggestions: {error.suggestions}")
        print(f"Confidence: {error.confidence}")
API Reference
SyntacticRuleChecker
from myspellchecker.grammar.engine import SyntacticRuleChecker
from myspellchecker.core.config.grammar_configs import GrammarEngineConfig
# Create checker with provider (required) and optional config
grammar_config = GrammarEngineConfig()
checker = SyntacticRuleChecker(provider, grammar_config=grammar_config)
# Check word sequence for grammar errors (POS tags are looked up internally)
words = ["သူ", "ကျောင်း", "သွား", "သည်"]
corrections = checker.check_sequence(words)
for idx, error_word, suggestion in corrections:
    print(f"Position {idx}: '{error_word}' → '{suggestion}'")
Individual Grammar Checkers
from myspellchecker.grammar.checkers import AspectChecker
# Each checker returns its own specific error type:
# AspectChecker.validate_sequence(words) -> list[AspectError]
# ClassifierChecker.validate_sequence(words) -> list[ClassifierError]
# CompoundChecker.validate_sequence(words) -> list[CompoundError]
# MergedWordChecker.validate_sequence(words) -> list[MergedWordError]
# NegationChecker.validate_sequence(words) -> list[NegationError]
# RegisterChecker.validate_sequence(words) -> list[RegisterError]
words = ["သူ", "ကျောင်း", "သွား", "သည်"]
aspect_checker = AspectChecker()
errors = aspect_checker.validate_sequence(words)
Common Patterns
Filter by Confidence
def check_with_confidence(checker: SpellChecker, text: str, min_confidence: float = 0.70) -> list:
    """Check grammar, filtering by confidence threshold."""
    result = checker.check(text)
    return [
        e for e in result.errors
        if e.error_type == "grammar_error"
        and e.confidence >= min_confidence
    ]
Grammar-Only Check
def check_grammar_only(text: str, checker: SpellChecker) -> list:
    """Check only grammar, skip spelling."""
    # Grammar checking is integrated, so filter grammar errors from the result
    result = checker.check(text)
    return [e for e in result.errors if e.error_type == "grammar_error"]
Report Grammar Issues
def generate_grammar_report(checker: SpellChecker, text: str) -> dict:
    """Generate detailed grammar report."""
    result = checker.check(text)
    grammar_errors = [e for e in result.errors if e.error_type == "grammar_error"]
    return {
        "total_errors": len(grammar_errors),
        "by_confidence": {
            "high": len([e for e in grammar_errors if e.confidence >= 0.90]),
            "medium": len([e for e in grammar_errors if 0.70 <= e.confidence < 0.90]),
            "low": len([e for e in grammar_errors if e.confidence < 0.70]),
        },
        "details": [
            {
                "text": e.text,
                "error_type": e.error_type,
                "suggestions": e.suggestions,
                "confidence": e.confidence,
                "position": e.position,
            }
            for e in grammar_errors
        ],
    }
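The banding used in the report can be factored into a small helper that is testable without a checker instance. This is a sketch only: `confidence_bucket` and `bucket_counts` are hypothetical helper names, and the 0.90 and 0.70 cut-offs are taken from the report function above.

```python
from collections import Counter


def confidence_bucket(confidence):
    """Map a confidence score to the report's high/medium/low bands."""
    if confidence >= 0.90:
        return "high"
    if confidence >= 0.70:
        return "medium"
    return "low"


def bucket_counts(confidences):
    """Count scores per band, always returning all three keys."""
    counts = Counter(confidence_bucket(c) for c in confidences)
    return {band: counts.get(band, 0) for band in ("high", "medium", "low")}
```

Passing `[e.confidence for e in grammar_errors]` to `bucket_counts` would produce the same `by_confidence` dictionary with a single pass over the errors.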
Built-in Rules
POS Sequence Rules
Errors
| Pattern | Description | Confidence |
|---|---|---|
| P_SENT P_SENT | Double sentence ending particles | 0.98 |
| P_PAST P_FUT | Conflicting tense markers | 0.98 |
| V P_NEG | Negation prefix after verb (wrong order; should be မ + V) | 0.95 |
| P_POSS P_SUBJ | Possessive + subject adjacent (noun missing between them) | 0.95 |
| P_POSS P_OBJ | Possessive + object adjacent (noun missing between them) | 0.95 |
Warnings
| Pattern | Description | Confidence |
|---|---|---|
| P_SUBJ P_OBJ | Subject + object markers adjacent | 0.90 |
| P_OBJ P_SUBJ | Object + subject markers adjacent | 0.90 |
| P_LOC P_LOC | Multiple location particles | 0.85 |
| P V | Particle directly precedes verb | 0.85 |
| PPM V | Postpositional marker before verb | 0.75 |
| NUM N | Number before noun without classifier | 0.75 |
| NUM V | Number before verb without classifier | 0.75 |
| N V | Noun + verb without case particle | 0.75 |
| ADJ V | Adjective before verb without noun | 0.65 |
Info
| Pattern | Description | Confidence |
|---|---|---|
| V V | Consecutive verbs (may be serial verb construction) | 0.50 |
| P P | Consecutive particles (check compatibility) | 0.50 |
| PPM PPM | Consecutive postpositional markers | 0.50 |
| PART PART | Consecutive particles | 0.50 |
| N N | Consecutive nouns (may be compound noun) | 0.40 |
| PART V | Particle before verb (may be auxiliary) | 0.40 |
| PPM N | Postpositional marker before noun (may be embedded clause) | 0.40 |
| N INT | Noun + interjection (vocative/exclamation) | 0.40 |
| INT INT | Consecutive interjections (emphatic) | 0.30 |
| INT P | Interjection + particle | 0.20 |
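One way to read these tables is as a lookup keyed by adjacent tag pairs. The sketch below copies a handful of rows from the tables above into a dictionary and scans a tag sequence; `POS_BIGRAM_RULES` and `match_bigrams` are hypothetical names, and the actual engine may store and match its rules differently.

```python
# A few rows copied from the tables above: (tag, next_tag) -> (severity, confidence).
POS_BIGRAM_RULES = {
    ("P_SENT", "P_SENT"): ("error", 0.98),
    ("P_PAST", "P_FUT"): ("error", 0.98),
    ("V", "P_NEG"): ("error", 0.95),
    ("P_SUBJ", "P_OBJ"): ("warning", 0.90),
    ("P_LOC", "P_LOC"): ("warning", 0.85),
    ("N", "V"): ("warning", 0.75),
    ("V", "V"): ("info", 0.50),
    ("N", "N"): ("info", 0.40),
}


def match_bigrams(tags, min_confidence=0.0):
    """Return (index, pattern, severity, confidence) for each matching adjacent pair."""
    hits = []
    for i, pair in enumerate(zip(tags, tags[1:])):
        rule = POS_BIGRAM_RULES.get(pair)
        if rule and rule[1] >= min_confidence:
            hits.append((i, " ".join(pair), rule[0], rule[1]))
    return hits
```

Raising `min_confidence` drops the low-confidence info matches such as `N N`, which is the same trade-off the confidence thresholds in the configuration expose.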
Sentence Boundary Rules
Forbidden Sentence Starts
| Particles | Type | Severity | Confidence |
|---|---|---|---|
| ကို, ရဲ့, အတွက် | Object / possessive / benefactive | error | 0.90 |
| မှ, ကနေ, လို့, ဆိုလို့ | Source / causative | error | 0.85 |
| လည်း, တော့, ပဲ | Conjunctive | warning | 0.70 |
Forbidden Sentence Ends
| Particles | Type | Severity | Confidence |
|---|---|---|---|
| က, ကို, နှင့်, နဲ့ | Case markers | error | 0.90 |
| ရဲ့, ၏, မှ, ကနေ | Possessive / source | error | 0.90 |
| အတွက်, အလို့ငှာ | Benefactive | warning | 0.75 |
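The boundary tables translate naturally into two lookups keyed by the first and last word of a sentence. The following is a minimal sketch using the particles and scores listed above; `FORBIDDEN_STARTS`, `FORBIDDEN_ENDS`, and `check_boundaries` are hypothetical names, and the input is assumed to be an already segmented word list.

```python
def _expand(rows):
    """Turn (particles, severity, confidence) rows into a per-particle lookup."""
    return {p: (severity, conf) for particles, severity, conf in rows for p in particles}


FORBIDDEN_STARTS = _expand([
    (("ကို", "ရဲ့", "အတွက်"), "error", 0.90),
    (("မှ", "ကနေ", "လို့", "ဆိုလို့"), "error", 0.85),
    (("လည်း", "တော့", "ပဲ"), "warning", 0.70),
])
FORBIDDEN_ENDS = _expand([
    (("က", "ကို", "နှင့်", "နဲ့"), "error", 0.90),
    (("ရဲ့", "၏", "မှ", "ကနေ"), "error", 0.90),
    (("အတွက်", "အလို့ငှာ"), "warning", 0.75),
])


def check_boundaries(words):
    """Return (where, word, severity, confidence) for forbidden first/last words."""
    findings = []
    if not words:
        return findings
    if words[0] in FORBIDDEN_STARTS:
        findings.append(("start", words[0], *FORBIDDEN_STARTS[words[0]]))
    if words[-1] in FORBIDDEN_ENDS:
        findings.append(("end", words[-1], *FORBIDDEN_ENDS[words[-1]]))
    return findings
```

A one-word input can produce both a start and an end finding, since the same word is checked against both tables.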
Sentence Completion Rules
| Pattern | Required Ending | Severity | Confidence |
|---|---|---|---|
| V$ (verb at sentence end) | တယ်, ပါတယ်, သည်, ပါသည်, မယ်, ပါမယ်, ပြီ | warning | 0.80 |
| N V (noun before verb) | က, ကို, မှာ, မှ, သည် (case particle between them) | warning | 0.75 |
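The first completion rule can be sketched as a function that proposes endings when a sentence stops on a bare verb. `VERB_ENDINGS` reproduces the endings from the table; `suggest_completions` is a hypothetical helper, and attaching the ending directly to the verb without a space is an assumption of this sketch.

```python
# Endings copied from the "V$" row of the table above.
VERB_ENDINGS = ["တယ်", "ပါတယ်", "သည်", "ပါသည်", "မယ်", "ပါမယ်", "ပြီ"]


def suggest_completions(words, tags):
    """If the sentence ends on a bare verb, propose the verb with each ending."""
    if not tags or tags[-1] != "V":
        return []
    # Assumption: the ending attaches directly to the final verb.
    return [words[-1] + ending for ending in VERB_ENDINGS]
```

A sentence that already ends in a sentence-final particle yields no suggestions, because its last tag is not `V`.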
Architecture
The Grammar Engine coordinates eight specialized checkers through SyntacticRuleChecker.
Each checker handles a specific grammar domain. See Grammar Checkers for details on each one.
Confidence Scoring
Grammar suggestions include confidence scores based on context:
| Factor | Weight | Description |
|---|---|---|
| Exact match | 0.95 | Exact pattern match |
| Verb context | 0.90 | After verb validation |
| Noun context | 0.85 | After noun validation |
| Default | 0.80 | No specific context |
| Context dependent | 0.65 | Ambiguous context |
Confidence scores are included in the GrammarError objects returned by check():
result = checker.check("ကျောင်း သွား က")
for error in result.errors:
    if error.error_type == "grammar_error":
        print(f"Confidence: {error.confidence}")  # e.g. 0.95
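To make the factor table concrete, the sketch below picks a score from the context of a suggestion. The values come from the table above, but the function `score_suggestion`, its parameters, and the precedence between factors (ambiguity first, then exact match, then the preceding tag) are assumptions made for illustration; the engine's actual selection logic is not documented here.

```python
# Values from the confidence table above.
CONTEXT_CONFIDENCE = {
    "exact_match": 0.95,
    "verb_context": 0.90,
    "noun_context": 0.85,
    "default": 0.80,
    "context_dependent": 0.65,
}


def score_suggestion(exact=False, prev_tag=None, ambiguous=False):
    """Pick a confidence score; the precedence order is an assumption."""
    if ambiguous:
        return CONTEXT_CONFIDENCE["context_dependent"]
    if exact:
        return CONTEXT_CONFIDENCE["exact_match"]
    if prev_tag == "V":
        return CONTEXT_CONFIDENCE["verb_context"]
    if prev_tag == "N":
        return CONTEXT_CONFIDENCE["noun_context"]
    return CONTEXT_CONFIDENCE["default"]
```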
Troubleshooting
Issue: Too many false positives
Cause: POS tagging errors or overly strict rules
Solution:
# Use more accurate tagger
config = SpellCheckerConfig(
    pos_tagger=POSTaggerConfig(tagger_type="transformer")
)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)
Issue: Missing grammar errors
Cause: Grammar checkers not enabled
Solution: Enable grammar checking in config:
config = SpellCheckerConfig(
    use_rule_based_validation=True,
)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)
Issue: Slow grammar checking
Cause: Complex rules or many patterns
Solution: Raise confidence thresholds to reduce processing via GrammarEngineConfig:
from myspellchecker.core.config.grammar_configs import GrammarEngineConfig
# Higher thresholds mean fewer (but higher-confidence) grammar checks
grammar_config = GrammarEngineConfig(
    default_confidence_threshold=0.85,  # Only report high-confidence errors
)
Next Steps