Grammar Checkers - mySpellChecker

mySpellChecker includes eight specialized grammar checkers that target common Myanmar grammatical errors. Each checker focuses on a specific grammar domain:

Checker	Purpose	Error Types
AspectChecker	Verb aspect markers	Typos, invalid sequences
ClassifierChecker	Numeral classifiers	Typos, agreement errors
CompoundChecker	Compound words	Typos, malformed compounds
MergedWordChecker	Merged word detection	Segmenter merge errors
NegationChecker	Negation patterns	Typos, missing endings
ParticleChecker	Particle context validation	Particle misuse
TenseAgreementChecker	Tense-time agreement	Tense mismatch
RegisterChecker	3-way register detection	Mixed register usage

AspectChecker

Validates Myanmar verb aspect markers that modify verbs to express temporal, modal, and aspectual meanings.

Aspect Categories

Category	Markers	Meaning	Example
Completion	ပြီ, ပြီး	Action completed	သွားပြီ (went)
Progressive	နေ	Ongoing action	စားနေ (eating)
Habitual	တတ်	Habitual action	စားတတ် (eats habitually)
Resultative	ထား	Maintained state	ရေးထား (have written)
Directional	လာ, သွား	Motion direction	ပြန်လာ (come back)
Desiderative	ချင်	Desire/want	လာချင် (want to come)
Potential	နိုင်, ရ	Ability/possibility	လုပ်နိုင် (can do)
Immediate	လိုက်	Following action	လိုက်သွား (follow and go)
Experiential	ဖူး	Past experience	ရေးဖူး (have written before)
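To make the table concrete, here is a hypothetical sketch of how trailing aspect markers could be peeled off a verb phrase and mapped to the categories above. `ASPECT_MARKERS` and `split_aspect` are illustrative names, not part of the mySpellChecker API, and the naive code-point suffix matching assumes the input is a single verb phrase rather than segmented tokens.

```python
# Hypothetical marker table mirroring the categories above (not the library's data).
ASPECT_MARKERS = {
    "ပြီ": "completion", "ပြီး": "completion",
    "နေ": "progressive",
    "တတ်": "habitual",
    "ထား": "resultative",
    "လာ": "directional", "သွား": "directional",
    "ချင်": "desiderative",
    "နိုင်": "potential", "ရ": "potential",
    "လိုက်": "immediate",
    "ဖူး": "experiential",
}

# Longest markers first, so ပြီး is tried before ပြီ.
MARKERS_BY_LENGTH = sorted(ASPECT_MARKERS, key=len, reverse=True)


def split_aspect(phrase: str) -> tuple[str, list[tuple[str, str]]]:
    """Strip trailing aspect markers; return (stem, [(marker, category), ...])."""
    stem, found = phrase, []
    while True:
        for marker in MARKERS_BY_LENGTH:
            # Keep a non-empty stem, so a bare verb such as သွား is not stripped away.
            if stem.endswith(marker) and len(stem) > len(marker):
                found.append((marker, ASPECT_MARKERS[marker]))
                stem = stem[: -len(marker)]
                break
        else:
            break  # no marker matched in this pass
    found.reverse()  # report markers in reading order
    return stem, found


print(split_aspect("စားနေပြီ"))
# ('စား', [('နေ', 'progressive'), ('ပြီ', 'completion')])
```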

Usage

from myspellchecker.grammar.checkers.aspect import AspectChecker

checker = AspectChecker()

# Check if word is an aspect marker
checker.is_aspect_marker("ပြီ")  # True
checker.is_aspect_marker("စား")  # False

# Check for typos
checker.is_aspect_typo("ပရီ")  # True (typo for ပြီ)
correction = checker.get_typo_correction("ပရီ")  # "ပြီ"

# Get detailed aspect info
info = checker.get_aspect_info("ပြီ")
print(info.category)     # "completion"
print(info.description)  # "Action completed"
print(info.is_final)     # True (typically at phrase end)

# Validate aspect sequences
errors = checker.validate_sequence(["စား", "ပြီး", "သွား"])
for error in errors:
    print(f"{error.text}: {error.reason}")

ClassifierChecker

Validates Myanmar numeral + classifier patterns. Like Chinese and Japanese, Myanmar counts nouns with numeral classifiers.

Pattern: Numeral + Classifier + Noun

Numeral	Classifier	Noun class	Meaning
သုံး	ယောက်	(person)	3 people
ငါး	ကောင်	(animal)	5 animals
နှစ်	အုပ်	(book)	2 books
တစ်	လုံး	(round object)	1 (round object)
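A minimal sketch of recognizing a numeral followed by a classifier, assuming small hand-made word lists. `SPELLED_NUMERALS`, `CLASSIFIERS`, `looks_numeral`, and `numeral_classifier_pairs` are hypothetical; only the Myanmar digit range ၀ to ၉ (U+1040 to U+1049) is standard Unicode.

```python
# Hypothetical sample data; the library's own word lists are not shown here.
SPELLED_NUMERALS = {"တစ်", "နှစ်", "သုံး", "လေး", "ငါး"}
CLASSIFIERS = {"ယောက်": "people", "ကောင်": "animals", "အုပ်": "books", "လုံး": "round objects"}


def looks_numeral(token: str) -> bool:
    """True for a spelled numeral or a non-empty run of Myanmar digits (U+1040-U+1049)."""
    if token in SPELLED_NUMERALS:
        return True
    return bool(token) and all("\u1040" <= ch <= "\u1049" for ch in token)


def numeral_classifier_pairs(tokens: list[str]) -> list[tuple[int, str, str]]:
    """Return (index, numeral, classifier) for every numeral directly followed by a classifier."""
    return [
        (i, tokens[i], tokens[i + 1])
        for i in range(len(tokens) - 1)
        if looks_numeral(tokens[i]) and tokens[i + 1] in CLASSIFIERS
    ]
```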

Usage

from myspellchecker.grammar.checkers.classifier import (
    ClassifierChecker,
    get_classifier_checker,
    is_classifier,
    is_numeral
)

checker = ClassifierChecker()

# Check if word is a numeral
is_numeral("သုံး")  # True (three)
is_numeral("၃")     # True (digit 3)

# Check if word is a classifier
is_classifier("ယောက်")  # True (classifier for people)
is_classifier("လူ")     # False (just a noun)

# Get classifier category
category = checker.get_classifier_category("ကောင်")  # "animals"

# Check for classifier typos
typo_result = checker.check_classifier_typo("ယေက်")
if typo_result:
    correction, confidence = typo_result
    print(f"Correction: {correction}")  # "ယောက်"

# Validate classifier usage
errors = checker.validate_sequence(["သုံး", "ယေက်", "ရှိ"])
for error in errors:
    print(f"{error.word} → {error.suggestion}")
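One plausible way to produce a correction plus a confidence, as `check_classifier_typo` returns, is string similarity. The sketch below uses Python's standard `difflib.SequenceMatcher`; the classifier list, the function name `suggest_classifier`, and the 0.75 cutoff are assumptions, and the library may score typos differently.

```python
from difflib import SequenceMatcher

# Hypothetical classifier inventory for the sketch.
KNOWN_CLASSIFIERS = ["ယောက်", "ကောင်", "အုပ်", "လုံး"]


def suggest_classifier(word: str, min_ratio: float = 0.75):
    """Return (closest classifier, similarity ratio) for a likely typo, else None."""
    if word in KNOWN_CLASSIFIERS:
        return None  # already a valid classifier
    scored = [(SequenceMatcher(None, word, c).ratio(), c) for c in KNOWN_CLASSIFIERS]
    ratio, best = max(scored)
    return (best, ratio) if ratio >= min_ratio else None
```

Comparing code points means a single missing vowel sign, as in ယေက် for ယောက်, still leaves a high ratio.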

Classifier-Noun Agreement

# Get compatible classifiers for a noun
classifiers = checker.get_compatible_classifiers("ခွေး")  # ["ကောင်"]

# Check classifier-noun agreement
error = checker.check_agreement(classifier="ယောက်", noun="ခွေး")
if error:
    print(error.reason)  # ခွေး (dog) should use ကောင် not ယောက်
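Agreement checking reduces to a lookup from a noun to its allowed classifiers. A hypothetical sketch, where `NOUN_CLASSIFIERS` is made-up sample data and `agreement_error` is an illustrative name; nouns missing from the table are deliberately left unflagged, since there is no evidence either way.

```python
# Hypothetical noun-to-classifier table; real coverage comes from the library's data.
NOUN_CLASSIFIERS = {
    "ခွေး": {"ကောင်"},    # dog
    "လူ": {"ယောက်"},     # person
    "စာအုပ်": {"အုပ်"},   # book
}


def agreement_error(classifier: str, noun: str):
    """Return a message if the classifier does not fit the noun, else None."""
    allowed = NOUN_CLASSIFIERS.get(noun)
    if allowed is None or classifier in allowed:
        return None  # unknown noun, or a correct pairing
    expected = " / ".join(sorted(allowed))
    return f"{noun} should use {expected}, not {classifier}"
```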

CompoundChecker

Detects and validates Myanmar compound word formations.

Compound Types

Type	Pattern	Example	Result
Noun-Noun	N + N	ပန်း + ခြံ	ပန်းခြံ (flower garden)
Verb-Verb	V + V	စား + သောက်	စားသောက် (dine)
Reduplication	X + X	ဖြေး + ဖြေး	ဖြေးဖြေး (slowly)
Affixed	Prefix + Root	အ + လုပ်	အလုပ် (work)
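Full reduplication (X + X) can be recognized without a dictionary by comparing the two halves of the word. A minimal sketch; `reduplication_base` is a hypothetical helper, not the library's `get_reduplication_base`, and it covers only the X + X pattern from the table.

```python
def reduplication_base(word: str):
    """Return X if word is exactly X + X (full reduplication), else None.

    Patterns such as AABB would need syllable segmentation first.
    """
    half, odd = divmod(len(word), 2)
    if word and not odd and word[:half] == word[half:]:
        return word[:half]
    return None
```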

Usage

from myspellchecker.grammar.checkers.compound import (
    CompoundChecker,
    get_compound_checker,
    is_compound,
    is_reduplication,
)

checker = CompoundChecker()

# Check if word is a recognized compound
is_compound("ပန်းခြံ")  # True

# Check for reduplication
is_reduplication("ဖြေးဖြေး")  # True
base = checker.get_reduplication_base("ဖြေးဖြေး")  # "ဖြေး"

# Detect compound pattern
info = checker.detect_compound_pattern("အလုပ်")
if info:
    print(info.compound_type)  # "affixed"
    print(info.components)     # ["အ", "လုပ်"]
    print(info.pattern)        # "PREFIX(nominalization) + STEM"

Analyze Compounds

# Comprehensive compound analysis
result = checker.analyze_word("ပန်းခြံ")
print(result["is_compound"])      # True
print(result["components"])       # ["ပန်း", "ခြံ"]
print(result["has_prefix"])       # False
print(result["is_reduplication"]) # False
print(result["confidence"])       # 0.95
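A rough sketch of how prefix detection and a two-part dictionary split could produce the `components` and `has_prefix` fields. The tiny `LEXICON` and `PREFIXES` are assumptions, `analyze_word` here is a standalone illustration rather than the library method, and it omits the confidence score the library reports.

```python
# Hypothetical mini-lexicon and prefix list for the sketch.
LEXICON = {"ပန်း", "ခြံ", "စား", "သောက်", "လုပ်"}
PREFIXES = ("အ",)


def analyze_word(word: str) -> dict:
    """Split word into prefix + stem, or into two lexicon entries, if possible."""
    for prefix in PREFIXES:
        stem = word[len(prefix):]
        if word.startswith(prefix) and stem in LEXICON:
            return {"is_compound": True, "components": [prefix, stem], "has_prefix": True}
    # Try every split point; a split inside a syllable simply fails the lookup.
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in LEXICON and right in LEXICON:
            return {"is_compound": True, "components": [left, right], "has_prefix": False}
    return {"is_compound": False, "components": [word], "has_prefix": False}
```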

MergedWordChecker

Detects words that the segmenter may have incorrectly merged from a particle + verb sequence into a single compound word.

Problem

Myanmar word segmenters sometimes merge adjacent tokens when the concatenation forms a valid dictionary word:

Input	Intended	Segmented	Issue
သူက စားသောကြောင့်	သူ + က + စား + သောကြောင့်	သူ + ကစား + သောကြောင့်	“က” + “စား” merged into “ကစား” (play)

Detection Strategy

A merged word is flagged ONLY when ALL conditions hold:

The word is in the known ambiguous-merge set (e.g., “ကစား”)
The preceding word is a NOUN or PRONOUN (POS: N, PRON)
The following word is a clause-linking particle or verb-final marker

This three-way evidence requirement prevents false positives on legitimate uses.
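The three conditions can be expressed as successive filters over parallel word and tag lists. A hypothetical sketch; the merge table, tag names, linker set, and `find_merged_words` are illustrative stand-ins for the library's data and API.

```python
# Hypothetical data for the three evidence checks.
AMBIGUOUS_MERGES = {"ကစား": ("က", "စား")}   # merged form -> intended tokens
NOMINAL_TAGS = {"N", "PRON"}
CLAUSE_LINKERS = {"သောကြောင့်"}              # illustrative subset only
CONFIDENCE = 0.80


def find_merged_words(words: list[str], pos_tags: list[str]) -> list[dict]:
    flagged = []
    for i, word in enumerate(words):
        if word not in AMBIGUOUS_MERGES:
            continue  # condition 1: known ambiguous merge
        if i == 0 or pos_tags[i - 1] not in NOMINAL_TAGS:
            continue  # condition 2: preceded by a noun or pronoun
        if i + 1 >= len(words) or words[i + 1] not in CLAUSE_LINKERS:
            continue  # condition 3: followed by a linking particle
        flagged.append({"index": i, "word": word,
                        "split": AMBIGUOUS_MERGES[word], "confidence": CONFIDENCE})
    return flagged
```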

Configuration

The checker uses a conservative confidence of 0.80 because this heuristic cannot be fully certain without semantic understanding.

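from myspellchecker.grammar.checkers.merged_word import MergedWordChecker

checker = MergedWordChecker()
words = ["သူ", "ကစား", "သောကြောင့်"]  # merged segmentation from the Problem table
pos_tags = ["PRON", "V", "CONJ"]        # one tag per word; tag names are illustrative
errors = checker.validate_sequence(words, pos_tags)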

NegationChecker

Validates Myanmar negation patterns. Negation wraps the verb: the prefix မ comes before it, and an ending that depends on register and mood comes after it.

Negation Patterns

Pattern	Structure	Example	Meaning
Standard	မ + verb + ဘူး	မသွားဘူး	don’t go
Polite	မ + verb + ပါဘူး	မသွားပါဘူး	don’t go (polite)
Prohibition	မ + verb + နဲ့	မလုပ်နဲ့	Don’t do!
Formal	မ + verb + ပါ	မရှိပါ	doesn’t exist (formal)

Usage

from myspellchecker.grammar.checkers.negation import (
    NegationChecker,
    get_negation_checker,
    is_negative_ending,
)

checker = NegationChecker()

# Check for negation prefix
checker.starts_with_negation("မသွား")  # True
checker.starts_with_negation("သွား")   # False

# Check negative endings
is_negative_ending("ဘူး")  # True
is_negative_ending("တယ်")  # False

# Check for ending typos
typo_result = checker.check_ending_typo("ဘူ")
if typo_result:
    correction, confidence = typo_result
    print(f"Correction: {correction}")  # "ဘူး"

# Validate negation patterns
errors = checker.validate_sequence(["မ", "သွား", "ဘူ"])
for error in errors:
    print(f"{error.word} → {error.suggestion}")

Detect Negation Patterns

# Detect negation pattern starting at a given index
pattern = checker.detect_negation_pattern(["မ", "သွား", "ဘူး"], 0)
if pattern:
    print(pattern.pattern_type)  # "standard_negative"
    print(pattern.verb)          # "သွား"
    print(pattern.ending)        # "ဘူး"
    print(pattern.register)      # "colloquial"
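A hypothetical sketch of matching the patterns from the Negation Patterns table, assuming the polite ending may arrive as two tokens (ပါ + ဘူး). It does not model register; `NEGATIVE_ENDINGS` and `detect_negation` are illustrative names, not the library's implementation.

```python
# Hypothetical ending table mirroring the Negation Patterns table.
NEGATIVE_ENDINGS = {"ဘူး": "standard", "ပါဘူး": "polite", "နဲ့": "prohibition", "ပါ": "formal"}


def detect_negation(tokens: list[str], start: int = 0):
    """Match မ + verb + ending at tokens[start]; return (type, verb, ending) or None."""
    if tokens[start:start + 1] != ["မ"] or len(tokens) < start + 3:
        return None
    verb = tokens[start + 1]
    for width in (2, 1):  # prefer a two-token ending such as ပါ + ဘူး over ပါ alone
        parts = tokens[start + 2:start + 2 + width]
        ending = "".join(parts)
        if len(parts) == width and ending in NEGATIVE_ENDINGS:
            return NEGATIVE_ENDINGS[ending], verb, ending
    return None  # e.g. မ + verb + တယ်: negative prefix without a negative ending
```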

ParticleChecker

Validates Myanmar particle usage given verb and noun context. Myanmar particles (postpositions) must agree with the verb type and syntactic role of surrounding words.

Common Misuse Patterns

Pattern	Incorrect	Correct	Explanation
Motion verb + static locative	ကျောင်းမှာ သွားတယ်	ကျောင်းကို သွားတယ်	Use ကို/သို့ with motion verbs
Sequential ပြီ where ပြီး needed	စားပြီ သွားတယ်	စားပြီး သွားတယ်	ပြီး links sequential actions
Negation + affirmative ending	မသွားတယ်	မသွားဘူး	Negated sentences need ဘူး

Features

Particle confusion pair detection from YAML rules (particle_contexts.yaml)
Verb-particle frame checking: validates verb + particle compatibility
POS-tag-aware validation, with fallback heuristics when tags are unavailable
Configurable confidence thresholds via ParticleCheckerConfig

Usage

from myspellchecker.grammar.checkers.particle import ParticleChecker

checker = ParticleChecker()

# Validate particle usage in a word sequence
errors = checker.validate_sequence(
    words=["ကျောင်း", "ကို", "ရှိ", "တယ်"],
    pos_tags=["N", "PPM", "V", "SFP"]  # Optional POS tags
)

for error in errors:
    print(f"{error.text}: {error.reason}")
    print(f"  Suggestion: {error.suggestions}")
    print(f"  Confidence: {error.confidence}")

Singleton Access

from myspellchecker.grammar.checkers.particle import get_particle_checker

# Thread-safe singleton (loads YAML once)
checker = get_particle_checker()

YAML Configuration

Particle rules are defined in rules/particle_contexts.yaml:

particle_confusions:
  - particle: "ကို"
    confused_with: "မှာ"
    context: "static_location"
    description: "Static locative used with motion verb"
    confidence: 0.70

verb_particle_frames:
  - verbs: ["သွား", "လာ", "ပြန်"]
    required_particles: ["ကို", "သို့"]
    incompatible_particles: ["မှာ", "တွင်"]
    note: "Motion verbs require directional particles"
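The verb_particle_frames rule can be applied by scanning each particle + verb pair. A minimal sketch that mirrors the YAML entry as a Python literal, so it needs no YAML parser; `check_frames` is a hypothetical name, and it only checks a particle immediately before the verb.

```python
# The frame from the YAML excerpt, as a Python literal (stdlib only).
VERB_PARTICLE_FRAMES = [
    {
        "verbs": {"သွား", "လာ", "ပြန်"},
        "required_particles": ["ကို", "သို့"],
        "incompatible_particles": {"မှာ", "တွင်"},
    },
]


def check_frames(words: list[str]) -> list[tuple[int, str, list[str]]]:
    """Flag particle + verb pairs that violate a frame: (index, particle, suggestions)."""
    errors = []
    for i, (particle, verb) in enumerate(zip(words, words[1:])):
        for frame in VERB_PARTICLE_FRAMES:
            if verb in frame["verbs"] and particle in frame["incompatible_particles"]:
                errors.append((i, particle, frame["required_particles"]))
    return errors
```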

TenseAgreementChecker

Validates that aspectual particles (sentence-final markers) agree with temporal adverbials in Myanmar sentences. When a temporal adverb indicates a specific tense, the sentence-final particle must match.

Examples

Status	Sentence	Explanation
Correct	မနေ့က သွားခဲ့တယ်	yesterday + past marker
Incorrect	မနေ့က သွားမယ်	yesterday + future marker
Correct	မနက်ဖြန် သွားမယ်	tomorrow + future marker
Incorrect	မနက်ဖြန် သွားခဲ့တယ်	tomorrow + past marker

Usage

from myspellchecker.grammar.checkers.tense_agreement import TenseAgreementChecker

checker = TenseAgreementChecker()

# Validate tense-time agreement
errors = checker.validate_sequence(["မနေ့က", "ကျောင်း", "သွား", "မယ်"])

for error in errors:
    print(f"{error.text}: {error.reason}")
    print(f"  Time adverb: {error.time_adverb}")
    print(f"  Detected tense: {error.detected_tense}")
    print(f"  Suggestion: {error.suggestions}")

Checking Individual Words

# Check if a word is a temporal adverb
checker.is_time_adverb("မနေ့က")  # True
checker.get_adverb_tense("မနေ့က")  # "past"

# Check if a word is an aspect marker
checker.is_aspect_marker("မယ်")  # True
checker.get_marker_tense("မယ်")  # "future"

YAML Configuration

Tense rules are defined in rules/tense_markers.yaml:

tense_agreement_rules:
  past_time_adverbs: ["မနေ့က", "တုန်းက", "အရင်က"]
  future_time_adverbs: ["မနက်ဖြန်", "နောက်နှစ်", "လာမယ့်"]
  past_aspect_markers: ["ခဲ့တယ်", "ခဲ့သည်"]
  future_aspect_markers: ["မယ်", "မည်"]
  incompatible_pairs:
    - time_class: past
      incompatible_aspects: ["မယ်", "မည်"]
      confidence: 0.80
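A hypothetical sketch of the agreement rule, using the word lists from the YAML excerpt. It takes the first temporal adverb in the sentence, looks for a tense marker either as a single token or as a joined pair of tokens, and reports a mismatch in either direction, as in the Examples table. `tense_mismatch` is an illustrative name.

```python
# Word lists from the tense_markers.yaml excerpt; the real file may hold more.
PAST_ADVERBS = {"မနေ့က", "တုန်းက", "အရင်က"}
FUTURE_ADVERBS = {"မနက်ဖြန်", "နောက်နှစ်", "လာမယ့်"}
PAST_MARKERS = {"ခဲ့တယ်", "ခဲ့သည်"}
FUTURE_MARKERS = {"မယ်", "မည်"}
TIME_ADVERBS = PAST_ADVERBS | FUTURE_ADVERBS


def tense_mismatch(words: list[str]):
    """Return (adverb, adverb_tense, marker_tense) if they disagree, else None."""
    adverb = next((w for w in words if w in TIME_ADVERBS), None)
    if adverb is None:
        return None  # no temporal evidence, nothing to check
    adverb_tense = "past" if adverb in PAST_ADVERBS else "future"
    # A marker may be one token (ခဲ့တယ်) or split in two (ခဲ့ + တယ်), so test joined pairs too.
    candidates = words + [a + b for a, b in zip(words, words[1:])]
    marker_tense = None
    for token in candidates:
        if token in PAST_MARKERS:
            marker_tense = "past"
        elif token in FUTURE_MARKERS:
            marker_tense = "future"
    if marker_tense in (None, adverb_tense):
        return None
    return adverb, adverb_tense, marker_tense
```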

Configuration

from myspellchecker.core.config import TenseAgreementCheckerConfig

config = TenseAgreementCheckerConfig(
    default_confidence=0.75,  # Default confidence for tense mismatches
    high_confidence=0.85,     # When both adverb and marker are unambiguous
)
checker = TenseAgreementChecker(checker_config=config)

RegisterChecker

Validates register consistency across three tiers: formal, polite, and colloquial. Myanmar has distinct register markers at the sentence-final position, and mixing registers within a sentence is a stylistic error.

Three-Tier Register System

Register	Sentence-Final Particles	Pronouns	Use Context
Formal	သည်, ၏	သူသည်	Written prose, news, official documents
Polite	ပါတယ်, ပါမယ်	—	Respectful speech, customer service
Colloquial	တယ်, မယ်	သူ, ငါ	Casual conversation, informal writing

Mixing Severity

Combination	Severity	Confidence
Formal + Colloquial	High (strong mismatch)	0.85
Formal + Polite	Low (formality gap)	0.65
Polite + Colloquial	Medium	0.75
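The severity table maps naturally onto a lookup keyed by unordered pairs of registers. A sketch assuming the caller has already collected the registers present in a sentence; `mixing_confidence` is a hypothetical helper, and when all three registers appear it reports the most severe pair.

```python
from itertools import combinations

# Confidence values from the Mixing Severity table, keyed by unordered register pairs.
MIX_CONFIDENCE = {
    frozenset({"formal", "colloquial"}): 0.85,
    frozenset({"formal", "polite"}): 0.65,
    frozenset({"polite", "colloquial"}): 0.75,
}


def mixing_confidence(registers):
    """Return (sorted register pair, confidence) for the most severe mix, or None."""
    present = set(registers) & {"formal", "polite", "colloquial"}  # drop "neutral" etc.
    if len(present) < 2:
        return None
    worst = max((frozenset(pair) for pair in combinations(present, 2)),
                key=MIX_CONFIDENCE.get)
    return sorted(worst), MIX_CONFIDENCE[worst]
```

Using `frozenset` keys means the order in which two registers are found does not matter.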

Usage

from myspellchecker.grammar.checkers.register import RegisterChecker

checker = RegisterChecker()

# Get register of a word
info = checker.get_register("သည်")
print(info.register)         # "formal"

info = checker.get_register("ပါတယ်")
print(info.register)         # "polite"

info = checker.get_register("တယ်")
print(info.register)         # "colloquial"

# Check register type
checker.is_formal("သည်")      # True
checker.is_colloquial("တယ်")  # True
checker.is_neutral("စာအုပ်")  # True

Detect Sentence Register

# Detect the predominant register: formal, polite, colloquial, or "mixed"
register, consistency, infos = checker.detect_sentence_register(
    ["သူ", "သည်", "စာအုပ်", "ဖတ်", "တယ်"]
)
print(register)     # "mixed"
print(consistency)  # 0.5 (50% consistent)
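One simple way to derive a register label and a consistency score is to count register-marked words and take the majority share. This is an assumption for illustration, not the library's documented formula; `REGISTER_OF`, `sentence_register`, and the `"neutral"` label for unmarked sentences are hypothetical.

```python
from collections import Counter

# Hypothetical register lexicon; words not listed are treated as neutral.
REGISTER_OF = {"သည်": "formal", "၏": "formal",
               "ပါတယ်": "polite", "ပါမယ်": "polite",
               "တယ်": "colloquial", "မယ်": "colloquial"}


def sentence_register(words: list[str]) -> tuple[str, float]:
    """Return (register, consistency), where consistency is the majority share of marked words."""
    counts = Counter(REGISTER_OF[w] for w in words if w in REGISTER_OF)
    if not counts:
        return "neutral", 1.0  # nothing register-marked
    register, top = counts.most_common(1)[0]
    consistency = top / sum(counts.values())
    return (register if len(counts) == 1 else "mixed"), consistency
```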

Validate Register Consistency

# Check for mixed register errors
errors = checker.validate_sequence(["သူ", "သည်", "စာအုပ်", "ဖတ်", "တယ်"])
for error in errors:
    print(f"{error.text}: {error.reason}")
    print(f"  Detected: {error.detected_register}")
    print(f"  Expected: {error.expected_register}")
    print(f"  Suggestion: {error.suggestion}")

Configuration

from myspellchecker.core.config import RegisterCheckerConfig

config = RegisterCheckerConfig(
    register_mismatch_confidence=0.85,        # Formal + colloquial mixing
    register_formality_gap_confidence=0.65,    # Formal + polite mixing
)
checker = RegisterChecker(register_config=config)

Integration with SpellChecker

When rule-based validation is enabled, SpellChecker runs all grammar checkers automatically:

from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig
from myspellchecker.providers import SQLiteProvider

config = SpellCheckerConfig(
    use_rule_based_validation=True  # Enable all grammar checkers
)

provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)
result = checker.check("သူသည် စာအုပ် ဖတ်တယ်။")

# Grammar errors include all checker types
for error in result.errors:
    if hasattr(error, 'error_type'):
        print(f"Type: {error.error_type}")  # e.g. register_error, tense_mismatch
        print(f"Word: {error.text}")
        print(f"Reason: {error.reason}")

Error Types Summary

Checker	Error Types
AspectChecker	`aspect_typo`, `invalid_sequence`, `incomplete_aspect`
ClassifierChecker	`typo`, `agreement`, `missing`, `invalid_pattern`
CompoundChecker	`compound_typo`, `invalid_compound`, `incomplete_reduplication`
MergedWordChecker	`merged_word`
NegationChecker	`typo`, `missing_ending`, `invalid_pattern`
ParticleChecker	`particle_misuse`
TenseAgreementChecker	`tense_mismatch`
RegisterChecker	`register_error`
