mySpellChecker provides a comprehensive suite of features for Myanmar text validation, from basic syllable checking to advanced AI-powered context analysis.

Feature Matrix

Note: Speed ratings are relative comparisons, not validated benchmarks. Actual performance depends on your dictionary size, hardware, and configuration.
Feature                 | Description                                             | Speed     | Optional
Syllable Validation     | Rule-based syllable structure checking                  | Very Fast | No
Word Validation         | Dictionary lookup with SymSpell suggestions             | Fast      | No
Context Checking        | N-gram based context validation                         | Moderate  | Yes
Grammar Checking        | POS-based syntactic validation                          | Fast      | Yes
Semantic Checking       | AI-powered deep context analysis                        | Slow      | Yes
NER                     | Named entity recognition                                | Varies    | Yes
Morphology              | Word structure analysis                                 | Very Fast | Yes
Morphological Synthesis | Compound/reduplication validation                       | Very Fast | Yes
Grammar Checkers        | Aspect/Classifier/Compound/MergedWord/Negation/Register | Fast      | Yes
Validation Strategies   | Composable validation pipeline (9 strategies)           | Varies    | Yes
Normalization           | Unified text normalization service                      | Very Fast | No
Batch Processing        | Parallel multi-text processing                          | Varies    | No
Async API               | Non-blocking async operations                           | -         | No
Streaming API           | Memory-efficient large file processing                  | Varies    | No
Segmenters              | Syllable/word/sentence segmentation                     | Very Fast | No
Suggestion Ranking      | Multi-factor suggestion scoring                         | Very Fast | No
Connection Pool         | Thread-safe connection management                       | -         | No
Homophones              | Sound-alike word detection                              | Fast      | Yes
Colloquial Variants     | Informal/formal spelling detection                      | Very Fast | Yes
i18n (Localization)     | Error messages in English/Myanmar                       | Very Fast | No

Core Features

Syllable Validation

The foundation of mySpellChecker. Validates Myanmar syllable structure using orthographic rules and dictionary lookup. Key capabilities:
  • Rule-based syllable structure validation
  • Consonant-medial-vowel pattern checking
  • Dictionary syllable lookup
  • O(1) validation performance
# Syllable validation catches ~90% of typos immediately
result = checker.check("မြန်မာ")  # Valid syllables
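
The rule-based layer can be pictured as matching syllables against a pattern of Myanmar Unicode character classes. The sketch below is a deliberately simplified illustration; the character ranges and the `looks_like_syllables` helper are ours, not the library's actual rules:

```python
import re

# Simplified Myanmar syllable shape: base consonant, optional medials,
# optional vowel/tone signs, optional final consonant with asat.
# Real orthographic rules are considerably stricter than this.
SYLLABLE = re.compile(
    r"[\u1000-\u1021]"                 # base consonant (KA..A)
    r"[\u103B-\u103E]*"                # medials (ya, ra, wa, ha)
    r"[\u102B-\u1032\u1036-\u1038]*"   # vowel signs and tone marks
    r"(?:[\u1000-\u1021]\u103A)?"      # optional final consonant + asat
)

def looks_like_syllables(text: str) -> bool:
    """True if the whole string decomposes into simplified syllables."""
    return re.fullmatch(f"(?:{SYLLABLE.pattern})+", text) is not None
```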

Word Validation

Validates complete words using dictionary lookup and the SymSpell algorithm for efficient suggestion generation. Key capabilities:
  • Dictionary word lookup
  • SymSpell O(1) suggestions
  • Edit distance calculation
  • Compound word handling
from myspellchecker import SpellChecker
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.constants import ValidationLevel

# Get word-level suggestions (level specified per-check)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(provider=provider)
result = checker.check(text, level=ValidationLevel.WORD)
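
SymSpell's O(1) lookup comes from a precomputed delete index: every dictionary word is stored under all of its deletion variants, so suggestion lookup never enumerates insertions or substitutions over the alphabet. A minimal sketch of that idea, using English words for readability and hypothetical helper names (`deletes`, `build_index`, `suggest`):

```python
def deletes(word: str, max_edits: int = 1) -> set[str]:
    """All strings formed by deleting up to max_edits characters
    (includes the word itself, so exact matches are also found)."""
    results = {word}
    frontier = {word}
    for _ in range(max_edits):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results

def build_index(dictionary) -> dict:
    """Map each delete-variant to the dictionary words producing it."""
    index = {}
    for word in dictionary:
        for d in deletes(word):
            index.setdefault(d, set()).add(word)
    return index

def suggest(query: str, index: dict) -> set[str]:
    """Candidate corrections: dictionary words sharing a delete-variant."""
    candidates = set()
    for d in deletes(query):
        candidates |= index.get(d, set())
    return candidates
```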

Context Checking

Detects “real-word errors” where a word is spelled correctly but used incorrectly in context. Key capabilities:
  • Bigram probability analysis
  • Trigram context windows
  • Statistical language modeling
  • Real-word error detection
from myspellchecker.core.config import SpellCheckerConfig

# Detects unnatural word combinations (e.g., "rice go" vs "eat rice")
config = SpellCheckerConfig(use_context_checker=True)
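
The underlying idea can be sketched with a tiny Laplace-smoothed bigram model; the function names below are illustrative, not the library's API:

```python
from collections import Counter

def train_bigrams(corpus):
    """Count unigrams and bigrams from tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams, alpha=1.0):
    """Laplace-smoothed P(word | prev)."""
    vocab = len(unigrams)
    return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab)

def flag_unlikely(sentence, unigrams, bigrams, threshold=0.01):
    """Positions whose bigram probability falls below the threshold:
    each word is correctly spelled, but the combination is improbable."""
    return [i for i in range(1, len(sentence))
            if bigram_prob(sentence[i - 1], sentence[i],
                           unigrams, bigrams) < threshold]
```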

Advanced Features

POS Tagging

Part-of-Speech tagging with multiple backend options for different accuracy/speed trade-offs. Tagger options:
Type        | Accuracy | Speed  | Dependencies
Rule-based  | ~70%     | Fast   | None
Viterbi     | ~85%     | Medium | None
Transformer | ~93%     | Slow   | transformers, torch
from myspellchecker.core.config import SpellCheckerConfig, POSTaggerConfig

config = SpellCheckerConfig(
    pos_tagger=POSTaggerConfig(tagger_type="transformer")
)

Grammar Checking

Rule-based syntactic validation using POS tags to detect grammatical errors. Key capabilities:
  • Particle usage validation
  • Verb-modifier agreement
  • Sentence structure checking
  • Custom grammar rule support
from myspellchecker.core.config import SpellCheckerConfig

# Detects particle errors like မှာ vs မှ
config = SpellCheckerConfig(use_rule_based_validation=True)

Grammar Engine

Comprehensive syntactic rule checker coordinating six specialized checkers. Key capabilities:
  • Particle typo detection
  • Medial confusion detection (ျ vs ြ)
  • POS sequence validation
  • Verb-particle agreement
  • Configurable confidence thresholds
from myspellchecker.grammar import SyntacticRuleChecker

checker = SyntacticRuleChecker(provider)
corrections = checker.check_sequence(["ကျွန်တော်", "ကျောင်း", "သွားတယ်"])

Semantic Checking

Deep learning-based context analysis using ONNX models for the highest accuracy. Key capabilities:
  • BERT/RoBERTa masked language modeling
  • Semantic context understanding
  • Confidence scoring
  • Quantized CPU inference
# Enable AI-powered checking
config = SpellCheckerConfig(
    semantic=SemanticConfig(
        model_path="path/to/model.onnx",
        tokenizer_path="path/to/tokenizer"
    )
)

Performance Features

Batch Processing

Efficient processing of multiple texts with parallelization. Key capabilities:
  • Cython-optimized processing
  • OpenMP parallelization
  • Batch result aggregation
  • Memory-efficient streaming
# Process thousands of texts efficiently
results = checker.check_batch(texts)
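
Conceptually, batch checking maps a per-text check over the inputs in parallel while preserving order. A stand-in sketch using the standard library (the real implementation uses Cython and OpenMP; `check_fn` here is any per-text callable such as a checker's check method):

```python
from concurrent.futures import ThreadPoolExecutor

def check_batch(texts, check_fn, max_workers=4):
    """Run a check function over many texts in parallel.

    pool.map preserves input order, so results[i] always
    corresponds to texts[i] regardless of completion order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(check_fn, texts))
```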

Async API

Non-blocking async operations for web applications. Key capabilities:
  • Native async/await support
  • FastAPI/Starlette integration
  • Concurrent request handling
  • Async batch processing
# Non-blocking spell checking
result = await checker.check_async(text)
results = await checker.check_batch_async(texts)
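
The pattern behind concurrent batch checking can be sketched with plain `asyncio.gather`; the coroutine below is a stand-in that simulates an awaitable per-text check:

```python
import asyncio

async def check_async(text: str) -> dict:
    """Stand-in for an async per-text check (simulated I/O wait)."""
    await asyncio.sleep(0)  # yield control to the event loop
    return {"text": text, "errors": []}

async def check_batch_async(texts):
    """Check many texts concurrently; gather returns results in order."""
    return await asyncio.gather(*(check_async(t) for t in texts))
```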

Integration Features

Connection Pool

Thread-safe database connection management for high-concurrency scenarios. Key capabilities:
  • Configurable min/max pool size
  • Automatic connection health checks
  • Connection aging and recreation
  • Pool statistics and monitoring
from myspellchecker.providers.connection_pool import ConnectionPool

pool = ConnectionPool("/path/to/db.sqlite", min_size=2, max_size=10)
with pool.checkout() as conn:
    cursor = conn.cursor()
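
A checkout/return pool can be built from a blocking queue: checkout blocks until a connection is free, and the context manager guarantees the connection is returned even on error. A minimal stdlib sketch (no health checks, aging, or statistics, unlike the real pool):

```python
import sqlite3
import queue
from contextlib import contextmanager

class MiniPool:
    """Minimal thread-safe SQLite connection pool (illustrative only)."""

    def __init__(self, path: str, size: int = 2):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(sqlite3.connect(path, check_same_thread=False))

    @contextmanager
    def checkout(self):
        conn = self._pool.get()   # blocks until a connection is free
        try:
            yield conn
        finally:
            self._pool.put(conn)  # always returned, even on exceptions
```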

Segmenters

Multiple text segmentation strategies for Myanmar text. Segmenter types:
Type             | Description          | Use Case
DefaultSegmenter | Production segmenter | General use
RegexSegmenter   | Rule-based syllables | Lightweight
from myspellchecker.segmenters import DefaultSegmenter

segmenter = DefaultSegmenter(word_engine="myword")
syllables = segmenter.segment_syllables("မြန်မာစာ")

Homophones Detection

Detects sound-alike words that may be confused in context.
from myspellchecker.core.homophones import HomophoneChecker

checker = HomophoneChecker()
homophones = checker.get_homophones("ကျား")  # Returns set of homophones
has_match = checker.has_homophone("ကြား")     # Returns True if homophones exist

Colloquial Variant Handling

Detects colloquial (informal) spellings and suggests standard forms. Key capabilities:
  • Colloquial form detection
  • Standard form suggestion
  • Configurable strictness levels
from myspellchecker.text.phonetic_data import is_colloquial_variant, get_standard_forms

# Check if word is colloquial
is_colloquial_variant("ကျနော်")  # True

# Get standard form
get_standard_forms("ကျနော်")  # {"ကျွန်တော်"}
Configuration:
from myspellchecker.core.config.validation_configs import ValidationConfig

config = ValidationConfig(
    colloquial_strictness="lenient",  # "strict", "lenient", or "off"
    colloquial_info_confidence=0.3,
)
Strictness | Behavior
strict     | Flag all colloquial variants as errors
lenient    | Accept with informational note (default)
off        | No special handling

Internationalization (i18n)

Localized error messages in English and Myanmar.
from myspellchecker.core.i18n import set_language, get_message

# Set language to Myanmar
set_language("my")

# Get localized message
get_message("invalid_syllable")
# Output: စာလုံးပေါင်း မမှန်ကန်ပါ
Supported languages: "en" (English), "my" (Myanmar)

Streaming API

Memory-efficient stream processing for large documents with progress callbacks. Key capabilities:
  • Generator-based synchronous streaming
  • Async iteration support
  • Progress callbacks and statistics
  • Memory limits with backpressure
  • Cross-sentence context validation
from myspellchecker.core.streaming import StreamingChecker

streaming = StreamingChecker(checker)
with open("large_file.txt") as f:
    for result in streaming.check_stream(f):
        if result.response.has_errors:
            process(result)
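
The generator pattern behind this is simple: each result is yielded as soon as it is ready, so memory stays bounded by a single result rather than the whole file. An illustrative sketch with a stand-in check function and an optional progress callback:

```python
def check_stream(lines, check_fn, on_progress=None):
    """Lazily check an iterable of lines, yielding one result at a time.

    lines can be an open file object; check_fn stands in for a
    per-line checker. on_progress, if given, receives the running count.
    """
    for count, line in enumerate(lines, start=1):
        yield check_fn(line.rstrip("\n"))
        if on_progress:
            on_progress(count)
```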

Custom Providers

Pluggable storage backends for different use cases.
from myspellchecker.providers import MemoryProvider, SQLiteProvider

# High-speed in-memory
checker = SpellChecker(provider=MemoryProvider())

# Disk-based for large dictionaries
checker = SpellChecker(provider=SQLiteProvider())

Feature Comparison by Use Case

Real-Time Typing

from myspellchecker import SpellChecker
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.constants import ValidationLevel

# Fastest: syllable-only validation (level specified per-check)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(provider=provider)
result = checker.check(text, level=ValidationLevel.SYLLABLE)

Document Checking

from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.constants import ValidationLevel

# Balanced: word + context
config = SpellCheckerConfig(
    use_context_checker=True,
    use_rule_based_validation=True,
)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)
result = checker.check(text, level=ValidationLevel.WORD)

Quality Assurance

# Thorough: full validation with AI
config = SpellCheckerConfig(
    use_context_checker=True,
    semantic=SemanticConfig(model_path="..."),
)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)
# Use word-level validation for thorough checking
result = checker.check(text, level=ValidationLevel.WORD, use_semantic=True)

High-Volume Processing

# Optimized for throughput
from myspellchecker import SpellChecker
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.config import SpellCheckerConfig

config = SpellCheckerConfig(use_context_checker=False)  # Faster
provider = SQLiteProvider(pool_max_size=10)
checker = SpellChecker(config=config, provider=provider)
results = checker.check_batch(texts)

Feature Dependencies

  +------------------------+
  | Syllable Validation    |  [core]
  +-----------+------------+
              |
              v
  +------------------------+
  | Word Validation        |  [core]
  +-----------+------------+
              |
         +----+----+
         |         |
         v         v
  +-------------+  +------------------+
  | Context     |  | Grammar          |  [advanced]
  | Checking    |  | Checking         |
  +------+------+  +--------+---------+
         |                  |
         v                  v
  +-------------+  +------------------+
  | Semantic    |  | POS Tagging      |  [advanced]
  | Checking    |  |                  |
  +-------------+  +------------------+
    [ai]
Legend:
  • [core]: Core features (always available)
  • [advanced]: Advanced features (optional)
  • [ai]: AI features (requires extra dependencies)

Text Processing Features

Named Entity Recognition

Identifies names, locations, and organizations to reduce false positives. Key capabilities:
  • Heuristic-based NER (fast, ~70% accuracy)
  • Transformer-based NER (~93% accuracy)
  • Hybrid mode with automatic fallback
  • Entity filtering for spell checking
from myspellchecker.core.config import SpellCheckerConfig
from myspellchecker.text.ner_model import NERConfig

config = SpellCheckerConfig(
    ner=NERConfig(enabled=True, model_type="heuristic")
)

Morphology Analysis

Word structure analysis for POS inference and OOV recovery. Key capabilities:
  • Suffix-based POS guessing
  • Word decomposition (root + suffixes)
  • Multi-POS support for ambiguous words
  • Numeral detection
  • Productive reduplication validation (AA, AABB, ABAB patterns)
  • Compound word synthesis (DP-based splitting into known morphemes)
  • Morpheme-level suggestions (correct typos inside compounds)
from myspellchecker.text.morphology import MorphologyAnalyzer
from myspellchecker.text.reduplication import ReduplicationEngine
from myspellchecker.text.compound_resolver import CompoundResolver

# OOV analysis (existing)
analyzer = MorphologyAnalyzer()
result = analyzer.analyze_word("စားခဲ့သည်")
print(result.root)      # "စား"
print(result.suffixes)  # ["ခဲ့", "သည်"]

# Reduplication validation (NEW)
engine = ReduplicationEngine(segmenter=segmenter)
result = engine.analyze("ကောင်းကောင်း", dict_check, freq_check, pos_check)
# Valid AA reduplication of "ကောင်း"

# Compound synthesis (NEW)
resolver = CompoundResolver(segmenter=segmenter)
result = resolver.resolve("ကျောင်းသား", dict_check, freq_check, pos_check)
# Valid N+N compound: ["ကျောင်း", "သား"]
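
Suffix-based decomposition can be sketched as greedy right-to-left stripping against a known suffix list. The list below is a tiny illustrative subset; the library's morphology rules are far richer:

```python
# Illustrative subset of Myanmar verbal suffixes (not the full rule set).
SUFFIXES = ["သည်", "ခဲ့", "တယ်", "မယ်", "ပါ"]

def decompose(word: str):
    """Greedily strip known suffixes from the right.

    Returns (root, suffixes) with suffixes in left-to-right order.
    The length check keeps at least one character as the root.
    """
    suffixes = []
    changed = True
    while changed:
        changed = False
        for s in SUFFIXES:
            if word.endswith(s) and len(word) > len(s):
                suffixes.insert(0, s)   # prepend to preserve surface order
                word = word[: -len(s)]
                changed = True
                break
    return word, suffixes
```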

Text Utilities

Specialized utilities for Myanmar text processing. Key capabilities:
  • Stemmer: Rule-based suffix stripping with caching
  • Phonetic Hasher: Sound-based fuzzy matching
  • Tone Disambiguator: Context-based tone resolution
  • Zawgyi Detection: Legacy encoding detection
from myspellchecker.text.stemmer import Stemmer
from myspellchecker.text.phonetic import PhoneticHasher

stemmer = Stemmer()
hasher = PhoneticHasher()

Grammar Features

Suggestion Ranking

Multi-factor ranking system for spelling suggestions. Ranker types:
Ranker               | Primary Factor            | Use Case
DefaultRanker        | Edit distance + frequency | General use
FrequencyFirstRanker | Corpus frequency          | Autocomplete
PhoneticFirstRanker  | Phonetic similarity       | Myanmar text
UnifiedRanker        | Multi-source              | Comprehensive
from myspellchecker.algorithms.ranker import FrequencyFirstRanker

ranker = FrequencyFirstRanker()
symspell = SymSpell(provider, ranker=ranker)
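
A multi-factor ranker combines signals into one score, e.g. penalizing edit distance while rewarding corpus frequency. An illustrative weighted-score sketch (the weights and helper names are ours, not the library's):

```python
import math

def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def rank_suggestions(query, candidates, freq, w_dist=1.0, w_freq=0.5):
    """Sort candidates: closer edits first, frequency breaks ties."""
    def score(word):
        return (-w_dist * edit_distance(query, word)
                + w_freq * math.log1p(freq.get(word, 0)))
    return sorted(candidates, key=score, reverse=True)
```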

Grammar Checkers

Six specialized checkers for Myanmar grammar validation.
Checker           | Purpose
AspectChecker     | Verb aspect markers
ClassifierChecker | Numeral classifiers
CompoundChecker   | Compound words
MergedWordChecker | Merged particle+verb detection
NegationChecker   | Negation patterns
RegisterChecker   | Formal/colloquial register
from myspellchecker.grammar.checkers.aspect import AspectChecker
from myspellchecker.grammar.checkers.register import RegisterChecker

aspect_checker = AspectChecker()
register_checker = RegisterChecker()

Text Normalization

Unified normalization service for consistent text processing. Key capabilities:
  • Purpose-specific normalization methods
  • Zawgyi detection and conversion
  • Unicode NFC normalization
  • Myanmar diacritic reordering
from myspellchecker.text.normalization_service import get_normalization_service

service = get_normalization_service()
normalized = service.for_spell_checking(text)
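
As a minimal illustration of the NFC step, the standard library already covers Unicode composition; the real service additionally handles Zawgyi detection/conversion and Myanmar diacritic reordering, which this sketch omits:

```python
import unicodedata

def normalize_for_checking(text: str) -> str:
    """NFC-normalize and collapse whitespace before spell checking.

    NFC composes base characters with combining marks so that
    visually identical strings compare equal.
    """
    return " ".join(unicodedata.normalize("NFC", text).split())
```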

Validation Strategies

Strategy-based validation pipeline for composable error detection. Strategies (by priority):
Strategy       | Priority | Purpose
ToneValidation | 10       | Tone mark disambiguation
Orthography    | 15       | Orthographic error detection
SyntacticRule  | 20       | Grammar rule checking
POSSequence    | 30       | POS sequence validation
Question       | 40       | Question structure
Homophone      | 45       | Sound-alike detection
NgramContext   | 50       | N-gram probability
ErrorDetection | 65       | AI token classification (ONNX)
Semantic       | 70       | AI-powered validation
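
The priority numbers define execution order: lower runs first, and each strategy contributes its findings independently. A minimal composable-pipeline sketch (the class and function names are illustrative, not the library's API):

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass(order=True)
class Strategy:
    priority: int                                  # only field used for sorting
    name: str = field(compare=False)
    run: Callable[[str], list] = field(compare=False)

def run_pipeline(text: str, strategies) -> list:
    """Run strategies in ascending priority order, collecting all findings."""
    findings = []
    for strategy in sorted(strategies):
        findings.extend(strategy.run(text))
    return findings
```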

Architecture

Dependency Injection

Lightweight DI system for component management. Key components:
  • ServiceContainer for lazy initialization
  • Factory functions for component creation
  • Singleton and transient service support
  • Thread-safe service resolution
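
The container pattern can be sketched in a few lines: factories are registered up front, singletons are created lazily on first resolve, and transients get a fresh instance each time. This is an illustration of the pattern, not the library's `ServiceContainer` (thread-safety omitted for brevity):

```python
class MiniContainer:
    """Minimal DI container: lazy singletons and transient factories."""

    def __init__(self):
        self._factories = {}
        self._singletons = {}

    def register(self, name, factory, singleton=True):
        self._factories[name] = (factory, singleton)

    def resolve(self, name):
        factory, singleton = self._factories[name]
        if not singleton:
            return factory()                        # transient: fresh instance
        if name not in self._singletons:
            self._singletons[name] = factory()      # lazy init on first use
        return self._singletons[name]
```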

Reference

Rules System

YAML configuration files for linguistic rules. Key files:
  • particles.yaml - 91 linguistic particles
  • typo_corrections.yaml - Common typo patterns
  • morphology.yaml - Suffix/prefix patterns
  • morphotactics.yaml - Compound word POS pattern rules
  • aspects.yaml - Verb aspect markers
  • classifiers.yaml - Numeral classifiers
  • register.yaml - Formal/colloquial mappings

Guides

Configuration Guide

Comprehensive configuration options. Topics:
  • SpellCheckerConfig and nested configs
  • Pre-defined configuration profiles
  • Loading from files and environment

Logging Guide

Centralized logging system. Features:
  • Development and production modes
  • JSON structured logging
  • Module-specific log levels
  • LoggerMixin for classes

Training Features

Training Pipeline

End-to-end pipeline for training custom semantic models. Pipeline stages:
  1. Tokenizer Training (Byte-Level BPE)
  2. Model Training (RoBERTa/BERT MLM)
  3. ONNX Export (quantized)
from myspellchecker.training import TrainingPipeline, TrainingConfig

config = TrainingConfig(
    input_file="corpus.txt",
    output_dir="./models/",
    architecture="roberta",
    epochs=5,
)
pipeline = TrainingPipeline()
model_path = pipeline.run(config)

Text Validation

Comprehensive Myanmar text quality validation with 30+ validation categories. Key capabilities:
  • Structural validation (syllable structure, encoding)
  • Zawgyi artifact detection
  • Quality filtering (fragments, truncation)
  • Known invalid word detection
from myspellchecker.text.validator import validate_word

is_valid = validate_word("ကျောင်း")
if is_valid:
    print("Word is valid")

Next Steps