Feature Matrix
Note: Speed ratings are relative comparisons, not validated benchmarks. Actual performance depends on your dictionary size, hardware, and configuration.
| Feature | Description | Speed | Optional |
|---|---|---|---|
| Syllable Validation | Rule-based syllable structure checking | Very Fast | No |
| Word Validation | Dictionary lookup with SymSpell suggestions | Fast | No |
| Context Checking | N-gram based context validation | Moderate | Yes |
| Grammar Checking | POS-based syntactic validation | Fast | Yes |
| Semantic Checking | AI-powered deep context analysis | Slow | Yes |
| NER | Named entity recognition | Varies | Yes |
| Morphology | Word structure analysis | Very Fast | Yes |
| Morphological Synthesis | Compound/reduplication validation | Very Fast | Yes |
| Grammar Checkers | Aspect/Classifier/Compound/MergedWord/Negation/Register | Fast | Yes |
| Validation Strategies | Composable validation pipeline (9 strategies) | Varies | Yes |
| Normalization | Unified text normalization service | Very Fast | No |
| Batch Processing | Parallel multi-text processing | Varies | No |
| Async API | Non-blocking async operations | - | No |
| Streaming API | Memory-efficient large file processing | Varies | No |
| Segmenters | Syllable/word/sentence segmentation | Very Fast | No |
| Suggestion Ranking | Multi-factor suggestion scoring | Very Fast | No |
| Connection Pool | Thread-safe connection management | - | No |
| Homophones | Sound-alike word detection | Fast | Yes |
| Colloquial Variants | Informal/formal spelling detection | Very Fast | Yes |
| i18n (Localization) | Error messages in English/Myanmar | Very Fast | No |
Core Features
Syllable Validation
The foundation of mySpellChecker. Validates Myanmar syllable structure using orthographic rules and dictionary lookup.

Key capabilities:

- Rule-based syllable structure validation
- Consonant-medial-vowel pattern checking
- Dictionary syllable lookup
- O(1) validation performance
Word Validation
Validates complete words using dictionary lookup and the SymSpell algorithm for efficient suggestion generation.

Key capabilities:

- Dictionary word lookup
- SymSpell O(1) suggestions
- Edit distance calculation
- Compound word handling
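
To illustrate why SymSpell-style lookup is cheap at query time, here is a minimal sketch of the symmetric-delete idea: delete variants of every dictionary word are precomputed, so a query only needs a handful of hash lookups. The function names (`deletes`, `build_index`, `suggest`) are hypothetical and are not mySpellChecker's API. The sketch uses Latin text for readability and skips the final edit-distance verification that a full implementation would apply to the candidates.

```python
from itertools import combinations


def deletes(word: str, max_distance: int = 1) -> set:
    """Return the word plus every string formed by deleting up to max_distance characters."""
    variants = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        # Choose which character positions to keep; the rest are deleted.
        for keep in combinations(range(len(word)), len(word) - k):
            variants.add("".join(word[i] for i in keep))
    return variants


def build_index(dictionary, max_distance: int = 1) -> dict:
    """Map each delete variant to the dictionary words that produce it."""
    index = {}
    for word in dictionary:
        for variant in deletes(word, max_distance):
            index.setdefault(variant, set()).add(word)
    return index


def suggest(term: str, index: dict, max_distance: int = 1) -> set:
    """Collect candidate words whose delete variants overlap with the term's."""
    candidates = set()
    for variant in deletes(term, max_distance):
        candidates |= index.get(variant, set())
    return candidates


index = build_index(["hello", "help", "world"])
print(sorted(suggest("helo", index)))  # ['hello', 'help']
```

Candidates found this way can be up to twice `max_distance` edits away from the query, which is why a real implementation would still compute the exact edit distance before ranking them.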
Context Checking
Detects “real-word errors” where a word is spelled correctly but used incorrectly in context.

Key capabilities:

- Bigram probability analysis
- Trigram context windows
- Statistical language modeling
- Real-word error detection
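
As a rough illustration of bigram-based real-word error detection, the sketch below estimates how likely a word is given its predecessor, using add-alpha smoothing over a toy corpus. The corpus, the function names, and the smoothing choice are assumptions made for this example. They do not describe the library's actual language model.

```python
from collections import Counter


def train(sentences):
    """Count bigrams and their left-hand contexts over tokenized sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for tokens in sentences:
        vocab.update(tokens)
        contexts.update(tokens[:-1])             # tokens that have a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return contexts, bigrams, len(vocab)


def bigram_probability(prev, word, contexts, bigrams, vocab_size, alpha=1.0):
    """P(word | prev) with add-alpha smoothing, so unseen pairs get a small nonzero score."""
    return (bigrams[(prev, word)] + alpha) / (contexts[prev] + alpha * vocab_size)


corpus = [
    ["a", "piece", "of", "cake"],
    ["peace", "and", "quiet"],
    ["a", "piece", "of", "pie"],
]
contexts, bigrams, vocab_size = train(corpus)

likely = bigram_probability("a", "piece", contexts, bigrams, vocab_size)
unlikely = bigram_probability("a", "peace", contexts, bigrams, vocab_size)
print(likely > unlikely)  # True
```

Both "piece" and "peace" are valid dictionary words, so only the context score separates them. A checker could flag a word when a confusable alternative scores much higher in the same position.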
Advanced Features
POS Tagging
Part-of-Speech tagging with multiple backend options for different accuracy/speed trade-offs.

Tagger options:

| Type | Accuracy | Speed | Dependencies |
|---|---|---|---|
| Rule-based | ~70% | Fast | None |
| Viterbi | ~85% | Medium | None |
| Transformer | ~93% | Slow | transformers, torch |
Grammar Checking
Rule-based syntactic validation using POS tags to detect grammatical errors.

Key capabilities:

- Particle usage validation
- Verb-modifier agreement
- Sentence structure checking
- Custom grammar rule support
Grammar Engine
Comprehensive syntactic rule checker coordinating six specialized checkers.

Key capabilities:

- Particle typo detection
- Medial confusion detection (ျ vs ြ)
- POS sequence validation
- Verb-particle agreement
- Configurable confidence thresholds
Semantic Checking
Deep learning-based context analysis using ONNX models for the highest accuracy.

Key capabilities:

- BERT/RoBERTa masked language modeling
- Semantic context understanding
- Confidence scoring
- Quantized CPU inference
Performance Features
Batch Processing
Efficient processing of multiple texts with parallelization.

Key capabilities:

- Cython-optimized processing
- OpenMP parallelization
- Batch result aggregation
- Memory-efficient streaming
Async API
Non-blocking async operations for web applications.

Key capabilities:

- Native async/await support
- FastAPI/Starlette integration
- Concurrent request handling
- Async batch processing
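
The general pattern behind non-blocking checking can be sketched with the standard library alone: a blocking check is moved to a worker thread so the event loop stays free, and several texts are checked concurrently. `check_text` below is a hypothetical stand-in for a real checker call, and `check_many` is not part of the library's API.

```python
import asyncio


def check_text(text: str) -> list:
    """Hypothetical blocking checker: returns tokens missing from a tiny word list."""
    known = {"hello", "world"}
    return [token for token in text.split() if token not in known]


async def check_many(texts):
    """Run the blocking checker in worker threads and gather results in input order."""
    tasks = [asyncio.to_thread(check_text, text) for text in texts]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    results = asyncio.run(check_many(["hello world", "helo world"]))
    print(results)  # [[], ['helo']]
```

`asyncio.to_thread` requires Python 3.9 or later. Inside a FastAPI or Starlette handler the same coroutine could be awaited directly instead of going through `asyncio.run`.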
Integration Features
Connection Pool
Thread-safe database connection management for high-concurrency scenarios.

Key capabilities:

- Configurable min/max pool size
- Automatic connection health checks
- Connection aging and recreation
- Pool statistics and monitoring
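
For readers unfamiliar with pooling, a fixed-size pool can be sketched with a thread-safe queue. This example assumes SQLite as the backing store and uses a hypothetical `SimplePool` class. It omits the min/max sizing, connection aging, and statistics listed above.

```python
import queue
import sqlite3
from contextlib import contextmanager


class SimplePool:
    """Fixed-size pool: threads borrow a connection and always hand it back."""

    def __init__(self, database: str, size: int = 2):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets whichever thread borrows the connection use it.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self, timeout: float = 5.0):
        conn = self._idle.get(timeout=timeout)  # blocks until a connection is free
        try:
            conn.execute("SELECT 1")  # cheap health check; raises if the connection is unusable
            yield conn
        finally:
            self._idle.put(conn)  # returned to the pool even if the caller raised

    def close(self):
        """Close every idle connection."""
        while not self._idle.empty():
            self._idle.get_nowait().close()


pool = SimplePool(":memory:", size=2)
with pool.connection() as conn:
    print(conn.execute("SELECT 1 + 1").fetchone()[0])  # 2
pool.close()
```

A production pool would also replace connections that fail the health check rather than returning them to the queue.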
Segmenters
Multiple text segmentation strategies for Myanmar text.

Segmenter types:

| Type | Description | Use Case |
|---|---|---|
| DefaultSegmenter | Production segmenter | General use |
| RegexSegmenter | Rule-based syllables | Lightweight |
Homophones Detection
Detects sound-alike words that may be confused in context.
Colloquial Variant Handling
Detects colloquial (informal) spellings and suggests standard forms.

Key capabilities:

- Colloquial form detection
- Standard form suggestion
- Configurable strictness levels

| Strictness | Behavior |
|---|---|
| strict | Flag all colloquial variants as errors |
| lenient | Accept with informational note (default) |
| off | No special handling |
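
The three strictness levels in the table can be read as a small decision function. The sketch below is illustrative only: the function name, the result dictionary, and the English example mapping are assumptions, not the library's data or API.

```python
def check_colloquial(word, standard_forms, strictness="lenient"):
    """Return a finding for a colloquial variant, or None when nothing should be reported."""
    if strictness not in {"strict", "lenient", "off"}:
        raise ValueError(f"unknown strictness: {strictness!r}")
    if strictness == "off" or word not in standard_forms:
        return None
    return {
        "word": word,
        "suggestion": standard_forms[word],
        # strict reports an error; lenient accepts the word with an informational note
        "severity": "error" if strictness == "strict" else "info",
    }


variants = {"gonna": "going to"}  # hypothetical colloquial-to-standard mapping
print(check_colloquial("gonna", variants, "strict")["severity"])  # error
print(check_colloquial("gonna", variants)["severity"])            # info
print(check_colloquial("gonna", variants, "off"))                 # None
```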
Internationalization (i18n)
Localized error messages in English and Myanmar.
Supported language codes: "en" (English), "my" (Myanmar)
Streaming API
Memory-efficient stream processing for large documents with progress callbacks.

Key capabilities:

- Generator-based synchronous streaming
- Async iteration support
- Progress callbacks and statistics
- Memory limits with backpressure
- Cross-sentence context validation
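
Generator-based streaming with a progress callback can be sketched as follows. The `stream_check` function and its parameters are hypothetical. The point is only that results are produced one line at a time, so the whole document never has to be held in memory.

```python
import io
from typing import Callable, Iterable, Iterator, Optional


def stream_check(
    lines: Iterable[str],
    check: Callable[[str], list],
    on_progress: Optional[Callable[[int], None]] = None,
) -> Iterator[tuple]:
    """Yield (line_number, findings) lazily, reporting progress after each line is checked."""
    for line_number, line in enumerate(lines, start=1):
        findings = check(line.rstrip("\n"))
        if on_progress is not None:
            on_progress(line_number)
        yield line_number, findings


document = io.StringIO("first line\nsecond line\n")  # stands in for a large file handle
for number, findings in stream_check(document, lambda text: text.split()):
    print(number, findings)
```

Because file objects are themselves lazy line iterators, passing an open file handle keeps memory use proportional to the longest line rather than to the file size.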
Custom Providers
Pluggable storage backends for different use cases.
Feature Comparison by Use Case
Real-Time Typing
Document Checking
Quality Assurance
High-Volume Processing
Feature Dependencies
- Green: Core features (always available)
- Blue: Advanced features (optional)
- Purple: AI features (requires extra dependencies)
Text Processing Features
Named Entity Recognition
Identifies names, locations, and organizations to reduce false positives.

Key capabilities:

- Heuristic-based NER (fast, ~70% accuracy)
- Transformer-based NER (~93% accuracy)
- Hybrid mode with automatic fallback
- Entity filtering for spell checking
Morphology Analysis
Word structure analysis for POS inference and OOV recovery.

Key capabilities:

- Suffix-based POS guessing
- Word decomposition (root + suffixes)
- Multi-POS support for ambiguous words
- Numeral detection
- Productive reduplication validation (AA, AABB, ABAB patterns)
- Compound word synthesis (DP-based splitting into known morphemes)
- Morpheme-level suggestions (correct typos inside compounds)
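
The reduplication shapes named above (AA, AABB, ABAB) can be recognized on a list of syllables with a few equality checks. This sketch only classifies the shape. Whether the base form is a real, productive word would need a dictionary lookup that is not shown. The function name and the placeholder syllables are assumptions for illustration.

```python
def reduplication_pattern(syllables):
    """Return 'AA', 'AABB', or 'ABAB' if the syllable sequence has that shape, else None."""
    parts = list(syllables)
    if len(parts) == 2 and parts[0] == parts[1]:
        return "AA"
    if len(parts) == 4:
        a, b, c, d = parts
        if a == b and c == d and a != c:
            return "AABB"
        if a == c and b == d and a != b:
            return "ABAB"
    return None


# Placeholder syllables; real input would come from a syllable segmenter.
print(reduplication_pattern(["ka", "ka"]))              # AA
print(reduplication_pattern(["ka", "ka", "la", "la"]))  # AABB
print(reduplication_pattern(["ka", "la", "ka", "la"]))  # ABAB
```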
Text Utilities
Specialized utilities for Myanmar text processing.

Key capabilities:

- Stemmer: Rule-based suffix stripping with caching
- Phonetic Hasher: Sound-based fuzzy matching
- Tone Disambiguator: Context-based tone resolution
- Zawgyi Detection: Legacy encoding detection
Grammar Features
Suggestion Ranking
Multi-factor ranking system for spelling suggestions.

Ranker types:

| Ranker | Primary Factor | Use Case |
|---|---|---|
| DefaultRanker | Edit distance + frequency | General use |
| FrequencyFirstRanker | Corpus frequency | Autocomplete |
| PhoneticFirstRanker | Phonetic similarity | Myanmar text |
| UnifiedRanker | Multi-source | Comprehensive |
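
To make "edit distance + frequency" concrete, here is one possible scoring scheme: candidates are sorted by a weighted edit distance minus a small bonus for log-scaled corpus frequency. The weights, the frequency table, and the function names are assumptions for illustration. The library's rankers may combine factors differently.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def rank(term, candidates, frequencies, distance_weight=1.0, frequency_weight=0.1):
    """Sort candidates so that close, frequent words come first (lower score is better)."""
    def score(candidate):
        distance = levenshtein(term, candidate)
        bonus = math.log1p(frequencies.get(candidate, 0))
        return distance_weight * distance - frequency_weight * bonus
    return sorted(candidates, key=score)


frequencies = {"hello": 1000, "help": 50, "hollow": 5}  # hypothetical corpus counts
print(rank("helo", ["help", "hollow", "hello"], frequencies))
# ['hello', 'help', 'hollow']
```

"hello" and "help" are both one edit away from "helo", so the frequency term breaks the tie. "hollow" is three edits away and stays last regardless of frequency at these weights.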
Grammar Checkers
Six specialized checkers for Myanmar grammar validation.

| Checker | Purpose |
|---|---|
| AspectChecker | Verb aspect markers |
| ClassifierChecker | Numeral classifiers |
| CompoundChecker | Compound words |
| MergedWordChecker | Merged particle+verb detection |
| NegationChecker | Negation patterns |
| RegisterChecker | Formal/colloquial register |
Text Normalization
Unified normalization service for consistent text processing.

Key capabilities:

- Purpose-specific normalization methods
- Zawgyi detection and conversion
- Unicode NFC normalization
- Myanmar diacritic reordering
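
The Unicode NFC step can be shown with the standard library. The sketch below covers only NFC composition plus whitespace collapsing, which is an assumption added for the example. Zawgyi conversion and Myanmar-specific diacritic reordering need dedicated logic that is not shown, and `normalize_text` is a hypothetical name.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Apply NFC composition, then collapse runs of whitespace to single spaces."""
    composed = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", composed).strip()


raw = "cafe\u0301  ok\n"  # 'e' followed by a combining acute accent
clean = normalize_text(raw)
print(len(raw), len(clean))  # 10 7
```

After normalization, visually identical strings compare equal, which is what makes dictionary lookups reliable.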
Validation Strategies
Strategy-based validation pipeline for composable error detection.

Strategies (by priority):

| Strategy | Priority | Purpose |
|---|---|---|
| ToneValidation | 10 | Tone mark disambiguation |
| Orthography | 15 | Orthographic error detection |
| SyntacticRule | 20 | Grammar rule checking |
| POSSequence | 30 | POS sequence validation |
| Question | 40 | Question structure |
| Homophone | 45 | Sound-alike detection |
| NgramContext | 50 | N-gram probability |
| ErrorDetection | 65 | AI token classification (ONNX) |
| Semantic | 70 | AI-powered validation |
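
A priority-ordered pipeline like the one in the table can be sketched with a sortable dataclass. This example assumes that lower priority numbers run first, which the table ordering suggests but does not state. The `Strategy` class and `run_pipeline` function are hypothetical, and the two toy strategies only borrow names from the table.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(order=True)
class Strategy:
    priority: int                                       # the only field used for ordering
    name: str = field(compare=False)
    run: Callable[[list], list] = field(compare=False)  # tokens -> list of messages


def run_pipeline(tokens, strategies):
    """Run every strategy in ascending priority order and collect (name, message) pairs."""
    findings = []
    for strategy in sorted(strategies):
        for message in strategy.run(tokens):
            findings.append((strategy.name, message))
    return findings


strategies = [
    Strategy(50, "NgramContext", lambda tokens: ["unlikely word pair"] if len(tokens) > 1 else []),
    Strategy(10, "ToneValidation", lambda tokens: ["ambiguous tone mark"]),
]
for name, message in run_pipeline(["token1", "token2"], strategies):
    print(name, message)
```

Keeping each strategy behind the same callable signature is what makes the pipeline composable: strategies can be added, removed, or reordered without touching the runner.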
Architecture
Dependency Injection
Lightweight DI system for component management.

Key components:

- ServiceContainer for lazy initialization
- Factory functions for component creation
- Singleton and transient service support
- Thread-safe service resolution
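
The ideas of lazy initialization, singleton versus transient lifetimes, and thread-safe resolution fit in a few lines. The `TinyContainer` class below is a generic sketch written for this page. It is not the library's ServiceContainer, and its method names are assumptions.

```python
import threading


class TinyContainer:
    """Registers factories and builds services on first use."""

    def __init__(self):
        self._factories = {}
        self._singletons = {}
        # A re-entrant lock lets a factory resolve its own dependencies while holding the lock.
        self._lock = threading.RLock()

    def register(self, name, factory, singleton=True):
        with self._lock:
            self._factories[name] = (factory, singleton)

    def resolve(self, name):
        with self._lock:
            factory, singleton = self._factories[name]
            if not singleton:
                return factory(self)  # transient: a new instance on every call
            if name not in self._singletons:
                self._singletons[name] = factory(self)  # lazy: built on first resolve
            return self._singletons[name]


container = TinyContainer()
container.register("config", lambda c: {"max_edit_distance": 2})
container.register("checker", lambda c: ("checker", c.resolve("config")), singleton=False)

print(container.resolve("config") is container.resolve("config"))    # True
print(container.resolve("checker") is container.resolve("checker"))  # False
```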
Reference
Rules System
YAML configuration files for linguistic rules.

Key files:

- particles.yaml - 91 linguistic particles
- typo_corrections.yaml - Common typo patterns
- morphology.yaml - Suffix/prefix patterns
- morphotactics.yaml - Compound word POS pattern rules
- aspects.yaml - Verb aspect markers
- classifiers.yaml - Numeral classifiers
- register.yaml - Formal/colloquial mappings
Guides
Configuration Guide
Comprehensive configuration options.

Topics:

- SpellCheckerConfig and nested configs
- Pre-defined configuration profiles
- Loading from files and environment
Logging Guide
Centralized logging system.

Features:

- Development and production modes
- JSON structured logging
- Module-specific log levels
- LoggerMixin for classes
Training Features
Training Pipeline
End-to-end pipeline for training custom semantic models.

Pipeline stages:

- Tokenizer Training (Byte-Level BPE)
- Model Training (RoBERTa/BERT MLM)
- ONNX Export (quantized)
Text Validation
Comprehensive Myanmar text quality validation with 30+ validation categories.

Key capabilities:

- Structural validation (syllable structure, encoding)
- Zawgyi artifact detection
- Quality filtering (fragments, truncation)
- Known invalid word detection
Next Steps
- Explore individual feature documentation
- Configuration Guide for enabling/disabling features
- Performance Tuning for optimization
- Rules System for customization
- Dependency Injection for architecture
- Component Standards for code patterns