The Gap
As of 2025, major spell checking tools do not support Myanmar:

| Tool | Myanmar Support | Why It Fails |
|---|---|---|
| Hunspell | No usable dictionary | Requires word boundaries (spaces), which Myanmar lacks |
| LanguageTool | Not supported | No Myanmar grammar rules, Java dependency |
| Grammarly | Not supported | Cloud-only, English-focused |
| Microsoft Editor | Not supported | No Myanmar language pack |
| pyspellchecker | Not supported | ASCII/Latin focused, Levenshtein-based |
| SymSpell | Not supported | Requires pre-segmented words |
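A quick way to see the word-boundary problem is to run plain whitespace tokenization on both scripts. The sketch below uses only Python's built-in `str.split`; the sample sentences are illustrative and are not taken from any of the tools above.

```python
# An English sentence and a short multi-word Myanmar sentence written
# the usual way, with no spaces between words.
english = "I go to school"
myanmar = "ကျွန်တော်ကျောင်းသွားသည်"

english_tokens = english.split()
myanmar_tokens = myanmar.split()

print(len(english_tokens))  # 4
print(len(myanmar_tokens))  # 1: the whole sentence comes back as one "word"
```

Any checker that looks up whitespace-delimited tokens in a dictionary therefore receives the entire sentence as a single unknown word.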
Previous Attempts
| Project | Status | Limitation |
|---|---|---|
| mySpellCorrect (2022) | Dormant | GitHub scripts only, not pip-installable, limited to character substitution |
| myoooext (ThanLwinSoft) | Dormant | OpenOffice extension from the 2010s, dictionary based on Judson’s 1918 dictionary |
| Hunspell Myanmar dict | Unavailable | Broken links, unconfirmed compatibility |
What Makes mySpellChecker Different
mySpellChecker is not a port of an English spell checker. It was designed from the ground up for Myanmar script.

Progressive Validation Pipeline
Traditional spell checkers split text on spaces. Myanmar script does not separate words with spaces, so they fail entirely. mySpellChecker starts from syllables instead, then runs up to 10 validation strategies in layers.

End-to-End Pipeline
Everything you need ships in one pip install:
| Capability | What It Does |
|---|---|
| Dictionary Building | Build optimized SQLite dictionaries from your own corpus (CLI + Python API) |
| Syllable Validation | Regex-based syllable segmentation with Cython acceleration |
| Word Validation | SymSpell O(1) lookup with phonetic matching |
| Context Checking | Bigram/trigram N-gram probabilities for real-word error detection |
| Grammar Checking | YAML-based rules with POS sequence validation |
| Homophone Detection | Context-aware homophone confusion resolution |
| POS Tagging | Pluggable: rule-based, Viterbi HMM, or transformer |
| NER | Named Entity Recognition for Myanmar text |
| Morphology | Stemming, reduplication detection, compound analysis |
| Segmentation | Syllable (regex) + word (myword/CRF/transformer) |
| Zawgyi Support | Auto-detection and conversion (via Google’s myanmartools) |
| Text Normalization | NFC normalization, diacritic reordering, zero-width removal |
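As a rough illustration of the regex-based syllable segmentation listed in the table, the sketch below starts a new syllable at every consonant that opens one. It is a simplified assumption, not mySpellChecker's actual pattern: the name `segment_syllables` is hypothetical, and the rule only covers the basic consonant range (U+1000 to U+1021), asat (U+103A), and the stacking virama (U+1039). Digits, punctuation, independent vowels, and other edge cases are ignored.

```python
import re

CONSONANTS = "\u1000-\u1021"  # basic Myanmar consonants
ASAT = "\u103a"               # marks a syllable-final consonant
VIRAMA = "\u1039"             # stacks the following consonant under the previous one

# A consonant starts a new syllable unless it is stacked (preceded by virama)
# or closes the current syllable (followed by asat or virama).
_SYLLABLE_START = re.compile(
    f"(?<!{VIRAMA})[{CONSONANTS}](?![{ASAT}{VIRAMA}])"
)


def segment_syllables(text):
    """Split text into syllable-like chunks at each syllable-initial consonant."""
    starts = [m.start() for m in _SYLLABLE_START.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any leading text that precedes the first break
    bounds = starts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:]) if a != b]


# The word for "Myanmar", spelled with escapes so the code point order is explicit.
word = "\u1019\u103c\u1014\u103a\u1019\u102c"
print(segment_syllables(word))  # two chunks: the first four code points, then the last two
```

Working at this level needs no dictionary and no word boundaries, which is why syllables make a practical first validation layer for unsegmented text.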
AI-Powered Validation (BYOM)
Two optional AI strategies that you train on your own corpus:

| Strategy | Approach | Speed | Output |
|---|---|---|---|
| Semantic Checking | Train RoBERTa/BERT masked language model | Slow | Error flags + suggestions |
| Neural Reranker | Train MLP on suggestion features | Fast | Re-ranked suggestions |

The included training pipeline (train-model) covers the rest: you bring the corpus, and it handles tokenizer training, model training, and ONNX export.
Production-Ready
| Feature | Details |
|---|---|
| Async API | check_async(), check_batch_async() for web frameworks |
| Streaming | Memory-bounded processing for large files |
| Batch Processing | check_batch() with parallelization |
| Connection Pooling | Thread-safe SQLite for concurrent access |
| Docker | Multi-stage Dockerfile with GPU support |
| CLI | Full command-line interface for all operations |
| Cython | Optional C extensions for a 2-10x speedup |
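To make the connection pooling row more concrete, here is a minimal sketch of one common way to share SQLite connections across threads using only the standard library. It is not mySpellChecker's implementation: the class name `SQLitePool` and its methods are hypothetical, and a real dictionary backend may manage connections differently.

```python
import queue
import sqlite3
from contextlib import contextmanager


class SQLitePool:
    """A fixed-size pool of SQLite connections shared between threads."""

    def __init__(self, path, size=4):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False allows a connection to be used from a
            # thread other than the one that created it; the queue ensures
            # that only one thread holds a given connection at a time.
            self._pool.put(sqlite3.connect(path, check_same_thread=False))

    @contextmanager
    def connection(self):
        conn = self._pool.get()  # blocks while every connection is in use
        try:
            yield conn
        finally:
            self._pool.put(conn)  # always hand the connection back

    def close(self):
        """Close all idle connections; call once no thread is using the pool."""
        while not self._pool.empty():
            self._pool.get_nowait().close()
```

A worker thread would wrap each lookup in `with pool.connection() as conn:` and run its query on `conn`. Because the pool size is fixed, the number of open file handles stays bounded no matter how many threads are checking text.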
Summary
| | mySpellChecker | Generic Spell Checkers |
|---|---|---|
| Myanmar syllable validation | Built-in | Not possible |
| No-space word segmentation | myword / CRF / transformer | Requires whitespace |
| Context-aware checking | N-gram + semantic AI | LanguageTool only (no Myanmar) |
| Grammar rules | POS-aware, Myanmar-specific | Not for Myanmar |
| Dictionary building | End-to-end pipeline | Manual |
| AI training pipelines | Included | Not applicable |
| Python-native | Yes | Hunspell = C wrapper, LT = Java |
| Open source | MIT | Varies |
Acknowledgments
mySpellChecker integrates tools and research from the Myanmar NLP community:

- myWord by Ye Kyaw Thu, word segmentation algorithm
- myPOS by Ye Kyaw Thu, POS corpus used for CRF training
- myanmar-pos-model by Chuu Htet Naing, transformer POS tagger
- myanmar-text-segmentation-model by Chuu Htet Naing, transformer word segmenter
- myanmartools by Google, Zawgyi detection
- SymSpell4Burmese (2021), foundational research on SymSpell for Burmese
See Also
- Architecture - System design deep-dive
- FAQ - Common questions
- Quick Start - Get started in 5 minutes