The Gap
Major spell-checking tools do not support Myanmar:

| Tool | Myanmar Support | Why It Fails |
|---|---|---|
| Hunspell | No usable dictionary | Requires word boundaries (spaces) — Myanmar has none |
| LanguageTool | Not supported | No Myanmar grammar rules, Java dependency |
| Grammarly | Not supported | Cloud-only, English-focused |
| Microsoft Editor | Not supported | No Myanmar language pack |
| pyspellchecker | Not supported | ASCII/Latin focused, Levenshtein-based |
| SymSpell | Not supported | Requires pre-segmented words |
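
The whitespace problem in the table is easy to reproduce. The snippet below is a minimal illustration, not part of mySpellChecker: it applies plain whitespace tokenization, the assumption most generic spell checkers rely on, to a short Myanmar sentence (roughly "I go to school") written without spaces, as is usual.

```python
# A short Myanmar sentence with no spaces between words.
text = "ကျွန်တော်ကျောင်းသွားတယ်"

# Whitespace tokenization, as assumed by most generic spell checkers.
tokens = text.split()

# The whole sentence comes back as one "word", so a dictionary lookup
# on it can never succeed.
print(len(tokens))
```

A word-level dictionary is therefore useless until the text has been segmented by some other means.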
Previous Attempts
| Project | Status | Limitation |
|---|---|---|
| mySpellCorrect (2022) | Dormant | GitHub scripts only, not pip-installable, limited to character substitution |
| myoooext (ThanLwinSoft) | Dormant | OpenOffice extension from the 2010s, word list drawn from Judson’s 1918 dictionary |
| Hunspell Myanmar dict | Unavailable | Broken links, unconfirmed compatibility |
What Makes mySpellChecker Different
mySpellChecker is not a port of an English spell checker; it was designed from the ground up for Myanmar script.

Syllable-First Architecture
Traditional spell checkers split text by whitespace, which fails entirely for Myanmar. mySpellChecker uses a syllable-first pipeline that works without spaces.

End-to-End Pipeline
Everything you need ships in one pip install:
| Capability | What It Does |
|---|---|
| Dictionary Building | Build optimized SQLite dictionaries from your own corpus (CLI + Python API) |
| Syllable Validation | Regex-based syllable segmentation with Cython acceleration |
| Word Validation | SymSpell O(1) lookup with phonetic matching |
| Context Checking | Bigram/trigram N-gram probabilities for real-word error detection |
| Grammar Checking | YAML-based rules with POS sequence validation |
| Homophone Detection | Context-aware homophone confusion resolution |
| POS Tagging | Pluggable: rule-based, Viterbi HMM, or transformer |
| NER | Named Entity Recognition for Myanmar text |
| Morphology | Stemming, reduplication detection, compound analysis |
| Segmentation | Syllable (regex) + word (myword/CRF/transformer) |
| Zawgyi Support | Auto-detection and conversion (via Google’s myanmartools) |
| Text Normalization | NFC normalization, diacritic reordering, zero-width removal |
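
The regex-based syllable segmentation listed in the table can be illustrated with a small sketch. This is not mySpellChecker's actual implementation. The rule is a simplified assumption: start a new syllable at a consonant that is neither stacked under a virama (U+1039) nor closed by an asat (U+103A). The name split_syllables is hypothetical. A production segmenter would also need to handle independent vowels, digits, punctuation, and non-Myanmar text.

```python
import re

# A syllable starts at a consonant (U+1000..U+1021) unless that consonant is
# stacked (preceded by virama U+1039) or is a syllable-final consonant
# (followed by asat U+103A or virama U+1039).
SYLLABLE_START = re.compile(r"(?<!\u1039)[\u1000-\u1021](?![\u103A\u1039])")


def split_syllables(text: str) -> list[str]:
    """Split Myanmar text into syllables using the simplified rule above."""
    starts = [m.start() for m in SYLLABLE_START.finditer(text)]
    # Keep any leading characters that precede the first detected syllable.
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    bounds = starts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


print(" | ".join(split_syllables("ကျွန်တော်ကျောင်းသွားတယ်")))
```

Because the rule relies only on the characters themselves, it needs no spaces and no dictionary, which is why syllables make a workable first unit of validation.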
AI-Powered Validation (BYOM)
Two optional AI strategies that you train on your own corpus:

| Strategy | Approach | Speed | Output |
|---|---|---|---|
| Error Detection | Fine-tune XLM-RoBERTa for token classification | ~10ms | Error flags |
| Semantic Checking | Train RoBERTa/BERT masked language model | ~200ms | Error flags + suggestions |

Training pipelines are included (train-model, train-detector): you bring the corpus, and they handle tokenizer training, model training, and ONNX export.
Production-Ready
| Feature | Details |
|---|---|
| Async API | check_async(), check_batch_async() for web frameworks |
| Streaming | Memory-bounded processing for large files |
| Batch Processing | check_batch() with parallelization |
| Connection Pooling | Thread-safe SQLite for concurrent access |
| Docker | Multi-stage Dockerfile with GPU support |
| CLI | Full command-line interface for all operations |
| Cython | Optional C extensions for 2-10x performance |
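
The table names check_async() and check_batch_async(), but their signatures are not shown here, so the sketch below does not call them. It only illustrates the general pattern an async API of this kind typically follows: push a blocking check onto a worker thread so a web framework's event loop stays responsive. Every name in it (check_text, check_text_async, check_many) is a hypothetical stand-in, and the "check" itself merely reports positions of zero-width characters.

```python
import asyncio

ZERO_WIDTH = {"\u200b", "\u200c", "\u200d"}


def check_text(text: str) -> list[int]:
    """Hypothetical blocking checker: positions of zero-width characters."""
    return [i for i, ch in enumerate(text) if ch in ZERO_WIDTH]


async def check_text_async(text: str) -> list[int]:
    # Run the blocking call in a worker thread so the event loop is not blocked.
    return await asyncio.to_thread(check_text, text)


async def check_many(texts: list[str]) -> list[list[int]]:
    # Schedule all checks concurrently; gather returns results in input order.
    return await asyncio.gather(*(check_text_async(t) for t in texts))


results = asyncio.run(check_many(["မြန်မာ", "မြန်\u200bမာ"]))
print(results)
```

The same shape applies to batch endpoints: the caller awaits one coroutine while the work is spread across threads or processes.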
Summary
| | mySpellChecker | Generic Spell Checkers |
|---|---|---|
| Myanmar syllable validation | Built-in | Not possible |
| No-space word segmentation | myword / CRF / transformer | Requires whitespace |
| Context-aware checking | N-gram + semantic AI | LanguageTool only (no Myanmar) |
| Grammar rules | POS-aware, Myanmar-specific | Not for Myanmar |
| Dictionary building | End-to-end pipeline | Manual |
| AI training pipelines | Included | Not applicable |
| Python-native | Yes | Hunspell = C wrapper, LT = Java |
| Open source | MIT | Varies |
Acknowledgments
mySpellChecker builds on foundational work in Myanmar NLP:

- myWord by Ye Kyaw Thu - word segmentation algorithm
- myPOS by Ye Kyaw Thu - POS corpus used for CRF training
- myanmar-pos-model by Chuu Htet Naing - transformer POS tagger
- myanmar-text-segmentation-model by Chuu Htet Naing - transformer word segmenter
- myanmartools by Google - Zawgyi detection
- SymSpell4Burmese (2021) - foundational research on SymSpell for Burmese
See Also
- Architecture - System design deep-dive
- FAQ - Common questions
- Quick Start - Get started in 5 minutes