Answers to the most common questions about mySpellChecker, from setup and configuration to performance optimization and web framework integration.

General Questions

What is mySpellChecker?

mySpellChecker is a Myanmar (Burmese) text intelligence library with a 12-strategy checking pipeline, dictionary building tools, and AI model training. It starts by breaking text into syllables (since Myanmar has no spaces between words), validates structure cheaply, then progressively moves through dictionary lookups, grammar rules, confusable detection, N-gram context, and ONNX-powered AI inference for the errors that need deeper analysis.

Why was mySpellChecker created?

It all started with an inquisitive mind while I was cleaning the database of one of my legacy pet projects, a Burmese poetry site I launched 20 years ago. I started wondering if I could analyze the data to learn how different poets used rhyme patterns, what the most common ones were, and whether I could build a rhyme pattern generator from it. When I went through the data, I found encoding issues everywhere, since we didn’t even have proper Unicode fonts back then, and spelling errors, as expected from community-contributed content.
Mind being mind, that pulled me away from my original intention, and I started wondering why there’s no spellchecker for Burmese. What makes it so difficult that no one drew their sword and made this happen? That’s where my research started, and I already sensed how difficult it would be. Turns out, that was just the tip of the iceberg. Lack of clean data, limited linguistic papers online, and the sheer complexity of the language probably drove people away from committing.
For this type of project with no guaranteed success, there are only two paths at each phase: quit early or go crazy. I took the latter and went greedily from one achievement to the next, spending my nights and weekends on it for about 4 months (all my rest time). Without AI assistance, I would have been a lot lonelier and might have taken double the time. Even though frontier LLMs still have limited Burmese NLP knowledge, I could rely heavily on them for common tasks they already excel at. That’s the biggest win of this AI era: knowing how and where to get assistance.
33 million people speak Burmese as their first language and another 10 million as their second (per Wikipedia). And yet, not a single proper writing tool exists for the language. Let me proudly put a stop to this. After many hours of trial, error, and validation (don’t just take my word for it, check the benchmark scores), I’m releasing version 1.0 of mySpellChecker under the MIT License.
It goes well beyond spell checking. From rule-based checks to AI semantic analysis, you can build your own dictionaries and train your own models. One library that handles it all. If you’re a developer interested in the Burmese language or NLP, I’d love for you to check out the library and see what you can build with it. Happy Building!

Is mySpellChecker open source?

Yes, mySpellChecker is open source and available under the MIT license.

What Python versions are supported?

mySpellChecker supports Python 3.10 and later.

Installation

Why does installation take a long time?

mySpellChecker includes Cython extensions that are compiled during installation. Compiling them needs a C++ compiler and takes extra time. If you don’t have a compiler, the library falls back to pure Python implementations (slower but functional).

Do I need a C++ compiler?

No, it’s optional but recommended. Without a compiler:
  • Installation is faster
  • Pure Python fallbacks are used
  • Performance is lower (but still functional)
With a compiler:
  • Installation compiles Cython extensions
  • Performance is significantly better
  • OpenMP parallel processing is available (Linux/macOS)

How do I install on macOS without compiler errors?

# Install Xcode command line tools
xcode-select --install

# For OpenMP support (optional)
brew install libomp

# Then install normally
pip install myspellchecker

Can I install without the optional features?

Yes, install the base package only:
pip install myspellchecker
Optional features are installed separately:
pip install myspellchecker[ai]          # Semantic checking
pip install myspellchecker[transformers] # Transformer POS tagger

Usage

How do I check Myanmar text?

from myspellchecker import SpellChecker

checker = SpellChecker()
result = checker.check("မြန်မာစာ")

if result.has_errors:
    for error in result.errors:
        print(error.text, error.suggestions)

What is the difference between validation levels?

Level    | Coverage                       | Use Case
syllable | Structural errors              | Quick validation
word     | Structural + dictionary errors | Standard checking
See benchmarks for measured performance data.
Note: Context checking is not a separate validation level. Enable it with the use_context_checker=True parameter.

How do I handle Zawgyi text?

from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig
from myspellchecker.core.config.validation_configs import ValidationConfig

# Zawgyi detection and conversion are enabled by default
# To explicitly configure:
config = SpellCheckerConfig(
    validation=ValidationConfig(
        use_zawgyi_detection=True,  # Detect Zawgyi encoding
        use_zawgyi_conversion=True,  # Convert to Unicode automatically
    )
)
checker = SpellChecker(config=config)

Can I use a custom dictionary?

Yes, you can build a custom dictionary from your own corpus:
myspellchecker build --input my_corpus.txt --output my_dict.db
Then use it:
from myspellchecker import SpellChecker
from myspellchecker.providers import SQLiteProvider

provider = SQLiteProvider(database_path="my_dict.db")
checker = SpellChecker(provider=provider)

How do I add words to the dictionary at runtime?

Currently, the dictionary is read-only at runtime. To add words:
  1. Add them to your corpus file
  2. Rebuild the dictionary
  3. Restart your application
Runtime dictionary modification is not currently supported.
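As a rough sketch of step 1, the helper below appends new words to a plain-text corpus file, one per line, and skips anything already present as a whole line. The function name `append_words` is hypothetical and not part of mySpellChecker. Whether a single occurrence is enough for a word to land in the built dictionary depends on the build settings, so treat this as a starting point only.

```python
from pathlib import Path


def append_words(corpus_path, words):
    """Append words that are not yet whole lines in the corpus; return the ones added.

    Hypothetical helper: it only edits the text file and does not call mySpellChecker.
    """
    path = Path(corpus_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    existing = set(text.splitlines())
    # dict.fromkeys removes duplicates while keeping the input order
    new_words = [w for w in dict.fromkeys(words) if w and w not in existing]
    if new_words:
        with path.open("a", encoding="utf-8") as fh:
            # Avoid gluing the first new word onto an unterminated last line
            if text and not text.endswith("\n"):
                fh.write("\n")
            fh.write("\n".join(new_words) + "\n")
    return new_words
```

After appending, rebuild with the same `myspellchecker build --input ... --output ...` command shown above and restart the application so it picks up the new database.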

Performance

How can I make spell checking faster?

  1. Use syllable-level validation only (fastest):
    from myspellchecker import SpellChecker
    from myspellchecker.core.constants import ValidationLevel
    
    text = "မြန်မာစာ"  # Sample input
    checker = SpellChecker()
    result = checker.check(text, level=ValidationLevel.SYLLABLE)
    
  2. Disable context checking:
    from myspellchecker.core.config import SpellCheckerConfig
    
    config = SpellCheckerConfig(use_context_checker=False)
    checker = SpellChecker(config=config)
    
  3. Use batch processing:
    results = checker.check_batch(texts)  # More efficient than individual calls
    
  4. Ensure Cython is compiled (run this from a source checkout of the library):
    python setup.py build_ext --inplace
    

Why is the first call slow?

The first call initializes the dictionary and loads models into memory. Subsequent calls are much faster. Consider warming up the checker:
checker = SpellChecker()
checker.check("test")  # Warm-up call
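To see how much the warm-up actually buys you, a small timing helper can compare the first call with later ones. This is a generic sketch: `time_calls` is a hypothetical name, and it accepts any callable, so you would pass `checker.check` from your own instance.

```python
import time


def time_calls(check, text, repeats=3):
    """Return (first_call_seconds, mean_later_seconds) for check(text)."""
    start = time.perf_counter()
    check(text)
    first = time.perf_counter() - start

    later = []
    for _ in range(repeats):
        start = time.perf_counter()
        check(text)
        later.append(time.perf_counter() - start)
    return first, sum(later) / len(later)
```

For example, `first, later = time_calls(checker.check, "မြန်မာစာ")` right after constructing the checker. If the two numbers are close, the cost is probably paid in the constructor rather than on the first call, and an explicit warm-up call adds little.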

How much memory does mySpellChecker use?

Configuration           | Memory Usage
Basic (SQLite provider) | ~50MB
Memory provider         | ~200MB
With semantic model     | ~500MB
With transformer POS    | ~1GB

Can I use mySpellChecker in a multi-threaded application?

The SpellChecker instance is not thread-safe by default. For multi-threaded use:
  1. Create separate instances per thread
  2. Or use a connection pool for the SQLite provider
  3. Or use the Memory provider (which is thread-safe for reads)
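Option 1 can be sketched with `threading.local`, so each worker thread lazily builds and then reuses its own instance. `PerThreadChecker` and `check_all` are hypothetical names, and the factory is any zero-argument callable, for example `SpellChecker` itself or a lambda that passes your config.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PerThreadChecker:
    """Lazily creates one checker per thread from a zero-argument factory."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def get(self):
        checker = getattr(self._local, "checker", None)
        if checker is None:
            # First use on this thread: build an instance nobody else shares
            checker = self._factory()
            self._local.checker = checker
        return checker


def check_all(texts, factory, max_workers=4):
    """Check texts on a thread pool; results come back in input order."""
    pool = PerThreadChecker(factory)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(lambda t: pool.get().check(t), texts))
```

Keep in mind that each instance carries its own memory footprint, so the figures in the memory table above multiply by the number of worker threads.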

Accuracy

How accurate is mySpellChecker?

Accuracy depends on the validation level, corpus quality, and domain. Results vary based on your dictionary and text domain. Run your own benchmarks for production accuracy numbers.
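A minimal way to run your own benchmark is to label a small sample of texts with the error strings you expect and compare them against what the checker flags. The sketch below assumes only the `result.errors` and `error.text` attributes used in the usage example above. The name `score_checker` and the sample format (text, set of expected error strings) are assumptions made for illustration.

```python
def score_checker(check, samples):
    """Compute precision, recall and F1 over (text, expected_error_texts) pairs.

    Errors are matched by their text only; positions are ignored in this sketch.
    """
    tp = fp = fn = 0
    for text, expected in samples:
        flagged = {error.text for error in check(text).errors}
        expected = set(expected)
        tp += len(flagged & expected)   # flagged and really wrong
        fp += len(flagged - expected)   # flagged but actually fine
        fn += len(expected - flagged)   # wrong but not flagged

    # Undefined ratios (zero denominator) are reported as 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Call it as `score_checker(checker.check, samples)` with samples drawn from your own domain; a few hundred labeled sentences usually say more about production behavior than generic numbers.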

Why does it mark valid words as errors?

Common reasons:
  1. Word not in dictionary: Add to custom dictionary
  2. Rare spelling variant: Check corpus coverage
  3. Foreign word: English or Pali words mixed into Myanmar text may not be in the dictionary
  4. Proper noun: Names are often flagged as unknown

Why doesn’t it catch obvious errors?

Common reasons:
  1. Real-word error: The misspelling is a valid word (enable use_context_checker=True)
  2. Validation level too low: Increase to word level
  3. Missing grammar rules: Some patterns not covered

How can I improve accuracy?

  1. Build from a larger corpus: More data = better suggestions
  2. Enable context validation: Catches real-word errors
  3. Use semantic checking: Requires AI extras
  4. Report false positives/negatives: Help improve the library

Dictionary Building

How do I build a dictionary?

# From a text file
myspellchecker build --input corpus.txt --output mydict.db

# With POS tagging
myspellchecker build --input corpus.txt --pos-tagger viterbi

What format should my corpus be?

Plain text with Myanmar content:
မြန်မာနိုင်ငံ
ကျေးဇူးတင်ပါသည်
...
Or structured formats (CSV, JSON) with specific columns.
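If your raw text is mixed with other scripts, a pre-filter like the one below can help produce the plain-text form. It is a sketch, not something the build tool requires: `clean_corpus_lines` and the `min_ratio` threshold are hypothetical, and it only looks at the main Myanmar Unicode block (U+1000 to U+109F), so it cannot tell Unicode from Zawgyi-encoded text.

```python
import re

# Main Myanmar Unicode block; the extended blocks are ignored in this sketch
MYANMAR_CHAR = re.compile(r"[\u1000-\u109F]")


def clean_corpus_lines(lines, min_ratio=0.5):
    """Yield stripped lines where at least min_ratio of non-space characters are Myanmar."""
    for line in lines:
        line = line.strip()
        chars = [c for c in line if not c.isspace()]
        if not chars:
            continue  # Skip blank lines
        myanmar = sum(1 for c in chars if MYANMAR_CHAR.match(c))
        if myanmar / len(chars) >= min_ratio:
            yield line
```

Typical use is to stream a raw file through it and write the surviving lines, one per line, to the corpus file you pass to `myspellchecker build`.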

How big should my corpus be?

Corpus size directly determines dictionary quality. Myanmar has a large vocabulary with many compound words and domain-specific terms, and small corpora miss too many of them.
Corpus Size | Coverage      | Recommendation
Under 1GB   | Minimal       | Testing and prototyping only
1GB - 10GB  | Moderate      | Personal projects, narrow domains
10GB - 50GB | Good          | Production use
50GB+       | Comprehensive | Professional, multi-domain coverage
For reference, the production dictionary (~500MB SQLite database) was built from approximately 50GB of corpus data. Don’t underestimate how much raw text you need, especially for N-gram frequency tables and confusable pair mining.

Can I use multiple corpora?

Yes, use incremental building:
myspellchecker build --input corpus1.txt --output dict.db
myspellchecker build --input corpus2.txt --output dict.db --incremental
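With more than two corpora, the same pattern can be scripted. The sketch below only assembles the argument lists, using the `build`, `--input`, `--output`, and `--incremental` flags shown above; `build_commands` is a hypothetical helper name.

```python
def build_commands(corpus_paths, db_path):
    """Return one CLI argument list per corpus; all but the first add --incremental."""
    commands = []
    for index, corpus in enumerate(corpus_paths):
        cmd = ["myspellchecker", "build", "--input", str(corpus), "--output", str(db_path)]
        if index > 0:
            # Later corpora are merged into the database created by the first run
            cmd.append("--incremental")
        commands.append(cmd)
    return commands
```

To execute them in order, loop over the result with `subprocess.run(cmd, check=True)` so a failed build stops the sequence instead of silently continuing.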

Integration

Can I use mySpellChecker in a web application?

Yes. The library is tested with FastAPI and provides check_async() for non-blocking use in async endpoints. See the Integration Guide for a FastAPI example with connection pooling and proper lifecycle management.

Is there a REST API?

The library doesn’t include a built-in API server, but it’s straightforward to wrap with FastAPI:
from fastapi import FastAPI
from myspellchecker import SpellChecker

app = FastAPI()
checker = SpellChecker()

@app.post("/check")
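# Note: FastAPI reads a bare `text: str` parameter from the query string;
# declare a Pydantic model instead if you want to accept a JSON body.
async def check(text: str):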
    result = await checker.check_async(text)
    return result.to_dict()
A full-featured web application (sarsit, စာစစ်) powered by mySpellChecker is currently in active development and coming soon at sarsit.app.

Can I use it with VS Code?

There is no VS Code extension yet. Currently, you can:
  1. Use the CLI for manual checking
  2. Create a custom script that integrates with your editor
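For option 2, one possible shape is a script that prints one `path:line: message` entry per error, a format that many editors can be configured to parse. The sketch assumes the `has_errors`, `errors`, `error.text`, and `error.suggestions` attributes from the usage example; the function name `diagnostics` is hypothetical, and suggestions are passed through `str()` because their exact type is not specified here.

```python
def diagnostics(path, lines, check):
    """Return 'path:line: text -> suggestions' strings for every error found."""
    output = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # Nothing to check on blank lines
        result = check(line)
        if not result.has_errors:
            continue
        for error in result.errors:
            suggestions = ", ".join(str(s) for s in error.suggestions) or "no suggestions"
            output.append(f"{path}:{number}: {error.text} -> {suggestions}")
    return output
```

A thin wrapper would open the file with `encoding="utf-8"`, call `diagnostics(path, fh, checker.check)`, and print each entry, which you can then wire into an editor task or save hook.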

Contributing

How can I contribute?

  1. Report bugs via GitHub issues
  2. Submit pull requests for fixes
  3. Improve documentation
  4. Share your custom dictionaries or corpora

How do I report a bug?

Open a GitHub issue with:
  1. Python version
  2. mySpellChecker version
  3. Minimal reproduction code
  4. Expected vs actual behavior