Each strategy in the pipeline can be toggled, tuned, or replaced independently. This page summarizes every feature and links to its dedicated guide.
Feature Matrix
Note: Speed ratings are relative comparisons, not validated benchmarks. Actual performance depends on your dictionary size, hardware, and configuration.
| Feature | Description | Speed | Optional |
|---|---|---|---|
| Syllable Validation | Rule-based syllable structure checking | Very Fast | No |
| Word Validation | Dictionary lookup with SymSpell suggestions | Fast | No |
| Context Checking | N-gram based context validation | Moderate | Yes |
| Grammar Checking | POS-based syntactic validation | Fast | Yes |
| Semantic Checking | AI-powered deep context analysis | Slow | Yes |
| NER | Named entity recognition | Varies | Yes |
| Morphology | Word structure analysis | Very Fast | Yes |
| Morphological Synthesis | Compound/reduplication validation | Very Fast | Yes |
| Grammar Checkers | Aspect/Classifier/Compound/MergedWord/Negation/Particle/TenseAgreement/Register | Fast | Yes |
| Validation Strategies | Composable validation pipeline (12 strategies) | Varies | Yes |
| Normalization | Unified text normalization service | Very Fast | No |
| Batch Processing | Parallel multi-text processing | Varies | No |
| Async API | Non-blocking async operations | - | No |
| Streaming API | Memory-efficient large file processing | Varies | No |
| Segmenters | Syllable/word/sentence segmentation | Very Fast | No |
| Suggestion Ranking | Multi-factor suggestion scoring | Very Fast | No |
| Connection Pool | Thread-safe connection management | - | No |
| Homophones | Sound-alike word detection | Fast | Yes |
| Colloquial Variants | Informal/formal spelling detection | Very Fast | Yes |
| i18n (Localization) | Error messages in English/Myanmar | Very Fast | No |
Core Features
The foundation of mySpellChecker. Validates Myanmar syllable structure using orthographic rules and dictionary lookup.
Key capabilities:
- Rule-based syllable structure validation
- Consonant-medial-vowel pattern checking
- Dictionary syllable lookup
- O(1) validation performance
# Syllable validation catches ~90% of typos immediately
result = checker.check("မြန်မာ") # Valid syllables
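The consonant-medial-vowel pattern check can be pictured with a deliberately simplified regular expression. This is only a sketch under stated assumptions, not mySpellChecker's rule set: the character classes and the `is_valid_syllable_sequence` helper are hypothetical, and stacked consonants, independent vowels, and digits are ignored.

```python
import re

# Simplified character classes (assumed for illustration only).
CONSONANT = "[\u1000-\u1021]"  # Myanmar consonant letters KA .. A
MEDIAL = "[\u103B-\u103E]"     # medial YA, RA, WA, HA
VOWEL = "[\u102B-\u1032]"      # dependent vowel signs
TONE = "[\u1036-\u1038]"       # anusvara, dot below, visarga
ASAT = "\u103A"                # asat (vowel killer)

# consonant, optional medials, optional vowels (possibly closed by asat),
# optional tone marks, then an optional final consonant closed by asat.
SYLLABLE = (
    f"{CONSONANT}{MEDIAL}*(?:{VOWEL}+{ASAT}?)?{TONE}*"
    f"(?:{CONSONANT}{ASAT}{TONE}*)?"
)
SYLLABLE_SEQUENCE = re.compile(f"(?:{SYLLABLE})+")


def is_valid_syllable_sequence(text: str) -> bool:
    """Return True if text is one or more well-formed (simplified) syllables."""
    return SYLLABLE_SEQUENCE.fullmatch(text) is not None
```

A string that starts with a medial sign, or a bare consonant followed directly by asat, is rejected because no syllable can begin or end that way under this pattern.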
Validates complete words using dictionary lookup and the SymSpell algorithm for efficient suggestion generation.
Key capabilities:
- Dictionary word lookup
- SymSpell O(1) suggestions
- Edit distance calculation
- Compound word handling
from myspellchecker import SpellChecker
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.constants import ValidationLevel
# Get word-level suggestions (level specified per-check)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(provider=provider)
result = checker.check(text, level=ValidationLevel.WORD)
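The idea behind SymSpell-style suggestions is to precompute delete variants of every dictionary word so that a lookup needs only a few hash probes. The following is a minimal sketch of that symmetric-delete technique, not the library's implementation; `deletes`, `build_index`, `edit_distance`, and `suggest` are hypothetical names, and plain Levenshtein distance is used for verification.

```python
from collections import defaultdict


def deletes(word, max_distance):
    """All strings reachable by deleting up to max_distance characters."""
    variants = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        variants |= frontier
    return variants


def build_index(words, max_distance=1):
    """Map every delete variant to the dictionary words that produce it."""
    index = defaultdict(set)
    for word in words:
        for variant in deletes(word, max_distance):
            index[variant].add(word)
    return index


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def suggest(term, index, max_distance=1):
    """Collect candidates via shared delete variants, then verify and sort."""
    candidates = set()
    for variant in deletes(term, max_distance):
        candidates |= index.get(variant, set())
    scored = sorted((edit_distance(term, w), w) for w in candidates)
    return [w for distance, w in scored if distance <= max_distance]
```

Candidates that share a delete variant can still be farther away than `max_distance`, which is why the final verification step is kept.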
Detects “real-word errors” where a word is spelled correctly but used incorrectly in context.
Key capabilities:
- Bigram probability analysis
- Trigram context windows
- Statistical language modeling
- Real-word error detection
# Detects unnatural word combinations (e.g., "rice go" vs "eat rice")
from myspellchecker.core.config import SpellCheckerConfig
config = SpellCheckerConfig(use_context_checker=True)
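To make bigram probability analysis concrete, here is a small sketch with add-one smoothing over a toy corpus. It is an illustration under assumptions, not the library's language model: the function names, the corpus, and the threshold are all hypothetical.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count unigrams and adjacent word pairs in tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_probability(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one smoothing over the observed vocabulary."""
    vocabulary_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocabulary_size)


def flag_unlikely(tokens, unigrams, bigrams, threshold):
    """Return adjacent pairs whose smoothed probability is below threshold."""
    return [
        (prev, word)
        for prev, word in zip(tokens, tokens[1:])
        if bigram_probability(prev, word, unigrams, bigrams) < threshold
    ]
```

With a corpus in which "eat rice" is frequent and "rice go" never occurs, the second pair falls below a modest threshold and is flagged even though both words are spelled correctly.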
Advanced Features
Part-of-Speech tagging with multiple backend options for different accuracy/speed trade-offs.
Tagger options:
| Type | Accuracy | Speed | Dependencies |
|---|---|---|---|
| Rule-based | ~70% | Fast | None |
| Viterbi | ~85% | Medium | None |
| Transformer | ~93% | Slow | transformers, torch |
from myspellchecker.core.config import POSTaggerConfig, SpellCheckerConfig
config = SpellCheckerConfig(
    pos_tagger=POSTaggerConfig(tagger_type="transformer")
)
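For readers unfamiliar with the Viterbi option in the table, the sketch below shows generic Viterbi decoding over a hidden Markov model. It is not the library's tagger: the `viterbi` function, the tag set, and every probability in the usage note are made-up toy values.

```python
import math


def viterbi(tokens, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable tag sequence for tokens under an HMM."""
    if not tokens:
        return []

    def log(p):
        # Unseen events get a small floor probability instead of zero.
        return math.log(p if p > 0 else floor)

    # Each column maps tag -> (best log score ending in tag, previous tag).
    columns = [{
        t: (log(start_p.get(t, 0)) + log(emit_p[t].get(tokens[0], 0)), None)
        for t in tags
    }]
    for token in tokens[1:]:
        column = {}
        for t in tags:
            score, prev = max(
                (columns[-1][p][0] + log(trans_p[p].get(t, 0)), p) for p in tags
            )
            column[t] = (score + log(emit_p[t].get(token, 0)), prev)
        columns.append(column)

    # Backtrack from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]
```

With toy parameters in which nouns usually start a sentence and verbs usually follow nouns, `viterbi(["school", "go"], ...)` tags the first token as a noun and the second as a verb.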
Rule-based syntactic validation using POS tags to detect grammatical errors.
Key capabilities:
- Particle usage validation
- Verb-modifier agreement
- Sentence structure checking
- Custom grammar rule support
# Detects particle errors like မှာ vs မှ
from myspellchecker.core.config import SpellCheckerConfig
config = SpellCheckerConfig(use_rule_based_validation=True)
Comprehensive syntactic rule checker coordinating eight specialized checkers.
Key capabilities:
- Particle typo detection
- Medial confusion detection (ျ vs ြ)
- POS sequence validation
- Verb-particle agreement
- Configurable confidence thresholds
from myspellchecker.grammar import SyntacticRuleChecker
checker = SyntacticRuleChecker(provider)
corrections = checker.check_sequence(["ကျွန်တော်", "ကျောင်း", "သွားတယ်"])
Deep learning-based context analysis using ONNX models for the highest accuracy.
Key capabilities:
- BERT/RoBERTa masked language modeling
- Semantic context understanding
- Confidence scoring
- Quantized CPU inference
from myspellchecker.core.config import SpellCheckerConfig, SemanticConfig
# Enable AI-powered checking
config = SpellCheckerConfig(
    semantic=SemanticConfig(
        model_path="path/to/model.onnx",
        tokenizer_path="path/to/tokenizer"
    )
)
Efficient processing of multiple texts with parallelization.
Key capabilities:
- Cython-optimized processing
- OpenMP parallelization
- Batch result aggregation
- Memory-efficient streaming
# Process thousands of texts efficiently
results = checker.check_batch(texts)
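The library describes Cython and OpenMP for batching. As a rough stand-in that shows only the shape of order-preserving parallel processing, the sketch below swaps in a standard-library thread pool; `check_in_parallel` is a hypothetical helper and `check` can be any callable that takes one text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def check_in_parallel(
    check: Callable[[str], T], texts: Iterable[str], max_workers: int = 4
) -> List[T]:
    """Apply check to every text concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order the inputs were submitted.
        return list(pool.map(check, texts))
```

Threads only help when the per-text work releases the GIL (I/O or native code), which is one reason a compiled backend is attractive for this job.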
Non-blocking async operations for web applications.
Key capabilities:
- Native async/await support
- FastAPI/Starlette integration
- Concurrent request handling
- Async batch processing
# Non-blocking spell checking
result = await checker.check_async(text)
results = await checker.check_batch_async(texts)
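One common way to expose a blocking checker to async code is to push each call onto a worker thread and bound the concurrency. The sketch below illustrates that pattern with the standard library (Python 3.9+ for `asyncio.to_thread`); it is not how the library implements its async methods, and `check_sync`, `check_text_async`, and `check_many_async` are hypothetical stand-ins.

```python
import asyncio


def check_sync(text: str) -> dict:
    """Stand-in for a blocking spell-check call (hypothetical)."""
    return {"text": text, "has_errors": "??" in text}


async def check_text_async(text: str) -> dict:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(check_sync, text)


async def check_many_async(texts, limit: int = 8):
    # The semaphore caps in-flight checks; gather preserves input order.
    semaphore = asyncio.Semaphore(limit)

    async def guarded(text):
        async with semaphore:
            return await check_text_async(text)

    return await asyncio.gather(*(guarded(t) for t in texts))
```

In a FastAPI or Starlette handler, the same coroutine can be awaited directly, so one slow request does not block the others.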
Integration Features
Thread-safe database connection management for high-concurrency scenarios.
Key capabilities:
- Configurable min/max pool size
- Automatic connection health checks
- Connection aging and recreation
- Pool statistics and monitoring
from myspellchecker.providers.connection_pool import ConnectionPool
from myspellchecker.core.config import ConnectionPoolConfig
pool_config = ConnectionPoolConfig(min_size=2, max_size=10)
pool = ConnectionPool("/path/to/db.sqlite", pool_config=pool_config)
with pool.checkout() as conn:
    cursor = conn.cursor()
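The checkout pattern can be sketched with a bounded queue of SQLite connections. This is a minimal illustration, not the library's `ConnectionPool`: `SimplePool` is a hypothetical class with a fixed size and no health checks, aging, or statistics.

```python
import queue
import sqlite3
from contextlib import contextmanager


class SimplePool:
    """Fixed-size pool that lends out one connection per borrower."""

    def __init__(self, database: str, size: int = 2) -> None:
        self._connections = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets a connection move between threads;
            # the queue ensures only one borrower holds it at a time.
            self._connections.put(
                sqlite3.connect(database, check_same_thread=False)
            )

    @contextmanager
    def checkout(self, timeout: float = 5.0):
        conn = self._connections.get(timeout=timeout)  # waits for a free one
        try:
            yield conn
        finally:
            self._connections.put(conn)  # always return it to the pool

    def close(self) -> None:
        while not self._connections.empty():
            self._connections.get_nowait().close()
```

Because the connection goes back to the queue in a `finally` block, an exception inside the `with` body does not leak it.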
Multiple text segmentation strategies for Myanmar text.
Segmenter types:
| Type | Description | Use Case |
|---|---|---|
| DefaultSegmenter | Production segmenter | General use |
| RegexSegmenter | Rule-based syllables | Lightweight |
from myspellchecker.segmenters import DefaultSegmenter
segmenter = DefaultSegmenter(word_engine="myword")
syllables = segmenter.segment_syllables("မြန်မာစာ")
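Rule-based syllable segmentation can be approximated with one regular expression that finds syllable-initial consonants. The sketch below follows that widely used idea in simplified form; it is not the library's `RegexSegmenter`, and the `segment_syllables` function here is a hypothetical standalone helper that ignores several special cases.

```python
import re
from typing import List

# A syllable starts at a consonant that is not stacked (not preceded by
# virama U+1039) and is not a final consonant (not followed by asat U+103A
# or virama U+1039).
SYLLABLE_START = re.compile("(?<!\u1039)[\u1000-\u1021](?![\u103A\u1039])")


def segment_syllables(text: str) -> List[str]:
    """Cut text at every detected syllable-initial consonant."""
    starts = [m.start() for m in SYLLABLE_START.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any leading material as its own chunk
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends) if s < e]
```

A consonant followed by asat is treated as the coda of the previous syllable, so it never opens a new chunk.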
Detects sound-alike words that may be confused in context.
from myspellchecker.core.homophones import HomophoneChecker
checker = HomophoneChecker()
homophones = checker.get_homophones("ကျား") # Returns set of homophones
has_match = len(checker.get_homophones("ကြား")) > 0 # Check if homophones exist
Colloquial Variant Handling
Detects colloquial (informal) spellings and suggests standard forms.
Key capabilities:
- Colloquial form detection
- Standard form suggestion
- Configurable strictness levels
from myspellchecker.text.phonetic_data import is_colloquial_variant, get_standard_forms
# Check if word is colloquial
is_colloquial_variant("ကျနော်") # True
# Get standard form
get_standard_forms("ကျနော်") # ["ကျွန်တော်"]
Configuration:
from myspellchecker.core.config.validation_configs import ValidationConfig
config = ValidationConfig(
    colloquial_strictness="lenient",  # "strict", "lenient", or "off"
    colloquial_info_confidence=0.3,
)
| Strictness | Behavior |
|---|---|
| strict | Flag all colloquial variants as errors |
| lenient | Accept with informational note (default) |
| off | No special handling |
Internationalization (i18n)
Localized error messages in English and Myanmar.
from myspellchecker.core.i18n import set_language, get_message
# Set language to Myanmar
set_language("my")
# Get localized message
get_message("invalid_syllable")
# Output: စာလုံးပေါင်း မမှန်ကန်ပါ
Supported languages: "en" (English), "my" (Myanmar)
Memory-efficient stream processing for large documents with progress callbacks.
Key capabilities:
- Generator-based synchronous streaming
- Async iteration support
- Progress callbacks and statistics
- Memory limits with backpressure
- Cross-sentence context validation
from myspellchecker.core.streaming import StreamingChecker
streaming = StreamingChecker(checker)
with open("large_file.txt") as f:
    for result in streaming.check_stream(f):
        if result.response.has_errors:
            process(result)
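Generator-based streaming with a progress callback boils down to yielding one result per input line instead of building a list. The sketch below shows that shape only; `stream_results` is a hypothetical function, and `check` is any callable applied to each non-empty line.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")


def stream_results(
    lines: Iterable[str],
    check: Callable[[str], T],
    on_progress: Optional[Callable[[int], None]] = None,
) -> Iterator[Tuple[int, T]]:
    """Yield (line_number, result) lazily, one line at a time."""
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if text:  # skip blank lines but still report progress for them
            yield number, check(text)
        if on_progress is not None:
            on_progress(number)
```

Because nothing is computed until the consumer asks for the next item, memory use stays proportional to one line rather than to the whole file.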
Pluggable storage backends for different use cases.
from myspellchecker import SpellChecker
from myspellchecker.providers import MemoryProvider, SQLiteProvider
# High-speed in-memory
checker = SpellChecker(provider=MemoryProvider())
# Disk-based for large dictionaries
checker = SpellChecker(provider=SQLiteProvider())
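What makes backends pluggable is that the checker depends on a small interface rather than on a concrete store. The sketch below illustrates this with a structural `Protocol`; the interface, both backend classes, and the `words` table schema are assumptions made for the example and do not describe the library's provider API.

```python
import sqlite3
from typing import Dict, Protocol


class DictionaryBackend(Protocol):
    """Hypothetical minimal interface a storage backend would satisfy."""

    def get_frequency(self, word: str) -> int: ...


class MemoryBackend:
    def __init__(self, frequencies: Dict[str, int]) -> None:
        self._frequencies = dict(frequencies)

    def get_frequency(self, word: str) -> int:
        return self._frequencies.get(word, 0)


class SqliteBackend:
    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection

    def get_frequency(self, word: str) -> int:
        # Assumes a table words(word TEXT, frequency INTEGER).
        row = self._connection.execute(
            "SELECT frequency FROM words WHERE word = ?", (word,)
        ).fetchone()
        return row[0] if row else 0


def is_known(backend: DictionaryBackend, word: str) -> bool:
    """Works with any backend that implements get_frequency."""
    return backend.get_frequency(word) > 0
```

Swapping the in-memory store for the SQLite one requires no change to `is_known`, which is the property a provider abstraction is meant to give.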
Feature Comparison by Use Case
Real-Time Typing
from myspellchecker import SpellChecker
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.constants import ValidationLevel
# Fastest: syllable-only validation (level specified per-check)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(provider=provider)
result = checker.check(text, level=ValidationLevel.SYLLABLE)
Document Checking
from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.constants import ValidationLevel
# Balanced: word + context
config = SpellCheckerConfig(
    use_context_checker=True,
    use_rule_based_validation=True,
)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)
result = checker.check(text, level=ValidationLevel.WORD)
Quality Assurance
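# Thorough: full validation with AI (reuses the SpellChecker, SQLiteProvider, and ValidationLevel imports from above)
from myspellchecker.core.config import SpellCheckerConfig, SemanticConfig
config = SpellCheckerConfig(
    use_context_checker=True,
    semantic=SemanticConfig(model_path="..."),
)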
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)
# Use word-level validation for thorough checking
result = checker.check(text, level=ValidationLevel.WORD, use_semantic=True)
High-Volume Processing
# Optimized for throughput
from myspellchecker import SpellChecker
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.config import SpellCheckerConfig
config = SpellCheckerConfig(use_context_checker=False) # Faster
provider = SQLiteProvider(pool_max_size=10)
checker = SpellChecker(config=config, provider=provider)
results = checker.check_batch(texts)
Feature Dependencies
     +------------------------+
     |  Syllable Validation   |  [core]
     +-----------+------------+
                 |
                 v
     +------------------------+
     |    Word Validation     |  [core]
     +-----------+------------+
                 |
       +---------+---------+
       |                   |
       v                   v
+-------------+   +------------------+
|   Context   |   |     Grammar      |  [advanced]
|  Checking   |   |     Checking     |
+------+------+   +--------+---------+
       |                   |
       v                   v
+-------------+   +------------------+
|  Semantic   |   |   POS Tagging    |  [advanced]
|  Checking   |   |                  |
+-------------+   +------------------+
     [ai]
Legend:
- [core]: Core features (always available)
- [advanced]: Advanced features (optional)
- [ai]: AI features (requires extra dependencies)
Text Processing Features
Identifies names, locations, and organizations to reduce false positives.
Key capabilities:
- Heuristic-based NER (fast, ~70% accuracy)
- Transformer-based NER (~93% accuracy)
- Hybrid mode with automatic fallback
- Entity filtering for spell checking
from myspellchecker.core.config import SpellCheckerConfig
from myspellchecker.text.ner_model import NERConfig
config = SpellCheckerConfig(
    ner=NERConfig(enabled=True, model_type="heuristic")
)
Word structure analysis for POS inference and OOV recovery.
Key capabilities:
- Suffix-based POS guessing
- Word decomposition (root + suffixes)
- Multi-POS support for ambiguous words
- Numeral detection
- Productive reduplication validation (AA, AABB, ABAB patterns)
- Compound word synthesis (DP-based splitting into known morphemes)
- Morpheme-level suggestions (correct typos inside compounds)
from myspellchecker.text.morphology import MorphologyAnalyzer
from myspellchecker.text.reduplication import ReduplicationEngine
from myspellchecker.text.compound_resolver import CompoundResolver
# OOV analysis (existing)
analyzer = MorphologyAnalyzer()
result = analyzer.analyze_word("စားခဲ့သည်")
print(result.root) # "စား"
print(result.suffixes) # ["ခဲ့", "သည်"]
# Reduplication validation
engine = ReduplicationEngine(segmenter=segmenter)
result = engine.analyze("ကောင်းကောင်း", dict_check, freq_check, pos_check)
# Valid AA reduplication of "ကောင်း"
# Compound synthesis
resolver = CompoundResolver(segmenter=segmenter)
result = resolver.resolve("ကျောင်းသား", dict_check, freq_check, pos_check)
# Valid N+N compound: ["ကျောင်း", "သား"]
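DP-based splitting into known morphemes is essentially the classic word-break problem. The sketch below is a generic version of that dynamic program, assuming the goal is the split with the fewest pieces; `split_compound` is a hypothetical helper, not the library's `CompoundResolver`, and it ignores POS patterns and frequencies.

```python
from typing import List, Optional, Set


def split_compound(word: str, morphemes: Set[str]) -> Optional[List[str]]:
    """Split word into the fewest known morphemes, or return None."""
    # best[i] holds the shortest split of word[:i], or None if unreachable.
    best: List[Optional[List[str]]] = [None] * (len(word) + 1)
    best[0] = []
    for end in range(1, len(word) + 1):
        for start in range(end):
            prefix = best[start]
            piece = word[start:end]
            if prefix is None or piece not in morphemes:
                continue
            current = best[end]
            if current is None or len(prefix) + 1 < len(current):
                best[end] = prefix + [piece]
    return best[len(word)]
```

A word for which no full cover exists returns `None`, which a caller could treat as a signal to fall back to ordinary suggestion generation.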
Specialized utilities for Myanmar text processing.
Key capabilities:
- Stemmer: Rule-based suffix stripping with caching
- Phonetic Hasher: Sound-based fuzzy matching
- Tone Disambiguator: Context-based tone resolution
- Zawgyi Detection: Legacy encoding detection
from myspellchecker.text.stemmer import Stemmer
from myspellchecker.text.phonetic import PhoneticHasher
stemmer = Stemmer()
hasher = PhoneticHasher()
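Rule-based suffix stripping with caching can be sketched in a few lines. The example below is an illustration, not the library's `Stemmer`: the `stem` function is hypothetical, and the three-entry suffix list is a tiny sample chosen for the demonstration.

```python
from functools import lru_cache
from typing import List, Tuple

# Illustrative suffix list only; a real stemmer would use a much larger set.
SUFFIXES = ("သည်", "ခဲ့", "များ")


@lru_cache(maxsize=10_000)
def stem(word: str) -> Tuple[str, Tuple[str, ...]]:
    """Strip known suffixes from the right; return (root, suffixes in order)."""
    stripped: List[str] = []
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            # Never strip the whole word: a root must remain.
            if word.endswith(suffix) and len(word) > len(suffix):
                word = word[: -len(suffix)]
                stripped.insert(0, suffix)
                changed = True
                break
    return word, tuple(stripped)
```

The `lru_cache` decorator means repeated words in a document are stemmed once; the return value uses tuples so that cached results are immutable.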
Grammar Features
Multi-factor ranking system for spelling suggestions.
Ranker types:
| Ranker | Primary Factor | Use Case |
|---|---|---|
| DefaultRanker | Edit distance + frequency | General use |
| FrequencyFirstRanker | Corpus frequency | Autocomplete |
| PhoneticFirstRanker | Phonetic similarity | Myanmar text |
| UnifiedRanker | Multi-source | Comprehensive |
from myspellchecker.algorithms.ranker import FrequencyFirstRanker
ranker = FrequencyFirstRanker()
symspell = SymSpell(provider, ranker=ranker)
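The difference between an edit-distance-first and a frequency-first ranker comes down to the sort key. The sketch below shows two such keys on a hypothetical `Suggestion` record; it is not the scoring used by the library's rankers.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Suggestion:
    term: str
    edit_distance: int
    frequency: int


def rank_default(suggestions: Iterable[Suggestion]) -> List[Suggestion]:
    """Closest edits first; higher frequency breaks ties."""
    return sorted(suggestions, key=lambda s: (s.edit_distance, -s.frequency, s.term))


def rank_frequency_first(suggestions: Iterable[Suggestion]) -> List[Suggestion]:
    """Most frequent first; smaller edit distance breaks ties."""
    return sorted(suggestions, key=lambda s: (-s.frequency, s.edit_distance, s.term))
```

Frequency-first ordering suits autocomplete, where a common word two edits away is often a better guess than a rare word one edit away.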
Eight specialized checkers for Myanmar grammar validation.
| Checker | Purpose |
|---|---|
| AspectChecker | Verb aspect markers |
| ClassifierChecker | Numeral classifiers |
| CompoundChecker | Compound words |
| MergedWordChecker | Merged particle+verb detection |
| NegationChecker | Negation patterns |
| ParticleChecker | Particle context validation |
| TenseAgreementChecker | Tense-time agreement |
| RegisterChecker | Formal/colloquial register |
from myspellchecker.grammar.checkers.aspect import AspectChecker
from myspellchecker.grammar.checkers.register import RegisterChecker
aspect_checker = AspectChecker()
register_checker = RegisterChecker()
Unified normalization service for consistent text processing.
Key capabilities:
- Purpose-specific normalization methods
- Zawgyi detection and conversion
- Unicode NFC normalization
- Myanmar diacritic reordering
from myspellchecker.text.normalization_service import get_normalization_service
service = get_normalization_service()
normalized = service.for_spell_checking(text)
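A purpose-specific normalization method typically chains a few small, deterministic steps. The sketch below shows one plausible chain using only the standard library; `normalize_for_lookup` is a hypothetical function, and it does not perform Zawgyi conversion or Myanmar-specific reordering.

```python
import re
import unicodedata

# Zero-width space and byte-order mark, which are invisible but break lookups.
INVISIBLE = re.compile("[\u200B\uFEFF]")
WHITESPACE = re.compile(r"\s+")


def normalize_for_lookup(text: str) -> str:
    """NFC-normalize, drop invisible characters, and collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = INVISIBLE.sub("", text)
    return WHITESPACE.sub(" ", text).strip()
```

Running the same function on both dictionary entries and user input is what makes lookups consistent, regardless of how the text was typed.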
Strategy-based validation pipeline for composable error detection.
Strategies (by priority):
| Strategy | Priority | Purpose |
|---|---|---|
| ToneValidation | 10 | Tone mark disambiguation |
| Orthography | 15 | Orthographic error detection |
| SyntacticRule | 20 | Grammar rule checking |
| BrokenCompound | 25 | Broken compound detection |
| POSSequence | 30 | POS sequence validation |
| Question | 40 | Question structure |
| Homophone | 45 | Sound-alike detection |
| ConfusableSemantic | 48 | AI confusable detection (opt-in) |
| NgramContext | 50 | N-gram probability |
| Semantic | 70 | AI-powered validation (opt-in) |
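A priority-ordered strategy pipeline can be sketched as a sorted list of small validators. The example below assumes that lower priority numbers run first, which matches the ordering of the table but is not confirmed here; `Strategy` and `run_pipeline` are hypothetical names, not the library's classes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Strategy:
    name: str
    priority: int
    validate: Callable[[List[str]], List[str]]  # returns error messages


def run_pipeline(
    strategies: Iterable[Strategy],
    tokens: List[str],
    enabled: Optional[Set[str]] = None,
) -> List[str]:
    """Run strategies in ascending priority order and collect their errors."""
    errors: List[str] = []
    for strategy in sorted(strategies, key=lambda s: s.priority):
        if enabled is not None and strategy.name not in enabled:
            continue  # toggled off for this run
        errors.extend(f"{strategy.name}: {msg}" for msg in strategy.validate(tokens))
    return errors
```

Because each strategy is just a named callable, toggling, reordering, or replacing one does not touch the others.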
Architecture
Lightweight DI system for component management.
Key components:
- ServiceContainer for lazy initialization
- Factory functions for component creation
- Singleton and transient service support
- Thread-safe service resolution
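The components listed above can be illustrated with a very small container. This is a sketch of the general pattern, not the library's `ServiceContainer`: the `Container` class and its `register`/`resolve` methods are hypothetical.

```python
import threading
from typing import Any, Callable, Dict, Tuple


class Container:
    """Lazy service registry with singleton and transient lifetimes."""

    def __init__(self) -> None:
        self._factories: Dict[str, Tuple[Callable[["Container"], Any], bool]] = {}
        self._singletons: Dict[str, Any] = {}
        # Re-entrant so a factory may resolve its own dependencies.
        self._lock = threading.RLock()

    def register(self, name: str, factory: Callable[["Container"], Any],
                 singleton: bool = True) -> None:
        with self._lock:
            self._factories[name] = (factory, singleton)

    def resolve(self, name: str) -> Any:
        with self._lock:
            factory, singleton = self._factories[name]
            if not singleton:
                return factory(self)  # transient: new instance every time
            if name not in self._singletons:
                self._singletons[name] = factory(self)  # built on first use
            return self._singletons[name]
```

Nothing is constructed at registration time, so expensive components such as models are only built if something actually resolves them.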
Reference
YAML configuration files for linguistic rules.
Key files:
- particles.yaml - 91 linguistic particles
- typo_corrections.yaml - Common typo patterns
- morphology.yaml - Suffix/prefix patterns
- morphotactics.yaml - Compound word POS pattern rules
- aspects.yaml - Verb aspect markers
- classifiers.yaml - Numeral classifiers
- register.yaml - Formal/colloquial mappings
Guides
Comprehensive configuration options.
Topics:
- SpellCheckerConfig and nested configs
- Pre-defined configuration profiles
- Loading from files and environment
Centralized logging system.
Features:
- Development and production modes
- JSON structured logging
- Module-specific log levels
- get_logger() for consistent naming
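JSON structured logging with per-module levels can be built on the standard `logging` module. The sketch below shows one way to do it; `JsonFormatter` and `make_logger` are hypothetical names and do not reflect the signature of the library's `get_logger()`.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # ensure_ascii=False keeps Myanmar text readable in the log output.
        return json.dumps(payload, ensure_ascii=False)


def make_logger(name: str, level: int = logging.INFO,
                json_output: bool = False) -> logging.Logger:
    """Return a named logger with its own level and a single handler."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        if json_output:
            handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger
```

Using dotted logger names (for example one per module) lets a noisy component be set to `DEBUG` while the rest stay at `INFO`.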
Training Features
End-to-end pipeline for training custom semantic models.
Pipeline stages:
- Tokenizer Training (Byte-Level BPE)
- Model Training (RoBERTa/BERT MLM)
- ONNX Export (quantized)
from myspellchecker.training import TrainingPipeline, TrainingConfig
config = TrainingConfig(
    input_file="corpus.txt",
    output_dir="./models/",
    architecture="roberta",
    epochs=5,
)
pipeline = TrainingPipeline()
model_path = pipeline.run(config)
Comprehensive Myanmar text quality validation with 30+ validation categories.
Key capabilities:
- Structural validation (syllable structure, encoding)
- Zawgyi artifact detection
- Quality filtering (fragments, truncation)
- Known invalid word detection
from myspellchecker.text.validator import validate_word
is_valid = validate_word("ကျောင်း")
if is_valid:
    print("Word is valid")
Next Steps