Each strategy in the pipeline can be toggled, tuned, or replaced independently. This page summarizes every feature and links to its dedicated guide.

Feature Matrix

Note: Speed ratings are relative comparisons, not validated benchmarks. Actual performance depends on your dictionary size, hardware, and configuration.
Feature | Description | Speed | Optional
--- | --- | --- | ---
Syllable Validation | Rule-based syllable structure checking | Very Fast | No
Word Validation | Dictionary lookup with SymSpell suggestions | Fast | No
Context Checking | N-gram based context validation | Moderate | Yes
Grammar Checking | POS-based syntactic validation | Fast | Yes
Semantic Checking | AI-powered deep context analysis | Slow | Yes
NER | Named entity recognition | Varies | Yes
Morphology | Word structure analysis | Very Fast | Yes
Morphological Synthesis | Compound/reduplication validation | Very Fast | Yes
Grammar Checkers | Aspect/Classifier/Compound/MergedWord/Negation/Particle/TenseAgreement/Register | Fast | Yes
Validation Strategies | Composable validation pipeline (12 strategies) | Varies | Yes
Normalization | Unified text normalization service | Very Fast | No
Batch Processing | Parallel multi-text processing | Varies | No
Async API | Non-blocking async operations | - | No
Streaming API | Memory-efficient large file processing | Varies | No
Segmenters | Syllable/word/sentence segmentation | Very Fast | No
Suggestion Ranking | Multi-factor suggestion scoring | Very Fast | No
Connection Pool | Thread-safe connection management | - | No
Homophones | Sound-alike word detection | Fast | Yes
Colloquial Variants | Informal/formal spelling detection | Very Fast | Yes
i18n (Localization) | Error messages in English/Myanmar | Very Fast | No

Core Features

Syllable Validation

The foundation of mySpellChecker. Validates Myanmar syllable structure using orthographic rules and dictionary lookup. Key capabilities:
  • Rule-based syllable structure validation
  • Consonant-medial-vowel pattern checking
  • Dictionary syllable lookup
  • O(1) validation performance
# Syllable validation catches ~90% of typos immediately
result = checker.check("မြန်မာ")  # Valid syllables
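
To make the consonant-medial-vowel idea concrete, here is a minimal sketch of a structure check built on a regular expression. The pattern is a deliberately simplified assumption (no stacked consonants, independent vowels, or digits) and is not the library's actual rule set; the name has_valid_syllable_shape is hypothetical.

```python
import re

# Simplified Myanmar syllable shape (an illustrative assumption):
# consonant, optional medials, optional vowel signs,
# optional final consonant + asat, optional tone marks.
CONSONANT = "[\u1000-\u1021]"
MEDIALS = "[\u103B-\u103E]*"
VOWELS = "[\u102B-\u1032]*"
FINAL = f"(?:{CONSONANT}\u103A)?"
TONES = "[\u1036-\u1038]*"
SYLLABLE = f"{CONSONANT}{MEDIALS}{VOWELS}{FINAL}{TONES}"
TEXT_RE = re.compile(f"(?:{SYLLABLE})+")


def has_valid_syllable_shape(text: str) -> bool:
    """Return True if text is a run of syllables matching the simplified shape."""
    return TEXT_RE.fullmatch(text) is not None
```

A medial placed before its consonant fails this check, which is the kind of typo a rule-based pass can reject without any dictionary access.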

Word Validation

Validates complete words using dictionary lookup and the SymSpell algorithm for efficient suggestion generation. Key capabilities:
  • Dictionary word lookup
  • SymSpell O(1) suggestions
  • Edit distance calculation
  • Compound word handling
from myspellchecker import SpellChecker
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.constants import ValidationLevel

# Get word-level suggestions (level specified per-check)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(provider=provider)
result = checker.check(text, level=ValidationLevel.WORD)
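
The symmetric-delete idea behind SymSpell can be sketched in a few lines. This is a toy version limited to one deletion, with hypothetical function names; it is not the library's implementation, and a real lookup would still verify each candidate with an edit distance calculation.

```python
from collections import defaultdict


def deletes(word: str) -> set[str]:
    """All strings obtained by deleting exactly one character."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def build_index(words: list[str]) -> dict[str, set[str]]:
    """Map each word and each of its one-character deletes back to the word."""
    index: dict[str, set[str]] = defaultdict(set)
    for word in words:
        index[word].add(word)
        for variant in deletes(word):
            index[variant].add(word)
    return index


def candidates(term: str, index: dict[str, set[str]]) -> set[str]:
    """Dictionary words reachable from term via deletes on either side."""
    found = set(index.get(term, set()))
    for variant in deletes(term):
        found |= index.get(variant, set())
    return found


index = build_index(["spell", "spill", "shell"])
suggestions = candidates("spel", index)
```

Because the deletes are precomputed, a lookup costs a handful of hash probes regardless of dictionary size, which is where the constant-time claim comes from.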

Context Checking

Detects “real-word errors” where a word is spelled correctly but used incorrectly in context. Key capabilities:
  • Bigram probability analysis
  • Trigram context windows
  • Statistical language modeling
  • Real-word error detection
from myspellchecker.core.config import SpellCheckerConfig

# Detects unnatural word combinations (e.g., "rice go" vs "eat rice")
config = SpellCheckerConfig(use_context_checker=True)
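
As a rough illustration of bigram probability analysis, the sketch below counts adjacent word pairs in a tiny pre-segmented corpus and flags positions whose conditional probability falls under a threshold. The corpus, the threshold, and all function names are assumptions made for the example; the library's model and smoothing are not shown here.

```python
from collections import Counter


def train_bigrams(sentences: list[list[str]]) -> tuple[Counter, Counter]:
    """Count unigrams and adjacent word pairs in pre-segmented sentences."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def bigram_probability(prev: str, word: str, unigrams: Counter, bigrams: Counter) -> float:
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev is unseen."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]


def flag_unlikely(words, unigrams, bigrams, threshold=0.1):
    """Return positions whose word is improbable given the previous word."""
    return [
        i for i in range(1, len(words))
        if bigram_probability(words[i - 1], words[i], unigrams, bigrams) < threshold
    ]


corpus = [["eat", "rice"], ["eat", "rice"], ["go", "home"]]
unigrams, bigrams = train_bigrams(corpus)
```

With this toy corpus, "rice go" is flagged at position 1 while "eat rice" passes, even though every individual word is in the vocabulary.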

Advanced Features

POS Tagging

Part-of-Speech tagging with multiple backend options for different accuracy/speed trade-offs. Tagger options:
Type | Accuracy | Speed | Dependencies
--- | --- | --- | ---
Rule-based | ~70% | Fast | None
Viterbi | ~85% | Medium | None
Transformer | ~93% | Slow | transformers, torch
from myspellchecker.core.config import POSTaggerConfig, SpellCheckerConfig

config = SpellCheckerConfig(
    pos_tagger=POSTaggerConfig(tagger_type="transformer")
)

Grammar Checking

Rule-based syntactic validation using POS tags to detect grammatical errors. Key capabilities:
  • Particle usage validation
  • Verb-modifier agreement
  • Sentence structure checking
  • Custom grammar rule support
from myspellchecker.core.config import SpellCheckerConfig

# Detects particle errors like မှာ vs မှ
config = SpellCheckerConfig(use_rule_based_validation=True)

Grammar Engine

Comprehensive syntactic rule checker coordinating eight specialized checkers. Key capabilities:
  • Particle typo detection
  • Medial confusion detection (ျ vs ြ)
  • POS sequence validation
  • Verb-particle agreement
  • Configurable confidence thresholds
from myspellchecker.grammar import SyntacticRuleChecker

# provider: a dictionary provider, e.g. the SQLiteProvider instance from the earlier examples
checker = SyntacticRuleChecker(provider)
corrections = checker.check_sequence(["ကျွန်တော်", "ကျောင်း", "သွားတယ်"])

Semantic Checking

Deep learning-based context analysis using ONNX models for the highest accuracy. Key capabilities:
  • BERT/RoBERTa masked language modeling
  • Semantic context understanding
  • Confidence scoring
  • Quantized CPU inference
from myspellchecker.core.config import SpellCheckerConfig, SemanticConfig

# Enable AI-powered checking
config = SpellCheckerConfig(
    semantic=SemanticConfig(
        model_path="path/to/model.onnx",
        tokenizer_path="path/to/tokenizer"
    )
)

Performance Features

Batch Processing

Efficient processing of multiple texts with parallelization. Key capabilities:
  • Cython-optimized processing
  • OpenMP parallelization
  • Batch result aggregation
  • Memory-efficient streaming
# Process thousands of texts efficiently
results = checker.check_batch(texts)

Async API

Non-blocking async operations for web applications. Key capabilities:
  • Native async/await support
  • FastAPI/Starlette integration
  • Concurrent request handling
  • Async batch processing
# Non-blocking spell checking
result = await checker.check_async(text)
results = await checker.check_batch_async(texts)
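
One common way to expose a blocking check through async/await is to run it in a worker thread. The sketch below shows that pattern with the standard library only; blocking_check is a hypothetical stand-in, and nothing here describes how the library implements its own async methods.

```python
import asyncio


def blocking_check(text: str) -> dict:
    """Hypothetical stand-in for a synchronous spell check."""
    return {"text": text, "has_errors": "??" in text}


async def check_in_thread(text: str) -> dict:
    """Run the blocking check in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(blocking_check, text)


async def check_many(texts: list[str]) -> list[dict]:
    """Check several texts concurrently; results keep the input order."""
    return await asyncio.gather(*(check_in_thread(t) for t in texts))


results = asyncio.run(check_many(["ok text", "bad ?? text"]))
```

Inside a FastAPI or Starlette handler the same coroutine would simply be awaited instead of being passed to asyncio.run.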

Integration Features

Connection Pool

Thread-safe database connection management for high-concurrency scenarios. Key capabilities:
  • Configurable min/max pool size
  • Automatic connection health checks
  • Connection aging and recreation
  • Pool statistics and monitoring
from myspellchecker.providers.connection_pool import ConnectionPool
from myspellchecker.core.config import ConnectionPoolConfig

pool_config = ConnectionPoolConfig(min_size=2, max_size=10)
pool = ConnectionPool("/path/to/db.sqlite", pool_config=pool_config)
with pool.checkout() as conn:
    cursor = conn.cursor()
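
The checkout pattern above can be pictured with a small queue-backed pool. TinyPool is a hypothetical class written for this illustration using only sqlite3 and queue; it omits health checks, connection aging, and statistics, and it is not the library's ConnectionPool.

```python
import queue
import sqlite3
from contextlib import contextmanager


class TinyPool:
    """Hypothetical fixed-size SQLite pool for illustration only."""

    def __init__(self, database: str, size: int = 2) -> None:
        self._idle: queue.Queue = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets a connection move between threads;
            # the queue ensures only one borrower holds it at a time.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def checkout(self, timeout: float = 5.0):
        """Borrow a connection and always return it, even if the body raises."""
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)

    def close(self) -> None:
        """Close every idle connection."""
        while not self._idle.empty():
            self._idle.get_nowait().close()


pool = TinyPool(":memory:", size=2)
with pool.checkout() as conn:
    value = conn.execute("SELECT 1").fetchone()[0]
```

Blocking on the queue with a timeout is what bounds concurrency: a third borrower waits until one of the two connections is returned.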

Segmenters

Multiple text segmentation strategies for Myanmar text. Segmenter types:
Type | Description | Use Case
--- | --- | ---
DefaultSegmenter | Production segmenter | General use
RegexSegmenter | Rule-based syllables | Lightweight
from myspellchecker.segmenters import DefaultSegmenter

segmenter = DefaultSegmenter(word_engine="myword")
syllables = segmenter.segment_syllables("မြန်မာစာ")

Homophones Detection

Detects sound-alike words that may be confused in context.
from myspellchecker.core.homophones import HomophoneChecker

checker = HomophoneChecker()
homophones = checker.get_homophones("ကျား")  # Returns set of homophones
has_match = len(checker.get_homophones("ကြား")) > 0  # Check if homophones exist

Colloquial Variant Handling

Detects colloquial (informal) spellings and suggests standard forms. Key capabilities:
  • Colloquial form detection
  • Standard form suggestion
  • Configurable strictness levels
from myspellchecker.text.phonetic_data import is_colloquial_variant, get_standard_forms

# Check if word is colloquial
is_colloquial_variant("ကျနော်")  # True

# Get standard form
get_standard_forms("ကျနော်")  # ["ကျွန်တော်"]
Configuration:
from myspellchecker.core.config.validation_configs import ValidationConfig

config = ValidationConfig(
    colloquial_strictness="lenient",  # "strict", "lenient", or "off"
    colloquial_info_confidence=0.3,
)
Strictness | Behavior
--- | ---
strict | Flag all colloquial variants as errors
lenient | Accept with informational note (default)
off | No special handling

Internationalization (i18n)

Localized error messages in English and Myanmar.
from myspellchecker.core.i18n import set_language, get_message

# Set language to Myanmar
set_language("my")

# Get localized message
get_message("invalid_syllable")
# Output: စာလုံးပေါင်း မမှန်ကန်ပါ
Supported languages: "en" (English), "my" (Myanmar)

Streaming API

Memory-efficient stream processing for large documents with progress callbacks. Key capabilities:
  • Generator-based synchronous streaming
  • Async iteration support
  • Progress callbacks and statistics
  • Memory limits with backpressure
  • Cross-sentence context validation
from myspellchecker.core.streaming import StreamingChecker

streaming = StreamingChecker(checker)
with open("large_file.txt") as f:
    for result in streaming.check_stream(f):
        if result.response.has_errors:
            process(result)
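
Generator-based streaming keeps only one line in memory at a time. The following sketch shows the general shape with a progress callback; the function name, the tuple layout, and the toy error rule are assumptions for illustration and do not mirror StreamingChecker's actual result objects.

```python
import io
from typing import Callable, Iterable, Iterator, Optional


def stream_check(
    lines: Iterable[str],
    check_line: Callable[[str], bool],
    on_progress: Optional[Callable[[int], None]] = None,
) -> Iterator[tuple]:
    """Yield (line_number, text, has_error) lazily, one line at a time."""
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\n")
        if on_progress is not None:
            on_progress(number)  # report how many lines have been read so far
        yield number, text, check_line(text)


source = io.StringIO("good line\nbad ?? line\n")
seen = []
flagged = [n for n, _, bad in stream_check(source, lambda t: "??" in t, seen.append) if bad]
```

Because the function is a generator, a caller that stops iterating early never reads the rest of the file.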

Custom Providers

Pluggable storage backends for different use cases.
from myspellchecker import SpellChecker
from myspellchecker.providers import MemoryProvider, SQLiteProvider

# High-speed in-memory
checker = SpellChecker(provider=MemoryProvider())

# Disk-based for large dictionaries
checker = SpellChecker(provider=SQLiteProvider())

Feature Comparison by Use Case

Real-Time Typing

from myspellchecker import SpellChecker
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.constants import ValidationLevel

# Fastest: syllable-only validation (level specified per-check)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(provider=provider)
result = checker.check(text, level=ValidationLevel.SYLLABLE)

Document Checking

from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.constants import ValidationLevel

# Balanced: word + context
config = SpellCheckerConfig(
    use_context_checker=True,
    use_rule_based_validation=True,
)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)
result = checker.check(text, level=ValidationLevel.WORD)

Quality Assurance

# Thorough: full validation with AI
from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig, SemanticConfig
from myspellchecker.core.constants import ValidationLevel
from myspellchecker.providers import SQLiteProvider

config = SpellCheckerConfig(
    use_context_checker=True,
    semantic=SemanticConfig(model_path="..."),
)
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(config=config, provider=provider)
# Use word-level validation for thorough checking
result = checker.check(text, level=ValidationLevel.WORD, use_semantic=True)

High-Volume Processing

# Optimized for throughput
from myspellchecker import SpellChecker
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.config import SpellCheckerConfig

config = SpellCheckerConfig(use_context_checker=False)  # Faster
provider = SQLiteProvider(pool_max_size=10)
checker = SpellChecker(config=config, provider=provider)
results = checker.check_batch(texts)

Feature Dependencies

  +------------------------+
  | Syllable Validation    |  [core]
  +-----------+------------+
              |
              v
  +------------------------+
  | Word Validation        |  [core]
  +-----------+------------+
              |
         +----+----+
         |         |
         v         v
  +-------------+  +------------------+
  | Context     |  | Grammar          |  [advanced]
  | Checking    |  | Checking         |
  +------+------+  +--------+---------+
         |                  |
         v                  v
  +-------------+  +------------------+
  | Semantic    |  | POS Tagging      |  [advanced]
  | Checking    |  |                  |
  +-------------+  +------------------+
    [ai]
Legend:
  • [core]: Core features (always available)
  • [advanced]: Advanced features (optional)
  • [ai]: AI features (requires extra dependencies)

Text Processing Features

Named Entity Recognition

Identifies names, locations, and organizations to reduce false positives. Key capabilities:
  • Heuristic-based NER (fast, ~70% accuracy)
  • Transformer-based NER (~93% accuracy)
  • Hybrid mode with automatic fallback
  • Entity filtering for spell checking
from myspellchecker.text.ner_model import NERConfig

config = SpellCheckerConfig(
    ner=NERConfig(enabled=True, model_type="heuristic")
)

Morphology Analysis

Word structure analysis for POS inference and OOV recovery. Key capabilities:
  • Suffix-based POS guessing
  • Word decomposition (root + suffixes)
  • Multi-POS support for ambiguous words
  • Numeral detection
  • Productive reduplication validation (AA, AABB, ABAB patterns)
  • Compound word synthesis (DP-based splitting into known morphemes)
  • Morpheme-level suggestions (correct typos inside compounds)
from myspellchecker.text.morphology import MorphologyAnalyzer
from myspellchecker.text.reduplication import ReduplicationEngine
from myspellchecker.text.compound_resolver import CompoundResolver

# OOV analysis (existing)
analyzer = MorphologyAnalyzer()
result = analyzer.analyze_word("စားခဲ့သည်")
print(result.root)      # "စား"
print(result.suffixes)  # ["ခဲ့", "သည်"]

# Reduplication validation
# (segmenter, dict_check, freq_check, and pos_check are supplied by the caller)
engine = ReduplicationEngine(segmenter=segmenter)
result = engine.analyze("ကောင်းကောင်း", dict_check, freq_check, pos_check)
# Valid AA reduplication of "ကောင်း"

# Compound synthesis
resolver = CompoundResolver(segmenter=segmenter)
result = resolver.resolve("ကျောင်းသား", dict_check, freq_check, pos_check)
# Valid N+N compound: ["ကျောင်း", "သား"]
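
The AA, AABB, and ABAB patterns mentioned above are easy to recognize once a word is split into syllables. This sketch covers only the pattern-matching step, under the assumption that segmentation has already happened; the function name is hypothetical, and a real validator would also confirm that the base form is a known word.

```python
from typing import Optional


def reduplication_pattern(syllables: list[str]) -> Optional[str]:
    """Classify a syllable sequence as AA, AABB, or ABAB, else None."""
    if len(syllables) == 2 and syllables[0] == syllables[1]:
        return "AA"
    if len(syllables) == 4:
        a, b, c, d = syllables
        if a == b and c == d and a != c:
            return "AABB"
        if a == c and b == d and a != b:
            return "ABAB"
    return None
```

For example, the two-syllable input ["ကောင်း", "ကောင်း"] is classified as "AA", while four identical syllables match none of the three patterns.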

Text Utilities

Specialized utilities for Myanmar text processing. Key capabilities:
  • Stemmer: Rule-based suffix stripping with caching
  • Phonetic Hasher: Sound-based fuzzy matching
  • Tone Disambiguator: Context-based tone resolution
  • Zawgyi Detection: Legacy encoding detection
from myspellchecker.text.stemmer import Stemmer
from myspellchecker.text.phonetic import PhoneticHasher

stemmer = Stemmer()
hasher = PhoneticHasher()

Grammar Features

Suggestion Ranking

Multi-factor ranking system for spelling suggestions. Ranker types:
Ranker | Primary Factor | Use Case
--- | --- | ---
DefaultRanker | Edit distance + frequency | General use
FrequencyFirstRanker | Corpus frequency | Autocomplete
PhoneticFirstRanker | Phonetic similarity | Myanmar text
UnifiedRanker | Multi-source | Comprehensive
from myspellchecker.algorithms.ranker import FrequencyFirstRanker

# SymSpell and provider must already be imported/created; their setup is not shown here
ranker = FrequencyFirstRanker()
symspell = SymSpell(provider, ranker=ranker)
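
The difference between a distance-first and a frequency-first ranking comes down to the sort key. The sketch below uses a hypothetical Candidate record and two sort functions to show the effect; it does not reproduce the scoring formulas of the rankers in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    term: str
    distance: int   # edit distance from the misspelled input
    frequency: int  # corpus frequency of the candidate


def rank_distance_then_frequency(cands):
    """Closest edits first; ties broken by higher frequency."""
    return sorted(cands, key=lambda c: (c.distance, -c.frequency))


def rank_frequency_first(cands):
    """Most frequent first; ties broken by smaller edit distance."""
    return sorted(cands, key=lambda c: (-c.frequency, c.distance))


cands = [
    Candidate("spill", 2, 900),
    Candidate("spell", 1, 500),
    Candidate("spelt", 1, 40),
]
```

With these sample values the first function puts "spell" on top, while the frequency-first variant promotes "spill" despite its larger edit distance, which suits autocomplete better than correction.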

Grammar Checkers

Eight specialized checkers for Myanmar grammar validation.
Checker | Purpose
--- | ---
AspectChecker | Verb aspect markers
ClassifierChecker | Numeral classifiers
CompoundChecker | Compound words
MergedWordChecker | Merged particle+verb detection
NegationChecker | Negation patterns
ParticleChecker | Particle context validation
TenseAgreementChecker | Tense-time agreement
RegisterChecker | Formal/colloquial register
from myspellchecker.grammar.checkers.aspect import AspectChecker
from myspellchecker.grammar.checkers.register import RegisterChecker

aspect_checker = AspectChecker()
register_checker = RegisterChecker()

Text Normalization

Unified normalization service for consistent text processing. Key capabilities:
  • Purpose-specific normalization methods
  • Zawgyi detection and conversion
  • Unicode NFC normalization
  • Myanmar diacritic reordering
from myspellchecker.text.normalization_service import get_normalization_service

service = get_normalization_service()
normalized = service.for_spell_checking(text)
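
Of the steps listed above, Unicode NFC normalization can be shown with the standard library alone. The sketch below also strips zero-width characters, which is an assumption added for illustration rather than documented behavior of the normalization service; the function name is hypothetical.

```python
import unicodedata

# Zero-width characters to drop (an illustrative assumption).
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))


def normalize_for_lookup(text: str) -> str:
    """Apply NFC composition, then remove zero-width characters."""
    return unicodedata.normalize("NFC", text).translate(ZERO_WIDTH)
```

Normalizing before dictionary lookup matters because two visually identical strings with different code point sequences would otherwise hash to different keys.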

Validation Strategies

Strategy-based validation pipeline for composable error detection. Strategies (by priority):
Strategy | Priority | Purpose
--- | --- | ---
ToneValidation | 10 | Tone mark disambiguation
Orthography | 15 | Orthographic error detection
SyntacticRule | 20 | Grammar rule checking
BrokenCompound | 25 | Broken compound detection
POSSequence | 30 | POS sequence validation
Question | 40 | Question structure
Homophone | 45 | Sound-alike detection
ConfusableSemantic | 48 | AI confusable detection (opt-in)
NgramContext | 50 | N-gram probability
Semantic | 70 | AI-powered validation (opt-in)
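
A priority-ordered strategy pipeline can be sketched as a sorted list of small checkers. The example borrows three priority values from the table, but the Strategy record, the rule that the first strategy to flag a word keeps the claim, and the toy checks are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Strategy:
    name: str
    priority: int  # lower numbers run first
    check: Callable[[list[str]], list[int]]  # returns indexes of flagged words
    enabled: bool = True


def run_pipeline(words, strategies):
    """Run enabled strategies in priority order; the first to flag a word wins."""
    flagged: dict[int, str] = {}
    for strategy in sorted(strategies, key=lambda s: s.priority):
        if not strategy.enabled:
            continue
        for index in strategy.check(words):
            flagged.setdefault(index, strategy.name)
    return flagged


strategies = [
    Strategy("NgramContext", 50, lambda ws: [i for i, w in enumerate(ws) if w == "x"]),
    Strategy("SyntacticRule", 20, lambda ws: [i for i, w in enumerate(ws) if w in {"x", "y"}]),
    Strategy("Semantic", 70, lambda ws: list(range(len(ws))), enabled=False),
]
result = run_pipeline(["a", "x", "y"], strategies)
```

Toggling a strategy is a matter of flipping its enabled flag, and replacing one means swapping the check callable, which is what makes the pipeline composable.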

Architecture

Dependency Injection

Lightweight DI system for component management. Key components:
  • ServiceContainer for lazy initialization
  • Factory functions for component creation
  • Singleton and transient service support
  • Thread-safe service resolution
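
The components listed above can be pictured with a small container that builds singletons lazily and creates transient services on every request. TinyContainer is a hypothetical class for illustration and is not the library's ServiceContainer.

```python
import threading
from typing import Any, Callable


class TinyContainer:
    """Hypothetical container: lazy singletons plus transient factories."""

    def __init__(self) -> None:
        self._factories: dict[str, tuple[Callable[[], Any], bool]] = {}
        self._singletons: dict[str, Any] = {}
        # A re-entrant lock lets a factory resolve other services safely.
        self._lock = threading.RLock()

    def register(self, name: str, factory: Callable[[], Any], singleton: bool = True) -> None:
        """Store a factory; nothing is constructed until resolve() is called."""
        self._factories[name] = (factory, singleton)

    def resolve(self, name: str) -> Any:
        """Return the shared instance for singletons, a fresh one otherwise."""
        factory, singleton = self._factories[name]
        if not singleton:
            return factory()
        with self._lock:
            if name not in self._singletons:
                self._singletons[name] = factory()
            return self._singletons[name]


container = TinyContainer()
container.register("cache", dict)
container.register("buffer", list, singleton=False)
```

Resolving "cache" twice returns the same object, while each "buffer" request yields a new list.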

Reference

Rules System

YAML configuration files for linguistic rules. Key files:
  • particles.yaml - 91 linguistic particles
  • typo_corrections.yaml - Common typo patterns
  • morphology.yaml - Suffix/prefix patterns
  • morphotactics.yaml - Compound word POS pattern rules
  • aspects.yaml - Verb aspect markers
  • classifiers.yaml - Numeral classifiers
  • register.yaml - Formal/colloquial mappings

Guides

Configuration Guide

Comprehensive configuration options. Topics:
  • SpellCheckerConfig and nested configs
  • Pre-defined configuration profiles
  • Loading from files and environment

Logging Guide

Centralized logging system. Features:
  • Development and production modes
  • JSON structured logging
  • Module-specific log levels
  • get_logger() for consistent naming

Training Features

Training Pipeline

End-to-end pipeline for training custom semantic models. Pipeline stages:
  1. Tokenizer Training (Byte-Level BPE)
  2. Model Training (RoBERTa/BERT MLM)
  3. ONNX Export (quantized)
from myspellchecker.training import TrainingPipeline, TrainingConfig

config = TrainingConfig(
    input_file="corpus.txt",
    output_dir="./models/",
    architecture="roberta",
    epochs=5,
)
pipeline = TrainingPipeline()
model_path = pipeline.run(config)

Text Validation

Comprehensive Myanmar text quality validation with 30+ validation categories. Key capabilities:
  • Structural validation (syllable structure, encoding)
  • Zawgyi artifact detection
  • Quality filtering (fragments, truncation)
  • Known invalid word detection
from myspellchecker.text.validator import validate_word

is_valid = validate_word("ကျောင်း")
if is_valid:
    print("Word is valid")

Next Steps