# mySpellChecker

## Docs

- [Context-Aware & Grammar Validation](https://docs.myspellchecker.com/algorithms/context-aware.md): Syntactic grammar checking and N-gram probability analysis for detecting real-word errors in Myanmar text.
- [Edit Distance Algorithms](https://docs.myspellchecker.com/algorithms/edit-distance.md): This document describes the edit distance algorithms used in mySpellChecker for generating spelling suggestions.
- [Syntactic Grammar Validation](https://docs.myspellchecker.com/algorithms/grammar-rules.md): The Syntactic Rule Checker applies POS-based linguistic rules to catch grammatical errors that statistical models miss.
- [Overview](https://docs.myspellchecker.com/algorithms/index.md): Core algorithms and implementations powering mySpellChecker.
- [Joint Segment Tagger](https://docs.myspellchecker.com/algorithms/joint-segment-tagger.md): Unified Viterbi algorithm that performs word segmentation and POS tagging simultaneously for better accuracy.
- [Morpheme & Medial Suggestions](https://docs.myspellchecker.com/algorithms/morpheme-suggestion.md): Morpheme-level correction for compound word typos and medial consonant swap candidate generation for Myanmar's most common error type.
- [Named Entity Recognition (NER)](https://docs.myspellchecker.com/algorithms/ner.md): Heuristic-based Named Entity Recognition to detect proper nouns and reduce false positives on valid names.
- [Neural Reranker](https://docs.myspellchecker.com/algorithms/neural-reranker.md): ONNX-based MLP and tree-based suggestion reranker that scores spelling correction candidates using 19 extracted features for optimal ranking.
- [N-gram Algorithm](https://docs.myspellchecker.com/algorithms/ngram.md): The N-gram algorithm provides context-aware spell checking by analyzing word sequence probabilities.
- [Text Normalization](https://docs.myspellchecker.com/algorithms/normalization.md): Normalization pipeline for Myanmar text covering Zawgyi conversion, Unicode NFC, zero-width removal, and diacritic reordering.
- [Phonetic Matching](https://docs.myspellchecker.com/algorithms/phonetic.md): Myanmar is a phonetic language, and many spelling errors are due to words that sound the same but are spelled differently.
- [POS Disambiguator](https://docs.myspellchecker.com/algorithms/pos-disambiguator.md): The POS Disambiguator resolves ambiguous Part-of-Speech tags for Myanmar words using five context-based linguistic rules (R1-R5).
- [Segmentation](https://docs.myspellchecker.com/algorithms/segmentation.md): Myanmar text does not use whitespace to separate words. Therefore, segmentation (breaking text into units) is the first and most critical step in the pipeline.
- [Semantic Validation (MLM)](https://docs.myspellchecker.com/algorithms/semantic.md): Deep learning validation using Masked Language Models to detect semantic errors that N-gram methods miss.
- [Suggestion Ranking](https://docs.myspellchecker.com/algorithms/suggestion-ranking.md): Multi-factor scoring system that ranks spelling corrections by edit distance, frequency, phonetic similarity, and source confidence.
- [Suggestion Strategy](https://docs.myspellchecker.com/algorithms/suggestion-strategy.md): Pluggable interface for implementing and composing different spelling suggestion algorithms at runtime.
- [Syllable Segmentation Algorithm](https://docs.myspellchecker.com/algorithms/syllable-segmentation.md): How RegexSegmenter splits continuous Myanmar text into syllables using three complementary regex patterns.
- [SymSpell Algorithm](https://docs.myspellchecker.com/algorithms/symspell.md): How the SymSpell symmetric delete algorithm provides O(1) spelling correction for Myanmar text.
- [Tone Disambiguation](https://docs.myspellchecker.com/algorithms/tone-disambiguation.md): Context-aware tone mark disambiguation for Myanmar words where tone marks change meaning entirely.
- [Viterbi Algorithm](https://docs.myspellchecker.com/algorithms/viterbi.md): The Viterbi algorithm provides efficient POS tagging and word segmentation using Hidden Markov Models (HMM).
- [Overview](https://docs.myspellchecker.com/api-reference/index.md): Complete API documentation for mySpellChecker.
- [Provider Capabilities Matrix](https://docs.myspellchecker.com/api-reference/provider-capabilities.md): This document describes the capabilities of each DictionaryProvider implementation.
- [Tokenizers API](https://docs.myspellchecker.com/api-reference/tokenizers.md): Low-level text splitting utilities for Myanmar text, including syllable, word, and transformer-based tokenizers.
- [Component Diagram](https://docs.myspellchecker.com/architecture/component-diagram.md): Visual representation of mySpellChecker's component architecture.
- [Data Flow](https://docs.myspellchecker.com/architecture/data-flow.md): How data flows through mySpellChecker during spell checking operations.
- [Dependency Injection](https://docs.myspellchecker.com/architecture/dependency-injection.md): Lightweight dependency injection system for managing component lifecycles, enabling loose coupling and easier testing.
- [Extension Points](https://docs.myspellchecker.com/architecture/extension-points.md): Guide to customizing and extending mySpellChecker.
- [Overview](https://docs.myspellchecker.com/architecture/index.md): Multi-layer validation architecture with a 12-strategy pipeline, from deterministic syllable rules to ONNX-powered AI inference.
- [System Design](https://docs.myspellchecker.com/architecture/system-design.md): This document details the component architecture and design decisions in mySpellChecker.
- [Validation Pipeline](https://docs.myspellchecker.com/architecture/validation-pipeline.md): The validation pipeline is the core of mySpellChecker, implementing a multi-layer approach that progressively validates text from syllables to context.
- [Overview](https://docs.myspellchecker.com/cli/index.md): Command reference for spell checking, dictionary building, model training, and text segmentation via the CLI.
- [Configuration](https://docs.myspellchecker.com/core/configuration.md): Configure SpellChecker behavior with SpellCheckerConfig, including presets, thresholds, and algorithm settings.
- [Overview](https://docs.myspellchecker.com/core/data-pipeline.md): Subsystem for ingesting raw Myanmar text corpora and building the optimized SQLite dictionary database.
- [Overview](https://docs.myspellchecker.com/core/index.md): Technical documentation for mySpellChecker's core modules.
- [SpellChecker API](https://docs.myspellchecker.com/core/spellchecker.md): The SpellChecker class is the main entry point for the library.
- [Syllable Validation](https://docs.myspellchecker.com/core/syllable-validation.md): The syllable validation system forms the foundational layer of mySpellChecker's progressive validation pipeline.
- [Semantic Training Pipeline](https://docs.myspellchecker.com/core/training.md): Train custom deep learning models for semantic spell checking using the built-in tokenizer, MLM, and ONNX export pipeline.
- [Word Validation](https://docs.myspellchecker.com/core/word-validation.md): This document describes the word validation system in mySpellChecker, which is Layer 2 of the validation pipeline.
- [Building Dictionaries](https://docs.myspellchecker.com/data-pipeline/building.md): Complete guide to building spell checking dictionaries using the CLI and Python API, with POS tagging, curated lexicons, and incremental updates.
- [Corpus Format Specification](https://docs.myspellchecker.com/data-pipeline/corpus-format.md): Input format specifications for the data pipeline: TXT, CSV, TSV, JSON, JSONL, and Parquet with encoding and structure requirements.
- [Database Schema](https://docs.myspellchecker.com/data-pipeline/database-schema.md): This document describes the SQLite database schema used by mySpellChecker.
- [Overview](https://docs.myspellchecker.com/data-pipeline/index.md): Build optimized spell checking dictionaries from text corpora using the data pipeline CLI or Python API.
- [Ingestion Stage](https://docs.myspellchecker.com/data-pipeline/ingestion.md): The ingestion stage reads and parses input corpus files into a standardized format for processing.
- [Pipeline Optimization](https://docs.myspellchecker.com/data-pipeline/optimization.md): Speed up dictionary building with DuckDB acceleration (3-15x), Cython parallelization, and memory tuning.
- [Pipeline Reporter](https://docs.myspellchecker.com/data-pipeline/pipeline-reporter.md): Abstraction layer for reporting progress during data pipeline execution with console, logging, and test support.
- [POS Inference Manager](https://docs.myspellchecker.com/data-pipeline/pos-inference.md): The POS Inference Manager applies rule-based POS inference to words during database building, increasing POS tag coverage beyond the seed data.
- [Processing Stage](https://docs.myspellchecker.com/data-pipeline/processing.md): The processing stage segments text into syllables and words, preparing data for frequency analysis.
- [Dictionary Providers](https://docs.myspellchecker.com/data-pipeline/providers.md): DictionaryProvider implementations for SQLite, memory, JSON, and CSV storage backends with usage examples.
- [Schema Management](https://docs.myspellchecker.com/data-pipeline/schema-management.md): Centralized database schema definitions and management for SQLite table creation, indexing, and migrations.
- [Segmentation Repair](https://docs.myspellchecker.com/data-pipeline/segmentation-repair.md): The Segmentation Repair module fixes incorrectly segmented Myanmar words by merging broken syllables that were split across word boundaries during tokenization.
- [Benchmark Suite](https://docs.myspellchecker.com/development/benchmarks.md): 1,138-sentence accuracy benchmark with per-tier evaluation, composite scoring, and ablation utilities.
- [Contributing Guide](https://docs.myspellchecker.com/development/contributing.md): Thank you for your interest in contributing to mySpellChecker! This guide will help you get started.
- [Cython Development Guide](https://docs.myspellchecker.com/development/cython-guide.md): This guide covers how to work with the Cython (.pyx) modules in mySpellChecker.
- [Development](https://docs.myspellchecker.com/development/index.md): Guide for developers working on mySpellChecker.
- [Naming Conventions](https://docs.myspellchecker.com/development/naming-conventions.md): This document establishes naming conventions for the myspellchecker codebase to ensure consistency and clarity.
- [Development Setup](https://docs.myspellchecker.com/development/setup.md): This guide helps you set up a development environment for contributing to mySpellChecker.
- [Testing Guide](https://docs.myspellchecker.com/development/testing.md): This guide covers how to run and write tests for mySpellChecker.
- [Async API](https://docs.myspellchecker.com/features/async-api.md): Non-blocking spell checking with check_async() and check_batch_async() for web frameworks and concurrent workloads.
- [Batch Processing](https://docs.myspellchecker.com/features/batch-processing.md): Check multiple texts at once with check_batch(), parallelization via thread/process pools, and Cython acceleration.
- [Compound Resolution & Reduplication](https://docs.myspellchecker.com/features/compound-resolution.md): DP-based compound word resolution and productive reduplication validation for handling OOV Myanmar words formed through compounding and repetition patterns.
- [Confusable Detection](https://docs.myspellchecker.com/features/confusable-detection.md): Multi-layer confusable word detection using statistical bigram analysis, MLP classification, and MLM-powered semantic inference to catch valid-word substitution errors.
- [Context Checking](https://docs.myspellchecker.com/features/context-checking.md): Context checking is the third validation layer that detects real-word errors, which are words spelled correctly but used incorrectly in context.
- [Grammar Checkers](https://docs.myspellchecker.com/features/grammar-checkers.md): Eight specialized checkers for aspect markers, numeral classifiers, compound words, merged words, negation patterns, particle context, tense agreement, and register consistency.
- [Overview](https://docs.myspellchecker.com/features/grammar-checking.md): Rule-based grammar checking using POS tags to catch syntactic errors like wrong particles, verb-modifier mismatches, and incomplete sentences.
- [Grammar Engine](https://docs.myspellchecker.com/features/grammar-engine.md): Syntactic rule-based spell checking using POS tags, operating at Layer 2.5 of the validation pipeline to catch errors that N-gram models miss.
- [Homophones Detection](https://docs.myspellchecker.com/features/homophones.md): mySpellChecker includes a homophone checker to detect "Real-Word Errors" - words that are spelled correctly but confused with similar-sounding words.
- [Overview](https://docs.myspellchecker.com/features/index.md): mySpellChecker provides a 12-strategy text checking pipeline for Myanmar, from syllable rules through grammar checking to AI-powered inference.
- [Loan Word Variants](https://docs.myspellchecker.com/features/loan-words.md): Bidirectional lookup for Myanmar loan word transliteration variants from English, Pali/Sanskrit, and other languages.
- [Morphology Analysis](https://docs.myspellchecker.com/features/morphology.md): The morphology module provides word structure analysis for Myanmar text, enabling POS inference for out-of-vocabulary (OOV) words and word decomposition.
- [Named Entity Recognition (NER)](https://docs.myspellchecker.com/features/ner.md): mySpellChecker includes a Named Entity Recognition (NER) module to reduce false positives by identifying names, locations, and organizations in Myanmar text.
- [Text Normalization](https://docs.myspellchecker.com/features/normalization.md): mySpellChecker provides a unified NormalizationService that consolidates all text normalization logic into a single, consistent interface.
- [POS Tagging System](https://docs.myspellchecker.com/features/pos-tagging.md): Pluggable POS tagging with rule-based inference, transformer models, and Viterbi HMM for Myanmar text analysis.
- [Segmenters](https://docs.myspellchecker.com/features/segmenters.md): Two-level text segmentation for Myanmar: rule-based syllable splitting and dictionary-backed word tokenization with myword, CRF, or transformer engines.
- [Semantic Checking](https://docs.myspellchecker.com/features/semantic-checking.md): Optional MLM-based deep context analysis that masks each word and predicts alternatives, providing the highest accuracy for detecting real-word errors.
- [Streaming API](https://docs.myspellchecker.com/features/streaming.md): Memory-efficient streaming spell checking for large files using generators, async iterators, progress callbacks, and backpressure control.
- [Syllable Validation](https://docs.myspellchecker.com/features/syllable-validation.md): Layer 1 validation that checks Myanmar syllable structure using deterministic rules before word-level analysis.
- [Text Utilities](https://docs.myspellchecker.com/features/text-utilities.md): Stemmer, phonetic matching, and encoding utilities for Myanmar text processing challenges.
- [Text Validation](https://docs.myspellchecker.com/features/text-validation.md): Myanmar text quality validation with 30+ categories for structural issues, encoding problems, and Zawgyi artifacts.
- [Validation Strategies](https://docs.myspellchecker.com/features/validation-strategies.md): Strategy-based validation pipeline where each strategy handles a specific concern, executed in priority order from tone checking to AI-powered analysis.
- [Word Validation](https://docs.myspellchecker.com/features/word-validation.md): Layer 2 validation that verifies valid syllables form recognized words and provides intelligent correction suggestions via SymSpell.
- [Algorithm Factory](https://docs.myspellchecker.com/guides/algorithm-factory.md): Centralized creation and caching of SymSpell and semantic checker algorithm instances.
- [Caching Guide](https://docs.myspellchecker.com/guides/caching.md): Configure LRU cache sizes for syllable, word, frequency, and N-gram lookups to optimize spell checking performance.
- [CLI Formatting](https://docs.myspellchecker.com/guides/cli-formatting.md): Rich-based terminal formatting internals: themes, error tables, stats panels, and Myanmar text display for the CLI.
- [Configuration Guide](https://docs.myspellchecker.com/guides/configuration.md): mySpellChecker provides extensive configuration options to customize behavior for your specific use case.
- [Connection Pool](https://docs.myspellchecker.com/guides/connection-pool.md): Thread-safe SQLite connection pooling with auto-scaling, health checks, and connection aging for multi-threaded applications.
- [Custom Dictionaries](https://docs.myspellchecker.com/guides/custom-dictionaries.md): Build domain-specific dictionaries from text corpora, curated lexicons, CSV, or JSON using the data pipeline.
- [Custom Grammar Rules Guide](https://docs.myspellchecker.com/guides/custom-grammar-rules.md): Create and customize YAML-based grammar rules for syntactic validation including POS sequences, particles, and register checks.
- [Overview](https://docs.myspellchecker.com/guides/customization.md): Extend mySpellChecker with custom segmenters, dictionary providers, validation strategies, and configuration patterns.
- [Cython Guide](https://docs.myspellchecker.com/guides/cython.md): This guide covers building, using, and developing the Cython extensions that provide 2-20x performance improvements for mySpellChecker.
- [Docker Guide](https://docs.myspellchecker.com/guides/docker.md): Run mySpellChecker in Docker with multi-stage builds, docker-compose profiles, GPU support, and production deployment.
- [Installation Guide](https://docs.myspellchecker.com/guides/installation.md): This guide covers all installation methods for mySpellChecker, from basic pip installation to development setup with all optional features.
- [Integration & Examples](https://docs.myspellchecker.com/guides/integration.md): Integrate mySpellChecker with FastAPI, including REST endpoints, WebSocket, deployment patterns, and caching examples.
- [I/O Utilities](https://docs.myspellchecker.com/guides/io-utilities.md): File handling and system check functions for the data pipeline, including disk space verification.
- [Logging Guide](https://docs.myspellchecker.com/guides/logging.md): Configure centralized logging with debug mode, JSON output for log aggregators, and per-module log levels.
- [Performance Tuning](https://docs.myspellchecker.com/guides/performance-tuning.md): Key configuration choices that impact spell checking speed, memory usage, and throughput.
- [Quick Start Guide](https://docs.myspellchecker.com/guides/quickstart.md): Get up and running with mySpellChecker in 5 minutes. This guide covers the essential concepts and most common use cases.
- [Resource Caching](https://docs.myspellchecker.com/guides/resource-caching.md): Automatic downloading and caching of word segmentation resources from HuggingFace for the myword and CRF engines.
- [Training Custom Models](https://docs.myspellchecker.com/guides/training.md): Train custom semantic and neural reranker models for Myanmar text using mySpellChecker's training pipelines.
- [Zawgyi Support](https://docs.myspellchecker.com/guides/zawgyi-support.md): Detect and convert legacy Zawgyi-encoded Myanmar text to Unicode using Google's myanmartools detector and python-myanmar converter.
- [Overview](https://docs.myspellchecker.com/introduction.md): Myanmar (Burmese) text intelligence library with a 12-strategy checking pipeline, dictionary building, and AI model training, from O(1) SymSpell lookups to ONNX-powered inference.
- [Myanmar Text Intelligence Landscape](https://docs.myspellchecker.com/reference/comparisons.md): mySpellChecker is the first production-grade, pip-installable text intelligence library built specifically for Myanmar language.
- [Constants Reference](https://docs.myspellchecker.com/reference/constants.md): Myanmar Unicode constants and character sets used in mySpellChecker.
- [Error Codes Reference](https://docs.myspellchecker.com/reference/error-codes.md): Complete reference of error codes returned by mySpellChecker.
- [Error Types Reference](https://docs.myspellchecker.com/reference/error-types.md): mySpellChecker uses a hierarchy of error types for spell checking results and exceptions for system errors.
- [Frequently Asked Questions](https://docs.myspellchecker.com/reference/faq.md): Common questions about installation, usage, performance, accuracy, dictionary building, and integration.
- [Glossary](https://docs.myspellchecker.com/reference/glossary.md): Terms and definitions used throughout mySpellChecker documentation, covering Myanmar script, validation pipeline, algorithms, and tooling.
- [Overview](https://docs.myspellchecker.com/reference/index.md): Technical reference documentation for mySpellChecker.
- [Phonetic Data](https://docs.myspellchecker.com/reference/phonetic-data.md): Static data structures for phonetic hashing, including similarity groups, visual confusability, and tonal variants.
- [Rules System](https://docs.myspellchecker.com/reference/rules-system.md): YAML-based linguistic rule configuration for particles, grammar, morphology, classifiers, and tone marks.
- [Troubleshooting Guide](https://docs.myspellchecker.com/reference/troubleshooting.md): Solutions for installation failures, database issues, performance problems, validation errors, and encoding quirks.

## OpenAPI Specs

- [openapi](https://docs.myspellchecker.com/api-reference/openapi.json)