# mySpellChecker

## Docs

- [Context-Aware & Grammar Validation](https://docs.myspellchecker.com/algorithms/context-aware.md): Syntactic grammar checking and N-gram probability analysis for detecting real-word errors in Myanmar text.
- [Edit Distance Algorithms](https://docs.myspellchecker.com/algorithms/edit-distance.md): This document describes the edit distance algorithms used in mySpellChecker for generating spelling suggestions.
- [Syntactic Grammar Validation](https://docs.myspellchecker.com/algorithms/grammar-rules.md): The Syntactic Rule Checker applies POS-based linguistic rules to catch grammatical errors that statistical models miss.
- [Overview](https://docs.myspellchecker.com/algorithms/index.md): Core algorithms and implementations powering mySpellChecker.
- [Joint Segment Tagger](https://docs.myspellchecker.com/algorithms/joint-segment-tagger.md): Unified Viterbi algorithm that performs word segmentation and POS tagging simultaneously for better accuracy.
- [Morpheme & Medial Suggestions](https://docs.myspellchecker.com/algorithms/morpheme-suggestion.md): Morpheme-level correction for compound word typos and medial consonant swap candidate generation for Myanmar's most common error type.
- [Named Entity Recognition (NER)](https://docs.myspellchecker.com/algorithms/ner.md): Heuristic-based Named Entity Recognition to detect proper nouns and reduce false positives on valid names.
- [Neural Reranker](https://docs.myspellchecker.com/algorithms/neural-reranker.md): ONNX-based MLP and tree-based suggestion reranker that scores spelling correction candidates using 19 extracted features for optimal ranking.
- [N-gram Algorithm](https://docs.myspellchecker.com/algorithms/ngram.md): The N-gram algorithm provides context-aware spell checking by analyzing word sequence probabilities.
- [Text Normalization](https://docs.myspellchecker.com/algorithms/normalization.md): Normalization pipeline for Myanmar text covering Zawgyi conversion, Unicode NFC, zero-width removal, and diacritic reordering.
- [Phonetic Matching](https://docs.myspellchecker.com/algorithms/phonetic.md): Phonetic matching for Myanmar text, where many spelling errors come from words that sound the same but are spelled differently.
- [POS Disambiguator](https://docs.myspellchecker.com/algorithms/pos-disambiguator.md): The POS Disambiguator resolves ambiguous Part-of-Speech tags for Myanmar words using five context-based linguistic rules (R1-R5).
- [Segmentation](https://docs.myspellchecker.com/algorithms/segmentation.md): Myanmar text does not use whitespace to separate words. Therefore, segmentation (breaking text into units) is the first and most critical step in the pipeline.
- [Semantic Validation (MLM)](https://docs.myspellchecker.com/algorithms/semantic.md): Deep learning validation using Masked Language Models to detect semantic errors that N-gram methods miss.
- [Suggestion Ranking](https://docs.myspellchecker.com/algorithms/suggestion-ranking.md): Multi-factor scoring system that ranks spelling corrections by edit distance, frequency, phonetic similarity, and source confidence.
- [Suggestion Strategy](https://docs.myspellchecker.com/algorithms/suggestion-strategy.md): Pluggable interface for implementing and composing different spelling suggestion algorithms at runtime.
- [Syllable Segmentation Algorithm](https://docs.myspellchecker.com/algorithms/syllable-segmentation.md): How RegexSegmenter splits continuous Myanmar text into syllables using three complementary regex patterns.
- [SymSpell Algorithm](https://docs.myspellchecker.com/algorithms/symspell.md): How the SymSpell symmetric delete algorithm provides O(1) spelling correction for Myanmar text.
- [Tone Disambiguation](https://docs.myspellchecker.com/algorithms/tone-disambiguation.md): Context-aware tone mark disambiguation for Myanmar words where tone marks change meaning entirely.
- [Viterbi Algorithm](https://docs.myspellchecker.com/algorithms/viterbi.md): The Viterbi algorithm provides efficient POS tagging and word segmentation using Hidden Markov Models (HMM).
- [Overview](https://docs.myspellchecker.com/api-reference/index.md): Complete API documentation for mySpellChecker.
- [Provider Capabilities Matrix](https://docs.myspellchecker.com/api-reference/provider-capabilities.md): This document describes the capabilities of each DictionaryProvider implementation.
- [Tokenizers API](https://docs.myspellchecker.com/api-reference/tokenizers.md): Low-level text splitting utilities for Myanmar text, including syllable, word, and transformer-based tokenizers.
- [Component Diagram](https://docs.myspellchecker.com/architecture/component-diagram.md): Visual representation of mySpellChecker's component architecture.
- [Data Flow](https://docs.myspellchecker.com/architecture/data-flow.md): How data flows through mySpellChecker during spell checking operations.
- [Dependency Injection](https://docs.myspellchecker.com/architecture/dependency-injection.md): Lightweight dependency injection system for managing component lifecycles, enabling loose coupling and easier testing.
- [Extension Points](https://docs.myspellchecker.com/architecture/extension-points.md): Guide to customizing and extending mySpellChecker.
- [Overview](https://docs.myspellchecker.com/architecture/index.md): Multi-layer validation architecture with a 12-strategy pipeline, from deterministic syllable rules to ONNX-powered AI inference.
- [System Design](https://docs.myspellchecker.com/architecture/system-design.md): This document details the component architecture and design decisions in mySpellChecker.
- [Validation Pipeline](https://docs.myspellchecker.com/architecture/validation-pipeline.md): The validation pipeline is the core of mySpellChecker, implementing a multi-layer approach that progressively validates text from syllables to context.
- [Overview](https://docs.myspellchecker.com/cli/index.md): Command reference for spell checking, dictionary building, model training, and text segmentation via the CLI.
- [Configuration](https://docs.myspellchecker.com/core/configuration.md): Configure SpellChecker behavior with SpellCheckerConfig, including presets, thresholds, and algorithm settings.
- [Overview](https://docs.myspellchecker.com/core/data-pipeline.md): Subsystem for ingesting raw Myanmar text corpora and building the optimized SQLite dictionary database.
- [Overview](https://docs.myspellchecker.com/core/index.md): Technical documentation for mySpellChecker's core modules.
- [SpellChecker API](https://docs.myspellchecker.com/core/spellchecker.md): The SpellChecker class is the main entry point for the library.
- [Syllable Validation](https://docs.myspellchecker.com/core/syllable-validation.md): The syllable validation system forms the foundational layer of mySpellChecker's progressive validation pipeline.
- [Semantic Training Pipeline](https://docs.myspellchecker.com/core/training.md): Train custom deep learning models for semantic spell checking using the built-in tokenizer, MLM, and ONNX export pipeline.
- [Word Validation](https://docs.myspellchecker.com/core/word-validation.md): This document describes the word validation system in mySpellChecker, which is Layer 2 of the validation pipeline.
- [Building Dictionaries](https://docs.myspellchecker.com/data-pipeline/building.md): Complete guide to building spell checking dictionaries using the CLI and Python API, with POS tagging, curated lexicons, and incremental updates.
- [Corpus Format Specification](https://docs.myspellchecker.com/data-pipeline/corpus-format.md): Input format specifications for the data pipeline: TXT, CSV, TSV, JSON, JSONL, and Parquet with encoding and structure requirements.
- [Database Schema](https://docs.myspellchecker.com/data-pipeline/database-schema.md): This document describes the SQLite database schema used by mySpellChecker.
- [Overview](https://docs.myspellchecker.com/data-pipeline/index.md): Build optimized spell checking dictionaries from text corpora using the data pipeline CLI or Python API.
- [Ingestion Stage](https://docs.myspellchecker.com/data-pipeline/ingestion.md): The ingestion stage reads and parses input corpus files into a standardized format for processing.
- [Pipeline Optimization](https://docs.myspellchecker.com/data-pipeline/optimization.md): Speed up dictionary building with DuckDB acceleration (3-15x), Cython parallelization, and memory tuning.
- [Pipeline Reporter](https://docs.myspellchecker.com/data-pipeline/pipeline-reporter.md): Abstraction layer for reporting progress during data pipeline execution with console, logging, and test support.
- [POS Inference Manager](https://docs.myspellchecker.com/data-pipeline/pos-inference.md): The POS Inference Manager applies rule-based POS inference to words during database building, increasing POS tag coverage beyond the seed data.
- [Processing Stage](https://docs.myspellchecker.com/data-pipeline/processing.md): The processing stage segments text into syllables and words, preparing data for frequency analysis.
- [Dictionary Providers](https://docs.myspellchecker.com/data-pipeline/providers.md): DictionaryProvider implementations for SQLite, memory, JSON, and CSV storage backends with usage examples.
- [Schema Management](https://docs.myspellchecker.com/data-pipeline/schema-management.md): Centralized database schema definitions and management for SQLite table creation, indexing, and migrations.
- [Segmentation Repair](https://docs.myspellchecker.com/data-pipeline/segmentation-repair.md): The Segmentation Repair module fixes incorrectly segmented Myanmar words by merging broken syllables that were split across word boundaries during tokenization.
- [Benchmark Suite](https://docs.myspellchecker.com/development/benchmarks.md): 1,138-sentence accuracy benchmark with per-tier evaluation, composite scoring, and ablation utilities.
- [Contributing Guide](https://docs.myspellchecker.com/development/contributing.md): How to get started contributing to mySpellChecker.
- [Cython Development Guide](https://docs.myspellchecker.com/development/cython-guide.md): This guide covers how to work with the Cython (.pyx) modules in mySpellChecker.
- [Development](https://docs.myspellchecker.com/development/index.md): Guide for developers working on mySpellChecker.
- [Naming Conventions](https://docs.myspellchecker.com/development/naming-conventions.md): This document establishes naming conventions for the myspellchecker codebase to ensure consistency and clarity.
- [Development Setup](https://docs.myspellchecker.com/development/setup.md): This guide helps you set up a development environment for contributing to mySpellChecker.
- [Testing Guide](https://docs.myspellchecker.com/development/testing.md): This guide covers how to run and write tests for mySpellChecker.
- [Async API](https://docs.myspellchecker.com/features/async-api.md): Non-blocking spell checking with check_async() and check_batch_async() for web frameworks and concurrent workloads.
- [Batch Processing](https://docs.myspellchecker.com/features/batch-processing.md): Check multiple texts at once with check_batch(), parallelization via thread/process pools, and Cython acceleration.
- [Compound Resolution & Reduplication](https://docs.myspellchecker.com/features/compound-resolution.md): DP-based compound word resolution and productive reduplication validation for handling OOV Myanmar words formed through compounding and repetition patterns.
- [Confusable Detection](https://docs.myspellchecker.com/features/confusable-detection.md): Multi-layer confusable word detection using statistical bigram analysis, MLP classification, and MLM-powered semantic inference to catch valid-word substitution errors.
- [Context Checking](https://docs.myspellchecker.com/features/context-checking.md): Context checking is the third validation layer that detects real-word errors, which are words spelled correctly but used incorrectly in context.
- [Grammar Checkers](https://docs.myspellchecker.com/features/grammar-checkers.md): Eight specialized checkers for aspect markers, numeral classifiers, compound words, merged words, negation patterns, particle context, tense agreement, and register consistency.
- [Overview](https://docs.myspellchecker.com/features/grammar-checking.md): Rule-based grammar checking using POS tags to catch syntactic errors like wrong particles, verb-modifier mismatches, and incomplete sentences.
- [Grammar Engine](https://docs.myspellchecker.com/features/grammar-engine.md): Syntactic rule-based spell checking using POS tags, operating at Layer 2.5 of the validation pipeline to catch errors that N-gram models miss.
- [Homophones Detection](https://docs.myspellchecker.com/features/homophones.md): mySpellChecker includes a homophone checker to detect real-word errors: words that are spelled correctly but confused with similar-sounding words.
- [Overview](https://docs.myspellchecker.com/features/index.md): mySpellChecker provides a 12-strategy text checking pipeline for Myanmar, from syllable rules through grammar checking to AI-powered inference.
- [Loan Word Variants](https://docs.myspellchecker.com/features/loan-words.md): Bidirectional lookup for Myanmar loan word transliteration variants from English, Pali/Sanskrit, and other languages.
- [Morphology Analysis](https://docs.myspellchecker.com/features/morphology.md): The morphology module provides word structure analysis for Myanmar text, enabling POS inference for out-of-vocabulary (OOV) words and word decomposition.
- [Named Entity Recognition (NER)](https://docs.myspellchecker.com/features/ner.md): mySpellChecker includes a Named Entity Recognition (NER) module to reduce false positives by identifying names, locations, and organizations in Myanmar text.
- [Text Normalization](https://docs.myspellchecker.com/features/normalization.md): mySpellChecker provides a unified NormalizationService that consolidates all text normalization logic into a single, consistent interface.
- [POS Tagging System](https://docs.myspellchecker.com/features/pos-tagging.md): Pluggable POS tagging with rule-based inference, transformer models, and Viterbi HMM for Myanmar text analysis.
- [Segmenters](https://docs.myspellchecker.com/features/segmenters.md): Two-level text segmentation for Myanmar: rule-based syllable splitting and dictionary-backed word tokenization with myword, CRF, or transformer engines.
- [Semantic Checking](https://docs.myspellchecker.com/features/semantic-checking.md): Optional MLM-based deep context analysis that masks each word and predicts alternatives, providing the highest accuracy for detecting real-word errors.
- [Streaming API](https://docs.myspellchecker.com/features/streaming.md): Memory-efficient streaming spell checking for large files using generators, async iterators, progress callbacks, and backpressure control.
- [Syllable Validation](https://docs.myspellchecker.com/features/syllable-validation.md): Layer 1 validation that checks Myanmar syllable structure using deterministic rules before word-level analysis.
- [Text Utilities](https://docs.myspellchecker.com/features/text-utilities.md): Stemmer, phonetic matching, and encoding utilities for Myanmar text processing challenges.
- [Text Validation](https://docs.myspellchecker.com/features/text-validation.md): Myanmar text quality validation with 30+ categories for structural issues, encoding problems, and Zawgyi artifacts.
- [Validation Strategies](https://docs.myspellchecker.com/features/validation-strategies.md): Strategy-based validation pipeline where each strategy handles a specific concern, executed in priority order from tone checking to AI-powered analysis.
- [Word Validation](https://docs.myspellchecker.com/features/word-validation.md): Layer 2 validation that verifies valid syllables form recognized words and provides intelligent correction suggestions via SymSpell.
- [Algorithm Factory](https://docs.myspellchecker.com/guides/algorithm-factory.md): Centralized creation and caching of SymSpell and semantic checker algorithm instances.
- [Caching Guide](https://docs.myspellchecker.com/guides/caching.md): Configure LRU cache sizes for syllable, word, frequency, and N-gram lookups to optimize spell checking performance.
- [CLI Formatting](https://docs.myspellchecker.com/guides/cli-formatting.md): Rich-based terminal formatting internals: themes, error tables, stats panels, and Myanmar text display for the CLI.
- [Configuration Guide](https://docs.myspellchecker.com/guides/configuration.md): mySpellChecker provides extensive configuration options to customize behavior for your specific use case.
- [Connection Pool](https://docs.myspellchecker.com/guides/connection-pool.md): Thread-safe SQLite connection pooling with auto-scaling, health checks, and connection aging for multi-threaded applications.
- [Custom Dictionaries](https://docs.myspellchecker.com/guides/custom-dictionaries.md): Build domain-specific dictionaries from text corpora, curated lexicons, CSV, or JSON using the data pipeline.
- [Custom Grammar Rules Guide](https://docs.myspellchecker.com/guides/custom-grammar-rules.md): Create and customize YAML-based grammar rules for syntactic validation including POS sequences, particles, and register checks.
- [Overview](https://docs.myspellchecker.com/guides/customization.md): Extend mySpellChecker with custom segmenters, dictionary providers, validation strategies, and configuration patterns.
- [Cython Guide](https://docs.myspellchecker.com/guides/cython.md): This guide covers building, using, and developing the Cython extensions that provide 2-20x performance improvements for mySpellChecker.
- [Docker Guide](https://docs.myspellchecker.com/guides/docker.md): Run mySpellChecker in Docker with multi-stage builds, docker-compose profiles, GPU support, and production deployment.
- [Installation Guide](https://docs.myspellchecker.com/guides/installation.md): This guide covers all installation methods for mySpellChecker, from basic pip installation to development setup with all optional features.
- [Integration & Examples](https://docs.myspellchecker.com/guides/integration.md): Integrate mySpellChecker with FastAPI, including REST endpoints, WebSocket, deployment patterns, and caching examples.
- [I/O Utilities](https://docs.myspellchecker.com/guides/io-utilities.md): File handling and system check functions for the data pipeline, including disk space verification.
- [Logging Guide](https://docs.myspellchecker.com/guides/logging.md): Configure centralized logging with debug mode, JSON output for log aggregators, and per-module log levels.
- [Performance Tuning](https://docs.myspellchecker.com/guides/performance-tuning.md): Key configuration choices that impact spell checking speed, memory usage, and throughput.
- [Quick Start Guide](https://docs.myspellchecker.com/guides/quickstart.md): Get up and running with mySpellChecker in 5 minutes. This guide covers the essential concepts and most common use cases.
- [Resource Caching](https://docs.myspellchecker.com/guides/resource-caching.md): Automatic downloading and caching of word segmentation resources from HuggingFace for the myword and CRF engines.
- [Training Custom Models](https://docs.myspellchecker.com/guides/training.md): Train custom semantic and neural reranker models for Myanmar text using mySpellChecker's training pipelines.
- [Zawgyi Support](https://docs.myspellchecker.com/guides/zawgyi-support.md): Detect and convert legacy Zawgyi-encoded Myanmar text to Unicode using Google's myanmartools detector and python-myanmar converter.
- [Overview](https://docs.myspellchecker.com/introduction.md): Myanmar (Burmese) text intelligence library with a 12-strategy checking pipeline, dictionary building, and AI model training, from O(1) SymSpell lookups to ONNX-powered inference.
- [Myanmar Text Intelligence Landscape](https://docs.myspellchecker.com/reference/comparisons.md): mySpellChecker is the first production-grade, pip-installable text intelligence library built specifically for the Myanmar language.
- [Constants Reference](https://docs.myspellchecker.com/reference/constants.md): Myanmar Unicode constants and character sets used in mySpellChecker.
- [Error Codes Reference](https://docs.myspellchecker.com/reference/error-codes.md): Complete reference of error codes returned by mySpellChecker.
- [Error Types Reference](https://docs.myspellchecker.com/reference/error-types.md): mySpellChecker uses a hierarchy of error types for spell checking results and exceptions for system errors.
- [Frequently Asked Questions](https://docs.myspellchecker.com/reference/faq.md): Common questions about installation, usage, performance, accuracy, dictionary building, and integration.
- [Glossary](https://docs.myspellchecker.com/reference/glossary.md): Terms and definitions used throughout mySpellChecker documentation, covering Myanmar script, validation pipeline, algorithms, and tooling.
- [Overview](https://docs.myspellchecker.com/reference/index.md): Technical reference documentation for mySpellChecker.
- [Phonetic Data](https://docs.myspellchecker.com/reference/phonetic-data.md): Static data structures for phonetic hashing, including similarity groups, visual confusability, and tonal variants.
- [Rules System](https://docs.myspellchecker.com/reference/rules-system.md): YAML-based linguistic rule configuration for particles, grammar, morphology, classifiers, and tone marks.
- [Troubleshooting Guide](https://docs.myspellchecker.com/reference/troubleshooting.md): Solutions for installation failures, database issues, performance problems, validation errors, and encoding quirks.

## OpenAPI Specs

- [openapi](https://docs.myspellchecker.com/api-reference/openapi.json)