mySpellChecker uses a “Syllable-First Architecture”: instead of first segmenting possibly misspelled text into words, which is unreliable, it validates syllable structure first and then progressively validates words, grammar, and context.

Design Philosophy

Traditional spell checkers split text on whitespace, which fails for Myanmar because words are not separated by spaces. mySpellChecker inverts the process:
  1. Break into syllables (deterministic, fast)
  2. Validate syllables (catches ~90% of errors)
  3. Assemble into words (only with valid syllables)
  4. Check grammar and context (only with valid words)
This “fail-fast” approach catches obvious typos immediately without wasting resources on deeper analysis.
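The four steps above can be sketched in a few lines of Python. This is a toy illustration, not mySpellChecker's code: the regex is a simplified, commonly used heuristic for Myanmar syllable boundaries, the greedy longest-match word assembly is an assumption, and `split_syllables`, `check`, and the two-entry dictionaries are hypothetical names and data chosen for the example.

```python
import re

# Assumed heuristic (not necessarily mySpellChecker's segmenter): a syllable
# starts at a Myanmar consonant (U+1000 to U+1021) unless that consonant is
# stacked (preceded by virama U+1039) or closes the previous syllable
# (followed by asat U+103A or virama U+1039).
_SYLLABLE_START = re.compile(r"(?<!\u1039)[\u1000-\u1021](?![\u103a\u1039])")


def split_syllables(text: str) -> list[str]:
    """Step 1: deterministic syllable breaking."""
    starts = [m.start() for m in _SYLLABLE_START.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any leading non-consonant material
    ends = starts[1:] + [len(text)]
    return [text[a:b] for a, b in zip(starts, ends) if a < b]


def check(text, valid_syllables, valid_words):
    """Run the layers in order and stop at the first one that reports errors."""
    syllables = split_syllables(text)

    # Step 2: syllable validation. Any failure ends the run here.
    bad = [s for s in syllables if s not in valid_syllables]
    if bad:
        return ("syllable", bad)

    # Step 3: assemble words by greedy longest match over valid syllables.
    words, unknown, i = [], [], 0
    while i < len(syllables):
        for j in range(len(syllables), i, -1):
            candidate = "".join(syllables[i:j])
            if candidate in valid_words:
                words.append(candidate)
                i = j
                break
        else:
            unknown.append(syllables[i])
            i += 1
    if unknown:
        return ("word", unknown)

    # Step 4 (grammar and context) would only run on this clean word list.
    return ("ok", words)


# Toy data: the two syllables of the word "Myanmar", written as escapes.
MYAN, MA = "\u1019\u103c\u1014\u103a", "\u1019\u102c"
SYLLABLES, WORDS = {MYAN, MA}, {MYAN + MA}

print(check(MYAN + MA, SYLLABLES, WORDS))              # passes every layer
print(check(MYAN + "\u1019\u102f", SYLLABLES, WORDS))  # stops at the syllable layer
print(check(MA + MYAN, SYLLABLES, WORDS))              # stops at the word layer
```

The second call never reaches word assembly, which is the point of the fail-fast ordering: a syllable that cannot exist is reported before any dictionary lookup over word candidates is attempted.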

High-Level Architecture

  +------------------+
  |   User Input     |
  +--------+---------+
           |
           v
  +------------------+
  |    Segmenter     |
  | (Syllable/Word)  |
  +--------+---------+
           |
        +--+-----------+--------------+--------------+
        |              |              |              |
        v              v              v              v
  +-----+-----+  +-----+-----+  +-----+-----+  +-----+-----+
  |  Layer 1  |  |  Layer 2  |  | Layer 2.5 |  |  Layer 3  |
  | Syllable  |  |   Word    |  |  Grammar  |  |  Context  |
  | Validator |  | Validator |  |   Rules   |  | Validator |
  +-----+-----+  +-----+-----+  +-----+-----+  +-----+-----+
        |              |              |              |
        +--+-----------+--------------+--------------+
           |
           v
  +------------------+       +-------------------+
  | Dictionary       |       |    Response       |
  | Provider         | ----> | (errors,          |
  | (SQLite/Memory)  |       |  suggestions,     |
  +------------------+       |  corrected_text)  |
                             +-------------------+

Core Components

| Component | Purpose |
| --- | --- |
| SpellChecker | Main coordinator; orchestrates all validation layers |
| SpellCheckerBuilder | Fluent interface for constructing SpellChecker instances |
| DictionaryProvider | Pluggable storage backend (SQLite, Memory, JSON) |
| Segmenter | Text segmentation (syllable + word) |
| SyllableValidator | Layer 1: syllable structure validation |
| WordValidator | Layer 2: word lookup + SymSpell suggestions |
| ContextValidator | Layer 3: N-gram + validation strategies |
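To make the "pluggable backend" and "fluent interface" roles concrete, here is a minimal sketch of how such pieces can fit together. Every name in it (`WordStore`, `InMemoryStore`, `ToyChecker`, `ToyCheckerBuilder`, and their methods) is hypothetical and deliberately different from the library's classes; the real signatures of SpellCheckerBuilder and DictionaryProvider are not shown in this document and may look quite different.

```python
from typing import Protocol


class WordStore(Protocol):
    """Hypothetical storage interface; any backend with these methods fits."""

    def contains(self, word: str) -> bool: ...
    def frequency(self, word: str) -> int: ...


class InMemoryStore:
    """Dict-backed store; a SQLite-backed class could replace it unchanged."""

    def __init__(self, frequencies: dict[str, int]) -> None:
        self._frequencies = dict(frequencies)

    def contains(self, word: str) -> bool:
        return word in self._frequencies

    def frequency(self, word: str) -> int:
        return self._frequencies.get(word, 0)


class ToyChecker:
    def __init__(self, store: WordStore, use_context: bool) -> None:
        self.store = store
        self.use_context = use_context

    def unknown_words(self, words: list[str]) -> list[str]:
        return [w for w in words if not self.store.contains(w)]


class ToyCheckerBuilder:
    """Fluent builder: each setter returns self so calls can be chained."""

    def __init__(self) -> None:
        self._store: WordStore | None = None
        self._use_context = False

    def with_store(self, store: WordStore) -> "ToyCheckerBuilder":
        self._store = store
        return self

    def with_context(self, enabled: bool = True) -> "ToyCheckerBuilder":
        self._use_context = enabled
        return self

    def build(self) -> ToyChecker:
        if self._store is None:
            raise ValueError("a word store is required")
        return ToyChecker(self._store, self._use_context)


checker = ToyCheckerBuilder().with_store(InMemoryStore({"hello": 42})).build()
print(checker.unknown_words(["hello", "helo"]))
```

Because the checker depends only on the two-method interface, swapping the storage backend does not touch any validation code.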

Validation Strategies

The context validation layer uses a Strategy pattern for modular, priority-ordered validation:
ValidationStrategy (interface)
├── ToneValidationStrategy (10)
├── OrthographyValidationStrategy (15)
├── SyntacticValidationStrategy (20)
├── POSSequenceValidationStrategy (30)
├── QuestionStructureValidationStrategy (40)
├── HomophoneValidationStrategy (45)
├── NgramContextValidationStrategy (50)
├── ErrorDetectionStrategy (65)        — AI, opt-in
└── SemanticValidationStrategy (70)    — AI, opt-in
Each strategy can be enabled/disabled via configuration. See Validation Strategies.
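A priority-ordered Strategy pattern of this kind might be wired up roughly as follows. The sketch assumes that lower numbers run first (which the list above suggests, since the opt-in AI strategies carry the highest values) and that configuration is a simple set of enabled names. The `Strategy` dataclass, `run_strategies`, and the three toy rules are hypothetical and stand in for the real strategies.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Strategy:
    name: str
    priority: int  # assumption: lower values run earlier
    check: Callable[[list[str]], list[str]]


def run_strategies(words, strategies, enabled):
    """Run enabled strategies in ascending priority and collect findings."""
    findings = []
    for strategy in sorted(strategies, key=lambda s: s.priority):
        if strategy.name in enabled:
            findings.extend((strategy.name, msg) for msg in strategy.check(words))
    return findings


def too_short(words):
    return ["fewer than two words"] if len(words) < 2 else []


def repeated_word(words):
    return [f"repeated word: {a}" for a, b in zip(words, words[1:]) if a == b]


# Registration order is deliberately scrambled; priority decides run order.
STRATEGIES = [
    Strategy("ngram_like", 50, repeated_word),
    Strategy("expensive_model", 70, lambda words: ["model flagged the sentence"]),
    Strategy("cheap_rule", 10, too_short),
]

print(run_strategies(["a", "a"], STRATEGIES, enabled={"cheap_rule", "ngram_like"}))
print(run_strategies(["a"], STRATEGIES, enabled={s.name for s in STRATEGIES}))
```

Leaving `expensive_model` out of the enabled set mirrors the opt-in status of the AI strategies: it stays registered but never runs.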

Offline Systems

Data Pipeline

Transforms a raw corpus into an optimized dictionary database:
Raw Corpus → Ingester → Segmenter → Frequency Builder → Packager → SQLite DB
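As a rough picture of what the stages do, the sketch below reads corpus lines, segments them, counts frequencies, and writes the result to SQLite using only the standard library. The `words` table schema, the `build_dictionary` function, and the injected `segment` callable are assumptions made for illustration; the actual pipeline's schema and interfaces are not described here.

```python
import sqlite3
from collections import Counter
from typing import Callable, Iterable


def build_dictionary(
    lines: Iterable[str],
    segment: Callable[[str], list[str]],
    db_path: str = ":memory:",
) -> sqlite3.Connection:
    # Ingest + segment + count: skip blank lines, tally every token.
    counts: Counter[str] = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts.update(segment(line))

    # Package: one row per word with its corpus frequency (hypothetical schema).
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS words ("
        "word TEXT PRIMARY KEY, frequency INTEGER NOT NULL)"
    )
    conn.executemany("INSERT OR REPLACE INTO words VALUES (?, ?)", counts.items())
    conn.commit()
    return conn


# str.split is only a placeholder segmenter for this space-separated toy corpus;
# Myanmar text would need a real syllable or word segmenter here.
conn = build_dictionary(["a b a", "", "b c a"], segment=str.split)
print(conn.execute("SELECT word, frequency FROM words ORDER BY word").fetchall())
```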

Training Pipeline

Creates AI models for semantic checking (BYOM, bring your own model):
Corpus → Tokenizer Training → Pre-training (MLM) → Export (ONNX) → Quantization

Design Principles

  1. Fail Fast — Catch errors at the earliest possible layer
  2. Layered Validation — Each layer adds accuracy at a cost
  3. Pluggable Components — Swap providers, segmenters, taggers
  4. Graceful Degradation — Continue working even if optional components fail
  5. Performance First — Optimize hot paths with Cython/OpenMP

Architecture Documents