Overview - mySpellChecker

The core design principle is “fail fast, go deeper only when needed”: cheap deterministic checks (syllable structure, dictionary lookups) run first and catch ~90% of errors before expensive operations (N-gram context, grammar rules, AI inference) are ever invoked. Each layer receives only the text that passed the layer below it.

Design Philosophy

Myanmar text has no spaces between words, so splitting on whitespace doesn’t work. Instead, the pipeline starts from syllables and builds up:

Break into syllables (deterministic, fast)
Validate syllables (catches ~90% of errors)
Assemble into words (only with valid syllables)
Check grammar and context (only with valid words)

This “fail-fast” approach catches obvious typos immediately without wasting resources on deeper analysis.
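
The layering described above can be sketched in a few lines of Python. This is a minimal illustration, not the library's implementation: the syllable and word sets are toy Latin-script stand-ins, the names VALID_SYLLABLES, VALID_WORDS, MAX_WORD_SYLLABLES, Result, and check are hypothetical, and the input is assumed to be already segmented into syllables. Stopping at the first failing layer is also a simplification.

```python
from dataclasses import dataclass, field

# Toy stand-ins for real syllable and word dictionaries (assumptions).
VALID_SYLLABLES = {"ka", "la", "ma", "na"}
VALID_WORDS = {("ka", "la"), ("ma",), ("na", "ma")}
MAX_WORD_SYLLABLES = 2  # longest word in the toy dictionary


@dataclass
class Result:
    errors: list = field(default_factory=list)
    words: list = field(default_factory=list)


def check(syllables):
    """Run the cheap layer first; go deeper only when it passes."""
    result = Result()

    # Layer 1: syllable validation (cheap set lookups).
    bad = [s for s in syllables if s not in VALID_SYLLABLES]
    if bad:
        result.errors = [("syllable", s) for s in bad]
        return result  # fail fast: word assembly never runs

    # Layer 2: greedy longest-match word assembly over valid syllables.
    i = 0
    while i < len(syllables):
        for size in range(MAX_WORD_SYLLABLES, 0, -1):
            candidate = tuple(syllables[i:i + size])
            if candidate in VALID_WORDS:
                result.words.append("".join(candidate))
                i += len(candidate)
                break
        else:
            # No dictionary word starts at this syllable.
            result.errors.append(("word", syllables[i]))
            i += 1
    return result
```

In this sketch, check(["ka", "xx"]) returns after Layer 1 with a single syllable error, while check(["ka", "la", "ma"]) reaches Layer 2 and assembles the words "kala" and "ma".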

High-Level Architecture

  +------------------+
  |   User Input     |
  +--------+---------+
           |
           v
  +------------------+
  |    Segmenter     |
  | (Syllable/Word)  |
  +--------+---------+
           |
           +------------+------------+------------+
           |            |            |            |
           v            v            v            v
      +----+----+  +----+----+  +----+----+  +----+----+
      | Layer 1 |  | Layer 2 |  |Layer 2.5|  | Layer 3 |
      |Syllable |  |  Word   |  | Grammar |  | Context |
      |Validator|  |Validator|  |  Rules  |  |Validator|
      +----+----+  +----+----+  +----+----+  +----+----+
           |            |            |            |
           +------------+------------+------------+
           |
           v
  +------------------+       +------------------+
  | Dictionary       |       |    Response      |
  | Provider         | ----> | (errors,         |
  | (SQLite/Memory)  |       |  suggestions,    |
  +------------------+       |  corrected_text) |
                             +------------------+

Core Components

Component	Purpose
SpellChecker	Main coordinator that orchestrates all validation layers
SpellCheckerBuilder	Fluent interface for constructing SpellChecker instances
DictionaryProvider	Pluggable storage backend (SQLite, Memory, JSON)
Segmenter	Text segmentation (syllable + word)
SyllableValidator	Layer 1: syllable structure validation
WordValidator	Layer 2: word lookup + SymSpell suggestions
ContextValidator	Layer 3: N-gram + validation strategies
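
To illustrate how a fluent builder and a pluggable storage backend fit together, here is a small self-contained sketch. It does not reproduce the library's API: the Toy-prefixed classes, MemoryProvider, and the method names contains, with_provider, with_context_checking, build, and unknown_words are all invented for this example.

```python
from typing import Protocol


class ToyDictionaryProvider(Protocol):
    """Hypothetical interface: anything with contains() can be plugged in."""

    def contains(self, word: str) -> bool: ...


class MemoryProvider:
    """In-memory backend; a SQLite-backed class could expose the same method."""

    def __init__(self, words):
        self._words = set(words)

    def contains(self, word: str) -> bool:
        return word in self._words


class ToySpellChecker:
    def __init__(self, provider: ToyDictionaryProvider, use_context: bool):
        self.provider = provider
        self.use_context = use_context

    def unknown_words(self, words):
        return [w for w in words if not self.provider.contains(w)]


class ToySpellCheckerBuilder:
    def __init__(self):
        self._provider = MemoryProvider([])
        self._use_context = False

    def with_provider(self, provider):
        self._provider = provider
        return self  # returning self is what makes the calls chainable

    def with_context_checking(self, enabled=True):
        self._use_context = enabled
        return self

    def build(self):
        return ToySpellChecker(self._provider, self._use_context)


checker = (
    ToySpellCheckerBuilder()
    .with_provider(MemoryProvider(["alpha", "beta"]))
    .with_context_checking()
    .build()
)
```

Because the checker depends only on the contains method, swapping the storage backend does not require changes to the checker or the builder.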

Validation Strategies

The context validation layer uses the Strategy pattern for modular, priority-ordered validation. Every strategy implements the ValidationStrategy interface; the implementations are listed here in priority order, with the priority value in parentheses:

ToneValidationStrategy (10)
OrthographyValidationStrategy (15)
SyntacticValidationStrategy (20)
StatisticalConfusableStrategy (24)
BrokenCompoundStrategy (25)
POSSequenceValidationStrategy (30)
QuestionStructureValidationStrategy (40)
HomophoneValidationStrategy (45)
ConfusableCompoundClassifierStrategy (47), AI, opt-in
ConfusableSemanticStrategy (48), AI, opt-in
NgramContextValidationStrategy (50)
SemanticValidationStrategy (70), AI, opt-in

Each strategy can be enabled/disabled via configuration. See Validation Strategies.
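
A priority-ordered Strategy pattern can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the two concrete strategies, the validate signature, the run_strategies helper, and the disabled parameter are hypothetical, and the sketch assumes that lower priority values run first.

```python
from abc import ABC, abstractmethod


class ValidationStrategy(ABC):
    priority = 0  # assumption: lower values run earlier

    @abstractmethod
    def validate(self, words):
        """Return a list of (word_index, message) findings."""


class UnknownWordStrategy(ValidationStrategy):  # hypothetical strategy
    priority = 10

    def __init__(self, known):
        self.known = set(known)

    def validate(self, words):
        return [(i, "unknown word") for i, w in enumerate(words)
                if w not in self.known]


class RepeatedWordStrategy(ValidationStrategy):  # hypothetical strategy
    priority = 20

    def validate(self, words):
        return [(i, "repeated word") for i in range(1, len(words))
                if words[i] == words[i - 1]]


def run_strategies(strategies, words, disabled=()):
    """Run enabled strategies in ascending priority order."""
    findings = []
    for strategy in sorted(strategies, key=lambda s: s.priority):
        if type(strategy).__name__ in disabled:
            continue  # switched off by configuration
        findings.extend(strategy.validate(words))
    return findings


strategies = [RepeatedWordStrategy(), UnknownWordStrategy(["a", "b"])]
all_findings = run_strategies(strategies, ["a", "a", "z"])
```

Registration order does not matter here: the runner sorts by priority, so a new strategy only needs to pick a priority value to slot into the sequence.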

Offline Systems

Data Pipeline

Transforms a raw corpus into an optimized dictionary database:

Raw Corpus -> Ingester -> Segmenter -> Frequency Builder -> Packager -> SQLite DB
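
The stages above can be sketched with the Python standard library. Everything here is an assumption made for illustration: the function names, the words table schema, and especially the segment function, which splits on whitespace only because the toy corpus is already segmented (real Myanmar text needs a proper segmenter).

```python
import sqlite3
from collections import Counter


def ingest(lines):
    """Ingester: strip whitespace and drop empty lines."""
    return [line.strip() for line in lines if line.strip()]


def segment(line):
    """Segmenter placeholder: the toy corpus is pre-segmented with spaces."""
    return line.split()


def build_frequencies(lines):
    """Frequency Builder: count how often each token occurs."""
    counts = Counter()
    for line in lines:
        counts.update(segment(line))
    return counts


def package(counts, path=":memory:"):
    """Packager: write the counts to a SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE words (word TEXT PRIMARY KEY, frequency INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO words VALUES (?, ?)", counts.items())
    conn.commit()
    return conn


corpus = ["a b a", "", "  b c  "]
conn = package(build_frequencies(ingest(corpus)))
rows = conn.execute(
    "SELECT word, frequency FROM words ORDER BY word"
).fetchall()
```

Passing a file path instead of ":memory:" would persist the database to disk.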

Training Pipeline

Creates AI models for semantic checking (BYOM, bring your own model):

Corpus -> Tokenizer Training -> Pre-training (MLM) -> Export (ONNX) -> Quantization

Design Principles

Fail Fast: Catch errors at the earliest possible layer
Layered Validation: Each layer adds accuracy at a cost
Pluggable Components: Swap providers, segmenters, taggers
Graceful Degradation: Continue working even if optional components fail
Performance First: Optimize hot paths with Cython/OpenMP
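
As one way to picture graceful degradation, the sketch below treats some layers as required and others as optional, and keeps going when an optional layer raises. The names run_layers, dictionary_layer, and broken_semantic_layer are hypothetical, and the split into required and optional layers is an assumption of this example.

```python
import logging

logger = logging.getLogger("spellcheck_sketch")


def run_layers(text, required_layers, optional_layers):
    """Return (errors, skipped_layer_names) for the given text."""
    errors = []
    skipped = []
    for layer in required_layers:
        errors.extend(layer(text))  # a failure here propagates to the caller
    for layer in optional_layers:
        try:
            errors.extend(layer(text))
        except Exception as exc:
            # Degrade: record the skip and continue with what we have.
            logger.warning("optional layer %s skipped: %s", layer.__name__, exc)
            skipped.append(layer.__name__)
    return errors, skipped


def dictionary_layer(text):
    """Toy required layer: flag words missing from a tiny word set."""
    return [w for w in text.split() if w not in {"ok"}]


def broken_semantic_layer(text):
    """Toy optional layer that simulates a missing model file."""
    raise RuntimeError("model file not found")


errors, skipped = run_layers("ok bad", [dictionary_layer], [broken_semantic_layer])
```

The caller still receives the dictionary-level errors, plus a list of skipped layers it can surface to the user.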

Architecture Documents

System Design: Detailed component architecture and class responsibilities
Validation Pipeline: Pipeline deep-dive with execution flow
Component Diagram: Visual component relationships
Data Flow: Data flow through the system
Extension Points: How to extend the system
Dependency Injection: DI container system
