The core design principle is to fail fast and go deeper only when needed. Cheap deterministic checks (syllable structure, dictionary lookups) run first and reject roughly 90% of errors before expensive operations (N-gram context, grammar rules, AI inference) are ever invoked. Each layer receives only the input that passed the layer below it.

Design Philosophy

Myanmar text has no spaces between words, so splitting on whitespace doesn’t work. Instead, the pipeline starts from syllables and builds up:
  1. Break into syllables (deterministic, fast)
  2. Validate syllables (catches ~90% of errors)
  3. Assemble into words (only with valid syllables)
  4. Check grammar and context (only with valid words)
This “fail-fast” approach catches obvious typos immediately without wasting resources on deeper analysis.
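As a rough illustration of this ordering, the sketch below runs the cheap checks first and returns as soon as one of them fails. Everything in it is hypothetical: the function names, the `Result` type, the hyphen-delimited romanized input standing in for real Myanmar syllable segmentation, and the greedy word assembly are assumptions made for the example, not the library's actual API.

```python
from dataclasses import dataclass
from typing import List, Set

# Toy data: romanized placeholders, not real Myanmar syllables or words.
VALID_SYLLABLES: Set[str] = {"myan", "mar", "sar"}
VALID_WORDS: Set[str] = {"myanmar", "sar"}


@dataclass
class Result:
    stage: str         # deepest stage reached before stopping
    errors: List[str]  # items rejected at that stage (empty if none)


def segment(text: str) -> List[str]:
    # Placeholder segmenter: syllables are pre-marked with "-".
    return [s for s in text.split("-") if s]


def assemble(syllables: List[str], max_len: int = 3) -> List[str]:
    # Greedy longest match: join up to max_len syllables into a known word,
    # falling back to a single syllable when nothing longer matches.
    words, i = [], 0
    while i < len(syllables):
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = "".join(syllables[i:i + n])
            if candidate in VALID_WORDS or n == 1:
                words.append(candidate)
                i += n
                break
    return words


def check(text: str) -> Result:
    syllables = segment(text)
    bad = [s for s in syllables if s not in VALID_SYLLABLES]
    if bad:
        return Result("syllable", bad)  # fail fast: skip word and context work
    bad_words = [w for w in assemble(syllables) if w not in VALID_WORDS]
    if bad_words:
        return Result("word", bad_words)
    # Only fully valid words would reach the (omitted) grammar/context checks.
    return Result("context", [])
```

Calling `check("myan-mxr")` stops at the syllable stage and never assembles words, which is the point of the ordering: the more expensive stages only see input that the cheaper ones accepted.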

High-Level Architecture

  +------------------+
  |    User Input    |
  +--------+---------+
           |
           v
  +------------------+
  |    Segmenter     |
  | (Syllable/Word)  |
  +--------+---------+
           |
       +---+--------+------------+------------+
       |            |            |            |
       v            v            v            v
  +----+----+  +----+----+  +----+----+  +----+----+
  | Layer 1 |  | Layer 2 |  |Layer 2.5|  | Layer 3 |
  |Syllable |  |  Word   |  | Grammar |  | Context |
  |Validator|  |Validator|  |  Rules  |  |Validator|
  +----+----+  +----+----+  +----+----+  +----+----+
       |            |            |            |
       +---+--------+------------+------------+
           |
           v
  +------------------+       +------------------+
  | Dictionary       |       | Response         |
  | Provider         | ----> | (errors,         |
  | (SQLite/Memory)  |       |  suggestions,    |
  +------------------+       |  corrected_text) |
                             +------------------+

Core Components

Component            Purpose
-------------------  ---------------------------------------------------------
SpellChecker         Main coordinator that orchestrates all validation layers
SpellCheckerBuilder  Fluent interface for constructing SpellChecker instances
DictionaryProvider   Pluggable storage backend (SQLite, Memory, JSON)
Segmenter            Text segmentation (syllable + word)
SyllableValidator    Layer 1: syllable structure validation
WordValidator        Layer 2: word lookup + SymSpell suggestions
ContextValidator     Layer 3: N-gram + validation strategies
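To make the idea of a pluggable storage backend concrete, here is a minimal sketch of two interchangeable stores behind one small interface. The names `WordStore`, `MemoryStore`, `SQLiteStore`, and `is_known`, as well as the `words(word, freq)` table layout, are assumptions for illustration and are not taken from the real DictionaryProvider.

```python
import sqlite3
from typing import Dict, Protocol


class WordStore(Protocol):
    # Hypothetical interface: callers depend only on this one method.
    def frequency(self, word: str) -> int: ...


class MemoryStore:
    """Backend that keeps word frequencies in a plain dict."""

    def __init__(self, freqs: Dict[str, int]) -> None:
        self._freqs = dict(freqs)

    def frequency(self, word: str) -> int:
        return self._freqs.get(word, 0)


class SQLiteStore:
    """Backend that reads word frequencies from a SQLite table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    @classmethod
    def from_dict(cls, freqs: Dict[str, int], path: str = ":memory:") -> "SQLiteStore":
        # Assumed schema for the sketch: one row per word with its count.
        conn = sqlite3.connect(path)
        conn.execute(
            "CREATE TABLE words (word TEXT PRIMARY KEY, freq INTEGER NOT NULL)"
        )
        conn.executemany("INSERT INTO words VALUES (?, ?)", freqs.items())
        conn.commit()
        return cls(conn)

    def frequency(self, word: str) -> int:
        row = self._conn.execute(
            "SELECT freq FROM words WHERE word = ?", (word,)
        ).fetchone()
        return row[0] if row else 0


def is_known(store: WordStore, word: str) -> bool:
    # Works with either backend because it only uses the shared method.
    return store.frequency(word) > 0
```

Because `is_known` relies on nothing beyond `frequency`, swapping `MemoryStore` for `SQLiteStore` requires no change in the calling code.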

Validation Strategies

The context validation layer uses a Strategy pattern for modular, priority-ordered validation. The number after each strategy is its priority:
ValidationStrategy (interface)
  ToneValidationStrategy (10)
  OrthographyValidationStrategy (15)
  SyntacticValidationStrategy (20)
  StatisticalConfusableStrategy (24)
  BrokenCompoundStrategy (25)
  POSSequenceValidationStrategy (30)
  QuestionStructureValidationStrategy (40)
  HomophoneValidationStrategy (45)
  ConfusableCompoundClassifierStrategy (47; AI, opt-in)
  ConfusableSemanticStrategy (48; AI, opt-in)
  NgramContextValidationStrategy (50)
  SemanticValidationStrategy (70; AI, opt-in)
Each strategy can be enabled/disabled via configuration. See Validation Strategies.
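A priority-ordered Strategy pattern with per-strategy switches could look roughly like the sketch below. It assumes that lower priority numbers run first and that opt-in strategies stay off unless explicitly enabled. The class names, the `opt_in` flag, and the `enabled` mapping are hypothetical, and the toy checks do not correspond to any of the real strategies listed above.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class Strategy(ABC):
    name = "base"
    priority = 100   # assumption: lower numbers run earlier
    opt_in = False   # assumption: opt-in strategies are off by default

    @abstractmethod
    def validate(self, words: List[str]) -> List[str]:
        """Return a list of issue messages for the given words."""


class EmptyTokenStrategy(Strategy):
    name = "empty_token"
    priority = 10

    def validate(self, words: List[str]) -> List[str]:
        return ["empty token"] if any(not w for w in words) else []


class RepeatedWordStrategy(Strategy):
    name = "repeated_word"
    priority = 50

    def validate(self, words: List[str]) -> List[str]:
        return [f"repeated: {b}" for a, b in zip(words, words[1:]) if a == b]


class ToyModelStrategy(Strategy):
    # Stand-in for an expensive model-backed check.
    name = "toy_model"
    priority = 70
    opt_in = True

    def validate(self, words: List[str]) -> List[str]:
        return ["model: flagged"]


def run_strategies(
    strategies: List[Strategy],
    words: List[str],
    enabled: Optional[Dict[str, bool]] = None,
) -> List[Tuple[str, str]]:
    enabled = enabled or {}
    # A strategy is active if configured on, or by default when not opt-in.
    active = [s for s in strategies if enabled.get(s.name, not s.opt_in)]
    issues: List[Tuple[str, str]] = []
    for strategy in sorted(active, key=lambda s: s.priority):
        issues.extend((strategy.name, msg) for msg in strategy.validate(words))
    return issues
```

Sorting at run time rather than at registration keeps the registration order irrelevant, so a new strategy only needs to declare its own priority.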

Offline Systems

Data Pipeline

Transforms a raw corpus into an optimized dictionary database:
  1. Raw Corpus
  2. Ingester
  3. Segmenter
  4. Frequency Builder
  5. Packager
  6. SQLite DB
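The stages of this pipeline can be sketched as a chain of small functions. This is only an outline under stated assumptions: the function names are invented, the whitespace-splitting `segment` is a placeholder (real Myanmar text needs a proper segmenter, since it has no spaces between words), and the `words(word, freq)` schema is hypothetical.

```python
import sqlite3
from collections import Counter
from typing import Iterable, Iterator, List


def ingest(raw_lines: Iterable[str]) -> Iterator[str]:
    # Ingester: normalize whitespace and drop empty lines.
    for line in raw_lines:
        line = line.strip()
        if line:
            yield line


def segment(line: str) -> List[str]:
    # Placeholder segmenter: splits on whitespace for the toy corpus only.
    return line.split()


def build_frequencies(lines: Iterable[str]) -> Counter:
    # Frequency builder: count how often each token occurs.
    counts: Counter = Counter()
    for line in lines:
        counts.update(segment(line))
    return counts


def package(counts: Counter, path: str = ":memory:") -> sqlite3.Connection:
    # Packager: write the counts into a SQLite table (assumed schema).
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE words (word TEXT PRIMARY KEY, freq INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO words VALUES (?, ?)", counts.items())
    conn.commit()
    return conn


def build_database(raw_lines: Iterable[str], path: str = ":memory:") -> sqlite3.Connection:
    return package(build_frequencies(ingest(raw_lines)), path)
```

Passing a file path instead of `":memory:"` would persist the result; the in-memory default just keeps the example free of side effects.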

Training Pipeline

Creates AI models for semantic checking (BYOM, bring your own model):
  1. Corpus
  2. Tokenizer Training
  3. Pre-training (MLM)
  4. Export (ONNX)
  5. Quantization

Design Principles

  1. Fail Fast: Catch errors at the earliest possible layer
  2. Layered Validation: Each layer adds accuracy at a cost
  3. Pluggable Components: Swap providers, segmenters, taggers
  4. Graceful Degradation: Continue working even if optional components fail
  5. Performance First: Optimize hot paths with Cython/OpenMP
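Graceful degradation in particular is easy to show in a few lines. In the sketch below, a failure in an optional checker is logged and skipped while the core result is still returned. The function name, the logger name, and the `(name, checker)` pair convention are assumptions for the example.

```python
import logging
from typing import Callable, List, Sequence, Tuple

logger = logging.getLogger("spellcheck.sketch")  # hypothetical logger name

Checker = Callable[[List[str]], List[str]]


def check_with_optional(
    words: List[str],
    core: Checker,
    optional: Sequence[Tuple[str, Checker]],
) -> List[str]:
    # The core checker is mandatory: its errors propagate to the caller.
    issues = list(core(words))
    for name, checker in optional:
        try:
            issues.extend(checker(words))
        except Exception as exc:
            # Optional component failed: log it and keep the core result.
            logger.warning("optional checker %s skipped: %s", name, exc)
    return issues
```

Catching `Exception` broadly is deliberate here, since an optional component may fail in many ways (missing model file, import error, runtime fault), and none of them should take the core check down.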

Architecture Documents

System Design: Detailed component architecture and class responsibilities
Validation Pipeline: Pipeline deep-dive with execution flow
Component Diagram: Visual component relationships
Data Flow: Data flow through the system
Extension Points: How to extend the system
Dependency Injection: DI container system