Design Philosophy
Myanmar text has no spaces between words, so splitting on whitespace doesn’t work. Instead, the pipeline starts from syllables and builds up:
- Break into syllables (deterministic, fast)
- Validate syllables (catches ~90% of errors)
- Assemble into words (only with valid syllables)
- Check grammar and context (only with valid words)
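
The bottom-up flow above can be sketched in a few lines. This is a toy illustration, not the library's implementation: `break_syllables`, `assemble_words`, and `check` are hypothetical names, the syllable rule is a simplified consonant-boundary heuristic, and the syllable set and dictionary passed in are expected to be small hand-made samples.

```python
import re

# Simplified boundary rule (an assumption, not the library's segmenter):
# a syllable starts at a consonant (U+1000..U+1021) unless that consonant
# carries asat (U+103A) or takes part in a virama stack (U+1039).
_BOUNDARY = re.compile("(?<!\u1039)(?=[\u1000-\u1021](?![\u103a\u1039]))")


def break_syllables(text):
    """Step 1: deterministic split into syllables."""
    return [s for s in _BOUNDARY.split(text) if s]


def assemble_words(syllables, dictionary, max_syllables=4):
    """Step 3: greedy longest-match grouping of syllables into words."""
    words, i = [], 0
    while i < len(syllables):
        for n in range(min(max_syllables, len(syllables) - i), 0, -1):
            candidate = "".join(syllables[i:i + n])
            # Fall back to a single syllable when nothing longer matches.
            if candidate in dictionary or n == 1:
                words.append(candidate)
                i += n
                break
    return words


def check(text, valid_syllables, dictionary):
    """Run the layers in order and stop at the first one that reports errors."""
    syllables = break_syllables(text)
    bad = [s for s in syllables if s not in valid_syllables]
    if bad:
        return {"layer": "syllable", "errors": bad}
    words = assemble_words(syllables, dictionary)
    unknown = [w for w in words if w not in dictionary]
    if unknown:
        return {"layer": "word", "errors": unknown}
    # Step 4 (grammar and context) would run here, on valid words only.
    return {"layer": "context", "errors": []}


# The word for "Myanmar language/writing", spelled with escapes so the
# code points are unambiguous; it splits into three syllables.
SAMPLE = "\u1019\u103c\u1014\u103a\u1019\u102c\u1005\u102c"
print(break_syllables(SAMPLE))
```

Because each layer returns as soon as it finds a problem, the more expensive later stages only ever see input that passed the cheaper ones.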
High-Level Architecture
Core Components
| Component | Purpose |
|---|---|
| SpellChecker | Main coordinator that orchestrates all validation layers |
| SpellCheckerBuilder | Fluent interface for constructing SpellChecker instances |
| DictionaryProvider | Pluggable storage backend (SQLite, Memory, JSON) |
| Segmenter | Text segmentation (syllable + word) |
| SyllableValidator | Layer 1: syllable structure validation |
| WordValidator | Layer 2: word lookup + SymSpell suggestions |
| ContextValidator | Layer 3: N-gram + validation strategies |
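
To make the builder and pluggable-provider roles in the table concrete, here is a minimal sketch. The classes below are small stand-ins that reuse the component names for readability; the method names (`frequency`, `with_provider`, `build`, `is_known`) and the `words` table schema are assumptions made for illustration and may not match the library's actual API.

```python
import sqlite3
from typing import Optional, Protocol


class DictionaryProvider(Protocol):
    """Hypothetical minimal storage interface: word frequency lookup."""

    def frequency(self, word: str) -> Optional[int]: ...


class MemoryProvider:
    """Backend that keeps the dictionary in a plain dict."""

    def __init__(self, freqs):
        self._freqs = dict(freqs)

    def frequency(self, word):
        return self._freqs.get(word)


class SQLiteProvider:
    """Backend that reads from an SQLite table named `words` (assumed schema)."""

    def __init__(self, conn):
        self._conn = conn

    def frequency(self, word):
        row = self._conn.execute(
            "SELECT freq FROM words WHERE word = ?", (word,)
        ).fetchone()
        return row[0] if row else None


class SpellChecker:
    """Stand-in coordinator that only knows about the provider interface."""

    def __init__(self, provider):
        self.provider = provider

    def is_known(self, word):
        return self.provider.frequency(word) is not None


class SpellCheckerBuilder:
    """Fluent builder: each with_* method returns self so calls can chain."""

    def __init__(self):
        self._provider = MemoryProvider({})

    def with_provider(self, provider):
        self._provider = provider
        return self

    def build(self):
        return SpellChecker(self._provider)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY, freq INTEGER)")
conn.execute("INSERT INTO words VALUES (?, ?)", ("\u1005\u102c", 120))

# The same checker code works with either backend.
for provider in (MemoryProvider({"\u1005\u102c": 120}), SQLiteProvider(conn)):
    checker = SpellCheckerBuilder().with_provider(provider).build()
    print(type(provider).__name__, checker.is_known("\u1005\u102c"))
```

The coordinator depends only on the provider interface, which is what allows storage backends to be swapped without touching validation code.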
Validation Strategies
The context validation layer uses a Strategy pattern for modular, priority-ordered validation. Each strategy implements the ValidationStrategy interface; its priority is shown in parentheses:
- ToneValidationStrategy (10)
- OrthographyValidationStrategy (15)
- SyntacticValidationStrategy (20)
- StatisticalConfusableStrategy (24)
- BrokenCompoundStrategy (25)
- POSSequenceValidationStrategy (30)
- QuestionStructureValidationStrategy (40)
- HomophoneValidationStrategy (45)
- ConfusableCompoundClassifierStrategy (47), AI, opt-in
- ConfusableSemanticStrategy (48), AI, opt-in
- NgramContextValidationStrategy (50)
- SemanticValidationStrategy (70), AI, opt-in
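
A priority-ordered Strategy pattern of this kind might look like the sketch below. It is an illustration under stated assumptions, not the library's code: the `validate` signature, the two toy strategies, the rule that lower priority numbers run first, and the rule that the first strategy to flag a position wins are all assumptions.

```python
from abc import ABC, abstractmethod


class ValidationStrategy(ABC):
    """Common interface; `priority` decides execution order."""

    priority = 100  # assumption: lower numbers run first

    @abstractmethod
    def validate(self, words):
        """Return a list of (word_index, message) issues."""


class RepeatedWordStrategy(ValidationStrategy):
    """Toy rule: flag a word that repeats the previous word."""

    priority = 20

    def validate(self, words):
        return [
            (i, "repeated word")
            for i in range(1, len(words))
            if words[i] == words[i - 1]
        ]


class BigramStrategy(ValidationStrategy):
    """Toy N-gram rule: flag word pairs never seen in the reference data."""

    priority = 50

    def __init__(self, seen_bigrams):
        self.seen = seen_bigrams

    def validate(self, words):
        return [
            (i, "unseen bigram")
            for i in range(1, len(words))
            if (words[i - 1], words[i]) not in self.seen
        ]


def run_strategies(strategies, words):
    """Run strategies in priority order; keep the first issue per position."""
    issues = {}
    for strategy in sorted(strategies, key=lambda s: s.priority):
        for index, message in strategy.validate(words):
            issues.setdefault(index, (type(strategy).__name__, message))
    return issues


strategies = [BigramStrategy({("a", "b")}), RepeatedWordStrategy()]
print(run_strategies(strategies, ["a", "a", "b"]))
```

Adding a new check then means writing one more subclass and choosing a priority, without editing the runner.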
Offline Systems
Data Pipeline
Transforms a raw corpus into an optimized dictionary database.
Training Pipeline
Creates AI models for semantic checking (BYOM, bring your own model).
Design Principles
- Fail Fast: Catch errors at the earliest possible layer
- Layered Validation: Each layer adds accuracy at additional computational cost
- Pluggable Components: Swap providers, segmenters, taggers
- Graceful Degradation: Continue working even if optional components fail
- Performance First: Optimize hot paths with Cython/OpenMP
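
As one way to picture the Graceful Degradation principle, the sketch below loads an optional component defensively and keeps the core check running when loading fails. All names here (`load_optional`, `missing_model`, `check`) are hypothetical and chosen for illustration only.

```python
import logging

logger = logging.getLogger("spellcheck_sketch")


def load_optional(name, loader):
    """Return loader() or None if the optional component cannot be loaded."""
    try:
        return loader()
    except Exception as exc:  # broad on purpose: optional parts must not crash the core
        logger.warning("optional component %s disabled: %s", name, exc)
        return None


def missing_model():
    """Simulates an opt-in AI model whose file is not installed."""
    raise FileNotFoundError("semantic model not found")


def check(words, dictionary, semantic=None):
    """Core dictionary layer always runs; the semantic layer is opt-in."""
    errors = [w for w in words if w not in dictionary]
    if semantic is not None and not errors:
        errors.extend(semantic(words))
    return errors


semantic = load_optional("semantic", missing_model)
print(check(["a", "x"], {"a"}, semantic))  # ['x']
```

The failure is logged once at load time, and every later call simply skips the missing layer.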
Architecture Documents
- System Design: Detailed component architecture and class responsibilities
- Validation Pipeline: Pipeline deep-dive with execution flow
- Component Diagram: Visual component relationships
- Data Flow: Data flow through the system
- Extension Points: How to extend the system
- Dependency Injection: DI container system