Component Architecture
Core Components
The core components are organized in layers: Configuration (Builder, Factory) → Validators (Syllable, Word, Context) → Algorithms (SymSpell, N-gram, POS, Semantic) → Infrastructure (Segmenter, Provider, Normalizer).
Component Responsibilities
| Component | Responsibility |
|---|---|
| SpellChecker | Orchestrate validation, manage lifecycle. Uses mixin decomposition: PreNormalizationDetectorsMixin, PostNormalizationDetectorsMixin, SentenceDetectorsMixin, SuggestionPipelineMixin, ErrorSuppressionMixin |
| Configuration | Store settings, validation levels, thresholds |
| Builder | Fluent construction of SpellChecker |
| ComponentFactory | Create and wire components |
| SyllableValidator | Layer 1: syllable-level validation |
| WordValidator | Layer 2: word-level validation |
| ContextValidator | Layer 3: context validation via strategy pattern (created by ContextValidatorFactory; orchestrates 12 strategies: Tone, Orthography, Syntactic, StatisticalConfusable, BrokenCompound, POS Sequence, Question, Homophone, ConfusableCompoundClassifier, ConfusableSemantic, N-gram Context, and Semantic) |
| SymSpell | O(1) suggestion generation |
| NgramContextChecker | Context probability calculation (created by ContextCheckerFactory, distinct from ContextValidatorFactory) |
| POSTagger | Part-of-speech tagging |
| SemanticChecker | Deep context analysis |
| Segmenter | Text segmentation |
| Provider | Dictionary data access |
| Normalizer | Text normalization |
| TokenRefinement | Token boundary refinement that exposes hidden error spans in merged tokens (particle attachment, negation attachment) using a lattice-based scoring pass |
| NeuralReranker | Optional MLP-based suggestion re-ranking using ONNX (19-feature vector, runs as final pipeline step) |
| MedialSwapSuggestionStrategy | Generates medial swap candidates (ျ↔ြ, ွ↔ှ) that SymSpell’s delete-distance model cannot find |
| Response | Result dataclass (text, errors, metadata) |
Design Patterns
Builder Pattern
SpellCheckerBuilder provides fluent construction:
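A minimal sketch of what fluent construction looks like in general. `BuilderSketch`, `CheckerSketch`, and every `with_*` method below are hypothetical stand-ins chosen for illustration; they are not the actual SpellCheckerBuilder API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CheckerSketch:
    """Hypothetical stand-in for the object a builder produces."""
    provider: str
    validation_level: str
    max_suggestions: int


class BuilderSketch:
    """Fluent builder: every with_* method returns self so calls chain."""

    def __init__(self) -> None:
        self._provider: Optional[str] = None
        self._validation_level = "word"   # hypothetical default
        self._max_suggestions = 5         # hypothetical default

    def with_provider(self, provider: str) -> "BuilderSketch":
        self._provider = provider
        return self

    def with_validation_level(self, level: str) -> "BuilderSketch":
        self._validation_level = level
        return self

    def with_max_suggestions(self, count: int) -> "BuilderSketch":
        self._max_suggestions = count
        return self

    def build(self) -> CheckerSketch:
        # Validate required settings once, at the end of the chain.
        if self._provider is None:
            raise ValueError("a provider must be set before build()")
        return CheckerSketch(
            self._provider, self._validation_level, self._max_suggestions
        )


checker = (
    BuilderSketch()
    .with_provider("in-memory")
    .with_validation_level("context")
    .build()
)
```

Settings that are not touched keep their defaults, and validation happens in one place (`build()`), which is the usual reason to prefer a builder over a long constructor signature.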
Factory Pattern
ComponentFactory creates configured components. It takes only config (not provider);
the provider and segmenter are passed to create_all():
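The split between constructor and `create_all()` can be sketched as follows. Only the shape (config in the constructor, provider and segmenter in `create_all()`) comes from the text above; the class names, the `max_edit_distance` setting, and the returned dictionary are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class ConfigSketch:
    max_edit_distance: int = 2  # hypothetical setting


@dataclass
class SymSpellSketch:
    provider: Any
    max_edit_distance: int


@dataclass
class WordValidatorSketch:
    provider: Any
    segmenter: Any
    symspell: SymSpellSketch


class FactorySketch:
    """Holds only configuration; collaborators arrive at create_all() time."""

    def __init__(self, config: ConfigSketch) -> None:
        self.config = config

    def create_all(self, provider: Any, segmenter: Any) -> Dict[str, Any]:
        # Build shared components once, then wire them into their dependents.
        symspell = SymSpellSketch(provider, self.config.max_edit_distance)
        word_validator = WordValidatorSketch(provider, segmenter, symspell)
        return {"symspell": symspell, "word_validator": word_validator}


components = FactorySketch(ConfigSketch()).create_all(
    provider="provider", segmenter="segmenter"
)
```

Keeping the provider out of the constructor means one factory configuration can be reused against different data sources.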
Note: SyntacticRuleChecker is not a separate validator. It is wrapped as a SyntacticValidationStrategy and passed into ContextValidator.strategies. The ContextValidator orchestrates the 12 strategies wired by ComponentFactory: Tone (10), Orthography (15), Syntactic (20), Statistical Confusable (24), Broken Compound (25), POS Sequence (30), Question Structure (40), Homophone (45), Confusable Compound Classifier (47), Confusable Semantic (48), N-gram Context (50), and Semantic (70).
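The sketch below shows one way a validator could orchestrate prioritized strategies. It assumes the numbers in parentheses are priorities and that lower values run first; neither is confirmed by the text. `StrategySketch`, `ContextValidatorSketch`, and the two check functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StrategySketch:
    name: str
    priority: int
    check: Callable[[List[str]], List[str]]  # tokens -> error messages


class ContextValidatorSketch:
    def __init__(self, strategies: List[StrategySketch]) -> None:
        # Assumption: lower priority values run earlier.
        self.strategies = sorted(strategies, key=lambda s: s.priority)

    def validate(self, tokens: List[str]) -> List[str]:
        errors: List[str] = []
        for strategy in self.strategies:
            errors.extend(
                f"{strategy.name}: {msg}" for msg in strategy.check(tokens)
            )
        return errors


def flag_repeated_tokens(tokens: List[str]) -> List[str]:
    # Toy check: report any token immediately followed by itself.
    return [f"repeated token {a!r}" for a, b in zip(tokens, tokens[1:]) if a == b]


def no_findings(tokens: List[str]) -> List[str]:
    return []


validator = ContextValidatorSketch([
    StrategySketch("Semantic", 70, no_findings),
    StrategySketch("Tone", 10, no_findings),
    StrategySketch("Syntactic", 20, flag_repeated_tokens),
])
order = [s.name for s in validator.strategies]
errors = validator.validate(["a", "a", "b"])
```

Because the validator only depends on the `check` callable, a rule checker can be wrapped as a strategy without the validator knowing its concrete type.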
Strategy Pattern
Pluggable components implement common interfaces:
Chain of Responsibility
Validators form a chain, augmented by 38 post-normalization detectors inherited from mixins (see Component Diagram for the full mixin architecture and detection registry):
Provider Architecture
Interface
Implementations
Cython Integration
Wrapper Pattern
Python wrappers with Cython fallback:
Cython Source
Error Handling
Graceful Degradation
Components fail gracefully:
Exception Hierarchy
Thread Safety
Connection Pooling
Performance Considerations
Eager Initialization via ComponentFactory
SpellChecker.__init__ creates all components eagerly via ComponentFactory.create_all().
There are no lazy properties for core components:
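A minimal sketch of eager initialization, assuming a factory that returns its components as a dictionary. All names here are hypothetical stand-ins, not the real classes.

```python
class FactorySketch:
    """Hypothetical stand-in for ComponentFactory."""

    def __init__(self, config):
        self.config = config

    def create_all(self, provider, segmenter):
        # Every component is constructed here, in one pass.
        return {
            "segmenter": segmenter,
            "syllable_validator": ("syllable_validator", provider),
            "word_validator": ("word_validator", provider),
        }


class SpellCheckerSketch:
    def __init__(self, config, provider, segmenter):
        components = FactorySketch(config).create_all(provider, segmenter)
        # Plain attributes assigned up front: no @property that builds
        # a component on first access.
        self.segmenter = components["segmenter"]
        self.syllable_validator = components["syllable_validator"]
        self.word_validator = components["word_validator"]


checker = SpellCheckerSketch(config={}, provider="provider", segmenter="segmenter")
ready = sorted(vars(checker))  # all components exist right after __init__
```

With this layout, construction errors surface when the checker is created rather than on the first call that happens to touch a component.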
Lazy imports are used in places (TYPE_CHECKING guards, deferred
imports inside methods) to avoid circular imports and heavy dependencies, but
component instances are created eagerly during __init__.
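The two import techniques named above can be sketched with a standard-library module standing in for a heavy or circular dependency. `parse_amount` is a hypothetical function used only to show the pattern.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers; skipped at runtime, so it
    # cannot trigger a circular import or load a heavy dependency.
    from decimal import Decimal


def parse_amount(text: str) -> "Decimal":
    # Deferred import: the module is loaded on the first call,
    # not when this file is imported.
    from decimal import Decimal

    return Decimal(text)


amount = parse_amount("1.50")
```

The return annotation is written as a string because the name `Decimal` does not exist at module level at runtime.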
Caching
Caching is implemented at the provider level, not the validator level. ComponentFactory
creates cached wrapper objects around the provider using LRU caches:
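One way to build such a wrapper uses `functools.lru_cache` around a provider method. The provider class, its `get_frequency` method, and the `maxsize` values below are assumptions for illustration; the actual wrapper classes may differ.

```python
from functools import lru_cache


class DictProviderSketch:
    """Hypothetical provider that counts how often the backend is hit."""

    def __init__(self, frequencies):
        self._frequencies = dict(frequencies)
        self.lookups = 0

    def get_frequency(self, word):
        self.lookups += 1
        return self._frequencies.get(word, 0)


class CachedProviderSketch:
    """Wraps a provider so repeated lookups are served from an LRU cache."""

    def __init__(self, provider, maxsize=1024):
        self._provider = provider
        # Per-instance cache, so each wrapper has its own size and statistics.
        self.get_frequency = lru_cache(maxsize=maxsize)(provider.get_frequency)


backend = DictProviderSketch({"hello": 42})
cached = CachedProviderSketch(backend, maxsize=128)
first = cached.get_frequency("hello")   # reaches the backend
second = cached.get_frequency("hello")  # served from the cache
```

Placing the cache in a wrapper keeps validators unaware of caching: they call the same method whether or not the provider is wrapped.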
Caching is configured via SpellCheckerConfig.cache (AlgorithmCacheConfig).
Next Steps
- Validation Pipeline - Pipeline details
- Extension Points - How to extend
- API Reference - Complete API docs