High-Level Architecture
                +-------------------------------+
                |       User Application        |
                |      [Application Code]       |
                +---------------+---------------+
                                |
                                v
+---------------------------------------------------------------+
|                     SpellChecker (Facade)                     |
|                                                               |
|     SpellCheckerBuilder <------> SpellChecker                 |
|              |                        |                       |
|              |                        +--> SpellCheckerConfig |
+-----+-------------------------+-------------------------+-----+
      |                         |                         |
      v                         v                         v
+-----------+             +-----------+             +-----------+
| Validation|             |   Data    |             |  Utility  |
|   Layer   |             |   Layer   |             |   Layer   |
|           |             |           |             |           |
| Syllable  |             | Dictionary|             | Normalize |
| Validator |             | Provider  |             | Segment   |
| Word      |             |  |        |             | Edit Dist |
| Validator |             |  +->SQLite|             |           |
| Context   |             |  +->Memory|             |           |
| Validator |             |           |             |           |
+-----------+             +-----------+             +-----------+
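The facade and builder relationship in the diagram can be sketched roughly as follows. This is a minimal illustration, not the library's actual API: the method names `with_provider`, `with_config`, `build`, and `check`, the config fields, and the toy `SetProvider` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpellCheckerConfig:
    # Both fields are hypothetical options, shown only to give the config a shape.
    max_edit_distance: int = 2
    use_context: bool = False


class SetProvider:
    """Toy dictionary provider backed by a set (stand-in for the data layer)."""

    def __init__(self, words):
        self._words = set(words)

    def is_valid_word(self, word):
        return word in self._words


class SpellChecker:
    """Facade: callers use one check() method and never touch the layers directly."""

    def __init__(self, provider, config):
        self.provider = provider
        self.config = config

    def check(self, words):
        # Return the words the provider does not recognise, in input order.
        return [w for w in words if not self.provider.is_valid_word(w)]


class SpellCheckerBuilder:
    """Collects dependencies step by step, then produces a SpellChecker."""

    def __init__(self):
        self._provider = None
        self._config = SpellCheckerConfig()

    def with_provider(self, provider):
        self._provider = provider
        return self

    def with_config(self, config):
        self._config = config
        return self

    def build(self):
        if self._provider is None:
            raise ValueError("a dictionary provider is required")
        return SpellChecker(self._provider, self._config)


checker = (
    SpellCheckerBuilder()
    .with_provider(SetProvider({"hello", "world"}))
    .build()
)
unknown = checker.check(["hello", "wrold"])
```

Keeping construction in the builder means application code depends only on the facade, so providers and validators can be swapped without touching callers.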
Validation Layer Components
Validation Layer
================
SyllableValidator
├── SyllableRuleValidator
│     • Structure rules
│     • Medial order
│     • Vowel compatibility
└── Dictionary Lookup
      • Syllable exists?
      • Frequency lookup
        |
        v
WordValidator
├── Dictionary Lookup
│     • Word exists?
│     • Get frequency
│     • Get POS
└── SymSpell Algorithm
      • Generate deletes
      • Find suggestions
      • Rank by distance
        |
        v
ContextValidator (Strategy-based)
├── SyntacticValidationStrategy (Layer 2.5)
│   ├── POS Tagger
│   │     • Viterbi HMM
│   │     • Transformer
│   │     • Rule-based
│   └── SyntacticRuleChecker
│         • Particle rules
│         • Sequence rules
│         • Linguistic rules
├── N-gram Checker
│     • Bigram probs
│     • Trigram probs
│     • Smoothing
└── Semantic Checker (Optional)
      • ONNX model
      • Embedding lookup
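As a rough illustration of the strategy-based ContextValidator, the sketch below plugs a single bigram strategy into a validator that simply runs every strategy in turn. The class name `BigramStrategy`, the `validate` signature, the choice of add-k smoothing, and the threshold value are assumptions for this example; the source only states that the N-gram Checker uses bigram probabilities with smoothing.

```python
from collections import Counter


class BigramStrategy:
    """Flags word pairs whose add-k smoothed bigram probability is below a threshold."""

    def __init__(self, sentences, k=1.0, threshold=0.2):
        self.k = k
        self.threshold = threshold
        self.contexts = Counter()   # how often each word starts a bigram
        self.bigrams = Counter()    # how often each (prev, curr) pair occurs
        vocab = set()
        for sent in sentences:
            vocab.update(sent)
            self.contexts.update(sent[:-1])
            self.bigrams.update(zip(sent, sent[1:]))
        self.vocab_size = len(vocab)

    def probability(self, prev, curr):
        # P(curr | prev) = (count(prev, curr) + k) / (count(prev) + k * |V|)
        numerator = self.bigrams[(prev, curr)] + self.k
        denominator = self.contexts[prev] + self.k * self.vocab_size
        return numerator / denominator

    def validate(self, words):
        issues = []
        for i in range(1, len(words)):
            if self.probability(words[i - 1], words[i]) < self.threshold:
                issues.append((i, "unlikely bigram"))
        return issues


class ContextValidator:
    """Runs each configured strategy and concatenates the reported issues."""

    def __init__(self, strategies):
        self.strategies = list(strategies)

    def validate(self, words):
        issues = []
        for strategy in self.strategies:
            issues.extend(strategy.validate(words))
        return issues


corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog", "sat"]]
bigram_strategy = BigramStrategy(corpus)
validator = ContextValidator([bigram_strategy])
issues = validator.validate(["the", "sat"])
```

A syntactic or semantic strategy would expose the same `validate` method, which is what lets the optional checkers be enabled or disabled independently.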
Data Layer Components
DictionaryProvider (Abstract)
│
│ Methods:
│ • is_valid_syllable(syllable) -> bool
│ • is_valid_word(word) -> bool
│ • get_word_frequency(word) -> int
│ • get_bigram_probability(prev, curr) -> float
│
├── SQLiteProvider (disk-based, indexed, default)
├── MemoryProvider (RAM-based, fast, high mem)
├── JSONProvider (testing, simple)
└── CSVProvider (testing, simple)
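The four methods listed above map naturally onto a Python abstract base class. The sketch below is one possible shape, assuming nothing beyond the method names and return types shown; the `MemoryProvider` constructor arguments and the fallback values for unknown entries (0 and 0.0) are hypothetical.

```python
from abc import ABC, abstractmethod


class DictionaryProvider(ABC):
    """Abstract lookup interface shared by all storage backends."""

    @abstractmethod
    def is_valid_syllable(self, syllable: str) -> bool: ...

    @abstractmethod
    def is_valid_word(self, word: str) -> bool: ...

    @abstractmethod
    def get_word_frequency(self, word: str) -> int: ...

    @abstractmethod
    def get_bigram_probability(self, prev: str, curr: str) -> float: ...


class MemoryProvider(DictionaryProvider):
    """RAM-backed provider; the constructor arguments are hypothetical."""

    def __init__(self, syllables, word_frequencies, bigram_probabilities):
        self._syllables = set(syllables)
        self._word_frequencies = dict(word_frequencies)
        self._bigram_probabilities = dict(bigram_probabilities)

    def is_valid_syllable(self, syllable):
        return syllable in self._syllables

    def is_valid_word(self, word):
        return word in self._word_frequencies

    def get_word_frequency(self, word):
        # Assumption: unknown words report a frequency of 0.
        return self._word_frequencies.get(word, 0)

    def get_bigram_probability(self, prev, curr):
        # Assumption: unseen pairs report 0.0 (no smoothing in this sketch).
        return self._bigram_probabilities.get((prev, curr), 0.0)


provider = MemoryProvider(
    syllables={"ka", "la"},
    word_frequencies={"kala": 12},
    bigram_probabilities={("kala", "kala"): 0.25},
)
```

Because validators only see the abstract interface, a disk-backed provider can replace this one without changes elsewhere.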
Algorithm Components
Algorithm Layer
===============
SymSpell                        N-gram Model
+--------------------------+    +--------------------------+
| Input: misspelled word   |    | Input: word sequence     |
| Output: suggestions      |    | Output: probability      |
|                          |    |                          |
| • Delete Dict            |    | • Bigram Probs           |
|   (word -> deletes)      |    |   P(word2 | word1)       |
| • Prefix Index           |    | • Trigram Probs          |
|   (fast lookup)          |    |   P(word3 | word1,word2) |
|                          |    |                          |
| Complexity: O(1)         |    | Complexity: O(1)         |
+--------------------------+    +--------------------------+
Viterbi POS                     Edit Distance (Cython)
+--------------------------+    +--------------------------+
| Input: word sequence     |    | • Levenshtein            |
| Output: POS tags         |    | • Damerau-Levenshtein    |
|                          |    | • Optimized C            |
| • Transition Probs       |    +--------------------------+
|   P(tag | prev_tag)      |
| • Emission Probs         |    Semantic Model (ONNX)
|   P(word | tag)          |    +--------------------------+
|                          |    | • Word embeddings        |
| Complexity: O(nT^2)      |    | • Cosine similarity      |
+--------------------------+    | • Neural network         |
                                +--------------------------+
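To make the Viterbi POS box concrete, here is a small log-space Viterbi decoder over transition probabilities P(tag | prev_tag) and emission probabilities P(word | tag). The function signature, the probability floor for unseen events, and the toy two-tag model are assumptions for illustration only. The nested loops over tags for each word give the O(nT^2) cost noted in the diagram.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most likely tag sequence for words under a bigram HMM."""
    if not words:
        return []

    def lp(p):
        # Log-probability with a small floor so unseen events do not hit log(0).
        return math.log(max(p, floor))

    # best[i][t]: best log-probability of any tag path ending in tag t at word i.
    best = [{t: lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], 0.0)) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev_tag, score = max(
                ((p, best[i - 1][p] + lp(trans_p[p].get(t, 0.0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t].get(words[i], 0.0))
            back[i][t] = prev_tag

    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy model: two tags, made-up probabilities.
tags = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"dogs": 0.6, "bark": 0.1}, "V": {"dogs": 0.05, "bark": 0.7}}
tagged = viterbi(["dogs", "bark"], tags, start_p, trans_p, emit_p)
```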
Data Pipeline Components
+------------------+     +---------------------+     +------------------+     +------------------+
| CorpusIngester   | --> | CorpusSegmenter     | --> | FrequencyBuilder | --> | DatabasePackager |
|                  |     | (Cython)            |     |                  |     |                  |
| • Read files     |     | • Normalize         |     | • Count tokens   |     | • Create SQLite  |
| • Parse formats  |     | • Segment           |     | • N-gram stats   |     | • Build indexes  |
| • Validate       |     | • Parallel          |     | • Build tables   |     | • Optimize       |
| • Stream         |     |                     |     |                  |     |                  |
+------------------+     +---------------------+     +------------------+     +------------------+
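The four pipeline stages can be approximated with plain generators and the standard-library `sqlite3` module. This is only a sketch of the data flow: the function names, the table schema, and the whitespace-based segmentation are placeholders, since the real segmenter and database layout are not described here.

```python
import sqlite3
from collections import Counter


def ingest(lines):
    """CorpusIngester stand-in: stream non-empty, stripped lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def segment(lines):
    """CorpusSegmenter stand-in: lowercase and split on whitespace (placeholder logic)."""
    for line in lines:
        yield line.lower().split()


def build_frequencies(token_lists):
    """FrequencyBuilder stand-in: count words and adjacent word pairs."""
    words, bigrams = Counter(), Counter()
    for tokens in token_lists:
        words.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return words, bigrams


def package(words, bigrams, path=":memory:"):
    """DatabasePackager stand-in: write counts to SQLite (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE words (word TEXT PRIMARY KEY, frequency INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE bigrams (prev TEXT, curr TEXT, frequency INTEGER NOT NULL, "
        "PRIMARY KEY (prev, curr))"
    )
    conn.executemany("INSERT INTO words VALUES (?, ?)", words.items())
    conn.executemany(
        "INSERT INTO bigrams VALUES (?, ?, ?)",
        ((prev, curr, n) for (prev, curr), n in bigrams.items()),
    )
    conn.commit()
    return conn


corpus = ["The cat sat", "", "the cat ran"]
conn = package(*build_frequencies(segment(ingest(corpus))))
cat_frequency = conn.execute(
    "SELECT frequency FROM words WHERE word = ?", ("cat",)
).fetchone()[0]
```

Because the first two stages are generators, the corpus is streamed line by line and only the aggregated counts are held in memory.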
Component Interactions
Check Operation Flow
See Data Flow for the detailed check operation flow.
Suggestion Generation Flow
    +------------------+
    |   Unknown word   |
    +--------+---------+
             |
             v
+----------------------------------+
|             SymSpell             |
|                                  |
|  1. Generate deletes from input  |
|            |                     |
|            v                     |
|  2. Look up each delete in       |
|     pre-computed dictionary      |
|            |                     |
|            v                     |
|  3. Find candidate words within  |
|     edit distance                |
|            |                     |
|            v                     |
|  4. Rank by (edit_distance,      |
|     frequency)                   |
+------------+---------------------+
             |
             v
+----------------------------------+
| Suggestions [word1, word2, ...]  |
+----------------------------------+
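The four steps in the flow above can be sketched as a symmetric-delete lookup. This is a simplified illustration rather than the project's implementation: the class name `SymSpellSketch` is hypothetical, there is no prefix index, and plain Levenshtein distance is used for verification where the real system may use Damerau-Levenshtein.

```python
from collections import defaultdict


def deletes(word, max_distance):
    """All strings obtained from word by removing up to max_distance characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def edit_distance(a, b):
    """Plain Levenshtein distance using two rows of dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


class SymSpellSketch:
    def __init__(self, frequencies, max_distance=2):
        self.frequencies = dict(frequencies)
        self.max_distance = max_distance
        # Pre-computed dictionary: delete variant -> words that produce it.
        self.index = defaultdict(set)
        for word in self.frequencies:
            for variant in deletes(word, max_distance):
                self.index[variant].add(word)

    def suggest(self, word):
        # Steps 1 and 2: generate deletes of the input and look each one up.
        candidates = set()
        for variant in deletes(word, self.max_distance):
            candidates |= self.index.get(variant, set())
        # Step 3: keep candidates whose true edit distance is within the limit.
        scored = []
        for candidate in candidates:
            distance = edit_distance(word, candidate)
            if distance <= self.max_distance:
                scored.append((distance, -self.frequencies[candidate], candidate))
        # Step 4: rank by distance ascending, then frequency descending.
        return [candidate for _, _, candidate in sorted(scored)]


speller = SymSpellSketch({"hello": 100, "help": 50, "hell": 10, "world": 80})
suggestions = speller.suggest("helo")
```

The index trades memory for speed: all delete variants are computed once at build time, so each query only generates deletes of the input word and performs hash lookups.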
Dependency Graph
SpellChecker
├──> SyllableValidator ──> DictionaryProvider
├──> WordValidator ──────> DictionaryProvider
│    └──> SymSpell ──────> DictionaryProvider
└──> ContextValidator ───> DictionaryProvider
                                   |
                           +-------+-------+
                           |               |
                           v               v
                    SQLiteProvider   MemoryProvider
See Also
- Architecture Overview - High-level architecture
- Data Flow - Data flow diagrams
- Extension Points - Customization guide