The diagrams below show how validation, data, utility, and algorithm layers interact within the system.

High-Level Architecture

  +-------------------------------+
  |       User Application        |
  |       [Application Code]      |
  +---------------+---------------+
                  |
                  v
  +---------------------------------------+
  |         SpellChecker (Facade)         |
  |                                       |
  | SpellCheckerBuilder <--> SpellChecker |
  |                               |       |
  |                               v       |
  |                    SpellCheckerConfig |
  +-----+-------------+-------------+-----+
        |             |             |
        v             v             v
  +-----------+ +-----------+ +-----------+
  | Validation| | Data      | | Utility   |
  | Layer     | | Layer     | | Layer     |
  |           | |           | |           |
  | Syllable  | | Dictionary| | Normalize |
  | Validator | | Provider  | | Segment   |
  | Word      | |  |        | | Edit Dist |
  | Validator | |  +->SQLite| |           |
  | Context   | |  +->Memory| |           |
  | Validator | |           | |           |
  +-----------+ +-----------+ +-----------+
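
A minimal sketch of how the builder, facade, and config in the diagram might fit together. Only the three class names come from the diagram; the config fields, the `with_*` builder methods, `build()`, and `check()` are hypothetical, and the plain word set stands in for the real data layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpellCheckerConfig:
    # Hypothetical settings; the real config fields are not shown in the diagram.
    max_edit_distance: int = 2
    use_context: bool = False


class SpellChecker:
    """Facade: one entry point that hides the layers behind it."""

    def __init__(self, config: SpellCheckerConfig, words: frozenset):
        self.config = config
        self._words = words  # stand-in for the data layer

    def check(self, text: str) -> list:
        # Stand-in for the validation layers: report tokens missing from the word set.
        return [tok for tok in text.split() if tok not in self._words]


class SpellCheckerBuilder:
    """Collects options step by step, then produces a configured SpellChecker."""

    def __init__(self):
        self._max_edit_distance = 2
        self._use_context = False
        self._words = frozenset()

    def with_max_edit_distance(self, distance: int) -> "SpellCheckerBuilder":
        self._max_edit_distance = distance
        return self

    def with_context_checking(self, enabled: bool) -> "SpellCheckerBuilder":
        self._use_context = enabled
        return self

    def with_words(self, words) -> "SpellCheckerBuilder":
        self._words = frozenset(words)
        return self

    def build(self) -> SpellChecker:
        config = SpellCheckerConfig(self._max_edit_distance, self._use_context)
        return SpellChecker(config, self._words)


checker = (
    SpellCheckerBuilder()
    .with_words(["hello", "world"])
    .with_max_edit_distance(1)
    .build()
)
unknown = checker.check("hello wrld")  # tokens not found in the word set
```

Keeping the config immutable means a built checker cannot be reconfigured behind the caller's back; changing an option requires going through the builder again.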

Validation Layer Components

  Validation Layer
  ================

  SyllableValidator
  ├── SyllableRuleValidator
  │   • Structure rules
  │   • Medial order
  │   • Vowel compatibility
  └── Dictionary Lookup
      • Syllable exists?
      • Frequency lookup
          |
          v
  WordValidator
  ├── Dictionary Lookup
  │   • Word exists?
  │   • Get frequency
  │   • Get POS
  └── SymSpell Algorithm
      • Generate deletes
      • Find suggestions
      • Rank by distance
          |
          v
  ContextValidator (Strategy-based)
  ├── SyntacticValidationStrategy (Layer 2.5)
  │   ├── POS Tagger
  │   │   • Viterbi HMM
  │   │   • Transformer
  │   │   • Rule-based
  │   └── SyntacticRuleChecker
  │       • Particle rules
  │       • Sequence rules
  │       • Linguistic rules
  ├── N-gram Checker
  │   • Bigram probs
  │   • Trigram probs
  │   • Smoothing
  └── Semantic Checker (Optional)
      • ONNX model
      • Embedding lookup
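
The strategy-based shape of ContextValidator can be illustrated with a small sketch. It assumes a strategy is simply a callable that receives the token list and returns findings. The bigram strategy is a toy stand-in for the N-gram Checker, and `make_bigram_strategy`, `validate_sentence`, and the finding format are hypothetical names, not the project's API.

```python
from typing import Callable, List, Set, Tuple

Finding = Tuple[int, str]  # (token index, message); an assumed format
Strategy = Callable[[List[str]], List[Finding]]


def make_bigram_strategy(known_bigrams: Set[Tuple[str, str]]) -> Strategy:
    """Toy stand-in for the N-gram Checker: flag word pairs never seen together."""
    def check(tokens: List[str]) -> List[Finding]:
        return [
            (i, "unseen bigram")
            for i in range(1, len(tokens))
            if (tokens[i - 1], tokens[i]) not in known_bigrams
        ]
    return check


class ContextValidator:
    """Runs each configured strategy in order and collects their findings."""

    def __init__(self, strategies: List[Strategy]):
        self.strategies = strategies

    def validate(self, tokens: List[str]) -> List[Finding]:
        findings: List[Finding] = []
        for strategy in self.strategies:
            findings.extend(strategy(tokens))
        return findings


def validate_sentence(tokens, known_words, context_validator):
    """Word-level check first, then the context-level strategies."""
    findings = [(i, "unknown word") for i, t in enumerate(tokens) if t not in known_words]
    findings.extend(context_validator.validate(tokens))
    return findings


known_words = {"the", "cat", "sat"}
bigrams = {("the", "cat"), ("cat", "sat")}
validator = ContextValidator([make_bigram_strategy(bigrams)])
ok = validate_sentence(["the", "cat", "sat"], known_words, validator)
bad = validate_sentence(["the", "dog", "sat"], known_words, validator)
```

Adding a syntactic or semantic check then means appending another callable to the strategy list, without touching ContextValidator itself.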

Data Layer Components

  DictionaryProvider (Abstract)

  │  Methods:
  │  • is_valid_syllable(syllable) -> bool
  │  • is_valid_word(word) -> bool
  │  • get_word_frequency(word) -> int
  │  • get_bigram_probability(prev, curr) -> float

  ├── SQLiteProvider (disk-based, indexed, default)
  ├── MemoryProvider (RAM-based, fast, high mem)
  ├── JSONProvider (testing, simple)
  └── CSVProvider (testing, simple)
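
The provider interface above maps naturally onto a Python abstract base class. The sketch below uses the four method names from the list. The `MemoryProvider` constructor arguments and the choice to return `0` or `0.0` for unknown entries are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class DictionaryProvider(ABC):
    """Abstract interface; method names follow the list above."""

    @abstractmethod
    def is_valid_syllable(self, syllable: str) -> bool: ...

    @abstractmethod
    def is_valid_word(self, word: str) -> bool: ...

    @abstractmethod
    def get_word_frequency(self, word: str) -> int: ...

    @abstractmethod
    def get_bigram_probability(self, prev: str, curr: str) -> float: ...


class MemoryProvider(DictionaryProvider):
    """RAM-based provider backed by a set and two dicts (constructor is assumed)."""

    def __init__(self, syllables, word_freqs, bigram_probs):
        self._syllables = set(syllables)
        self._word_freqs = dict(word_freqs)
        self._bigram_probs = dict(bigram_probs)

    def is_valid_syllable(self, syllable: str) -> bool:
        return syllable in self._syllables

    def is_valid_word(self, word: str) -> bool:
        return word in self._word_freqs

    def get_word_frequency(self, word: str) -> int:
        # Assumption: unknown words report a frequency of 0.
        return self._word_freqs.get(word, 0)

    def get_bigram_probability(self, prev: str, curr: str) -> float:
        # Assumption: unseen pairs report 0.0 (no smoothing in this sketch).
        return self._bigram_probs.get((prev, curr), 0.0)


provider = MemoryProvider(
    syllables=["ka", "ma"],
    word_freqs={"kama": 12},
    bigram_probs={("kama", "ka"): 0.25},
)
```

Because validators depend only on `DictionaryProvider`, a disk-backed provider can replace the in-memory one without any change to the calling code.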

Algorithm Components

  Algorithm Layer
  ===============

  SymSpell                        N-gram Model
  +--------------------------+    +--------------------------+
  | Input:  misspelled word  |    | Input:  word sequence    |
  | Output: suggestions      |    | Output: probability      |
  |                          |    |                          |
  | • Delete Dict            |    | • Bigram Probs           |
  |   (word -> deletes)      |    |   P(word2 | word1)       |
  | • Prefix Index           |    | • Trigram Probs          |
  |   (fast lookup)          |    |   P(word3 | word1,word2) |
  |                          |    |                          |
  | Complexity: O(1) lookup  |    | Complexity: O(1) lookup  |
  +--------------------------+    +--------------------------+

  Viterbi POS                     Edit Distance (Cython)
  +--------------------------+    +--------------------------+
  | Input:  word sequence    |    | • Levenshtein            |
  | Output: POS tags         |    | • Damerau-Levenshtein    |
  |                          |    | • Optimized C            |
  | • Transition Probs       |    +--------------------------+
  |   P(tag | prev_tag)      |
  | • Emission Probs         |    Semantic Model (ONNX)
  |   P(word | tag)          |    +--------------------------+
  |                          |    | • Word embeddings        |
  | Complexity: O(nT^2)      |    | • Cosine similarity      |
  +--------------------------+    | • Neural network         |
                                  +--------------------------+
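
To make the Viterbi POS box concrete, here is a minimal sketch of Viterbi decoding over a bigram HMM, using the transition and emission tables named in the diagram. The start-probability table, the probability floor for missing entries, and the toy tag set are assumptions. The nested loops over tags give the O(nT^2) cost shown above.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under a bigram HMM.

    trans_p[prev][tag] = P(tag | prev_tag), emit_p[tag][word] = P(word | tag).
    Missing entries fall back to `floor` so that log() is always defined.
    """
    if not words:
        return []

    def lp(table, a, b):
        return math.log(table.get(a, {}).get(b, floor))

    # best[i][t] = log-probability of the best path ending in tag t at position i
    best = [{t: math.log(start_p.get(t, floor)) + lp(emit_p, t, words[0]) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:  # T choices for the current tag
            # T choices for the previous tag, so O(T^2) work per word
            prev = max(tags, key=lambda p: best[i - 1][p] + lp(trans_p, p, t))
            best[i][t] = best[i - 1][prev] + lp(trans_p, prev, t) + lp(emit_p, t, words[i])
            back[i][t] = prev

    # Follow back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy model with two tags (N = noun, V = verb); all numbers are made up.
tags = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit_p = {"N": {"dogs": 0.6, "bark": 0.1}, "V": {"dogs": 0.05, "bark": 0.7}}
tagged = viterbi(["dogs", "bark"], tags, start_p, trans_p, emit_p)
```

Working in log space avoids numeric underflow on long sentences, where a product of many small probabilities would otherwise round to zero.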

Data Pipeline Components

  +------------------+     +---------------------+     +------------------+     +------------------+
  | CorpusIngester   | --> | CorpusSegmenter     | --> | FrequencyBuilder | --> | DatabasePackager |
  |                  |     | (Cython)            |     |                  |     |                  |
  | • Read files     |     | • Normalize         |     | • Count tokens   |     | • Create SQLite  |
  | • Parse formats  |     | • Segment           |     | • N-gram stats   |     | • Build indexes  |
  | • Validate       |     | • Parallel          |     | • Build tables   |     | • Optimize       |
  | • Stream         |     |                     |     |                  |     |                  |
  +------------------+     +---------------------+     +------------------+     +------------------+
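
The four pipeline stages can be sketched as a chain of small functions. This is an illustration only: the function names, the whitespace-based segmentation, and the table layout are assumptions, and the real CorpusSegmenter is described above as a Cython component with its own segmentation logic.

```python
import sqlite3
from collections import Counter


def ingest(lines):
    """CorpusIngester stand-in: stream non-empty lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def segment(lines):
    """CorpusSegmenter stand-in: lowercase and split on whitespace (toy rule)."""
    for line in lines:
        yield line.lower().split()


def build_frequencies(sentences):
    """FrequencyBuilder stand-in: count unigrams and adjacent word pairs."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def package(unigrams, bigrams, path=":memory:"):
    """DatabasePackager stand-in: write counts to SQLite with primary-key indexes."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE words (word TEXT PRIMARY KEY, frequency INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE bigrams (prev TEXT, curr TEXT, count INTEGER NOT NULL, "
        "PRIMARY KEY (prev, curr))"
    )
    conn.executemany("INSERT INTO words VALUES (?, ?)", unigrams.items())
    conn.executemany(
        "INSERT INTO bigrams VALUES (?, ?, ?)",
        [(prev, curr, n) for (prev, curr), n in bigrams.items()],
    )
    conn.commit()
    return conn


corpus = ["The cat sat", "", "the cat ran"]
conn = package(*build_frequencies(segment(ingest(corpus))))
cat_freq = conn.execute(
    "SELECT frequency FROM words WHERE word = ?", ("cat",)
).fetchone()[0]
the_cat = conn.execute(
    "SELECT count FROM bigrams WHERE prev = ? AND curr = ?", ("the", "cat")
).fetchone()[0]
```

The first two stages are generators, so the corpus is streamed line by line rather than loaded into memory at once.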

Component Interactions

Check Operation Flow

See Data Flow for the detailed check operation flow.

Suggestion Generation Flow

  +------------------+
  | Unknown word     |
  +--------+---------+
           |
           v
  +----------------------------------+
  | SymSpell                         |
  |                                  |
  | 1. Generate deletes from input   |
  |            |                     |
  |            v                     |
  | 2. Look up each delete in        |
  |    pre-computed dictionary       |
  |            |                     |
  |            v                     |
  | 3. Find candidate words within   |
  |    edit distance                 |
  |            |                     |
  |            v                     |
  | 4. Rank by (edit_distance,       |
  |    frequency)                    |
  +------------+---------------------+
               |
               v
  +----------------------------------+
  | Suggestions [word1, word2, ...]  |
  +----------------------------------+
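
The four steps in the flow above can be sketched in a few functions. This is a simplified illustration of the symmetric-delete idea, not the project's implementation: it omits the prefix index, uses plain Levenshtein distance rather than Damerau-Levenshtein, and assumes ranking means lower distance first, then higher frequency. All function names are hypothetical.

```python
from collections import defaultdict


def deletes(word, max_distance):
    """All strings reachable from `word` by deleting up to max_distance characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def build_index(frequencies, max_distance):
    """Pre-compute a map from each delete to the dictionary words that produce it."""
    index = defaultdict(set)
    for word in frequencies:
        for d in deletes(word, max_distance):
            index[d].add(word)
    return index


def edit_distance(a, b):
    """Plain Levenshtein distance (dynamic programming over two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def suggest(word, index, frequencies, max_distance=2):
    candidates = set()
    for d in deletes(word, max_distance):  # steps 1 and 2
        candidates |= index.get(d, set())
    scored = [(edit_distance(word, c), c) for c in candidates]
    scored = [(dist, c) for dist, c in scored if dist <= max_distance]  # step 3
    # Step 4: lower distance first, then higher frequency, then alphabetical.
    scored.sort(key=lambda item: (item[0], -frequencies[item[1]], item[1]))
    return [c for _, c in scored]


frequencies = {"hello": 50, "help": 30, "hell": 10, "world": 40}
index = build_index(frequencies, max_distance=2)
suggestions = suggest("helo", index, frequencies)
```

Here all three candidates are one edit away from the input, so frequency decides the order: `["hello", "help", "hell"]`. The delete index is built once, so each query only generates deletes of the input word and verifies the few candidates they point to.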

Dependency Graph

  SpellChecker
  ├──> SyllableValidator ──> DictionaryProvider
  ├──> WordValidator ──────> DictionaryProvider
  │    └──> SymSpell ──────> DictionaryProvider
  └──> ContextValidator ───> DictionaryProvider
                                    |
                              +-----+-----+
                              |           |
                              v           v
                        SQLiteProvider  MemoryProvider

See Also