The following diagrams trace text through preprocessing, segmentation, multi-layer validation, and response assembly.

Overview

Data moves through the pipeline in this order: Input Text → Preprocessing → Segmentation → Validation → Response

Detailed Data Flow

1. Input Processing

  +-----------------+     +-------------------+     +---------------------+     +--------------+
  | Raw Text        | --> | Zawgyi Detection  | --> | Normalize           | --> | 'မြန်မာစာ'  |
  | 'မြန်​မာစာ'    |     |                   |     | - Remove zero-width |     |              |
  +-----------------+     +-------------------+     | - Unicode NFC       |     +--------------+
                                                    | - Whitespace        |
                                                    +---------------------+
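
The Normalize box above can be sketched in a few lines of Python. This is a minimal illustration, not the library's implementation: the function name `normalize` matches the call shown in the validation diagram, but its signature, the exact set of zero-width characters removed, and the whitespace policy are assumptions. Zawgyi detection is left out entirely.

```python
import re
import unicodedata

# Assumed set of zero-width characters to strip (ZWSP, ZWNJ, ZWJ, WJ, BOM).
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def normalize(text: str) -> str:
    """Apply the three steps listed in the Normalize box, in order."""
    text = text.translate(ZERO_WIDTH)           # remove zero-width characters
    text = unicodedata.normalize("NFC", text)   # Unicode NFC
    text = re.sub(r"\s+", " ", text).strip()    # collapse and trim whitespace
    return text


# The raw input in the diagram hides a zero-width space between syllables.
raw = "\u1019\u103c\u1014\u103a\u200b\u1019\u102c\u1005\u102c"
clean = normalize(raw)
```

Stripping U+200C and U+200D may be too aggressive for text that relies on them for rendering, so a production normalizer would likely make that set configurable.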
2. Segmentation

  +--------------+     +----------------------+     +------------------------+     +--------------------+
  | 'မြန်မာစာ'  | --> | Syllable Segmenter   | --> | ['မြန်', 'မာ', 'စာ']  | --> | Word Assembly      |
  |              |     | - Consonant bounds    |     |                        |     | - Dictionary lookup|
  +--------------+     | - Handle stacking     |     +------------------------+     | - Statistical model|
                       +----------------------+                                     +--------+-----------+
                                                                                             |
                                                                                             v
                                                                                    +--------------------+
                                                                                    | ['မြန်မာ', 'စာ']  |
                                                                                    +--------------------+
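
The two segmentation stages above can be approximated as follows. This is a simplified sketch under stated assumptions: `segment_syllables` uses a single consonant-boundary rule (break before a consonant unless it carries asat or virama, or sits under a virama), and `assemble_words` uses greedy longest-match dictionary lookup only. The statistical model from the diagram is omitted, and `assemble_words`, `max_len`, and the sample dictionary are hypothetical names and values.

```python
CONSONANTS = {chr(c) for c in range(0x1000, 0x1022)}  # U+1000..U+1021
ASAT = "\u103a"     # marks a syllable-final consonant
VIRAMA = "\u1039"   # marks consonant stacking


def segment_syllables(text):
    """Split text before each consonant that starts a new syllable."""
    syllables, start = [], 0
    for i, ch in enumerate(text):
        if i == 0 or ch not in CONSONANTS:
            continue
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if nxt in (ASAT, VIRAMA):    # final consonant or upper part of a stack
            continue
        if text[i - 1] == VIRAMA:    # lower part of a stack
            continue
        syllables.append(text[start:i])
        start = i
    if start < len(text):
        syllables.append(text[start:])
    return syllables


def assemble_words(syllables, dictionary, max_len=4):
    """Greedy longest match: join up to max_len syllables found in dictionary."""
    words, i = [], 0
    while i < len(syllables):
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = "".join(syllables[i:i + n])
            if n == 1 or candidate in dictionary:   # single syllable is the fallback
                words.append(candidate)
                i += n
                break
    return words


text = "\u1019\u103c\u1014\u103a\u1019\u102c\u1005\u102c"   # မြန်မာစာ
syllables = segment_syllables(text)
dictionary = {"\u1019\u103c\u1014\u103a\u1019\u102c", "\u1005\u102c"}  # hypothetical entries
words = assemble_words(syllables, dictionary)
```

With this sample dictionary the output matches the diagram: three syllables, then two words. Independent vowels, digits, and punctuation are not treated as boundaries here, which a real segmenter would need to handle.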
3. Validation Pipeline

  Client       SpellChecker    Normalizer     Segmenter    SyllableVal    WordVal     ContextVal    Provider
    |               |               |              |             |            |            |            |
    |-- check() --->|               |              |             |            |            |            |
    |               |               |              |             |            |            |            |
    |               |  [Pre-processing]            |             |            |            |            |
    |               |-- normalize() -->            |             |            |            |            |
    |               |<- "မြန်မာစာ" --|             |             |            |            |            |
    |               |               |              |             |            |            |            |
    |               |  [Segmentation]              |             |            |            |            |
    |               |-- segment_syllables() ------>|             |            |            |            |
    |               |<- ["မြန်","မာ","စာ"] --------|             |            |            |            |
    |               |               |              |             |            |            |            |
    |               |  [Layer 1: Syllable Validation]            |            |            |            |
    |               |-- validate(text) ---------------------->  |            |            |            |
    |               |               |              |             |-- is_valid_syllable("မြန်") ------->|
    |               |               |              |             |<- True (freq: 5000) ----------------|
    |               |               |              |             |-- is_valid_syllable("မာ") --------->|
    |               |               |              |             |<- True (freq: 3000) ----------------|
    |               |               |              |             |-- is_valid_syllable("စာ") --------->|
    |               |               |              |             |<- True (freq: 4000) ----------------|
    |               |<- [] (no errors) --------------------------|            |            |            |
    |               |               |              |             |            |            |            |
    |               |  [Layer 2: Word Validation]  |             |            |            |            |
    |               |-- validate(text) --------------------------------------->|            |            |
    |               |               |              |             |            |-- is_valid_word ------->|
    |               |               |              |             |            |<- True (freq,POS) -----|
    |               |<- [] (no errors) ----------------------------------------|            |            |
    |               |               |              |             |            |            |            |
    |               |  [Layer 2.5: Grammar Checking]             |            |            |            |
    |               |  SyntacticValidationStrategy calls SyntacticRuleChecker.check_sequence(words)    |
    |               |  Grammar rules loaded from YAML config     |            |            |            |
    |               |               |              |             |            |            |            |
    |               |  [Layer 3: Context Validation]             |            |            |            |
    |               |-- validate(text) ------------------------------------------------------------->  |
    |               |               |              |             |            |            |-- bigram ->|
    |               |               |              |             |            |            |<-- 0.15 ---|
    |               |<- [] (no errors) ------------------------------------------------------------|
    |               |               |              |             |            |            |            |
    |<-- Response(has_errors=false) |              |             |            |            |            |
    |               |               |              |             |            |            |            |
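
The layered sequence above can be sketched as a chain of functions that each return a list of errors. Everything here is an assumption made for illustration: `DictProvider`, `ValidationError`, `check_layers`, and the `threshold` value are hypothetical, only the method names `is_valid_syllable` and `is_valid_word` come from the diagram, and the grammar layer (Layer 2.5) is skipped.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class ValidationError:
    layer: str     # "syllable", "word", or "context"
    token: str
    message: str


class DictProvider:
    """Hypothetical in-memory stand-in for the Provider column."""

    def __init__(self, syllables: Set[str], words: Set[str],
                 bigrams: Dict[Tuple[str, str], float]):
        self.syllables, self.words, self.bigrams = syllables, words, bigrams

    def is_valid_syllable(self, syllable: str) -> bool:
        return syllable in self.syllables

    def is_valid_word(self, word: str) -> bool:
        return word in self.words

    def bigram_probability(self, first: str, second: str) -> float:
        return self.bigrams.get((first, second), 0.0)


def check_layers(syllables: List[str], words: List[str], provider: DictProvider,
                 threshold: float = 0.01) -> List[ValidationError]:
    """Run layers 1, 2, and 3 in order and collect every error found."""
    errors = [ValidationError("syllable", s, "unknown syllable")
              for s in syllables if not provider.is_valid_syllable(s)]
    errors += [ValidationError("word", w, "unknown word")
               for w in words if not provider.is_valid_word(w)]
    # Layer 3: flag adjacent word pairs whose bigram probability is too low.
    for first, second in zip(words, words[1:]):
        if provider.bigram_probability(first, second) < threshold:
            errors.append(ValidationError("context", second, "unlikely after " + first))
    return errors


myan, ma, sa = "\u1019\u103c\u1014\u103a", "\u1019\u102c", "\u1005\u102c"
provider = DictProvider({myan, ma, sa}, {myan + ma, sa}, {(myan + ma, sa): 0.15})
errors = check_layers([myan, ma, sa], [myan + ma, sa], provider)
```

For the diagram's input every lookup succeeds and the bigram probability of 0.15 clears the assumed threshold, so the list is empty, which corresponds to `Response(has_errors=false)`.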
4. Response Generation

  +------------------------+     +------------------+     +-------------------+
  | Validation Results     | --> | Response Builder | --> | Response          |
  | - syllable_errors      |     |                  |     | - text, errors    |
  | - word_errors          |     +------------------+     | - syllables, words|
  | - grammar_errors       |                              | - stats           |
  | - context_errors       |                              +-------------------+
  +------------------------+
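
One way to picture the Response Builder is as a function that concatenates the four error lists and derives a few counts. The field names `text`, `errors`, `syllables`, `words`, and `stats` follow the diagram, and `has_errors` follows the sequence diagrams; the `build_response` name, the use of a dataclass, and the contents of `stats` are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence


@dataclass
class Response:
    text: str
    syllables: List[str]
    words: List[str]
    errors: List[Any] = field(default_factory=list)
    stats: Dict[str, int] = field(default_factory=dict)

    @property
    def has_errors(self) -> bool:
        return bool(self.errors)


def build_response(text: str, syllables: List[str], words: List[str],
                   syllable_errors: Sequence[Any] = (),
                   word_errors: Sequence[Any] = (),
                   grammar_errors: Sequence[Any] = (),
                   context_errors: Sequence[Any] = ()) -> Response:
    """Merge per-layer errors in layer order and attach simple counts."""
    errors = [*syllable_errors, *word_errors, *grammar_errors, *context_errors]
    stats = {
        "syllable_count": len(syllables),   # hypothetical stat names
        "word_count": len(words),
        "error_count": len(errors),
    }
    return Response(text, syllables, words, errors, stats)


ok = build_response("abc", ["a", "b", "c"], ["ab", "c"])
failed = build_response("abc", ["a", "b", "c"], ["ab", "c"],
                        syllable_errors=["bad syllable"],
                        context_errors=["bad context"])
```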

Error Flow Example

  Client       SpellChecker    SyllableVal    RuleValidator     SymSpell       Provider
    |               |               |               |               |              |
    |-- check() --->|               |               |               |              |
    |               |               |               |               |              |
    |               |  [Layer 1: Syllable Validation]               |              |
    |               |-- validate() -->              |               |              |
    |               |               |-- validate("ကျြောင်") ----->|              |
    |               |               |               |               |              |
    |               |               |    Check medial compatibility  |              |
    |               |               |    Found: both ျ and ြ        |              |
    |               |               |    No dictionary match        |              |
    |               |               |               |               |              |
    |               |               |<- INVALID (not in dict) -----|              |
    |               |               |               |               |              |
    |               |               |  [Generate Suggestions]      |              |
    |               |               |-- lookup(max_dist=2) ----------------->     |
    |               |               |               |               |-- similar ->|
    |               |               |               |               |<- cands ----|
    |               |               |<- ["ကြောင်", "ကျောင်း"] ------------|     |
    |               |               |               |               |              |
    |               |<- SyllableError(suggestions=["ကြောင်", "ကျောင်း"])        |
    |               |               |               |               |              |
    |<-- Response(has_errors=true, errors=[SyllableError(error_type="invalid_syllable", ...)])
    |               |               |               |               |              |
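
The suggestion step above asks for candidates within an edit distance of 2. The sketch below reproduces that behaviour with a brute-force Levenshtein scan over a small dictionary. It is deliberately not the SymSpell algorithm, which precomputes deletion variants to avoid scanning every entry; the function names and the frequencies in the sample dictionary are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over code points, using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def suggest(term: str, dictionary: dict, max_dist: int = 2) -> list:
    """Return entries within max_dist, nearest first, then most frequent."""
    scored = []
    for candidate, freq in dictionary.items():
        dist = edit_distance(term, candidate)
        if dist <= max_dist:
            scored.append((dist, -freq, candidate))
    return [candidate for _, _, candidate in sorted(scored)]


invalid = "\u1000\u103b\u103c\u1031\u102c\u1004\u103a"        # ကျြောင် (both medials)
dictionary = {
    "\u1000\u103c\u1031\u102c\u1004\u103a": 900,              # ကြောင်
    "\u1000\u103b\u1031\u102c\u1004\u103a\u1038": 1200,       # ကျောင်း
    "\u1005\u102c": 4000,                                     # စာ
}
suggestions = suggest(invalid, dictionary)
```

Dropping the extra medial gives the first suggestion at distance 1; the second needs one deletion and one insertion, so it sits at distance 2 and is listed after it.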

Batch Processing Flow

  +----------------------------+     +--------------------------------------+     +----------------------------------+
  | [text1, text2, ..., textN] | --> | CorpusSegmenter (Cython + OpenMP)    | --> | [result1, result2, ..., resultN] |
  +----------------------------+     |                                      |     +----------------------------------+
                                     |  +----------+  +----------+         |
                                     |  | Thread 1 |  | Thread 2 |         |
                                     |  +----------+  +----------+         |
                                     |  +----------+  +----------+         |
                                     |  | Thread 3 |  | Thread 4 |         |
                                     |  +----------+  +----------+         |
                                     +--------------------------------------+
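
The fan-out shown above can be imitated in pure Python with a thread pool. This is a stand-in for illustration, not the Cython + OpenMP `CorpusSegmenter`: `segment_batch` and its parameters are hypothetical, and because of the global interpreter lock a pure-Python segmenter would gain little from threads. What the sketch does show is the contract implied by the diagram: N inputs in, N results out, in the same order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def segment_batch(texts: Sequence[str],
                  segment: Callable[[str], List[str]],
                  workers: int = 4) -> List[List[str]]:
    """Segment every text on a pool of worker threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which thread finishes first.
        return list(pool.map(segment, texts))


# str.split stands in for a real segmenter in this example.
results = segment_batch(["a b", "c", "d e f"], str.split)
```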

See Also