Overview
Word validation extends syllable validation to handle multi-syllable words. It includes:
- Dictionary lookup for complete words
- Compound word validation (SymSpell)
- Productive reduplication validation (ReduplicationEngine)
- Compound word synthesis via DP segmentation (CompoundResolver)
- OOV (Out-of-Vocabulary) recovery via morphological analysis
- Context-aware suggestion ranking
- Morpheme-level suggestion correction (MorphemeSuggestionStrategy)
- Colloquial variant detection
Architecture
WordValidator
Initialization
Factory Method
Basic Usage
Validation Process
Step 1: Word Segmentation
Text is segmented into words using the configured segmenter.
Step 2: Dictionary Lookup
Each word is checked against the word repository.
Step 3: Compound Validation
Words not found directly may be valid compounds.
Step 4: Reduplication Validation
Words not found in the dictionary or via compound check may be productive reduplications of known words:
- AA: Simple repetition (ကောင်းကောင်း “well”)
- AABB: Each syllable doubles (သေသေချာချာ “carefully”)
- ABAB: Whole word repeats (ခဏခဏ “frequently”)
- RHYME: Known rhyme pairs from grammar/patterns.py
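The first three patterns above can be checked mechanically once a word has been split into syllables. The following is a minimal sketch under stated assumptions, not the ReduplicationEngine itself: `classify_reduplication` and `KNOWN` are hypothetical names, syllable segmentation is assumed to have happened upstream, and the RHYME pattern is left out because it depends on the pair table in grammar/patterns.py.

```python
from typing import Optional, Sequence, Set


def classify_reduplication(syllables: Sequence[str], known_words: Set[str]) -> Optional[str]:
    """Return "AA", "AABB", "ABAB", or None for a pre-segmented word (hypothetical helper)."""
    n = len(syllables)
    # AA: a known one-syllable word repeated once.
    if n == 2 and syllables[0] == syllables[1] and syllables[0] in known_words:
        return "AA"
    if n == 4:
        s0, s1, s2, s3 = syllables
        # AABB: each syllable of a known two-syllable base is doubled.
        if s0 == s1 and s2 == s3 and s0 != s2 and (s0 + s2) in known_words:
            return "AABB"
        # ABAB: a known two-syllable base is repeated as a whole.
        if s0 == s2 and s1 == s3 and s0 != s1 and (s0 + s1) in known_words:
            return "ABAB"
    return None


# Base words taken from the examples in the list above.
KNOWN = {"ကောင်း", "သေချာ", "ခဏ"}
for syl in (["ကောင်း", "ကောင်း"], ["သေ", "သေ", "ချာ", "ချာ"], ["ခ", "ဏ", "ခ", "ဏ"]):
    print("".join(syl), classify_reduplication(syl, KNOWN))
```

Requiring the base form to be a known word is what keeps this check from accepting arbitrary repeated syllables.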
Step 5: Compound Synthesis
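As a rough illustration of DP segmentation for this step, the sketch below finds the split of a word into the fewest known morphemes. It is an assumption-laden stand-in, not the CompoundResolver implementation: `segment_compound`, the `max_len` limit, and the "fewest pieces" objective are all hypothetical choices, and the input is assumed to be pre-segmented into syllables.

```python
from typing import List, Optional, Sequence, Set


def segment_compound(
    syllables: Sequence[str], morphemes: Set[str], max_len: int = 4
) -> Optional[List[str]]:
    """Split syllables into the fewest known morphemes, or return None (hypothetical helper)."""
    n = len(syllables)
    # best[i] is the fewest-morpheme split of syllables[:i], or None if unreachable.
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        # Only consider pieces of at most max_len syllables.
        for start in range(max(0, end - max_len), end):
            prefix = best[start]
            if prefix is None:
                continue
            piece = "".join(syllables[start:end])
            if piece in morphemes:
                candidate = prefix + [piece]
                current = best[end]
                if current is None or len(candidate) < len(current):
                    best[end] = candidate
    result = best[n]
    # A compound needs at least two morphemes; one piece is a plain dictionary hit.
    if result is None or len(result) < 2:
        return None
    return result


# Illustrative entries only; a real morpheme set would come from the dictionary.
print(segment_compound(["စာ", "အုပ်", "တိုက်"], {"စာ", "စာအုပ်", "တိုက်"}))
```

Preferring fewer pieces is one plausible tie-breaker; a frequency-weighted score would fit the same table-filling loop.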
Words not matching any previous check may be valid compounds formed from known dictionary morphemes.
Step 6: OOV Recovery (Morphology)
For unknown words, attempt morphological analysis.
Step 7: Suggestion Generation
Suggestions are generated via the unified strategy pipeline.
Step 8: Context Ranking
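One way to read "bidirectional context" is that each candidate is scored against both the preceding and the following word. The sketch below assumes that reading and a plain bigram probability table; `rank_by_context`, the `floor` value for unseen pairs, and the English placeholder tokens are hypothetical and only serve to show the shape of the computation.

```python
from typing import Dict, List, Optional, Tuple

Bigram = Dict[Tuple[str, str], float]


def rank_by_context(
    candidates: List[str],
    prev_word: Optional[str],
    next_word: Optional[str],
    bigram_prob: Bigram,
    floor: float = 1e-6,
) -> List[str]:
    """Order candidates by P(candidate | prev) * P(next | candidate) (hypothetical helper)."""

    def score(candidate: str) -> float:
        # A missing neighbour (sentence start or end) contributes a neutral factor.
        left = 1.0 if prev_word is None else bigram_prob.get((prev_word, candidate), floor)
        right = 1.0 if next_word is None else bigram_prob.get((candidate, next_word), floor)
        return left * right

    # sorted() is stable, so ties keep the order produced by the suggestion step.
    return sorted(candidates, key=score, reverse=True)


probs = {("the", "cat"): 0.2, ("cat", "sat"): 0.3, ("the", "cart"): 0.1, ("cart", "sat"): 0.001}
print(rank_by_context(["cart", "cat"], "the", "sat", probs))
```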
Suggestions are ranked using bidirectional context.
OOV Recovery Details
Morphological Analysis
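As a rough illustration of decomposition, the sketch below strips known suffixes from the end of a word until a dictionary root remains. It assumes a simple greedy, longest-suffix-first approach; the actual morphology module may work differently, and `split_root_and_suffixes` and the suffix list are hypothetical.

```python
from typing import Iterable, List, Optional, Set, Tuple


def split_root_and_suffixes(
    word: str, roots: Set[str], suffixes: Iterable[str]
) -> Optional[Tuple[str, List[str]]]:
    """Greedily peel suffixes off the end of word until a known root is left."""
    # Longest first, so a short suffix does not shadow a longer one; skip empty strings.
    ordered = sorted((s for s in suffixes if s), key=len, reverse=True)
    stripped: List[str] = []
    current = word
    while current:
        if current in roots:
            # Suffixes were collected right to left; restore reading order.
            return current, list(reversed(stripped))
        for suffix in ordered:
            if current.endswith(suffix) and len(current) > len(suffix):
                stripped.append(suffix)
                current = current[: -len(suffix)]
                break
        else:
            # No suffix matched and the remainder is not a root.
            return None
    return None


# ကျောင်းသား (student) + များ (plural marker) + ကို (object marker)
print(split_root_and_suffixes("ကျောင်းသားများကို", {"ကျောင်းသား"}, ["များ", "ကို"]))
```

A greedy pass does not backtrack, so it can miss analyses where a shorter suffix would have led to a valid root.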
The morphology module decomposes unknown words.
Enhanced Suggestions
OOV recovery improves suggestion quality.
Colloquial Variant Detection
Configuration
Behavior by Mode
| Mode | Behavior |
|---|---|
| strict | Flag colloquial as error, suggest standard form |
| lenient | Info note with low confidence, not counted as error |
| off | No special handling |
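The table can be read as a small dispatch on the mode. The sketch below is a hypothetical rendering of that behavior: `ColloquialFinding`, `check_colloquial`, the variant map, and the confidence numbers (0.9 and 0.3) are assumptions made for illustration, while the error type strings follow the Error Type Values table in this document.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ColloquialFinding:
    word: str
    suggestion: str
    error_type: str
    confidence: float
    counts_as_error: bool


def check_colloquial(word: str, variants: Dict[str, str], mode: str) -> Optional[ColloquialFinding]:
    """Apply the strict / lenient / off behavior to a single word (hypothetical helper)."""
    if mode not in {"strict", "lenient", "off"}:
        raise ValueError(f"unknown colloquial mode: {mode!r}")
    standard = variants.get(word)
    # "off" disables the check; words without a known standard form are not variants.
    if mode == "off" or standard is None:
        return None
    if mode == "strict":
        # Reported as an error, with the standard form as the suggestion.
        return ColloquialFinding(word, standard, "colloquial_variant", 0.9, True)
    # Lenient: an informational note with low confidence, not counted as an error.
    return ColloquialFinding(word, standard, "colloquial_info", 0.3, False)


variants = {"colloq_form": "standard_form"}  # placeholder mapping
print(check_colloquial("colloq_form", variants, "strict"))
print(check_colloquial("colloq_form", variants, "lenient"))
```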
Examples
Interface Segregation
WordValidator uses narrow repository interfaces.
WordRepository Interface
SyllableRepository Interface
Benefits of the narrow interfaces:
- Reduces coupling to full DictionaryProvider
- Makes testing easier with minimal mocks
- Allows different storage backends
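The points above can be illustrated with structural typing. In this sketch the interface names come from this document, but the method names (`is_valid_word`, `is_valid_syllable`), the `InMemoryWords` fake, and `find_unknown_words` are assumptions; the real interfaces may expose different methods.

```python
from typing import Iterable, List, Protocol


class WordRepository(Protocol):
    """Narrow interface: only what word-level checks need (method name is assumed)."""

    def is_valid_word(self, word: str) -> bool: ...


class SyllableRepository(Protocol):
    """Narrow interface for syllable-level checks (method name is assumed)."""

    def is_valid_syllable(self, syllable: str) -> bool: ...


class InMemoryWords:
    """A minimal test double; it satisfies WordRepository without inheriting from it."""

    def __init__(self, words: Iterable[str]) -> None:
        self._words = set(words)

    def is_valid_word(self, word: str) -> bool:
        return word in self._words


def find_unknown_words(words: List[str], repo: WordRepository) -> List[str]:
    # Depends only on the narrow interface, so any backend with this method works.
    return [w for w in words if not repo.is_valid_word(w)]


print(find_unknown_words(["alpha", "beta"], InMemoryWords({"alpha"})))
```

Because the consumer depends on a single method, a test needs only a few lines of fake rather than a full dictionary provider.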
Error Types
WordValidator returns WordError objects.
Error Type Values
| Type | Description |
|---|---|
| invalid_word | Unknown word, not in dictionary |
| colloquial_variant | Colloquial spelling (strict mode) |
| colloquial_info | Colloquial spelling (lenient mode) |
Configuration Options
Via SpellCheckerConfig
Usage with SpellChecker
Validation Level
Performance Comparison
| Level | Speed | Coverage |
|---|---|---|
| SYLLABLE | ~10ms | ~90% of errors |
| WORD | ~50ms | ~95% of errors |
Suggestion Strategy
Composite Strategy Pipeline
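A composite pipeline can be pictured as several suggestion strategies whose outputs are merged into one ranked list. The sketch below shows that general pattern only; `composite_suggest`, the `(candidate, score)` tuple shape, and the "keep the highest score per candidate" merge rule are assumptions, not the library's actual strategy classes.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Suggestion = Tuple[str, float]  # (candidate, score); higher scores rank first
Strategy = Callable[[str], Iterable[Suggestion]]


def composite_suggest(word: str, strategies: Sequence[Strategy], limit: int = 5) -> List[Suggestion]:
    """Run every strategy, keep each candidate's best score, return the top results."""
    best: Dict[str, float] = {}
    for strategy in strategies:
        for candidate, score in strategy(word):
            # Deduplicate across strategies by keeping the highest score seen.
            if candidate not in best or score > best[candidate]:
                best[candidate] = score
    # Sort by descending score, then alphabetically for deterministic ties.
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


def edit_distance_like(word: str) -> List[Suggestion]:
    return [("cat", 0.5), ("cart", 0.4)]  # stand-in for an edit-distance strategy


def morpheme_like(word: str) -> List[Suggestion]:
    return [("cat", 0.8)]  # stand-in for a morpheme-level strategy


print(composite_suggest("caat", [edit_distance_like, morpheme_like]))
```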
Testing
Unit Test Example
See Also
- Syllable Validation - Layer 1 validation
- Context Checking - N-gram context validation
- SymSpell Algorithm - Edit distance suggestions
- Morphology Analysis - Word decomposition
- Suggestion Strategy - Unified suggestion pipeline