Overview
SuggestionData
All ranking input is encapsulated in SuggestionData.
Data Fields
| Field | Type | Description |
|---|---|---|
| term | str | The suggested word |
| edit_distance | int | Damerau-Levenshtein distance |
| frequency | int | Word frequency in corpus |
| phonetic_score | float | Phonetic similarity (0.0-1.0) |
| syllable_distance | float | Myanmar syllable-aware distance |
| weighted_distance | float | Myanmar-weighted edit distance using substitution costs |
| is_nasal_variant | bool | Nasal ending variant (န်↔ံ) |
| has_same_nasal_ending | bool | Same nasal consonant ending |
| source | str | Suggestion origin |
| confidence | float | Source confidence (0.0-1.0) |
| strategy_score | float | Strategy-level score for blending (optional) |
| score_breakdown | dict | Debug info with component scores (optional) |
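
The field table can be pictured as a plain dataclass. The sketch below is illustrative only: the field names and types come from the table, but the default values and the choice of `dataclass` are assumptions, not the library's actual definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SuggestionData:
    """Illustrative container mirroring the documented fields."""

    term: str
    edit_distance: int
    frequency: int
    # All defaults below are assumptions made for this sketch.
    phonetic_score: float = 0.0
    syllable_distance: float = 0.0
    weighted_distance: float = 0.0
    is_nasal_variant: bool = False
    has_same_nasal_ending: bool = False
    source: str = "symspell"
    confidence: float = 1.0
    strategy_score: Optional[float] = None   # documented as optional
    score_breakdown: Optional[dict] = None   # documented as optional


suggestion = SuggestionData(
    term="ကန်", edit_distance=1, frequency=1200, phonetic_score=0.9
)
```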
Ranking Strategies
DefaultRanker
Balanced ranking considering multiple factors:

| Bonus | Range | Description |
|---|---|---|
| freq_bonus | 0.0-0.5 | Higher frequency reduces score |
| phonetic_bonus | 0.0-0.4 | Phonetic similarity bonus (weight=0.4) |
| syllable_bonus | 0.0-0.3 | Medial confusion detection (weight=0.3) |
| nasal_bonus | 0.0-0.15 | Nasal variant matching (weight=0.15) |
| same_nasal_bonus | 0.0-0.25 | Same nasal ending (weight=0.25) |
| weighted_bonus | 0.0-0.35 | Myanmar-weighted distance bonus (weight=0.35) |
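
One way to read the bonus table is that each bonus is subtracted from the edit distance, so a lower score ranks first. The sketch below follows that reading. The function name `default_score`, the log scaling of the frequency bonus, and the `max_frequency` cap are assumptions for illustration; `syllable_bonus` and `weighted_bonus` are left out because the table does not say how they are derived from the distances.

```python
import math


def default_score(
    edit_distance: int,
    frequency: int,
    phonetic_score: float = 0.0,
    is_nasal_variant: bool = False,
    has_same_nasal_ending: bool = False,
    max_frequency: int = 1_000_000,  # hypothetical normalisation cap
) -> float:
    """Lower is better: edit distance minus the documented bonuses."""
    # Assumed log scaling, clamped to the documented 0.0-0.5 range.
    freq_ratio = math.log1p(frequency) / math.log1p(max_frequency)
    freq_bonus = 0.5 * min(1.0, freq_ratio)
    # Weights taken from the table.
    phonetic_bonus = 0.4 * phonetic_score
    nasal_bonus = 0.15 if is_nasal_variant else 0.0
    same_nasal_bonus = 0.25 if has_same_nasal_ending else 0.0
    bonuses = freq_bonus + phonetic_bonus + nasal_bonus + same_nasal_bonus
    return edit_distance - bonuses


# Candidates given as (term, edit_distance, frequency) for brevity.
candidates = [("rare", 1, 10), ("common", 1, 100_000)]
ranked = sorted(candidates, key=lambda c: default_score(c[1], c[2]))
```

With equal edit distances, the more frequent candidate sorts first because its larger `freq_bonus` lowers its score.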
FrequencyFirstRanker
Prioritizes common words over edit distance.
EditDistanceOnlyRanker
Simple ranking by edit distance only.
PhoneticFirstRanker
Prioritizes phonetically similar words.
UnifiedRanker
Consolidates suggestions from multiple sources:

| Source | Default Weight | Description |
|---|---|---|
| particle_typo | 1.2 | Grammar rule match |
| semantic | 1.15 | Semantic model |
| context | 1.15 | Context-aware re-ranking |
| medial_confusion | 1.1 | Ya-pin/Ra-yit swap |
| symspell | 1.0 | Statistical (baseline) |
| question_structure | 1.0 | Question structure |
| compound | 0.95 | Compound word splitting |
| morphology | 0.9 | Morphological analysis |
| pos_sequence | 0.85 | POS sequence |
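
The source weights can be applied as a simple multiplier. The sketch below assumes the base score has already been converted to a relevance value where higher is better, and that the weight and the source confidence are multiplied in. The function name `weighted_score` and the fallback weight of 1.0 for unknown sources are likewise assumptions; only the weight values come from the table.

```python
# Default weights copied from the table above.
SOURCE_WEIGHTS = {
    "particle_typo": 1.2,
    "semantic": 1.15,
    "context": 1.15,
    "medial_confusion": 1.1,
    "symspell": 1.0,
    "question_structure": 1.0,
    "compound": 0.95,
    "morphology": 0.9,
    "pos_sequence": 0.85,
}


def weighted_score(base_score: float, source: str, confidence: float = 1.0) -> float:
    """Scale a higher-is-better base score by source weight and confidence."""
    # Unknown sources fall back to the baseline weight (assumption).
    weight = SOURCE_WEIGHTS.get(source, 1.0)
    return base_score * weight * confidence


grammar = weighted_score(0.8, "particle_typo")
baseline = weighted_score(0.8, "symspell")
pos = weighted_score(0.8, "pos_sequence")
```

Under these assumptions, a grammar-rule match outranks a statistical suggestion with the same base score, and a POS-sequence suggestion ranks below both.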
Configuration
RankerConfig
Integration with SymSpell
UnifiedRanker Features
Deduplication
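
When several sources propose the same term, only one entry per term should survive. A minimal sketch of that idea follows, assuming suggestions are `(term, source, score)` tuples with higher scores being better and that the best-scoring duplicate is kept; the function name `deduplicate` and the tuple layout are hypothetical.

```python
def deduplicate(suggestions):
    """Keep the best-scoring entry per term, sorted best first."""
    best = {}
    for term, source, score in suggestions:
        # Replace the stored entry only when this one scores higher.
        if term not in best or score > best[term][2]:
            best[term] = (term, source, score)
    return sorted(best.values(), key=lambda s: s[2], reverse=True)


candidates = [
    ("ကန်", "symspell", 0.80),
    ("ကန်", "semantic", 0.92),  # same term from a second source
    ("ကံ", "symspell", 0.70),
]
result = deduplicate(candidates)
```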
Batch Ranking
Nasal Variant Handling
Myanmar has multiple nasal endings that are often confused:

| Ending | Phonetic | Example |
|---|---|---|
| န် | /n/ | ကန် |
| ံ | /n/ (anusvara) | ကံ |
| မ် | /m/ | ကမ် |
| င် | /ŋ/ | ကင် |
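
The two nasal flags on SuggestionData could be derived from the endings in this table. The sketch below is a simplified illustration, not the library's implementation: the helper names are hypothetical, it only inspects the final characters of each word, and it ignores trailing tone marks and Unicode normalisation issues.

```python
# Endings from the table, written as escapes to keep code points explicit.
NA_ASAT = "\u1014\u103a"   # န်
ANUSVARA = "\u1036"        # ံ
MA_ASAT = "\u1019\u103a"   # မ်
NGA_ASAT = "\u1004\u103a"  # င်
NASAL_ENDINGS = (NA_ASAT, ANUSVARA, MA_ASAT, NGA_ASAT)


def nasal_ending(word: str):
    """Return the nasal ending of a word, or None if it has none."""
    for ending in NASAL_ENDINGS:
        if word.endswith(ending):
            return ending
    return None


def is_nasal_variant(a: str, b: str) -> bool:
    """True when the words differ only by swapping န် and ံ."""
    end_a, end_b = nasal_ending(a), nasal_ending(b)
    if {end_a, end_b} != {NA_ASAT, ANUSVARA}:
        return False
    # Compare what remains after stripping each word's own ending.
    return a[: -len(end_a)] == b[: -len(end_b)]


def has_same_nasal_ending(a: str, b: str) -> bool:
    """True when both words end in the same nasal ending."""
    end_a = nasal_ending(a)
    return end_a is not None and end_a == nasal_ending(b)
```

For the examples in the table, ကန် and ကံ count as nasal variants of each other, while ကန် and ကမ် do not.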
Custom Rankers
Implement a custom ranking strategy.
Performance
| Ranker | Score Time | Notes |
|---|---|---|
| EditDistanceOnly | ~0.1μs | Fastest |
| DefaultRanker | ~1μs | Balanced |
| FrequencyFirst | ~0.5μs | Log calculation |
| PhoneticFirst | ~0.5μs | Simple formula |
| UnifiedRanker | ~2μs | Source lookup + base score |
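
Per-score timings like these depend on hardware and Python version. A rough way to take a comparable measurement with the standard library's `timeit` is sketched below; `edit_distance_only_score` is a hypothetical stand-in for a ranker's scoring call, and the lambda wrapper adds call overhead of its own, so treat the result as an upper bound.

```python
import timeit


def edit_distance_only_score(edit_distance: int) -> float:
    """Stand-in scoring function: rank by edit distance alone."""
    return float(edit_distance)


calls = 100_000
# timeit returns the total wall-clock seconds for all calls.
total_seconds = timeit.timeit(lambda: edit_distance_only_score(2), number=calls)
per_call_us = total_seconds / calls * 1e6
print(f"~{per_call_us:.3f} μs per score")
```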
See Also
- SymSpell Algorithm - Suggestion generation
- Edit Distance - Distance calculations
- Phonetic Matching - Phonetic scoring
- Configuration Guide - RankerConfig options