The suggestion ranking system determines how spelling corrections are scored and ordered. Multiple ranking strategies are available, from simple edit distance to sophisticated multi-factor ranking.

Overview

from myspellchecker.algorithms.ranker import DefaultRanker, SuggestionData

ranker = DefaultRanker()

# Score a suggestion
data = SuggestionData(
    term="မြန်မာ",
    edit_distance=1,
    frequency=10000,
    phonetic_score=0.8,
)
score = ranker.score(data)
print(f"Score: {score}")  # Lower is better

SuggestionData

All ranking input is encapsulated in SuggestionData:
from myspellchecker.algorithms.ranker import SuggestionData

data = SuggestionData(
    term="မြန်မာ",              # Suggested correction
    edit_distance=1,           # Damerau-Levenshtein distance
    frequency=10000,           # Corpus frequency
    phonetic_score=0.8,        # Phonetic similarity (0-1)
    syllable_distance=0.5,     # Myanmar syllable-aware distance
    weighted_distance=0.8,     # Myanmar-weighted edit distance
    is_nasal_variant=False,    # True if nasal ending difference
    has_same_nasal_ending=True, # Same nasal consonant ending
    source="symspell",         # Origin: symspell, particle_typo, etc.
    confidence=1.0,            # Source-specific confidence
    strategy_score=None,       # Strategy-level score (optional)
    score_breakdown=None,      # Debug info with component scores (optional)
)

Data Fields

| Field | Type | Description |
|---|---|---|
| term | str | The suggested word |
| edit_distance | int | Damerau-Levenshtein distance |
| frequency | int | Word frequency in corpus |
| phonetic_score | float | Phonetic similarity (0.0-1.0) |
| syllable_distance | float | Myanmar syllable-aware distance |
| weighted_distance | float | Myanmar-weighted edit distance using substitution costs |
| is_nasal_variant | bool | Nasal ending variant (န်↔ံ) |
| has_same_nasal_ending | bool | Same nasal consonant ending |
| source | str | Suggestion origin |
| confidence | float | Source confidence (0.0-1.0) |
| strategy_score | float | Strategy-level score for blending (optional) |
| score_breakdown | dict | Debug info with component scores (optional) |

Ranking Strategies

DefaultRanker

Balanced ranking that considers multiple factors:
from myspellchecker.algorithms.ranker import DefaultRanker

ranker = DefaultRanker()
Scoring Formula:
score = edit_distance - freq_bonus - phonetic_bonus - syllable_bonus
        - weighted_bonus - nasal_bonus - same_nasal_bonus
Bonuses:
| Bonus | Range | Description |
|---|---|---|
| freq_bonus | 0.0-0.5 | Higher frequency reduces score |
| phonetic_bonus | 0.0-0.4 | Phonetic similarity bonus (weight=0.4) |
| syllable_bonus | 0.0-0.3 | Medial confusion detection (weight=0.3) |
| weighted_bonus | 0.0-0.35 | Myanmar-weighted distance bonus (weight=0.35) |
| nasal_bonus | 0.0-0.15 | Nasal variant matching (weight=0.15) |
| same_nasal_bonus | 0.0-0.25 | Same nasal ending (weight=0.25) |
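Concretely, the bonuses might combine as in the sketch below. This is illustrative only: the function name, the frequency clamp, and the assumption that the syllable and weighted inputs arrive as pre-normalized similarities in 0-1 are all assumptions, not the library's actual implementation.

```python
def default_score(edit_distance, frequency, phonetic_score,
                  syllable_similarity=0.0, weighted_similarity=0.0,
                  is_nasal_variant=False, has_same_nasal_ending=False,
                  frequency_denominator=10000.0):
    # freq_bonus tops out at 0.5: very common words get the full bonus
    freq_bonus = min(frequency / frequency_denominator, 1.0) * 0.5
    phonetic_bonus = phonetic_score * 0.4        # 0.0-0.4
    syllable_bonus = syllable_similarity * 0.3   # 0.0-0.3
    weighted_bonus = weighted_similarity * 0.35  # 0.0-0.35
    nasal_bonus = 0.15 if is_nasal_variant else 0.0
    same_nasal_bonus = 0.25 if has_same_nasal_ending else 0.0
    return (edit_distance - freq_bonus - phonetic_bonus - syllable_bonus
            - weighted_bonus - nasal_bonus - same_nasal_bonus)

# The Overview example: edit_distance=1, frequency=10000, phonetic_score=0.8
print(default_score(1, 10000, 0.8))  # 1 - 0.5 - 0.32 ≈ 0.18
```

Note how a single edit-distance point can be outweighed by the combined bonuses, which is what lets a common, phonetically similar word beat a closer but obscure one.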

FrequencyFirstRanker

Prioritizes common words over edit distance:
from myspellchecker.algorithms.ranker import FrequencyFirstRanker

ranker = FrequencyFirstRanker()
Scoring Formula:
score = edit_distance * edit_weight - log1p(frequency) * freq_scale
Use Case: Autocomplete-style suggestions where common words are preferred.
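The formula above can be sketched directly, using the default weights from RankerConfig (frequency_first_edit_weight=0.5, frequency_first_scale=0.1); the function name is illustrative, not the library's API:

```python
import math

def frequency_first_score(edit_distance, frequency,
                          edit_weight=0.5, freq_scale=0.1):
    # Lower is better: log1p dampens huge frequency differences
    return edit_distance * edit_weight - math.log1p(frequency) * freq_scale

# A very common word at distance 2 can outrank a rare word at distance 1
print(frequency_first_score(2, 1_000_000) < frequency_first_score(1, 10))  # True
```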

EditDistanceOnlyRanker

Simple ranking by edit distance only:
from myspellchecker.algorithms.ranker import EditDistanceOnlyRanker

ranker = EditDistanceOnlyRanker()
score = ranker.score(data)  # Returns edit_distance directly
Use Case: Testing, debugging, or when frequency data is unavailable.

PhoneticFirstRanker

Prioritizes phonetically similar words:
from myspellchecker.algorithms.ranker import PhoneticFirstRanker

ranker = PhoneticFirstRanker()
Scoring Formula:
score = edit_distance * edit_weight - phonetic_score * phonetic_weight
Use Case: Myanmar text with common phonetic confusions (medial swaps).

UnifiedRanker

Consolidates suggestions from multiple sources:
from myspellchecker.algorithms.ranker import UnifiedRanker

ranker = UnifiedRanker()

# Score with source awareness
data = SuggestionData(
    term="ကြောင်း",
    edit_distance=1,
    frequency=5000,
    source="medial_confusion",  # High-priority source
    confidence=0.95,
)
score = ranker.score(data)  # Boosted by source weight
Source Weights:
| Source | Default Weight | Description |
|---|---|---|
| particle_typo | 1.2 | Grammar rule match |
| semantic | 1.15 | Semantic model |
| context | 1.15 | Context-aware re-ranking |
| medial_confusion | 1.1 | Ya-pin/Ra-yit swap |
| symspell | 1.0 | Statistical (baseline) |
| question_structure | 1.0 | Question structure |
| compound | 0.95 | Compound word splitting |
| morphology | 0.9 | Morphological analysis |
| pos_sequence | 0.85 | POS sequence |
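One plausible way these weights could be applied, assuming the base score (lower is better) is divided by the source weight so higher-weight sources rank better; the actual blending inside UnifiedRanker may differ:

```python
SOURCE_WEIGHTS = {
    "particle_typo": 1.2,
    "semantic": 1.15,
    "context": 1.15,
    "medial_confusion": 1.1,
    "symspell": 1.0,
    "question_structure": 1.0,
    "compound": 0.95,
    "morphology": 0.9,
    "pos_sequence": 0.85,
}

def apply_source_weight(base_score, source):
    # Unknown sources fall back to the symspell baseline weight of 1.0
    return base_score / SOURCE_WEIGHTS.get(source, 1.0)

# Same base score, but the grammar-rule source wins (lower = better)
print(apply_source_weight(1.0, "particle_typo"))  # ≈ 0.833
print(apply_source_weight(1.0, "symspell"))       # 1.0
```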

Configuration

RankerConfig

from myspellchecker.core.config import RankerConfig

config = RankerConfig(
    # DefaultRanker parameters
    frequency_denominator=10000.0,
    phonetic_bonus_weight=0.4,
    syllable_bonus_weight=0.3,
    nasal_bonus_weight=0.15,
    same_nasal_bonus_weight=0.25,
    weighted_distance_bonus_weight=0.35,

    # FrequencyFirstRanker parameters
    frequency_first_edit_weight=0.5,
    frequency_first_scale=0.1,

    # PhoneticFirstRanker parameters
    phonetic_first_weight=1.0,
    phonetic_first_edit_weight=0.3,

    # UnifiedRanker source weights
    source_weight_particle_typo=1.2,
    source_weight_medial_confusion=1.1,
    source_weight_semantic=1.15,
    source_weight_symspell=1.0,
    source_weight_morphology=0.9,
    source_weight_compound=0.95,
    source_weight_context=1.15,
    source_weight_question_structure=1.0,
    source_weight_pos_sequence=0.85,

    # Strategy score blending
    strategy_score_weight=0.5,
)

ranker = DefaultRanker(ranker_config=config)

Integration with SymSpell

from myspellchecker.algorithms.symspell import SymSpell
from myspellchecker.algorithms.ranker import FrequencyFirstRanker

# Use custom ranker with SymSpell
ranker = FrequencyFirstRanker()
symspell = SymSpell(provider, ranker=ranker)

# Suggestions are ranked by the custom ranker
suggestions = symspell.lookup("မျန်မာ", level='word')

UnifiedRanker Features

Deduplication

ranker = UnifiedRanker()

suggestions = [
    SuggestionData(term="ကြောင်း", source="symspell", confidence=0.8),
    SuggestionData(term="ကြောင်း", source="medial_confusion", confidence=0.95),
]

# Keeps highest-confidence version
ranked = ranker.rank_suggestions(suggestions, deduplicate=True)
# Result: [SuggestionData(term="ကြောင်း", source="medial_confusion")]

Batch Ranking

suggestions = [
    SuggestionData(term="word1", ...),
    SuggestionData(term="word2", ...),
    SuggestionData(term="word3", ...),
]

# Rank and sort all suggestions
ranked = ranker.rank_suggestions(suggestions)
# Returns: Sorted list, best first

Nasal Variant Handling

Myanmar has multiple nasal endings that are often confused:
| Ending | Phonetic | Example |
|---|---|---|
| န် | /n/ | ကန် |
| ံ | /n/ (anusvara) | ကံ |
| မ် | /m/ | ကမ် |
| င် | /ŋ/ | ကင် |
# Nasal variants get bonus
data1 = SuggestionData(
    term="ကန်",
    edit_distance=1,
    is_nasal_variant=True,  # True for န် ↔ ံ
)

data2 = SuggestionData(
    term="ကမ်",
    edit_distance=1,
    is_nasal_variant=False,  # Different nasal
)

# data1 gets nasal_bonus, scores lower (better)

Custom Rankers

Implement a custom ranking strategy by subclassing SuggestionRanker:
from myspellchecker.algorithms.ranker import SuggestionRanker, SuggestionData

class CustomRanker(SuggestionRanker):
    @property
    def name(self) -> str:
        return "custom"

    def score(self, data: SuggestionData) -> float:
        # Custom scoring logic
        base = float(data.edit_distance)

        # Boost exact syllable structure matches
        if data.syllable_distance == 0:
            base -= 0.5

        # Heavy frequency penalty for rare words
        if data.frequency < 100:
            base += 0.3

        return base

# Use custom ranker
symspell = SymSpell(provider, ranker=CustomRanker())

Performance

| Ranker | Score Time | Notes |
|---|---|---|
| EditDistanceOnly | ~0.1 μs | Fastest |
| DefaultRanker | ~1 μs | Balanced |
| FrequencyFirst | ~0.5 μs | Log calculation |
| PhoneticFirst | ~0.5 μs | Simple formula |
| UnifiedRanker | ~2 μs | Source lookup + base score |
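Absolute timings depend on the machine and Python version; they can be sanity-checked with timeit. A sketch using a stand-in score function rather than the library's rankers:

```python
import timeit

def edit_only_score(edit_distance):
    # Stand-in for EditDistanceOnlyRanker.score: returns the distance as-is
    return float(edit_distance)

n = 100_000
total = timeit.timeit(lambda: edit_only_score(1), number=n)
print(f"~{total / n * 1e6:.2f} us per score call")
```

The same pattern, pointed at `ranker.score(data)` with a prebuilt SuggestionData, reproduces the table above for any ranker.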

See Also