The neural reranker is the final stage of the suggestion pipeline, scoring candidates using a trained ONNX model after rule-based and N-gram ranking. It supports both MLP (multi-layer perceptron) and GBT (gradient-boosted tree) model types, auto-detected from the ONNX model’s input shape.

Overview

Suggestion Pipeline:
  SymSpell candidates → Rule-based ranking → N-gram reranking → Neural reranking
                                                                  ↑ you are here
The reranker takes the candidate list with extracted features and produces a score for each candidate, reordering them by the model’s prediction of which correction is most likely correct.
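The reordering step itself is simple: sort candidates descending by model score. A minimal sketch (the function name `rerank` is illustrative, not the library's actual API):

```python
def rerank(candidates: list[str], scores: list[float]) -> list[str]:
    """Reorder candidates by model score, highest first."""
    order = sorted(range(len(candidates)), key=scores.__getitem__, reverse=True)
    return [candidates[i] for i in order]
```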

Model Types

| Type | Input Shape | Normalization | Best For |
|------|-------------|---------------|----------|
| MLP | (batch, candidates, features) | Z-score (requires stats file) | Production deployment |
| GBT | (N, features) | None (scale-invariant) | Experimentation |
Model type is auto-detected from the ONNX input shape — no manual configuration needed.
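The detection rule can be sketched as follows. The input shape would be read from the model via onnxruntime, and the function name here is hypothetical:

```python
def detect_model_type(input_shape) -> str:
    """Classify the model family from its ONNX input rank.

    A 3-D input (batch, candidates, features) indicates an MLP;
    a 2-D input (N, features) indicates a GBT export.
    """
    return "mlp" if len(input_shape) == 3 else "gbt"

# The shape would come from the loaded model, e.g.:
#   session = onnxruntime.InferenceSession(model_path)
#   input_shape = session.get_inputs()[0].shape
```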

Feature Vector

The reranker uses 19 features extracted from each candidate:
| # | Feature | Description |
|---|---------|-------------|
| 0 | edit_distance | Raw Damerau-Levenshtein distance |
| 1 | weighted_distance | Myanmar-weighted edit distance |
| 2 | log_frequency | log1p(word_frequency) |
| 3 | phonetic_score | Phonetic similarity [0, 1] |
| 4 | syllable_count_diff | Absolute syllable count difference |
| 5 | plausibility_ratio | weighted_dist / raw_dist |
| 6 | span_length_ratio | len(candidate) / len(error) |
| 7 | mlm_logit | MLM logit from semantic checker |
| 8 | ngram_left_prob | Left context N-gram probability |
| 9 | ngram_right_prob | Right context N-gram probability |
| 10 | is_confusable | 1.0 if Myanmar confusable variant |
| 11 | relative_log_freq | log_freq / max(log_freq) within candidates |
| 12 | char_length_diff | len(candidate) - len(error), signed |
| 13 | is_substring | 1.0 if substring relationship exists |
| 14 | original_rank | 1/(1+rank) prior ranking signal |
| 15 | ngram_improvement_ratio | log(P_cand_ctx / P_error_ctx) |
| 16 | edit_type_subst | 1.0 if primary edit is substitution |
| 17 | edit_type_delete | 1.0 if primary edit is deletion/insertion |
| 18 | char_dice_coeff | Character bigram Dice coefficient |
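A few of the simpler features can be sketched directly from their descriptions. This is a hedged illustration; the library's actual extraction code may differ:

```python
import math

def char_bigrams(s: str) -> set[str]:
    return {s[i:i + 2] for i in range(len(s) - 1)}

def char_dice_coeff(candidate: str, error: str) -> float:
    # Feature 18: character-bigram Dice coefficient
    a, b = char_bigrams(candidate), char_bigrams(error)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def simple_features(candidate: str, error: str, frequency: int) -> dict:
    return {
        "log_frequency": math.log1p(frequency),                    # feature 2
        "span_length_ratio": len(candidate) / max(len(error), 1),  # feature 6
        "char_length_diff": len(candidate) - len(error),           # feature 12 (signed)
        "char_dice_coeff": char_dice_coeff(candidate, error),      # feature 18
    }
```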

MLP v3 Feature Transforms

For MLP models with feature_schema == "mlp_v3", the reranker automatically applies transforms at inference time:
  • Drops original_rank feature (index 14) to prevent ranking leakage
  • Computes cross-features as configured in the stats file
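These transforms can be sketched as below. The definition of the cross-feature (product of log_frequency and edit_distance) is an assumption for illustration; the real cross-feature definitions come from the stats file:

```python
ORIGINAL_RANK_IDX = 14  # index of the original_rank feature

def apply_mlp_v3_transforms(features: list[float]) -> list[float]:
    # Drop original_rank to prevent ranking leakage
    out = [f for i, f in enumerate(features) if i != ORIGINAL_RANK_IDX]
    # Assumed cross-feature: log_frequency (idx 2) * edit_distance (idx 0)
    out.append(features[2] * features[0])
    return out
```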

Usage

Basic Usage

from myspellchecker.algorithms.neural_reranker import NeuralReranker

reranker = NeuralReranker(
    model_path="path/to/reranker.onnx",
    stats_path="path/to/reranker_stats.json",  # Required for MLP
)

# Score candidates
features = [
    [1.0, 0.8, 9.2, 0.9, 0, ...],  # candidate 1 features (19 values)
    [2.0, 1.5, 7.1, 0.6, 1, ...],  # candidate 2 features
]

scores = reranker.score_candidates(features)
# Returns: [0.92, 0.45] — higher is better

With SpellCheckerBuilder

from myspellchecker.core import SpellCheckerBuilder
from myspellchecker.core.config import SpellCheckerConfig
from myspellchecker.core.config.algorithm_configs import RankerConfig

config = SpellCheckerConfig(
    ranker=RankerConfig(
        reranker_model_path="path/to/reranker.onnx",
        reranker_stats_path="path/to/reranker_stats.json",
    )
)

checker = (
    SpellCheckerBuilder()
    .with_config(config)
    .build()
)

Stats File Format

The JSON stats file contains normalization parameters for MLP models:
{
  "feature_schema": "mlp_v3",
  "feature_means": [1.2, 0.9, 8.5, ...],
  "feature_stds": [0.8, 0.4, 2.1, ...],
  "drop_original_rank": true,
  "cross_features": ["freq_x_edit"]
}
| Field | Description |
|-------|-------------|
| feature_schema | Schema version (e.g., "mlp_v3") |
| feature_means | Per-feature means for z-score normalization |
| feature_stds | Per-feature standard deviations |
| drop_original_rank | Whether to drop the original_rank feature |
| cross_features | List of cross-features to compute |
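The means and standard deviations drive a standard z-score normalization for MLP models. A minimal sketch, where the guard against zero standard deviations is an assumption:

```python
def z_score_normalize(features, means, stds, eps=1e-8):
    """Normalize each feature to zero mean / unit variance."""
    return [(f - m) / (s if s > eps else 1.0)
            for f, m, s in zip(features, means, stds)]
```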

Training

Reranker models are trained offline using the training pipeline:
  1. Generate training data (training/reranker_data.py): Extracts feature vectors from benchmark examples
  2. Train model (training/reranker_trainer.py): Trains MLP and exports to ONNX with quantization
  3. Evaluate: Compare MRR and Top-1 accuracy against the baseline ranker
See the Training Guide for details on the training pipeline.
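The two evaluation metrics named in step 3 can be sketched as follows (illustrative implementations, not the project's evaluation script):

```python
def mrr(ranked_lists, gold):
    """Mean reciprocal rank of the gold correction across examples."""
    total = 0.0
    for cands, g in zip(ranked_lists, gold):
        if g in cands:
            total += 1.0 / (cands.index(g) + 1)
    return total / len(gold)

def top1_accuracy(ranked_lists, gold):
    """Fraction of examples where the gold correction is ranked first."""
    return sum(c[0] == g for c, g in zip(ranked_lists, gold)) / len(gold)
```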

Performance

  • Latency: ~0.5ms per candidate batch (ONNX optimized)
  • Memory: ~2-5MB model size (quantized)
  • Dependencies: Requires onnxruntime (pip install myspellchecker[ai])

See Also