Overview
The Viterbi algorithm finds the most likely sequence of hidden states (POS tags) given observed emissions (words). It uses dynamic programming with beam search, running in O(N × B² × T) time, where N is the sequence length, B is the beam width, and T is the tagset size.
Requirements
The Viterbi POS tagger requires probability tables in the database to function effectively:

| Table | Purpose | Required? |
|---|---|---|
| pos_unigrams | P(tag) - Tag frequency distribution | Recommended |
| pos_bigrams | P(tag₂ \| tag₁) - Tag transition probabilities | Required |
| pos_trigrams | P(tag₃ \| tag₁, tag₂) - Trigram transitions | Recommended |
Building Database with POS Probabilities
POS probability tables are populated when you build a database with a POS tagger.
Checking Database for POS Tables
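Before tagging, it can help to verify that the probability tables exist and are populated. A minimal sketch, assuming the database is SQLite and the table names match those in the requirements table above (the helper name `has_pos_tables` is hypothetical, not part of the library API):

```python
import sqlite3

def has_pos_tables(db_path: str) -> dict:
    """Return a row count for each POS probability table (0 if missing)."""
    conn = sqlite3.connect(db_path)
    status = {}
    for table in ("pos_unigrams", "pos_bigrams", "pos_trigrams"):
        row = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' AND name=?",
            (table,),
        ).fetchone()
        if row is None:
            status[table] = 0  # table absent entirely
        else:
            status[table] = conn.execute(
                f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    conn.close()
    return status
```

A zero count for `pos_bigrams` means the tagger will use the fallback behavior described below.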
Fallback Behavior
If POS probability tables are empty or missing:
- Viterbi falls back to morphological analysis for tag prediction
- Accuracy drops from ~85% to ~70% (similar to rule-based tagger)
- A warning is logged: "Provider does not support bigram probabilities"
How It Works
Trigram Hidden Markov Model
The implementation uses a trigram HMM with deleted interpolation smoothing.
Algorithm Steps
- Initialization: Set probabilities for first position
- Recursion: For each position, compute best path using trigram transitions with beam pruning
- Termination: Find best final state
- Backtracking: Reconstruct optimal path
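The steps above can be sketched in simplified form. This is not the library's implementation, just an illustration of trigram Viterbi with beam pruning and deleted-interpolation smoothing (the lambda defaults mirror the config table below; for clarity the sketch carries full paths instead of backpointers, so the explicit backtracking step is folded into termination):

```python
import math

def interpolated_transition(t1, t2, t3, uni, bi, tri,
                            lam=(0.1, 0.3, 0.6), min_prob=1e-10):
    """P(t3 | t1, t2) via deleted interpolation of the three tables."""
    p = (lam[0] * uni.get(t3, 0.0)
         + lam[1] * bi.get((t2, t3), 0.0)
         + lam[2] * tri.get((t1, t2, t3), 0.0))
    return max(p, min_prob)  # probability floor prevents log(0)

def viterbi_tag(words, tags, emit, uni, bi, tri, beam_width=10):
    # Each beam entry: (log probability, (t1, t2) state, path so far).
    beam = [(0.0, ("<s>", "<s>"), [])]  # initialization at sentence start
    for w in words:
        candidates = []
        for logp, (t1, t2), path in beam:
            for t3 in tags:  # recursion: extend every surviving path
                e = emit.get((w, t3), 1e-10)  # emission P(word | tag)
                tr = interpolated_transition(t1, t2, t3, uni, bi, tri)
                candidates.append(
                    (logp + math.log(tr) + math.log(e), (t2, t3), path + [t3]))
        # Beam pruning: keep only the top `beam_width` partial paths.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
    return beam[0][2]  # termination: best surviving complete path
```

Because scores are sums of log probabilities, long sequences never underflow, and the `min_prob` floor guarantees every transition has a finite log score.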
Implementation
Python Wrapper
Cython Acceleration
The library includes a Cython implementation for a 2-5x speedup.
Configuration
Note: Unknown word handling is controlled via the `min_prob` (default: `1e-10`) and `unknown_word_tag` (default: `"UNK"`) parameters on the `ViterbiTagger` constructor, not via a separate penalty parameter.
Config resolution: When both constructor arguments and a `POSTaggerConfig` are provided, explicit constructor arguments always take precedence, even if they match the default value. This means `ViterbiTagger(beam_width=10, config=config)` uses `10` regardless of what `config.viterbi_beam_width` says. Omitted parameters fall back to the config, then to built-in defaults.
POSTaggerConfig Viterbi Fields
The `POSTaggerConfig` class exposes Viterbi-specific tuning parameters. These are separate from the constructor parameters on `ViterbiTagger`; they control the config-driven pipeline integration.
| Field | Type | Default | Description |
|---|---|---|---|
| `viterbi_min_prob` | float | 1e-10 | Minimum probability floor to prevent underflow |
| `viterbi_beam_width` | int | 10 | Default beam search width for pruning |
| `viterbi_emission_weight` | float | 1.2 | Weight for emission probabilities in scoring |
| `viterbi_lambda_unigram` | float | 0.1 | Deleted interpolation weight for unigram |
| `viterbi_lambda_bigram` | float | 0.3 | Deleted interpolation weight for bigram |
| `viterbi_lambda_trigram` | float | 0.6 | Deleted interpolation weight for trigram |
| `viterbi_emission_cache_size` | int | 4096 | LRU cache size for emission score lookups |
| `viterbi_transition_cache_size` | int | 2048 | LRU cache size for transition probability lookups |
| `viterbi_min_beam_width` | int | 5 | Minimum beam width for adaptive beam mode |
| `viterbi_max_beam_width` | int | 20 | Maximum beam width for adaptive beam mode |
| `viterbi_short_sequence_threshold` | int | 5 | Sequences at or below this length use max beam |
| `viterbi_long_sequence_threshold` | int | 20 | Sequences above this length use min beam |
The `lambda_*` weights control deleted interpolation smoothing and must sum to 1.0. A higher `lambda_trigram` gives more weight to trigram context (better for well-trained models), while a higher `lambda_unigram` adds robustness when training data is sparse.
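The adaptive beam fields above pin down only the endpoints: sequences at or below the short threshold get the maximum beam, and sequences above the long threshold get the minimum. A plausible sketch of how a width could be chosen in between, assuming linear interpolation between the two thresholds (an assumption; the document does not specify the in-between behavior, and `adaptive_beam_width` is a hypothetical name):

```python
def adaptive_beam_width(seq_len, min_beam=5, max_beam=20,
                        short_threshold=5, long_threshold=20):
    """Pick a beam width from sequence length (defaults match the config)."""
    if seq_len <= short_threshold:
        return max_beam  # short sequences are cheap: search widely
    if seq_len > long_threshold:
        return min_beam  # long sequences are costly: prune aggressively
    # In-between: linear interpolation (assumed, not documented behavior).
    frac = (seq_len - short_threshold) / (long_threshold - short_threshold)
    return round(max_beam - frac * (max_beam - min_beam))
```

The motivation is the cost model from the overview: runtime grows with N × B², so shrinking B on long inputs bounds the total work.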
Beam Search
Beam search prunes unlikely paths for efficiency:

| Beam Width | Speed | Accuracy |
|---|---|---|
| 5 | Fast | ~83% |
| 10 | Balanced (default) | ~85% |
| 20+ | Slow | ~86% |
Joint Segmentation + Tagging
The Viterbi algorithm can jointly segment and tag text.
Emission Score Logic
For the first token (t=1), emission scores are handled specially.
Performance
| Operation | Cython | Python | Speedup |
|---|---|---|---|
| Tag 10 words | 1.5ms | 5ms | ~3x |
| Tag 100 words | 10ms | 30ms | ~3x |
| Tag 1000 words | 80ms | 250ms | ~3x |
Integration
See Also
- POS Tagging - Feature documentation
- Joint Segment Tagger - Combined segmentation + tagging
- Transformer POS - Neural POS tagger