Configuration - mySpellChecker

The SpellChecker behavior is controlled by the SpellCheckerConfig object. You can adjust performance thresholds, toggle features, and fine-tune algorithm sensitivity through nested configuration classes.

Usage

The easiest way to configure the spell checker is using ConfigPresets with the SpellCheckerBuilder.

import copy

from myspellchecker.core.builder import SpellCheckerBuilder, ConfigPresets

# Use a preset
checker = (
    SpellCheckerBuilder()
    .with_config(ConfigPresets.ACCURATE)
    .build()
)

# Customize specific options on top of a preset
config = copy.deepcopy(ConfigPresets.DEFAULT)  # work on a copy so the shared preset stays unchanged
config.max_suggestions = 10

checker = (
    SpellCheckerBuilder()
    .with_config(config)
    .with_phonetic(False)  # Override phonetic setting
    .build()
)

Configuration Presets

Preset	Description
`ConfigPresets.DEFAULT`	Balanced configuration suitable for most use cases.
`ConfigPresets.FAST`	Optimized for speed. Disables context checking and reduces search depth.
`ConfigPresets.ACCURATE`	Optimized for quality. Max edit distance 3, strict thresholds.
`ConfigPresets.MINIMAL`	Dictionary-only checking. Disables phonetic, context, NER, and rule-based validation. Lowest resource usage.
`ConfigPresets.STRICT`	Sensitive thresholds that catch more potential errors. May increase false positives. Suitable for formal documents.

Configuration Profiles

For environment-specific configurations, use get_profile(), which returns a fully configured SpellCheckerConfig object tuned for a specific use case.

from myspellchecker.core.config import get_profile

# Use a profile directly
config = get_profile("production")

# Available profiles
config = get_profile("development")  # Fast iteration, minimal validation
config = get_profile("production")   # Balanced accuracy and performance (default)
config = get_profile("testing")      # Deterministic, reproducible results
config = get_profile("fast")         # Maximum speed, reduced accuracy
config = get_profile("accurate")     # Maximum accuracy, slower performance

Profile	POS Tagger	Context	NER	Semantic	Zawgyi	Key Tuning
`development`	rule_based	Off	Off	Off	On	Small caches, `prefix_length=5`
`production`	viterbi	On	On	Refinement	On	Standard caches, `prefix_length=7`
`testing`	rule_based	On	On	Off	On	Small caches for determinism
`fast`	rule_based	Off	Off	Off	Off	`max_edit_distance=1`, `count_threshold=200`
`accurate`	viterbi	On	On	Proactive	On	`max_edit_distance=3`, `beam_width=100`, large caches

ConfigPresets (from SpellCheckerBuilder) and get_profile() are separate configuration systems. Presets are simpler toggles; profiles provide fully-tuned configurations including SymSpell, N-gram, POS tagger, and provider settings.

Configuration Files

You can load configuration from a YAML (or JSON) file instead of code, which is useful when deploying the spell checker to different environments. Configuration files are resolved in this order:

1. An explicit path passed to ConfigLoader().load(config_file="path/to/config.yml").
2. myspellchecker.yaml, myspellchecker.yml, or myspellchecker.json in the current directory.
3. ~/.config/myspellchecker/myspellchecker.yaml, myspellchecker.yml, or myspellchecker.json (user-global config).

Example myspellchecker.yaml:

preset: accurate
max_suggestions: 10
use_phonetic: true

symspell:
  prefix_length: 10
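The load order above can be sketched as a small file-discovery function. This is an illustration only: find_config_file is a hypothetical helper, not part of the library, and it assumes that within one directory .yaml takes precedence over .yml, which takes precedence over .json (the order the files are listed above). It also assumes a missing explicit path is an error; the real ConfigLoader may behave differently.

```python
from pathlib import Path

# Assumed precedence within a directory: the order listed in the docs.
CANDIDATE_NAMES = ("myspellchecker.yaml", "myspellchecker.yml", "myspellchecker.json")


def find_config_file(explicit_path=None, cwd=None, home=None):
    """Return the first config file found, following the documented load order.

    Hypothetical helper for illustration; not the library's ConfigLoader.
    """
    if explicit_path is not None:
        path = Path(explicit_path)
        if not path.is_file():
            # Assumption: an explicit path that does not exist is an error.
            raise FileNotFoundError(path)
        return path

    cwd = Path.cwd() if cwd is None else Path(cwd)
    home = Path.home() if home is None else Path(home)

    # 2. current directory, then 3. the user-global config directory
    for directory in (cwd, home / ".config" / "myspellchecker"):
        for name in CANDIDATE_NAMES:
            candidate = directory / name
            if candidate.is_file():
                return candidate
    return None  # no file found: fall back to in-code defaults
```

A project-local file therefore shadows the user-global one, so a per-project myspellchecker.yaml can override settings without touching ~/.config.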

Configuration Parameters

General Settings

max_edit_distance

int

default:"2"

Maximum edit distance for suggestions (1-3). Higher values find more suggestions but are slower.

max_suggestions

int

default:"5"

Maximum number of correction suggestions to return per error.

max_text_length

int

default:"100000"

Maximum input text length in characters. Prevents resource exhaustion on very large inputs.

use_phonetic

bool

default:"True"

Enable phonetic matching (Myanmar Soundex-like) for finding sound-alike corrections.

use_context_checker

bool

default:"True"

Enable N-gram context validation for detecting real-word errors.

use_ner

bool

default:"True"

Enable Named Entity Recognition heuristics to skip proper names.

use_rule_based_validation

bool

default:"True"

Enable algorithmic syllable structure checks.

word_engine

str

default:"myword"

Word segmentation engine: "myword", "crf", or "transformer".

seg_model

str | None

default:"None"

Custom model name or path for transformer word segmentation. Only used when word_engine="transformer". If None, chuuhtetnaing/myanmar-text-segmentation-model is used.

seg_device

int

default:"-1"

Device for transformer word segmentation inference. -1 for CPU, 0+ for GPU index. Only used when word_engine="transformer".

fallback_to_empty_provider

bool

default:"False"

If the dictionary database is not found, silently fall back to an empty provider instead of raising an error.

Nested Configuration Objects

symspell

SymSpellConfig

SymSpell algorithm configuration (edit distance, prefix length, beam width).

ngram_context

NgramContextConfig

N-gram context checker configuration (thresholds, smoothing, scoring weights).

phonetic

PhoneticConfig

Phonetic matching configuration (code length, suggestion thresholds).

semantic

SemanticConfig

Semantic model configuration (model path, tokenizer, inference settings).

pos_tagger

POSTaggerConfig

POS tagger configuration (tagger type, model name, device).

joint

JointConfig

Joint segmentation-tagging configuration (beam width, emission weight).

validation

ValidationConfig

Validation behavior configuration (confidence thresholds, feature toggles).

provider_config

ProviderConfig

Provider caching and query configuration (cache size, timeout).

cache

AlgorithmCacheConfig

Unified cache size configuration for all algorithm lookup caches.

ranker

RankerConfig

Suggestion ranking weights and strategy selection.

frequency_guards

FrequencyGuardConfig

Centralized frequency thresholds that suppress false positives across validators (colloquial, homophone, N-gram, semantic).

compound_resolver

CompoundResolverConfig

Compound word synthesis and broken compound detection settings.

reduplication

ReduplicationConfig

Reduplication validation settings for AABB/ABAB patterns.

neural_reranker

NeuralRerankerConfig

Neural suggestion re-ranking model configuration (MLP with ONNX).

broken_compound_strategy

BrokenCompoundStrategyConfig

Broken compound detection strategy thresholds and confidence.

token_refinement

TokenRefinementConfig

Token boundary refinement scoring (exposes hidden errors in merged tokens).

ner

NERConfig

default:"None"

NER model configuration. When set with enabled=True, the specified NER model is used.

SymSpell Settings

Controlled via the symspell attribute (SymSpellConfig).

prefix_length

int

default:"10"

Prefix length for indexing. Lower = smaller index but slower lookups.

count_threshold

int

default:"50"

Minimum frequency for a word to be considered valid.

max_word_length

int

default:"15"

Max word length for compound segmentation.

compound_lookup_count

int

default:"3"

Suggestions per word in compound check.

beam_width

int

default:"25"

Beam width for compound segmentation dynamic programming.

damerau_cache_size

int

default:"4096"

LRU cache size for edit distance calculations.

phonetic_bonus_weight

float

default:"0.4"

Weight for phonetic similarity bonus in suggestion ranking.

skip_init

bool

default:"False"

Skip SymSpell initialization (for POS-only use cases).

known_word_frequency_threshold

int

default:"100"

Minimum frequency to consider a word “known” for SymSpell lookups.

use_weighted_distance

bool

default:"True"

Use Myanmar-weighted edit distance for better Myanmar-specific scoring.

use_syllable_distance

bool

default:"True"

Enable syllable-aware edit distance.

use_myanmar_variants

bool

default:"True"

Enable Myanmar variant candidate generation.

myanmar_variant_max_candidates

int

default:"20"

Maximum Myanmar variant candidates per lookup (1-100).

compound_max_suggestions

int

default:"5"

Maximum suggestions for compound queries.

frequency_denominator

float

default:"10000.0"

Denominator for frequency bonus calculation.
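The prefix_length trade-off is easiest to see in a simplified version of SymSpell's symmetric-delete index. The sketch below is not the library's implementation: it ignores word frequencies, count_threshold, and Myanmar-specific distances, and the function names are hypothetical. Both dictionary words and queries are truncated to prefix_length before generating deletes, so a shorter prefix yields a smaller index but more false-positive candidates, which a real lookup must then verify with a full edit-distance check.

```python
from collections import defaultdict
from itertools import combinations


def deletes(term, max_distance, prefix_length):
    """All strings reachable from the term's prefix by deleting up to max_distance chars."""
    prefix = term[:prefix_length]
    results = {prefix}
    for k in range(1, min(max_distance, len(prefix)) + 1):
        for positions in combinations(range(len(prefix)), k):
            results.add("".join(c for i, c in enumerate(prefix) if i not in positions))
    return results


def build_index(words, max_distance=2, prefix_length=10):
    """Map every delete variant to the dictionary words that produce it."""
    index = defaultdict(set)
    for word in words:
        for variant in deletes(word, max_distance, prefix_length):
            index[variant].add(word)
    return index


def candidates(index, query, max_distance=2, prefix_length=10):
    """Dictionary words sharing at least one delete variant with the query.

    These are only candidates: a real lookup still verifies each one
    with a true edit distance before returning it as a suggestion.
    """
    found = set()
    for variant in deletes(query, max_distance, prefix_length):
        found |= index.get(variant, set())
    return found
```

For example, with max_distance=1 the query "helo" reaches both "hello" (via the shared delete "helo") and "help" (via "hel"), while the index built with prefix_length=3 has half as many keys as the one built with prefix_length=10 for the same three words.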

Context & N-gram Settings

Controlled via the ngram_context attribute (NgramContextConfig).

bigram_threshold

float

default:"0.0001"

Probability below which a bigram is flagged as suspicious.

trigram_threshold

float

default:"0.0001"

Probability below which a trigram is flagged as suspicious.

fourgram_threshold

float

default:"0.0001"

Probability below which a 4-gram is flagged as suspicious.

fivegram_threshold

float

default:"0.0001"

Probability below which a 5-gram is flagged as suspicious.

right_context_threshold

float

default:"0.001"

Probability threshold for right context to rescue a word.

edit_distance_weight

float

default:"0.6"

Weight for edit distance in scoring context candidates.

probability_weight

float

default:"0.4"

Weight for probability in scoring context candidates.

edit_distance

int

default:"2"

Max edit distance for context-based corrections.

candidate_limit

int

default:"50"

Max candidates to evaluate in context search.

use_smoothing

bool

default:"True"

Enable probability smoothing for sparse N-gram data.

smoothing_strategy

str

default:"stupid_backoff"

Smoothing strategy: "none", "stupid_backoff", or "add_k".

backoff_weight

float

default:"0.4"

Weight for Stupid Backoff smoothing.

add_k_smoothing

float

default:"0.0"

Add-k smoothing constant (used when smoothing_strategy="add_k").

max_suggestions

int

default:"5"

Maximum context-aware suggestions.

pos_score_weight

float

default:"0.2"

Weight for POS influence in scoring.

unigram_denominator

float

default:"500000000.0"

Denominator for unigram probability estimation.

min_unigram_threshold

int

default:"5"

Minimum frequency for a word to be considered valid in unseen contexts.
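To make smoothing_strategy, backoff_weight, add_k_smoothing, and bigram_threshold concrete, here is a minimal bigram-only sketch of the two textbook smoothing methods. It assumes plain count dictionaries and a total-count unigram denominator; the library's own formulas (for example its use of unigram_denominator and higher-order N-grams) may differ, and the function names are illustrative.

```python
def stupid_backoff(prev, word, bigrams, unigrams, backoff_weight=0.4):
    """Stupid Backoff score S(word | prev): a relative score, not a normalized probability."""
    pair_count = bigrams.get((prev, word), 0)
    if pair_count > 0:
        return pair_count / unigrams[prev]
    # Unseen bigram: back off to the unigram estimate, discounted by backoff_weight.
    total = sum(unigrams.values())
    return backoff_weight * unigrams.get(word, 0) / total


def add_k(prev, word, bigrams, unigrams, k=1.0):
    """Add-k smoothed bigram probability P(word | prev); k must be > 0 for unseen contexts."""
    vocab_size = len(unigrams)
    return (bigrams.get((prev, word), 0) + k) / (unigrams.get(prev, 0) + k * vocab_size)


def is_suspicious(score, threshold=1e-4):
    """Flag a bigram whose score falls below bigram_threshold."""
    return score < threshold
```

With unigrams {"a": 3, "b": 2, "c": 5} and a single bigram ("a", "b") seen twice, stupid_backoff gives 2/3 for "a b" and 0.4 × 5/10 = 0.2 for the unseen "a c", while a word never seen at all scores 0 and is flagged.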

Phonetic Settings

Controlled via the phonetic attribute (PhoneticConfig).

max_code_length

int

default:"10"

Maximum length for phonetic codes.

suggestion_threshold_unseen

float

default:"0.001"

Threshold for unseen phonetic suggestions.

suggestion_threshold_improvement

float

default:"0.01"

Threshold for phonetic improvement.

suggestion_improvement_ratio

float

default:"100.0"

Improvement ratio for phonetic suggestions.

phonetic_bypass_threshold

float

default:"0.85"

Minimum similarity to bypass edit-distance cap.

phonetic_extra_distance

int

default:"1"

Extra edit distance allowed for high-similarity phonetic candidates.
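One plausible reading of phonetic_bypass_threshold and phonetic_extra_distance is that a candidate whose phonetic similarity clears the threshold may exceed the normal edit-distance cap by phonetic_extra_distance. The sketch below encodes that assumption with hypothetical function names; it is not taken from the library source.

```python
def allowed_distance(phonetic_similarity, max_edit_distance=2,
                     phonetic_bypass_threshold=0.85, phonetic_extra_distance=1):
    """Edit-distance budget for a candidate, widened for strong sound-alikes (assumed rule)."""
    if phonetic_similarity >= phonetic_bypass_threshold:
        return max_edit_distance + phonetic_extra_distance
    return max_edit_distance


def accept_candidate(edit_distance, phonetic_similarity, **kwargs):
    """Keep a candidate if its edit distance fits within the (possibly widened) budget."""
    return edit_distance <= allowed_distance(phonetic_similarity, **kwargs)
```

Under the defaults, a distance-3 candidate survives only when its phonetic similarity is at least 0.85.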

Semantic Model Settings

Controlled via the semantic attribute (SemanticConfig). Requires a trained model.

model_path

str | None

default:"None"

Path to your trained ONNX model file.

tokenizer_path

str | None

default:"None"

Path to tokenizer files or HuggingFace tokenizer directory.

num_threads

int

default:"0"

Number of threads for ONNX inference (CPU only). 0 = auto-detect (use all available cores, recommended).

predict_top_k

int

default:"5"

Top-K predictions for semantic suggestions.

check_top_k

int

default:"10"

Top-K tokens to check for semantic errors.

use_semantic_refinement

bool

default:"True"

Enable semantic refinement in error detection.

use_proactive_scanning

bool

default:"False"

Enable proactive AI-powered error scanning.

proactive_confidence_threshold

float

default:"0.85"

Confidence threshold for proactive error detection (0.0-1.0).

scoring_confidence_threshold

float

default:"0.3"

Confidence threshold for semantic scoring in suggestion ranking.

use_pytorch

bool

default:"False"

Force PyTorch backend instead of ONNX Runtime.

device

str

default:"cpu"

Device for model inference ("cpu" or "cuda:0"). PyTorch only.

validate_model_architecture

bool

default:"True"

Validate on load that the model has a masked-language-model (MLM) architecture.

word_alignment_enabled

bool

default:"True"

Enable Myanmar word-aligned masking for BPE tokenizers.

logit_scale

float | None

default:"None"

Override for automatic logit scale detection. Set to a positive float to manually control logit normalization.

myanmar_text_ratio_threshold

float

default:"0.5"

Minimum ratio of Myanmar characters in text to enable semantic checking.
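A simple way to picture myanmar_text_ratio_threshold is a character count over the Unicode Myanmar block (U+1000-U+109F). The sketch assumes whitespace is ignored and that only the main block counts; the library may treat punctuation, digits, or the Extended Myanmar blocks differently, and both function names are hypothetical.

```python
def myanmar_ratio(text):
    """Fraction of non-whitespace characters in the Myanmar block (U+1000-U+109F)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    myanmar = sum(1 for c in chars if "\u1000" <= c <= "\u109f")
    return myanmar / len(chars)


def should_run_semantic(text, threshold=0.5):
    """Gate semantic checking on the share of Myanmar text (assumed >= comparison)."""
    return myanmar_ratio(text) >= threshold
```

Under this rule, mostly-Latin input such as code snippets or English sentences skips the comparatively expensive model inference.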

Proactive Semantic Scanning

When enabled, the spell checker will proactively scan sentences for semantic errors using a language model (XLM-RoBERTa, mDeBERTa, etc.). This can detect errors that traditional dictionary-based methods miss.

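from myspellchecker.core.config import SpellCheckerConfig, SemanticConfig

config = SpellCheckerConfig(
    semantic=SemanticConfig(
        model_path="path/to/model.onnx",  # placeholder: your trained ONNX model
        tokenizer_path="path/to/tokenizer",  # placeholder: matching tokenizer files
        use_proactive_scanning=True,
        proactive_confidence_threshold=0.7,  # below the 0.85 default: flags more; raise it for fewer false positives
    )
)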

Note: Proactive scanning requires a trained model and may increase processing time.

POS Tagger Settings

Controlled via the pos_tagger attribute (POSTaggerConfig).

tagger_type

str

default:"rule_based"

Backend type: "rule_based", "transformer", "viterbi", or "custom".

model_name

str | None

default:"None"

HuggingFace model ID (for transformer type).

device

int

default:"-1"

Device ID: -1 for CPU, 0+ for GPU index.

batch_size

int

default:"32"

Batch size for transformer inference.

use_morphology_fallback

bool

default:"True"

Use morphology analyzer for OOV words.

cache_size

int

default:"10000"

LRU cache size for rule-based tagger.

beam_width

int

default:"10"

Beam width for Viterbi decoding.

emission_weight

float

default:"1.2"

Emission probability weight for Viterbi.

min_prob

float

default:"1e-10"

Minimum probability threshold to prevent underflow.

hmm_params_path

str | None

default:"None"

Path to precomputed HMM parameters JSON for Viterbi tagger.

unknown_tag

str

default:"UNK"

Tag assigned to completely unknown words.

use_fp16

bool

default:"True"

Use float16 on GPU for transformer tagger (approximately 2x throughput).

use_torch_compile

bool

default:"False"

Use torch.compile() JIT optimization for transformer tagger.
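The Viterbi-related fields (emission_weight, min_prob) can be read against a textbook first-order HMM Viterbi decoder. The sketch below assumes emission_weight multiplies the emission log-probability and min_prob floors every probability before taking the log; the function signature and dictionary-based model format are hypothetical, and beam pruning (beam_width) is omitted for brevity.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, emission_weight=1.2, min_prob=1e-10):
    """Most likely tag sequence under a first-order HMM, scored in log space."""
    if not words:
        return []

    def log(p):
        # Floor at min_prob so unseen events never produce log(0).
        return math.log(max(p, min_prob))

    def emission(tag, word):
        return emission_weight * log(emit_p.get(tag, {}).get(word, 0.0))

    # best[tag] = (score, path) of the best path ending in that tag
    best = {t: (log(start_p.get(t, 0.0)) + emission(t, words[0]), [t]) for t in tags}
    for word in words[1:]:
        new_best = {}
        for t in tags:
            score, path = max(
                (prev_score + log(trans_p.get(p, {}).get(t, 0.0)), prev_path)
                for p, (prev_score, prev_path) in best.items()
            )
            new_best[t] = (score + emission(t, word), path + [t])
        best = new_best
    return max(best.values())[1]
```

A larger emission_weight makes the tagger trust word-to-tag evidence more relative to tag-transition evidence.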

Validation & Error Detection Settings

Controlled via the validation attribute (ValidationConfig).

syllable_error_confidence

float

default:"1.0"

Confidence score for syllable errors.

word_error_confidence

float

default:"0.8"

Confidence score for word errors.

use_zawgyi_detection

bool

default:"True"

Enable Zawgyi encoding detection.

use_zawgyi_conversion

bool

default:"True"

Enable automatic Zawgyi to Unicode conversion.

zawgyi_confidence_threshold

float

default:"0.95"

Confidence threshold for Zawgyi detection.

strict_validation

bool

default:"True"

Enable strict validation rules.

colloquial_strictness

str

default:"lenient"

Strictness for colloquial variants: "strict", "lenient", or "off".

allow_extended_myanmar

bool

default:"False"

Allow Extended Myanmar characters for non-Burmese scripts (Shan, Mon, Karen).

raise_on_strategy_error

bool

default:"False"

Re-raise exceptions from validation strategies (useful for debugging).

homophone_confidence

float

default:"0.8"

Confidence for homophone validation strategy.

homophone_improvement_ratio

float

default:"5.0"

Minimum probability improvement ratio for homophones.

homophone_min_probability

float

default:"0.001"

Minimum N-gram probability threshold for homophone suggestions.

homophone_high_freq_threshold

int

default:"1000"

Word frequency above which stricter ratio applies.

homophone_high_freq_improvement_ratio

float

default:"50.0"

Improvement ratio required for high-frequency words.

semantic_min_word_length

int

default:"2"

Minimum word length for semantic validation.

use_homophone_detection

bool

default:"True"

Enable homophone detection in validation pipeline.

use_orthography_validation

bool

default:"True"

Enable orthography validation in validation pipeline.

use_confusable_semantic

bool

default:"False"

Enable MLM-enhanced confusable detection (requires semantic model).

use_reduplication_validation

bool

default:"True"

Enable reduplication pattern validation (AABB/ABAB).

use_compound_synthesis

bool

default:"True"

Enable compound word synthesis for OOV recovery.

use_broken_compound_detection

bool

default:"True"

Enable detection of incorrectly split compound words.

medial_confusion_confidence

float

default:"0.85"

Confidence for medial confusion (ျ vs ြ) errors.

orthography_confidence

float

default:"0.9"

Confidence for orthography validation errors.

tone_validation_confidence

float

default:"0.5"

Confidence for tone validation errors.

syntactic_validation_confidence

float

default:"0.9"

Confidence for syntactic validation errors.

pos_sequence_confidence

float

default:"0.85"

Confidence for POS sequence validation errors.

enable_strategy_timing

bool

default:"False"

Enable per-strategy timing instrumentation.

enable_strategy_debug

bool

default:"False"

Enable debug logging for individual validation strategies.
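The four homophone_* fields combine into a single acceptance rule. The sketch below is one interpretation: an alternative homophone is suggested only if its N-gram probability reaches homophone_min_probability and beats the original by the improvement ratio, with the stricter ratio applied to words above homophone_high_freq_threshold. The function name and the exact comparisons (> versus >=) are assumptions.

```python
def accept_homophone(orig_prob, alt_prob, orig_freq,
                     improvement_ratio=5.0, min_probability=0.001,
                     high_freq_threshold=1000, high_freq_improvement_ratio=50.0):
    """Decide whether a homophone alternative should replace the original word (assumed rule)."""
    if alt_prob < min_probability:
        return False  # alternative is too unlikely in this context
    # Common words need much stronger evidence before being flagged.
    required = high_freq_improvement_ratio if orig_freq > high_freq_threshold else improvement_ratio
    if orig_prob <= 0.0:
        return True  # original never seen in this context
    return alt_prob / orig_prob >= required
```

The same 10x probability improvement flags a word with frequency 500 but not one with frequency 5,000, which is how the high-frequency guard suppresses false positives on common words.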

Provider Settings

Controlled via the provider_config attribute (ProviderConfig).

Parameter	Default	Description
`cache_size`	`1024`	LRU cache size for database queries.
`pool_min_size`	`1`	Minimum connections in pool.
`pool_max_size`	`5`	Maximum connections in pool (smaller is better for SQLite).
`pool_timeout`	`5.0`	Connection checkout timeout in seconds.
`pool_max_connection_age`	`3600.0`	Max connection age before recreation (seconds).

Connection Pooling

Connection pooling manages the SQLite database connections used by the provider, trading some raw speed for resource control. The library uses connection pooling by default. Benefits:

Resource control with hard connection limits
Connection health monitoring and automatic recreation
Observability through pool statistics
Graceful degradation under load

Pool size recommendations:

Keep pool_max_size small (2-5) to reduce lock contention
Set pool_min_size=1 for most cases
Larger pools (>10) degrade performance due to lock contention

Example configurations:

from myspellchecker.core.config import SpellCheckerConfig, ProviderConfig

# Default configuration
config = SpellCheckerConfig(
    provider_config=ProviderConfig(
        pool_min_size=1,
        pool_max_size=5,
    )
)

# High-concurrency configuration
config = SpellCheckerConfig(
    provider_config=ProviderConfig(
        pool_min_size=2,
        pool_max_size=10,  # May impact performance
    )
)

# Custom timeout and connection age
config = SpellCheckerConfig(
    provider_config=ProviderConfig(
        pool_timeout=10.0,  # Wait up to 10s for connection
        pool_max_connection_age=7200.0,  # Recreate after 2 hours
    )
)

Performance characteristics:

Pooling adds ~30-50% overhead compared to direct connections
Overhead comes from queue operations, locking, and health checks
Trade-off: Performance vs. resource control and production safety
See tests/test_connection_pool.py for comprehensive test coverage
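The pool fields map onto a small bounded pool built from Python's queue and sqlite3 modules. This is a hypothetical sketch, not the library's pool class: it enforces pool_max_size as a hard limit, blocks up to pool_timeout seconds when the pool is exhausted, and recreates connections older than pool_max_connection_age. Health checks and statistics are omitted.

```python
import queue
import sqlite3
import threading
import time
from contextlib import contextmanager


class SimpleConnectionPool:
    """Hypothetical bounded SQLite pool illustrating the ProviderConfig fields."""

    def __init__(self, path, pool_min_size=1, pool_max_size=5,
                 pool_timeout=5.0, pool_max_connection_age=3600.0):
        self._path = path
        self._max_size = pool_max_size
        self._timeout = pool_timeout
        self._max_age = pool_max_connection_age
        self._idle = queue.Queue()  # (connection, created_at) pairs ready for reuse
        self._lock = threading.Lock()
        self._live = 0  # connections currently in existence
        for _ in range(pool_min_size):  # warm up the minimum number of connections
            self._live += 1
            self._idle.put(self._connect())

    def _connect(self):
        conn = sqlite3.connect(self._path, check_same_thread=False)
        return conn, time.monotonic()

    def _acquire(self):
        try:
            conn, created = self._idle.get_nowait()
        except queue.Empty:
            with self._lock:
                can_create = self._live < self._max_size
                if can_create:
                    self._live += 1
            if can_create:
                return self._connect()
            # Hard limit reached: wait for another caller to return a connection.
            # Raises queue.Empty if none comes back within pool_timeout seconds.
            conn, created = self._idle.get(timeout=self._timeout)
        if time.monotonic() - created > self._max_age:
            conn.close()  # too old: replace it instead of handing it out
            return self._connect()
        return conn, created

    @contextmanager
    def connection(self):
        conn, created = self._acquire()
        try:
            yield conn
        finally:
            self._idle.put((conn, created))  # return to the pool for reuse
```

The queue and lock operations on every checkout are the kind of per-query cost that makes a pooled provider slower than direct connections.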

Joint Segmentation-Tagging Settings

The joint parameter accepts a JointConfig object for unified word segmentation and POS tagging.

from myspellchecker.core.config import SpellCheckerConfig, JointConfig

config = SpellCheckerConfig(
    joint=JointConfig(
        enabled=True,
        beam_width=15,
    )
)

Parameter	Default	Description
`enabled`	`False`	Enable joint segmentation-tagging mode.
`beam_width`	`15`	Number of hypotheses to keep per position.
`max_word_length`	`20`	Maximum word length in characters.
`emission_weight`	`1.2`	Weight for P(tag|word) emission probabilities.
`word_score_weight`	`1.0`	Weight for word N-gram language model scores.
`min_prob`	`1e-10`	Minimum probability for smoothing.
`use_morphology_fallback`	`True`	Use morphology for OOV word tag guessing.

See Segmentation - Joint Mode for detailed usage.

Frequency Guard Settings

Controlled via the frequency_guards attribute (FrequencyGuardConfig). Centralized thresholds that suppress false positives across validators.

Parameter	Default	Description
`colloquial_high_freq_suppression`	`100000`	Suppress colloquial info for words above this frequency.
`homophone_high_freq`	`1000`	Apply stricter homophone ratio above this frequency.
`homophone_high_freq_ratio`	`50.0`	Improvement ratio required for high-frequency homophone words.
`ngram_high_freq_guard`	`5000`	Suppress N-gram false positives for words above this frequency.
`semantic_high_freq_protection`	`50000`	Apply high-frequency logit diff threshold above this frequency.

Compound Resolver Settings

Controlled via the compound_resolver attribute (CompoundResolverConfig). Handles compound word synthesis for OOV recovery.

Parameter	Default	Description
`min_morpheme_frequency`	`10`	Minimum frequency per morpheme.
`max_parts`	`4`	Maximum compound splits (2-8).
`cache_size`	`1024`	LRU cache entries.
`base_confidence`	`0.85`	Base confidence score for compound matches.
`high_freq_boost`	`0.05`	Confidence boost if min morpheme frequency >= 100.
`medium_freq_boost`	`0.03`	Confidence boost if min morpheme frequency >= 50.
`extra_parts_penalty`	`0.05`	Confidence penalty per extra part beyond 2.
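The confidence fields in the table above suggest a simple scoring rule, sketched here under the assumption that the boosts key off the weakest morpheme's frequency and that the penalty is linear in the number of parts beyond two. compound_confidence is a hypothetical name, not the library's API.

```python
def compound_confidence(morpheme_freqs, min_morpheme_frequency=10, max_parts=4,
                        base_confidence=0.85, high_freq_boost=0.05,
                        medium_freq_boost=0.03, extra_parts_penalty=0.05):
    """Confidence for a synthesized compound, or None if it is rejected (assumed rule)."""
    n = len(morpheme_freqs)
    if n < 2 or n > max_parts:
        return None  # not a compound, or too many splits
    weakest = min(morpheme_freqs)
    if weakest < min_morpheme_frequency:
        return None  # every morpheme must be reasonably attested
    confidence = base_confidence
    if weakest >= 100:
        confidence += high_freq_boost
    elif weakest >= 50:
        confidence += medium_freq_boost
    confidence -= extra_parts_penalty * (n - 2)  # more parts means less certainty
    return confidence
```

A two-part compound whose rarer morpheme occurs 200 times scores about 0.90, while a three-part compound of rarer morphemes drops to about 0.80.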

Reduplication Settings

Controlled via the reduplication attribute (ReduplicationConfig). Validates Myanmar reduplication patterns (AABB, ABAB, rhyme).

Parameter	Default	Description
`min_base_frequency`	`5`	Minimum base word frequency for validation.
`cache_size`	`1024`	LRU cache entries.
`pattern_confidence_ab`	`0.90`	Confidence for AB simple doubling (e.g., ခဏခဏ).
`pattern_confidence_aabb`	`0.85`	Confidence for AABB syllable doubling (e.g., သေသေချာချာ).
`pattern_confidence_abab`	`0.85`	Confidence for ABAB word repeat.
`pattern_confidence_rhyme`	`0.95`	Confidence for rhyme reduplication patterns.

Neural Reranker Settings

Controlled via the neural_reranker attribute (NeuralRerankerConfig). MLP-based suggestion re-ranking using ONNX.

Parameter	Default	Description
`enabled`	`False`	Enable neural reranking (requires trained model).
`model_path`	`None`	Path to ONNX reranker model.
`stats_path`	`None`	Path to normalization statistics JSON.
`confidence_gap_threshold`	`0.15`	Skip reranking when top-2 confidence gap exceeds this.
`max_candidates`	`20`	Maximum candidates to score per error.
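The two reranker thresholds act as a gate and a budget. The sketch below assumes candidates arrive sorted by confidence, reranking is skipped when the top-two gap exceeds confidence_gap_threshold, and only the first max_candidates are rescored. The score_fn parameter stands in for the ONNX model and, like the function names, is hypothetical.

```python
def should_rerank(confidences, confidence_gap_threshold=0.15):
    """Rerank only when the top two candidates are close in confidence."""
    if len(confidences) < 2:
        return False
    top, second = sorted(confidences, reverse=True)[:2]
    return (top - second) <= confidence_gap_threshold


def rerank(candidates, score_fn, max_candidates=20, confidence_gap_threshold=0.15):
    """candidates: list of (suggestion, confidence), sorted by confidence descending."""
    if not should_rerank([conf for _, conf in candidates], confidence_gap_threshold):
        return candidates  # clear winner: keep the original order and skip inference
    head, tail = candidates[:max_candidates], candidates[max_candidates:]
    head = sorted(head, key=lambda item: score_fn(item[0]), reverse=True)
    return head + tail
```

Skipping confident cases avoids running the model when the ranking is already clear.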

Broken Compound Strategy Settings

Controlled via the broken_compound_strategy attribute (BrokenCompoundStrategyConfig). Tunes the validation strategy that detects incorrectly split compound words.

Parameter	Default	Description
`rare_threshold`	`2000`	Frequency below which a word is considered rare.
`compound_min_frequency`	`5000`	Minimum compound frequency to flag broken compound.
`compound_ratio`	`5.0`	Minimum ratio of compound_freq / rare_word_freq.
`confidence`	`0.8`	Default confidence for broken compound errors.
`both_high_freq`	`5000`	Frequency guard for multi-syllable both-high compounds.
`min_compound_len`	`4`	Minimum compound length for both-high-freq guard.
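The first three thresholds describe when two adjacent tokens look like a wrongly split compound. Here is a minimal sketch of that rule, assuming a plain frequency dictionary and ignoring the both_high_freq and min_compound_len guards. The function name is hypothetical, and the example uses English words purely for readability.

```python
def is_broken_compound(left, right, freq, rare_threshold=2000,
                       compound_min_frequency=5000, compound_ratio=5.0):
    """Flag `left right` if `left + right` is a frequent word and one half is rare (assumed rule)."""
    compound_freq = freq.get(left + right, 0)
    if compound_freq < compound_min_frequency:
        return False  # the joined form is not common enough to suggest
    rare_freq = min(freq.get(left, 0), freq.get(right, 0))
    if rare_freq >= rare_threshold:
        return False  # both halves are common words in their own right
    return compound_freq / max(rare_freq, 1) >= compound_ratio
```

With freq = {"note": 1500, "book": 1800, "notebook": 9000}, the pair "note book" is flagged because the compound is 6 times as frequent as its rarer half.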

Token Refinement Settings

Controlled via the token_refinement attribute (TokenRefinementConfig). Tunes the validation-time token-lattice refinement pass that exposes hidden error spans in merged tokens (e.g., particle attachment, negation attachment).

Parameter	Default	Description
`suffix_score_boost`	`0.85`	Score boost when suffix matches a known form.
`known_part_score`	`1.35`	Score for known dictionary parts.
`unknown_long_part_penalty`	`0.45`	Penalty for unknown long parts.
`split_complexity_penalty`	`0.30`	Penalty for complex multi-part splits.
`bigram_scale`	`120000.0`	Scaling factor for bigram probability contribution.
`min_token_len`	`3`	Minimum token length for refinement candidates.
`keep_if_freq_at_least`	`2000`	Keep token if frequency is at least this value.
`min_score_gain`	`0.55`	Minimum score improvement to accept a split.
`lattice_max_paths`	`2`	Maximum lattice paths to consider.
`syllable_split_min_token_len`	`4`	Minimum token length for syllable-level splitting.
`syllable_split_max_syllables`	`6`	Maximum syllables for syllable-level splitting.

Integrated Features

The following features are automatically integrated into the validation pipeline. Most are enabled by default and work transparently.

Particle Typo Detection

Automatically detects common Myanmar particle typos using PARTICLE_TYPO_PATTERNS. Examples:

တယ → တယ် (statement ending, missing asat)
နဲ → နဲ့ (with, missing tone)
သလာ → သလား (question, missing tone)

These patterns carry confidence scores between 0.90 and 0.95 and are checked during context validation.
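At its core, particle typo detection is a lookup from known misspellings to their corrections. The sketch uses a hypothetical PARTICLE_TYPOS dictionary built from the three examples above. The real PARTICLE_TYPO_PATTERNS structure is not documented here, and the library additionally uses context, which this token-by-token version does not.

```python
# Hypothetical pattern table built from the documented examples.
PARTICLE_TYPOS = {
    "တယ": "တယ်",   # statement ending, missing asat
    "နဲ": "နဲ့",     # "with", missing tone mark
    "သလာ": "သလား",  # question, missing tone mark
}


def find_particle_typos(tokens, patterns=PARTICLE_TYPOS):
    """Return (index, typo, correction) for every token that matches a known particle typo."""
    return [(i, tok, patterns[tok]) for i, tok in enumerate(tokens) if tok in patterns]
```
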

Medial Confusion Detection

Detects ျ vs ြ medial confusion in context using MEDIAL_CONFUSION_PATTERNS. For example:

ကြီး vs ကျီး (big vs crow)
ပြု vs ပျု (do vs. a non-word)
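The candidate-generation half of medial confusion detection can be sketched as swapping one ျ (U+103B, medial ya) for ြ (U+103C, medial ra) or the reverse, and keeping results that are dictionary words. This is an illustrative, context-free sketch with hypothetical names. Because pairs like ကြီး and ကျီး are both valid words, the library must still use context to decide which form was intended.

```python
YA_PIN = "\u103b"  # ျ  MYANMAR CONSONANT SIGN MEDIAL YA
YA_YIT = "\u103c"  # ြ  MYANMAR CONSONANT SIGN MEDIAL RA
MEDIAL_SWAP = {YA_PIN: YA_YIT, YA_YIT: YA_PIN}


def medial_swap_candidates(word, dictionary):
    """Dictionary words that differ from `word` by exactly one ျ/ြ swap."""
    candidates = []
    for i, ch in enumerate(word):
        if ch in MEDIAL_SWAP:
            candidate = word[:i] + MEDIAL_SWAP[ch] + word[i + 1:]
            if candidate in dictionary:
                candidates.append(candidate)
    return candidates
```

For example, medial_swap_candidates("ကျီး", {"ကြီး"}) returns ["ကြီး"].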

Morphology OOV Recovery

For out-of-vocabulary (OOV) words, the system attempts to recover the root by stripping common suffixes:

Verb suffixes: သည်, ခဲ့, မည်, နေ, etc.
Noun suffixes: များ, တို့, etc.

This improves suggestion quality for inflected forms.
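Suffix-stripping recovery can be sketched as repeatedly removing the longest known suffix until a dictionary word remains. The suffix lists below contain only the documented examples. The function name, the longest-suffix-first order, and the max_strips limit are assumptions rather than the library's actual behavior.

```python
VERB_SUFFIXES = ("သည်", "ခဲ့", "မည်", "နေ")
NOUN_SUFFIXES = ("များ", "တို့")


def recover_root(word, dictionary, suffixes=VERB_SUFFIXES + NOUN_SUFFIXES, max_strips=3):
    """Strip known suffixes (longest first) until a dictionary word remains; else None."""
    ordered = sorted(suffixes, key=len, reverse=True)
    current = word
    for _ in range(max_strips):
        if current in dictionary:
            return current
        for suffix in ordered:
            if current.endswith(suffix) and len(current) > len(suffix):
                current = current[: -len(suffix)]
                break
        else:
            return None  # no suffix matched and the word is still unknown
    return current if current in dictionary else None
```

For example, an inflected form such as စားခဲ့သည် ("ate") is reduced step by step to the known root စား ("eat").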

POS Sequence Validation

Uses ViterbiTagger output to detect invalid POS sequences:

V-V (consecutive verbs without particles)
P-P (consecutive particles)
Invalid tag sequences defined in INVALID_POS_SEQUENCES
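The sequence check amounts to scanning adjacent tag pairs against a blocklist. INVALID_PAIRS below is a hypothetical stand-in containing only the two documented pairs. The library's INVALID_POS_SEQUENCES may hold more entries or longer patterns.

```python
# Hypothetical subset of invalid adjacent tag pairs (from the examples above).
INVALID_PAIRS = {("V", "V"), ("P", "P")}


def invalid_pos_positions(tags, invalid_pairs=INVALID_PAIRS):
    """Indices i where the pair (tags[i-1], tags[i]) is not allowed."""
    return [i for i in range(1, len(tags)) if (tags[i - 1], tags[i]) in invalid_pairs]
```
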

Question Detection

Identifies sentence types (question/statement) and validates question particle usage:

Detects question words: ဘာ, ဘယ်, ဘယ်လို, etc.
Validates question particles: လား, လဲ, သလဲ, etc.
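A minimal version of the question-particle check flags a sentence that contains a question word but does not end in a question particle. The word and particle sets include only the documented examples, the function name is hypothetical, and sentence-final punctuation such as ။ is assumed to be stripped beforehand.

```python
QUESTION_WORDS = {"ဘာ", "ဘယ်", "ဘယ်လို"}
QUESTION_PARTICLES = {"လား", "လဲ", "သလဲ"}


def missing_question_particle(tokens):
    """True if the sentence has a question word but no sentence-final question particle."""
    has_question_word = any(tok in QUESTION_WORDS for tok in tokens)
    ends_with_particle = bool(tokens) and tokens[-1] in QUESTION_PARTICLES
    return has_question_word and not ends_with_particle
```
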

Unified Suggestion Ranking

Suggestions from different sources are ranked using UnifiedRanker with source-specific weights:

Source	Weight	Priority
`particle_typo`	1.2	Highest
`semantic`	1.15	High
`context`	1.15	High
`medial_confusion`	1.1	Medium-High
`medial_swap`	1.0	Base
`question_structure`	1.0	Base
`symspell`	1.0	Base
`compound`	0.95	Medium
`morphology`	0.9	Medium
`morpheme`	0.85	Medium-Low
`pos_sequence`	0.85	Medium-Low
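One simple way to apply the source weights is to multiply each suggestion's base score by its source weight and keep the best weighted score when several sources propose the same text. The multiplicative combination and the deduplication rule are assumptions, and rank_suggestions is a hypothetical name, not the UnifiedRanker API.

```python
SOURCE_WEIGHTS = {
    "particle_typo": 1.2, "semantic": 1.15, "context": 1.15,
    "medial_confusion": 1.1, "medial_swap": 1.0, "question_structure": 1.0,
    "symspell": 1.0, "compound": 0.95, "morphology": 0.9,
    "morpheme": 0.85, "pos_sequence": 0.85,
}


def rank_suggestions(suggestions, weights=SOURCE_WEIGHTS):
    """suggestions: iterable of (text, base_score, source). Returns texts, best first."""
    best = {}
    for text, score, source in suggestions:
        weighted = score * weights.get(source, 1.0)  # unknown sources get the base weight
        if text not in best or weighted > best[text]:
            best[text] = weighted
    return sorted(best, key=best.get, reverse=True)
```

In this scheme, a particle-typo suggestion with base score 0.75 (weighted 0.9) outranks a SymSpell suggestion with base score 0.8.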

Tone Disambiguation

The ToneDisambiguator provides context-aware correction for commonly confused Myanmar tone marks. Available via:

from myspellchecker.text.tone import ToneDisambiguator

disambiguator = ToneDisambiguator()
words = ["word1", "word2", "word3"]  # the word-segmented tokens of one sentence
corrections = disambiguator.check_sentence(words)

Handles ambiguous words like:

သား (son, disambiguated by family context patterns)
ငါ (I/me vs ငါး five/fish; detects a missing visarga in numeral contexts)
ပဲ (only vs bean)