Glossary - mySpellChecker

Key concepts, abbreviations, and Myanmar script terminology referenced throughout this documentation, organized by topic.

Myanmar Script & Unicode

Characters & Components

Term	Definition
Consonant	One of 35 consonant characters in Myanmar script: 34 base consonants (U+1000–U+1021) plus Great Sa (ဿ, U+103F).
Dependent Vowel	Vowel signs that attach to consonants (U+102B–U+1032). Cannot stand alone.
Independent Vowel	Vowel characters that can stand alone without a consonant (U+1023–U+102A).
Medial	Consonant modifiers that appear between the base consonant and vowel. Four medials exist: ျ (ya-pin, U+103B), ြ (ya-yit, U+103C), ွ (wa-hswe, U+103D), ှ (ha-htoe, U+103E).
Syllable	The fundamental unit of Myanmar text. Consists of consonant + optional medials + optional vowels + optional finals.
Tone Mark	Characters that indicate tone, nasalization, or emphasis: ံ (anusvara, U+1036), ့ (dot below, U+1037), and း (visarga, U+1038).
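
The categories above map directly onto code point sets. The following is a minimal sketch, not part of mySpellChecker: the helper name `classify` and the set names are hypothetical, and the medial range U+103B–U+103E is simply the four medials listed above.

```python
# Code point sets for the categories in the table above.
CONSONANTS = set(range(0x1000, 0x1022)) | {0x103F}  # 34 base consonants + Great Sa
INDEPENDENT_VOWELS = set(range(0x1023, 0x102B))     # U+1023–U+102A
DEPENDENT_VOWELS = set(range(0x102B, 0x1033))       # U+102B–U+1032
MEDIALS = set(range(0x103B, 0x103F))                # ya-pin, ya-yit, wa-hswe, ha-htoe
TONE_MARKS = {0x1036, 0x1037, 0x1038}               # anusvara, dot below, visarga

_CATEGORIES = [
    ("consonant", CONSONANTS),
    ("independent_vowel", INDEPENDENT_VOWELS),
    ("dependent_vowel", DEPENDENT_VOWELS),
    ("medial", MEDIALS),
    ("tone_mark", TONE_MARKS),
]


def classify(ch: str) -> str:
    """Return the glossary category of one character, or "other" (hypothetical helper)."""
    code_point = ord(ch)
    for name, members in _CATEGORIES:
        if code_point in members:
            return name
    return "other"


# ka + ya-yit + vowel sign ii + visarga, written with escapes to keep the order explicit.
sample = "\u1000\u103c\u102e\u1038"
labels = [classify(ch) for ch in sample]
```

For the sample syllable, `labels` holds one category per code point in storage order: consonant, medial, dependent vowel, tone mark.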

Special Characters

Term	Definition
Asat (်)	Myanmar Unicode character (U+103A) that “kills” the inherent vowel of a consonant, creating a final consonant sound. Also called “killer” or “vowel killer”.
Anusvara (ံ)	Myanmar Unicode character (U+1036) indicating nasalization of the preceding vowel.
Kinzi	A special stacking form where င် appears above the following consonant using virama (္). Encoded as U+1004 U+103A U+1039 before the consonant it sits on.
Virama (္)	Myanmar Unicode character (U+1039) used for consonant stacking.
Visarga (း)	Myanmar Unicode character (U+1038) that marks the high (heavy) tone of a syllable.
Zero-Width Characters	Invisible Unicode characters (ZWSP U+200B, ZWNJ U+200C, ZWJ U+200D, BOM U+FEFF) that should typically be removed during normalization.
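
The special characters above can be located with plain string operations. This is an illustrative sketch only: the function names are hypothetical and are not mySpellChecker's API. It assumes the standard Unicode encoding of kinzi (nga, asat, virama).

```python
ASAT = "\u103a"
VIRAMA = "\u1039"
KINZI = "\u1004" + ASAT + VIRAMA  # nga + asat + virama

ZERO_WIDTH_NAMES = {
    "\u200b": "ZWSP",
    "\u200c": "ZWNJ",
    "\u200d": "ZWJ",
    "\ufeff": "BOM",
}


def has_kinzi(text: str) -> bool:
    """True if the text contains at least one kinzi sequence."""
    return KINZI in text


def count_stacks(text: str) -> int:
    """Count virama-based stacks; a kinzi counts as one stack."""
    return text.count(VIRAMA)


def find_zero_width(text: str) -> list[tuple[int, str]]:
    """Return (index, name) for every zero-width character in the text."""
    return [(i, ZERO_WIDTH_NAMES[ch]) for i, ch in enumerate(text) if ch in ZERO_WIDTH_NAMES]


# A word spelled with kinzi: a, nga, asat, virama, ga, la, vowel sign i, pa, asat.
word = "\u1021\u1004\u103a\u1039\u1002\u101c\u102d\u1015\u103a"
print(has_kinzi(word), count_stacks(word))
print(find_zero_width("\u1000\u200b\u1001\ufeff"))
```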

Encoding & Normalization

Term	Definition
Unicode	International standard for text encoding. Myanmar script uses range U+1000–U+109F plus extensions.
Myanmar Extended-A	Unicode block U+AA60–U+AA7F containing additional characters for Shan and other languages.
Myanmar Extended-B	Unicode block U+A9E0–U+A9FF containing additional characters for Shan Pali and Tai Laing.
NFC (Normalization Form C, canonical composition)	Unicode normalization form that applies canonical decomposition followed by canonical composition, so canonically equivalent sequences end up with the same representation. Recommended for Myanmar text.
Normalization	Process of converting text to a standard form, including removing zero-width characters and applying Unicode normalization.
Zawgyi	Legacy font/encoding for Myanmar script that differs from Unicode. mySpellChecker can detect and convert Zawgyi text.

Pipeline Layers

Term	Definition
Syllable-First Approach	Validate at the syllable level first, since syllables can be identified without a dictionary, then move to word and context levels.
Validation Level	Configuration option specifying depth of checking: `SYLLABLE` or `WORD` (defined in `ValidationLevel` enum).
Word Validation	Layer 2 of the validation pipeline that checks words against the dictionary and generates suggestions.
Grammar Checking	Layer 2.5 validation that applies syntactic rules to detect grammatical errors.
Context Validation	Layer 3 of the validation pipeline that uses N-gram probabilities to detect real-word errors.
Semantic Validation	Optional deep validation using neural network models to understand meaning.
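
The syllable-first ordering can be illustrated with a toy two-layer check. Everything here is an assumption made for illustration: the enum below is a stand-in that borrows only the member names `SYLLABLE` and `WORD` from the glossary, `validate` is a hypothetical function, and the syllable rule is a placeholder for real rule-based syllable validation. The grammar, context, and semantic layers are omitted.

```python
from enum import Enum


class ValidationLevel(Enum):
    """Stand-in enum; only the member names come from the glossary."""
    SYLLABLE = "syllable"
    WORD = "word"


def validate(words, level, is_valid_syllable, dictionary):
    """Toy syllable-first check.

    `words` is a list of syllable lists. Returns (layer, word) pairs
    for every word that fails a layer.
    """
    errors = []
    for syllables in words:
        word = "".join(syllables)
        # Syllable layer: structural check, no dictionary needed.
        if not all(is_valid_syllable(s) for s in syllables):
            errors.append(("syllable", word))
            continue  # skip the lookup for a word built from broken syllables
        # Word layer: dictionary lookup, only when the level asks for it.
        if level is ValidationLevel.WORD and word not in dictionary:
            errors.append(("word", word))
    return errors


# Romanized placeholders keep the toy readable; real input would be Myanmar text.
sample = [["myan", "mar"], ["my4n", "mar"], ["mar", "myan"]]
found = validate(sample, ValidationLevel.WORD, str.isalpha, {"myanmar"})
```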

Error Types

Term	Definition
Real-Word Error	A spelling error that produces another valid dictionary word, so it can only be detected from context.

Spelling Correction

Term	Definition
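SymSpell	Spelling correction algorithm that precomputes delete-only variants of dictionary terms (symmetric deletes), so candidate lookup does not have to generate insertions, substitutions, or transpositions at query time.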
Edit Distance	The minimum number of single-character operations (insert, delete, substitute) needed to transform one string into another.
Levenshtein Distance	Edit distance metric measuring single-character insertions, deletions, and substitutions.
Damerau-Levenshtein Distance	Edit distance metric that extends Levenshtein distance by counting a transposition of two adjacent characters as a single operation. Used for generating spelling suggestions.
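
The difference between the two metrics, and the delete-generation idea behind SymSpell, can be shown in a few lines. This is a plain-Python sketch for illustration, not the library's implementation. The transposition-aware function is the restricted "optimal string alignment" variant of Damerau-Levenshtein, and all function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Insertions, deletions, and substitutions, each at cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Levenshtein plus adjacent transposition at cost 1 (restricted variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = a[i - 1] != b[j - 1]
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def deletes(term: str, max_distance: int = 1) -> set[str]:
    """The term plus every string reachable by deleting up to max_distance characters."""
    results = {term}
    frontier = {term}
    for _ in range(max_distance):
        frontier = {t[:i] + t[i + 1:] for t in frontier for i in range(len(t))}
        results |= frontier
    return results


# A swap of two adjacent characters costs 2 under Levenshtein but 1 with transposition.
print(levenshtein("ab", "ba"), osa_distance("ab", "ba"))
# "abc" and "abd" share the delete variant "ab", which is how SymSpell pairs them.
print(deletes("abc") & deletes("abd"))
```

These functions operate on Python code points. For Myanmar text, one visible syllable spans several code points, so distances are sensitive to the unit being compared.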

Context & Tagging

Term	Definition
N-gram	A contiguous sequence of N items (syllables or words). Used in context validation.
Bigram	A sequence of two consecutive tokens (syllables or words) used for context analysis.
Trigram	A sequence of three consecutive tokens used for context analysis.
Part-of-Speech (POS) Tagging	Process of marking words with their grammatical category (noun, verb, particle, etc.).
Viterbi Algorithm	Dynamic programming algorithm used for POS tagging to find the most likely sequence of tags.
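
The terms above can be made concrete with a small n-gram helper and a compact Viterbi decoder. This is a textbook-style sketch with toy probabilities and English placeholder tokens. It does not reflect mySpellChecker's models, tag set, or smoothing, and the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token windows, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_probability(tokens, prev, word):
    """Maximum-likelihood estimate of P(word | prev), without smoothing."""
    pairs = Counter(ngrams(tokens, 2))
    firsts = Counter(tokens[:-1])
    return pairs[(prev, word)] / firsts[prev] if firsts[prev] else 0.0


def viterbi(words, tags, start_p, trans_p, emit_p):
    """Most likely tag sequence under a first-order hidden Markov model."""
    if not words:
        return []
    # Each row maps tag -> (best probability so far, previous tag).
    best = [{t: (start_p[t] * emit_p[t].get(words[0], 0.0), None) for t in tags}]
    for word in words[1:]:
        row = {}
        for t in tags:
            row[t] = max(
                (best[-1][p][0] * trans_p[p][t] * emit_p[t].get(word, 0.0), p)
                for p in tags
            )
        best.append(row)
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for row in reversed(best[1:]):
        tag = row[tag][1]
        path.append(tag)
    return path[::-1]


# Toy model: two tags, two words.
tags = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"dogs": 0.9, "run": 0.1}, "V": {"dogs": 0.1, "run": 0.9}}
tagged = viterbi(["dogs", "run"], tags, start_p, trans_p, emit_p)
```

A production tagger would normally work in log space to avoid underflow on long sentences.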

Models & Inference

Term	Definition
ONNX	Open Neural Network Exchange format used for semantic model deployment.

Storage & Processing

Term	Definition
Dictionary Provider	Pluggable storage backend for dictionary data. Implementations include SQLite, Memory, and JSON.
Frequency	The count of how often a word or syllable appears in a corpus.
Segmentation	Process of breaking text into meaningful units (syllables or words).
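
A pluggable backend of the kind described above is commonly modelled as a small interface with interchangeable implementations. The sketch below is hypothetical throughout: the class names, the `frequency` method, and the table schema are assumptions for illustration and are not mySpellChecker's actual provider API.

```python
import sqlite3
from typing import Protocol


class DictionaryProvider(Protocol):
    """Hypothetical interface: anything that can report a word frequency."""

    def frequency(self, word: str) -> int: ...


class MemoryProvider:
    """Keeps the word counts in a plain dict."""

    def __init__(self, counts: dict[str, int]) -> None:
        self._counts = dict(counts)

    def frequency(self, word: str) -> int:
        return self._counts.get(word, 0)


class SQLiteProvider:
    """Keeps the word counts in an SQLite table (in-memory by default)."""

    def __init__(self, counts: dict[str, int], path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS words (word TEXT PRIMARY KEY, freq INTEGER)"
        )
        self._db.executemany(
            "INSERT OR REPLACE INTO words VALUES (?, ?)", list(counts.items())
        )
        self._db.commit()

    def frequency(self, word: str) -> int:
        row = self._db.execute(
            "SELECT freq FROM words WHERE word = ?", (word,)
        ).fetchone()
        return row[0] if row else 0


def is_known(provider: DictionaryProvider, word: str) -> bool:
    """Caller code depends only on the interface, not on the backend."""
    return provider.frequency(word) > 0


counts = {"\u1019\u103c\u1014\u103a\u1019\u102c": 120}  # one sample entry with a made-up count
providers = [MemoryProvider(counts), SQLiteProvider(counts)]
```

Because `is_known` only touches `frequency`, either backend can be swapped in without changing the calling code.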

Development Tools

Term	Definition
Cython	A superset of Python that compiles to C, used to write C extensions for Python. Used in mySpellChecker for performance-critical paths.
OpenMP	API for shared-memory parallel programming in C, C++, and Fortran. Used in Cython extensions for batch processing.