Key concepts, abbreviations, and Myanmar script terminology referenced throughout this documentation, organized by topic.

Myanmar Script & Unicode

| Term | Definition |
| --- | --- |
| Consonant | One of 35 consonant characters in Myanmar script: 34 base consonants (U+1000–U+1021) plus Great Sa (ဿ, U+103F). |
| Dependent Vowel | Vowel signs that attach to consonants (U+102B–U+1032). Cannot stand alone. |
| Independent Vowel | Vowel characters that can stand alone without a consonant (U+1023–U+102A). |
| Medial | Consonant modifiers that appear between the base consonant and the vowel. Four medials exist: ျ (ya-pin, U+103B), ြ (ya-yit, U+103C), ွ (wa-hswe, U+103D), ှ (ha-htoe, U+103E). |
| Syllable | The fundamental unit of Myanmar text. Consists of a consonant followed by optional medials, optional vowels, and optional finals. |
| Tone Mark | Characters that indicate tone, nasalization, or emphasis: ံ (anusvara, U+1036), ့ (dot below, U+1037), and း (visarga, U+1038). |
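
The ranges in this table map directly onto code point checks. The following is a minimal sketch, not mySpellChecker's own code: the function name `classify` and the category labels are hypothetical, and the medial range U+103B–U+103E is taken from the Unicode code chart for the four medials listed above.

```python
# Code point ranges from the glossary table; names and labels are illustrative only.
CONSONANTS = range(0x1000, 0x1022)          # U+1000–U+1021, 34 base consonants
GREAT_SA = 0x103F                           # ဿ
INDEPENDENT_VOWELS = range(0x1023, 0x102B)  # U+1023–U+102A
DEPENDENT_VOWELS = range(0x102B, 0x1033)    # U+102B–U+1032
MEDIALS = range(0x103B, 0x103F)             # U+103B–U+103E (assumed from the Unicode chart)
TONE_MARKS = {0x1036, 0x1037, 0x1038}       # anusvara, dot below, visarga


def classify(ch: str) -> str:
    """Return a coarse category label for a single character."""
    cp = ord(ch)
    if cp in CONSONANTS or cp == GREAT_SA:
        return "consonant"
    if cp in INDEPENDENT_VOWELS:
        return "independent_vowel"
    if cp in DEPENDENT_VOWELS:
        return "dependent_vowel"
    if cp in MEDIALS:
        return "medial"
    if cp in TONE_MARKS:
        return "tone_mark"
    return "other"


if __name__ == "__main__":
    # မြ = consonant MA (U+1019) followed by medial ya-yit (U+103C)
    for ch in "\u1019\u103C":
        print(f"U+{ord(ch):04X} {classify(ch)}")
```

Characters outside these categories, such as asat and virama, fall through to `"other"` in this sketch.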

| Term | Definition |
| --- | --- |
| Asat (်) | Myanmar Unicode character (U+103A) that “kills” the inherent vowel of a consonant, creating a final consonant sound. Also called “killer” or “vowel killer”. |
| Anusvara (ံ) | Myanmar Unicode character (U+1036) indicating nasalization of the preceding vowel. |
| Kinzi | A special stacking form where င် appears above the following consonant using virama (္). |
| Virama (္) | Myanmar Unicode character (U+1039) used for consonant stacking. |
| Visarga (း) | Myanmar Unicode character (U+1038) indicating emphasis or sentence finality. |
| Zero-Width Characters | Invisible Unicode characters (ZWSP, ZWNJ, ZWJ, BOM) that should typically be removed during normalization. |
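
Kinzi and stacking can be detected from the code points alone. The sketch below assumes the standard Unicode encoding of kinzi as င (U+1004) + asat (U+103A) + virama (U+1039); the names `find_kinzi` and `has_stacking` are hypothetical and are not part of any library API.

```python
from typing import List

NGA = "\u1004"     # င
ASAT = "\u103A"    # ်
VIRAMA = "\u1039"  # ္
KINZI = NGA + ASAT + VIRAMA  # assumed encoding: nga + asat + virama


def find_kinzi(text: str) -> List[int]:
    """Return the start index of every kinzi sequence in text."""
    positions = []
    start = text.find(KINZI)
    while start != -1:
        positions.append(start)
        start = text.find(KINZI, start + 1)
    return positions


def has_stacking(text: str) -> bool:
    """True if the text contains a virama, i.e. any stacked consonant (kinzi included)."""
    return VIRAMA in text


if __name__ == "__main__":
    # အင်္ဂလိပ် ("English"), written with escapes to make the kinzi sequence visible
    word = "\u1021\u1004\u103A\u1039\u1002\u101C\u102D\u1015\u103A"
    print(find_kinzi(word), has_stacking(word))
```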

| Term | Definition |
| --- | --- |
| Unicode | International standard for text encoding. The main Myanmar block is U+1000–U+109F, supplemented by the extended blocks listed below. |
| Myanmar Extended-A | Unicode block U+AA60–U+AA7F containing additional characters for Shan and other languages. |
| Myanmar Extended-B | Unicode block U+A9E0–U+A9FF containing additional characters for Shan and Pao languages. |
| NFC (Normalization Form C) | Unicode normalization form that canonically decomposes text and then recomposes it, so characters are stored in precomposed form wherever one exists. Recommended for Myanmar text. |
| Normalization | Process of converting text to a standard form, including removing zero-width characters and applying Unicode normalization. |
| Zawgyi | Legacy font/encoding for Myanmar script that differs from Unicode. mySpellChecker can detect and convert Zawgyi text. |
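
The two normalization steps named above (zero-width removal, then NFC) can be sketched with the Python standard library. This is an illustration under stated assumptions, not the library's normalizer: the function name `normalize_text` is hypothetical, the zero-width code points are the standard Unicode values for ZWSP, ZWNJ, ZWJ, and BOM, and Zawgyi conversion is out of scope here.

```python
import unicodedata

# ZWSP, ZWNJ, ZWJ, BOM (standard Unicode code points)
ZERO_WIDTH = {"\u200B", "\u200C", "\u200D", "\uFEFF"}


def normalize_text(text: str) -> str:
    """Drop zero-width characters, then apply Unicode NFC normalization."""
    stripped = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    raw = "\u1019\u200B\u102C\uFEFF"  # မ + ZWSP + ာ + BOM
    cleaned = normalize_text(raw)
    print([f"U+{ord(ch):04X}" for ch in cleaned])
```

Stripping every ZWNJ and ZWJ is a simplification; text that relies on them for rendering may need a more selective rule.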

Validation Pipeline

| Term | Definition |
| --- | --- |
| Syllable-First Approach | Validate at the syllable level first, since syllables can be identified without a dictionary, then move to word and context levels. |
| Validation Level | Configuration option specifying the depth of checking: SYLLABLE or WORD (defined in the ValidationLevel enum). |
| Word Validation | Layer 2 of the validation pipeline that checks words against the dictionary and generates suggestions. |
| Grammar Checking | Layer 2.5 validation that applies syntactic rules to detect grammatical errors. |
| Context Validation | Layer 3 of the validation pipeline that uses N-gram probabilities to detect real-word errors. |
| Semantic Validation | Optional deep validation using neural network models to understand meaning. |
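
The layering described above can be pictured as a function that always runs the syllable check and only continues to the word check when the level asks for it. Everything in this sketch is hypothetical: `ToyLevel` stands in for the ValidationLevel enum, `toy_validate` is not the library's API, and the later layers (grammar, context, semantic) are omitted.

```python
from enum import Enum
from typing import Callable, Iterable, List, Set, Tuple


class ToyLevel(Enum):
    """Illustrative stand-in for the ValidationLevel enum (not the library's class)."""
    SYLLABLE = "syllable"
    WORD = "word"


def toy_validate(
    syllables: Iterable[str],
    words: Iterable[str],
    syllable_ok: Callable[[str], bool],
    dictionary: Set[str],
    level: ToyLevel,
) -> List[Tuple[str, str]]:
    """Run the syllable layer first; run the word layer only at WORD level."""
    # Syllable layer: rule-based, needs no dictionary.
    issues = [("syllable", s) for s in syllables if not syllable_ok(s)]
    if level is ToyLevel.WORD:
        # Word layer: dictionary lookup.
        issues += [("word", w) for w in words if w not in dictionary]
    return issues


if __name__ == "__main__":
    # Latin placeholders keep the example readable; str.isalpha plays the syllable rule.
    print(toy_validate(["ab", "x1"], ["abx1"], str.isalpha, {"hello"}, ToyLevel.WORD))
```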

| Term | Definition |
| --- | --- |
| Real-Word Error | A spelling error where the misspelled word is itself a valid word but wrong in context. |

Algorithms

| Term | Definition |
| --- | --- |
| SymSpell | Algorithm for fast spelling correction based on symmetric deletes: delete-only variants of dictionary terms are precomputed, so a lookup only needs to generate deletes of the input. |
| Edit Distance | The minimum number of single-character operations (insert, delete, substitute) needed to transform one string into another. |
| Levenshtein Distance | Edit distance metric measuring single-character insertions, deletions, and substitutions. |
| Damerau-Levenshtein Distance | Edit distance metric that also counts the transposition of two adjacent characters as a single operation. Used for generating spelling suggestions. |
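
The distance definitions above can be sketched in a few lines. This is an illustration, not mySpellChecker's implementation: `edit_distance` and `delete_variants` are hypothetical names. With `transpositions=True` the function computes the optimal string alignment variant of Damerau-Levenshtein, which never edits the same substring twice and can therefore differ from the unrestricted metric on some inputs.

```python
from typing import Set


def edit_distance(a: str, b: str, transpositions: bool = False) -> int:
    """Levenshtein distance; with transpositions=True, optimal string alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


def delete_variants(term: str, max_deletes: int = 1) -> Set[str]:
    """The term plus every string reachable by removing up to max_deletes characters."""
    results = {term}
    frontier = {term}
    for _ in range(max_deletes):
        frontier = {t[:i] + t[i + 1:] for t in frontier for i in range(len(t))}
        results |= frontier
    return results


if __name__ == "__main__":
    print(edit_distance("ab", "ba"), edit_distance("ab", "ba", transpositions=True))
    print(sorted(delete_variants("abc")))
```

In a symmetric-delete lookup, two terms are candidate matches when their delete sets intersect; a distance function such as the one above then ranks the candidates.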

| Term | Definition |
| --- | --- |
| N-gram | A contiguous sequence of N items (syllables or words). Used in context validation. |
| Bigram | A sequence of two consecutive tokens (syllables or words) used for context analysis. |
| Trigram | A sequence of three consecutive tokens used for context analysis. |
| Part-of-Speech (POS) Tagging | Process of marking words with their grammatical category (noun, verb, particle, etc.). |
| Viterbi Algorithm | Dynamic programming algorithm used for POS tagging to find the most likely sequence of tags. |
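
To make the N-gram terms concrete, the sketch below extracts N-grams from a token list and estimates a bigram probability by simple counting. The names `ngrams` and `bigram_probability` are hypothetical, the estimate is unsmoothed maximum likelihood, and a production context validator would likely use smoothing and a much larger corpus.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous runs of n tokens, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_probability(tokens: Sequence[str], prev: str, word: str) -> float:
    """Unsmoothed estimate of P(word | prev) from bigram counts."""
    contexts = Counter(tokens[:-1])       # how often prev starts a bigram
    pairs = Counter(ngrams(tokens, 2))
    if contexts[prev] == 0:
        return 0.0
    return pairs[(prev, word)] / contexts[prev]


if __name__ == "__main__":
    corpus = ["a", "b", "a", "c", "a", "b"]
    print(ngrams(corpus, 3))
    print(bigram_probability(corpus, "a", "b"))
```

A real-word error shows up in this model as a valid token whose probability given its neighbours is unusually low.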

| Term | Definition |
| --- | --- |
| ONNX | Open Neural Network Exchange format used for semantic model deployment. |

Data & Infrastructure

| Term | Definition |
| --- | --- |
| Dictionary Provider | Pluggable storage backend for dictionary data. Implementations include SQLite, Memory, JSON. |
| Frequency | The count of how often a word or syllable appears in a corpus. |
| Segmentation | Process of breaking text into meaningful units (syllables or words). |
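
Segmentation and frequency counting can be sketched together. The heuristic below is a deliberate simplification and is not the library's segmenter: it starts a new syllable at each base consonant (U+1000–U+1021) unless that consonant is stacked (preceded by virama, U+1039) or acts as a final (followed by asat, U+103A, or virama). The names `segment_syllables` and `syllable_frequencies` are hypothetical, and independent vowels, digits, punctuation, and non-canonical character orders are not handled.

```python
import re
from collections import Counter
from typing import List

# A consonant that is neither stacked under a virama nor closed by asat/virama.
_SYLLABLE_START = re.compile(r"(?<!\u1039)[\u1000-\u1021](?![\u103A\u1039])")


def segment_syllables(text: str) -> List[str]:
    """Split text at each position where the heuristic sees a syllable start."""
    starts = [m.start() for m in _SYLLABLE_START.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any leading material as its own chunk
    bounds = starts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


def syllable_frequencies(text: str) -> Counter:
    """Count how often each segmented syllable occurs."""
    return Counter(segment_syllables(text))


if __name__ == "__main__":
    word = "\u1019\u103C\u1014\u103A\u1019\u102C"  # မြန်မာ
    print(segment_syllables(word))
    print(syllable_frequencies(word + word))
```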

| Term | Definition |
| --- | --- |
| Cython | A programming language that makes writing C extensions for Python easy. Used in mySpellChecker for performance-critical paths. |
| OpenMP | API for parallel programming. Used in Cython extensions for batch processing. |