When generating spelling suggestions, the library needs to know which characters sound alike, look alike, or differ only by tone. These tables power the phonetic hasher, visual confusion detection, and tonal variant generation used throughout the suggestion pipeline.
Overview
from myspellchecker.text.phonetic_data import (
    PHONETIC_GROUPS,
    VISUAL_SIMILAR,
    TONAL_GROUPS,
    COLLOQUIAL_SUBSTITUTIONS,
)

# Check phonetic group for a consonant
labial_consonants = PHONETIC_GROUPS["p"]  # labial consonants list

# Check visual similarity
similar_to_i = VISUAL_SIMILAR["ိ"]  # visually similar characters
Phonetic Groups
Characters grouped by phonetic similarity (same sound category):
Consonant Groups
| Group Key | Name | Characters | IPA |
|---|---|---|---|
| p | Labial | ပ, ဖ, ဗ, ဘ | /p, pʰ, b, bʰ/ |
| t | Alveolar | တ, ထ, ဒ, ဓ | /t, tʰ, d, dʰ/ |
| k | Velar | က, ခ, ဂ, ဃ | /k, kʰ, ɡ, ɡʰ/ |
| c | Palatal | စ, ဆ, ဇ, ဈ | /s, sʰ, z, zʰ/ |
| ṭ | Retroflex | ဋ, ဌ, ဍ, ဎ | /ʈ, ʈʰ, ɖ, ɖʰ/ |
| s | Sibilant | သ, ဿ | /θ/ |
| h | Glottal | ဟ | /h/ |
Nasal Groups
| Group Key | Name | Characters |
|---|---|---|
| m | Nasal M | မ |
| n | Nasal N | န |
| ng | Nasal NG | င |
| ny | Nasal NY | ည, ဉ |
| n_retro | Retroflex N | ဏ |
Approximants and Liquids
| Group Key | Name | Characters |
|---|---|---|
| l | Liquid L | လ, ဠ |
| r | Liquid R | ရ |
| y | Approximant Y | ယ |
| w | Approximant W | ဝ |
Medials
| Group Key | Name | Character | Unicode |
|---|---|---|---|
| medial_y | Ya-pin | ျ | U+103B |
| medial_r | Ya-yit | ြ | U+103C |
| medial_w | Wa-hswe | ွ | U+103D |
| medial_h | Ha-htoe | ှ | U+103E |
Vowels
| Group Key | Name | Characters | IPA |
|---|---|---|---|
| vowel_a | Vowel A | ာ, ါ (U+102B) | /a/ |
| vowel_carrier | Vowel Carrier | အ (U+1021) | /ʔa/ |
| vowel_i | Vowel I | ိ, ီ, ဣ (U+1023), ဤ (U+1024) | /i/ |
| vowel_u | Vowel U | ု, ူ, ဥ (U+1025), ဦ (U+1026) | /u/ |
| vowel_e | Vowel E | ေ, ဧ (U+1027) | /e/ |
| vowel_ai | Vowel AI | ဲ | /ɛ/ |
| vowel_o | Vowel O | ဩ (U+1029), ဪ (U+102A) | /o/ |
Tone
| Group Key | Name | Characters | Unicode |
|---|---|---|---|
| tone | Tone Marks | ံ, ့, း | U+1036 (Anusvara), U+1037 (Dot Below), U+1038 (Visarga) |
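The group tables above lend themselves to a reverse index from character to group key, which makes "do these two characters sound alike?" a constant-time lookup. The following is a minimal sketch under stated assumptions: `SAMPLE_GROUPS` is a hand-copied subset of the tables rather than the real `PHONETIC_GROUPS`, and `build_char_index` and `same_group` are illustrative names, not part of the library.

```python
from typing import Dict, List

# Hand-copied subset of the tables above (assumption: the real
# PHONETIC_GROUPS has the same key -> list-of-characters shape).
SAMPLE_GROUPS: Dict[str, List[str]] = {
    "p": ["ပ", "ဖ", "ဗ", "ဘ"],
    "k": ["က", "ခ", "ဂ", "ဃ"],
    "vowel_a": ["ာ", "ါ"],
    "ny": ["ည", "ဉ"],
}


def build_char_index(groups: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each character to the key of the first group that lists it."""
    index: Dict[str, str] = {}
    for key, chars in groups.items():
        for ch in chars:
            index.setdefault(ch, key)
    return index


def same_group(a: str, b: str, index: Dict[str, str]) -> bool:
    """True when both characters are indexed and share a group key."""
    group_a = index.get(a)
    return group_a is not None and group_a == index.get(b)


CHAR_INDEX = build_char_index(SAMPLE_GROUPS)
```

With this index, `same_group("\u1015", "\u1018", CHAR_INDEX)` (ပ and ဘ) is true because both sit in the labial group, while a labial and a velar consonant compare as different.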
Visual Similarity
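Characters that look similar and are commonly confused, exposed through the VISUAL_SIMILAR dict. Most pairs are mapped in both directions; a few, such as န → ည and င → ည, ဉ in the consonant table, are listed in one direction only.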
| Character | Confused With | Description |
|---|---|---|
| ိ | ီ | Short i vs long ii |
| ီ | ိ | Long ii vs short i |
| ု | ူ | Short u vs long uu |
| ူ | ု | Long uu vs short u |
| ာ | ါ | Regular AA vs tall AA |
| ါ | ာ | Tall AA vs regular AA |
| ျ | ြ | Ya-pin vs ya-yit |
| ြ | ျ | Ya-yit vs ya-pin |
| ွ | ှ | Wa-hswe vs ha-htoe |
| ှ | ွ | Ha-htoe vs wa-hswe |
Consonant Confusions
| Character | Confused With | Description |
|---|---|---|
| န | ည | Na vs nya |
| င | ည, ဉ | Nga vs nya variants |
| ရ | ယ | Ra vs ya |
| ယ | ရ | Ya vs ra |
| သ | ဿ | Sa vs great sa |
| ဿ | သ | Great sa vs sa |
| ပ | ဗ | Pa vs ba |
| ဗ | ပ | Ba vs pa |
| ည | ဉ | Nya vs archaic nya |
| ဉ | ည | Archaic nya vs nya |
Aspirated vs Unaspirated Pairs
| Unaspirated | Aspirated |
|---|---|
| က | ခ |
| ဂ | ဃ |
| စ | ဆ |
| တ | ထ |
| ဒ | ဓ |
| ဖ | ဘ |
| ဋ | ဌ |
| ဍ | ဎ |
Other Confusions
| Character | Confused With | Description |
|---|---|---|
| လ | ဠ | La vs great la |
| ဠ | လ | Great la vs la |
| ဝ | ၀ | Wa consonant (U+101D) vs zero digit (U+1040) |
| ၀ | ဝ | Zero digit (U+1040) vs Wa consonant (U+101D) |
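Because only most pairs are bidirectional, it can be useful to list the entries that have no reverse row. The sketch below assumes the mapping has the shape character -> list of confusable characters (the real `VISUAL_SIMILAR` value type may differ) and uses a hand-copied subset of the tables above; `SAMPLE_VISUAL` and `one_way_pairs` are illustrative names.

```python
from typing import Dict, List, Tuple

# Hand-copied subset of the visual similarity tables (assumed shape:
# character -> list of confusable characters).
SAMPLE_VISUAL: Dict[str, List[str]] = {
    "ရ": ["ယ"],
    "ယ": ["ရ"],
    "လ": ["ဠ"],
    "ဠ": ["လ"],
    "န": ["ည"],       # the table has no ည -> န row
    "င": ["ည", "ဉ"],  # the table has no reverse rows for င
}


def one_way_pairs(mapping: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (source, target) pairs whose reverse direction is missing."""
    missing: List[Tuple[str, str]] = []
    for src, targets in mapping.items():
        for dst in targets:
            if src not in mapping.get(dst, []):
                missing.append((src, dst))
    return missing


ONE_WAY = one_way_pairs(SAMPLE_VISUAL)
```

On this subset, the symmetric pairs (ရ/ယ, လ/ဠ) produce nothing and the three nasal entries are reported as one-directional.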
Tonal Groups
Characters that differ only by tone and are commonly confused when typing, exposed through the TONAL_GROUPS dict. Some keys and variants span more than one code point (for example ော and ား).
| Base | Tonal Variants | Category |
|---|---|---|
| ာ | ာ, ့, း, ား | Vowel A |
| ါ | ါ, ့, း, ါး | Vowel A (tall AA, U+102B) |
| ိ | ိ, ီ, ိ့, ီး | Vowel I |
| ီ | ိ, ီ, ိ့, ီး | Vowel I |
| ု | ု, ူ, ု့, ူး | Vowel U |
| ူ | ု, ူ, ု့, ူး | Vowel U |
| ေ | ေ, ေ့, ေး | Vowel E |
| ဲ | ဲ, ဲ့ | Vowel AI |
| ော | ော, ော့, ော် | Vowel O (combined) |
| ့ | (empty), း | Tone mark (Dot Below → Visarga) |
| း | (empty), ့ | Tone mark (Visarga → Dot Below) |
Colloquial Substitutions
Multi-character substitutions found in colloquial/social media text. The COLLOQUIAL_SUBSTITUTIONS dict maps colloquial forms to their standard equivalents (25 entries total).
Particles
| Colloquial | Standard | Description |
|---|---|---|
| အုန်း | ဦး | Coconut → Particle |
| အုံး | ဦး | Pillow → Particle |
Verb Endings
| Colloquial | Standard | Description |
|---|---|---|
| ပါဘူး | မပါဘူး | Shortened negation |
| တာပဲ | တာပါပဲ | Shortened emphasis |
Pronouns
| Colloquial | Standard | Description |
|---|---|---|
| ကျနော် | ကျွန်တော် | Male 1st person (colloquial) |
| ကျွနော် | ကျွန်တော် | Male 1st person (variant) |
| ကျမ | ကျွန်မ | Female 1st person (colloquial) |
| မင်း | သင် | 2nd person (informal → formal) |
| ငါ | ကျွန်တော်, ကျွန်မ | 1st person (very informal) |
| သူတို့ | သူများ | 3rd person plural |
Common Words
| Colloquial | Standard | Description |
|---|---|---|
| ဟုတ် | ဟုတ်ကဲ့ | Yes (shortened) |
| အို | အိုး | Pot/exclamation (without visarga) |
| အဲ | ထို | That (colloquial → formal) |
| အဲဒါ | ထိုအရာ | That thing (colloquial → formal) |
| ဘယ်လို | မည်သို့ | How (colloquial → formal) |
| ဘာကြောင့် | အဘယ်ကြောင့် | Why (colloquial → formal) |
Adverbs and Reduplication
| Colloquial | Standard | Description |
|---|---|---|
| တော်တော် | အလွန် | Very (colloquial → formal) |
| သိပ် | အလွန် | Very (colloquial → formal) |
| ရမ်းရမ်း | အလွန် | Very (very colloquial) |
| ကောင်းကောင်း | ကောင်းမွန်စွာ | Well |
| မြန်မြန် | မြန်ဆန်စွာ | Quickly |
| နှေးနှေး | နှေးကွေးစွာ | Slowly |
Contractions and Texting
| Colloquial | Standard | Description |
|---|---|---|
| လို့ပဲ | ထို့ကြောင့် | Because (contracted) |
| ရင် | လျှင် | If (colloquial → formal) |
| 555 | ဟာဟာဟာ | Laughing (Thai style) |
Reverse Mapping: STANDARD_TO_COLLOQUIAL
The STANDARD_TO_COLLOQUIAL dictionary is the inverse of COLLOQUIAL_SUBSTITUTIONS. It maps each standard form back to its set of colloquial variants. This is built automatically at module load time.
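One way such an inverse can be built is sketched below. This is not the library's code: it assumes each colloquial form maps to a set of standard forms (ငါ has two in the pronoun table), uses a hand-copied subset of the tables above, and the names `SAMPLE_COLLOQUIAL` and `invert` are illustrative.

```python
from collections import defaultdict
from typing import Dict, Set

# Hand-copied subset of the tables above (assumed shape:
# colloquial form -> set of standard forms).
SAMPLE_COLLOQUIAL: Dict[str, Set[str]] = {
    "တော်တော်": {"အလွန်"},
    "သိပ်": {"အလွန်"},
    "ငါ": {"ကျွန်တော်", "ကျွန်မ"},
    "ကျမ": {"ကျွန်မ"},
}


def invert(mapping: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Build standard form -> set of colloquial variants."""
    reverse: Dict[str, Set[str]] = defaultdict(set)
    for colloquial, standards in mapping.items():
        for standard in standards:
            reverse[standard].add(colloquial)
    return dict(reverse)


SAMPLE_REVERSE = invert(SAMPLE_COLLOQUIAL)
```

In this subset, အလွန် collects two colloquial variants, and ငါ appears under both of its standard forms, which is why the reverse values need to be sets rather than single strings.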
Helper Functions
from myspellchecker.text.phonetic_data import (
    is_colloquial_variant,
    get_standard_forms,
    STANDARD_TO_COLLOQUIAL,
)

is_colloquial_variant("ငါ")  # True
get_standard_forms("unknown")  # set() (empty)
Usage Examples
PhoneticHasher Integration
from myspellchecker.text.phonetic_data import PHONETIC_GROUPS

# Simplified illustration of how a hasher can use PHONETIC_GROUPS.
# The library's own class is myspellchecker.text.phonetic.PhoneticHasher;
# it is not imported here because this sketch defines the same name.
class PhoneticHasher:
    def __init__(self):
        # Build reverse mapping from char to group
        self.char_to_group = {}
        for group, chars in PHONETIC_GROUPS.items():
            for char in chars:
                self.char_to_group[char] = group

    def hash(self, word: str) -> str:
        """Generate phonetic hash."""
        result = []
        for char in word:
            if char in self.char_to_group:
                # Known character: emit its group key
                result.append(self.char_to_group[char])
            else:
                # Unknown character: keep it unchanged
                result.append(char)
        return "".join(result)
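A hash like the one above is typically used to bucket a word list so that sound-alike candidates share a key. The sketch below is self-contained and makes two assumptions: `SAMPLE_GROUPS` is a hand-copied subset of the phonetic tables, and group keys are joined with a `|` separator, which is a choice of this sketch and not something the source states about the library. `phonetic_key` and `bucket_by_sound` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hand-copied subset of the phonetic group tables.
SAMPLE_GROUPS: Dict[str, List[str]] = {
    "p": ["ပ", "ဖ", "ဗ", "ဘ"],
    "k": ["က", "ခ", "ဂ", "ဃ"],
    "vowel_a": ["ာ", "ါ"],
}
CHAR_TO_GROUP = {ch: key for key, chars in SAMPLE_GROUPS.items() for ch in chars}


def phonetic_key(word: str) -> str:
    """Replace each known character with its group key, joined by '|'."""
    # The separator keeps adjacent keys apart, e.g. "n" + "y" vs "ny".
    return "|".join(CHAR_TO_GROUP.get(ch, ch) for ch in word)


def bucket_by_sound(words: Iterable[str]) -> Dict[str, List[str]]:
    """Group words that share the same phonetic key."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        buckets[phonetic_key(word)].append(word)
    return dict(buckets)


BUCKETS = bucket_by_sound(["ပါ", "ဘာ", "ကာ", "ခါ", "မာ"])
```

Here ပါ and ဘာ land in one bucket and ကာ and ခါ in another, while မာ stays alone because မ is not in the sample groups. The separator matters with the full tables: without it, the keys for န followed by ယ (`n` + `y`) would read the same as the single key `ny` for ည.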
Visual Confusion Detection
from typing import List

from myspellchecker.text.phonetic_data import VISUAL_SIMILAR

def find_visual_variants(word: str) -> List[str]:
    """Generate visually similar variants of a word."""
    variants = []
    for i, char in enumerate(word):
        if char in VISUAL_SIMILAR:
            # Substitute each confusable character at position i
            for similar in VISUAL_SIMILAR[char]:
                variant = word[:i] + similar + word[i+1:]
                variants.append(variant)
    return variants

# Example
variants = find_visual_variants("ကိုယ်")
# Includes "ကီုယ်" (short i replaced with long ii); ု and ယ also have
# entries in the tables above, so further variants are generated as well
Tonal Variant Generation
from typing import List

from myspellchecker.text.phonetic_data import TONAL_GROUPS

def generate_tonal_variants(word: str) -> List[str]:
    """Generate tonal variants of a word."""
    variants = [word]
    # Iterates one code point at a time, so multi-character keys
    # such as "ော" are not matched as a unit here
    for i, char in enumerate(word):
        if char in TONAL_GROUPS:
            for variant_char in TONAL_GROUPS[char]:
                if variant_char != char:
                    variant = word[:i] + variant_char + word[i+1:]
                    variants.append(variant)
    return variants

# Example
variants = generate_tonal_variants("လာ")
# ["လာ", "လာ့", "လား", ...]
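The tonal table includes a combined key, ော, that spans two code points, so a per-character loop treats it as ေ followed by ာ. A longest-match scan is one way to handle such keys as a unit. The sketch below is an assumption-labeled variation on the example above, not the library's implementation; `SAMPLE_TONAL` copies two rows from the tonal table and `longest_match_variants` is an illustrative name.

```python
from typing import Dict, List

# Two rows copied from the tonal groups table (assumed shape:
# base -> list of variants, possibly multi-character).
SAMPLE_TONAL: Dict[str, List[str]] = {
    "ော": ["ော", "ော့", "ော်"],
    "ေ": ["ေ", "ေ့", "ေး"],
}


def longest_match_variants(word: str, groups: Dict[str, List[str]]) -> List[str]:
    """Generate variants, preferring the longest key at each position."""
    keys = sorted(groups, key=len, reverse=True)
    variants: List[str] = []
    i = 0
    while i < len(word):
        for key in keys:
            if word.startswith(key, i):
                for replacement in groups[key]:
                    if replacement != key:
                        variants.append(word[:i] + replacement + word[i + len(key):])
                i += len(key)  # skip past the whole matched key
                break
        else:
            i += 1  # no key starts here; move on one code point
    return variants


WORD = "တော"
VARIANTS = longest_match_variants(WORD, SAMPLE_TONAL)
```

For တော this yields only the two ော variants; the standalone ေ row is not applied inside the combined vowel, which avoids producing sequences where a tone mark is wedged between ေ and ာ.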
Data Constants
Available constants and functions:
from myspellchecker.text.phonetic_data import (
    PHONETIC_GROUPS,             # Phonetic similarity groups
    VISUAL_SIMILAR,              # Visual confusability mapping
    MYANMAR_SUBSTITUTION_COSTS,  # Weighted edit distance costs
    TONAL_GROUPS,                # Tonal variant mappings
    COLLOQUIAL_SUBSTITUTIONS,    # Colloquial -> standard mappings
    STANDARD_TO_COLLOQUIAL,      # Standard -> colloquial reverse mapping
    is_colloquial_variant,       # Check if word is colloquial
    get_standard_forms,          # Get standard forms for colloquial
)
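The page does not describe the structure of `MYANMAR_SUBSTITUTION_COSTS`, so the following is only a sketch of how substitution costs can feed a weighted edit distance. It assumes a dict keyed by `(source_char, target_char)` with float costs; the real constant may be shaped differently. `SAMPLE_COSTS` holds a placeholder value that is not taken from the library, and `weighted_distance` is an illustrative name.

```python
from typing import Dict, Tuple

# Placeholder costs for illustration only (assumed shape and values).
SAMPLE_COSTS: Dict[Tuple[str, str], float] = {
    ("ိ", "ီ"): 0.3,
    ("ီ", "ိ"): 0.3,
}


def weighted_distance(
    a: str,
    b: str,
    costs: Dict[Tuple[str, str], float],
    default: float = 1.0,
) -> float:
    """Levenshtein distance where listed substitutions cost less than default."""
    prev = [j * default for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * default]
        for j, cb in enumerate(b, start=1):
            sub = 0.0 if ca == cb else costs.get((ca, cb), default)
            curr.append(
                min(
                    prev[j] + default,      # delete ca
                    curr[j - 1] + default,  # insert cb
                    prev[j - 1] + sub,      # substitute or match
                )
            )
        prev = curr
    return prev[-1]
```

Under these assumptions, swapping short i for long ii costs 0.3 instead of a full edit, so visually or phonetically close misspellings rank ahead of unrelated candidates at the same plain edit distance.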
Phoneme-Grapheme Notes
E vs AI Vowels
The module distinguishes two vowel signs:
- ေ (U+1031): E vowel, IPA /e/, prefix position
- ဲ (U+1032): AI vowel, IPA /ɛ/, suffix position
These are phonetically distinct and should NOT be treated as interchangeable.
Aspirated vs Voiced
Each stop consonant group contains unaspirated, aspirated, and voiced variants. These sound similar and are often confused:
- ပ (unaspirated) vs ဖ (aspirated) vs ဗ (voiced)