Overview
Phonetic Groups
Characters grouped by phonetic similarity (same sound category):
Consonant Groups
| Group Key | Name | Characters | IPA |
|---|---|---|---|
| p | Labial | ပ, ဖ, ဗ, ဘ | /p, pʰ, b, bʰ/ |
| t | Alveolar | တ, ထ, ဒ, ဓ | /t, tʰ, d, dʰ/ |
| k | Velar | က, ခ, ဂ, ဃ | /k, kʰ, ɡ, ɡʰ/ |
| c | Palatal | စ, ဆ, ဇ, ဈ | /s, sʰ, z, zʰ/ |
| ṭ | Retroflex | ဋ, ဌ, ဍ, ဎ | /ʈ, ʈʰ, ɖ, ɖʰ/ |
| s | Sibilant | သ, ဿ | /θ/ |
| h | Glottal | ဟ | /h/ |
Nasal Groups
| Group Key | Name | Characters |
|---|---|---|
| m | Nasal M | မ |
| n | Nasal N | န |
| ng | Nasal NG | င |
| ny | Nasal NY | ည, ဉ |
| n_retro | Retroflex N | ဏ |
Approximants and Liquids
| Group Key | Name | Characters |
|---|---|---|
| l | Liquid L | လ, ဠ |
| r | Liquid R | ရ |
| y | Approximant Y | ယ |
| w | Approximant W | ဝ |
Medials
| Group Key | Name | Character | Unicode |
|---|---|---|---|
| medial_y | Ya-pin | ျ | U+103B |
| medial_r | Ya-yit | ြ | U+103C |
| medial_w | Wa-hswe | ွ | U+103D |
| medial_h | Ha-htoe | ှ | U+103E |
Vowels
| Group Key | Name | Characters | IPA |
|---|---|---|---|
| vowel_a | Vowel A | ာ, ါ (U+102B) | /a/ |
| vowel_carrier | Vowel Carrier | အ (U+1021) | /ʔa/ |
| vowel_i | Vowel I | ိ, ီ, ဣ (U+1023), ဤ (U+1024) | /i/ |
| vowel_u | Vowel U | ု, ူ, ဥ (U+1025), ဦ (U+1026) | /u/ |
| vowel_e | Vowel E | ေ, ဧ (U+1027) | /e/ |
| vowel_ai | Vowel AI | ဲ | /ɛ/ |
| vowel_o | Vowel O | ဩ (U+1029), ဪ (U+102A) | /o/ |
Tone
| Group Key | Name | Characters | Unicode |
|---|---|---|---|
| tone | Tone Marks | ံ, ့, း | U+1036 (Anusvara), U+1037 (Dot Below), U+1038 (Visarga) |
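
One way to use the group tables in this section is to invert them into a character-to-group lookup. The sketch below is only an illustration, not the module's actual code: the name `PHONETIC_GROUPS`, its string-valued layout, and the helper `same_phonetic_group` are assumptions, and only a few groups are copied from the tables.

```python
# Hypothetical subset of the group tables; the real constant's name
# and layout are not shown in this document.
PHONETIC_GROUPS = {
    "p": "ပဖဗဘ",
    "t": "တထဒဓ",
    "k": "ကခဂဃ",
    "ny": "ညဉ",
    "l": "လဠ",
}

# Invert the table once so each character maps to its group key.
CHAR_TO_GROUP = {
    char: key for key, chars in PHONETIC_GROUPS.items() for char in chars
}


def same_phonetic_group(a, b):
    """Return True if both characters belong to the same known group."""
    group = CHAR_TO_GROUP.get(a)
    return group is not None and group == CHAR_TO_GROUP.get(b)
```

Characters that appear in no group compare as different, so unknown input never produces a false match.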
Visual Similarity
Characters that look similar and are commonly confused. Accessed via the `VISUAL_SIMILAR` dict. Most pairs are mapped bidirectionally; in the tables below, the entries for န and င are listed in one direction only.
Vowel and Medial Confusions
| Character | Confused With | Description |
|---|---|---|
| ိ | ီ | Short i vs long ii |
| ီ | ိ | Long ii vs short i |
| ု | ူ | Short u vs long uu |
| ူ | ု | Long uu vs short u |
| ာ | ါ | Regular AA vs tall AA |
| ါ | ာ | Tall AA vs regular AA |
| ျ | ြ | Ya-pin vs ya-yit |
| ြ | ျ | Ya-yit vs ya-pin |
| ွ | ှ | Wa-hswe vs ha-htoe |
| ှ | ွ | Ha-htoe vs wa-hswe |
Consonant Confusions
| Character | Confused With | Description |
|---|---|---|
| န | ည | Na vs nya |
| င | ည, ဉ | Nga vs nya variants |
| ရ | ယ | Ra vs ya |
| ယ | ရ | Ya vs ra |
| သ | ဿ | Sa vs great sa |
| ဿ | သ | Great sa vs sa |
| ပ | ဗ | Pa vs ba |
| ဗ | ပ | Ba vs pa |
| ည | ဉ | Nya vs archaic nya |
| ဉ | ည | Archaic nya vs nya |
Aspirated vs Unaspirated Pairs
| Unaspirated | Aspirated |
|---|---|
| က | ခ |
| ဂ | ဃ |
| စ | ဆ |
| တ | ထ |
| ဒ | ဓ |
| ဖ | ဘ |
| ဋ | ဌ |
| ဍ | ဎ |
Other Confusions
| Character | Confused With | Description |
|---|---|---|
| လ | ဠ | La vs great la |
| ဠ | လ | Great la vs la |
| ဝ | ၀ | Wa consonant (U+101D) vs zero digit (U+1040) |
| ၀ | ဝ | Zero digit (U+1040) vs Wa consonant (U+101D) |
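
Since not every pair is mapped in both directions, it can help to list the one-way entries explicitly. This is a minimal sketch under stated assumptions: it treats `VISUAL_SIMILAR` as a mapping from a character to a set of look-alikes (the actual value type is not shown here), uses a hand-copied excerpt of the tables above, and `one_way_pairs` is a hypothetical helper.

```python
# Hypothetical excerpt; the value type (set of look-alikes) is assumed.
VISUAL_SIMILAR = {
    "\u102d": {"\u102e"},  # short i -> long ii
    "\u102e": {"\u102d"},  # long ii -> short i
    "ရ": {"ယ"},
    "ယ": {"ရ"},
    "န": {"ည"},            # listed in one direction only
    "\u101d": {"\u1040"},  # letter wa -> digit zero
    "\u1040": {"\u101d"},  # digit zero -> letter wa
}


def one_way_pairs(table):
    """Return sorted (source, target) pairs whose reverse mapping is missing."""
    return sorted(
        (src, dst)
        for src, targets in table.items()
        for dst in targets
        if src not in table.get(dst, set())
    )
```

Escape sequences are used for the wa and zero entries because the two glyphs are nearly indistinguishable in most fonts.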
Tonal Groups
Characters that differ by tone, commonly confused in typing. Accessed via the `TONAL_GROUPS` dict.
| Base | Tonal Variants | Category |
|---|---|---|
| ာ | ာ, ့, း, ား | Vowel A |
| ါ | ါ, ့, း, ါး | Vowel A (tall AA, U+102B) |
| ိ | ိ, ီ, ိ့, ီး | Vowel I |
| ီ | ိ, ီ, ိ့, ီး | Vowel I |
| ု | ု, ူ, ု့, ူး | Vowel U |
| ူ | ု, ူ, ု့, ူး | Vowel U |
| ေ | ေ, ေ့, ေး | Vowel E |
| ဲ | ဲ, ဲ့ | Vowel AI |
| ော | ော, ော့, ော် | Vowel O (combined) |
| ့ | (empty), း | Tone mark (Dot Below → Visarga) |
| း | (empty), ့ | Tone mark (Visarga → Dot Below) |
Colloquial Substitutions
Multi-character substitutions found in colloquial/social media text. The `COLLOQUIAL_SUBSTITUTIONS` dict maps colloquial forms to their standard equivalents (25 entries total).
Particles
| Colloquial | Standard | Description |
|---|---|---|
| အုန်း | ဦး | Coconut → Particle |
| အုံး | ဦး | Pillow → Particle |
Verb Endings
| Colloquial | Standard | Description |
|---|---|---|
| ပါဘူး | မပါဘူး | Shortened negation |
| တာပဲ | တာပါပဲ | Shortened emphasis |
Pronouns
| Colloquial | Standard | Description |
|---|---|---|
| ကျနော် | ကျွန်တော် | Male 1st person (colloquial) |
| ကျွနော် | ကျွန်တော် | Male 1st person (variant) |
| ကျမ | ကျွန်မ | Female 1st person (colloquial) |
| မင်း | သင် | 2nd person (informal → formal) |
| ငါ | ကျွန်တော်, ကျွန်မ | 1st person (very informal) |
| သူတို့ | သူများ | 3rd person plural |
Common Words
| Colloquial | Standard | Description |
|---|---|---|
| ဟုတ် | ဟုတ်ကဲ့ | Yes (shortened) |
| အို | အိုး | Pot/exclamation (without visarga) |
| အဲ | ထို | That (colloquial → formal) |
| အဲဒါ | ထိုအရာ | That thing (colloquial → formal) |
| ဘယ်လို | မည်သို့ | How (colloquial → formal) |
| ဘာကြောင့် | အဘယ်ကြောင့် | Why (colloquial → formal) |
Adverbs and Reduplication
| Colloquial | Standard | Description |
|---|---|---|
| တော်တော် | အလွန် | Very (colloquial → formal) |
| သိပ် | အလွန် | Very (colloquial → formal) |
| ရမ်းရမ်း | အလွန် | Very (very colloquial) |
| ကောင်းကောင်း | ကောင်းမွန်စွာ | Well |
| မြန်မြန် | မြန်ဆန်စွာ | Quickly |
| နှေးနှေး | နှေးကွေးစွာ | Slowly |
Contractions and Texting
| Colloquial | Standard | Description |
|---|---|---|
| လို့ပဲ | ထို့ကြောင့် | Because (contracted) |
| ရင် | လျှင် | If (colloquial → formal) |
| 555 | ဟာဟာဟာ | Laughing (Thai style) |
Reverse Mapping: STANDARD_TO_COLLOQUIAL
The `STANDARD_TO_COLLOQUIAL` dictionary is the inverse of `COLLOQUIAL_SUBSTITUTIONS`: it maps each standard form back to the set of its colloquial variants, and it is built automatically at module load time.
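
The inversion described here could look roughly like the following. This is a sketch, not the module's code: the helper name `build_reverse` is hypothetical, the dict is a small excerpt of the tables above, and it assumes one standard string per colloquial key (the ငါ row, which lists two standard forms, is left out because its storage format is not shown).

```python
from collections import defaultdict

# Hypothetical excerpt of the colloquial -> standard mapping.
COLLOQUIAL_SUBSTITUTIONS = {
    "အုန်း": "ဦး",
    "အုံး": "ဦး",
    "တော်တော်": "အလွန်",
    "သိပ်": "အလွန်",
    "ရင်": "လျှင်",
}


def build_reverse(mapping):
    """Group colloquial forms under the standard form they map to."""
    reverse = defaultdict(set)
    for colloquial, standard in mapping.items():
        reverse[standard].add(colloquial)
    return dict(reverse)


# Built once when the module is imported.
STANDARD_TO_COLLOQUIAL = build_reverse(COLLOQUIAL_SUBSTITUTIONS)
```

Using a set as the value keeps the reverse lookup well defined when several colloquial forms share one standard form.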
Helper Functions
Usage Examples
PhoneticHasher Integration
Visual Confusion Detection
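
As an illustration of how a look-alike table can drive confusion detection, the sketch below generates every single-character swap for a word and checks whether a typed form is one swap away from an intended form. The table excerpt, its set-valued layout, and the helpers `visual_variants` and `is_visual_confusion` are assumptions made for this example, not the module's API.

```python
# Hypothetical excerpt of a character -> look-alikes table.
VISUAL_SIMILAR = {
    "\u101a": {"\u101b"},  # ya -> ra
    "\u101b": {"\u101a"},  # ra -> ya
    "\u102c": {"\u102b"},  # aa -> tall aa
    "\u102b": {"\u102c"},  # tall aa -> aa
}


def visual_variants(word):
    """Return every string made by swapping one character for a look-alike."""
    variants = []
    for i, ch in enumerate(word):
        for alt in sorted(VISUAL_SIMILAR.get(ch, ())):
            variants.append(word[:i] + alt + word[i + 1:])
    return variants


def is_visual_confusion(typed, intended):
    """True if `typed` differs from `intended` by exactly one look-alike swap."""
    return typed != intended and intended in visual_variants(typed)
```

For example, `is_visual_confusion("\u101b\u102c", "\u101a\u102c")` is true because ra and ya are listed as look-alikes.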
Tonal Variant Generation
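
A tonal-group table can be used to propose alternative tones for a syllable. The following is a minimal sketch under stated assumptions: `TONAL_GROUPS` is taken to map a base vowel sign to a list of variant strings (only the E and AI rows are copied here), and `tonal_variants` is a hypothetical helper that handles just the simple case where the vowel sign is the last character before any tone mark.

```python
TONE_MARKS = "\u1037\u1038"  # dot below, visarga

# Hypothetical excerpt: base vowel sign -> its tonal variants.
TONAL_GROUPS = {
    "\u1031": ["\u1031", "\u1031\u1037", "\u1031\u1038"],  # vowel E
    "\u1032": ["\u1032", "\u1032\u1037"],                  # vowel AI
}


def tonal_variants(syllable):
    """Return the other tonal forms of a syllable, or [] if none are known."""
    stem = syllable.rstrip(TONE_MARKS)  # drop trailing tone marks
    if not stem or stem[-1] not in TONAL_GROUPS:
        return []
    prefix = stem[:-1]
    candidates = [prefix + variant for variant in TONAL_GROUPS[stem[-1]]]
    return [c for c in candidates if c != syllable]
```

The input form itself is excluded from the result, so the list contains only genuine alternatives.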
Data Constants
Available constants and functions:
Phoneme-Grapheme Notes
E vs AI Vowels
The module correctly distinguishes:
- ေ (U+1031) - E vowel, IPA /e/, prefix position
- ဲ (U+1032) - AI vowel, IPA /ɛ/, suffix position
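
Although ေ is displayed before its consonant, Unicode stores it after the consonant in logical order, so the two vowels are best told apart by code point rather than by where they appear on screen. A small sketch using only the standard library `unicodedata` module; the helper name `describe` is illustrative.

```python
import unicodedata

E_VOWEL = "\u1031"   # MYANMAR VOWEL SIGN E
AI_VOWEL = "\u1032"  # MYANMAR VOWEL SIGN AI


def describe(text):
    """Return (code point, Unicode name) for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# KA followed by E in stored order, even though E is drawn first.
syllable = "\u1000" + E_VOWEL
print(describe(syllable))
```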
Aspirated vs Voiced
Consonant groups contain unaspirated, aspirated, and voiced variants:
- ပ (unaspirated) vs ဖ (aspirated) vs ဗ (voiced)
- These sound similar and are often confused
See Also
- Phonetic Hasher - Using phonetic data
- Homophones - Sound-alike detection
- Edit Distance - Edit distance with phonetic weighting