Overview
The SyllableRuleValidator performs 22 phonotactic checks, from zero-width character rejection through medial compatibility to tone mark rules, entirely without a dictionary. Syllables that pass structural validation then proceed to SyllableValidator for dictionary lookup and suggestion generation.
Architecture
The system uses a dual-implementation pattern: a Cython implementation for speed and a pure-Python fallback.
Validation Pipeline
Layer Flow
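The flow described in the Overview (structural rules first, dictionary lookup second) can be sketched roughly as follows. This is a minimal illustration and not the library's API: `Outcome`, `toy_rules`, `validate_syllable`, and `known` are hypothetical names, the rule layer is reduced to a single toy rule, and suggestion generation is left out.

```python
from dataclasses import dataclass
from typing import Callable, Set

ZERO_WIDTH_SPACE = "\u200b"


@dataclass
class Outcome:
    syllable: str
    valid: bool
    stage: str  # the stage that decided the outcome: "rules" or "dictionary"


def toy_rules(syllable: str) -> bool:
    """Hypothetical stand-in for the rule layer (one toy rule only)."""
    return bool(syllable) and ZERO_WIDTH_SPACE not in syllable


def validate_syllable(
    syllable: str,
    passes_rules: Callable[[str], bool],
    dictionary: Set[str],
) -> Outcome:
    # Stage 1: structural rules; no dictionary is consulted here.
    if not passes_rules(syllable):
        return Outcome(syllable, False, "rules")
    # Stage 2: only structurally valid syllables reach the dictionary lookup.
    return Outcome(syllable, syllable in dictionary, "dictionary")


# Toy dictionary containing a single syllable (U+1019 U+103C U+1014 U+103A).
known = {"\u1019\u103c\u1014\u103a"}
```

Rejecting malformed input before the lookup keeps the dictionary stage from ever seeing structurally impossible syllables.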
SyllableRuleValidator
Initialization
Basic Usage
Validation Rules (22 Checks)
The validate() method performs 22 checks in order:
Phase 1: Basic Checks
| # | Check | Purpose |
|---|---|---|
| 1 | Zero-width character rejection | Detect encoding issues |
| 2 | Corruption check | Detect data corruption (length, repetition) |
| 3 | Start character check | Must start with consonant or independent vowel |
| 4 | Base character validation | Verify valid Myanmar base character |
Phase 2: Structure Rules
| # | Check | Purpose |
|---|---|---|
| 5 | Independent vowel rules | Independent vowels can’t take medials/vowels |
| 6 | Structure sanity | Medial sequences, ordering, Visarga position |
| 7 | Kinzi pattern validation | Validate င်္ sequences |
| 8 | Asat predecessor check | Asat must follow consonant |
Phase 3: Compatibility Rules
| # | Check | Purpose |
|---|---|---|
| 9 | Unexpected consonant detection | Multiple unconnected consonants |
| 10 | Medial compatibility | Consonant-medial phonotactics |
| 11 | Medial-vowel compatibility | Medial+vowel combination validity |
| 12 | Tone rules | Stop finals, tone conflicts |
| 13 | Virama usage check | Stacking must not end syllable |
| 14 | Vowel combinations (digraphs) | Valid multi-vowel patterns |
| 15 | Vowel exclusivity | Upper vs lower vowel slots |
| 16 | E vowel combinations | ေ combination restrictions |
| 17 | E vowel position | ေ must follow consonant/medial |
| 18 | Great Sa rules | ဿ usage restrictions |
| 19 | Anusvara compatibility | ံ vowel restrictions |
| 20 | Double diacritics | No duplicate diacritics |
| 21 | Tall A / Aa exclusivity | ါ and ာ are mutually exclusive |
| 22 | Dot below position | Dot below must follow valid base |
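As a rough illustration of running checks in a fixed order and stopping at the first failure, the sketch below implements four of the simpler checks from the tables above (numbers 1, 3, 20, and 21). The function names, the check labels, the zero-width character set, and the independent-vowel range U+1023-U+102A are assumptions made for this example; the real validator's internals may differ.

```python
from typing import Callable, List, Optional, Tuple

ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}
TALL_AA, AA = "\u102b", "\u102c"


def is_consonant(ch: str) -> bool:
    return "\u1000" <= ch <= "\u1021"


def is_independent_vowel(ch: str) -> bool:
    # Assumed range for independent vowels (U+1023-U+102A).
    return "\u1023" <= ch <= "\u102a"


def no_zero_width(s: str) -> bool:
    return not any(ch in ZERO_WIDTH for ch in s)


def valid_start(s: str) -> bool:
    return bool(s) and (is_consonant(s[0]) or is_independent_vowel(s[0]))


def no_double_diacritics(s: str) -> bool:
    # Simplified: reject the same dependent sign twice in a row.
    return not any(
        a == b and "\u102b" <= a <= "\u103e" for a, b in zip(s, s[1:])
    )


def tall_a_aa_exclusive(s: str) -> bool:
    return not (TALL_AA in s and AA in s)


# Checks run in table order; the first failing check names the problem.
CHECKS: List[Tuple[str, Callable[[str], bool]]] = [
    ("zero_width", no_zero_width),
    ("start_character", valid_start),
    ("double_diacritics", no_double_diacritics),
    ("tall_a_aa_exclusivity", tall_a_aa_exclusive),
]


def first_failure(s: str) -> Optional[str]:
    """Return the label of the first failing check, or None if all pass."""
    for name, check in CHECKS:
        if not check(s):
            return name
    return None
```

Returning the label of the failing check, rather than a bare boolean, is one way to make failures traceable during debugging.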
Strict Mode Additional Checks
When strict=True, these additional checks are applied:
| Check | Purpose |
|---|---|
| Virama count | Max 1 virama (2 with Kinzi) |
| Anusvara + Asat conflict | Incompatible combination |
| Asat before vowel | Invalid sequence |
| Tone strictness | Max 1 tone mark per syllable |
| Tone position | Tone marks must be at end |
| Character scope | Only core Myanmar characters |
| Diacritic uniqueness | No duplicate medials/vowels |
| One final rule | Max 1 final element |
| Strict Kinzi | Nga + Virama needs Asat |
| Virama ordering | Virama before medials |
| Pat Sint validity | Stacking rules (Vagga logic) |
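Three of the strict-mode checks in the table are simple enough to sketch directly: virama count, tone strictness, and strict Kinzi. The code below is an illustrative approximation with hypothetical names, not the library's implementation. It assumes Kinzi is encoded as Nga + Asat + Virama (U+1004 U+103A U+1039), and for counting purposes it treats only dot below (U+1037) and visarga (U+1038) as tone marks, since anusvara commonly co-occurs with visarga; the actual validator may count differently.

```python
from typing import List

VIRAMA = "\u1039"
KINZI = "\u1004\u103a\u1039"        # Nga + Asat + Virama
BARE_NGA_VIRAMA = "\u1004\u1039"    # Nga + Virama with no Asat in between
COUNTED_TONES = {"\u1037", "\u1038"}  # assumption: anusvara is not counted


def virama_count_ok(s: str) -> bool:
    # Max 1 virama, or 2 when a Kinzi sequence is present.
    limit = 2 if KINZI in s else 1
    return s.count(VIRAMA) <= limit


def tone_count_ok(s: str) -> bool:
    # Max 1 tone mark per syllable.
    return sum(ch in COUNTED_TONES for ch in s) <= 1


def strict_kinzi_ok(s: str) -> bool:
    # Nga directly followed by Virama is rejected; Kinzi needs the Asat.
    return BARE_NGA_VIRAMA not in s


def strict_failures(s: str) -> List[str]:
    """Collect the labels of every strict check that the syllable fails."""
    failures = []
    if not virama_count_ok(s):
        failures.append("virama_count")
    if not tone_count_ok(s):
        failures.append("tone_strictness")
    if not strict_kinzi_ok(s):
        failures.append("strict_kinzi")
    return failures
```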
Myanmar Syllable Structure
Valid Myanmar syllables follow a pattern built from the character categories below.
Character Categories
| Component | Unicode Range | Examples |
|---|---|---|
| Consonants | U+1000-U+1021 | က ခ ဂ ဃ င စ ဆ ဇ ဈ ည ဋ ဌ ဍ ဎ ဏ တ ထ ဒ ဓ န ပ ဖ ဗ ဘ မ ယ ရ လ ဝ သ ဟ ဠ အ |
| Medials | U+103B-U+103E | ျ (Ya) ြ (Ra) ွ (Wa) ှ (Ha) |
| Vowels | U+102B-U+1032 | ါ ာ ိ ီ ု ူ ေ ဲ |
| Tone marks | U+1036-U+1038 | ံ ့ း |
| Asat | U+103A | ် |
| Virama | U+1039 | ္ |
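The ranges in the table map directly onto a small classifier. The sketch below is only an illustration of that mapping; the function name `classify` and the category labels are chosen for this example.

```python
def classify(ch: str) -> str:
    """Map a single character to its category from the table above."""
    cp = ord(ch)
    if 0x1000 <= cp <= 0x1021:
        return "consonant"
    if 0x103B <= cp <= 0x103E:
        return "medial"
    if 0x102B <= cp <= 0x1032:
        return "vowel"
    if 0x1036 <= cp <= 0x1038:
        return "tone"
    if cp == 0x103A:
        return "asat"
    if cp == 0x1039:
        return "virama"
    return "other"


# Consonant + medial Ra + consonant + Asat (U+1019 U+103C U+1014 U+103A).
categories = [classify(ch) for ch in "\u1019\u103c\u1014\u103a"]
```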
Valid Medial Sequences
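One plausible ordering rule, sketched here as an assumption rather than as the validator's actual logic, is that medials must appear in Unicode's storage order (Ya, Ra, Wa, Ha) with no repeats. Because the four medials occupy consecutive code points U+103B-U+103E, this reduces to a strictly-ascending check. The real validator may also reject particular combinations that this sketch accepts.

```python
MEDIALS = "\u103b\u103c\u103d\u103e"  # Ya, Ra, Wa, Ha


def medials_in_order(syllable: str) -> bool:
    """True if the syllable's medials are strictly ascending by code point."""
    medials = [ch for ch in syllable if ch in MEDIALS]
    # Strict comparison rejects both out-of-order and duplicated medials.
    return all(a < b for a, b in zip(medials, medials[1:]))
```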
Medial Compatibility
Not all consonants can take all medials.
Special Patterns
Kinzi (င်္)
Kinzi is a nasalization marker in Pali/Sanskrit loanwords.
Stacking (Pat Sint)
Consonant stacking follows Vagga (row) rules.
Great Sa (ဿ)
The doubled Sa conjunct has special rules.
Integration with SyllableValidator
The rule validator integrates with the full validation pipeline.
Performance
| Implementation | Speed | Notes |
|---|---|---|
| Pure Python | ~80μs/syllable | Fallback |
| Cython | ~10μs/syllable | 8x faster |
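A common way to wire up a compiled implementation with a pure-Python fallback is a guarded import. The sketch below shows the general pattern only: the module name `_syllable_rules_c` is hypothetical, and the fallback applies a single toy rule rather than the 22 checks.

```python
try:
    # Hypothetical compiled extension; the module name is illustrative only.
    from _syllable_rules_c import validate as _validate
    IMPLEMENTATION = "cython"
except ImportError:
    def _validate(syllable: str) -> bool:
        """Pure-Python stand-in with one toy rule (no zero-width space)."""
        return bool(syllable) and "\u200b" not in syllable

    IMPLEMENTATION = "python"


def validate(syllable: str) -> bool:
    """Public entry point; callers never need to know which backend is used."""
    return _validate(syllable)
```

Exposing a flag such as `IMPLEMENTATION` makes it easy to confirm at runtime which backend was actually loaded.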
Check Implementation
Configuration Options
Strict vs Lenient Mode
Strict mode additionally enforces:
- Pali/Sanskrit stacking rules (Vagga logic)
- Canonical character ordering
- Stricter tone mark rules
- Core Myanmar characters only
Extended Myanmar
Error Messages
When validation fails, the check that failed can be identified for debugging.
See Also
- Syllable Segmentation - How text is split into syllables
- Word Validation - Layer 2 word-level validation
- Syllable Validation - Syllable structure and validation rules
- Cython Guide - Performance optimization