Layer 1 of the validation pipeline catches approximately 90% of spelling errors by validating Myanmar syllable structure against orthographic rules before dictionary lookup.
Overview
The SyllableRuleValidator performs 22 phonotactic checks, from zero-width character rejection through medial compatibility to tone mark rules, entirely without a dictionary. Syllables that pass structural validation then proceed to SyllableValidator for dictionary lookup and suggestion generation.
Architecture
The system uses a dual-implementation pattern:
SyllableRuleValidator (syllable_rules.py)
├── _SyllableRuleValidatorPython: pure Python fallback
└── SyllableRuleValidator (from syllable_rules_c): Cython-optimized, auto-selected
At runtime, the Cython version is used if it is available, giving a roughly 8x speedup over the pure Python fallback.
Validation Pipeline
Layer Flow
           +-------------------+
           |    Input Text     |
           +---------+---------+
                     |
                     v
+------------------------------------------+
| Layer 1: SyllableRuleValidator           |
|                                          |
| Structural validation (syllable_rules)   |
|   - 22+ phonotactic rules                |
|   - No dictionary needed                 |
|   - Returns True / False                 |
+--------------------+---------------------+
                     |
                     | if valid
                     v
+------------------------------------------+
| Layer 1b: SyllableValidator              |
|                                          |
| Dictionary lookup (validators.py)        |
|   - Uses SyllableRepository              |
|   - Frequency threshold check            |
|   - Generates suggestions                |
+------------------------------------------+
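The gating between the two layers can be sketched as follows. This is a simplified illustration, not the library's implementation: `check_syllables`, the plain-dict frequency table, `min_frequency`, and the toy `starts_with_consonant` rule are hypothetical stand-ins for SyllableRuleValidator, SyllableRepository, and the frequency threshold check.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def check_syllables(
    syllables: Iterable[str],
    is_structurally_valid: Callable[[str], bool],
    frequencies: Dict[str, int],
    min_frequency: int = 1,
) -> List[Tuple[str, str]]:
    """Label each syllable as 'invalid_structure', 'unknown', or 'ok'."""
    results = []
    for syllable in syllables:
        # Layer 1: rule-based check, no dictionary involved.
        if not is_structurally_valid(syllable):
            results.append((syllable, "invalid_structure"))
            continue
        # Layer 1b: only structurally valid syllables reach the dictionary.
        if frequencies.get(syllable, 0) < min_frequency:
            results.append((syllable, "unknown"))
        else:
            results.append((syllable, "ok"))
    return results


def starts_with_consonant(s: str) -> bool:
    # Toy Layer 1 rule: first character must be a consonant (U+1000-U+1021).
    return bool(s) and "\u1000" <= s[0] <= "\u1021"


freqs = {"\u1019\u103c\u1014\u103a": 120}  # made-up count for မြန်
words = ["\u1019\u103c\u1014\u103a", "\u103c\u1019\u1014\u103a", "\u1019\u102c"]
print([status for _, status in check_syllables(words, starts_with_consonant, freqs)])
# ['ok', 'invalid_structure', 'unknown']
```

The point of the ordering is cost: structurally impossible syllables are rejected before any repository lookup happens.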
SyllableRuleValidator
Initialization
from myspellchecker.core.syllable_rules import SyllableRuleValidator
# Default settings
validator = SyllableRuleValidator()
# Custom settings
validator = SyllableRuleValidator(
    max_syllable_length=15,        # Maximum valid syllable length
    corruption_threshold=3,        # Max consecutive identical chars
    strict=True,                   # Enable strict mode
    allow_extended_myanmar=False,  # Only standard Burmese
)
Basic Usage
from myspellchecker.core.syllable_rules import SyllableRuleValidator
validator = SyllableRuleValidator()
# Valid syllables
validator.validate("မြန်")  # True
validator.validate("မာ")  # True
validator.validate("ကျွန်")  # True
# Invalid syllables
validator.validate("ြမန်")  # False - medial without a preceding consonant
validator.validate("")  # False - empty
validator.validate("ကကကက")  # False - corruption (4 identical)
Validation Rules (22 Checks)
The validate() method performs 22 checks in order:
Phase 1: Basic Checks
| # | Check | Purpose |
|---|-------|---------|
| 1 | Zero-width character rejection | Detect encoding issues |
| 2 | Corruption check | Detect data corruption (length, repetition) |
| 3 | Start character check | Must start with a consonant or independent vowel |
| 4 | Base character validation | Verify a valid Myanmar base character |
Phase 2: Structure Rules
| # | Check | Purpose |
|---|-------|---------|
| 5 | Independent vowel rules | Independent vowels cannot take medials or vowels |
| 6 | Structure sanity | Medial sequences, ordering, Visarga position |
| 7 | Kinzi pattern validation | Validate င်္ sequences |
| 8 | Asat predecessor check | Asat must follow a consonant |
Phase 3: Compatibility Rules
| # | Check | Purpose |
|---|-------|---------|
| 9 | Unexpected consonant detection | Multiple unconnected consonants |
| 10 | Medial compatibility | Consonant-medial phonotactics |
| 11 | Medial-vowel compatibility | Medial + vowel combination validity |
| 12 | Tone rules | Stop finals, tone conflicts |
| 13 | Virama usage check | Stacking must not end the syllable |
| 14 | Vowel combinations (digraphs) | Valid multi-vowel patterns |
| 15 | Vowel exclusivity | Upper vs. lower vowel slots |
| 16 | E vowel combinations | ေ combination restrictions and position |
| 17 | Great Sa rules | ဿ usage restrictions |
| 18 | Anusvara compatibility | ံ vowel restrictions |
| 19 | Asat count | Maximum asat characters per syllable |
| 20 | Double diacritics | No duplicate diacritics |
| 21 | Tall A / Aa exclusivity | ါ and ာ are mutually exclusive |
| 22 | Dot below position | Dot below must follow a valid base |
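Because the checks run in a fixed order, validation can stop at the first failure, which is also what makes the failing rule identifiable. The sketch below shows that short-circuit structure with simplified versions of checks 1 to 3 only; the function names, the zero-width character set, and the two thresholds (mirroring the custom-settings example above) are assumptions, not the library's internals.

```python
from typing import Callable, List, Optional, Tuple

ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}  # ZWSP, ZWNJ, ZWJ, BOM
MAX_SYLLABLE_LENGTH = 15  # assumed, mirrors max_syllable_length above
CORRUPTION_THRESHOLD = 3  # assumed, mirrors corruption_threshold above


def no_zero_width(s: str) -> bool:
    # Check 1: invisible characters usually signal encoding problems.
    return not any(ch in ZERO_WIDTH for ch in s)


def not_corrupted(s: str) -> bool:
    # Check 2: non-empty, bounded length, no over-long run of one character.
    if not s or len(s) > MAX_SYLLABLE_LENGTH:
        return False
    run = 1
    for prev, cur in zip(s, s[1:]):
        run = run + 1 if cur == prev else 1
        if run > CORRUPTION_THRESHOLD:
            return False
    return True


def valid_start(s: str) -> bool:
    # Check 3 (simplified): consonant U+1000-U+1021, independent vowel
    # U+1023-U+102A, or Great Sa U+103F.
    if not s:
        return False
    cp = ord(s[0])
    return 0x1000 <= cp <= 0x1021 or 0x1023 <= cp <= 0x102A or cp == 0x103F


CHECKS: List[Tuple[str, Callable[[str], bool]]] = [
    ("zero_width", no_zero_width),
    ("corruption", not_corrupted),
    ("start_char", valid_start),
]


def first_failure(syllable: str) -> Optional[str]:
    """Return the name of the first failing check, or None if all pass."""
    for name, check in CHECKS:
        if not check(syllable):
            return name  # short-circuit: later checks never run
    return None
```

In this structure, optional checks (for example strict-mode rules) could be appended to `CHECKS` without changing the driver loop.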
Strict Mode Additional Checks
When strict=True, these additional checks are applied:
| Check | Purpose |
|-------|---------|
| Virama count | Max 1 virama (2 with Kinzi) |
| Anusvara + Asat conflict | Incompatible combination |
| Asat before vowel | Invalid sequence |
| Tone strictness | Max 1 tone mark per syllable |
| Tone position | Tone marks must be at the end |
| Character scope | Only core Myanmar characters |
| Diacritic uniqueness | No duplicate medials/vowels |
| One final rule | Max 1 final element |
| Strict Kinzi | Nga + Virama needs Asat |
| Virama ordering | Virama before medials |
| Pat Sint validity | Stacking rules (Vagga logic) |
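Two of the strict-mode rules are plain counting constraints. A sketch under stated assumptions: the function names are hypothetical, Kinzi is detected as the literal sequence Nga + Asat + Virama, and only Dot Below (U+1037) and Visarga (U+1038) are counted as tone marks, so that Anusvara + Dot Below (as in ခံ့) is not rejected. That last choice is an assumption and is narrower than the U+1036 to U+1038 range in the character table on this page.

```python
KINZI = "\u1004\u103a\u1039"  # Nga + Asat + Virama
VIRAMA = "\u1039"
TONE_MARKS = {"\u1037", "\u1038"}  # Dot Below, Visarga (assumed set)


def strict_virama_count_ok(syllable: str) -> bool:
    # Max 1 virama, or 2 when the syllable contains Kinzi
    # (the Kinzi sequence itself contributes one virama).
    limit = 2 if KINZI in syllable else 1
    return syllable.count(VIRAMA) <= limit


def strict_tone_count_ok(syllable: str) -> bool:
    # At most one tone mark per syllable.
    return sum(ch in TONE_MARKS for ch in syllable) <= 1
```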
Myanmar Syllable Structure
Valid Myanmar syllables follow this pattern:
Consonant + [Medial(s)] + [Vowel] + [Tone] + [Final]
Character Categories
| Component | Unicode range | Examples |
|-----------|---------------|----------|
| Consonants | U+1000-U+1021 | က ခ ဂ ဃ င စ ဆ ဇ ဈ ည ဋ ဌ ဍ ဎ ဏ တ ထ ဒ ဓ န ပ ဖ ဗ ဘ မ ယ ရ လ ဝ သ ဟ ဠ အ |
| Medials | U+103B-U+103E | ျ (Ya), ြ (Ra), ွ (Wa), ှ (Ha) |
| Vowels | U+102B-U+1032 | ါ ာ ိ ီ ု ူ ေ ဲ |
| Tone marks | U+1036-U+1038 | ံ ့ း |
| Asat | U+103A | ် |
| Virama | U+1039 | ္ |
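The ranges in this table translate directly into a character classifier. A minimal sketch; the category names are illustrative, and any code point outside these ranges (independent vowels, Great Sa, digits, other scripts) falls into "other".

```python
from typing import List, Tuple

# (first code point, last code point, category), taken from the table.
CATEGORY_RANGES: List[Tuple[int, int, str]] = [
    (0x1000, 0x1021, "consonant"),
    (0x102B, 0x1032, "vowel"),
    (0x1036, 0x1038, "tone"),
    (0x1039, 0x1039, "virama"),
    (0x103A, 0x103A, "asat"),
    (0x103B, 0x103E, "medial"),
]


def categorize(ch: str) -> str:
    cp = ord(ch)
    for first, last, category in CATEGORY_RANGES:
        if first <= cp <= last:
            return category
    return "other"


print([categorize(c) for c in "\u1019\u103c\u1014\u103a"])  # မြန်
# ['consonant', 'medial', 'consonant', 'asat']
```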
Medials must appear in canonical order (Ya, Ra, Wa, Ha). The valid medial sequences are:
VALID_MEDIAL_SEQUENCES = {
    # Four-medial (Ya+Ra+Wa+Ha)
    "ျြွှ",
    # Three-medial combinations
    "ျြွ", "ျြှ", "ျွှ", "ြွှ",
    # Two-medial combinations (canonical order: Ya > Ra > Wa > Ha)
    "ျြ", "ျွ", "ျှ", "ြွ", "ြှ", "ွှ",
    # Single medials
    "ျ", "ြ", "ွ", "ှ",
}
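The 15 entries are exactly the non-empty subsets of Ya, Ra, Wa, Ha kept in canonical order, so the set can be generated rather than listed, and a syllable's medial run can be checked against it. A sketch assuming the medial run is the block of U+103B to U+103E characters immediately after the first character; `medial_run` and `medial_sequence_ok` are hypothetical helpers.

```python
from itertools import combinations

MEDIALS = "\u103b\u103c\u103d\u103e"  # Ya, Ra, Wa, Ha in canonical order

# combinations() preserves input order, so every generated sequence is canonical.
VALID_MEDIAL_SEQUENCES = {
    "".join(combo)
    for size in range(1, len(MEDIALS) + 1)
    for combo in combinations(MEDIALS, size)
}


def medial_run(syllable: str) -> str:
    """Return the consecutive medials that follow the first character."""
    run = []
    for ch in syllable[1:]:
        if ch not in MEDIALS:
            break
        run.append(ch)
    return "".join(run)


def medial_sequence_ok(syllable: str) -> bool:
    run = medial_run(syllable)
    # No medials at all is fine; otherwise the run must be a known sequence.
    return run == "" or run in VALID_MEDIAL_SEQUENCES
```

Out-of-order or duplicated medials (for example Wa before Ya, or Ya twice) produce runs that are not in the set and are rejected.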
Not all consonants can take all medials:
# Medial Ya (ျ) compatible consonants
COMPATIBLE_YA = {"က", "ခ", "ဂ", "ဃ", "င", "စ", "ဆ", "ဇ", "ည", ...}
# Medial Ra (ြ) compatible consonants
COMPATIBLE_RA = {"က", "ခ", "ဂ", "ဃ", "င", "စ", "ဆ", "ဇ", ...}
# Medial Wa (ွ) - broadly compatible
COMPATIBLE_WA = {"က", "ခ", "ဂ", "ဃ", "င", "စ", "ဆ", ...}
# Medial Ha (ှ) - only sonorants
COMPATIBLE_HA = {"မ", "န", "ည", "ဏ", "လ", "ရ", "ဝ", "ယ"}
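A table-driven version of the consonant-medial check might look like the sketch below. Only the Ha set is copied from above, because the Ya, Ra, and Wa sets are elided on this page; medials missing from the table are treated as unrestricted, which is an assumption of this sketch, and `medials_compatible` is a hypothetical name.

```python
from typing import Dict, FrozenSet

MEDIAL_HA = "\u103e"

# Medial -> consonants allowed to carry it. The Ha entry is COMPATIBLE_HA;
# other medials would be added the same way.
COMPATIBILITY: Dict[str, FrozenSet[str]] = {
    MEDIAL_HA: frozenset({"မ", "န", "ည", "ဏ", "လ", "ရ", "ဝ", "ယ"}),
}


def medials_compatible(syllable: str) -> bool:
    if not syllable:
        return False
    base = syllable[0]
    for ch in syllable[1:]:
        if not ("\u103b" <= ch <= "\u103e"):
            break  # medials end at the first non-medial character
        allowed = COMPATIBILITY.get(ch)
        if allowed is not None and base not in allowed:
            return False
    return True
```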
Special Patterns
Kinzi (င်္)
Kinzi is a nasalization marker in Pali/Sanskrit loanwords:
# Valid Kinzi pattern: Nga + Asat + Virama + Consonant
kinzi_seq = "င" + "်" + "္"  # U+1004 + U+103A + U+1039
# Example: သင်္ဘော (ship)
validator.validate("သင်္ဘော")  # True
# Invalid: Kinzi without a following consonant
validator.validate("သင်္")  # False
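The two examples above reduce to one rule: every Kinzi sequence must be followed by a consonant. A minimal sketch of that rule alone (hypothetical `kinzi_ok`, with consonants taken as U+1000 to U+1021); the library's check 7 may enforce more than this.

```python
KINZI = "\u1004\u103a\u1039"  # Nga + Asat + Virama


def is_consonant(ch: str) -> bool:
    return "\u1000" <= ch <= "\u1021"


def kinzi_ok(syllable: str) -> bool:
    """Every Kinzi occurrence must be followed by a consonant."""
    start = syllable.find(KINZI)
    while start != -1:
        nxt = start + len(KINZI)
        if nxt >= len(syllable) or not is_consonant(syllable[nxt]):
            return False
        start = syllable.find(KINZI, nxt)
    return True
```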
Stacking (Pat Sint)
Consonant stacking follows Vagga (row) rules:
# Valid: same-row stacking
validator.validate("က္က")  # True - Ka row
validator.validate("မ္မ")  # True - Pa row (မ is its nasal)
# Pali/Sanskrit patterns
validator.validate("က္ခ")  # True - Ka row, unaspirated over aspirated (loanword pattern)
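Vagga logic can be expressed by locating each consonant's row and column. The sketch below applies the traditional Pali stacking rule (first member over the first or second, third over the third or fourth, nasal over any member of its row); treating that as what "Vagga logic" covers is an assumption. Consonants outside the five rows (ya, ra, la, wa, sa, ha, and so on) are simply rejected here, and `vagga_position` and `stack_ok` are hypothetical names.

```python
from typing import Optional, Tuple

# First code point of each five-letter row (vagga): Ka, Ca, Tta, Ta, Pa.
ROW_STARTS = (0x1000, 0x1005, 0x100B, 0x1010, 0x1015)

# Upper-consonant column -> columns allowed beneath it.
ALLOWED_LOWER = {0: {0, 1}, 2: {2, 3}, 4: {0, 1, 2, 3, 4}}


def vagga_position(ch: str) -> Optional[Tuple[int, int]]:
    """Return (row, column) for a vagga consonant, else None."""
    cp = ord(ch)
    for row, start in enumerate(ROW_STARTS):
        if start <= cp < start + 5:
            return row, cp - start
    return None


def stack_ok(upper: str, lower: str) -> bool:
    up, low = vagga_position(upper), vagga_position(lower)
    if up is None or low is None or up[0] != low[0]:
        return False  # non-vagga or cross-row stacks are rejected in this sketch
    return low[1] in ALLOWED_LOWER.get(up[1], set())
```

For example, `stack_ok("မ", "ဘ")` accepts the stack in ကမ္ဘာ because မ is the nasal of the Pa row, while an aspirated consonant such as ခ is never allowed on top.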
Great Sa (ဿ)
The doubled Sa conjunct has special rules:
# Great Sa cannot take medials or stack
validator.validate("ဿ")  # True
validator.validate("ဿွ")  # False - no medials
validator.validate("ဿ္က")  # False - no stacking
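Both Great Sa restrictions come down to what may directly follow ဿ. A minimal sketch assuming only the two rules shown above (no medial U+103B to U+103E and no Virama right after U+103F); `great_sa_ok` is hypothetical.

```python
GREAT_SA = "\u103f"
VIRAMA = "\u1039"


def great_sa_ok(syllable: str) -> bool:
    for i, ch in enumerate(syllable):
        if ch != GREAT_SA or i + 1 == len(syllable):
            continue
        nxt = syllable[i + 1]
        if "\u103b" <= nxt <= "\u103e" or nxt == VIRAMA:
            return False  # Great Sa takes no medials and does not stack
    return True
```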
Integration with SyllableValidator
The rule validator integrates with the full validation pipeline:
from myspellchecker.core.validators import SyllableValidator
from myspellchecker.core.syllable_rules import SyllableRuleValidator
# provider, segmenter, symspell, and config are assumed to be created earlier
# SyllableValidator uses SyllableRuleValidator internally
validator = SyllableValidator.create(
    repository=provider,
    segmenter=segmenter,
    symspell=symspell,
    config=config,
    syllable_rule_validator=SyllableRuleValidator(strict=True),
)
# validate() returns errors with suggestions
errors = validator.validate("invalid text here")
| Implementation | Speed | Notes |
|----------------|-------|-------|
| Pure Python | ~80 μs/syllable | Fallback |
| Cython | ~10 μs/syllable | ~8x faster |
Check Implementation
from myspellchecker.core.syllable_rules import _USING_CYTHON
print(f"Using Cython: {_USING_CYTHON}")
Configuration Options
Strict vs Lenient Mode
# Strict mode (default) - for formal documents
validator = SyllableRuleValidator(strict=True)
# Lenient mode - for informal text, transliterations
validator = SyllableRuleValidator(strict=False)
Strict mode enforces:
Pali/Sanskrit stacking rules (Vagga logic)
Canonical character ordering
Stricter tone mark rules
Core Myanmar characters only
Extended Myanmar
# Allow Extended Myanmar blocks (Shan, Mon, etc.)
validator = SyllableRuleValidator(allow_extended_myanmar=True)
# Standard Burmese only (default)
validator = SyllableRuleValidator(allow_extended_myanmar=False)
Error Messages
validate() only returns True or False. To find out which check failed while debugging, call the individual check methods directly:
from myspellchecker.core.syllable_rules import SyllableRuleValidator
# For debugging, check individual rules
validator = SyllableRuleValidator()
syllable = "ြမန်"  # Invalid: starts with a medial
# These methods can help identify the issue
validator._check_start_char(syllable)  # False - validate() fails here
validator._check_medial_compatibility(syllable)  # Never reached by validate()
See Also