Syllable validation is the first layer in the pipeline. It checks Myanmar character sequences against structural rules before attempting more expensive word or context analysis.
Why Start from Syllables?
Myanmar has no spaces between words, so you can’t split text into words without a dictionary. But you can split it into syllables using regex alone. Since every word is made of one or more syllables, checking syllable structure first catches the majority of errors cheaply.
Traditional spell checkers would try to segment text into words first, which is:
- Expensive computationally
- Error-prone on misspelled text
- Wasteful when obvious typos exist
mySpellChecker inverts this:
- Break text into syllables first (fast, deterministic)
- Validate syllables (catches 90%+ of typos immediately)
- Only assemble valid syllables into words for deeper checking
Syllable Anatomy
A Myanmar syllable follows this pattern:
Syllable = Consonant + [Stacked]* + [Medial]* + [Vowel]* + [Final]*
| Component | Required | Position | Examples |
|---|---|---|---|
| Consonant | Yes | Initial | က, ခ, မ |
| Stacked | No | After consonant | ္က, ္ခ |
| Medial | No | After consonant | ျ, ြ, ွ, ှ |
| Vowel | No | Various | ါ, ိ, ု, ေ |
| Final | No | End | ်, ံ, း |
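The pattern above can be written as a regular expression with one named group per component. This is a minimal sketch for illustration, not mySpellChecker's actual implementation. The name `decompose` and the code point ranges are assumptions. The sketch also accepts a final consonant carrying asat (as in ကန်) and the dot-below mark U+1037, neither of which appears in the table.

```python
import re

_C = "[\u1000-\u1021]"  # consonants က..အ (basic Myanmar block only)

# One named group per component of the syllable pattern.
SYLLABLE = re.compile(
    f"(?P<consonant>{_C})"
    f"(?P<stacked>(?:\u1039{_C})*)"   # ္ (virama) followed by a consonant
    "(?P<medial>[\u103b-\u103e]*)"    # ျ ြ ွ ှ
    "(?P<vowel>[\u102b-\u1032]*)"     # dependent vowel signs
    f"(?P<final>(?:{_C}\u103a)?[\u103a\u1036\u1037\u1038]*)"
)


def decompose(syllable):
    """Return the components of one syllable as a dict, or None if it does not fit."""
    match = SYLLABLE.fullmatch(syllable)
    return match.groupdict() if match else None


print(decompose("\u1019\u103c\u1014\u103a"))  # မြန်
```

For မြန် (U+1019 U+103C U+1014 U+103A) the groups are consonant မ, medial ြ, and final န်, with the stacked and vowel groups empty.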
Simple Syllables
Note: Tonal information is omitted from these transcriptions for simplicity. Standard Burmese has four tones (low, high, creaky, checked).
Consonant only: က = "ka", မ = "ma", သ = "tha"
Consonant + Vowel: ကာ = "ka", ကိ = "ki", ကု = "ku", ကေ = "kay"
Consonant + Asat: က် = "k" (inherent vowel killed), မ် = "m" (inherent vowel killed)
Checked syllable: ကတ် = "kat" (final stop, glottal closure)
Single medial: ကျ = "kya", ကြ = "kra" (merged to kya in modern Burmese), ကွ = "kwa"
Combined: ကြွ = "krwa", ကျွ = "kywa"
Ha-htoe: နှ = "hna", မှ = "hma", လှ = "hla" (voiceless sonorants)
Medial order (Unicode canonical order, UTN #11):
1. ျ (ya-pin/medial ya, U+103B)
2. ြ (ya-yit/medial ra, U+103C)
3. ွ (wa-hswe/medial wa, U+103D)
4. ှ (ha-htoe/medial ha, U+103E) - always last
Valid: ကြွ (ြ before ွ) | Invalid: ကွြ (wrong order)
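Because the four medials occupy consecutive code points in canonical order (U+103B to U+103E), a misordered run can be repaired by sorting it. The helper below is a hypothetical sketch, not part of the mySpellChecker API, and it only reorders medials; it does not remove duplicates or incompatible combinations.

```python
import re

# ျ ြ ွ ှ are U+103B..U+103E, so code point order equals canonical order.
_MEDIAL_RUN = re.compile("[\u103b-\u103e]+")


def reorder_medials(text):
    """Sort every run of consecutive medials into canonical order."""
    return _MEDIAL_RUN.sub(lambda m: "".join(sorted(m.group())), text)


fixed = reorder_medials("\u1000\u103d\u103c")  # က + ွ + ြ (wrong order)
print(fixed == "\u1000\u103c\u103d")           # က + ြ + ွ
```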
Common Syllable Patterns
| Pattern | Example | Phonetic |
|---|---|---|
| CV (Consonant + Vowel) | မာ, နေ, သူ | ma, ne, thu |
| CVC (Consonant + Vowel + Consonant) | ကန်, သင်, ကိန်း | kan, thin, kein: |
| CMV (Consonant + Medial + Vowel) | မြေ, ကျော်, ကြီး | mye, kyaw, kyi: |
| Complex | ကြောင်, မြန်မာ | kyaung, myanma |
How It Works
Syllable Segmentation
Text is broken into syllables using Myanmar orthographic rules:
text = "မြန်မာနိုင်ငံ"
# Segments to: ["မြန်", "မာ", "နိုင်", "ငံ"]
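One common way to segment with regex alone is to place a boundary before every consonant that starts a new syllable. The sketch below assumes that a consonant starts a syllable unless it is stacked under the previous one or carries asat or virama itself. The function name `split_syllables` is hypothetical, and the library's own segmenter may use different rules; independent vowels, digits, and non-Myanmar text are not handled here.

```python
import re

# Zero-width boundary: not preceded by virama (U+1039), and followed by a
# consonant that does not carry asat (U+103A) or virama.
_BOUNDARY = re.compile("(?<!\u1039)(?=[\u1000-\u1021](?![\u103a\u1039]))")


def split_syllables(text):
    """Split Myanmar text at syllable boundaries (requires Python 3.7+)."""
    return [part for part in _BOUNDARY.split(text) if part]


# မြန်မာနိုင်ငံ written with explicit code points
text = "\u1019\u103c\u1014\u103a\u1019\u102c\u1014\u102d\u102f\u1004\u103a\u1004\u1036"
print(split_syllables(text))
```

With this input the function returns the four syllables listed above. A kinzi sequence such as မင်္ဂလာ stays attached to the syllable that carries it, giving မင်္ဂ and လာ.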
Rule-Based Validation
Each syllable is checked against five structural rules:
Rule 1: Must start with consonant
"ကာ" → Valid | "ာက" → Invalid (starts with vowel)
Rule 2: Medials in correct order (Ya < Ra < Wa < Ha)
"ကြွ" → Valid (ြ before ွ) | "ကွြ" → Invalid (wrong order)
Rule 3: No duplicate medials
"ကြ" → Valid | "ကြြ" → Invalid (duplicate ြ)
Rule 4: Vowel compatibility
"ကိ" → Valid | "ကိီ" → Invalid (both are above vowels)
"ကု" → Valid | "ကုူ" → Invalid (both are below vowels)
Rule 5: Finals at end position
"မြန်" → Valid (် at end) | Finals (်, း, ံ) must not precede non-finals
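The five rules can be sketched as independent checks over the characters of one syllable. This is an illustrative approximation rather than the code behind `SyllableRuleValidator`: the function name `check_structure` is hypothetical, dot-below (U+1037) is treated as a final mark by assumption, and kinzi and stacked consonants are deliberately left out.

```python
CONSONANTS = {chr(cp) for cp in range(0x1000, 0x1022)}  # က..အ
MEDIALS = set("\u103b\u103c\u103d\u103e")                # ျ ြ ွ ှ
ABOVE_VOWELS = set("\u102d\u102e")                       # ိ ီ
BELOW_VOWELS = set("\u102f\u1030")                       # ု ူ
FINAL_MARKS = set("\u103a\u1036\u1037\u1038")            # ် ံ ့ း


def check_structure(syllable):
    """Return a list of violated rules; an empty list means all five rules pass."""
    problems = []
    if not syllable or syllable[0] not in CONSONANTS:
        problems.append("rule 1: must start with a consonant")
    medials = [ch for ch in syllable if ch in MEDIALS]
    if medials != sorted(medials):
        problems.append("rule 2: medials out of order")
    if len(medials) != len(set(medials)):
        problems.append("rule 3: duplicate medial")
    above = sum(ch in ABOVE_VOWELS for ch in syllable)
    below = sum(ch in BELOW_VOWELS for ch in syllable)
    if above > 1 or below > 1:
        problems.append("rule 4: incompatible vowels")
    # Rule 5: once a final mark appears, only final marks may follow it.
    seen_final = False
    for ch in syllable:
        if ch in FINAL_MARKS:
            seen_final = True
        elif seen_final:
            problems.append("rule 5: final mark before a non-final")
            break
    return problems


print(check_structure("\u1000\u103d\u103c"))  # က + ွ + ြ
```

Returning a list of violations instead of a single boolean makes it easier to report why a syllable was rejected.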
Dictionary Lookup
Valid syllable structures are checked against the syllable dictionary:
# Syllable exists in dictionary
"မြန်" → Valid
# Valid structure but not in dictionary
"ဆြန်" → May be invalid (flagged for review)
Stacked Consonants
Kinzi (special stacking with င):
မင်္ဂလာ = /mingala/ (မ + င် + ္ + ဂ + လ + ာ)
Regular stacking (using virama ္):
သ္တ = /sta/ (သ + ္ + တ)
ဗ္ဗ = /bba/ (ဗ + ္ + ဗ)
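At the code point level, kinzi is the fixed sequence U+1004 U+103A U+1039, while regular stacking is a consonant, the virama U+1039, and another consonant. The sketch below detects both; the helper names `has_kinzi` and `stacked_pairs` are hypothetical and are not part of the library.

```python
import re

KINZI = "\u1004\u103a\u1039"  # င + ် (asat) + ္ (virama)

# Lookahead so that chains such as C + ္ + C + ္ + C report every pair.
_STACK = re.compile("(?=([\u1000-\u1021])\u1039([\u1000-\u1021]))")


def has_kinzi(text):
    """True if the text contains the kinzi sequence."""
    return KINZI in text


def stacked_pairs(text):
    """Return (upper, lower) consonant pairs joined directly by virama."""
    return _STACK.findall(text)


mingala = "\u1019\u1004\u103a\u1039\u1002\u101c\u102c"  # မင်္ဂလာ
print(has_kinzi(mingala), stacked_pairs("\u101e\u1039\u1010"))  # သ္တ
```

In a kinzi sequence the virama follows asat rather than a consonant, so `stacked_pairs` does not report it as a regular stack.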
Configuration
Enable/Disable Syllable Validation
from myspellchecker import SpellChecker
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.constants import ValidationLevel
# Validation level is specified per-check, not in configuration
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(provider=provider)
text = "မြန်မာနိုင်ငံ"  # Sample input
# Syllable-only validation (fastest)
result = checker.check(text, level=ValidationLevel.SYLLABLE)
# Word validation includes syllable validation
result = checker.check(text, level=ValidationLevel.WORD)
Syllable Rule Configuration
from myspellchecker.core.syllable_rules import SyllableRuleValidator
# Custom rule validator
validator = SyllableRuleValidator(
    max_syllable_length=15,        # Max characters per syllable (default: 15)
    corruption_threshold=3,        # Max consecutive identical chars (default: 3)
    strict=True,                   # Enforce strict Pali/Sanskrit rules (default: True)
    allow_extended_myanmar=False,  # Accept Extended-A/B blocks (default: False)
)
Syllable Error Types
Invalid Structure
Syllable doesn’t follow Myanmar orthographic rules:
result = checker.check("ကက") # Invalid: double consonant without medial/vowel
# Error: SyllableError with error_type=ErrorType.SYLLABLE
Unknown Syllable
Valid structure but not in dictionary:
result = checker.check("ဆြန်") # Valid structure, unknown syllable
# Error: SyllableError with suggestions from similar syllables
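The difference between the two error types comes down to which check fails first: the structural check or the dictionary lookup. A minimal sketch of that decision follows, with hypothetical names (`classify`, `starts_with_consonant`) and a toy in-memory set standing in for the syllable dictionary.

```python
def classify(syllable, known_syllables, is_well_formed):
    """Return 'invalid_structure', 'unknown', or 'valid' for one syllable."""
    if not is_well_formed(syllable):
        return "invalid_structure"
    # Set membership is an average-case O(1) hash lookup.
    if syllable in known_syllables:
        return "valid"
    return "unknown"


def starts_with_consonant(syllable):
    """Toy structural check (rule 1 only), used here for illustration."""
    return bool(syllable) and "\u1000" <= syllable[0] <= "\u1021"


known = frozenset({"\u1019\u103c\u1014\u103a", "\u1019\u102c"})  # မြန်, မာ
print(classify("\u1006\u103c\u1014\u103a", known, starts_with_consonant))  # ဆြန်
```

Passing the structural check in as a function keeps the lookup step independent of whichever rule set is in use.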
Common error with similar-looking medials:
# ျ (ya-pin) vs ြ (ya-yit) confusion
result = checker.check("ကျြောင်") # Incompatible medials (both ya-pin and ya-yit)
# Suggestion: "ကြောင်"
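One plausible way to produce a suggestion like this is to try small edits on the two look-alike medials and keep only the results found in the dictionary. The sketch below is an assumption about how such candidates could be generated, not a description of the library's suggestion algorithm; `medial_candidates` is a hypothetical name.

```python
YA_PIN, YA_YIT = "\u103b", "\u103c"  # ျ and ြ


def medial_candidates(syllable, known_syllables):
    """Suggest known syllables reachable by dropping or swapping ျ / ြ."""
    candidates = []
    for i, ch in enumerate(syllable):
        if ch not in (YA_PIN, YA_YIT):
            continue
        other = YA_YIT if ch == YA_PIN else YA_PIN
        dropped = syllable[:i] + syllable[i + 1:]
        swapped = syllable[:i] + other + syllable[i + 1:]
        for variant in (dropped, swapped):
            if variant in known_syllables and variant not in candidates:
                candidates.append(variant)
    return candidates


known = {"\u1000\u103c\u1031\u102c\u1004\u103a"}    # ကြောင်
typo = "\u1000\u103b\u103c\u1031\u102c\u1004\u103a"  # ကျြောင်
print(medial_candidates(typo, known))
```

Here the only surviving candidate is ကြောင်, obtained by dropping the ya-pin.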
Performance
| Metric | Value |
|---|---|
| Speed | Very Fast |
| Time Complexity | O(n) where n = syllable count |
| Lookup Complexity | O(1) per syllable |
Syllable validation is the fastest layer in the pipeline. Each syllable is validated independently with O(1) dictionary lookups, making it suitable for real-time typing feedback.
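The linear cost follows from a single pass over the syllables with one hash lookup each. The sketch below illustrates that pass with a plain set; `find_unknown` is a hypothetical helper, and reporting offsets in code points is an assumption that may not match how the library computes `error.position`.

```python
def find_unknown(syllables, known_syllables):
    """Return (offset, syllable) for every syllable missing from the set."""
    unknown = []
    offset = 0  # running offset in code points from the start of the text
    for syllable in syllables:
        if syllable not in known_syllables:  # O(1) average per lookup
            unknown.append((offset, syllable))
        offset += len(syllable)
    return unknown


known = {"\u1019\u103c\u1014\u103a", "\u1019\u102c"}                 # မြန်, မာ
syllables = ["\u1019\u103c\u1014\u103a", "\u1006\u103c\u1014\u103a"]  # မြန်, ဆြန်
print(find_unknown(syllables, known))
```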
API Reference
Using SpellChecker for Syllable Validation
from myspellchecker import SpellChecker
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.constants import ValidationLevel
provider = SQLiteProvider(database_path="path/to/dictionary.db")
checker = SpellChecker(provider=provider)
# Validate text at syllable level (default)
result = checker.check("မြန်မာ", level=ValidationLevel.SYLLABLE)
# Check for syllable-level errors
for error in result.errors:
    print(f"Error at position {error.position}: {error.text}")
    print(f"Suggestions: {error.suggestions}")
# Check if text is valid at syllable level
print(f"Has errors: {result.has_errors}")
Note: Direct instantiation of SyllableValidator requires a DI container setup.
For most use cases, use SpellChecker.check() instead.
SyllableRuleValidator
from myspellchecker.core.syllable_rules import SyllableRuleValidator
rule_validator = SyllableRuleValidator()
# Check if syllable follows structural rules (returns bool)
is_valid = rule_validator.validate("မြန်") # True
# Invalid syllable structures
is_valid = rule_validator.validate("ာက") # False - starts with vowel
is_valid = rule_validator.validate("ကွြ") # False - wrong medial order
is_valid = rule_validator.validate("ကိီ") # False - incompatible vowels
Common Patterns
Real-Time Validation
from myspellchecker import SpellChecker
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.constants import ValidationLevel
def validate_realtime(text: str) -> dict:
    """Fast validation for typing feedback."""
    provider = SQLiteProvider(database_path="path/to/dictionary.db")
    checker = SpellChecker(provider=provider)
    # Use syllable-level validation for fastest response
    result = checker.check(text, level=ValidationLevel.SYLLABLE)
    return {
        "valid": not result.has_errors,
        "errors": [
            {"position": e.position, "text": e.text}
            for e in result.errors
        ]
    }
Syllable-Only Suggestions
from myspellchecker import SpellChecker
from myspellchecker.providers import SQLiteProvider
from myspellchecker.core.constants import ValidationLevel
def get_syllable_suggestions(syllable: str) -> list:
    """Get suggestions for a single syllable."""
    provider = SQLiteProvider(database_path="path/to/dictionary.db")
    checker = SpellChecker(provider=provider)
    # Use syllable-level validation
    result = checker.check(syllable, level=ValidationLevel.SYLLABLE)
    if not result.has_errors:
        return []  # Already valid
    # Return suggestions from first error
    return result.errors[0].suggestions if result.errors else []
Troubleshooting
Issue: Valid syllables marked as errors
Cause: Syllable not in dictionary
Solution: Add to custom dictionary or update database:
myspellchecker build --input additional_syllables.txt --output dictionary.db --incremental
Issue: Slow syllable validation
Cause: Missing Cython extensions
Solution: Rebuild extensions:
python setup.py build_ext --inplace
Issue: Incorrect syllable segmentation
Cause: Complex stacked consonants or rare characters
Solution: Use custom segmenter or report issue:
from myspellchecker.segmenters import DefaultSegmenter
# Use strict Myanmar-only mode (no extended characters)
segmenter = DefaultSegmenter(allow_extended_myanmar=False)
syllables = segmenter.segment_syllables(text)
Next Steps