Layer 1 of the validation pipeline catches approximately 90% of spelling errors by validating Myanmar syllable structure against orthographic rules before dictionary lookup.

Overview

The SyllableRuleValidator performs 22 phonotactic checks — from zero-width character rejection through medial compatibility to tone mark rules — entirely without a dictionary. Syllables that pass structural validation then proceed to SyllableValidator for dictionary lookup and suggestion generation.

Architecture

The system uses a dual-implementation pattern:
  SyllableRuleValidator (in syllable_rules.py)
  ├── _SyllableRuleValidatorPython   (Pure Python fallback)
  └── SyllableRuleValidatorCython    (Cython optimized, auto-selected)
At runtime, the Cython version is used if available, providing ~8x speedup.

Validation Pipeline

Layer Flow

  +-------------------+
  | Input Text        |
  +---------+---------+
            |
            v
  +------------------------------------------+
  | Layer 1: SyllableRuleValidator           |
  |                                          |
  |   Structural validation (this module)    |
  |   - 22+ phonotactic rules                |
  |   - No dictionary needed                 |
  |   - Returns True/False                   |
  +--------------------+---------------------+
                       |
                       | if valid
                       v
  +------------------------------------------+
  | Layer 1b: SyllableValidator              |
  |                                          |
  |   Dictionary lookup (in validators.py)   |
  |   - Uses SyllableRepository              |
  |   - Frequency threshold check            |
  |   - Generates suggestions                |
  +------------------------------------------+
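
The two-stage flow in the diagram can be approximated as follows. This is a simplified sketch under stated assumptions: `check_syllable`, `toy_rule`, and the frequency dictionary are hypothetical stand-ins, and suggestion generation is left out.

```python
from typing import Callable, Dict


def check_syllable(
    syllable: str,
    structurally_valid: Callable[[str], bool],
    frequencies: Dict[str, int],
    min_frequency: int = 1,
) -> str:
    """Run the structural check first, then the dictionary lookup."""
    if not structurally_valid(syllable):
        return "invalid_structure"      # stage 1 failed, no lookup needed
    if frequencies.get(syllable, 0) < min_frequency:
        return "not_in_dictionary"      # stage 2: unknown or too rare
    return "ok"


def toy_rule(syllable: str) -> bool:
    # Toy structural rule: non-empty and no zero-width space.
    return bool(syllable) and "\u200b" not in syllable


toy_frequencies = {"\u1019\u102c": 120}  # Ma + Aa, with a made-up count

print(check_syllable("\u1019\u102c", toy_rule, toy_frequencies))  # ok
```

Running the cheap structural check first means the dictionary is consulted only for syllables that could plausibly be valid.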

SyllableRuleValidator

Initialization

from myspellchecker.core.syllable_rules import SyllableRuleValidator

# Default settings
validator = SyllableRuleValidator()

# Custom settings
validator = SyllableRuleValidator(
    max_syllable_length=10,      # Maximum valid syllable length
    corruption_threshold=3,       # Max consecutive identical chars
    strict=True,                  # Enable strict mode
    allow_extended_myanmar=False  # Only standard Burmese
)

Basic Usage

from myspellchecker.core.syllable_rules import SyllableRuleValidator

validator = SyllableRuleValidator()

# Valid syllables
validator.validate("မြန်")  # True
validator.validate("မာ")    # True
validator.validate("ကျွန်") # True

# Invalid syllables
validator.validate("ြမန်")  # False - medial without consonant
validator.validate("")      # False - empty
validator.validate("ကကကက") # False - corruption (4 identical)

Validation Rules (22 Checks)

The validate() method performs 22 checks in order:

Phase 1: Basic Checks

| # | Check | Purpose |
|---|-------|---------|
| 1 | Zero-width character rejection | Detect encoding issues |
| 2 | Corruption check | Detect data corruption (length, repetition) |
| 3 | Start character check | Must start with consonant or independent vowel |
| 4 | Base character validation | Verify valid Myanmar base character |
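
Checks 1 and 2 might look roughly like the sketch below. The function names and the set of zero-width code points are assumptions. The limits reuse the values from the custom-settings example (`max_syllable_length=10`, `corruption_threshold=3`), which are not necessarily the library defaults.

```python
# Assumed set of zero-width code points; the library's list may differ.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}


def has_zero_width(syllable: str) -> bool:
    """Check 1: any zero-width character marks the input as suspect."""
    return any(ch in ZERO_WIDTH for ch in syllable)


def looks_corrupted(syllable: str, max_len: int = 10, max_run: int = 3) -> bool:
    """Check 2: too long, or more than max_run identical characters in a row."""
    if len(syllable) > max_len:
        return True
    run = 1
    for prev, cur in zip(syllable, syllable[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_run:
            return True
    return False


print(looks_corrupted("\u1000" * 4))  # True: four identical characters
```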

Phase 2: Structure Rules

| # | Check | Purpose |
|---|-------|---------|
| 5 | Independent vowel rules | Independent vowels cannot take medials/vowels |
| 6 | Structure sanity | Medial sequences, ordering, Visarga position |
| 7 | Kinzi pattern validation | Validate င်္ sequences |
| 8 | Asat predecessor check | Asat must follow consonant |

Phase 3: Compatibility Rules

| # | Check | Purpose |
|---|-------|---------|
| 9 | Unexpected consonant detection | Multiple unconnected consonants |
| 10 | Medial compatibility | Consonant-medial phonotactics |
| 11 | Medial-vowel compatibility | Medial+vowel combination validity |
| 12 | Tone rules | Stop finals, tone conflicts |
| 13 | Virama usage check | Stacking must not end syllable |
| 14 | Vowel combinations (digraphs) | Valid multi-vowel patterns |
| 15 | Vowel exclusivity | Upper vs lower vowel slots |
| 16 | E vowel combinations | ေ combination restrictions |
| 17 | E vowel position | ေ must follow consonant/medial |
| 18 | Great Sa rules | ဿ usage restrictions |
| 19 | Anusvara compatibility | ံ vowel restrictions |
| 20 | Double diacritics | No duplicate diacritics |
| 21 | Tall A / Aa exclusivity | ါ and ာ are mutually exclusive |
| 22 | Dot below position | Dot below must follow valid base |

Strict Mode Additional Checks

When strict=True, these additional checks are applied:
| Check | Purpose |
|-------|---------|
| Virama count | Max 1 virama (2 with Kinzi) |
| Anusvara + Asat conflict | Incompatible combination |
| Asat before vowel | Invalid sequence |
| Tone strictness | Max 1 tone mark per syllable |
| Tone position | Tone marks must be at end |
| Character scope | Only core Myanmar characters |
| Diacritic uniqueness | No duplicate medials/vowels |
| One final rule | Max 1 final element |
| Strict Kinzi | Nga + Virama needs Asat |
| Virama ordering | Virama before medials |
| Pat Sint validity | Stacking rules (Vagga logic) |

Myanmar Syllable Structure

Valid Myanmar syllables follow this pattern:
Consonant + [Medial(s)] + [Vowel] + [Tone] + [Final]

Character Categories

| Component | Unicode Range | Examples |
|-----------|---------------|----------|
| Consonants | U+1000-U+1021 | က ခ ဂ ဃ င စ ဆ ဇ ဈ ည ဋ ဌ ဍ ဎ ဏ တ ထ ဒ ဓ န ပ ဖ ဗ ဘ မ ယ ရ လ ဝ သ ဟ ဠ အ |
| Medials | U+103B-U+103E | ျ (Ya) ြ (Ra) ွ (Wa) ှ (Ha) |
| Vowels | U+102B-U+1032 | ါ ာ ိ ီ ု ူ ေ ဲ |
| Tone marks | U+1036-U+1038 | ံ ့ း |
| Asat | U+103A | ် |
| Virama | U+1039 | ္ |
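
As a rough illustration, the ranges in the table map directly onto a character classifier. `categorize` is a hypothetical helper written for this example and is not part of the library.

```python
def categorize(ch: str) -> str:
    """Classify one character using the Unicode ranges from the table."""
    cp = ord(ch)
    if 0x1000 <= cp <= 0x1021:
        return "consonant"
    if 0x103B <= cp <= 0x103E:
        return "medial"
    if 0x102B <= cp <= 0x1032:
        return "vowel"
    if 0x1036 <= cp <= 0x1038:
        return "tone"
    if cp == 0x103A:
        return "asat"
    if cp == 0x1039:
        return "virama"
    return "other"


# Ma + medial Ra + Na + Asat (the first valid syllable from Basic Usage)
print([categorize(ch) for ch in "\u1019\u103c\u1014\u103a"])
# ['consonant', 'medial', 'consonant', 'asat']
```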

Valid Medial Sequences

VALID_MEDIAL_SEQUENCES = {
    # Four-medial (Ya+Ra+Wa+Ha)
    "ျြွှ",
    # Three-medial combinations
    "ျြွ", "ျြှ", "ျွှ", "ြွှ",
    # Two-medial combinations (canonical order: Ya > Ra > Wa > Ha)
    "ျြ", "ျွ", "ျှ", "ြွ", "ြှ", "ွှ",
    # Single medials
    "ျ", "ြ", "ွ", "ှ",
}
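
The set above is exactly the collection of non-empty subsequences of the canonical order Ya, Ra, Wa, Ha. The sketch below derives it with `itertools.combinations`; it is a way to cross-check the listing and is not claimed to be how the library builds its constant.

```python
from itertools import combinations

# Canonical medial order: Ya, Ra, Wa, Ha
CANONICAL_MEDIALS = "\u103b\u103c\u103d\u103e"

# combinations() keeps the input order, so every result is canonically ordered.
generated = {
    "".join(combo)
    for size in range(1, len(CANONICAL_MEDIALS) + 1)
    for combo in combinations(CANONICAL_MEDIALS, size)
}

print(len(generated))  # 15 = 4 singles + 6 pairs + 4 triples + 1 quadruple
```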

Medial Compatibility

Not all consonants can take all medials:
# Medial Ya (ျ) compatible consonants
COMPATIBLE_YA = {"က", "ခ", "ဂ", "ဃ", "င", "စ", "ဆ", "ဇ", "ည", ...}

# Medial Ra (ြ) compatible consonants
COMPATIBLE_RA = {"က", "ခ", "ဂ", "ဃ", "င", "စ", "ဆ", "ဇ", ...}

# Medial Wa (ွ) - broadly compatible
COMPATIBLE_WA = {"က", "ခ", "ဂ", "ဃ", "င", "စ", "ဆ", ...}

# Medial Ha (ှ) - only sonorants
COMPATIBLE_HA = {"မ", "န", "ည", "ဏ", "လ", "ရ", "ဝ", "ယ"}
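
A compatibility check against one of these sets could look like the sketch below. It uses only the Ha set, since that is the one listed in full above. `ha_medial_allowed` is a hypothetical helper, and it assumes the base consonant is the first character of the syllable.

```python
MEDIAL_HA = "\u103e"

# Same eight consonants as COMPATIBLE_HA above: Ma, Na, Nya, Nna, La, Ra, Wa, Ya
COMPATIBLE_HA = {
    "\u1019", "\u1014", "\u100a", "\u100f",
    "\u101c", "\u101b", "\u101d", "\u101a",
}


def ha_medial_allowed(syllable: str) -> bool:
    """Return False when medial Ha is attached to an incompatible base."""
    if MEDIAL_HA not in syllable:
        return True  # nothing to check
    return syllable[0] in COMPATIBLE_HA


print(ha_medial_allowed("\u1019\u103e"))  # True: Ma accepts medial Ha
print(ha_medial_allowed("\u1000\u103e"))  # False: Ka is not in the set
```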

Special Patterns

Kinzi (င်္)

Kinzi is a final Nga (င်) written as a superscript above the following consonant, found mainly in Pali/Sanskrit loanwords:
# Valid Kinzi pattern: Nga + Asat + Virama + Consonant
kinzi_seq = "င" + "်" + "္"  # U+1004 + U+103A + U+1039

# Example: သင်္ဘော (ship)
validator.validate("သင်္ဘော")  # True

# Invalid: Kinzi without following consonant
validator.validate("သင်္")  # False
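
The rule that a consonant must follow the Kinzi sequence can be sketched like this. `kinzi_well_formed` is a hypothetical helper, and the consonant test uses the U+1000-U+1021 range from the Character Categories table.

```python
KINZI = "\u1004\u103a\u1039"  # Nga + Asat + Virama


def is_consonant(ch: str) -> bool:
    return "\u1000" <= ch <= "\u1021"


def kinzi_well_formed(text: str) -> bool:
    """Every Kinzi sequence must be followed directly by a consonant."""
    start = text.find(KINZI)
    while start != -1:
        following = start + len(KINZI)
        if following >= len(text) or not is_consonant(text[following]):
            return False
        start = text.find(KINZI, following)
    return True


# Sa + Kinzi + Bha + E + Aa (the "ship" example above)
print(kinzi_well_formed("\u101e\u1004\u103a\u1039\u1018\u1031\u102c"))  # True
print(kinzi_well_formed("\u101e\u1004\u103a\u1039"))                    # False
```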

Stacking (Pat Sint)

Consonant stacking follows Vagga (row) rules:
# Valid: Same-row stacking
validator.validate("က္က")  # True - Ka row
validator.validate("မ္မ")  # True - Pa row (မ is the nasal of the Pa row)

# Pali/Sanskrit exceptions
validator.validate("က္ခ")  # True - Exception for loanwords
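
A same-row test based on the five traditional Vagga rows might look like the following. This is a sketch only: `same_row_stack` is hypothetical, it ignores Kinzi, and it does not model whatever loanword exception list the library applies.

```python
VIRAMA = "\u1039"

# The five Vagga rows occupy contiguous code points in the Myanmar block.
VAGGA_ROWS = [
    "\u1000\u1001\u1002\u1003\u1004",        # Ka row (velars)
    "\u1005\u1006\u1007\u1008\u1009\u100a",  # Ca row (palatals, both Nya forms)
    "\u100b\u100c\u100d\u100e\u100f",        # Tta row (retroflexes)
    "\u1010\u1011\u1012\u1013\u1014",        # Ta row (dentals)
    "\u1015\u1016\u1017\u1018\u1019",        # Pa row (labials)
]


def same_row_stack(text: str) -> bool:
    """True if every virama joins two consonants from the same row."""
    for i, ch in enumerate(text):
        if ch != VIRAMA:
            continue
        if i == 0 or i + 1 >= len(text):
            return False  # virama needs a character on both sides
        upper, lower = text[i - 1], text[i + 1]
        if not any(upper in row and lower in row for row in VAGGA_ROWS):
            return False
    return True


print(same_row_stack("\u1000\u1039\u1000"))  # True: Ka over Ka
print(same_row_stack("\u1000\u1039\u1019"))  # False: Ka row over Pa row
```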

Great Sa (ဿ)

The doubled Sa conjunct has special rules:
# Great Sa cannot take medials or stack
validator.validate("ဿ")    # True
validator.validate("ဿွ")   # False - no medials
validator.validate("ဿ္က")  # False - no stacking
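
The two restrictions shown above reduce to a check on the character that follows ဿ (U+103F). The helper below is a hypothetical sketch of that idea and covers only these two cases.

```python
GREAT_SA = "\u103f"
MEDIALS = {"\u103b", "\u103c", "\u103d", "\u103e"}  # Ya, Ra, Wa, Ha
VIRAMA = "\u1039"


def great_sa_ok(text: str) -> bool:
    """Reject Great Sa followed directly by a medial or a virama."""
    for i, ch in enumerate(text):
        if ch != GREAT_SA or i + 1 >= len(text):
            continue
        following = text[i + 1]
        if following in MEDIALS or following == VIRAMA:
            return False
    return True


print(great_sa_ok("\u103f"))        # True
print(great_sa_ok("\u103f\u103d"))  # False: medial Wa after Great Sa
```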

Integration with SyllableValidator

The rule validator integrates with the full validation pipeline:
from myspellchecker.core.validators import SyllableValidator
from myspellchecker.core.syllable_rules import SyllableRuleValidator

# SyllableValidator uses SyllableRuleValidator internally.
# provider, segmenter, and symspell are assumed to be created elsewhere.
validator = SyllableValidator.create(
    repository=provider,
    segmenter=segmenter,
    symspell=symspell,
    syllable_rule_validator=SyllableRuleValidator(strict=True),
)

# Validate returns errors with suggestions
errors = validator.validate("invalid text here")

Performance

| Implementation | Speed | Notes |
|----------------|-------|-------|
| Pure Python | ~80μs/syllable | Fallback |
| Cython | ~10μs/syllable | 8x faster |
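
Per-syllable timings of this kind can be measured with the standard library's `timeit`. The sketch below times a toy function. `microseconds_per_call` and `toy_validate` are hypothetical names, and in practice the validator's `validate` method would be passed in place of the toy.

```python
import timeit
from typing import Callable


def microseconds_per_call(fn: Callable[[str], bool], arg: str, repeats: int = 10_000) -> float:
    """Average wall-clock time of fn(arg) in microseconds."""
    total_seconds = timeit.timeit(lambda: fn(arg), number=repeats)
    return total_seconds / repeats * 1e6


def toy_validate(syllable: str) -> bool:
    # Stand-in for a real validator call.
    return bool(syllable) and len(syllable) <= 10


print(f"{microseconds_per_call(toy_validate, 'abc'):.2f} us/call")
```

Absolute numbers depend on hardware and input, so the useful comparison is the ratio between the two implementations on the same machine.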

Check Implementation

from myspellchecker.core.syllable_rules import _USING_CYTHON

print(f"Using Cython: {_USING_CYTHON}")

Configuration Options

Strict vs Lenient Mode

# Strict mode (default) - for formal documents
validator = SyllableRuleValidator(strict=True)

# Lenient mode - for informal text, transliterations
validator = SyllableRuleValidator(strict=False)
Strict mode enforces:
  • Pali/Sanskrit stacking rules (Vagga logic)
  • Canonical character ordering
  • Stricter tone mark rules
  • Core Myanmar characters only

Extended Myanmar

# Allow Extended Myanmar blocks (Shan, Mon, etc.)
validator = SyllableRuleValidator(allow_extended_myanmar=True)

# Standard Burmese only (default)
validator = SyllableRuleValidator(allow_extended_myanmar=False)
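
Conceptually, this option widens the set of accepted code points. The sketch below is a simplification under stated assumptions: `in_scope` is hypothetical, and it treats the whole main Myanmar block (U+1000-U+109F) as standard, even though that block also contains letters for other languages.

```python
MYANMAR_BLOCK = (0x1000, 0x109F)
EXTENDED_BLOCKS = [
    (0xAA60, 0xAA7F),  # Myanmar Extended-A
    (0xA9E0, 0xA9FF),  # Myanmar Extended-B
]


def in_scope(text: str, allow_extended: bool = False) -> bool:
    """True if every character falls inside an accepted Unicode block."""
    ranges = [MYANMAR_BLOCK] + (EXTENDED_BLOCKS if allow_extended else [])
    return all(
        any(low <= ord(ch) <= high for low, high in ranges)
        for ch in text
    )


print(in_scope("\uaa60"))                       # False
print(in_scope("\uaa60", allow_extended=True))  # True
```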

Error Messages

validate() returns only True or False, so it does not report which check failed. For debugging, the individual check methods can be called directly:
validator = SyllableRuleValidator()

syllable = "ြမန်"  # Invalid: starts with a medial

# Underscore-prefixed helpers are internal; call them only for debugging
validator._check_start_char(syllable)            # False: validate() stops here
validator._check_medial_compatibility(syllable)  # Never reached by validate()
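
The same idea generalizes to a small helper that runs named checks in order and reports the first one that fails. The sketch uses toy checks rather than the validator's own methods. `first_failing_check` and the check names are hypothetical, and the start-character rule here accepts only consonants.

```python
from typing import Callable, List, Optional, Tuple

Check = Tuple[str, Callable[[str], bool]]


def first_failing_check(syllable: str, checks: List[Check]) -> Optional[str]:
    """Return the name of the first failing check, or None if all pass."""
    for name, check in checks:
        if not check(syllable):
            return name
    return None


TOY_CHECKS: List[Check] = [
    ("non_empty", lambda s: len(s) > 0),
    ("starts_with_consonant", lambda s: "\u1000" <= s[0] <= "\u1021"),
    ("no_trailing_virama", lambda s: not s.endswith("\u1039")),
]

# Medial Ra + Ma + Na + Asat: the invalid syllable from the example above
print(first_failing_check("\u103c\u1019\u1014\u103a", TOY_CHECKS))
# starts_with_consonant
```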

See Also