Overview
Myanmar words are often formed by combining roots with suffixes and prefixes. The MorphologyAnalyzer can:
- Guess POS tags based on word morphology
- Decompose words into roots and suffixes
- Identify numeral words
- Handle ambiguous words with multiple POS possibilities
- Extract morphological patterns for OOV recovery
MorphologyAnalyzer
The main class for morphological analysis.
Basic POS Guessing
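To illustrate the idea behind suffix-based POS guessing, here is a minimal, self-contained sketch. It is not the library's implementation or API: `SUFFIX_TAGS` and `guess_pos_sketch` are hypothetical names, and the suffix-to-tag mapping is an assumption assembled from the suffix lists in this document.

```python
from typing import Optional

# Hypothetical suffix-to-tag table (an assumption, not the library's data).
SUFFIX_TAGS = {
    "ခဲ့": "V",      # past tense
    "နေ": "V",       # progressive
    "မည်": "V",      # future
    "များ": "N",     # plural
    "စွာ": "ADV",    # manner
    "ကို": "P_OBJ",  # object marker
}


def guess_pos_sketch(word: str) -> Optional[str]:
    """Return the tag of the longest matching suffix, or None if none match."""
    best = None  # (suffix, tag) of the longest match seen so far
    for suffix, tag in SUFFIX_TAGS.items():
        # Require a non-empty root so a bare suffix is not tagged by itself.
        if word.endswith(suffix) and len(word) > len(suffix):
            if best is None or len(suffix) > len(best[0]):
                best = (suffix, tag)
    return best[1] if best else None
```

Calling `guess_pos_sketch` on a root followed by များ would return `"N"` under this table; a word with no listed suffix returns `None`.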
Multi-POS Support
Handle words with multiple possible parts of speech.
Comprehensive POS Inference
Word Analysis
Decompose words into roots and suffixes.
With Dictionary Validation
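The following sketch shows one way suffix stripping and dictionary validation of the root could fit together. It assumes a plain `set` of known words as the dictionary; `SUFFIXES`, `AnalysisSketch`, and `analyze_sketch` are hypothetical names for illustration and do not reflect the library's actual classes or fields.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical suffix inventory taken from the suffix lists in this document.
SUFFIXES = ("ခဲ့", "နေ", "ပြီ", "များ", "သည်", "တယ်")


@dataclass
class AnalysisSketch:
    root: str
    suffixes: List[str] = field(default_factory=list)
    root_in_dictionary: bool = False


def analyze_sketch(word: str, dictionary: Set[str]) -> AnalysisSketch:
    """Strip suffixes from the right until the root is a known word."""
    ordered = sorted(SUFFIXES, key=len, reverse=True)  # prefer longer suffixes
    root = word
    stripped: List[str] = []
    while root not in dictionary:
        match = next(
            (s for s in ordered if root.endswith(s) and len(root) > len(s)),
            None,
        )
        if match is None:
            break  # nothing left to strip
        root = root[: -len(match)]
        stripped.insert(0, match)  # keep suffixes in surface order
    return AnalysisSketch(root, stripped, root in dictionary)
```

Stopping as soon as the root is found in the dictionary avoids over-stripping a known word that happens to end in a suffix-like sequence.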
Validate extracted roots against a dictionary.
Using Cached Analyzer
Use a cached analyzer for better performance on repeated calls.
Numeral Detection
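A rough sketch of numeral detection, assuming it covers the Myanmar digits (U+1040 to U+1049) plus a small set of numeral words. The word list below is a short illustrative sample, and `is_numeral_sketch` is a hypothetical helper rather than the library's function.

```python
# Illustrative sample of numeral words (one to ten); not an exhaustive list.
NUMERAL_WORDS = frozenset(
    {"တစ်", "နှစ်", "သုံး", "လေး", "ငါး", "ခြောက်", "ခုနစ်", "ရှစ်", "ကိုး", "ဆယ်"}
)


def is_myanmar_digits(text: str) -> bool:
    """True if text is non-empty and made only of Myanmar digits U+1040..U+1049."""
    return bool(text) and all("\u1040" <= ch <= "\u1049" for ch in text)


def is_numeral_sketch(text: str) -> bool:
    """True for Myanmar digit strings or a known numeral word."""
    return is_myanmar_digits(text) or text in NUMERAL_WORDS
```

Some numeral words are ambiguous in context (for example, လေး can also mean "heavy"), which is one reason a real analyzer reports a confidence value rather than a plain boolean.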
Detect Myanmar numerals (digits and words).
POSGuess Data Structure
Results include detailed reasoning.
WordAnalysis Data Structure
Holds the complete word decomposition.
POS Tag Priority
When multiple suffixes match, tags are prioritized:
| Priority | Tag | Description |
|---|---|---|
| 0 | NUM | Numerals (highest) |
| 1 | P_SENT | Sentence particles |
| 2 | P_MOD | Modifier particles |
| 3 | P_LOC | Location particles |
| 4 | P_SUBJ | Subject particles |
| 5 | P_OBJ | Object particles |
| 6 | V | Verbs |
| 7 | N | Nouns |
| 8 | ADJ | Adjectives |
| 9 | ADV | Adverbs |
Confidence Scoring
Confidence is based on:
- Numeral detection: 0.95-0.99 (highest)
- Proper noun suffixes: 0.85-0.95
- Prefix patterns (e.g., အ → Noun): 0.60-0.75
- Suffix length ratio: Longer suffix matches = higher confidence
- Tag priority: Tie-breaker for similar confidence
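The interaction between suffix length ratio and tag priority can be sketched as follows. The priority numbers come from the table above; the 0.5 to 0.9 confidence band and the linear formula are assumptions chosen for illustration, not the library's actual scoring, and `suffix_confidence` and `rank_guesses` are hypothetical names.

```python
from typing import List, Tuple

# Priorities as listed in the POS Tag Priority table (lower wins ties).
TAG_PRIORITY = {
    "NUM": 0, "P_SENT": 1, "P_MOD": 2, "P_LOC": 3, "P_SUBJ": 4,
    "P_OBJ": 5, "V": 6, "N": 7, "ADJ": 8, "ADV": 9,
}


def suffix_confidence(word: str, suffix: str,
                      low: float = 0.5, high: float = 0.9) -> float:
    """Scale confidence linearly with the share of the word the suffix covers."""
    if not word:
        return low
    ratio = len(suffix) / len(word)  # measured in code points
    return low + (high - low) * ratio


def rank_guesses(word: str,
                 matches: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Rank (suffix, tag) matches by confidence, then by tag priority."""
    scored = [
        (tag, round(suffix_confidence(word, suffix), 3))
        for suffix, tag in matches
    ]
    # Higher confidence first; equal confidence falls back to tag priority.
    return sorted(scored, key=lambda g: (-g[1], TAG_PRIORITY[g[0]]))
```

Rounding before sorting makes near-equal scores count as ties, so the priority table actually decides between them.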
Integration with Stemmer
For performance-critical applications, integrate with the Stemmer.
Custom Configuration
Load morphology rules from a custom config.
Suffix Categories
The analyzer recognizes these suffix types:
Verb Suffixes
- ခဲ့ (past tense)
- နေ (progressive)
- မည် (future)
- ပြီ (completion)
- ရ (potential)
Noun Suffixes
- များ (plural)
- ချင်း (comparative)
- လောက် (approximation)
Particle Suffixes
- သည် (formal ending)
- တယ် (colloquial ending)
- မှာ (location)
- ကို (object marker)
Adverb Suffixes
- စွာ (manner)
- အောင် (result)
Morphological Synthesis
In addition to morphological analysis (decomposition), the library provides morphological synthesis (validation of productive word formation). These modules validate OOV words formed through compounding and reduplication.
ReduplicationEngine
Validates words formed by repeating known dictionary words:
| Pattern | Example | Description |
|---|---|---|
| AA | ကောင်းကောင်း | Simple repetition |
| AABB | သေသေချာချာ | Each syllable doubles (A-A-B-B) |
| ABAB | ခဏခဏ | Whole word repeats (AB-AB) |
| RHYME | ရှုပ်ယှက် | Known rhyme pairs |
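The AA, ABAB, and AABB patterns in the table can be checked with a few list comparisons once a word is split into syllables. This sketch assumes the input is already syllable-segmented and that the dictionary is a plain `set`; `reduplication_pattern` is a hypothetical helper, and the RHYME pattern (a lookup of known pairs) is left out.

```python
from typing import List, Optional, Set


def reduplication_pattern(syllables: List[str],
                          dictionary: Set[str]) -> Optional[str]:
    """Return "AA", "ABAB", or "AABB" if the base word is known, else None."""
    n = len(syllables)
    # AA: one known syllable repeated once.
    if n == 2 and syllables[0] == syllables[1]:
        return "AA" if syllables[0] in dictionary else None
    if n == 4:
        a, b, c, d = syllables
        # ABAB: the whole two-syllable word repeats.
        if (a, b) == (c, d) and a + b in dictionary:
            return "ABAB"
        # AABB: each syllable of the base word AB is doubled.
        if a == b and c == d and a + c in dictionary:
            return "AABB"
    return None
```

Requiring the base word to be in the dictionary is what keeps accidental repetitions of unknown syllables from being accepted as valid reduplications.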
CompoundResolver
Validates compound words by splitting into known dictionary morphemes using dynamic programming:
| Pattern | Example | Description |
|---|---|---|
| N+N | ကျောင်းသား (student) | Noun + Noun |
| V+V | စားသောက် (eat & drink) | Verb + Verb |
| N+V | ရေချိုး (bathe) | Noun + Verb |
| V+N | စားခန်း (dining room) | Verb + Noun |
| N+ADJ | မြို့ကြီး (big city) | Noun + Adjective |
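The dynamic-programming split can be illustrated with a classic word-break routine. This is a sketch under stated assumptions: the dictionary is a plain `set`, the split with the fewest parts is preferred, and POS patterns such as N+N are not checked. `split_compound` is a hypothetical name, not the CompoundResolver API.

```python
from typing import List, Optional, Set


def split_compound(word: str, dictionary: Set[str]) -> Optional[List[str]]:
    """Split word into two or more dictionary morphemes, fewest parts first."""
    n = len(word)
    # best[i] is the shortest known split of word[:i], or None if unreachable.
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            if start == 0 and end == n:
                continue  # the whole word alone is not a compound split
            prefix = best[start]
            piece = word[start:end]
            if prefix is not None and piece in dictionary:
                candidate = prefix + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]
```

Preferring fewer parts is a simple heuristic against over-segmentation into very short morphemes; a fuller resolver would also score splits by the POS pattern of the parts.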
Integration
Both engines are automatically integrated into WordValidator when enabled via config.
Relationship to MorphologyAnalyzer
| Module | Purpose | When Used |
|---|---|---|
| text/morphology.py | POS guessing, suffix stripping, OOV decomposition | Suggestion generation (analyzing unknown words) |
| text/reduplication.py | Validate productive reduplications | Before error creation (suppress false positives) |
| text/compound_resolver.py | Validate productive compounds | Before error creation (suppress false positives) |
See Also
- POS Tagging - Full POS tagging system
- Grammar Checkers - Grammar validation
- Syllable Validation - Syllable-level checking
- Word Validation - Full word validation pipeline
- Suggestion Strategy - Suggestion pipeline