Many Myanmar words can function as multiple parts of speech depending on context. This module examines surrounding POS tags, determiners, and adverb markers to select the most likely tag, producing a confidence score and audit trail for each decision.
Overview
from myspellchecker.algorithms.pos_disambiguator import POSDisambiguator
disambiguator = POSDisambiguator()
# Word "ကြီး" can be ADJ, N, or V
result = disambiguator.disambiguate_in_context(
    word="ကြီး",
    word_pos_tags=frozenset(["ADJ", "N", "V"]),
    prev_word_pos="V",
    next_word_pos=None,
)
print(result.resolved_pos) # "N" (Rule R1: after verb)
The Disambiguation Problem
Many Myanmar words can have multiple POS tags:
| Word | Possible POS | Example Usage |
|---|---|---|
| ကြီး | ADJ, N, V | big/size/grow |
| သား | N, V | son/child/be born |
| ပြော | V, N | speak/speech |
Context determines the correct POS:
- “ကြီး သော အိမ်” → ADJ (modifies noun)
- “သူ ကြီး တယ်” → V (before particle)
- “အကြီး ကို” → N (nominalized with the အ- prefix and followed by the object marker ကို)
Disambiguation Rules
R1: Noun After Verb
If the previous word is a verb, the ambiguous word is likely a noun (object).
# "သူ ပြော ကြီး ကို ဝယ်သည်" → ကြီး = N (after V)
result = disambiguator.disambiguate_in_context(
    word="ကြီး",
    word_pos_tags=frozenset(["ADJ", "N", "V"]),
    prev_word_pos="V",
)
# Rule R1 applied, confidence: 0.85
R2: Adjective Before Noun/Pronoun
If the next word is a noun or pronoun, the ambiguous word is likely an adjective.
# "ကြီး သော အိမ်" → ကြီး = ADJ (before N)
result = disambiguator.disambiguate_in_context(
    word="ကြီး",
    word_pos_tags=frozenset(["ADJ", "N", "V"]),
    next_word_pos="N",
)
# Rule R2 applied, confidence: 0.80
R3: Verb Before Particle
If the next word carries a particle tag (sentence-final particle `P_SENT`, modifying particle `P_MOD`, or post-positional marker `PPM`), the word is likely a verb.
# "သူ ကြီး ပြီ" → ကြီး = V (before particle)
result = disambiguator.disambiguate_in_context(
    word="ကြီး",
    word_pos_tags=frozenset(["ADJ", "N", "V"]),
    next_word_pos="P_SENT",
)
# Rule R3 applied, confidence: 0.90 (highest priority)
R4: Noun After Determiner
If the previous word is a determiner/demonstrative, the word is likely a noun.
# "ဤ ကြီး ကို" → ကြီး = N (after determiner)
result = disambiguator.disambiguate_in_context(
    word="ကြီး",
    word_pos_tags=frozenset(["ADJ", "N", "V"]),
    prev_word="ဤ",
)
# Rule R4 applied, confidence: 0.88
R5: Verb/Adjective After Adverb
If the previous word is an adverb, the word is likely a verb being modified. Exception: degree adverbs (e.g., အလွန်) resolve the word to ADJ instead of V.
# "လျင်မြန်စွာ ကြီး လာသည်" → ကြီး = V (after adverb)
result = disambiguator.disambiguate_in_context(
    word="ကြီး",
    word_pos_tags=frozenset(["ADJ", "N", "V"]),
    prev_word="လျင်မြန်စွာ",
)
# Rule R5 applied, confidence: 0.85
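The degree-adverb exception can be pictured with a small standalone sketch. It does not use the library: `DEGREE_ADVERBS`, `resolve_after_adverb`, and the fallback to `None` when the preferred tag is not among the candidates are all assumptions made for illustration, and the adverb list simply copies the entries shown under Linguistic Data.

```python
from typing import FrozenSet, Optional

# Adverbs copied from this page's ADVERB_MARKERS listing (the real set is longer).
ADVERB_MARKERS = frozenset(
    {"လျင်မြန်စွာ", "ဖြည်းဖြည်း", "အလွန်", "ကောင်းစွာ", "ချက်ချင်း"}
)
# Hypothetical subset: which adverbs count as degree adverbs is an assumption.
DEGREE_ADVERBS = frozenset({"အလွန်"})


def resolve_after_adverb(
    prev_word: Optional[str], tags: FrozenSet[str]
) -> Optional[str]:
    """Return the tag an R5-style rule would pick, or None if it does not apply."""
    if prev_word not in ADVERB_MARKERS:
        return None
    # Degree adverbs point to an adjective reading; other adverbs to a verb.
    wanted = "ADJ" if prev_word in DEGREE_ADVERBS else "V"
    # Only choose a tag the word can actually take (assumed behavior).
    return wanted if wanted in tags else None
```

With the candidate set `{"ADJ", "N", "V"}`, a manner adverb such as လျင်မြန်စွာ yields `"V"`, while အလွန် yields `"ADJ"`.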
Rule Priority
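Rules are applied in priority order, from highest to lowest. Priority is separate from confidence: R4 has the lowest priority even though its confidence (0.88) is higher than that of R1, R2, and R5.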
| Priority | Rule | Confidence |
|---|---|---|
| 1 | R3 - Verb Before Particle | 0.90 |
| 2 | R5 - Verb After Adverb | 0.85 |
| 3 | R1 - Noun After Verb | 0.85 |
| 4 | R2 - Adjective Before Noun | 0.80 |
| 5 | R4 - Noun After Determiner | 0.88 |
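One way to read the priority table is as an ordered list of rules where the first applicable rule wins. The sketch below is a toy model under that assumption, not the library's implementation: `Context`, `RULES`, and `first_matching_rule` are hypothetical names, the `"PRON"` tag is assumed, and the R5 degree-adverb exception is left out for brevity.

```python
from typing import Callable, FrozenSet, List, NamedTuple, Optional, Tuple

# Trigger data copied from the listings on this page (the real sets are longer).
PARTICLE_POS_TAGS = frozenset({"P_SENT", "P_MOD", "PPM"})
DETERMINERS = frozenset({"ဤ", "ယင်း", "ထို", "ဒီ", "အဲဒီ", "တစ်", "အားလုံး"})
ADVERB_MARKERS = frozenset({"လျင်မြန်စွာ", "ဖြည်းဖြည်း", "ကောင်းစွာ", "ချက်ချင်း"})


class Context(NamedTuple):
    """Hypothetical container for the neighbours of the ambiguous word."""
    prev_word: Optional[str] = None
    prev_pos: Optional[str] = None
    next_pos: Optional[str] = None


# (rule name, tag it selects, confidence, trigger), listed in priority order.
Rule = Tuple[str, str, float, Callable[[Context], bool]]
RULES: List[Rule] = [
    ("R3", "V", 0.90, lambda c: c.next_pos in PARTICLE_POS_TAGS),
    ("R5", "V", 0.85, lambda c: c.prev_word in ADVERB_MARKERS),
    ("R1", "N", 0.85, lambda c: c.prev_pos == "V"),
    ("R2", "ADJ", 0.80, lambda c: c.next_pos in ("N", "PRON")),  # "PRON" is assumed
    ("R4", "N", 0.88, lambda c: c.prev_word in DETERMINERS),
]


def first_matching_rule(
    tags: FrozenSet[str], ctx: Context
) -> Optional[Tuple[str, str, float]]:
    """Return (rule, tag, confidence) for the highest-priority rule that fires."""
    for name, tag, confidence, applies in RULES:
        # A rule can only select a tag that is among the word's candidates.
        if tag in tags and applies(ctx):
            return name, tag, confidence
    return None
```

In this model a word that follows a determiner and precedes a noun resolves through R2, because R2 is checked before R4 even though R4 has the higher confidence.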
DisambiguationResult
Results include detailed information:
@dataclass
class DisambiguationResult:
    word: str                          # The disambiguated word
    original_pos_tags: FrozenSet[str]  # Original possible tags
    resolved_pos: str                  # The resolved single tag
    rule_applied: DisambiguationRule   # Which rule was applied
    confidence: float                  # Confidence score (0.0-1.0)
    context_used: str                  # Description of context
Accessing Results
result = disambiguator.disambiguate_in_context(
    word="ကြီး",
    word_pos_tags=frozenset(["ADJ", "N", "V"]),
    next_word="တယ်",  # passed so the context description can name the particle
    next_word_pos="P_SENT",
)
print(result.resolved_pos)  # "V"
print(result.rule_applied)  # DisambiguationRule.R3_VERB_BEFORE_PARTICLE
print(result.confidence)    # 0.90
print(result.context_used)  # "before particle 'တယ်' (P_SENT)"
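These fields are enough to build a simple audit trail and to flag uncertain decisions for review. The sketch below is self-contained: `ResultStub` is a stand-in that mirrors the field names of `DisambiguationResult` (with the rule as a plain string), and `audit_line`, `low_confidence`, and the 0.85 threshold are hypothetical choices rather than part of the library.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ResultStub:
    """Stand-in mirroring the fields of DisambiguationResult."""
    word: str
    original_pos_tags: FrozenSet[str]
    resolved_pos: str
    rule_applied: str  # plain string here; the library uses an enum
    confidence: float
    context_used: str


def audit_line(r: ResultStub) -> str:
    """Format one decision as a single log line."""
    candidates = "/".join(sorted(r.original_pos_tags))
    return (
        f"{r.word}: {candidates} -> {r.resolved_pos} "
        f"[{r.rule_applied}, {r.confidence:.2f}] {r.context_used}"
    )


def low_confidence(results: List[ResultStub], threshold: float = 0.85) -> List[ResultStub]:
    """Return the decisions whose confidence falls below the threshold."""
    return [r for r in results if r.confidence < threshold]
```

A caller could log `audit_line(result)` for every word and route the output of `low_confidence(...)` to a slower tagger or to manual review.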
Sentence Disambiguation
Disambiguate each word in a sentence by calling disambiguate_in_context() per word:
words = ["ကြီး", "သော", "အိမ်"]
pos_tags = [
    frozenset(["ADJ", "N", "V"]),
    frozenset(["P_MOD"]),
    frozenset(["N"]),
]
# Disambiguate each word individually using surrounding POS context
for i, (word, tags) in enumerate(zip(words, pos_tags)):
    prev_pos = None if i == 0 else pos_tags[i - 1]
    next_pos = None if i == len(words) - 1 else pos_tags[i + 1]
    # Extract single POS string from unambiguous tags for context
    prev_pos_str = next(iter(prev_pos)) if prev_pos and len(prev_pos) == 1 else None
    next_pos_str = next(iter(next_pos)) if next_pos and len(next_pos) == 1 else None
    result = disambiguator.disambiguate_in_context(
        word, tags,
        prev_word=words[i - 1] if i > 0 else None,
        prev_word_pos=prev_pos_str,
        next_word=words[i + 1] if i < len(words) - 1 else None,
        next_word_pos=next_pos_str,
    )
    print(f"{word}: {result.resolved_pos} ({result.rule_applied.value})")
# ကြီး: ADJ (R2)
# သော: P_MOD (none - unambiguous)
# အိမ်: N (none - unambiguous)
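The neighbour bookkeeping in that loop can be factored into a small generator. This is a sketch with hypothetical helper names (`single_tag`, `context_windows`); it only prepares the context values and leaves the call to `disambiguate_in_context()` to the caller.

```python
from typing import FrozenSet, Iterator, List, Optional, Tuple


def single_tag(tags: Optional[FrozenSet[str]]) -> Optional[str]:
    """Return the only tag of an unambiguous neighbour, else None."""
    return next(iter(tags)) if tags is not None and len(tags) == 1 else None


def context_windows(
    words: List[str], pos_tags: List[FrozenSet[str]]
) -> Iterator[Tuple[str, FrozenSet[str], Optional[str], Optional[str]]]:
    """Yield (word, candidate tags, previous POS, next POS) for each position."""
    if len(words) != len(pos_tags):
        raise ValueError("words and pos_tags must have the same length")
    for i, (word, tags) in enumerate(zip(words, pos_tags)):
        prev_tags = pos_tags[i - 1] if i > 0 else None
        next_tags = pos_tags[i + 1] if i + 1 < len(words) else None
        # Ambiguous neighbours contribute no POS context.
        yield word, tags, single_tag(prev_tags), single_tag(next_tags)
```

Each yielded tuple maps directly onto the `word`, `word_pos_tags`, `prev_word_pos`, and `next_word_pos` arguments used above.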
Convenience Function
For quick single-word disambiguation:
from myspellchecker.algorithms.pos_disambiguator import disambiguate
pos = disambiguate(
    word="ကြီး",
    word_pos_tags=frozenset(["ADJ", "N", "V"]),
    next_word_pos="N",
)
print(pos) # "ADJ"
Linguistic Data
Determiners
Words that trigger R4 (noun context):
DETERMINERS = {
    "ဤ",        # this
    "ယင်း",      # that
    "ထို",       # that
    "ဒီ",        # this (colloquial)
    "အဲဒီ",      # that (colloquial)
    "တစ်",       # one/a
    "အားလုံး",    # all
    ...
}
Adverb Markers
Words that trigger R5 (verb context):
ADVERB_MARKERS = {
    "လျင်မြန်စွာ",  # quickly
    "ဖြည်းဖြည်း",   # slowly
    "အလွန်",       # very
    "ကောင်းစွာ",    # well
    "ချက်ချင်း",    # immediately
    ...
}
Particle Tags
POS tags that trigger R3:
PARTICLE_POS_TAGS = {
    "P_SENT",  # Sentence-final particle
    "P_MOD",   # Modifying particle
    "PPM",     # Post-positional marker
}
Thread Safety
The module provides a thread-safe singleton:
from myspellchecker.algorithms.pos_disambiguator import get_disambiguator
# Thread-safe singleton
disambiguator = get_disambiguator()
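A thread-safe singleton of this kind is commonly built with a lock and a double check. The following is a generic sketch of that pattern, not the library's source: `_Disambiguator` is a stand-in class and `get_instance` is a hypothetical name.

```python
import threading
from typing import Optional


class _Disambiguator:
    """Stand-in for an object that is expensive to construct."""


_instance: Optional[_Disambiguator] = None
_lock = threading.Lock()


def get_instance() -> _Disambiguator:
    """Return the shared instance, creating it at most once."""
    global _instance
    if _instance is None:          # fast path: no locking once initialized
        with _lock:
            if _instance is None:  # re-check: another thread may have won the race
                _instance = _Disambiguator()
    return _instance
```

Every thread that calls `get_instance()` receives the same object, so any rule data it holds is loaded only once.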
Integration
With Grammar Checker
from myspellchecker.grammar.engine import SyntacticRuleChecker
from myspellchecker.algorithms.pos_disambiguator import POSDisambiguator
# Use POSDisambiguator to resolve ambiguous tags before grammar checking
disambiguator = POSDisambiguator()
def check_with_disambiguation(words, pos_tags, grammar_checker):
    # Disambiguate each word left to right, feeding each resolved tag
    # to the next word as its previous-POS context
    resolved_tags = []
    for word, tags in zip(words, pos_tags):
        result = disambiguator.disambiguate_in_context(
            word, tags,
            prev_word_pos=resolved_tags[-1] if resolved_tags else None,
        )
        resolved_tags.append(result.resolved_pos)
    # Run the grammar check; as written, check_sequence() receives only the
    # words, so resolved_tags is not passed on
    return grammar_checker.check_sequence(words)
With Viterbi Tagger
from myspellchecker.algorithms.pos_disambiguator import POSDisambiguator
class ViterbiPOSTagger:
    def __init__(self):
        self.disambiguator = POSDisambiguator()

    def tag(self, words):
        # Get initial tags from Viterbi
        initial_tags = self._viterbi_decode(words)
        # Refine ambiguous tags per word
        pos_tag_sets = [self._get_possible_tags(w) for w in words]
        resolved = []
        for i, (word, tags) in enumerate(zip(words, pos_tag_sets)):
            result = self.disambiguator.disambiguate_in_context(
                word, tags,
                prev_word_pos=initial_tags[i - 1] if i > 0 else None,
                next_word_pos=initial_tags[i + 1] if i < len(words) - 1 else None,
            )
            resolved.append(result.resolved_pos)
        return resolved
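A variation on this refinement pass keeps the tagger's initial tag unless the word is genuinely ambiguous and the resolver returns one of its candidates. The sketch below shows that idea with the resolver passed in as a plain callable; `refine_tags`, the `Resolver` signature, and the fallback policy are assumptions for illustration and do not describe how the library combines the two components.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence

# Hypothetical resolver signature: (word, candidates, prev POS, next POS) -> tag or None
Resolver = Callable[[str, FrozenSet[str], Optional[str], Optional[str]], Optional[str]]


def refine_tags(
    words: Sequence[str],
    initial_tags: Sequence[str],
    candidates: Sequence[FrozenSet[str]],
    resolve: Resolver,
) -> List[str]:
    """Override initial tags only where a word is ambiguous and a rule decides."""
    refined: List[str] = []
    for i, word in enumerate(words):
        tags = candidates[i]
        if len(tags) <= 1:
            # Unambiguous word: keep the tagger's decision.
            refined.append(initial_tags[i])
            continue
        prev_pos = initial_tags[i - 1] if i > 0 else None
        next_pos = initial_tags[i + 1] if i + 1 < len(words) else None
        choice = resolve(word, tags, prev_pos, next_pos)
        # Fall back to the initial tag when no rule fires or the answer is invalid.
        refined.append(choice if choice in tags else initial_tags[i])
    return refined
```

In practice `resolve` could wrap `disambiguate_in_context()` and return `result.resolved_pos`.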
See Also