Many Myanmar words can function as multiple parts of speech depending on context. This module examines surrounding POS tags, determiners, and adverb markers to select the most likely tag, producing a confidence score and audit trail for each decision.

Overview

from myspellchecker.algorithms.pos_disambiguator import POSDisambiguator

disambiguator = POSDisambiguator()

# Word "ကြီး" can be ADJ, N, or V
result = disambiguator.disambiguate_in_context(
    word="ကြီး",
    word_pos_tags=frozenset({"ADJ", "N", "V"}),
    prev_word_pos="V",
    next_word_pos=None,
)
print(result.resolved_pos)  # "N" (Rule R1: after verb)

The Disambiguation Problem

Many Myanmar words can have multiple POS tags:
| Word | Possible POS | Example usage |
|---|---|---|
| ကြီး | ADJ, N, V | big/size/grow |
| သား | N, V | son/child/be born |
| ပြော | V, N | speak/speech |

Context determines the correct POS:
  • “ကြီး သော အိမ်” → ADJ (modifies noun)
  • “သူ ကြီး တယ်” → V (before particle)
  • “အကြီး ကို” → N (nominalized with the prefix အ-, followed by the object marker ကို)
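To make the problem concrete, here is a minimal sketch of a candidate-tag lexicon built from the table above. The `LEXICON` dict and the `candidate_tags` / `is_ambiguous` helpers are hypothetical illustrations, not part of myspellchecker; they only show that a word needs context-based disambiguation when its tag set has more than one member.

```python
from typing import Dict, FrozenSet

# Hypothetical lexicon: word -> every POS tag it can take (data from the table above)
LEXICON: Dict[str, FrozenSet[str]] = {
    "ကြီး": frozenset({"ADJ", "N", "V"}),
    "သား": frozenset({"N", "V"}),
    "ပြော": frozenset({"V", "N"}),
    "အိမ်": frozenset({"N"}),
}


def candidate_tags(word: str) -> FrozenSet[str]:
    """Return the possible tags for a word, or an empty set if it is unknown."""
    return LEXICON.get(word, frozenset())


def is_ambiguous(word: str) -> bool:
    """A word needs disambiguation only if it has two or more candidate tags."""
    return len(candidate_tags(word)) > 1


for w in ["ကြီး", "အိမ်"]:
    print(w, sorted(candidate_tags(w)), is_ambiguous(w))
```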

Disambiguation Rules

R1: Noun After Verb

If the previous word is a verb, the ambiguous word is likely a noun (object).
# "သူ ပြော ကြီး ကို ဝယ်သည်" → ကြီး = N (after V)
result = disambiguator.disambiguate_in_context(
    word="ကြီး",
    word_pos_tags=frozenset({"ADJ", "N", "V"}),
    prev_word_pos="V",
)
# Rule R1 applied, confidence: 0.85

R2: Adjective Before Noun

If the next word is a noun, the ambiguous word is likely an adjective.
# "ကြီး သော အိမ်" → ကြီး = ADJ (before N)
result = disambiguator.disambiguate_in_context(
    word="ကြီး",
    word_pos_tags=frozenset({"ADJ", "N", "V"}),
    next_word_pos="N",
)
# Rule R2 applied, confidence: 0.80

R3: Verb Before Particle

If the next word is a particle (sentence-final, modifying, or postpositional marker; see Particle Tags below), the word is likely a verb.
# "သူ ကြီး ပြီ" → ကြီး = V (before particle)
result = disambiguator.disambiguate_in_context(
    word="ကြီး",
    word_pos_tags=frozenset({"ADJ", "N", "V"}),
    next_word_pos="P_SENT",
)
# Rule R3 applied, confidence: 0.90 (highest priority)

R4: Noun After Determiner

If the previous word is a determiner/demonstrative, the word is likely a noun.
# "ဤ ကြီး ကို" → ကြီး = N (after determiner)
result = disambiguator.disambiguate_in_context(
    word="ကြီး",
    word_pos_tags=frozenset({"ADJ", "N", "V"}),
    prev_word="ဤ",
)
# Rule R4 applied, confidence: 0.88

R5: Verb After Adverb

If the previous word is an adverb, the word is likely a verb being modified.
# "လျင်မြန်စွာ ကြီး လာသည်" → ကြီး = V (after adverb)
result = disambiguator.disambiguate_in_context(
    word="ကြီး",
    word_pos_tags=frozenset({"ADJ", "N", "V"}),
    prev_word="လျင်မြန်စွာ",
)
# Rule R5 applied, confidence: 0.85
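Each of the five rules can be read as a small predicate over the local context. The sketch below is a hypothetical re-expression of R1 to R5 using the tags, word lists, and confidence values documented on this page; it is not myspellchecker's implementation. It assumes a rule fires only when its target tag is among the word's candidates, and the word sets are truncated samples of the lists under Linguistic Data.

```python
from typing import FrozenSet, Optional, Tuple

# Truncated samples of the trigger lists documented under "Linguistic Data"
DETERMINERS = {"ဤ", "ယင်း", "ထို", "ဒီ", "အဲဒီ", "တစ်", "အားလုံး"}
ADVERB_MARKERS = {"လျင်မြန်စွာ", "ဖြည်းဖြည်း", "အလွန်", "ကောင်းစွာ", "ချက်ချင်း"}
PARTICLE_POS_TAGS = {"P_SENT", "P_MOD", "PPM"}

Match = Optional[Tuple[str, float]]  # (resolved tag, confidence) or None if the rule does not fire


def r1_noun_after_verb(tags: FrozenSet[str], prev_pos: Optional[str]) -> Match:
    return ("N", 0.85) if prev_pos == "V" and "N" in tags else None


def r2_adj_before_noun(tags: FrozenSet[str], next_pos: Optional[str]) -> Match:
    return ("ADJ", 0.80) if next_pos == "N" and "ADJ" in tags else None


def r3_verb_before_particle(tags: FrozenSet[str], next_pos: Optional[str]) -> Match:
    return ("V", 0.90) if next_pos in PARTICLE_POS_TAGS and "V" in tags else None


def r4_noun_after_determiner(tags: FrozenSet[str], prev_word: Optional[str]) -> Match:
    return ("N", 0.88) if prev_word in DETERMINERS and "N" in tags else None


def r5_verb_after_adverb(tags: FrozenSet[str], prev_word: Optional[str]) -> Match:
    return ("V", 0.85) if prev_word in ADVERB_MARKERS and "V" in tags else None


KYEE = frozenset({"ADJ", "N", "V"})  # candidate tags of ကြီး
print(r3_verb_before_particle(KYEE, "P_SENT"))  # ('V', 0.9)
print(r4_noun_after_determiner(KYEE, "ဤ"))      # ('N', 0.88)
```

R1 to R3 key on neighbouring POS tags, while R4 and R5 key on the surface form of the previous word, which is why the R4 and R5 examples pass `prev_word` rather than `prev_word_pos`.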

Rule Priority

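Rules are applied in priority order (highest to lowest). Priority decides which rule wins when several match and is independent of confidence: R4 ranks last even though its confidence (0.88) is higher than that of R5, R1, and R2.

| Priority | Rule | Confidence |
|---|---|---|
| 1 | R3 - Verb Before Particle | 0.90 |
| 2 | R5 - Verb After Adverb | 0.85 |
| 3 | R1 - Noun After Verb | 0.85 |
| 4 | R2 - Adjective Before Noun | 0.80 |
| 5 | R4 - Noun After Determiner | 0.88 |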
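A priority list like this is typically applied first-match-wins: rules are tried in order and the first one that fires decides the tag, even if a later rule would report a higher confidence. The sketch below assumes that behaviour. The `resolve` function, the 1.0 confidence for unambiguous words, and returning `None` when no rule fires are all hypothetical choices for illustration, not the library's documented behaviour.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

PARTICLE_POS_TAGS = {"P_SENT", "P_MOD", "PPM"}
DETERMINERS = {"ဤ", "ထို", "ဒီ"}           # truncated sample
ADVERB_MARKERS = {"လျင်မြန်စွာ", "အလွန်"}  # truncated sample

Context = Dict[str, Optional[str]]
# (rule name, target tag, confidence, trigger test) listed in priority order
RULES: List[Tuple[str, str, float, Callable[[Context], bool]]] = [
    ("R3", "V", 0.90, lambda c: c["next_pos"] in PARTICLE_POS_TAGS),
    ("R5", "V", 0.85, lambda c: c["prev_word"] in ADVERB_MARKERS),
    ("R1", "N", 0.85, lambda c: c["prev_pos"] == "V"),
    ("R2", "ADJ", 0.80, lambda c: c["next_pos"] == "N"),
    ("R4", "N", 0.88, lambda c: c["prev_word"] in DETERMINERS),
]


def resolve(tags: FrozenSet[str], prev_pos: Optional[str] = None,
            next_pos: Optional[str] = None,
            prev_word: Optional[str] = None) -> Tuple[Optional[str], str, float]:
    """Return (tag, rule, confidence); the first matching rule in priority order wins."""
    if len(tags) == 1:
        (only,) = tags
        return only, "none", 1.0  # unambiguous: nothing to decide (assumed confidence)
    ctx = {"prev_pos": prev_pos, "next_pos": next_pos, "prev_word": prev_word}
    for name, target, conf, fires in RULES:
        if target in tags and fires(ctx):
            return target, name, conf
    return None, "none", 0.0  # ambiguous and no rule fired


KYEE = frozenset({"ADJ", "N", "V"})
# R2 and R4 both fire here; R2 wins on priority despite R4's higher confidence
print(resolve(KYEE, next_pos="N", prev_word="ဤ"))  # ('ADJ', 'R2', 0.8)
```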

DisambiguationResult

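Each call returns a DisambiguationResult with the following fields:
from __future__ import annotations  # defer annotation evaluation

from dataclasses import dataclass
from typing import FrozenSet

@dataclass
class DisambiguationResult:
    word: str                          # The disambiguated word
    original_pos_tags: FrozenSet[str]  # Original possible tags
    resolved_pos: str                  # The resolved single tag
    rule_applied: DisambiguationRule   # Which rule was applied (the module's rule enum)
    confidence: float                  # Confidence score (0.0-1.0)
    context_used: str                  # Description of context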

Accessing Results

result = disambiguator.disambiguate_in_context(
    word="ကြီး",
    word_pos_tags=frozenset({"ADJ", "N", "V"}),
    next_word_pos="P_SENT",
)

print(result.resolved_pos)      # "V"
print(result.rule_applied)      # DisambiguationRule.R3_VERB_BEFORE_PARTICLE
print(result.confidence)        # 0.90
print(result.context_used)      # "before particle 'တယ်' (P_SENT)"
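Because every result records its rule, confidence, and context description, the results can double as an audit trail. The sketch below flags decisions below a confidence threshold for review. `Result` is a hypothetical stand-in with the same field names as `DisambiguationResult` (with the rule stored as a plain string for brevity), the context strings are illustrative, and the 0.85 threshold is an arbitrary example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Result:
    # Hypothetical stand-in mirroring DisambiguationResult's fields
    word: str
    original_pos_tags: FrozenSet[str]
    resolved_pos: str
    rule_applied: str
    confidence: float
    context_used: str


def low_confidence(results: List[Result], threshold: float = 0.85) -> List[str]:
    """Format one audit line per decision whose confidence is below the threshold."""
    return [
        f"{r.word}: {sorted(r.original_pos_tags)} -> {r.resolved_pos} "
        f"via {r.rule_applied} ({r.confidence:.2f}; {r.context_used})"
        for r in results
        if r.confidence < threshold
    ]


results = [
    Result("ကြီး", frozenset({"ADJ", "N", "V"}), "V", "R3", 0.90, "before particle (P_SENT)"),
    Result("ကြီး", frozenset({"ADJ", "N", "V"}), "ADJ", "R2", 0.80, "before noun (N)"),
]
for line in low_confidence(results):
    print(line)
```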

Sentence Disambiguation

Disambiguate all words in a sentence:
words = ["ကြီး", "သော", "အိမ်"]
pos_tags = [
    frozenset({"ADJ", "N", "V"}),
    frozenset({"P_MOD"}),
    frozenset({"N"}),
]

results = disambiguator.disambiguate_sentence(words, pos_tags)

for word, result in zip(words, results):
    print(f"{word}: {result.resolved_pos} ({result.rule_applied.value})")
# ကြီး: ADJ (R2)
# သော: P_MOD (none - unambiguous)
# အိမ်: N (none - unambiguous)
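Sentence-level disambiguation amounts to running the single-word logic at each position, with that position's neighbours as context. The sketch below shows one plausible way to build the context; it assumes a neighbour's POS is passed only when the neighbour is unambiguous (otherwise `None`), and it applies only R3 and R1 to stay short. `resolve_one`, `neighbour_pos`, and `resolve_sentence` are hypothetical helpers, and the library may propagate context differently.

```python
from typing import FrozenSet, List, Optional, Tuple

PARTICLE_POS_TAGS = {"P_SENT", "P_MOD", "PPM"}


def neighbour_pos(tag_sets: List[FrozenSet[str]], i: int) -> Optional[str]:
    """POS at position i if it exists and is unambiguous, else None (assumption)."""
    if 0 <= i < len(tag_sets) and len(tag_sets[i]) == 1:
        return next(iter(tag_sets[i]))
    return None


def resolve_one(tags: FrozenSet[str], prev_pos: Optional[str],
                next_pos: Optional[str]) -> Tuple[Optional[str], str]:
    """Tiny stand-in resolver: unambiguous words pass through; then R3, then R1."""
    if len(tags) == 1:
        return next(iter(tags)), "none"
    if next_pos in PARTICLE_POS_TAGS and "V" in tags:  # R3: verb before particle
        return "V", "R3"
    if prev_pos == "V" and "N" in tags:  # R1: noun after verb
        return "N", "R1"
    return None, "none"


def resolve_sentence(tag_sets: List[FrozenSet[str]]) -> List[Tuple[Optional[str], str]]:
    """Resolve every position using its left and right neighbours as context."""
    return [
        resolve_one(tags, neighbour_pos(tag_sets, i - 1), neighbour_pos(tag_sets, i + 1))
        for i, tags in enumerate(tag_sets)
    ]


words = ["ကြီး", "ပြီ"]
tag_sets = [frozenset({"ADJ", "N", "V"}), frozenset({"P_SENT"})]
for word, (pos, rule) in zip(words, resolve_sentence(tag_sets)):
    print(f"{word}: {pos} ({rule})")
# ကြီး: V (R3)
# ပြီ: P_SENT (none)
```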

Convenience Function

For quick single-word disambiguation:
from myspellchecker.algorithms.pos_disambiguator import disambiguate

pos = disambiguate(
    word="ကြီး",
    word_pos_tags=frozenset({"ADJ", "N", "V"}),
    next_word_pos="N",
)
print(pos)  # "ADJ"

Linguistic Data

Determiners

Words that trigger R4 (noun context):
DETERMINERS = {
    "ဤ",      # this
    "ယင်း",   # that
    "ထို",    # that
    "ဒီ",     # this (colloquial)
    "အဲဒီ",   # that (colloquial)
    "တစ်",    # one/a
    "အားလုံး", # all
    ...
}

Adverb Markers

Words that trigger R5 (verb context):
ADVERB_MARKERS = {
    "လျင်မြန်စွာ",  # quickly
    "ဖြည်းဖြည်း",  # slowly
    "အလွန်",       # very
    "ကောင်းစွာ",   # well
    "ချက်ချင်း",   # immediately
    ...
}

Particle Tags

POS tags that trigger R3:
PARTICLE_POS_TAGS = {
    "P_SENT",  # Sentence-final particle
    "P_MOD",   # Modifying particle
    "PPM",     # Post-positional marker
}
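All three lists are plain set lookups, so checking a context for triggers costs one membership test per list. The sketch below reports which rules' triggers are present for a given previous word and next-word tag, which highlights that R4 and R5 match on surface forms while R3 matches on a POS tag. `triggered_rules` is a hypothetical helper, and the sets contain only the entries shown above.

```python
from typing import List, Optional

# Entries copied from the lists above (the full lists are longer)
DETERMINERS = frozenset({"ဤ", "ယင်း", "ထို", "ဒီ", "အဲဒီ", "တစ်", "အားလုံး"})
ADVERB_MARKERS = frozenset({"လျင်မြန်စွာ", "ဖြည်းဖြည်း", "အလွန်", "ကောင်းစွာ", "ချက်ချင်း"})
PARTICLE_POS_TAGS = frozenset({"P_SENT", "P_MOD", "PPM"})


def triggered_rules(prev_word: Optional[str], next_pos: Optional[str]) -> List[str]:
    """List the tag-based (R3) and lexical (R4, R5) triggers present in a context."""
    hits = []
    if next_pos in PARTICLE_POS_TAGS:  # R3 keys on the next word's POS tag
        hits.append("R3")
    if prev_word in DETERMINERS:  # R4 keys on the previous word's surface form
        hits.append("R4")
    if prev_word in ADVERB_MARKERS:  # R5 keys on the previous word's surface form
        hits.append("R5")
    return hits


print(triggered_rules("ဤ", "PPM"))    # ['R3', 'R4']
print(triggered_rules("အလွန်", None))  # ['R5']
```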

Thread Safety

The module provides a thread-safe singleton:
from myspellchecker.algorithms.pos_disambiguator import get_disambiguator

# Thread-safe singleton
disambiguator = get_disambiguator()
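A common way to build such an accessor in Python is double-checked locking around a module-level instance. The sketch below shows that pattern with a placeholder `Disambiguator` class and a hypothetical `get_instance` function; it is an assumption about how an accessor like `get_disambiguator()` could be written, not its actual source.

```python
import threading
from typing import Optional


class Disambiguator:
    """Placeholder for POSDisambiguator (hypothetical stand-in)."""


_instance: Optional[Disambiguator] = None
_lock = threading.Lock()


def get_instance() -> Disambiguator:
    """Return the shared instance, creating it at most once across threads."""
    global _instance
    if _instance is None:  # fast path: no locking once initialized
        with _lock:
            if _instance is None:  # re-check: another thread may have created it meanwhile
                _instance = Disambiguator()
    return _instance


# Every thread receives the same object
seen = []
threads = [threading.Thread(target=lambda: seen.append(get_instance())) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len({id(obj) for obj in seen}))  # 1
```

The unlocked first check keeps repeated calls cheap after initialization; the second check inside the lock prevents two threads that both saw `None` from creating separate instances.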

Integration

With Grammar Checker

from myspellchecker.grammar.engine import SyntacticRuleChecker
from myspellchecker.algorithms.pos_disambiguator import POSDisambiguator

# Use POSDisambiguator to resolve ambiguous tags before grammar checking
disambiguator = POSDisambiguator()

def check_with_disambiguation(words, pos_tags, grammar_checker: SyntacticRuleChecker):
    # Disambiguate before checking grammar: one resolved tag per word
    results = disambiguator.disambiguate_sentence(words, pos_tags)
    resolved_tags = [r.resolved_pos for r in results]

    # Check grammar. check_sequence() is called with the words only; pass
    # resolved_tags through the checker's tag input if it accepts one
    return grammar_checker.check_sequence(words)

With Viterbi Tagger

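from myspellchecker.algorithms.pos_disambiguator import POSDisambiguator

class ViterbiPOSTagger:
    def __init__(self):
        self.disambiguator = POSDisambiguator()

    def tag(self, words):
        # Get initial tags from Viterbi (one tag per word)
        initial_tags = self._viterbi_decode(words)

        # Candidate tag sets; fall back to the Viterbi tag when the lexicon has none
        pos_tag_sets = [
            self._get_possible_tags(w) or frozenset({t})
            for w, t in zip(words, initial_tags)
        ]

        # Refine ambiguous tags with the rule-based disambiguator
        results = self.disambiguator.disambiguate_sentence(words, pos_tag_sets)
        return [r.resolved_pos for r in results]

    def _viterbi_decode(self, words):
        # Placeholder: return one POS tag per word from your Viterbi decoder
        raise NotImplementedError

    def _get_possible_tags(self, word):
        # Placeholder: return the frozenset of POS tags your lexicon allows for word
        raise NotImplementedError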

See Also