POS Disambiguator - mySpellChecker

Many Myanmar words can function as multiple parts of speech depending on context. This module examines surrounding POS tags, determiners, and adverb markers to select the most likely tag, producing a confidence score and audit trail for each decision.

Overview

from myspellchecker.algorithms.pos_disambiguator import POSDisambiguator

disambiguator = POSDisambiguator()

# Word "ကြီး" can be ADJ, N, or V
result = disambiguator.disambiguate_in_context(
    word="ကြီး",
    word_pos_tags=frozenset(["ADJ", "N", "V"]),
    prev_word_pos="V",
    next_word_pos=None,
)
print(result.resolved_pos)  # "N" (Rule R1: after verb)

The Disambiguation Problem

Some common ambiguous words:

Word	Possible POS	Example Usage
`ကြီး`	ADJ, N, V	big/size/grow
`သား`	N, V	son/child/be born
`ပြော`	V, N	speak/speech

Context determines the correct POS:

“ကြီး သော အိမ်” → ADJ (modifies noun)
“သူ ကြီး တယ်” → V (before particle)
“အကြီး ကို” → N (nominalized by the အ- prefix)

Disambiguation Rules

R1: Noun After Verb

If the previous word is a verb, the ambiguous word is likely a noun (object).

# "သူ ပြော ကြီး ကို ဝယ်သည်" → ကြီး = N (after V)
result = disambiguator.disambiguate_in_context(
    word="ကြီး",
    word_pos_tags=frozenset(["ADJ", "N", "V"]),
    prev_word_pos="V",
)
# Rule R1 applied, confidence: 0.85

R2: Adjective Before Noun/Pronoun

If the next word is a noun or pronoun, the ambiguous word is likely an adjective.

# ကြီး directly followed by a noun → ADJ (before N)
result = disambiguator.disambiguate_in_context(
    word="ကြီး",
    word_pos_tags=frozenset(["ADJ", "N", "V"]),
    next_word_pos="N",
)
# Rule R2 applied, confidence: 0.80

R3: Verb Before Particle

If the next word's POS tag is a particle (P_SENT, P_MOD, or PPM; see Particle Tags), the word is likely a verb.

# "သူ ကြီး ပြီ" → ကြီး = V (before particle)
result = disambiguator.disambiguate_in_context(
    word="ကြီး",
    word_pos_tags=frozenset(["ADJ", "N", "V"]),
    next_word_pos="P_SENT",
)
# Rule R3 applied, confidence: 0.90 (highest priority)

R4: Noun After Determiner

If the previous word is a determiner or demonstrative (see Determiners), the word is likely a noun. This rule checks the previous word itself (prev_word), not its POS tag.

# "ဤ ကြီး ကို" → ကြီး = N (after determiner)
result = disambiguator.disambiguate_in_context(
    word="ကြီး",
    word_pos_tags=frozenset(["ADJ", "N", "V"]),
    prev_word="ဤ",
)
# Rule R4 applied, confidence: 0.88

R5: Verb/Adjective After Adverb

If the previous word is a known adverb (see Adverb Markers), the word is likely a verb being modified. Exception: degree adverbs such as အလွန် ("very") resolve the word to ADJ instead of V.

# "လျင်မြန်စွာ ကြီး လာသည်" → ကြီး = V (after adverb)
result = disambiguator.disambiguate_in_context(
    word="ကြီး",
    word_pos_tags=frozenset(["ADJ", "N", "V"]),
    prev_word="လျင်မြန်စွာ",
)
# Rule R5 applied, confidence: 0.85

Rule Priority

Rules are checked in the following order (highest priority first). Priority does not follow confidence: R4 reports 0.88 but is checked last.

Priority	Rule	Confidence
1	R3 - Verb Before Particle	0.90
2	R5 - Verb/Adjective After Adverb	0.85
3	R1 - Noun After Verb	0.85
4	R2 - Adjective Before Noun/Pronoun	0.80
5	R4 - Noun After Determiner	0.88

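The priority table can be read as a first-match cascade. The sketch below is a hypothetical, simplified re-implementation for illustration only, not the library's code. Its assumptions: the function name `resolve`, the abbreviated word lists, the `PRON` tag name for pronouns, the rule that a match only counts when its target tag is among the candidates, and the fallback to the alphabetically first tag with confidence 0.5 when nothing matches.

```python
from typing import FrozenSet, Optional, Tuple

# Abbreviated trigger lists (illustrative subsets of the Linguistic Data sets)
DETERMINERS = {"ဤ", "ယင်း", "ထို", "ဒီ", "အဲဒီ"}
ADVERB_MARKERS = {"လျင်မြန်စွာ", "ဖြည်းဖြည်း", "အလွန်", "ကောင်းစွာ", "ချက်ချင်း"}
DEGREE_ADVERBS = {"အလွန်"}  # assumption: degree adverbs that make R5 pick ADJ
PARTICLE_POS_TAGS = {"P_SENT", "P_MOD", "PPM"}


def resolve(
    tags: FrozenSet[str],
    prev_word: Optional[str] = None,
    prev_word_pos: Optional[str] = None,
    next_word_pos: Optional[str] = None,
) -> Tuple[str, str, float]:
    """Return (tag, rule, confidence), trying rules in priority order."""
    if len(tags) == 1:
        return next(iter(tags)), "none", 1.0  # unambiguous: nothing to decide

    adverb_target = "ADJ" if prev_word in DEGREE_ADVERBS else "V"
    # (rule, context matches?, target tag, confidence), highest priority first
    cascade = [
        ("R3", next_word_pos in PARTICLE_POS_TAGS, "V", 0.90),
        ("R5", prev_word in ADVERB_MARKERS, adverb_target, 0.85),
        ("R1", prev_word_pos == "V", "N", 0.85),
        ("R2", next_word_pos in {"N", "PRON"}, "ADJ", 0.80),  # "PRON" is assumed
        ("R4", prev_word in DETERMINERS, "N", 0.88),
    ]
    for rule, matches, target, confidence in cascade:
        if matches and target in tags:
            return target, rule, confidence  # first matching rule wins

    # No rule matched: deterministic fallback (assumption)
    return sorted(tags)[0], "none", 0.5


tags = frozenset({"ADJ", "N", "V"})
print(resolve(tags, next_word_pos="P_SENT"))  # ('V', 'R3', 0.9)
print(resolve(tags, prev_word="အလွန်"))       # ('ADJ', 'R5', 0.85)
```

Because R3 sits first, a following particle overrides a preceding verb in this sketch: passing both `prev_word_pos="V"` and `next_word_pos="P_SENT"` yields V via R3, not N via R1.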
DisambiguationResult

Each call to disambiguate_in_context() returns a DisambiguationResult:

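from dataclasses import dataclass
from typing import FrozenSet

# DisambiguationRule is the enum of rules (e.g. DisambiguationRule.R3_VERB_BEFORE_PARTICLE)

@dataclass
class DisambiguationResult:
    word: str                          # The word being disambiguated
    original_pos_tags: FrozenSet[str]  # Candidate tags passed in
    resolved_pos: str                  # The single resolved tag
    rule_applied: DisambiguationRule   # Which rule was applied
    confidence: float                  # Confidence score (0.0-1.0)
    context_used: str                  # Human-readable description of the context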
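Because every result carries a confidence score and a context description, low-confidence decisions can be pulled out for manual review. The sketch below uses a minimal stand-in dataclass named `Result` (hypothetical, with `rule_applied` simplified to a plain string), a hypothetical helper `needs_review`, and an arbitrary default threshold of 0.85; none of these are part of the library.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Result:
    """Simplified stand-in for DisambiguationResult (rule_applied as a plain string)."""
    word: str
    resolved_pos: str
    rule_applied: str
    confidence: float
    context_used: str


def needs_review(results: Iterable[Result], threshold: float = 0.85) -> List[str]:
    """Return one audit line per decision whose confidence is below `threshold`."""
    return [
        f"{r.word} -> {r.resolved_pos} via {r.rule_applied} "
        f"({r.confidence:.2f}): {r.context_used}"
        for r in results
        if r.confidence < threshold
    ]


results = [
    Result("ကြီး", "V", "R3", 0.90, "before particle"),
    Result("ကြီး", "ADJ", "R2", 0.80, "before noun"),
]
print(needs_review(results))  # ['ကြီး -> ADJ via R2 (0.80): before noun']
```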

Accessing Results

result = disambiguator.disambiguate_in_context(
    word="ကြီး",
    word_pos_tags=frozenset(["ADJ", "N", "V"]),
    next_word="တယ်",
    next_word_pos="P_SENT",
)

print(result.resolved_pos)      # "V"
print(result.rule_applied)      # DisambiguationRule.R3_VERB_BEFORE_PARTICLE
print(result.confidence)        # 0.90
print(result.context_used)      # "before particle 'တယ်' (P_SENT)"

Sentence Disambiguation

To disambiguate a whole sentence, call disambiguate_in_context() once per word, passing the neighboring words and, when a neighbor has exactly one candidate tag, that tag as context:

words = ["ကြီး", "သော", "အိမ်"]
pos_tags = [
    frozenset(["ADJ", "N", "V"]),
    frozenset(["P_MOD"]),
    frozenset(["N"]),
]

# Disambiguate each word individually using surrounding POS context
for i, (word, tags) in enumerate(zip(words, pos_tags)):
    prev_pos = None if i == 0 else pos_tags[i - 1]
    next_pos = None if i == len(words) - 1 else pos_tags[i + 1]
    # Extract single POS string from unambiguous tags for context
    prev_pos_str = next(iter(prev_pos)) if prev_pos and len(prev_pos) == 1 else None
    next_pos_str = next(iter(next_pos)) if next_pos and len(next_pos) == 1 else None
    result = disambiguator.disambiguate_in_context(
        word, tags,
        prev_word=words[i - 1] if i > 0 else None,
        prev_word_pos=prev_pos_str,
        next_word=words[i + 1] if i < len(words) - 1 else None,
        next_word_pos=next_pos_str,
    )
    print(f"{word}: {result.resolved_pos} ({result.rule_applied.value})")
# ကြီး: ADJ (R2)
# သော: P_MOD (none - unambiguous)
# အိမ်: N (none - unambiguous)
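The loop above uses a neighbor's tag as context only when that neighbor is unambiguous. A pair of small helpers keeps that logic in one place; the names `single_tag` and `neighbor_context` are hypothetical and not part of the library.

```python
from typing import FrozenSet, List, Optional, Tuple


def single_tag(tags: Optional[FrozenSet[str]]) -> Optional[str]:
    """Return the only tag in `tags`, or None if it is missing, empty, or ambiguous."""
    if tags is not None and len(tags) == 1:
        return next(iter(tags))
    return None


def neighbor_context(
    pos_tags: List[FrozenSet[str]], i: int
) -> Tuple[Optional[str], Optional[str]]:
    """Return (prev_word_pos, next_word_pos) for word i from unambiguous neighbors only."""
    prev_tags = pos_tags[i - 1] if i > 0 else None
    next_tags = pos_tags[i + 1] if i + 1 < len(pos_tags) else None
    return single_tag(prev_tags), single_tag(next_tags)


pos_tags = [frozenset(["ADJ", "N", "V"]), frozenset(["P_MOD"]), frozenset(["N"])]
print(neighbor_context(pos_tags, 1))  # (None, 'N'): the previous word is ambiguous
```

Inside the loop, `prev_pos_str, next_pos_str = neighbor_context(pos_tags, i)` would replace the four context-extraction lines.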

Convenience Function

For quick single-word disambiguation, disambiguate() returns just the resolved tag string instead of a DisambiguationResult:

from myspellchecker.algorithms.pos_disambiguator import disambiguate

pos = disambiguate(
    word="ကြီး",
    word_pos_tags=frozenset(["ADJ", "N", "V"]),
    next_word_pos="N",
)
print(pos)  # "ADJ"

Linguistic Data

Determiners

Words that trigger R4 (noun context) when they appear as the previous word:

DETERMINERS = {
    "ဤ",      # this
    "ယင်း",   # that
    "ထို",    # that
    "ဒီ",     # this (colloquial)
    "အဲဒီ",   # that (colloquial)
    "တစ်",    # one/a
    "အားလုံး", # all
    ...
}

Adverb Markers

Words that trigger R5 when they appear as the previous word (verb context; degree adverbs such as အလွန် give ADJ):

ADVERB_MARKERS = {
    "လျင်မြန်စွာ",  # quickly
    "ဖြည်းဖြည်း",  # slowly
    "အလွန်",       # very
    "ကောင်းစွာ",   # well
    "ချက်ချင်း",   # immediately
    ...
}

Particle Tags

Next-word POS tags that trigger R3 (verb context):

PARTICLE_POS_TAGS = {
    "P_SENT",  # Sentence-final particle
    "P_MOD",   # Modifying particle
    "PPM",     # Post-positional marker
}

Thread Safety

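get_disambiguator() returns a shared singleton instance, created in a thread-safe way, so concurrent callers can reuse one disambiguator instead of constructing their own: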

from myspellchecker.algorithms.pos_disambiguator import get_disambiguator

# Thread-safe singleton
disambiguator = get_disambiguator()
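The internals of get_disambiguator() are not shown here. One common way to write such a getter in Python is double-checked locking; the sketch below uses a hypothetical stand-in class `_Disambiguator` and function name `get_instance` so it runs on its own, and is not necessarily how the library does it.

```python
import threading
from typing import Optional


class _Disambiguator:
    """Stand-in for POSDisambiguator (hypothetical, for illustration)."""


_instance: Optional[_Disambiguator] = None
_lock = threading.Lock()


def get_instance() -> _Disambiguator:
    """Return a shared instance, creating it at most once across threads."""
    global _instance
    if _instance is None:            # fast path: skip the lock once initialized
        with _lock:
            if _instance is None:    # re-check: another thread may have created it
                _instance = _Disambiguator()
    return _instance
```

The outer check avoids taking the lock on every call; the inner check, made while holding the lock, is what guarantees only one instance is ever created.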

Integration

With Grammar Checker

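from myspellchecker.grammar.engine import SyntacticRuleChecker
from myspellchecker.algorithms.pos_disambiguator import POSDisambiguator

# Use POSDisambiguator to resolve ambiguous tags before grammar checking
disambiguator = POSDisambiguator()

def check_with_disambiguation(words, pos_tags, grammar_checker: SyntacticRuleChecker):
    # Resolve left to right: each word sees its predecessor's resolved tag
    # and, if the next word has a single candidate tag, that tag
    resolved_tags = []
    for i, (word, tags) in enumerate(zip(words, pos_tags)):
        next_tags = pos_tags[i + 1] if i + 1 < len(words) else None
        result = disambiguator.disambiguate_in_context(
            word, tags,
            prev_word=words[i - 1] if i > 0 else None,  # needed by R4/R5
            prev_word_pos=resolved_tags[-1] if resolved_tags else None,
            next_word=words[i + 1] if i + 1 < len(words) else None,
            next_word_pos=next(iter(next_tags)) if next_tags and len(next_tags) == 1 else None,
        )
        resolved_tags.append(result.resolved_pos)

    # Check grammar. NOTE: check_sequence(words) as called here does not
    # receive resolved_tags; pass them if your checker's API accepts POS tags.
    return grammar_checker.check_sequence(words)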

With Viterbi Tagger

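from myspellchecker.algorithms.pos_disambiguator import POSDisambiguator

class ViterbiPOSTagger:
    def __init__(self):
        self.disambiguator = POSDisambiguator()

    def tag(self, words):
        # Get an initial single tag per word from the Viterbi decoder
        initial_tags = self._viterbi_decode(words)

        # Refine each word using its candidate tags plus the neighbors' Viterbi tags
        pos_tag_sets = [self._get_possible_tags(w) for w in words]
        resolved = []
        for i, (word, tags) in enumerate(zip(words, pos_tag_sets)):
            result = self.disambiguator.disambiguate_in_context(
                word, tags,
                prev_word=words[i - 1] if i > 0 else None,  # needed by R4/R5
                prev_word_pos=initial_tags[i - 1] if i > 0 else None,
                next_word=words[i + 1] if i < len(words) - 1 else None,
                next_word_pos=initial_tags[i + 1] if i < len(words) - 1 else None,
            )
            resolved.append(result.resolved_pos)

        return resolved

    def _viterbi_decode(self, words):
        # Hypothetical hook: return one POS tag per word from your Viterbi model
        raise NotImplementedError

    def _get_possible_tags(self, word):
        # Hypothetical hook: return a frozenset of candidate tags, e.g. from a lexicon
        raise NotImplementedError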
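The class above does not implement `_viterbi_decode`. As an illustration of what that step computes, here is a textbook log-space Viterbi decoder over a toy first-order hidden Markov model. The function name `viterbi`, the tag set, and all probabilities are invented for the example; they are not mySpellChecker's model or API.

```python
import math
from typing import Dict, List


def viterbi(
    words: List[str],
    tags: List[str],
    start: Dict[str, float],
    trans: Dict[str, Dict[str, float]],
    emit: Dict[str, Dict[str, float]],
    floor: float = 1e-6,
) -> List[str]:
    """Return the most probable tag sequence for `words` under a first-order HMM."""
    if not words:
        return []

    def lp(p: float) -> float:
        # Work in log space; floor unseen events so log(0) never occurs
        return math.log(max(p, floor))

    # best[i][t]: log-probability of the best path that ends in tag t at word i
    best = [{t: lp(start.get(t, 0.0)) + lp(emit[t].get(words[0], 0.0)) for t in tags}]
    back: List[Dict[str, str]] = [{}]  # back[i][t]: predecessor tag on that path

    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda s: best[i - 1][s] + lp(trans[s].get(t, 0.0)))
            best[i][t] = (
                best[i - 1][prev]
                + lp(trans[prev].get(t, 0.0))
                + lp(emit[t].get(words[i], 0.0))
            )
            back[i][t] = prev

    # Pick the best final tag, then follow back-pointers to recover the path
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy model (invented numbers, not mySpellChecker data)
tags = ["N", "V", "P_SENT"]
start = {"N": 0.8, "V": 0.1, "P_SENT": 0.1}
trans = {
    "N": {"N": 0.2, "V": 0.7, "P_SENT": 0.1},
    "V": {"N": 0.2, "V": 0.1, "P_SENT": 0.7},
    "P_SENT": {"N": 0.4, "V": 0.3, "P_SENT": 0.3},
}
emit = {
    "N": {"သူ": 0.6, "ကြီး": 0.4},
    "V": {"ကြီး": 1.0},
    "P_SENT": {"တယ်": 1.0},
}
print(viterbi(["သူ", "ကြီး", "တယ်"], tags, start, trans, emit))  # ['N', 'V', 'P_SENT']
```

In `ViterbiPOSTagger`, a sequence like this would play the role of `initial_tags`, supplying the neighbor context for each `disambiguate_in_context()` call.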
