The Myanmar language borrows extensively from English, Pali, Sanskrit, and neighboring languages. These loan words often have several transliteration spellings; for example, “computer” can be written as ကွန်ပျူတာ or ကွမ်ပျူတာ. The loan word module provides bidirectional lookup between standard and variant forms.

Overview

The module loads transliteration variants from rules/loan_words.yaml and provides:
  • Variant → Standard: Given a non-standard spelling, find the standard form(s)
  • Standard → Variants: Given a standard spelling, find all known variant forms
  • SymSpell integration: Variants are injected as spell check candidates during word validation
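
The two lookup directions listed above can be pictured as a pair of dictionaries built from the same data. The following is a minimal sketch, not the module's actual implementation: `build_indexes`, `lookup_standard`, `lookup_variants`, and the sample `ENTRIES` list are hypothetical names introduced for illustration only.

```python
from collections import defaultdict

# Hypothetical sample data: (standard form, known variant spellings).
ENTRIES = [
    ("ကွန်ပျူတာ", ["ကွမ်ပျူတာ"]),
    ("တယ်လီဖုန်း", ["တယ်လီဖုံး", "တဲလီဖုန်း"]),
]


def build_indexes(entries):
    """Return (variant -> standards, standard -> variants) as dicts of frozensets."""
    to_standard = defaultdict(set)
    to_variants = defaultdict(set)
    for standard, variants in entries:
        for variant in variants:
            # A variant may in principle map to more than one standard form,
            # so both directions store sets rather than single strings.
            to_standard[variant].add(standard)
            to_variants[standard].add(variant)
    return (
        {key: frozenset(value) for key, value in to_standard.items()},
        {key: frozenset(value) for key, value in to_variants.items()},
    )


VARIANT_TO_STANDARD, STANDARD_TO_VARIANTS = build_indexes(ENTRIES)


def lookup_standard(word):
    """Variant -> Standard: empty frozenset when the word is not a known variant."""
    return VARIANT_TO_STANDARD.get(word, frozenset())


def lookup_variants(word):
    """Standard -> Variants: empty frozenset when the word has no known variants."""
    return STANDARD_TO_VARIANTS.get(word, frozenset())
```

Returning an empty frozenset for unknown words lets callers iterate over the result without a separate existence check.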

Data Coverage

Source Language   Words   Variants   Examples
English           ~120    ~200       computer, telephone, television
Pali/Sanskrit     ~50     ~100       religious terms, formal vocabulary
Other             ~16     ~23        Thai, Hindi, Chinese borrowings
Total             186     323

Usage

Basic Lookup

from myspellchecker.core.loan_word_variants import (
    get_loan_word_standard,
    get_loan_word_variants,
    is_loan_word_variant,
)

# Check if a word is a known variant
is_loan_word_variant("ကွမ်ပျူတာ")  # True

# Get standard form(s) for a variant
standards = get_loan_word_standard("ကွမ်ပျူတာ")
# frozenset({"ကွန်ပျူတာ"})

# Get all variants for a standard form
variants = get_loan_word_variants("ကွန်ပျူတာ")
# frozenset({"ကွမ်ပျူတာ", ...})

Integration with SpellChecker

Loan word variants are automatically used during spell checking when the SymSpell algorithm generates candidates. If a user types a non-standard variant, the standard form is suggested, and vice versa.

from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig
from myspellchecker.providers import SQLiteProvider

config = SpellCheckerConfig()
provider = SQLiteProvider(database_path="dictionary.db")
checker = SpellChecker(config=config, provider=provider)

# Non-standard variant will get the standard form as a suggestion
result = checker.check("ကွမ်ပျူတာ ဝယ်မယ်")

No additional configuration is needed: loan word variant lookup is enabled by default at the word validation level.
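
How loan word forms might be merged into a candidate list can be sketched as follows. This is an illustration of the idea only: `inject_loan_word_candidates` and its parameters are hypothetical, and the library's actual ranking and ordering of candidates are not described in this page.

```python
def inject_loan_word_candidates(word, candidates, variant_to_standard, standard_to_variants):
    """Return `candidates` with loan word forms related to `word` placed first.

    Assumption: related loan word forms are put ahead of the other candidates.
    The real checker may rank them differently.
    """
    # Standard forms if `word` is a variant, plus variants if `word` is standard.
    related = sorted(variant_to_standard.get(word, frozenset()))
    related += sorted(standard_to_variants.get(word, frozenset()))

    merged = []
    for candidate in related + list(candidates):
        # Skip the input word itself and anything already in the list.
        if candidate != word and candidate not in merged:
            merged.append(candidate)
    return merged
```

With mappings like the ones in the Basic Lookup example, calling this function for ကွမ်ပျူတာ would place ကွန်ပျူတာ at the front of the merged list.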

YAML Structure

Loan words are organized by source language in rules/loan_words.yaml:

english:
  - standard: "ကွန်ပျူတာ"
    original: "computer"
    variants: ["ကွမ်ပျူတာ"]
  - standard: "တယ်လီဖုန်း"
    original: "telephone"
    variants: ["တယ်လီဖုံး", "တဲလီဖုန်း"]

pali_sanskrit:
  - standard: "ဗုဒ္ဓ"
    original: "Buddha"
    variants: ["ဘုရား"]

other:
  - standard: "ကော်ဖီ"
    original: "coffee"
    variants: ["ကော်ဖိ"]
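
Per-language counts like those in the Data Coverage table can be derived from this structure. The sketch below assumes the file parses to a dictionary of lists of entries, as the layout above suggests. The `PARSED` dictionary is written by hand so the example runs without a YAML dependency, and `coverage` is a hypothetical helper, not part of the module.

```python
# Hand-written stand-in for the parsed YAML file (assumed shape:
# {source_language: [{"standard": ..., "original": ..., "variants": [...]}, ...]}).
PARSED = {
    "english": [
        {"standard": "ကွန်ပျူတာ", "original": "computer", "variants": ["ကွမ်ပျူတာ"]},
        {
            "standard": "တယ်လီဖုန်း",
            "original": "telephone",
            "variants": ["တယ်လီဖုံး", "တဲလီဖုန်း"],
        },
    ],
    "other": [
        {"standard": "ကော်ဖီ", "original": "coffee", "variants": ["ကော်ဖိ"]},
    ],
}


def coverage(parsed):
    """Return {source_language: (word_count, variant_count)}."""
    return {
        language: (
            len(entries),
            sum(len(entry.get("variants", [])) for entry in entries),
        )
        for language, entries in parsed.items()
    }


print(coverage(PARSED))
```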

Performance

  • YAML is loaded once and cached via @lru_cache
  • Public API returns frozenset (immutable, safe for concurrent access)
  • Lookup is O(1) dictionary access

See Also