Grammar rules are defined in YAML files and drive the syntactic validation strategy — checking POS sequences, particle chains, sentence boundaries, register consistency, and negation patterns. You can provide your own rules file to customize validation behavior.

Overview

A rules file can configure the following kinds of checks:
  • Invalid POS sequences (e.g., V-V, P-P patterns)
  • Sentence boundary constraints
  • Particle chain validation
  • Register consistency (formal vs colloquial)
  • Negation and classifier rules

Rule File Location

Default grammar rules are located at:
src/myspellchecker/rules/grammar_rules.yaml
You can provide custom rules via configuration:
from myspellchecker.grammar.config import GrammarRuleConfig

# GrammarRuleConfig loads and manages rules from a YAML file
rule_config = GrammarRuleConfig(config_path="/path/to/custom_rules.yaml")

YAML Schema

File Structure

# Grammar rules file structure
version: "1.0.0"
category: "grammar_rules"
description: "Custom grammar validation rules"

metadata:
  created_date: "2025-01-01"
  last_updated: "2025-01-15"
  total_entries: 45
  source: "Custom patterns"

rules:
  # Rule categories
  sentence_start_constraints: [...]
  sentence_end_constraints: [...]
  invalid_sequences: [...]
  required_particles: [...]
  sentence_final_required: [...]
  particle_chains:
    valid_chains: [...]
    invalid_chains: [...]
  register_rules: [...]
  clause_linkage: [...]
  negation_rules: [...]
  classifier_rules: [...]

# Tag definitions
particle_tags:
  "ကို": "P_OBJ"
  "က": "P_SUBJ"
  # ...

interjection_tags:
  "အိုး": "INT"
  # ...
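Before handing a file to the checker, it can help to confirm that it has the top-level shape shown above. The following is a minimal sketch that operates on an already-parsed mapping (for example, the output of a YAML loader). `EXPECTED_RULE_SECTIONS` and `missing_sections` are hypothetical names, not part of myspellchecker, and the page does not say which categories, if any, the library treats as mandatory.

```python
# Hypothetical helper (not part of myspellchecker): reports which rule
# categories from the documented file structure are absent from a parsed file.
EXPECTED_RULE_SECTIONS = (
    "sentence_start_constraints",
    "sentence_end_constraints",
    "invalid_sequences",
    "required_particles",
    "sentence_final_required",
    "particle_chains",
    "register_rules",
    "clause_linkage",
    "negation_rules",
    "classifier_rules",
)


def missing_sections(parsed: dict) -> list:
    """Return the expected rule categories not present under 'rules'."""
    rules = parsed.get("rules") or {}
    return [name for name in EXPECTED_RULE_SECTIONS if name not in rules]


# A parsed file that defines only two categories.
example = {
    "version": "1.0.0",
    "rules": {"invalid_sequences": [], "negation_rules": []},
}
print(missing_sections(example))
```

A file that omits categories may still be perfectly valid; the list is only a quick way to spot a misspelled section name.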

Rule Categories

Sentence Constraints

Define which particles must not appear at the start or end of a sentence:
rules:
  sentence_start_constraints:
    - forbidden_tags: ["P_OBJ", "P_POSS", "P_BEN"]
      forbidden_words: ["ကို", "ရဲ့", "အတွက်"]
      severity: "error"
      message: "Sentence usually cannot start with this particle"
      confidence: 0.90

  sentence_end_constraints:
    - forbidden_tags: ["P_SUBJ", "P_LOC", "P_OBJ"]
      forbidden_words: ["က", "မှာ", "ကို"]
      severity: "error"
      message: "Sentence cannot end with this particle"
      confidence: 0.90
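To make the semantics of `forbidden_tags` and `forbidden_words` concrete, here is a small sketch of how a boundary constraint could be evaluated against tagged tokens. It is an illustration only: `boundary_violation` is a hypothetical function, and the library's actual matching logic may differ.

```python
# Illustrative only: one possible reading of a sentence boundary constraint.
def boundary_violation(tokens, constraint, position):
    """Check the first or last token against a constraint.

    tokens: list of (word, tag) pairs.
    position: "start" or "end".
    Returns the constraint's message if the boundary token is forbidden,
    otherwise None.
    """
    if not tokens:
        return None
    word, tag = tokens[0] if position == "start" else tokens[-1]
    if tag in constraint["forbidden_tags"] or word in constraint["forbidden_words"]:
        return constraint["message"]
    return None


start_rule = {
    "forbidden_tags": ["P_OBJ", "P_POSS", "P_BEN"],
    "forbidden_words": ["ကို", "ရဲ့", "အတွက်"],
    "message": "Sentence usually cannot start with this particle",
}

# A token sequence that opens with the object marker.
tagged = [("ကို", "P_OBJ"), ("သူ", "N"), ("ခေါ်", "V")]
print(boundary_violation(tagged, start_rule, "start"))
```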

Invalid POS Sequences

Define patterns that indicate grammatical errors:
rules:
  invalid_sequences:
    # Consecutive verbs (might need particle)
    - pattern: "V-V"
      severity: "info"
      message: "Consecutive verbs (Serial Verb Construction?)"
      suggestion: "Check if particle is needed if not an SVC"
      confidence: 0.50
      examples:
        incorrect: "သွား စား"
        correct: "သွားပြီး စားတယ်"
        translation: "went and then ate"
      exceptions: ["သွား", "လာ", "နေ", "ပေး", "လိုက်"]

    # Conflicting tense markers
    - pattern: "P_PAST-P_FUT"
      severity: "error"
      message: "Conflicting tense markers (Past + Future)"
      suggestion: "Choose one tense"
      confidence: 0.98
      examples:
        incorrect: "သွားခဲ့မယ်"
        correct: "သွားခဲ့တယ် / သွားမယ်"

    # Number without classifier
    - pattern: "NUM-N"
      severity: "error"
      message: "Number cannot directly precede Noun (needs classifier)"
      suggestion: "Use Noun + Number + Classifier pattern"
      confidence: 0.90
      examples:
        incorrect: "တစ် လူ"
        correct: "လူ တစ် ယောက်"
        translation: "one person"
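The two-tag patterns above can be pictured as a sliding window over the tag sequence. The sketch below assumes that `exceptions` suppresses a match when the second word of the pair is listed, which is consistent with the `V-V` example but is not confirmed by this page. `find_sequence_matches` is a hypothetical name.

```python
# Illustrative sketch: scan adjacent (word, tag) pairs for a two-tag pattern.
def find_sequence_matches(tokens, rule):
    """Return the start indices where the rule's pattern occurs.

    tokens: list of (word, tag) pairs.
    Assumption: a match is skipped when the second word is in 'exceptions'.
    """
    first, second = rule["pattern"].split("-")
    exceptions = set(rule.get("exceptions", []))
    hits = []
    for i in range(len(tokens) - 1):
        (_, tag1), (word2, tag2) = tokens[i], tokens[i + 1]
        if (tag1, tag2) == (first, second) and word2 not in exceptions:
            hits.append(i)
    return hits


verb_rule = {"pattern": "V-V", "exceptions": ["သွား", "လာ", "နေ", "ပေး", "လိုက်"]}
num_rule = {"pattern": "NUM-N"}

print(find_sequence_matches([("သွား", "V"), ("စား", "V")], verb_rule))  # flagged pair
print(find_sequence_matches([("စား", "V"), ("နေ", "V")], verb_rule))    # auxiliary, skipped
print(find_sequence_matches([("တစ်", "NUM"), ("လူ", "N")], num_rule))    # missing classifier
```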

Required Particles

Define when particles are typically required:
rules:
  required_particles:
    - pattern: "N-V"
      required_one_of: ["က", "ကို", "မှာ", "မှ"]
      severity: "warning"
      message: "Noun-verb sequence usually needs case particle"
      confidence: 0.75
      examples:
        questionable: "သူ သွားတယ်"
        better: "သူက သွားတယ်"
        translation: "he went"

  sentence_final_required:
    - pattern: "V$"
      required_one_of: ["တယ်", "ပါတယ်", "သည်", "ပါသည်", "မယ်", "ပြီ"]
      severity: "warning"
      message: "Verb at sentence end usually needs final particle"
      confidence: 0.80
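The `V$` pattern can be read as "a verb at the very end of the sentence". Under that assumption about the pattern syntax, a check might look like the sketch below. `check_sentence_final` is a hypothetical helper, and the token segmentation shown is also assumed.

```python
# Illustrative sketch: flag a sentence whose last token is a bare verb.
def check_sentence_final(tokens, rule):
    """Return (message, candidate_particles) if the rule fires, else None.

    tokens: list of (word, tag) pairs.
    Assumption: a trailing '$' in the pattern means "end of sentence".
    """
    if not tokens:
        return None
    wanted_tag = rule["pattern"].rstrip("$")
    _, last_tag = tokens[-1]
    if last_tag == wanted_tag:
        return rule["message"], rule["required_one_of"]
    return None


final_rule = {
    "pattern": "V$",
    "required_one_of": ["တယ်", "ပါတယ်", "သည်", "ပါသည်", "မယ်", "ပြီ"],
    "message": "Verb at sentence end usually needs final particle",
}

bare = [("သူ", "N"), ("သွား", "V")]
complete = [("သူ", "N"), ("သွား", "V"), ("တယ်", "P_SENT")]
print(check_sentence_final(bare, final_rule))
print(check_sentence_final(complete, final_rule))
```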

Particle Chains

Define valid and invalid particle combinations:
rules:
  particle_chains:
    valid_chains:
      - particles: ["က", "တော့"]
        meaning: "Subject + topic/contrast"
        example: "သူကတော့ သွားတယ်"
        translation: "As for him, (he) went"

      - particles: ["ကို", "လည်း"]
        meaning: "Object + also"
        example: "သူ့ကိုလည်း ခေါ်"
        translation: "Call him too"

    invalid_chains:
      - particles: ["က", "ကို"]
        severity: "error"
        message: "Subject marker cannot be followed by object marker"
        confidence: 0.95
        examples:
          incorrect: "သူကကို"
          correct: "သူ့ကို / သူက"

      - particles: ["တယ်", "သည်"]
        severity: "error"
        message: "Mixed colloquial and formal endings"
        confidence: 0.95
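An invalid chain amounts to a forbidden run of adjacent particles. The following sketch shows one way to locate such runs in a list of already-segmented words. `find_invalid_chains` is a hypothetical function written for illustration, not the library's implementation.

```python
# Illustrative sketch: find forbidden runs of adjacent particles.
def find_invalid_chains(words, invalid_chains):
    """Return (index, message) for each invalid chain found in words."""
    hits = []
    for chain in invalid_chains:
        parts = chain["particles"]
        width = len(parts)
        for i in range(len(words) - width + 1):
            if words[i:i + width] == parts:
                hits.append((i, chain["message"]))
    return hits


invalid_chains = [
    {"particles": ["က", "ကို"],
     "message": "Subject marker cannot be followed by object marker"},
    {"particles": ["တယ်", "သည်"],
     "message": "Mixed colloquial and formal endings"},
]

words = ["သူ", "က", "ကို", "ခေါ်", "တယ်"]
for index, message in find_invalid_chains(words, invalid_chains):
    print(index, message)
```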

Register Rules

Check for consistency between formal and colloquial style:
rules:
  register_rules:
    formal_endings: ["သည်", "ပါသည်", "မည်", "ပါမည်", "၏"]
    colloquial_endings: ["တယ်", "ပါတယ်", "မယ်", "ပါမယ်", "ဘူး"]
    checks:
      - pattern: "formal_with_colloquial"
        severity: "info"
        message: "Mixed formal and colloquial register"
        suggestion: "Consider using consistent register throughout"
        confidence: 0.60
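The `formal_with_colloquial` check can be thought of as asking whether a text contains endings from both lists. A minimal sketch of that idea follows; `has_mixed_register` is a hypothetical name, and the real check may consider sentence boundaries or other context.

```python
# Illustrative sketch: detect endings from both registers in one text.
def has_mixed_register(words, register_rules):
    """True if words contain at least one formal and one colloquial ending."""
    formal = set(register_rules["formal_endings"])
    colloquial = set(register_rules["colloquial_endings"])
    uses_formal = any(word in formal for word in words)
    uses_colloquial = any(word in colloquial for word in words)
    return uses_formal and uses_colloquial


register_rules = {
    "formal_endings": ["သည်", "ပါသည်", "မည်", "ပါမည်", "၏"],
    "colloquial_endings": ["တယ်", "ပါတယ်", "မယ်", "ပါမယ်", "ဘူး"],
}

colloquial_ending = register_rules["colloquial_endings"][0]
formal_ending = register_rules["formal_endings"][0]

mixed = ["သူ", "သွား", colloquial_ending, "သူ", "စား", formal_ending]
consistent = ["သူ", "သွား", colloquial_ending]
print(has_mixed_register(mixed, register_rules))
print(has_mixed_register(consistent, register_rules))
```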

Negation Rules

Validate proper negation patterns:
rules:
  negation_rules:
    # Standard negation
    - pattern: "မ-V-ဘူး"
      severity: "info"
      message: "Standard negative construction"
      confidence: 0.90
      examples:
        correct: "မသွားဘူး"
        translation: "don't go / won't go"

    # Imperative negation
    - pattern: "မ-V-နဲ့"
      severity: "info"
      message: "Negative imperative (don't)"
      confidence: 0.85
      examples:
        correct: "မသွားနဲ့"
        translation: "don't go!"

    # Missing ဘူး
    - pattern: "မ-V$"
      severity: "warning"
      message: "Negative verb may need ဘူး ending"
      suggestion: "Add ဘူး for complete negation"
      confidence: 0.70
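Negation patterns mix literal words (`မ`, `ဘူး`) with POS tags (`V`). One way to interpret them is sketched below: each hyphen-separated element matches either a token's word or its tag, and a trailing `$` anchors the match to the end of the sentence. This reading of the pattern syntax is an assumption, and `matches_at` and the `P_NEG` tag are hypothetical.

```python
# Illustrative sketch: match a mixed word/tag pattern at a given position.
def matches_at(tokens, pattern, start):
    """True if pattern matches tokens beginning at index start.

    tokens: list of (word, tag) pairs.
    Each pattern element matches a token's word or its tag.
    A trailing '$' requires the match to end at the last token.
    """
    anchored = pattern.endswith("$")
    elements = pattern.rstrip("$").split("-")
    end = start + len(elements)
    if end > len(tokens) or (anchored and end != len(tokens)):
        return False
    return all(
        element in (word, tag)
        for element, (word, tag) in zip(elements, tokens[start:end])
    )


NEG, ENDING = "မ", "ဘူး"
full = [(NEG, "P_NEG"), ("သွား", "V"), (ENDING, "P_NEG_SENT")]
truncated = full[:2]  # negated verb with no ending

print(matches_at(full, f"{NEG}-V-{ENDING}", 0))   # standard negation
print(matches_at(truncated, f"{NEG}-V$", 0))      # missing ending
print(matches_at(full, f"{NEG}-V$", 0))           # verb is not sentence-final
```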

Classifier Rules

Validate noun-numeral-classifier patterns:
rules:
  classifier_rules:
    - pattern: "N-NUM-CLF"
      severity: "info"
      message: "Standard classifier construction"
      confidence: 0.80
      examples:
        correct: "လူ သုံးယောက်"
        translation: "three people"

    - pattern: "CLF-NUM"
      severity: "error"
      message: "Classifier before numeral (wrong order)"
      suggestion: "Classifier should follow numeral"
      confidence: 0.95

Tag Definitions

Particle Tags

Map specific words to POS tags for validation:
particle_tags:
  # Possessive
  "ရဲ့": "P_POSS"
  "၏": "P_POSS"

  # Object markers
  "ကို": "P_OBJ"
  "အား": "P_OBJ"

  # Subject marker
  "က": "P_SUBJ"

  # Location markers
  "မှာ": "P_LOC"
  "၌": "P_LOC"
  "တွင်": "P_LOC"

  # Sentence endings (colloquial)
  "တယ်": "P_SENT"
  "ပါတယ်": "P_SENT"
  "မယ်": "P_SENT"
  "ဘူး": "P_NEG_SENT"

  # Sentence endings (formal)
  "သည်": "P_SENT"
  "ပါသည်": "P_SENT"
  "မည်": "P_SENT"

Interjection Tags

interjection_tags:
  "အိုး": "INT"      # Oh! (surprise)
  "ဟေ့": "INT"       # Hey! (attention)
  "အော်": "INT"      # Ah! (exclamation)
  "ကွာ": "INT"       # Emphatic particle
  "ဟုတ်လား": "INT"   # Really?

Severity Levels

| Level | Description | Action |
| --- | --- | --- |
| error | Clear grammatical error | Flag as error, suggest fix |
| warning | Likely error, context-dependent | Flag with medium confidence |
| info | Informational, may be intentional | Optional flag, low confidence |

Confidence Scores

Confidence ranges from 0.0 to 1.0:
| Range | Meaning |
| --- | --- |
| 0.90-1.0 | Very high confidence, clear error |
| 0.70-0.89 | High confidence, likely error |
| 0.50-0.69 | Medium confidence, context-dependent |
| < 0.50 | Low confidence, informational |
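The bands above translate directly into a small lookup, which can be handy when reviewing a rules file or filtering results. `confidence_band` is a hypothetical helper written for illustration, not a library function.

```python
# Illustrative helper: map a confidence score to the documented band.
def confidence_band(score: float) -> str:
    """Return the band name for a confidence score in [0.0, 1.0]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence must be between 0.0 and 1.0, got {score}")
    if score >= 0.90:
        return "very high"
    if score >= 0.70:
        return "high"
    if score >= 0.50:
        return "medium"
    return "low"


for value in (0.98, 0.75, 0.50, 0.30):
    print(value, confidence_band(value))
```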

Creating Custom Rules

1. Create YAML File

# my_custom_rules.yaml
version: "1.0.0"
category: "custom_grammar_rules"
description: "Domain-specific grammar rules"

rules:
  invalid_sequences:
    - pattern: "ADJ-ADJ"
      severity: "warning"
      message: "Multiple consecutive adjectives"
      confidence: 0.60
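A quick sanity check on each rule entry can catch typos before the file is loaded. The sketch below validates the fields used throughout this page against the documented severity levels and confidence range. `rule_problems` is a hypothetical helper, and the page does not state which fields the library itself requires.

```python
# Illustrative validator for a single parsed rule entry.
VALID_SEVERITIES = {"error", "warning", "info"}


def rule_problems(rule: dict) -> list:
    """Return a list of human-readable problems found in one rule entry."""
    problems = []
    if not rule.get("pattern"):
        problems.append("missing pattern")
    if rule.get("severity") not in VALID_SEVERITIES:
        problems.append(f"unknown severity: {rule.get('severity')!r}")
    confidence = rule.get("confidence")
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        problems.append(f"confidence out of range: {confidence!r}")
    if not rule.get("message"):
        problems.append("missing message")
    return problems


good_rule = {
    "pattern": "ADJ-ADJ",
    "severity": "warning",
    "message": "Multiple consecutive adjectives",
    "confidence": 0.60,
}
bad_rule = {"pattern": "ADJ-ADJ", "severity": "fatal", "confidence": 1.5}

print(rule_problems(good_rule))
print(rule_problems(bad_rule))
```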

2. Load Custom Rules

from myspellchecker.core.config import GrammarEngineConfig, SpellCheckerConfig
from myspellchecker.core.builder import SpellCheckerBuilder

# Enable grammar checking via SpellCheckerConfig
config = SpellCheckerConfig(
    use_rule_based_validation=True,
)
checker = SpellCheckerBuilder(config).build()

# To customize confidence thresholds, pass GrammarEngineConfig
# directly to SyntacticRuleChecker:
#   from myspellchecker.grammar import SyntacticRuleChecker
#   engine = SyntacticRuleChecker(
#       grammar_config=GrammarEngineConfig(
#           default_confidence_threshold=0.80,
#           high_confidence=0.90,
#       )
#   )

3. Test Rules

result = checker.check("သူကကို ခေါ်တယ်")

for error in result.errors:
    if error.error_type == "grammar_error":
        print(f"Grammar: {error.text} - {error.suggestions}")

Best Practices

1. Use Examples

Always include examples in rules:
examples:
  incorrect: "သွားခဲ့မယ်"
  correct: "သွားခဲ့တယ် / သွားမယ်"
  translation: "went / will go"

2. Include Exceptions

List valid exceptions to patterns:
exceptions: ["သွား", "လာ", "နေ", "ပေး"]  # Serial verb auxiliaries

3. Set Appropriate Confidence

  • Very high confidence (0.90 and above) for clear errors
  • High confidence (0.70-0.89) for likely errors
  • Medium confidence (0.50-0.69) for context-dependent patterns
  • Low confidence (below 0.50) for informational patterns

4. Group Related Rules

Organize rules by category for maintainability:
rules:
  tense_rules:
    - pattern: "P_PAST-P_FUT" ...
    - pattern: "P_PAST-P_PAST" ...

  case_rules:
    - pattern: "P_SUBJ-P_OBJ" ...

Debugging Rules

Enable Grammar Debug Logging

from myspellchecker.utils.logging_utils import configure_logging

configure_logging(level="DEBUG")

Check Rule Matching

from myspellchecker.grammar.engine import SyntacticRuleChecker
from myspellchecker.providers import SQLiteProvider

# SyntacticRuleChecker requires a DictionaryProvider as first argument
provider = SQLiteProvider(database_path="/path/to/dict.db")
grammar_checker = SyntacticRuleChecker(provider)

# check_sequence() only takes words; POS tags are looked up internally
errors = grammar_checker.check_sequence(
    words=["သူ", "က", "ကို", "ခေါ်", "တယ်"]
)

for idx, rule_name, message in errors:
    print(f"Position: {idx}")
    print(f"Rule: {rule_name}")
    print(f"Message: {message}")
