Database I/O is the primary bottleneck for spell checking throughput. Since Myanmar text reuses a relatively small set of common syllables and words, caching is highly effective, and a well-tuned cache can eliminate 90%+ of SQLite reads. This guide covers how to size caches for your workload.
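The effect is the same as memoizing the lookup function. As an illustrative sketch using only the standard library (the `lookup_word` function and its tiny word set are hypothetical stand-ins for a real SQLite-backed dictionary), `functools.lru_cache` shows why repeated syllables make caching so effective:

```python
import functools

calls = 0  # count of simulated database reads

@functools.lru_cache(maxsize=8192)
def lookup_word(word: str) -> bool:
    """Simulated dictionary lookup; each cache miss would hit SQLite."""
    global calls
    calls += 1
    return word in {"မြန်မာ", "စာ"}  # stand-in for the real dictionary table

# Repeated text reuses the same words, so most lookups become cache hits
for token in ["မြန်မာ", "စာ", "မြန်မာ", "မြန်မာ", "စာ"]:
    lookup_word(token)

print(calls)                     # only 2 underlying "reads" for 5 lookups
print(lookup_word.cache_info())  # hits=3, misses=2
```

Three of the five lookups never touch the backing store; on real text with a small recurring vocabulary, the hit rate climbs far higher.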
## Overview
The caching system provides:
- LRU Cache: Least Recently Used eviction strategy
- CacheManager: Unified cache management
## AlgorithmCacheConfig
Note: There are two cache config classes in the library:
- `myspellchecker.core.config.AlgorithmCacheConfig` (Pydantic model) — used with `SpellCheckerConfig` for algorithm-level cache sizing
- `myspellchecker.utils.cache.CacheConfig` (dataclass) — used for low-level cache instance configuration with `maxsize` and `name`
Configure caching behavior for different lookup types:
```python
from myspellchecker.core.config import AlgorithmCacheConfig

# Configure cache sizes for different components
config = AlgorithmCacheConfig(
    syllable_cache_size=4096,    # Syllable lookups
    word_cache_size=8192,        # Word lookups
    frequency_cache_size=8192,   # Frequency lookups
    bigram_cache_size=16384,     # Bigram probability lookups
    trigram_cache_size=16384,    # Trigram probability lookups
)
```
### Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| `syllable_cache_size` | int | 4096 | LRU cache size for syllable lookups |
| `word_cache_size` | int | 8192 | LRU cache size for word lookups |
| `frequency_cache_size` | int | 8192 | LRU cache size for frequency lookups |
| `bigram_cache_size` | int | 16384 | LRU cache size for bigram lookups |
| `trigram_cache_size` | int | 16384 | LRU cache size for trigram lookups |
## Creating Cache Configs
```python
from myspellchecker.utils.cache import CacheConfig

# CacheConfig is a dataclass with maxsize, ttl_seconds, enable_stats, name
syllable_cache = CacheConfig(maxsize=4096, name="syllables")
word_cache = CacheConfig(maxsize=8192, name="words")
custom_cache = CacheConfig(maxsize=5000, ttl_seconds=3600, name="custom")
```
## LRU Cache
Least Recently Used cache with fixed size:
```python
from myspellchecker.utils.cache import LRUCache

# Create cache
cache = LRUCache(maxsize=1000)

# Store values
cache.set("key1", "value1")
cache.set("key2", {"complex": "data"})

# Retrieve values
value = cache.get("key1")       # Returns "value1"
missing = cache.get("unknown")  # Returns None

# Check existence (use 'in' operator, not .has())
if "key1" in cache:
    print("Key exists")

# Get with default
value = cache.get("unknown", default="fallback")

# Clear cache
cache.clear()

# Get statistics
stats = cache.stats()
print(f"Hits: {stats['hits']}, Misses: {stats['misses']}")
print(f"Hit rate: {stats['hit_rate']:.2%}")
```
### LRU Eviction
When the cache is full, the least recently accessed item is evicted:
```python
cache = LRUCache(maxsize=3)
cache.set("a", 1)
cache.set("b", 2)
cache.set("c", 3)

# Access "a" to make it recently used
_ = cache.get("a")

# Add new item - "b" is evicted (least recently used)
cache.set("d", 4)

"a" in cache  # True (recently accessed)
"b" in cache  # False (evicted)
"c" in cache  # True
"d" in cache  # True
```
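The eviction order above is what any standard LRU produces. A minimal stdlib sketch using `collections.OrderedDict` (an illustration of the technique, not the library's actual implementation) reproduces it:

```python
from collections import OrderedDict

class MiniLRU:
    """Minimal LRU cache: get/set refresh recency, set evicts the oldest."""

    def __init__(self, maxsize: int):
        if maxsize < 1:
            raise ValueError("maxsize must be >= 1")
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict least recently used

    def __contains__(self, key):
        return key in self._data

# Same scenario as above: "b" is the oldest untouched entry, so it goes
cache = MiniLRU(maxsize=3)
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    cache.set(k, v)
cache.get("a")       # refresh "a"
cache.set("d", 4)    # evicts "b"
print("b" in cache)  # False
```

`move_to_end` plus `popitem(last=False)` is the standard way to get O(1) recency tracking from an ordered dict.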
## CacheManager
Unified cache management for multiple cache instances:
```python
from myspellchecker.utils.cache import CacheManager

# Create manager with default cache size
manager = CacheManager(default_maxsize=1024)

# Get or create named caches
syllable_cache = manager.get_cache("syllables", maxsize=4096)
word_cache = manager.get_cache("words", maxsize=8192)
context_cache = manager.get_cache("context", maxsize=500)

# Use cache
syllable_cache.set("မြန်", True)
result = syllable_cache.get("မြန်")

# Clear all caches
manager.clear_all()

# Get combined statistics
stats = manager.get_all_stats()
for name, cache_stats in stats.items():
    print(f"{name}: {cache_stats['hit_rate']:.2%} hit rate")
```
## Integration with SpellChecker
Caching is automatically configured via SpellCheckerConfig:
```python
from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig, AlgorithmCacheConfig

config = SpellCheckerConfig(
    cache=AlgorithmCacheConfig(
        syllable_cache_size=8192,
        word_cache_size=16384,
        bigram_cache_size=32768,
    )
)

checker = SpellChecker(config=config)
```
## What’s Cached
| Component | Cache Type | Default Size | Purpose |
|---|---|---|---|
| Syllable validation | LRU | 4096 | Syllable validity |
| Word lookup | LRU | 8192 | Dictionary results |
| Frequency lookup | LRU | 8192 | Frequency scores |
| Bigram probabilities | LRU | 16384 | Bigram scores |
| Trigram probabilities | LRU | 16384 | Trigram scores |
| Edit distance | LRU | 4096 | Damerau-Levenshtein |
| POS tags | LRU | 1024 | Tag sequences |
| Stemmer | LRU | 1024 | Root extraction |
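Cache sizes bound entry counts, not bytes. As a back-of-the-envelope estimate (the ~200 bytes per entry figure is an assumption for short Myanmar strings plus dict overhead; measure your own workload), the defaults in the table cost roughly:

```python
# Default entry counts from the table above
default_sizes = {
    "syllable": 4096, "word": 8192, "frequency": 8192,
    "bigram": 16384, "trigram": 16384, "edit_distance": 4096,
    "pos_tags": 1024, "stemmer": 1024,
}
BYTES_PER_ENTRY = 200  # assumed average: key string + value + dict overhead

total_entries = sum(default_sizes.values())
total_mb = total_entries * BYTES_PER_ENTRY / (1024 * 1024)
print(f"{total_entries} entries ≈ {total_mb:.1f} MB")  # 59392 entries ≈ 11.3 MB
```

Even the default sizes are modest in memory terms, which is why doubling them is usually safe on anything but tightly constrained devices.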
## 1. Size Appropriately
```python
from myspellchecker.utils.cache import CacheConfig

# For real-time typing (small vocabulary per session)
config = CacheConfig(maxsize=1000)

# For batch processing (large vocabulary)
config = CacheConfig(maxsize=50000)
```
## 2. Monitor Hit Rates
```python
from myspellchecker.utils.cache import CacheManager

# Use CacheManager.get_all_stats() to monitor hit rates
manager = CacheManager()
stats = manager.get_all_stats()
for name, cache_stats in stats.items():
    if cache_stats['hit_rate'] < 0.5:
        print(f"Warning: {name} has low hit rate")
```
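Hit rate is simply hits divided by total lookups; a quick stdlib sanity check of the 0.5 threshold above:

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache (avoided database reads)."""
    total = hits + misses
    return hits / total if total else 0.0

# 900 hits out of 1000 lookups: 90% of SQLite reads were avoided
print(f"{hit_rate(900, 100):.2%}")  # 90.00%
```

A hit rate persistently below 0.5 usually means the cache is smaller than the working set and should be enlarged.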
## 3. Clear on Data Changes
```python
from myspellchecker.utils.cache import CacheManager

# After updating dictionary, clear all caches via CacheManager
manager = CacheManager()
manager.clear_all()
```
## Thread Safety
All cache implementations are thread-safe:
```python
from concurrent.futures import ThreadPoolExecutor

from myspellchecker.utils.cache import LRUCache

cache = LRUCache(maxsize=1000)

def worker(key):
    cache.set(key, f"value_{key}")
    return cache.get(key)

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(worker, i) for i in range(100)]
    results = [f.result() for f in futures]
```
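Thread safety of this kind is typically achieved by taking a single lock around each operation. A stdlib sketch of that pattern (an illustration, not the library's actual code) wrapping a plain dict:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class LockedCache:
    """Dict-backed cache guarded by a lock so concurrent get/set are safe."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:  # only one thread mutates the dict at a time
            self._data[key] = value

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

cache = LockedCache()

def worker(i):
    cache.set(i, f"value_{i}")
    return cache.get(i)

with ThreadPoolExecutor(max_workers=4) as ex:
    results = list(ex.map(worker, range(100)))  # map preserves input order
print(results[0], results[99])  # value_0 value_99
```

A single lock is simple and correct; for very hot caches, sharding across several locks reduces contention at the cost of complexity.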
## Minimizing Caching
For debugging or testing with minimal caching:
```python
from myspellchecker.core.config import SpellCheckerConfig, AlgorithmCacheConfig

# Use minimal cache sizes (minimum is 1 — LRUCache raises ValueError if maxsize < 1)
config = SpellCheckerConfig(
    cache=AlgorithmCacheConfig(
        syllable_cache_size=1,
        word_cache_size=1,
        bigram_cache_size=1,
    )
)
```
Do not set cache sizes to 0: `LRUCache` requires `maxsize >= 1` and raises `ValueError` otherwise. Use 1 for the smallest possible cache.
## Best Practices
- Start with defaults: The default configuration works well for most cases
- Monitor hit rates: Use `SpellChecker.cache_stats()` to identify underperforming caches
- Size for working set: The cache should fit the vocabulary typically in use
- Clear strategically: Clear caches whenever dictionary data changes
## See Also