Database I/O is the primary bottleneck for spell checking throughput. Since Myanmar text reuses a relatively small set of common syllables and words, caching is highly effective, and a well-tuned cache can eliminate 90%+ of SQLite reads. This guide covers how to size caches for your workload.
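The effect is the same as memoizing the lookup function. As an illustrative sketch using only the standard library (the `lookup_word` function and its tiny word set are hypothetical stand-ins for a real SQLite-backed dictionary), `functools.lru_cache` shows why repeated syllables make caching so effective:

```python
import functools

calls = 0  # count of simulated database reads

@functools.lru_cache(maxsize=8192)
def lookup_word(word: str) -> bool:
    """Simulated dictionary lookup; each cache miss would hit SQLite."""
    global calls
    calls += 1
    return word in {"မြန်မာ", "စာ"}  # stand-in for the real dictionary table

# Repeated text reuses the same words, so most lookups become cache hits
for token in ["မြန်မာ", "စာ", "မြန်မာ", "မြန်မာ", "စာ"]:
    lookup_word(token)

print(calls)                     # only 2 underlying "reads" for 5 lookups
print(lookup_word.cache_info())  # hits=3, misses=2
```

Three of the five lookups never touch the backing store; on real text with a small recurring vocabulary, the hit rate climbs far higher.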
## Overview
The caching system provides:
- LRU Cache: Least Recently Used eviction strategy
- CacheManager: Unified cache management
## AlgorithmCacheConfig
Note: There are two cache config classes in the library:
- `myspellchecker.core.config.AlgorithmCacheConfig` (Pydantic model) — used with `SpellCheckerConfig` for algorithm-level cache sizing
- `myspellchecker.utils.cache.CacheConfig` (dataclass) — used for low-level cache instance configuration with `maxsize` and `name`
Configure caching behavior for different lookup types:
```python
from myspellchecker.core.config import AlgorithmCacheConfig

# Configure cache sizes for different components
config = AlgorithmCacheConfig(
    syllable_cache_size=4096,    # Syllable lookups
    word_cache_size=8192,        # Word lookups
    frequency_cache_size=8192,   # Frequency lookups
    bigram_cache_size=16384,     # Bigram probability lookups
    trigram_cache_size=16384,    # Trigram probability lookups
)
```
### Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| `syllable_cache_size` | int | 4096 | LRU cache size for syllable lookups |
| `word_cache_size` | int | 8192 | LRU cache size for word lookups |
| `frequency_cache_size` | int | 8192 | LRU cache size for frequency lookups |
| `bigram_cache_size` | int | 16384 | LRU cache size for bigram lookups |
| `trigram_cache_size` | int | 16384 | LRU cache size for trigram lookups |
## Creating Cache Configs
```python
from myspellchecker.utils.cache import CacheConfig

# CacheConfig is a dataclass with maxsize, ttl_seconds, enable_stats, name
syllable_cache = CacheConfig(maxsize=4096, name="syllables")
word_cache = CacheConfig(maxsize=8192, name="words")
custom_cache = CacheConfig(maxsize=5000, ttl_seconds=3600, name="custom")
```
## LRU Cache
Least Recently Used cache with fixed size:
```python
from myspellchecker.utils.cache import LRUCache

# Create cache
cache = LRUCache(maxsize=1000)

# Store values
cache.set("key1", "value1")
cache.set("key2", {"complex": "data"})

# Retrieve values
value = cache.get("key1")       # Returns "value1"
missing = cache.get("unknown")  # Returns None

# Check existence (use 'in' operator, not .has())
if "key1" in cache:
    print("Key exists")

# Get with default
value = cache.get("unknown", default="fallback")

# Clear cache
cache.clear()

# Get statistics
stats = cache.stats()
print(f"Hits: {stats['hits']}, Misses: {stats['misses']}")
print(f"Hit rate: {stats['hit_rate']:.2%}")
```
### LRU Eviction
When the cache is full, the least recently accessed item is evicted:
```python
cache = LRUCache(maxsize=3)
cache.set("a", 1)
cache.set("b", 2)
cache.set("c", 3)

# Access "a" to make it recently used
_ = cache.get("a")

# Add new item - "b" is evicted (least recently used)
cache.set("d", 4)

"a" in cache  # True (recently accessed)
"b" in cache  # False (evicted)
"c" in cache  # True
"d" in cache  # True
```
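The eviction order above is what any standard LRU produces. A minimal stdlib sketch using `collections.OrderedDict` (an illustration of the technique, not the library's actual implementation) reproduces it:

```python
from collections import OrderedDict

class MiniLRU:
    """Minimal LRU cache: get/set refresh recency, set evicts the oldest."""

    def __init__(self, maxsize: int):
        if maxsize < 1:
            raise ValueError("maxsize must be >= 1")
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict least recently used

    def __contains__(self, key):
        return key in self._data

# Same scenario as above: "b" is the oldest untouched entry, so it goes
cache = MiniLRU(maxsize=3)
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    cache.set(k, v)
cache.get("a")       # refresh "a"
cache.set("d", 4)    # evicts "b"
print("b" in cache)  # False
```

`move_to_end` plus `popitem(last=False)` is the standard way to get O(1) recency tracking from an ordered dict.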
## CacheManager
Unified cache management for multiple cache instances:
```python
from myspellchecker.utils.cache import CacheManager

# Create manager with default cache size
manager = CacheManager(default_maxsize=1024)

# Get or create named caches
syllable_cache = manager.get_cache("syllables", maxsize=4096)
word_cache = manager.get_cache("words", maxsize=8192)
context_cache = manager.get_cache("context", maxsize=500)

# Use cache
syllable_cache.set("မြန်", True)
result = syllable_cache.get("မြန်")

# Clear all caches
manager.clear_all()

# Get combined statistics
stats = manager.get_all_stats()
for name, cache_stats in stats.items():
    print(f"{name}: {cache_stats['hit_rate']:.2%} hit rate")
```
## Integration with SpellChecker
Caching is automatically configured via SpellCheckerConfig:
```python
from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig, AlgorithmCacheConfig

config = SpellCheckerConfig(
    cache=AlgorithmCacheConfig(
        syllable_cache_size=8192,
        word_cache_size=16384,
        bigram_cache_size=32768,
    )
)

checker = SpellChecker(config=config)
```
## What’s Cached
| Component | Cache Type | Default Size | Purpose |
|---|---|---|---|
| Syllable validation | LRU | 4096 | Syllable validity |
| Word lookup | LRU | 8192 | Dictionary results |
| Frequency lookup | LRU | 8192 | Frequency scores |
| Bigram probabilities | LRU | 16384 | Bigram scores |
| Trigram probabilities | LRU | 16384 | Trigram scores |
| Edit distance | LRU | 4096 | Damerau-Levenshtein |
| POS tags | LRU | 1024 | Tag sequences |
| Stemmer | LRU | 1024 | Root extraction |
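Cache sizes bound entry counts, not bytes. As a back-of-the-envelope estimate (the ~200 bytes per entry figure is an assumption for short Myanmar strings plus dict overhead; measure your own workload), the defaults in the table cost roughly:

```python
# Default entry counts from the table above
default_sizes = {
    "syllable": 4096, "word": 8192, "frequency": 8192,
    "bigram": 16384, "trigram": 16384, "edit_distance": 4096,
    "pos_tags": 1024, "stemmer": 1024,
}
BYTES_PER_ENTRY = 200  # assumed average: key string + value + dict overhead

total_entries = sum(default_sizes.values())
total_mb = total_entries * BYTES_PER_ENTRY / (1024 * 1024)
print(f"{total_entries} entries ≈ {total_mb:.1f} MB")  # 59392 entries ≈ 11.3 MB
```

Even the default sizes are modest in memory terms, which is why doubling them is usually safe on anything but tightly constrained devices.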
## 1. Size Appropriately
```python
from myspellchecker.utils.cache import CacheConfig

# For real-time typing (small vocabulary per session)
config = CacheConfig(maxsize=1000)

# For batch processing (large vocabulary)
config = CacheConfig(maxsize=50000)
```
## 2. Monitor Hit Rates
```python
from myspellchecker.utils.cache import CacheManager

# Use CacheManager.get_all_stats() to monitor hit rates
manager = CacheManager()
stats = manager.get_all_stats()
for name, cache_stats in stats.items():
    if cache_stats['hit_rate'] < 0.5:
        print(f"Warning: {name} has low hit rate")
```
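Hit rate is simply hits divided by total lookups; a quick stdlib sanity check of the 0.5 threshold above:

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache (avoided database reads)."""
    total = hits + misses
    return hits / total if total else 0.0

# 900 hits out of 1000 lookups: 90% of SQLite reads were avoided
print(f"{hit_rate(900, 100):.2%}")  # 90.00%
```

A hit rate persistently below 0.5 usually means the cache is smaller than the working set and should be enlarged.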
## 3. Clear on Data Changes
```python
from myspellchecker.utils.cache import CacheManager

# After updating dictionary, clear all caches via CacheManager
manager = CacheManager()
manager.clear_all()
```
## Thread Safety
All cache implementations are thread-safe:
```python
from concurrent.futures import ThreadPoolExecutor

from myspellchecker.utils.cache import LRUCache

cache = LRUCache(maxsize=1000)

def worker(key):
    cache.set(key, f"value_{key}")
    return cache.get(key)

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(worker, i) for i in range(100)]
    results = [f.result() for f in futures]
```
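Thread safety of this kind is typically achieved by taking a single lock around each operation. A stdlib sketch of that pattern (an illustration, not the library's actual code) wrapping a plain dict:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class LockedCache:
    """Dict-backed cache guarded by a lock so concurrent get/set are safe."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:  # only one thread mutates the dict at a time
            self._data[key] = value

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

cache = LockedCache()

def worker(i):
    cache.set(i, f"value_{i}")
    return cache.get(i)

with ThreadPoolExecutor(max_workers=4) as ex:
    results = list(ex.map(worker, range(100)))  # map preserves input order
print(results[0], results[99])  # value_0 value_99
```

A single lock is simple and correct; for very hot caches, sharding across several locks reduces contention at the cost of complexity.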
## Minimizing Caching
For debugging or testing with minimal caching:
```python
from myspellchecker.core.config import SpellCheckerConfig, AlgorithmCacheConfig

# Use minimal cache sizes (minimum is 1 — LRUCache raises ValueError if maxsize < 1)
config = SpellCheckerConfig(
    cache=AlgorithmCacheConfig(
        syllable_cache_size=1,
        word_cache_size=1,
        bigram_cache_size=1,
    )
)
```
Do not set cache sizes to 0: `LRUCache` requires `maxsize >= 1` and raises `ValueError` otherwise. Use 1 for the smallest possible cache.
## Best Practices
- Start with defaults: The default configuration works well for most cases
- Monitor hit rates: Use `SpellChecker.cache_stats()` to identify underperforming caches
- Size for working set: The cache should fit the vocabulary typically in use
- Clear strategically: Clear caches whenever dictionary data changes
## See Also