Every dictionary lookup in mySpellChecker goes through an LRU cache — syllable validation, word lookups, frequency scores, and n-gram probabilities. Tuning cache sizes to match your working set can significantly reduce database I/O.

Overview

The caching system provides:
  • LRU Cache: Least Recently Used eviction strategy
  • CacheManager: Unified cache management

AlgorithmCacheConfig

Note: There are two cache config classes in the library:
  • myspellchecker.core.config.AlgorithmCacheConfig (Pydantic model) — used with SpellCheckerConfig for algorithm-level cache sizing
  • myspellchecker.utils.cache.CacheConfig (dataclass) — used for low-level cache instance configuration with maxsize and name
Configure caching behavior for different lookup types:
from myspellchecker.core.config import AlgorithmCacheConfig

# Configure cache sizes for different components
config = AlgorithmCacheConfig(
    syllable_cache_size=4096,    # Syllable lookups
    word_cache_size=8192,        # Word lookups
    frequency_cache_size=8192,   # Frequency lookups
    bigram_cache_size=16384,     # Bigram probability lookups
    trigram_cache_size=16384,    # Trigram probability lookups
)

Configuration Options

| Option | Type | Default | Description |
|---|---|---|---|
| syllable_cache_size | int | 4096 | LRU cache size for syllable lookups |
| word_cache_size | int | 8192 | LRU cache size for word lookups |
| frequency_cache_size | int | 8192 | LRU cache size for frequency lookups |
| bigram_cache_size | int | 16384 | LRU cache size for bigram lookups |
| trigram_cache_size | int | 16384 | LRU cache size for trigram lookups |

Pre-defined Configurations

from myspellchecker.utils.cache import CacheConfig

# For syllable validation (high hit rate, small entries)
syllable_cache = CacheConfig.for_syllables()

# For word validation (moderate size)
word_cache = CacheConfig.for_words()

# Standard LRU cache (uses default size)
lru_cache = CacheConfig.for_lru()

# Custom LRU cache size
custom_cache = CacheConfig(maxsize=5000, name="custom")

LRU Cache

Least Recently Used cache with fixed size:
from myspellchecker.utils.cache import LRUCache

# Create cache
cache = LRUCache(maxsize=1000)

# Store values
cache.set("key1", "value1")
cache.set("key2", {"complex": "data"})

# Retrieve values
value = cache.get("key1")  # Returns "value1"
missing = cache.get("unknown")  # Returns None

# Check existence (use 'in' operator, not .has())
if "key1" in cache:
    print("Key exists")

# Get with default
value = cache.get("unknown", default="fallback")

# Clear cache
cache.clear()

# Get statistics
stats = cache.stats()
print(f"Hits: {stats['hits']}, Misses: {stats['misses']}")
print(f"Hit rate: {stats['hit_rate']:.2%}")

LRU Eviction

When the cache is full, the least recently accessed item is evicted:
cache = LRUCache(maxsize=3)

cache.set("a", 1)
cache.set("b", 2)
cache.set("c", 3)

# Access "a" to make it recently used
_ = cache.get("a")

# Add new item - "b" is evicted (least recently used)
cache.set("d", 4)

"a" in cache  # True (recently accessed)
"b" in cache  # False (evicted)
"c" in cache  # True
"d" in cache  # True

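The eviction order above can be reproduced with a few lines of standard-library Python. This is a minimal sketch built on `collections.OrderedDict`; the class name `TinyLRU` is hypothetical, and nothing here is taken from the library's own `LRUCache` implementation.

```python
from collections import OrderedDict


class TinyLRU:
    """Illustrative LRU cache (hypothetical; not myspellchecker's LRUCache)."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self._data = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # a read marks the key as most recently used
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # drop the least recently used entry

    def __contains__(self, key):
        return key in self._data


cache = TinyLRU(maxsize=3)
for key, value in [("a", 1), ("b", 2), ("c", 3)]:
    cache.set(key, value)

cache.get("a")     # "a" becomes the most recently used key
cache.set("d", 4)  # cache is over capacity, so "b" is dropped

remaining = list(cache._data)  # oldest to newest
```

Reading a key moves it to the "newest" end, which is why "a" survives the insertion of "d" while "b" does not.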
CacheManager

Unified cache management for multiple cache instances:
from myspellchecker.utils.cache import CacheManager

# Create manager with default cache size
manager = CacheManager(default_maxsize=1024)

# Get or create named caches
syllable_cache = manager.get_cache("syllables", maxsize=4096)
word_cache = manager.get_cache("words", maxsize=8192)
context_cache = manager.get_cache("context", maxsize=500)

# Use cache
syllable_cache.set("မြန်", True)
result = syllable_cache.get("မြန်")

# Remove a specific cache
manager.remove_cache("context")

# Clear all caches
manager.clear_all()

# Get combined statistics
stats = manager.get_all_stats()
for name, cache_stats in stats.items():
    print(f"{name}: {cache_stats['hit_rate']:.2%} hit rate")

Integration with SpellChecker

Caching is automatically configured via SpellCheckerConfig:
from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig, AlgorithmCacheConfig

config = SpellCheckerConfig(
    cache=AlgorithmCacheConfig(
        syllable_cache_size=8192,
        word_cache_size=16384,
        bigram_cache_size=32768,
    )
)

checker = SpellChecker(config=config)

What’s Cached

| Component | Cache Type | Default Size | Purpose |
|---|---|---|---|
| Syllable validation | LRU | 4096 | Syllable validity |
| Word lookup | LRU | 8192 | Dictionary results |
| Frequency lookup | LRU | 8192 | Frequency scores |
| Bigram probabilities | LRU | 16384 | Bigram scores |
| Trigram probabilities | LRU | 16384 | Trigram scores |
| Edit distance | LRU | 4096 | Damerau-Levenshtein |
| POS tags | LRU | 1024 | Tag sequences |
| Stemmer | LRU | 1024 | Root extraction |

Performance Tips

1. Size Appropriately

from myspellchecker.utils.cache import CacheConfig

# For real-time typing (small vocabulary per session)
config = CacheConfig(maxsize=1000)

# For batch processing (large vocabulary)
config = CacheConfig(maxsize=50000)
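One way to pick a size is to replay a representative token stream through an LRU cache and compare hit rates at different capacities. The sketch below uses the standard library's `functools.lru_cache` as a stand-in for the library's caches; the helper `simulate_hit_rate` and the sample data are hypothetical.

```python
from functools import lru_cache


def simulate_hit_rate(tokens, maxsize):
    """Replay tokens through an LRU cache of the given size and return the hit rate."""

    @lru_cache(maxsize=maxsize)
    def lookup(token):
        return True  # stand-in for a real dictionary lookup

    for token in tokens:
        lookup(token)

    info = lookup.cache_info()
    total = info.hits + info.misses
    return info.hits / total if total else 0.0


# Hypothetical sample: 4 distinct tokens, cycled 25 times (100 lookups)
sample = ["a", "b", "c", "d"] * 25
working_set = len(set(sample))

rate_fits = simulate_hit_rate(sample, maxsize=working_set)  # cache holds the whole working set
rate_small = simulate_hit_rate(sample, maxsize=2)           # cache is smaller than the working set
```

With the cache sized to the working set, only the first lookup of each token misses. With a cache smaller than a cyclic working set, LRU evicts each token just before it is needed again, so the hit rate collapses. Real text is less adversarial than this cycle, but the same replay works on real samples.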

2. Monitor Hit Rates

from myspellchecker.utils.cache import CacheManager

# Use CacheManager.get_all_stats() to monitor hit rates.
# In real code, reuse the manager that created your caches; a new one is shown only for brevity.
manager = CacheManager()
stats = manager.get_all_stats()
for name, cache_stats in stats.items():
    if cache_stats['hit_rate'] < 0.5:
        print(f"Warning: {name} has low hit rate")
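A freshly created cache reports a low hit rate simply because it has seen few lookups, so it can help to ignore caches below a minimum lookup count. The helper below is a hypothetical sketch; it assumes each per-cache stats entry carries `hits`, `misses`, and `hit_rate` keys, as the `LRUCache.stats()` example suggests, and it runs on hand-written sample data rather than real library output.

```python
def low_hit_rate_caches(all_stats, threshold=0.5, min_lookups=100):
    """Return names of caches whose hit rate is below threshold after enough lookups."""
    flagged = []
    for name, cache_stats in all_stats.items():
        lookups = cache_stats["hits"] + cache_stats["misses"]
        # Skip caches that have not seen enough traffic to judge
        if lookups >= min_lookups and cache_stats["hit_rate"] < threshold:
            flagged.append(name)
    return flagged


# Hand-written sample in the assumed shape (not real output)
sample_stats = {
    "syllables": {"hits": 900, "misses": 100, "hit_rate": 0.9},
    "words": {"hits": 200, "misses": 800, "hit_rate": 0.2},
    "context": {"hits": 0, "misses": 5, "hit_rate": 0.0},
}

flagged = low_hit_rate_caches(sample_stats)
```

Here only "words" is flagged: "context" has a 0% hit rate but too few lookups to be meaningful.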

3. Clear on Data Changes

# After updating dictionary, clear all caches via CacheManager
manager = CacheManager()
manager.clear_all()
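The reason for clearing is that a cache keeps serving the answer it stored before the data changed. The sketch below shows the effect with `functools.lru_cache` and a hypothetical in-memory `dictionary` set; it illustrates the general problem and does not use the library's caches.

```python
from functools import lru_cache

dictionary = {"hello"}  # hypothetical backing store


@lru_cache(maxsize=128)
def is_known(word):
    return word in dictionary


before = is_known("world")  # not in the dictionary yet; the negative result is cached
dictionary.add("world")     # the data changes behind the cache
stale = is_known("world")   # served from the cache, so the update is not visible
is_known.cache_clear()      # invalidate after the update
fresh = is_known("world")   # recomputed against the updated data
```

Negative results are cached too, so a word added to the dictionary can keep being reported as unknown until the cache is cleared.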

Thread Safety

All cache implementations are thread-safe:
from concurrent.futures import ThreadPoolExecutor

from myspellchecker.utils.cache import LRUCache

cache = LRUCache(maxsize=1000)

def worker(key):
    cache.set(key, f"value_{key}")
    return cache.get(key)

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(worker, i) for i in range(100)]
    results = [f.result() for f in futures]
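A common way to make an LRU cache safe for concurrent use is to guard every operation with a single lock. The sketch below shows that pattern with a hypothetical `LockedLRU` class; whether the library's caches use this exact approach is an assumption, not something the documentation states.

```python
import threading
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor


class LockedLRU:
    """Illustrative lock-guarded LRU cache (hypothetical; not the library's code)."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self._data = OrderedDict()
        self._lock = threading.Lock()

    def get(self, key, default=None):
        with self._lock:
            if key not in self._data:
                return default
            self._data.move_to_end(key)  # reads reorder entries, so they need the lock too
            return self._data[key]

    def set(self, key, value):
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            if len(self._data) > self.maxsize:
                self._data.popitem(last=False)

    def __len__(self):
        with self._lock:
            return len(self._data)


shared = LockedLRU(maxsize=50)


def worker(i):
    shared.set(i, f"value_{i}")
    return shared.get(i)


with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(worker, range(200)))
```

In this sketch each `get` and `set` is atomic, but a `set` followed by a `get` is not a single atomic step: another thread may evict the key in between, so callers should still handle a miss.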

Minimizing Cache

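For debugging or testing with minimal caching:
from myspellchecker.core.config import SpellCheckerConfig, AlgorithmCacheConfig

# Use minimal cache sizes.
# frequency_cache_size and trigram_cache_size keep their defaults unless set as well.
config = SpellCheckerConfig(
    cache=AlgorithmCacheConfig(
        syllable_cache_size=0,
        word_cache_size=0,
        bigram_cache_size=0,
    )
)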

Best Practices

  1. Start with defaults: The default configuration works well for most cases
  2. Monitor hit rates: Use SpellChecker.cache_stats() to identify underperforming caches
  3. Size for working set: Cache should fit typical vocabulary in use
  4. Clear strategically: Clear cache when dictionary data changes

See Also