If you need direct access to SymSpell lookups or the semantic checker outside of the full SpellChecker pipeline, whether for benchmarking, custom workflows, or embedding in other tools, AlgorithmFactory gives you configured instances with built-in LRU caching (10-100x speedup on repeated lookups).
Overview
```python
from myspellchecker.algorithms.factory import AlgorithmFactory
from myspellchecker.providers import SQLiteProvider

# Create a factory backed by a dictionary provider
provider = SQLiteProvider("mydict.db")
factory = AlgorithmFactory(provider)

# Create algorithms with built-in caching
symspell = factory.create_symspell()
```
AlgorithmFactory Class
Central factory for all spell checking algorithms:
```python
class AlgorithmFactory:
    """Factory for creating spell checking algorithms with caching.

    Provides centralized creation of algorithm instances with:
    - Transparent result caching (10-100x speedup)
    - Consistent configuration
    - Lazy initialization
    - Cache statistics

    Args:
        provider: Dictionary provider for data access
        enable_caching: Enable result caching (default: True)
        cache_sizes: Custom cache sizes per data type (default: None)
        share_caches: Share caches across factory instances with same provider (default: True)
    """

    def __init__(
        self,
        provider: DictionaryProvider,
        enable_caching: bool = True,
        cache_sizes: Optional[Dict[str, int]] = None,
        share_caches: bool = True,
    ):
        self.provider = provider
        self.enable_caching = enable_caching
        self.cache_sizes = cache_sizes
```
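The `share_caches` flag implies that factories built over the same provider can reuse one another's caches. The sketch below shows one way that could work, assuming a module-level registry keyed by the provider's identity. `ToyProvider`, `ToyFactory`, and `_SHARED_CACHES` are hypothetical names for illustration, not part of myspellchecker.

```python
from functools import lru_cache
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical registry: one cache per (provider identity, data type).
# Illustrative only; not myspellchecker's actual implementation.
_SHARED_CACHES: Dict[Tuple[int, str], Callable[[str], bool]] = {}


class ToyProvider:
    """Stand-in dictionary provider that counts backend queries."""

    def __init__(self, words: Iterable[str]):
        self.words = set(words)
        self.calls = 0

    def is_valid_word(self, word: str) -> bool:
        self.calls += 1
        return word in self.words


class ToyFactory:
    """Hypothetical factory showing how share_caches could behave."""

    def __init__(self, provider: ToyProvider, share_caches: bool = True, size: int = 8192):
        key = (id(provider), "dictionary_word")
        if share_caches and key in _SHARED_CACHES:
            # Reuse the cache an earlier factory built for this provider.
            self.is_valid_word = _SHARED_CACHES[key]
            return
        # Otherwise wrap the provider method in a fresh LRU cache.
        self.is_valid_word = lru_cache(maxsize=size)(provider.is_valid_word)
        if share_caches:
            _SHARED_CACHES[key] = self.is_valid_word


provider = ToyProvider(["ကျောင်း"])
first, second = ToyFactory(provider), ToyFactory(provider)
first.is_valid_word("ကျောင်း")
second.is_valid_word("ကျောင်း")  # served from the shared cache; provider.calls stays at 1
```

Keying by `id()` keeps the sketch short, but an id can be reused after the provider is garbage-collected; a sturdier design would tie the registry to the provider's lifetime, for example with a `weakref.WeakKeyDictionary`.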
Factory Methods
create_symspell
Creates a SymSpell instance with dictionary lookup caching:
```python
def create_symspell(
    self,
    config: Optional[SymSpellConfig] = None,
    max_edit_distance: int = 2,
    phonetic_hasher: Optional[Any] = None,
    build_index: bool = True,
) -> SymSpell:
    """Create a SymSpell instance with caching.

    Args:
        config: SymSpellConfig instance (uses defaults if None)
        max_edit_distance: Maximum edit distance for suggestions
        phonetic_hasher: Optional PhoneticHasher for phonetic matching
        build_index: Whether to build the index after creation (default: True)

    Returns:
        Configured SymSpell instance (cached if enable_caching=True)
    """
```
Usage:
```python
factory = AlgorithmFactory(provider)

# Create with defaults
symspell = factory.create_symspell()

# Create with custom config
from myspellchecker.core.config import SymSpellConfig

symspell = factory.create_symspell(
    config=SymSpellConfig(prefix_length=5, beam_width=100),
    max_edit_distance=3,
)

# Get suggestions (results are cached)
suggestions = symspell.lookup("ကျောင့်")
suggestions = symspell.lookup("ကျောင့်")  # Cache hit!
```
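`max_edit_distance` and `prefix_length` are the two main knobs of the symmetric-delete technique that SymSpell is named for. The sketch below is a minimal, self-contained illustration of that idea, not myspellchecker's implementation: it indexes every deletion of each word's prefix, collects the words that share a deletion with the input, and verifies them with plain Levenshtein distance (SymSpell implementations commonly use a Damerau-Levenshtein variant instead). The function names are hypothetical, and the dictionary uses toy English words for readability.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def deletes(term: str, max_distance: int) -> Set[str]:
    """Every string obtainable from `term` by deleting up to max_distance characters."""
    results = {term}
    for count in range(1, min(max_distance, len(term)) + 1):
        for positions in combinations(range(len(term)), count):
            results.add("".join(ch for i, ch in enumerate(term) if i not in positions))
    return results


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_index(freqs: Dict[str, int], max_distance: int, prefix_length: int) -> Dict[str, Set[str]]:
    """Map each deletion of each word's prefix back to the words that produce it."""
    index: Dict[str, Set[str]] = {}
    for word in freqs:
        for variant in deletes(word[:prefix_length], max_distance):
            index.setdefault(variant, set()).add(word)
    return index


def lookup(
    term: str,
    freqs: Dict[str, int],
    index: Dict[str, Set[str]],
    max_distance: int,
    prefix_length: int,
) -> List[Tuple[str, int, int]]:
    """Return (word, distance, frequency), closest first, then most frequent."""
    candidates: Set[str] = set()
    for variant in deletes(term[:prefix_length], max_distance):
        candidates |= index.get(variant, set())
    results = []
    for word in candidates:
        distance = levenshtein(term, word)  # shared deletes are only a filter
        if distance <= max_distance:
            results.append((word, distance, freqs[word]))
    return sorted(results, key=lambda r: (r[1], -r[2]))


freqs = {"cat": 50, "cart": 20, "dog": 30}
index = build_index(freqs, max_distance=1, prefix_length=7)
print(lookup("cat", freqs, index, max_distance=1, prefix_length=7))
# [('cat', 0, 50), ('cart', 1, 20)]
```

The number of deletions per word grows combinatorially with the edit distance, which is why `prefix_length` caps the indexed part of each word and why repeated lookups of the same input benefit from caching.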
create_semantic_checker
Creates an ONNX-based semantic checker:
```python
def create_semantic_checker(
    self,
    config: Optional[SemanticConfig] = None,
) -> Optional[SemanticChecker]:
    """Create a semantic checker (ONNX-based).

    Args:
        config: Semantic checker configuration (uses defaults if None)

    Returns:
        SemanticChecker instance, or None if model not found
    """
```
Usage:
```python
from myspellchecker.core.config import SemanticConfig

semantic = factory.create_semantic_checker(
    config=SemanticConfig(
        model_path="models/semantic.onnx",
        tokenizer_path="models/tokenizer.json",
    ),
)

# create_semantic_checker returns None if the model is not found
if semantic:
    result = semantic.check("မြန်မာ [MASK] သည်")
```
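The `[MASK]` token in the example suggests a masked-language-model approach, where the model scores candidate tokens for the masked slot. Assuming that, and assuming you can obtain raw scores (logits) for a few candidates, the sketch below shows typical post-processing: a numerically stable softmax followed by a threshold check on the word actually written. The function names, the threshold, and the scores are hypothetical; the real return type of `SemanticChecker.check()` is not documented on this page.

```python
import math
from typing import Dict, List, Tuple


def rank_mask_candidates(logits: Dict[str, float]) -> List[Tuple[str, float]]:
    """Convert raw scores for the [MASK] position into probabilities, best first."""
    # Subtract the largest logit before exponentiating to avoid overflow.
    peak = max(logits.values())
    exps = {token: math.exp(score - peak) for token, score in logits.items()}
    total = sum(exps.values())
    return sorted(((t, e / total) for t, e in exps.items()), key=lambda p: p[1], reverse=True)


def is_unlikely(observed: str, logits: Dict[str, float], min_prob: float = 0.05) -> bool:
    """Flag the observed token if it gets less than min_prob of the probability mass."""
    probs = dict(rank_mask_candidates(logits))
    return probs.get(observed, 0.0) < min_prob


# Hypothetical scores for three candidate fillers of the masked slot.
scores = {"tok_a": 2.0, "tok_b": 1.0, "tok_c": 0.0}
ranked = rank_mask_candidates(scores)  # tok_a first, with roughly 0.665
```

Normalizing over only the scored candidates, rather than the full vocabulary, makes the probabilities relative to that candidate set, so a threshold such as `min_prob` has to be tuned for how many candidates you score.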
Cached Wrappers
CachedDictionaryLookup
Wraps dictionary lookups with LRU caching:
```python
class CachedDictionaryLookup:
    """Cached wrapper for dictionary lookups.

    Caches syllable/word validation and frequency lookups.
    """

    def __init__(
        self,
        provider: DictionaryLookup,
        syllable_cache_size: int = 4096,
        word_cache_size: int = 8192,
        use_lock: bool = False,
    ):
        self._provider = provider
        # Creates instance-specific lru_cache methods for:
        # is_valid_syllable, is_valid_word,
        # get_syllable_frequency, get_word_frequency

    def is_valid_syllable(self, syllable: str) -> bool:
        """Check if syllable exists in dictionary (cached)."""
        ...

    def is_valid_word(self, word: str) -> bool:
        """Check if word exists in dictionary (cached)."""
        ...

    def get_syllable_frequency(self, syllable: str) -> int:
        """Get syllable frequency (cached)."""
        ...

    def get_word_frequency(self, word: str) -> int:
        """Get word frequency (cached)."""
        ...
```
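The comment in `__init__` says each instance gets its own `lru_cache`-wrapped methods. A minimal sketch of that pattern follows. Wrapping a bound method per instance, instead of decorating the method in the class body, keeps each wrapper's cache separate and keeps `self` out of every cache key (a class-level cache would hold a strong reference to each instance for as long as its entries live). How `use_lock` works is not shown on this page; the sketch assumes it serializes calls with a `threading.Lock`. `ToyDictionary` and `CachedLookupSketch` are hypothetical.

```python
import threading
from functools import lru_cache
from typing import Set


class ToyDictionary:
    """Stand-in provider that counts backend queries."""

    def __init__(self, words: Set[str]):
        self.words = words
        self.calls = 0

    def is_valid_word(self, word: str) -> bool:
        self.calls += 1
        return word in self.words


class CachedLookupSketch:
    """Per-instance LRU cache with an optional lock (hypothetical)."""

    def __init__(self, provider: ToyDictionary, word_cache_size: int = 8192, use_lock: bool = False):
        self._provider = provider
        self._lock = threading.Lock() if use_lock else None
        # A fresh lru_cache per instance, wrapping the provider's bound method.
        self._cached_is_valid_word = lru_cache(maxsize=word_cache_size)(provider.is_valid_word)

    def is_valid_word(self, word: str) -> bool:
        if self._lock is None:
            return self._cached_is_valid_word(word)
        # Serialize access so the backend is never queried concurrently.
        with self._lock:
            return self._cached_is_valid_word(word)


words = ToyDictionary({"သွား"})
lookup = CachedLookupSketch(words, use_lock=True)
lookup.is_valid_word("သွား")
lookup.is_valid_word("သွား")  # cache hit; words.calls stays at 1
```

`functools.lru_cache` keeps its own bookkeeping consistent across threads, but it can call the wrapped function more than once for the same key when threads race; a lock like this one is a way to protect a backend, such as a shared database connection, that must not be used concurrently.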
CachedBigramSource
Wraps bigram lookups with functools.lru_cache:
```python
from functools import lru_cache


class CachedBigramSource:
    """Cached wrapper for bigram lookups."""

    def __init__(self, provider: BigramSource, cache_size: int = 16384):
        self._provider = provider
        # Creates instance-specific lru_cache wrappers
        self._cached_get_bigram_probability = lru_cache(maxsize=cache_size)(
            self._get_bigram_probability_impl
        )
        self._cached_get_top_continuations = lru_cache(maxsize=cache_size // 4)(
            self._get_top_continuations_impl
        )

    def get_bigram_probability(self, w1: str, w2: str) -> float:
        """Get bigram probability P(w2|w1) (cached via lru_cache)."""
        return self._cached_get_bigram_probability(w1, w2)
```
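The factory docstring lists cache statistics among its features. Every `functools.lru_cache` wrapper exposes `cache_info()` (hits, misses, maxsize, currsize), so a hit rate can be computed directly from it, as in the sketch below. In `CachedBigramSource` the wrapper is stored in the private attribute `_cached_get_bigram_probability`, so reading it relies on an implementation detail; whether the library offers a public statistics method is not shown here. `ToyBigrams` and `hit_rate` are hypothetical.

```python
from functools import lru_cache
from typing import Callable, Dict, Tuple


class ToyBigrams:
    """Stand-in bigram source with a fixed probability table."""

    def __init__(self, table: Dict[Tuple[str, str], float]):
        self.table = table

    def get_bigram_probability(self, w1: str, w2: str) -> float:
        return self.table.get((w1, w2), 0.0)


def hit_rate(cached_fn: Callable) -> float:
    """Fraction of calls answered from the cache (0.0 if never called)."""
    info = cached_fn.cache_info()
    total = info.hits + info.misses
    return info.hits / total if total else 0.0


source = ToyBigrams({("ကျောင်း", "သွား"): 0.4})
cached = lru_cache(maxsize=16384)(source.get_bigram_probability)
for pair in [("ကျောင်း", "သွား"), ("ကျောင်း", "သွား"), ("ကျောင်း", "စာ"), ("ကျောင်း", "သွား")]:
    cached(*pair)
# Two misses (first sight of each pair) and two hits: hit_rate(cached) == 0.5
```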
CachedPOSRepository
Wraps POS lookups with caching:
```python
from myspellchecker.algorithms.cache import CachedPOSRepository

# Create a cached POS repository. There is no cache_size parameter;
# the cache is initialized lazily inside the repository.
cached_pos = CachedPOSRepository(provider=provider)

# Get the POS tag for a word (cached)
pos = cached_pos.get_pos("သွား")
```
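The comment above notes that `CachedPOSRepository` takes no `cache_size` argument and initializes its cache lazily. A minimal sketch of lazy initialization follows, assuming the cache is simply created on the first lookup. `ToyPOSSource`, `LazyCachedPOS`, `_POS_CACHE_SIZE`, and the tag value `"V"` are hypothetical. Deferring construction means a repository that is never queried costs nothing beyond the object itself.

```python
from functools import lru_cache
from typing import Callable, Dict, Optional

_POS_CACHE_SIZE = 4096  # hypothetical fixed size; the real value is not documented here


class ToyPOSSource:
    """Stand-in POS provider that counts backend queries."""

    def __init__(self, tags: Dict[str, str]):
        self.tags = tags
        self.calls = 0

    def get_pos(self, word: str) -> Optional[str]:
        self.calls += 1
        return self.tags.get(word)


class LazyCachedPOS:
    """Builds its LRU cache only when the first lookup arrives."""

    def __init__(self, provider: ToyPOSSource):
        self._provider = provider
        self._cached: Optional[Callable[[str], Optional[str]]] = None

    def get_pos(self, word: str) -> Optional[str]:
        if self._cached is None:
            # First call: wrap the provider method in a cache now.
            self._cached = lru_cache(maxsize=_POS_CACHE_SIZE)(self._provider.get_pos)
        return self._cached(word)


repo = LazyCachedPOS(ToyPOSSource({"သွား": "V"}))
```

As written, the first-call initialization is not guarded by a lock, so two threads racing on the first lookup could each build a cache; only one survives, which is harmless here but worth knowing.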
Speedup by Operation
| Operation | Uncached | Cached | Speedup |
|---|---|---|---|
| Word lookup | 0.5ms | 0.005ms | 100x |
| Bigram lookup | 1ms | 0.01ms | 100x |
| Suggestions | 50ms | 0.5ms | 100x |
| Semantic check | 200ms | 2ms | 100x |
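The Speedup column compares a cache hit with an uncached call. Across a real workload the overall gain also depends on the hit rate h: expected latency is h × cached + (1 − h) × uncached. The sketch below applies that formula to the word-lookup row; the hit rates are illustrative inputs, not measurements, and `effective_speedup` is a hypothetical helper.

```python
def effective_speedup(uncached_ms: float, cached_ms: float, hit_rate: float) -> float:
    """Overall speedup when only a fraction `hit_rate` of calls hit the cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    expected_ms = hit_rate * cached_ms + (1.0 - hit_rate) * uncached_ms
    return uncached_ms / expected_ms


# Word lookup row: 0.5 ms uncached, 0.005 ms cached.
for h in (1.0, 0.99, 0.95):
    print(f"hit rate {h:.0%}: {effective_speedup(0.5, 0.005, h):.1f}x")
```

This prints 100.0x, 50.3x, and 16.8x: at a 95% hit rate the remaining 5% of misses dominate the average, so the overall gain is closer to 17x than 100x.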
Memory Usage
| Cache Size | Memory | Typical Coverage |
|---|---|---|
| 1,000 | ~1MB | 80% of common words |
| 10,000 | ~10MB | 95% of common words |
| 100,000 | ~100MB | 99% of vocabulary |
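The table works out to roughly 1 KB per cached entry, but the real per-entry cost depends on what is cached: a boolean validity result is far smaller than a list of suggestions. One way to check on your own workload is to measure allocations with the standard library's `tracemalloc`, as in the sketch below. The cached function is a trivial stand-in, so the figures it reports cover only the key strings and cache bookkeeping; `measure_cache_bytes` is a hypothetical helper.

```python
import tracemalloc
from functools import lru_cache


def measure_cache_bytes(n_entries: int) -> int:
    """Bytes allocated while filling an LRU cache with n_entries distinct keys."""

    @lru_cache(maxsize=n_entries)
    def lookup(word: str) -> bool:
        return len(word) > 3  # stand-in for a real provider call

    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for i in range(n_entries):
        lookup(f"word{i}")
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


for size in (1_000, 10_000):
    print(f"{size:>6} entries: {measure_cache_bytes(size) / 1024:.0f} KiB")
```

Swapping the stand-in for the values your application actually caches gives a more realistic per-entry figure than the table's rule of thumb.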
Integration with SpellChecker
The factory integrates with the main SpellChecker:
```python
from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig, AlgorithmCacheConfig

config = SpellCheckerConfig(
    cache=AlgorithmCacheConfig(
        syllable_cache_size=4096,
        word_cache_size=8192,
        frequency_cache_size=10000,
    ),
)
checker = SpellChecker(config=config)
# Factory is created internally with caching enabled
```
Manual Factory Usage
```python
from myspellchecker import SpellChecker
from myspellchecker.algorithms.factory import AlgorithmFactory

# Create a factory with a larger word cache
factory = AlgorithmFactory(provider, cache_sizes={"dictionary_word": 50000})

# Create cached algorithms
symspell = factory.create_symspell()

# These algorithms are used internally by validators created through
# the DI container. For end-to-end checking, use SpellChecker directly:
checker = SpellChecker(provider=provider)
result = checker.check("text")
```
Configuration
Factory Configuration
```python
factory = AlgorithmFactory(
    provider=provider,
    enable_caching=True,  # Enable all caching (default)
    cache_sizes={  # Custom cache sizes per data type (optional)
        "dictionary_syllable": 4096,
        "dictionary_word": 8192,
        "bigram": 16384,
    },
)
```
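A `cache_sizes` mapping only needs to list the data types you want to change. A minimal sketch of how partial overrides could be merged onto defaults is below. The default values come from the wrapper defaults shown on this page (4096, 8192, 16384); `DEFAULT_CACHE_SIZES` and `resolve_cache_sizes` are hypothetical, and whether the real factory rejects unknown keys or non-positive sizes is an assumption, not documented behavior.

```python
from typing import Dict, Optional

# Hypothetical defaults, taken from the wrapper defaults shown on this page.
DEFAULT_CACHE_SIZES: Dict[str, int] = {
    "dictionary_syllable": 4096,
    "dictionary_word": 8192,
    "bigram": 16384,
}


def resolve_cache_sizes(custom: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Overlay user-supplied sizes on the defaults, rejecting typos and bad values."""
    merged = dict(DEFAULT_CACHE_SIZES)
    for key, size in (custom or {}).items():
        if key not in DEFAULT_CACHE_SIZES:
            raise KeyError(f"unknown cache type: {key!r}")
        if size <= 0:
            raise ValueError(f"cache size for {key!r} must be positive, got {size}")
        merged[key] = size
    return merged


sizes = resolve_cache_sizes({"dictionary_word": 50000})
# dictionary_word is overridden; dictionary_syllable and bigram keep their defaults
```

Failing loudly on an unknown key is a deliberate choice in this sketch: a misspelled key such as `"dictionary_wrod"` would otherwise be ignored silently and the cache would keep its default size.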
Per-Algorithm Configuration
```python
# Configure cache sizes via the cache_sizes parameter
factory = AlgorithmFactory(
    provider=provider,
    cache_sizes={
        "dictionary_syllable": 4096,
        "dictionary_word": 20000,
        "bigram": 16384,
    },
)
symspell = factory.create_symspell()
```
Best Practices
1. Reuse Factory Instances
```python
# Good: a single factory for the application (share_caches=True by default)
factory = AlgorithmFactory(provider)
symspell = factory.create_symspell()

# Bad: multiple factories with share_caches=False (no cache sharing)
factory1 = AlgorithmFactory(provider, share_caches=False)
factory2 = AlgorithmFactory(provider, share_caches=False)
```
2. Clear Provider Caches When Needed
```python
# Clear provider caches (SQLiteProvider)
provider.clear_caches()
```
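Caching assumes the underlying data does not change. If the dictionary behind a provider is rebuilt or edited while the process is running, previously cached answers can go stale until the caches are cleared. The sketch below shows the effect with a plain `functools.lru_cache` and its `cache_clear()` method. It does not show what `SQLiteProvider.clear_caches()` clears internally, and whether that call also resets the factory's wrapper caches is not stated on this page; `ToyProvider` is hypothetical.

```python
from functools import lru_cache
from typing import Set


class ToyProvider:
    """Stand-in provider whose word list can change at runtime."""

    def __init__(self, words: Set[str]):
        self.words = words

    def is_valid_word(self, word: str) -> bool:
        return word in self.words


provider = ToyProvider({"ကျောင်း"})
is_valid = lru_cache(maxsize=8192)(provider.is_valid_word)

before = is_valid("စာအုပ်")   # False, and that False is now cached
provider.words.add("စာအုပ်")  # the dictionary data changes underneath
stale = is_valid("စာအုပ်")    # still False: served from the stale entry
is_valid.cache_clear()        # drop every cached result
fresh = is_valid("စာအုပ်")    # True: recomputed from the updated data
```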
See Also