If you need direct access to SymSpell lookups or the semantic checker outside the full SpellChecker pipeline (for benchmarking, custom workflows, or embedding in other tools), AlgorithmFactory provides configured instances with built-in LRU caching, giving a 10-100x speedup on repeated lookups.

Overview

from myspellchecker.algorithms.factory import AlgorithmFactory
from myspellchecker.providers import SQLiteProvider

# Create factory with provider
provider = SQLiteProvider("mydict.db")
factory = AlgorithmFactory(provider)

# Create algorithms with built-in caching
symspell = factory.create_symspell()

AlgorithmFactory Class

Central factory for all spell checking algorithms:
class AlgorithmFactory:
    """Factory for creating spell checking algorithms with caching.

    Provides centralized creation of algorithm instances with:
    - Transparent result caching (10-100x speedup)
    - Consistent configuration
    - Lazy initialization
    - Cache statistics

    Args:
        provider: Dictionary provider for data access
        enable_caching: Enable result caching (default: True)
        cache_sizes: Custom cache sizes per data type (default: None)
        share_caches: Share caches across factory instances with same provider (default: True)
    """

    def __init__(
        self,
        provider: DictionaryProvider,
        enable_caching: bool = True,
        cache_sizes: Optional[Dict[str, int]] = None,
        share_caches: bool = True,
    ):
        self.provider = provider
        self.enable_caching = enable_caching
        self.cache_sizes = cache_sizes
        self.share_caches = share_caches

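With share_caches=True (the default), two factories built on the same provider reuse one set of caches instead of each maintaining its own. A minimal sketch of how such sharing can be keyed on the provider instance; the registry name and structure here are illustrative, not the library's internals:

```python
# Illustrative module-level registry: one cache bundle per provider instance.
_shared_caches: dict[int, dict[str, object]] = {}

def get_cache_bundle(provider: object, share: bool = True) -> dict[str, object]:
    """Return the cache bundle for a provider, creating it on first use.

    With share=True, factories over the same provider get the same dict;
    with share=False, each call gets a private bundle.
    """
    if not share:
        return {}
    key = id(provider)
    if key not in _shared_caches:
        _shared_caches[key] = {}
    return _shared_caches[key]

class Provider:  # stand-in for a DictionaryProvider
    pass

p = Provider()
a = get_cache_bundle(p)
b = get_cache_bundle(p)
assert a is b                                   # same provider: shared bundle
assert get_cache_bundle(p, share=False) is not a  # opted out: private bundle
```

Keying on the provider (rather than on the factory) is what lets a second AlgorithmFactory over the same database benefit from lookups already cached by the first.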
Factory Methods

create_symspell

Creates a SymSpell instance with dictionary lookup caching:
def create_symspell(
    self,
    config: Optional[SymSpellConfig] = None,
    max_edit_distance: int = 2,
    phonetic_hasher: Optional[Any] = None,
    build_index: bool = True,
) -> SymSpell:
    """Create a SymSpell instance with caching.

    Args:
        config: SymSpellConfig instance (uses defaults if None)
        max_edit_distance: Maximum edit distance for suggestions
        phonetic_hasher: Optional PhoneticHasher for phonetic matching
        build_index: Whether to build the index after creation (default: True)

    Returns:
        Configured SymSpell instance (cached if enable_caching=True)
    """
Usage:
factory = AlgorithmFactory(provider)

# Create with defaults
symspell = factory.create_symspell()

# Create with custom config
from myspellchecker.core.config import SymSpellConfig

symspell = factory.create_symspell(
    config=SymSpellConfig(prefix_length=5, beam_width=100),
    max_edit_distance=3,
)

# Get suggestions (results are cached)
suggestions = symspell.lookup("ကျောင့်")
suggestions = symspell.lookup("ကျောင့်")  # Cache hit!

create_semantic_checker

Creates an ONNX-based semantic checker:
def create_semantic_checker(
    self,
    config: Optional[SemanticConfig] = None,
) -> Optional[SemanticChecker]:
    """Create a semantic checker (ONNX-based).

    Args:
        config: Semantic checker configuration (uses defaults if None)

    Returns:
        SemanticChecker instance, or None if model not found
    """
Usage:
from myspellchecker.core.config import SemanticConfig

semantic = factory.create_semantic_checker(
    config=SemanticConfig(
        model_path="models/semantic.onnx",
        tokenizer_path="models/tokenizer.json",
    ),
)

if semantic:
    result = semantic.check("မြန်မာ [MASK] သည်")

Cached Wrappers

CachedDictionaryLookup

Wraps dictionary lookups with LRU caching:
class CachedDictionaryLookup:
    """Cached wrapper for dictionary lookups.

    Caches syllable/word validation and frequency lookups.
    """

    def __init__(
        self,
        provider: DictionaryLookup,
        syllable_cache_size: int = 4096,
        word_cache_size: int = 8192,
        use_lock: bool = False,
    ):
        self._provider = provider
        # Creates instance-specific lru_cache methods for:
        # is_valid_syllable, is_valid_word,
        # get_syllable_frequency, get_word_frequency

    def is_valid_syllable(self, syllable: str) -> bool:
        """Check if syllable exists in dictionary (cached)."""
        ...

    def is_valid_word(self, word: str) -> bool:
        """Check if word exists in dictionary (cached)."""
        ...

    def get_syllable_frequency(self, syllable: str) -> int:
        """Get syllable frequency (cached)."""
        ...

    def get_word_frequency(self, word: str) -> int:
        """Get word frequency (cached)."""
        ...
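The "instance-specific lru_cache methods" mentioned in the constructor are built in __init__ rather than applied as class-level decorators, so each wrapper instance gets its own LRU table instead of all instances sharing one cache keyed on self. A runnable sketch of that pattern with a stub provider; the stub class and its word list are illustrative:

```python
from functools import lru_cache

class StubProvider:
    """Minimal stand-in for a DictionaryLookup provider."""
    def __init__(self):
        self.calls = 0
        self._words = {"hello": 120, "world": 95}

    def get_word_frequency(self, word: str) -> int:
        self.calls += 1  # count real lookups to make caching visible
        return self._words.get(word, 0)

class CachedLookup:
    def __init__(self, provider, word_cache_size: int = 8192):
        self._provider = provider
        # Per-instance cache: built here, not at class level.
        self._cached_freq = lru_cache(maxsize=word_cache_size)(
            provider.get_word_frequency
        )

    def get_word_frequency(self, word: str) -> int:
        return self._cached_freq(word)

p = StubProvider()
c = CachedLookup(p)
c.get_word_frequency("hello")
c.get_word_frequency("hello")  # served from cache
assert p.calls == 1            # provider hit only once
```

Wrapping the bound method this way also avoids the memory leak of decorating instance methods at class level, where the cache would hold a reference to every self it has seen.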

CachedBigramSource

Wraps bigram lookups with functools.lru_cache:
class CachedBigramSource:
    """Cached wrapper for bigram lookups."""

    def __init__(self, provider: BigramSource, cache_size: int = 16384):
        self._provider = provider
        # Creates instance-specific lru_cache decorators
        self._cached_get_bigram_probability = lru_cache(maxsize=cache_size)(
            self._get_bigram_probability_impl
        )
        self._cached_get_top_continuations = lru_cache(maxsize=cache_size // 4)(
            self._get_top_continuations_impl
        )

    def get_bigram_probability(self, w1: str, w2: str) -> float:
        """Get bigram probability P(w2|w1) (cached via lru_cache)."""
        return self._cached_get_bigram_probability(w1, w2)

CachedPOSRepository

Wraps POS lookups with caching:
from myspellchecker.algorithms.cache import CachedPOSRepository

# Create cached POS repository (no cache_size param - uses lazy init internally)
cached_pos = CachedPOSRepository(provider=provider)

# Get POS for a word (cached)
pos = cached_pos.get_pos("သွား")
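The lazy initialization noted above can be sketched as deferring cache construction until the first lookup arrives, so creating the repository itself costs nothing. The class and method names below are illustrative, not the library's internals:

```python
from functools import lru_cache

class LazyCachedPOS:
    """Builds its lru_cache wrapper only when the first lookup arrives."""

    def __init__(self, provider):
        self._provider = provider
        self._cached_get_pos = None  # created lazily

    def get_pos(self, word: str) -> str:
        if self._cached_get_pos is None:
            # Deferred until first use; construction stays cheap.
            self._cached_get_pos = lru_cache(maxsize=4096)(
                self._provider.get_pos
            )
        return self._cached_get_pos(word)

class StubPOSProvider:
    def get_pos(self, word: str) -> str:
        return {"run": "verb", "dog": "noun"}.get(word, "unknown")

repo = LazyCachedPOS(StubPOSProvider())
assert repo._cached_get_pos is None      # nothing built yet
assert repo.get_pos("dog") == "noun"     # first call builds the cache
assert repo._cached_get_pos is not None
```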

Performance Benefits

Speedup by Operation

Operation       Uncached  Cached    Speedup
Word lookup     0.5ms     0.005ms   100x
Bigram lookup   1ms       0.01ms    100x
Suggestions     50ms      0.5ms     100x
Semantic check  200ms     2ms       100x
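Timings like these depend on hardware, dictionary size, and hit rate; you can measure the speedup on your own workload with a micro-benchmark along these lines (the simulated slow lookup stands in for a real provider call):

```python
import time
from functools import lru_cache

def slow_lookup(word: str) -> int:
    time.sleep(0.0005)  # simulate ~0.5ms of I/O per uncached lookup
    return len(word)

cached_lookup = lru_cache(maxsize=10_000)(slow_lookup)

# Heavy repetition, as in real text: 3 unique words, 300 total lookups.
words = ["alpha", "beta", "gamma"] * 100

start = time.perf_counter()
for w in words:
    slow_lookup(w)
uncached = time.perf_counter() - start

start = time.perf_counter()
for w in words:
    cached_lookup(w)  # only 3 misses; the other 297 calls hit the cache
cached = time.perf_counter() - start

print(f"uncached {uncached * 1000:.1f}ms, cached {cached * 1000:.1f}ms, "
      f"speedup {uncached / cached:.0f}x")
```

The ratio you observe tracks the cache hit rate: the more repetitive the input text, the closer you get to the 100x end of the range.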

Memory Usage

Cache Size   Memory   Typical Coverage
1,000        ~1MB     80% of common words
10,000       ~10MB    95% of common words
100,000      ~100MB   99% of vocabulary

Integration with SpellChecker

The factory integrates with the main SpellChecker:
from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig, AlgorithmCacheConfig

config = SpellCheckerConfig(
    cache=AlgorithmCacheConfig(
        syllable_cache_size=4096,
        word_cache_size=8192,
        frequency_cache_size=10000,
    ),
)

checker = SpellChecker(config=config)
# Factory is created internally with caching enabled

Manual Factory Usage

from myspellchecker.algorithms.factory import AlgorithmFactory
from myspellchecker.core.validators import WordValidator

# Create factory
factory = AlgorithmFactory(provider, cache_sizes={"dictionary_word": 50000})

# Create cached algorithms
symspell = factory.create_symspell()

# These algorithms are used internally by validators
# created through the DI container. For direct usage:
checker = SpellChecker(provider=provider)
result = checker.check("text")

Configuration

Factory Configuration

factory = AlgorithmFactory(
    provider=provider,
    enable_caching=True,     # Enable all caching (default)
    cache_sizes={            # Custom cache sizes per data type (optional)
        "dictionary_syllable": 4096,
        "dictionary_word": 8192,
        "bigram": 16384,
    },
)

Per-Algorithm Configuration

# Configure cache sizes via the cache_sizes parameter
factory = AlgorithmFactory(
    provider=provider,
    cache_sizes={
        "dictionary_syllable": 4096,
        "dictionary_word": 20000,
        "bigram": 16384,
    },
)
symspell = factory.create_symspell()

Best Practices

1. Reuse Factory Instances

# Good: Single factory for application (share_caches=True by default)
factory = AlgorithmFactory(provider)
symspell = factory.create_symspell()

# Bad: Multiple factories with share_caches=False (no cache sharing)
factory1 = AlgorithmFactory(provider, share_caches=False)
factory2 = AlgorithmFactory(provider, share_caches=False)

2. Clear Provider Caches When Needed

# Clear provider caches (SQLiteProvider)
provider.clear_caches()

See Also