If you need direct access to SymSpell lookups or the semantic checker outside the full SpellChecker pipeline (for benchmarking, custom workflows, or embedding in other tools), AlgorithmFactory provides configured instances with built-in LRU caching, giving a 10-100x speedup on repeated lookups.

Overview

from myspellchecker.algorithms.factory import AlgorithmFactory
from myspellchecker.providers import SQLiteProvider

# Create factory with provider
provider = SQLiteProvider("mydict.db")
factory = AlgorithmFactory(provider)

# Create algorithms with built-in caching
symspell = factory.create_symspell()

AlgorithmFactory Class

Central factory for all spell checking algorithms:
class AlgorithmFactory:
    """Factory for creating spell checking algorithms with caching.

    Provides centralized creation of algorithm instances with:
    - Transparent result caching (10-100x speedup)
    - Consistent configuration
    - Lazy initialization
    - Cache statistics

    Args:
        provider: Dictionary provider for data access
        enable_caching: Enable result caching (default: True)
        cache_sizes: Custom cache sizes per data type (default: None)
        share_caches: Share caches across factory instances with same provider (default: True)
    """

    def __init__(
        self,
        provider: DictionaryProvider,
        enable_caching: bool = True,
        cache_sizes: Optional[Dict[str, int]] = None,
        share_caches: bool = True,
    ):
        self.provider = provider
        self.enable_caching = enable_caching
        self.cache_sizes = cache_sizes
        self.share_caches = share_caches

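With share_caches=True (the default), two factories built on the same provider reuse one set of caches instead of each maintaining its own. A minimal sketch of how such sharing can be keyed on the provider instance; the registry name and structure here are illustrative, not the library's internals:

```python
# Illustrative module-level registry: one cache bundle per provider instance.
_shared_caches: dict[int, dict[str, object]] = {}

def get_cache_bundle(provider: object, share: bool = True) -> dict[str, object]:
    """Return the cache bundle for a provider, creating it on first use.

    With share=True, factories over the same provider get the same dict;
    with share=False, each call gets a private bundle.
    """
    if not share:
        return {}
    key = id(provider)
    if key not in _shared_caches:
        _shared_caches[key] = {}
    return _shared_caches[key]

class Provider:  # stand-in for a DictionaryProvider
    pass

p = Provider()
a = get_cache_bundle(p)
b = get_cache_bundle(p)
assert a is b                                   # same provider: shared bundle
assert get_cache_bundle(p, share=False) is not a  # opted out: private bundle
```

Keying on the provider (rather than on the factory) is what lets a second AlgorithmFactory over the same database benefit from lookups already cached by the first.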
Factory Methods

create_symspell

Creates a SymSpell instance with dictionary lookup caching:
def create_symspell(
    self,
    config: Optional[SymSpellConfig] = None,
    max_edit_distance: int = 2,
    phonetic_hasher: Optional[Any] = None,
    build_index: bool = True,
) -> SymSpell:
    """Create a SymSpell instance with caching.

    Args:
        config: SymSpellConfig instance (uses defaults if None)
        max_edit_distance: Maximum edit distance for suggestions
        phonetic_hasher: Optional PhoneticHasher for phonetic matching
        build_index: Whether to build the index after creation (default: True)

    Returns:
        Configured SymSpell instance (cached if enable_caching=True)
    """
Usage:
factory = AlgorithmFactory(provider)

# Create with defaults
symspell = factory.create_symspell()

# Create with custom config
from myspellchecker.core.config import SymSpellConfig

symspell = factory.create_symspell(
    config=SymSpellConfig(prefix_length=5, beam_width=100),
    max_edit_distance=3,
)

# Get suggestions (results are cached)
suggestions = symspell.lookup("ကျောင့်")
suggestions = symspell.lookup("ကျောင့်")  # Cache hit!

create_semantic_checker

Creates an ONNX-based semantic checker:
def create_semantic_checker(
    self,
    config: Optional[SemanticConfig] = None,
) -> Optional[SemanticChecker]:
    """Create a semantic checker (ONNX-based).

    Args:
        config: Semantic checker configuration (uses defaults if None)

    Returns:
        SemanticChecker instance, or None if model not found
    """
Usage:
from myspellchecker.core.config import SemanticConfig

semantic = factory.create_semantic_checker(
    config=SemanticConfig(
        model_path="models/semantic.onnx",
        tokenizer_path="models/tokenizer.json",
    ),
)

if semantic:
    result = semantic.check("မြန်မာ [MASK] သည်")

Cached Wrappers

CachedDictionaryLookup

Wraps dictionary lookups with LRU caching:
class CachedDictionaryLookup:
    """Cached wrapper for dictionary lookups.

    Caches syllable/word validation and frequency lookups.
    """

    def __init__(
        self,
        provider: DictionaryLookup,
        syllable_cache_size: int = 4096,
        word_cache_size: int = 8192,
        use_lock: bool = False,
    ):
        self._provider = provider
        # Creates instance-specific lru_cache methods for:
        # is_valid_syllable, is_valid_word,
        # get_syllable_frequency, get_word_frequency

    def is_valid_syllable(self, syllable: str) -> bool:
        """Check if syllable exists in dictionary (cached)."""
        ...

    def is_valid_word(self, word: str) -> bool:
        """Check if word exists in dictionary (cached)."""
        ...

    def get_syllable_frequency(self, syllable: str) -> int:
        """Get syllable frequency (cached)."""
        ...

    def get_word_frequency(self, word: str) -> int:
        """Get word frequency (cached)."""
        ...
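The "instance-specific lru_cache methods" mentioned in the constructor are built in __init__ rather than applied as class-level decorators, so each wrapper instance gets its own LRU table instead of all instances sharing one cache keyed on self. A runnable sketch of that pattern with a stub provider; the stub class and its word list are illustrative:

```python
from functools import lru_cache

class StubProvider:
    """Minimal stand-in for a DictionaryLookup provider."""
    def __init__(self):
        self.calls = 0
        self._words = {"hello": 120, "world": 95}

    def get_word_frequency(self, word: str) -> int:
        self.calls += 1  # count real lookups to make caching visible
        return self._words.get(word, 0)

class CachedLookup:
    def __init__(self, provider, word_cache_size: int = 8192):
        self._provider = provider
        # Per-instance cache: built here, not at class level.
        self._cached_freq = lru_cache(maxsize=word_cache_size)(
            provider.get_word_frequency
        )

    def get_word_frequency(self, word: str) -> int:
        return self._cached_freq(word)

p = StubProvider()
c = CachedLookup(p)
c.get_word_frequency("hello")
c.get_word_frequency("hello")  # served from cache
assert p.calls == 1            # provider hit only once
```

Wrapping the bound method this way also avoids the memory leak of decorating instance methods at class level, where the cache would hold a reference to every self it has seen.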

CachedBigramSource

Wraps bigram lookups with functools.lru_cache:
class CachedBigramSource:
    """Cached wrapper for bigram lookups."""

    def __init__(self, provider: BigramSource, cache_size: int = 16384):
        self._provider = provider
        # Creates instance-specific lru_cache decorators
        self._cached_get_bigram_probability = lru_cache(maxsize=cache_size)(
            self._get_bigram_probability_impl
        )
        self._cached_get_top_continuations = lru_cache(maxsize=cache_size // 4)(
            self._get_top_continuations_impl
        )

    def get_bigram_probability(self, w1: str, w2: str) -> float:
        """Get bigram probability P(w2|w1) (cached via lru_cache)."""
        return self._cached_get_bigram_probability(w1, w2)

CachedPOSRepository

Wraps POS lookups with caching:
from myspellchecker.algorithms.cache import CachedPOSRepository

# Create cached POS repository (no cache_size param - uses lazy init internally)
cached_pos = CachedPOSRepository(provider=provider)

# Get POS for a word (cached)
pos = cached_pos.get_pos("သွား")
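The lazy initialization noted above can be sketched as deferring cache construction until the first lookup arrives, so creating the repository itself costs nothing. The class and method names below are illustrative, not the library's internals:

```python
from functools import lru_cache

class LazyCachedPOS:
    """Builds its lru_cache wrapper only when the first lookup arrives."""

    def __init__(self, provider):
        self._provider = provider
        self._cached_get_pos = None  # created lazily

    def get_pos(self, word: str) -> str:
        if self._cached_get_pos is None:
            # Deferred until first use; construction stays cheap.
            self._cached_get_pos = lru_cache(maxsize=4096)(
                self._provider.get_pos
            )
        return self._cached_get_pos(word)

class StubPOSProvider:
    def get_pos(self, word: str) -> str:
        return {"run": "verb", "dog": "noun"}.get(word, "unknown")

repo = LazyCachedPOS(StubPOSProvider())
assert repo._cached_get_pos is None      # nothing built yet
assert repo.get_pos("dog") == "noun"     # first call builds the cache
assert repo._cached_get_pos is not None
```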

Performance Benefits

Speedup by Operation

Operation       Uncached  Cached    Speedup
Word lookup     0.5ms     0.005ms   100x
Bigram lookup   1ms       0.01ms    100x
Suggestions     50ms      0.5ms     100x
Semantic check  200ms     2ms       100x
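Timings like these depend on hardware, dictionary size, and hit rate; you can measure the speedup on your own workload with a micro-benchmark along these lines (the simulated slow lookup stands in for a real provider call):

```python
import time
from functools import lru_cache

def slow_lookup(word: str) -> int:
    time.sleep(0.0005)  # simulate ~0.5ms of I/O per uncached lookup
    return len(word)

cached_lookup = lru_cache(maxsize=10_000)(slow_lookup)

# Heavy repetition, as in real text: 3 unique words, 300 total lookups.
words = ["alpha", "beta", "gamma"] * 100

start = time.perf_counter()
for w in words:
    slow_lookup(w)
uncached = time.perf_counter() - start

start = time.perf_counter()
for w in words:
    cached_lookup(w)  # only 3 misses; the other 297 calls hit the cache
cached = time.perf_counter() - start

print(f"uncached {uncached * 1000:.1f}ms, cached {cached * 1000:.1f}ms, "
      f"speedup {uncached / cached:.0f}x")
```

The ratio you observe tracks the cache hit rate: the more repetitive the input text, the closer you get to the 100x end of the range.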

Memory Usage

Cache Size   Memory   Typical Coverage
1,000        ~1MB     80% of common words
10,000       ~10MB    95% of common words
100,000      ~100MB   99% of vocabulary

Integration with SpellChecker

The factory integrates with the main SpellChecker:
from myspellchecker import SpellChecker
from myspellchecker.core.config import SpellCheckerConfig, AlgorithmCacheConfig

config = SpellCheckerConfig(
    cache=AlgorithmCacheConfig(
        syllable_cache_size=4096,
        word_cache_size=8192,
        frequency_cache_size=10000,
    ),
)

checker = SpellChecker(config=config)
# Factory is created internally with caching enabled

Manual Factory Usage

from myspellchecker.algorithms.factory import AlgorithmFactory
from myspellchecker.core.validators import WordValidator

# Create factory
factory = AlgorithmFactory(provider, cache_sizes={"dictionary_word": 50000})

# Create cached algorithms
symspell = factory.create_symspell()

# These algorithms are used internally by validators
# created through the DI container. For direct usage:
checker = SpellChecker(provider=provider)
result = checker.check("text")

Configuration

Factory Configuration

factory = AlgorithmFactory(
    provider=provider,
    enable_caching=True,     # Enable all caching (default)
    cache_sizes={            # Custom cache sizes per data type (optional)
        "dictionary_syllable": 4096,
        "dictionary_word": 8192,
        "bigram": 16384,
    },
)

Per-Algorithm Configuration

# Configure cache sizes via the cache_sizes parameter
factory = AlgorithmFactory(
    provider=provider,
    cache_sizes={
        "dictionary_syllable": 4096,
        "dictionary_word": 20000,
        "bigram": 16384,
    },
)
symspell = factory.create_symspell()

Best Practices

1. Reuse Factory Instances

# Good: Single factory for application (share_caches=True by default)
factory = AlgorithmFactory(provider)
symspell = factory.create_symspell()

# Bad: Multiple factories with share_caches=False (no cache sharing)
factory1 = AlgorithmFactory(provider, share_caches=False)
factory2 = AlgorithmFactory(provider, share_caches=False)

2. Clear Provider Caches When Needed

# Clear provider caches (SQLiteProvider)
provider.clear_caches()

See Also