Follow the steps below to clone the repository, install dependencies, build Cython extensions, and verify everything works before making your first contribution.
Prerequisites
- Python 3.10+ (3.11 recommended)
- Git
- C++ compiler (for Cython extensions)
- Linux: gcc or clang
- macOS: Xcode Command Line Tools
- Windows: Visual Studio Build Tools
Optional
- OpenMP (for parallel processing)
- macOS: brew install libomp
- Linux: Usually pre-installed
- CUDA (for GPU acceleration with transformer models)
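The required prerequisites can be checked from Python before starting the setup. This is a minimal sketch, not part of the project: the helper names are hypothetical, and the list of compiler executables is an assumption (on Windows, `cl` is typically on PATH only inside a Visual Studio developer prompt).

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version from the prerequisites list


def find_first(candidates):
    """Return the first executable name found on PATH, or None."""
    for name in candidates:
        if shutil.which(name):
            return name
    return None


def check_prerequisites():
    """Return a dict mapping each prerequisite to a short status string."""
    report = {}
    major, minor = sys.version_info[:2]
    if (major, minor) >= MIN_PYTHON:
        report["python"] = f"ok ({major}.{minor})"
    else:
        report["python"] = f"too old ({major}.{minor})"
    report["git"] = "ok" if shutil.which("git") else "missing"
    # Assumed compiler names; adjust for your toolchain.
    compiler = find_first(["g++", "clang++", "c++", "gcc", "clang", "cl"])
    report["compiler"] = f"ok ({compiler})" if compiler else "missing"
    return report


for name, status in check_prerequisites().items():
    print(f"{name}: {status}")
```

A "missing" compiler here only means none of the assumed names was found on PATH; the build may still succeed with a differently named toolchain.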
Quick Setup
# Clone repository
git clone https://github.com/thettwe/myspellchecker.git
cd myspellchecker
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install in development mode
pip install -e ".[dev]"
# Build Cython extensions
python setup.py build_ext --inplace
# Verify installation
pytest tests/ -x -q
Detailed Setup
Clone Repository
git clone https://github.com/thettwe/myspellchecker.git
cd myspellchecker
Create Virtual Environment
# Create environment
python3 -m venv venv
# Activate
source venv/bin/activate # Linux/macOS
venv\Scripts\activate # Windows
Install Dependencies
# Development dependencies (includes testing, linting)
pip install -e ".[dev]"
# All optional dependencies
pip install -e ".[dev,build,ai,ai-full,transformers,train]"
Available extras groups:

| Extra | Purpose | Key Packages |
|---|---|---|
| dev | Testing, linting, type checking | pytest, mypy, ruff |
| build | Dictionary building pipeline | pyarrow, duckdb, xxhash, tqdm |
| ai | Semantic context checking (ONNX) | onnxruntime, tokenizers |
| ai-full | Complete AI stack (semantic + transformers) | onnxruntime, tokenizers, transformers, torch |
| transformers | Transformer-based POS tagging | transformers, torch |
| train | Custom model training pipelines | transformers, datasets, accelerate, torch, onnx |
Build Cython Extensions
python setup.py build_ext --inplace
This compiles all 11 Cython modules:
text/normalize_c.pyx - Text normalization
algorithms/viterbi.pyx - POS tagging
algorithms/distance/edit_distance_c.pyx - Levenshtein distance
data_pipeline/batch_processor.pyx - Parallel batch processing
data_pipeline/frequency_counter.pyx - Fast frequency calculations
data_pipeline/ingester_c.pyx - Corpus ingestion
data_pipeline/repair_c.pyx - Segmentation repair
data_pipeline/tsv_reader_c.pyx - TSV file reading
tokenizers/cython/word_segment.pyx - Word segmentation
tokenizers/cython/mmap_reader.pyx - Memory-mapped file reading
core/syllable_rules_c.pyx - Syllable rule validation
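After the build, it can help to see which of these modules actually resolved to a compiled extension. The sketch below assumes that each `.pyx` path maps directly onto a dotted module name under the `myspellchecker` package; that mapping is an inference from the list above, and `module_name` and `is_compiled` are hypothetical helpers.

```python
import importlib.machinery
import importlib.util
from pathlib import PurePosixPath

# Paths copied from the list above, relative to the package root.
PYX_FILES = [
    "text/normalize_c.pyx",
    "algorithms/viterbi.pyx",
    "algorithms/distance/edit_distance_c.pyx",
    "data_pipeline/batch_processor.pyx",
    "data_pipeline/frequency_counter.pyx",
    "data_pipeline/ingester_c.pyx",
    "data_pipeline/repair_c.pyx",
    "data_pipeline/tsv_reader_c.pyx",
    "tokenizers/cython/word_segment.pyx",
    "tokenizers/cython/mmap_reader.pyx",
    "core/syllable_rules_c.pyx",
]


def module_name(pyx_path, package="myspellchecker"):
    """Turn 'text/normalize_c.pyx' into 'myspellchecker.text.normalize_c'."""
    parts = PurePosixPath(pyx_path).with_suffix("").parts
    return ".".join((package, *parts))


def is_compiled(name):
    """True if the module resolves to a compiled extension file."""
    try:
        # Note: find_spec imports parent packages to locate the module.
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return False
    if spec is None or not spec.origin:
        return False
    return spec.origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


for path in PYX_FILES:
    name = module_name(path)
    print(f"{name}: {'compiled' if is_compiled(name) else 'not compiled'}")
```

A module reported as "not compiled" may still import through a pure-Python fallback, if the package provides one.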
Build Sample Database
myspellchecker build --sample
This creates a test database for development.
IDE Setup
VS Code
Recommended extensions:
- Python (Microsoft)
- Pylance
- Cython
.vscode/settings.json:
{
  "python.defaultInterpreterPath": "${workspaceFolder}/venv/bin/python",
  "python.linting.enabled": true,
  "python.linting.ruffEnabled": true,
  "python.formatting.provider": "none",
  "[python]": {
    "editor.defaultFormatter": "charliermarsh.ruff",
    "editor.formatOnSave": true
  }
}
PyCharm
- Open project folder
- Configure interpreter: venv/bin/python
- Mark src as Sources Root
- Enable Ruff plugin for linting
Environment Variables
# Optional: Set database path
export MYSPELL_DATABASE_PATH=/path/to/custom.db
# Optional: Enable debug logging in your code:
# from myspellchecker.utils.logging_utils import configure_logging
# configure_logging(level="DEBUG")
# Optional: GPU device for transformers
export CUDA_VISIBLE_DEVICES=0
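A development script can honor the same override. The following is a minimal sketch of reading `MYSPELL_DATABASE_PATH` with a fallback; `resolve_database_path` and the `data/sample.db` default are hypothetical, and the library's own lookup logic may differ.

```python
import os
from pathlib import Path

ENV_VAR = "MYSPELL_DATABASE_PATH"  # variable name from the section above


def resolve_database_path(default, environ=None):
    """Prefer the environment override; otherwise use the given default."""
    environ = os.environ if environ is None else environ
    override = environ.get(ENV_VAR, "").strip()
    if override:
        return Path(override).expanduser()
    return Path(default)


# "data/sample.db" is a placeholder default, not a path the project defines.
print(resolve_database_path("data/sample.db"))
```

Passing `environ` explicitly keeps the function easy to test without touching the real process environment.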
Verifying Setup
Run Tests
# All tests
pytest tests/
# Quick smoke test
pytest tests/ -x -q --tb=short
# With coverage
pytest tests/ --cov=myspellchecker
Check Cython
# Verify Cython extensions loaded
try:
    from myspellchecker.text.normalize_c import remove_zero_width_chars
    print("Cython normalize: loaded")
except ImportError:
    print("Cython normalize: not available (normalize_c is required; install via wheel)")

from myspellchecker.algorithms.viterbi import _HAS_CYTHON_VITERBI
print(f"Cython viterbi: {_HAS_CYTHON_VITERBI}")
Test Spell Checker
from myspellchecker import SpellChecker
checker = SpellChecker()
result = checker.check("မြန်မာ")
print(f"Working: {not result.has_errors}")
Common Issues
Cython Build Fails
Error: fatal error: Python.h: No such file or directory
Solution: Install Python development headers
# Ubuntu/Debian
sudo apt install python3-dev
# Fedora
sudo dnf install python3-devel
# macOS (usually included with Python)
xcode-select --install
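To confirm that the headers are visible to the interpreter you are building against, you can ask Python where it expects `Python.h` to be. This is a small diagnostic sketch using the standard `sysconfig` module; `python_header_path` is a hypothetical helper name.

```python
import sysconfig
from pathlib import Path


def python_header_path():
    """Return the expected location of Python.h for the running interpreter."""
    return Path(sysconfig.get_paths()["include"]) / "Python.h"


header = python_header_path()
print(f"{header}: {'found' if header.exists() else 'missing'}")
```

Run it with the virtual environment active so that it reports on the same interpreter the build uses.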
OpenMP Not Found (macOS)
Error: ld: library not found for -lomp
Solution:
brew install libomp
# brew --prefix resolves to /opt/homebrew (Apple Silicon) or /usr/local (Intel)
export LDFLAGS="-L$(brew --prefix libomp)/lib"
export CPPFLAGS="-I$(brew --prefix libomp)/include"
python setup.py build_ext --inplace
Database Not Found
Error: MissingDatabaseError
Solution:
# Build sample database
myspellchecker build --sample
# Or specify path
export MYSPELL_DATABASE_PATH=/path/to/db
Import Errors
Error: ModuleNotFoundError: No module named 'myspellchecker'
Solution:
# Ensure installed in editable mode
pip install -e .
# Verify installation
pip show myspellchecker
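A common cause of this error is running Python from outside the virtual environment. The sketch below reports which interpreter is active and whether it can see the package; it uses only the standard library, and `installed_version` and `in_virtualenv` are hypothetical helper names.

```python
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def in_virtualenv():
    """True when the interpreter runs inside a venv."""
    return sys.prefix != sys.base_prefix


print(f"interpreter: {sys.executable}")
print(f"virtualenv active: {in_virtualenv()}")
version = installed_version("myspellchecker")
print(f"myspellchecker: {version or 'not installed in this environment'}")
```

If the interpreter path does not point into `venv/`, activate the environment and reinstall in editable mode.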
Development Workflow
Create Branch
git checkout -b feature/my-feature
Make Changes
Edit code in src/myspellchecker/
Run Quality Checks
# Lint
ruff check .
# Format
ruff format .
# Type check
mypy src/myspellchecker
# Tests
pytest tests/
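The four checks can be chained so that the run stops at the first failure. The following is one possible wrapper, not a script shipped with the project: `run_checks` and `main` are hypothetical names, and the commands are the ones listed above.

```python
import subprocess
import sys

# Commands taken from the steps above, in the same order.
CHECKS = [
    ["ruff", "check", "."],
    ["ruff", "format", "."],
    ["mypy", "src/myspellchecker"],
    ["pytest", "tests/"],
]


def run_checks(commands):
    """Run each command in order; return the first failing command, or None."""
    for command in commands:
        print("$", " ".join(command))
        try:
            result = subprocess.run(command)
        except FileNotFoundError:
            return command  # the tool is not installed or not on PATH
        if result.returncode != 0:
            return command
    return None


def main():
    """Return a process exit code: 0 if every check passed, 1 otherwise."""
    failed = run_checks(CHECKS)
    if failed:
        print("failed:", " ".join(failed), file=sys.stderr)
        return 1
    return 0
```

To use it as a script, call `sys.exit(main())` at the bottom of the file and run it from the repository root.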
Rebuild Cython (if modified .pyx)
python setup.py build_ext --inplace
Commit and Push
git add .
git commit -m "feat: description"
git push origin feature/my-feature