Follow the steps below to clone the repository, install dependencies, build Cython extensions, and verify everything works before making your first contribution.

Prerequisites

  • Python 3.10+ (3.11 recommended)
  • Git
  • C++ compiler (for Cython extensions)
    • Linux: gcc or clang
    • macOS: Xcode Command Line Tools
    • Windows: Visual Studio Build Tools
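
The prerequisites above can be checked from Python before starting. The following is a minimal sketch, not part of the project: `check_prerequisites` is a hypothetical helper, and the executable names it looks for (`gcc`, `clang`, and `cl` for MSVC) are assumptions about what each toolchain puts on `PATH`.

```python
import shutil
import sys

# Minimum interpreter version, taken from the prerequisites list.
MIN_PYTHON = (3, 10)

# Compiler executables to look for. "cl" (MSVC) is an assumption; it is
# usually only on PATH inside a Visual Studio developer prompt.
COMPILERS = ("gcc", "clang", "cl")


def check_prerequisites(version=None, which=shutil.which):
    """Hypothetical helper: map each prerequisite name to True/False."""
    if version is None:
        version = tuple(sys.version_info[:2])
    return {
        "python": tuple(version[:2]) >= MIN_PYTHON,
        "git": which("git") is not None,
        "compiler": any(which(name) is not None for name in COMPILERS),
    }
```

Calling `check_prerequisites()` with no arguments inspects the running interpreter and the current `PATH`; the `version` and `which` parameters exist only so the logic can be exercised without touching the real environment.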

Optional

  • OpenMP (for parallel processing)
    • macOS: brew install libomp
    • Linux: Usually pre-installed
  • CUDA (for GPU acceleration with transformer models)

Quick Setup

# Clone repository
git clone https://github.com/thettwe/myspellchecker.git
cd myspellchecker

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"

# Build Cython extensions
python setup.py build_ext --inplace

# Verify installation
pytest tests/ -x -q

Detailed Setup

1. Clone Repository

git clone https://github.com/thettwe/myspellchecker.git
cd myspellchecker

2. Create Virtual Environment

# Create environment
python3 -m venv venv

# Activate
source venv/bin/activate  # Linux/macOS
venv\Scripts\activate     # Windows

3. Install Dependencies

# Development dependencies (includes testing, linting)
pip install -e ".[dev]"

# All optional dependencies
pip install -e ".[dev,build,ai,ai-full,transformers,train]"

Available extras groups:

Extra          Purpose                                       Key Packages
dev            Testing, linting, type checking               pytest, mypy, ruff
build          Dictionary building pipeline                  pyarrow, duckdb, xxhash, tqdm
ai             Semantic context checking (ONNX)              onnxruntime, tokenizers
ai-full        Complete AI stack (semantic + transformers)   onnxruntime, tokenizers, transformers, torch
transformers   Transformer-based POS tagging                 transformers, torch
train          Custom model training pipelines               transformers, datasets, accelerate, torch, onnx
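
After installing extras, it can be useful to confirm that the key packages are actually importable. The sketch below copies a few rows from the extras listing above; `missing_packages` is a hypothetical helper, and it assumes each package's import name matches the name listed, which holds for these packages but not for every PyPI distribution.

```python
from importlib.util import find_spec

# Key packages per extra, copied from the extras listing (subset).
# Assumption: the importable module name equals the listed name.
EXTRAS = {
    "dev": ["pytest", "mypy", "ruff"],
    "build": ["pyarrow", "duckdb", "xxhash", "tqdm"],
    "ai": ["onnxruntime", "tokenizers"],
    "transformers": ["transformers", "torch"],
}


def missing_packages(packages, finder=find_spec):
    """Hypothetical helper: return the names that cannot be located."""
    return [name for name in packages if finder(name) is None]


def report(extras=EXTRAS):
    """Print one line per extra, listing anything that is missing."""
    for extra, packages in extras.items():
        missing = missing_packages(packages)
        print(f"{extra}: {'ok' if not missing else 'missing ' + ', '.join(missing)}")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as `torch`.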

4. Build Cython Extensions

python setup.py build_ext --inplace
This compiles all 11 Cython modules:
  • text/normalize_c.pyx - Text normalization
  • algorithms/viterbi.pyx - POS tagging
  • algorithms/distance/edit_distance_c.pyx - Levenshtein distance
  • data_pipeline/batch_processor.pyx - Parallel batch processing
  • data_pipeline/frequency_counter.pyx - Fast frequency calculations
  • data_pipeline/ingester_c.pyx - Corpus ingestion
  • data_pipeline/repair_c.pyx - Segmentation repair
  • data_pipeline/tsv_reader_c.pyx - TSV file reading
  • tokenizers/cython/word_segment.pyx - Word segmentation
  • tokenizers/cython/mmap_reader.pyx - Memory-mapped file reading
  • core/syllable_rules_c.pyx - Syllable rule validation

5. Build Sample Database

myspellchecker build --sample
This creates a test database for development.

IDE Setup

VS Code

Recommended extensions:
  • Python (Microsoft)
  • Pylance
  • Cython
  • Ruff (charliermarsh.ruff, the formatter referenced in the settings below)
.vscode/settings.json:
{
    "python.defaultInterpreterPath": "${workspaceFolder}/venv/bin/python",
    "python.linting.enabled": true,
    "python.linting.ruffEnabled": true,
    "python.formatting.provider": "none",
    "[python]": {
        "editor.defaultFormatter": "charliermarsh.ruff",
        "editor.formatOnSave": true
    }
}

PyCharm

  1. Open project folder
  2. Configure interpreter: venv/bin/python
  3. Mark src as Sources Root
  4. Enable Ruff plugin for linting

Environment Variables

# Optional: Set database path
export MYSPELL_DATABASE_PATH=/path/to/custom.db

# Optional: Enable debug logging in your code:
#   from myspellchecker.utils.logging_utils import configure_logging
#   configure_logging(level="DEBUG")

# Optional: GPU device for transformers
export CUDA_VISIBLE_DEVICES=0
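
In a development script, `MYSPELL_DATABASE_PATH` might be read with a fallback as sketched below. This is not the library's own lookup logic, which is not shown here; `resolve_database_path` and its `default` parameter are hypothetical names used for illustration only.

```python
import os
from pathlib import Path


def resolve_database_path(default=None, environ=os.environ):
    """Hypothetical helper: prefer MYSPELL_DATABASE_PATH, else `default`.

    Returns a Path, or None when neither source provides a value.
    """
    raw = environ.get("MYSPELL_DATABASE_PATH", "").strip()
    if raw:
        # Expand "~" so values like "~/data/custom.db" work as expected.
        return Path(raw).expanduser()
    return Path(default) if default is not None else None
```

Treating an empty or whitespace-only value as unset avoids surprising behavior when the variable is exported but left blank.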

Verifying Setup

Run Tests

# All tests
pytest tests/

# Quick smoke test
pytest tests/ -x -q --tb=short

# With coverage
pytest tests/ --cov=myspellchecker

Check Cython

# Verify Cython extensions loaded
try:
    from myspellchecker.text.normalize_c import remove_zero_width_chars
    print("Cython normalize: loaded")
except ImportError:
    print("Cython normalize: not available (normalize_c is required — install via wheel)")

from myspellchecker.algorithms.viterbi import _HAS_CYTHON_VITERBI
print(f"Cython viterbi: {_HAS_CYTHON_VITERBI}")
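
The check above covers two modules. A rough sketch for all eleven follows; it asks the import system where each module would load from and tests whether that file has a compiled-extension suffix. Mapping the `.pyx` paths from the build step to dotted names under `myspellchecker` is an assumption based on the two imports shown above, and `module_name` and `is_compiled` are hypothetical helpers.

```python
from importlib.machinery import EXTENSION_SUFFIXES
from importlib.util import find_spec

# Source paths as listed in the "Build Cython Extensions" step.
PYX_PATHS = [
    "text/normalize_c.pyx",
    "algorithms/viterbi.pyx",
    "algorithms/distance/edit_distance_c.pyx",
    "data_pipeline/batch_processor.pyx",
    "data_pipeline/frequency_counter.pyx",
    "data_pipeline/ingester_c.pyx",
    "data_pipeline/repair_c.pyx",
    "data_pipeline/tsv_reader_c.pyx",
    "tokenizers/cython/word_segment.pyx",
    "tokenizers/cython/mmap_reader.pyx",
    "core/syllable_rules_c.pyx",
]


def module_name(pyx_path, package="myspellchecker"):
    """Assumed mapping: 'text/normalize_c.pyx' -> 'myspellchecker.text.normalize_c'."""
    return package + "." + pyx_path.removesuffix(".pyx").replace("/", ".")


def is_compiled(name, finder=find_spec):
    """True if the module would load from a compiled extension file."""
    try:
        spec = finder(name)
    except ImportError:
        # Raised when a parent package cannot be imported.
        return False
    origin = getattr(spec, "origin", None) or ""
    return origin.endswith(tuple(EXTENSION_SUFFIXES))


def report(paths=PYX_PATHS):
    """Print one line per module with its compiled/not-compiled status."""
    for path in paths:
        name = module_name(path)
        print(f"{name}: {'compiled' if is_compiled(name) else 'not compiled'}")
```

A "not compiled" result is a hint to rerun the build step rather than proof of a problem, since some modules may ship with a pure-Python fallback.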

Test Spell Checker

from myspellchecker import SpellChecker

checker = SpellChecker()
result = checker.check("မြန်မာ")
print(f"Working: {not result.has_errors}")

Common Issues

Cython Build Fails

Error: fatal error: Python.h: No such file or directory
Solution: Install the Python development headers.
# Ubuntu/Debian
sudo apt install python3-dev

# Fedora
sudo dnf install python3-devel

# macOS (usually included with Python)
xcode-select --install

OpenMP Not Found (macOS)

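Error: ld: library not found for -lomp
Solution: Install libomp and point the compiler and linker at it. The paths below assume Homebrew on Apple Silicon (/opt/homebrew); on Intel Macs the Homebrew prefix is /usr/local.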
brew install libomp
export LDFLAGS="-L/opt/homebrew/opt/libomp/lib"
export CPPFLAGS="-I/opt/homebrew/opt/libomp/include"
python setup.py build_ext --inplace

Database Not Found

Error: MissingDatabaseError
Solution: Build the sample database, or point MYSPELL_DATABASE_PATH at an existing one.
# Build sample database
myspellchecker build --sample

# Or specify path
export MYSPELL_DATABASE_PATH=/path/to/db

Import Errors

Error: ModuleNotFoundError: No module named 'myspellchecker'
Solution: Install the package in editable mode inside the active virtual environment.
# Ensure installed in editable mode
pip install -e .

# Verify installation
pip show myspellchecker

Development Workflow

1. Create Branch

git checkout -b feature/my-feature

2. Make Changes

Edit code in src/myspellchecker/

3. Run Quality Checks

# Lint
ruff check .

# Format
ruff format .

# Type check
mypy src/myspellchecker

# Tests
pytest tests/

4. Rebuild Cython (if modified .pyx)

python setup.py build_ext --inplace

5. Commit and Push

git add .
git commit -m "feat: description"
git push origin feature/my-feature
