Follow the steps below to clone the repository, install dependencies, build Cython extensions, and verify everything works before making your first contribution.

Prerequisites

  • Python 3.10+ (3.11 recommended)
  • Git
  • C++ compiler (for Cython extensions)
    • Linux: gcc or clang
    • macOS: Xcode Command Line Tools
    • Windows: Visual Studio Build Tools

Optional

  • OpenMP (for parallel processing)
    • macOS: brew install libomp
    • Linux: usually pre-installed (gcc ships its own OpenMP runtime, libgomp)
  • CUDA (for GPU acceleration with transformer models)
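
The required prerequisites above can be checked with a short script before you start. This is a rough sketch, not part of the project: `check_prerequisites` is a hypothetical helper, and the compiler names it looks for (`gcc`, `clang`, `cl`) are assumptions about what is on your PATH. On Windows, `cl` is normally visible only from a Developer Command Prompt, so a "MISSING" result there may be a false alarm.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version from the prerequisites list


def check_prerequisites(compilers=("gcc", "clang", "cl")):
    """Return a dict mapping each prerequisite name to True or False."""
    return {
        "python": sys.version_info >= MIN_PYTHON,
        "git": shutil.which("git") is not None,
        # Any one compiler on PATH is enough for this rough check.
        "compiler": any(shutil.which(name) for name in compilers),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```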

Quick Setup

# Clone repository
git clone https://github.com/thettwe/my-spellchecker.git
cd my-spellchecker

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"

# Build Cython extensions
python setup.py build_ext --inplace

# Verify installation
pytest tests/ -x -q

Detailed Setup

1. Clone Repository

git clone https://github.com/thettwe/my-spellchecker.git
cd my-spellchecker

2. Create Virtual Environment

# Create environment
python3 -m venv venv

# Activate
source venv/bin/activate  # Linux/macOS
venv\Scripts\activate     # Windows
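
To confirm that activation worked, you can ask the interpreter itself. The sketch below relies on standard `venv` behavior: inside a virtual environment `sys.prefix` differs from `sys.base_prefix`. The helper name `in_virtualenv` is made up for this example.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style environment."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```

If the first line prints `False`, the `pip install` commands that follow would install into the base interpreter instead of `venv`.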

3. Install Dependencies

# Development dependencies (includes testing, linting)
pip install -e ".[dev]"

# All optional dependencies
pip install -e ".[dev,ai,transformers,examples]"

4. Build Cython Extensions

python setup.py build_ext --inplace
This compiles all 11 Cython modules:
  • text/normalize_c.pyx - Text normalization
  • algorithms/viterbi.pyx - POS tagging
  • algorithms/distance/edit_distance_c.pyx - Levenshtein distance
  • data_pipeline/batch_processor.pyx - Parallel batch processing
  • data_pipeline/frequency_counter.pyx - Fast frequency calculations
  • data_pipeline/ingester_c.pyx - Corpus ingestion
  • data_pipeline/repair_c.pyx - Segmentation repair
  • data_pipeline/tsv_reader_c.pyx - TSV file reading
  • tokenizers/cython/word_segment.pyx - Word segmentation
  • tokenizers/cython/mmap_reader.pyx - Memory-mapped file reading
  • core/syllable_rules_c.pyx - Syllable rule validation
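
One way to see which of the modules listed above actually resolved to a compiled binary is to inspect their import specs. This is a sketch under assumptions: it treats the listed paths as relative to the `myspellchecker` package, and `pyx_to_module` and `is_compiled_extension` are hypothetical helpers, not project APIs. Only two paths are included here; extend `PYX_PATHS` as needed.

```python
import importlib.machinery
import importlib.util

# Two of the listed paths; assumed to be relative to the package root.
PYX_PATHS = [
    "text/normalize_c.pyx",
    "algorithms/viterbi.pyx",
]


def pyx_to_module(path, package="myspellchecker"):
    """Turn 'text/normalize_c.pyx' into 'myspellchecker.text.normalize_c'."""
    stem = path[:-4] if path.endswith(".pyx") else path
    return package + "." + stem.replace("/", ".")


def is_compiled_extension(module_name):
    """True if the module resolves to a binary extension file."""
    try:
        # Imports parent packages, but does not execute the module itself.
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # Parent package missing, or the name is malformed.
        return False
    if spec is None or not spec.origin:
        return False
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    return spec.origin.endswith(suffixes)


for path in PYX_PATHS:
    name = pyx_to_module(path)
    state = "compiled" if is_compiled_extension(name) else "not compiled"
    print(f"{name}: {state}")
```

A "not compiled" result means either the build step has not run or the name resolved to a pure Python file.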

5. Build Sample Database

myspellchecker build --sample
This creates a test database for development.

IDE Setup

VS Code

Recommended extensions:
  • Python (Microsoft)
  • Pylance
  • Cython
  • Ruff (charliermarsh.ruff), used as the formatter in the settings below
.vscode/settings.json:
{
    "python.defaultInterpreterPath": "${workspaceFolder}/venv/bin/python",
    "python.linting.enabled": true,
    "python.linting.ruffEnabled": true,
    "python.formatting.provider": "none",
    "[python]": {
        "editor.defaultFormatter": "charliermarsh.ruff",
        "editor.formatOnSave": true
    }
}

PyCharm

  1. Open project folder
  2. Configure interpreter: venv/bin/python
  3. Mark src as Sources Root
  4. Enable Ruff plugin for linting

Environment Variables

# Optional: Set database path
export MYSPELL_DATABASE_PATH=/path/to/custom.db

# Optional: Enable debug logging in your code:
#   from myspellchecker.utils.logging_utils import configure_logging
#   configure_logging(level="DEBUG")

# Optional: GPU device for transformers
export CUDA_VISIBLE_DEVICES=0
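
To double-check what a Python process actually sees after exporting the variable, a small helper like the following can be used. It is only a sketch: `database_path_from_env` is a hypothetical function, and how the library itself reads `MYSPELL_DATABASE_PATH` is not shown on this page.

```python
import os
from pathlib import Path

ENV_VAR = "MYSPELL_DATABASE_PATH"


def database_path_from_env(env=None):
    """Return the configured database path, or None when it is unset."""
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR, "").strip()
    if not raw:
        return None
    # Expand a leading "~" so the result can be opened directly.
    return Path(raw).expanduser()


path = database_path_from_env()
if path is None:
    print(f"{ENV_VAR} is not set")
else:
    print(f"{ENV_VAR} -> {path} (exists: {path.exists()})")
```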

Verifying Setup

Run Tests

# All tests
pytest tests/

# Quick smoke test
pytest tests/ -x -q --tb=short

# With coverage
pytest tests/ --cov=myspellchecker

Check Cython

# Verify Cython extensions loaded
try:
    from myspellchecker.text.normalize_c import remove_zero_width_chars
    print("Cython normalize: loaded")
except ImportError:
    print("Cython normalize: not available (using pure Python fallback)")

from myspellchecker.algorithms.viterbi import _HAS_CYTHON_VITERBI
print(f"Cython viterbi: {_HAS_CYTHON_VITERBI}")

Test Spell Checker

from myspellchecker import SpellChecker

checker = SpellChecker()
result = checker.check("မြန်မာ")
print(f"Working: {not result.has_errors}")

Common Issues

Cython Build Fails

Error: fatal error: Python.h: No such file or directory
Solution: Install the Python development headers.
# Ubuntu/Debian
sudo apt install python3-dev

# Fedora
sudo dnf install python3-devel

# macOS (usually included with Python)
xcode-select --install

OpenMP Not Found (macOS)

Error: ld: library not found for -lomp
Solution: Install libomp, export the linker and compiler flags, then rebuild.
brew install libomp
# brew --prefix resolves the install location on both Apple Silicon and Intel Macs
export LDFLAGS="-L$(brew --prefix libomp)/lib"
export CPPFLAGS="-I$(brew --prefix libomp)/include"
python setup.py build_ext --inplace

Database Not Found

Error: MissingDatabaseError
Solution: Build the sample database, or point the library at an existing one.
# Build sample database
myspellchecker build --sample

# Or specify path
export MYSPELL_DATABASE_PATH=/path/to/db

Import Errors

Error: ModuleNotFoundError: No module named 'myspellchecker'
Solution: Install the package in editable mode inside the active virtual environment.
# Ensure installed in editable mode
pip install -e .

# Verify installation
pip show myspellchecker
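
The same check can be done from Python, which helps when `pip` and `python` point at different environments. This sketch uses the standard library's `importlib.metadata`. The helper name `installed_version` is an assumption for illustration, and the distribution name is taken from the `pip show` command above.

```python
from importlib import metadata


def installed_version(dist_name="myspellchecker"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("myspellchecker is not installed for this interpreter")
else:
    print(f"myspellchecker {version} is installed")
```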

Development Workflow

1. Create Branch

git checkout -b feature/my-feature

2. Make Changes

Edit code in src/myspellchecker/

3. Run Quality Checks

# Lint
ruff check .

# Format
ruff format .

# Type check
mypy src/myspellchecker

# Tests
pytest tests/
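
If you prefer a single entry point, the four commands above can be chained so that the run stops at the first failure. The following is a minimal sketch, not a script shipped with the project: `run_checks` and `CHECKS` are hypothetical names, and the commands are copied from this step.

```python
import subprocess
import sys

# Commands from the quality-check step, in the same order.
CHECKS = [
    ["ruff", "check", "."],
    ["ruff", "format", "."],
    ["mypy", "src/myspellchecker"],
    ["pytest", "tests/"],
]


def run_checks(commands):
    """Run commands in order; return the first failing command, or None."""
    for cmd in commands:
        print("$", " ".join(cmd))
        try:
            returncode = subprocess.run(cmd).returncode
        except FileNotFoundError:
            # The tool is not installed or not on PATH.
            returncode = 127
        if returncode != 0:
            return cmd
    return None
```

Calling `run_checks(CHECKS)` from the repository root returns `None` when everything passes, so a wrapper can end with `sys.exit(0 if run_checks(CHECKS) is None else 1)`.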

4. Rebuild Cython (if modified .pyx)

python setup.py build_ext --inplace
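
To decide whether a rebuild is needed, you can compare modification times of each `.pyx` file against its compiled binary. The sketch below assumes in-place builds (as produced by `--inplace`), where the binary sits next to the source. `stale_pyx_files` is a hypothetical helper, and it does not track `.pxd` files or other dependencies.

```python
from importlib.machinery import EXTENSION_SUFFIXES
from pathlib import Path


def stale_pyx_files(root):
    """List .pyx files whose compiled extension is missing or older."""
    stale = []
    for pyx in sorted(Path(root).rglob("*.pyx")):
        # In-place builds name the binary <stem><suffix>, for example
        # normalize_c.cpython-311-x86_64-linux-gnu.so on Linux.
        candidates = [
            pyx.with_name(pyx.stem + suffix) for suffix in EXTENSION_SUFFIXES
        ]
        built = [path for path in candidates if path.exists()]
        newest = max((path.stat().st_mtime for path in built), default=None)
        if newest is None or newest < pyx.stat().st_mtime:
            stale.append(pyx)
    return stale
```

For example, `stale_pyx_files("src/myspellchecker")` returning a non-empty list suggests running the build command again.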

5. Commit and Push

git add .
git commit -m "feat: description"
git push origin feature/my-feature
