
Development Workflow

Target Audience: Contributors to the codebase

Purpose: Learn the day-to-day development workflow, commands, git practices, and common tasks


When to Use This Guide

Use this guide if you're:

  • ✅ Setting up your development environment (first-time setup)
  • ✅ Running tests locally (before commits)
  • ✅ Formatting/linting code (code quality checks)
  • ✅ Making commits and pull requests (git workflow)
  • ✅ Understanding quality gates (CI requirements)
  • ✅ Performing common tasks (adding datasets, training models, debugging)


Quick Reference

Daily Commands:

make format      # Auto-format code with ruff
make lint        # Check linting with ruff
make typecheck   # Type check with mypy (strict mode)
make test        # Fast tests (unit + integration; skips e2e/slow/gpu)
make test-e2e    # End-to-end suite (honors env flags, e.g., RUN_NOVO_E2E=1)
make test-all    # Full pytest suite (env-gated e2e may still skip if data/flags absent)
make all         # Run format → lint → typecheck → test (full quality gate)

Before Commit:

make hooks       # Run pre-commit checks manually

Training/Testing:

make train       # Train with default config
uv run antibody-test --model experiments/checkpoints/esm1v/logreg/boughter_vh_esm1v_logreg.pkl --data data/test/jain/fragments/VH_only_jain.csv  # Test model (hierarchical path)



Environment Setup

Initial Setup

Install all dependencies including dev tools:

uv sync --all-extras

This installs:

  • Core dependencies (torch, transformers, scikit-learn, pandas)
  • Dev tools (pytest, ruff, mypy, pre-commit, bandit)
  • CLI tools (click, pyyaml)

Pre-commit Hooks

Install pre-commit hooks to catch issues before committing:

uv run pre-commit install

What runs on commit:

  • ruff format - Auto-format code
  • ruff lint - Lint checks
  • mypy - Type checking (strict mode)

Manual run:

make hooks

Behavior: Failures block commits by design; fix the issues before committing.


Development Commands

Testing

Run fast suite (unit + integration):

make test

Run end-to-end suite (opt-in flags honored):

make test-e2e
# Heavy tests:
#   RUN_NOVO_E2E=1 make test-e2e          # Novo accuracy reproduction (~650MB download)
#   RUN_PREDICT_CLI_E2E=1 make test-e2e   # Real-weights predict CLI test

Run full pytest (all markers; env-gated tests may still skip):

make test-all

Run specific test file:

uv run pytest tests/unit/core/test_trainer.py

Run specific test by name:

uv run pytest -k test_function_name

Coverage report:

uv run pytest --cov=. --cov-report=html --cov-report=term-missing --cov-fail-under=70
# HTML report: htmlcov/index.html
# Terminal: Shows missing lines
# Enforced: ≥70% coverage required

Important: All tests must be tagged with unit, integration, e2e, or slow markers. Register new markers in pyproject.toml before using.
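A sketch of what that marker registration might look like in pyproject.toml (the section layout and descriptions here are assumptions; match the file's existing `[tool.pytest.ini_options]` block):

```toml
[tool.pytest.ini_options]
markers = [
    "unit: fast, isolated unit tests",
    "integration: multi-component tests",
    "e2e: end-to-end pipeline tests (env-gated, may download data)",
    "slow: long-running tests excluded from the fast suite",
]
```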


Code Quality

Format code:

make format      # Auto-format with ruff (modifies files in-place)

Lint code:

make lint        # Check linting with ruff (no modifications)

Type check:

make typecheck   # Type check with mypy (strict mode)

Run all quality checks:

make all         # Format → Lint → Typecheck → Test

Critical: This repo maintains 100% type safety. All functions must have complete type annotations, and mypy runs with disallow_untyped_defs=true, so any unannotated function fails the quality gate.
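As a minimal illustration of the annotation style strict mode enforces, every parameter and return type must be declared (the function below is hypothetical, not from the codebase):

```python
def mean_score(scores: list[float]) -> float:
    """Return the arithmetic mean of a non-empty score list."""
    if not scores:
        # Raising keeps the declared return type honest: we never return None.
        raise ValueError("scores must be non-empty")
    return sum(scores) / len(scores)

print(mean_score([1.0, 3.0]))  # → 2.0
```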


Training & Testing

Train with default config:

make train
# Uses: src/antibody_training_esm/conf/config.yaml (Boughter train, Jain test)

Override parameters from CLI:

uv run antibody-train experiment.name=my_experiment hardware.device=cuda

Test trained model:

uv run antibody-test --model experiments/checkpoints/esm1v/logreg/boughter_vh_esm1v_logreg.pkl --data data/test/jain/fragments/VH_only_jain.csv
# Note: antibody-test auto-writes results under experiments/benchmarks/{backbone}/{classifier}/{dataset}/

All CLI options:

uv run antibody-train --help
uv run antibody-train --cfg job  # Show resolved config
uv run antibody-test --help


Preprocessing

Boughter (training set):

python3 preprocessing/boughter/stage1_dna_translation.py
python3 preprocessing/boughter/stage2_stage3_annotation_qc.py

Jain (test set - Novo parity benchmark):

python3 preprocessing/jain/step1_convert_excel_to_csv.py
python3 preprocessing/jain/step2_preprocess_p5e_s2.py

Harvey (nanobody test set):

python3 preprocessing/harvey/step1_convert_raw_csvs.py
python3 preprocessing/harvey/step2_extract_fragments.py

Shehata (PSR assay test set):

python3 preprocessing/shehata/step1_convert_excel_to_csv.py
python3 preprocessing/shehata/step2_extract_fragments.py


Common Tasks

Adding a New Dataset

  1. Create preprocessing directory:

    mkdir -p preprocessing/{dataset}/
    

  2. Implement preprocessing pipeline:

     • Convert Excel/CSV to canonical format
     • Follow patterns in existing preprocessing scripts
     • See Preprocessing Internals

  3. Create dataset loader:

    # Create src/antibody_training_esm/datasets/{dataset}.py
    # Extend AntibodyDataset base class
    

  4. Add dataset documentation:

    mkdir -p docs/datasets/{dataset}/
    # Document dataset source, preprocessing, quirks
    

  5. Update preprocessing README:

    # Add dataset to preprocessing/README.md
    
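A loader in that style might look like the sketch below. The `AntibodyDataset` interface shown here is an assumption (a stand-in base class, not the repo's actual one in src/antibody_training_esm/datasets/); the point is normalising rows from the canonical CSV format into (sequence, label) pairs:

```python
import csv
import io


class AntibodyDataset:
    """Stand-in for the real base class: subclasses implement load()."""

    def load(self) -> list[tuple[str, int]]:
        raise NotImplementedError


class MyDataset(AntibodyDataset):
    """Hypothetical loader for a dataset in the canonical CSV format."""

    def __init__(self, csv_text: str) -> None:
        self.csv_text = csv_text

    def load(self) -> list[tuple[str, int]]:
        # Canonical format assumed here: one sequence and one binary
        # label per row, under "sequence" and "label" column headers.
        rows = csv.DictReader(io.StringIO(self.csv_text))
        return [(row["sequence"], int(row["label"])) for row in rows]


sample = "sequence,label\nEVQLVESGG,1\nQVQLQQSGA,0\n"
print(MyDataset(sample).load())
```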


Training a New Model

  1. Override parameters from CLI (no need to create new config files):

    uv run antibody-train \
      experiment.name=my_experiment \
      training.model_name=my_model \
      data.train_file="data/train/{dataset}/canonical/VH_only.csv" \
      data.test_file="data/test/{dataset}/canonical/VH_only.csv" \
      classifier.C=1.0 \
      classifier.penalty=l2
    

  2. Model saved to:

    experiments/runs/{experiment.name}/{timestamp}/{model_name}.pkl
    experiments/runs/{experiment.name}/{timestamp}/training.log
    experiments/runs/{experiment.name}/{timestamp}/.hydra/config.yaml
    


Running Hyperparameter Sweeps

  1. See reference implementation:

    cat preprocessing/boughter/train_hyperparameter_sweep.py
    

  2. Create sweep config:

     • Define parameter grid (e.g., C=[0.01, 0.1, 1.0, 10.0])
     • Train model for each configuration
     • Embeddings auto-cached for fast re-runs

  3. Compare results:

     • Log cross-validation metrics
     • Select best hyperparameters
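The sweep loop above can be sketched as follows. This assumes a `train_and_score(C, penalty)` helper that trains one configuration and returns its mean cross-validation accuracy; that helper and its signature are illustrative, not the repo's actual API (see the reference script for the real one):

```python
from itertools import product


def train_and_score(C: float, penalty: str) -> float:
    """Stand-in for a real training call; returns a dummy CV accuracy
    that peaks at C=1.0 so the sweep has something to select."""
    return 1.0 / (1.0 + abs(C - 1.0))


grid = {"C": [0.01, 0.1, 1.0, 10.0], "penalty": ["l2"]}

results = []
for C, penalty in product(grid["C"], grid["penalty"]):
    # One training run per grid point; embedding caching makes re-runs cheap.
    score = train_and_score(C, penalty)
    results.append({"C": C, "penalty": penalty, "cv_accuracy": score})

best = max(results, key=lambda r: r["cv_accuracy"])
print(best)
```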

Debugging Test Failures

Run specific test with verbose output:

uv run pytest tests/unit/core/test_trainer.py -v

Show print statements:

uv run pytest -s

Drop into debugger on failure:

uv run pytest --pdb

Check test fixtures:

ls tests/fixtures/mock_datasets/
# Deterministic test data lives here


Git Workflow

Main Branches

  • main: Main branch (production-ready code)
  • dev: Development branch (active work)

Branch Strategy: Feature branches merge to dev; dev merges to main.


Commit Conventions

Use Conventional Commits:

  • fix: Bug fixes
  • feat: New features
  • docs: Documentation changes
  • test: Test additions/modifications
  • refactor: Code refactoring (no behavior change)
  • chore: Maintenance tasks

Format:

  • Imperative mood ("Add feature" not "Added feature")
  • ≤72 characters for subject line
  • Body explains what and why (not how)

Example:

fix: Correct PSR threshold to 0.5495 for Novo parity

The PSR assay threshold was incorrectly set to 0.5, causing a 6.3pp
accuracy gap on Shehata dataset. Setting to 0.5495 achieves exact
parity with Novo Nordisk benchmarks (58.8% accuracy).

Fixes: #123


Pull Requests

Requirements:

  • ✅ All CI checks pass (quality + tests)
  • ✅ Coverage ≥70% maintained
  • ✅ All tests have markers (unit, integration, e2e)
  • ✅ Type annotations complete (mypy strict mode)

PR Description Must Include:

  1. Scope summary: What changed and why
  2. Issue links: Fixes #123, Relates to #456
  3. Commands run: make all, make coverage
  4. New artifacts: Call out new data paths, models, configs
  5. Testing: How changes were validated

Best Practices:

  • Keep refactors separate from feature/data work
  • One logical change per PR
  • Include before/after examples for user-facing changes
  • Document breaking changes clearly


Quality Gates

Pre-commit (Local)

Runs automatically on git commit:

  • ruff format - Code formatting
  • ruff lint - Linting
  • mypy - Type checking

If failures occur: Fix the issues before committing (failing hooks block the commit).


CI Pipeline (Remote)

Runs on all PRs and commits to main branches:

Quality Checks:

  • ruff (format + lint)
  • mypy (type checking, strict mode)
  • bandit (security scanning)

Testing:

  • Unit tests (fast, < 1s each)
  • Integration tests (multi-component)
  • Coverage enforcement (≥70%)

E2E Tests:

  • Scheduled runs only (expensive)
  • Full pipeline validation

Merge Requirements:

  • All quality checks pass
  • All tests pass
  • Coverage ≥70%
  • Bandit shows 0 findings


Last Updated: 2025-11-28
Branch: main