Development Workflow¶
Target Audience: Contributors to the codebase
Purpose: Learn the day-to-day development workflow, commands, git practices, and common tasks
When to Use This Guide¶
Use this guide if you're:

- ✅ Setting up your development environment (first-time setup)
- ✅ Running tests locally (before commits)
- ✅ Formatting/linting code (code quality checks)
- ✅ Making commits and pull requests (git workflow)
- ✅ Understanding quality gates (CI requirements)
- ✅ Performing common tasks (adding datasets, training models, debugging)
Quick Reference¶
Daily Commands:
make format # Auto-format code with ruff
make lint # Check linting with ruff
make typecheck # Type check with mypy (strict mode)
make test # Fast tests (unit + integration; skips e2e/slow/gpu)
make test-e2e # End-to-end suite (honors env flags, e.g., RUN_NOVO_E2E=1)
make test-all # Full pytest suite (env-gated e2e may still skip if data/flags absent)
make all # Run format → lint → typecheck → test (full quality gate)
Before Commit:
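No dedicated pre-commit target is listed here; per the Quick Reference above, the full quality gate is the safest check before committing:

```shell
make all   # format → lint → typecheck → test (full quality gate)
```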
Training/Testing:
make train # Train with default config
uv run antibody-test --model experiments/checkpoints/esm1v/logreg/boughter_vh_esm1v_logreg.pkl --data data/test/jain/fragments/VH_only_jain.csv # Test model (hierarchical path)
Related Documentation¶
- Architecture: Architecture Guide - System design and components
- Testing: Testing Strategy - Test architecture and patterns
- Type Checking: Type Checking Guide - Type safety requirements
- Preprocessing: Preprocessing Internals - Dataset preprocessing patterns
Environment Setup¶
Initial Setup¶
Install all dependencies including dev tools:
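The exact install command is not shown in this section; since the project uses uv throughout (see the `uv run` commands below), a typical setup would be:

```shell
uv sync   # install core + dev dependencies from the lockfile
```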
This installs:

- Core dependencies (torch, transformers, scikit-learn, pandas)
- Dev tools (pytest, ruff, mypy, pre-commit, bandit)
- CLI tools (click, pyyaml)
Pre-commit Hooks¶
Install pre-commit hooks to catch issues before committing:
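Since pre-commit is listed as a dev dependency above, the standard installation command is:

```shell
uv run pre-commit install   # register the hook in .git/hooks
```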
What runs on commit:
- ruff format - Auto-format code
- ruff lint - Lint checks
- mypy - Type checking (strict mode)
Manual run:
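To run every hook against the whole repository without committing:

```shell
uv run pre-commit run --all-files
```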
Behavior: Failures block the commit. This is intentional: fix the reported issues before committing.
Development Commands¶
Testing¶
Run fast suite (unit + integration):
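Per the Quick Reference above:

```shell
make test   # unit + integration; skips e2e/slow/gpu markers
```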
Run end-to-end suite (opt-in flags honored):
make test-e2e
# Heavy tests:
# RUN_NOVO_E2E=1 make test-e2e # Novo accuracy reproduction (~650MB download)
# RUN_PREDICT_CLI_E2E=1 make test-e2e # Real-weights predict CLI test
Run full pytest (all markers; env-gated tests may still skip):
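Per the Quick Reference above:

```shell
make test-all   # all markers; env-gated e2e may still skip if data/flags absent
```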
Run specific test file:
Run specific test by name:
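For the two invocations above, standard pytest selection applies (the test path and name here are placeholders, not actual files in this repo):

```shell
uv run pytest tests/test_example.py                      # specific file (hypothetical path)
uv run pytest tests/test_example.py -k test_threshold    # specific test by name (hypothetical)
```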
Coverage report:
uv run pytest --cov=. --cov-report=html --cov-report=term-missing --cov-fail-under=70
# HTML report: htmlcov/index.html
# Terminal: Shows missing lines
# Enforced: ≥70% coverage required
Important: All tests must be tagged with unit, integration, e2e, or slow markers. Register new markers in pyproject.toml before using.
Code Quality¶
Format code:
Lint code:
Type check:
Run all quality checks:
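The four checks above map directly to the Makefile targets from the Quick Reference:

```shell
make format     # auto-format with ruff
make lint       # lint with ruff
make typecheck  # mypy, strict mode
make all        # format → lint → typecheck → test
```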
Critical: This repo maintains 100% type safety. All functions must have complete type annotations. Mypy runs with disallow_untyped_defs=true.
Training & Testing¶
Train with default config:
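Per the Quick Reference above:

```shell
make train   # train with default config
```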
Override parameters from CLI:
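The `--cfg job` flag listed below suggests a Hydra-style CLI, in which case overrides use `key=value` syntax. The parameter names here are illustrative assumptions, not taken from the repo:

```shell
uv run antibody-train training.seed=123 classifier.C=1.0   # hypothetical override keys
```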
Test trained model:
uv run antibody-test --model experiments/checkpoints/esm1v/logreg/boughter_vh_esm1v_logreg.pkl --data data/test/jain/fragments/VH_only_jain.csv
# Note: antibody-test auto-writes results under experiments/benchmarks/{backbone}/{classifier}/{dataset}/
All CLI options:
uv run antibody-train --help
uv run antibody-train --cfg job # Show resolved config
uv run antibody-test --help
Preprocessing¶
Boughter (training set):
python3 preprocessing/boughter/stage1_dna_translation.py
python3 preprocessing/boughter/stage2_stage3_annotation_qc.py
Jain (test set - Novo parity benchmark):
python3 preprocessing/jain/step1_convert_excel_to_csv.py
python3 preprocessing/jain/step2_preprocess_p5e_s2.py
Harvey (nanobody test set):
python3 preprocessing/harvey/step1_convert_raw_csvs.py
python3 preprocessing/harvey/step2_extract_fragments.py
Shehata (PSR assay test set):
python3 preprocessing/shehata/step1_convert_excel_to_csv.py
python3 preprocessing/shehata/step2_extract_fragments.py
Common Tasks¶
Adding a New Dataset¶
1. Create preprocessing directory
2. Implement preprocessing pipeline:
   - Convert Excel/CSV to canonical format
   - Follow patterns in existing preprocessing scripts
3. Create dataset loader
4. Add dataset documentation
5. Update preprocessing README
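A minimal sketch of the first two steps, assuming a new dataset called `newdataset` that follows the two-step script pattern used by the existing datasets (all names hypothetical):

```shell
mkdir -p preprocessing/newdataset
# Then add scripts mirroring the existing pipelines, e.g.:
#   preprocessing/newdataset/step1_convert_excel_to_csv.py
#   preprocessing/newdataset/step2_extract_fragments.py
```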
Training a New Model¶
1. Override parameters from CLI (no need to create new config files)
2. Model saved under experiments/checkpoints/ (e.g., experiments/checkpoints/esm1v/logreg/, as in the test command above)
Running Hyperparameter Sweeps¶
1. See reference implementation
2. Create sweep config:
   - Define parameter grid (e.g., C=[0.01, 0.1, 1.0, 10.0])
   - Train model for each configuration
3. Embeddings auto-cached for fast re-runs
4. Compare results:
   - Log cross-validation metrics
   - Select best hyperparameters
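Absent the reference implementation, a sweep over the C grid above could be driven from the shell, assuming a Hydra-style `classifier.C` override (hypothetical key):

```shell
for c in 0.01 0.1 1.0 10.0; do
  uv run antibody-train classifier.C=$c   # hypothetical override key
done
# Embeddings are cached, so only classifier training repeats per run
```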
Debugging Test Failures¶
Run specific test with verbose output:
Show print statements:
Drop into debugger on failure:
Check test fixtures:
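The four tasks above correspond to standard pytest flags (the test path is a placeholder):

```shell
uv run pytest tests/test_example.py -v     # verbose output (hypothetical path)
uv run pytest tests/test_example.py -s     # show print statements
uv run pytest tests/test_example.py --pdb  # drop into debugger on failure
uv run pytest --fixtures                   # list available fixtures
```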
Git Workflow¶
Main Branches¶
- main: Main branch (production-ready code)
- dev: Development branch (active work)
Branch Strategy: Feature branches merge to dev → dev merges to main
Commit Conventions¶
Use Conventional Commits:
- fix: - Bug fixes
- feat: - New features
- docs: - Documentation changes
- test: - Test additions/modifications
- refactor: - Code refactoring (no behavior change)
- chore: - Maintenance tasks
Format:

- Imperative mood ("Add feature" not "Added feature")
- ≤72 characters for subject line
- Body explains what and why (not how)
Example:
fix: Correct PSR threshold to 0.5495 for Novo parity
The PSR assay threshold was incorrectly set to 0.5, causing a 6.3pp
accuracy gap on Shehata dataset. Setting to 0.5495 achieves exact
parity with Novo Nordisk benchmarks (58.8% accuracy).
Fixes: #123
Pull Requests¶
Requirements:
- ✅ All CI checks pass (quality + tests)
- ✅ Coverage ≥70% maintained
- ✅ All tests have markers (unit, integration, e2e, or slow)
- ✅ Type annotations complete (mypy strict mode)
PR Description Must Include:
1. Scope summary: What changed and why
2. Issue links: Fixes #123, Relates to #456
3. Commands run: make all, make coverage
4. New artifacts: Call out new data paths, models, configs
5. Testing: How changes were validated
Best Practices:

- Keep refactors separate from feature/data work
- One logical change per PR
- Include before/after examples for user-facing changes
- Document breaking changes clearly
Quality Gates¶
Pre-commit (Local)¶
Runs automatically on git commit:
- ruff format - Code formatting
- ruff lint - Linting
- mypy - Type checking
If failures occur: Fix issues before committing (commits blocked)
CI Pipeline (Remote)¶
Runs on all PRs and commits to main branches:
Quality Checks:
- ruff (format + lint)
- mypy (type checking, strict mode)
- bandit (security scanning)
Testing:

- Unit tests (fast, < 1s each)
- Integration tests (multi-component)
- Coverage enforcement (≥70%)

E2E Tests:

- Scheduled runs only (expensive)
- Full pipeline validation

Merge Requirements:

- All quality checks pass
- All tests pass
- Coverage ≥70%
- Bandit shows 0 findings
Last Updated: 2025-11-28
Branch: main