CI/CD¶
Target Audience: Developers understanding/modifying CI pipelines
Purpose: Understand and work with CI/CD workflows, quality gates, and branch protection
When to Use This Guide¶
Use this guide if you're: - ✅ Understanding the CI pipeline (what runs on PRs, why builds fail) - ✅ Fixing CI failures (quality gates, test failures, security scans) - ✅ Modifying quality gates (adding checks, changing thresholds) - ✅ Setting up branch protection (required checks, merge requirements) - ✅ Running CI locally (validate before pushing)
Related Documentation¶
- Workflow: Development Workflow - Git workflow, make commands
- Testing: Testing Strategy - Test architecture, running tests
- Security: Security Guide - Security scanning, pickle policy
- Docker: Docker Guide - Container builds, deployment
Workflow Overview¶
Active Workflows¶
The repository has 5 CI/CD workflows in .github/workflows/:
| Workflow | File | Triggers | Duration | Purpose |
|---|---|---|---|---|
| CI Pipeline | ci.yml |
Push, PR to dev/main | ~20 min | Quality gates + tests |
| Docker CI | docker-ci.yml |
Push, PR | ~10 min | Verify Docker builds |
| E2E Benchmarks | benchmark.yml |
Weekly (Sun 2am), Manual | ~120 min | Novo parity validation |
| Security | codeql.yml |
Push to main, Weekly | ~5 min | CodeQL SAST scanning |
| Dependencies | dependencies.yml |
Push, PR, Daily | ~3 min | Dependency security audit |
Trigger Summary¶
Every PR: - ✅ ci.yml - Quality gates, unit tests, integration tests, security - ✅ docker-ci.yml - Docker build verification
Weekly: - ✅ benchmark.yml - Full E2E suite (Sun 2am UTC) - ✅ codeql.yml - CodeQL security scan (weekly) - ✅ dependencies.yml - Dependency update check (daily)
Manual: - ✅ benchmark.yml - Run E2E benchmarks on demand - ✅ dependencies.yml - Check for dependency updates
Quality Gate Workflow (ci.yml)¶
Overview¶
Primary CI pipeline that runs on every push and PR to dev or main.
Total runtime: ~20 minutes
5 jobs run in parallel:
- quality - Code quality gates (ruff, mypy, bandit)
- test-unit - Unit tests with coverage
- test-integration - Integration tests
- security - Dependency security audit
- ci-success - Summary job (requires all others to pass)
Job 1: Code Quality¶
Duration: ~3-5 minutes
Steps:
- Ruff lint (uv run ruff check .)
- Ruff format check (uv run ruff format --check .)
- Mypy type checking (uv run mypy src/ --strict)
- Bandit security scan (uv run bandit -r src/)
What it checks: - Code follows style guidelines (ruff) - No formatting issues (ruff format) - 100% type coverage (mypy strict mode) - No security vulnerabilities in code (bandit)
If it fails:
- Run make format to fix formatting
- Run make lint to see linting errors
- Run make typecheck to see type errors
- Check bandit output for security issues
Job 2: Unit Tests¶
Duration: ~5-8 minutes
Steps:
- Run unit tests (pytest tests/unit/)
- Generate coverage report (--cov=src/antibody_training_esm)
- Upload coverage to Codecov
- Enforce coverage threshold (≥70%)
Coverage requirement: ≥70% (currently at 90.80%)
What it checks: - All unit tests pass - No test failures or errors - Coverage doesn't drop below 70%
If it fails:
- Run uv run pytest tests/unit/ -v locally
- Check which tests failed
- Fix failing tests
- If coverage drops, add tests for uncovered code
Job 3: Integration Tests¶
Duration: ~8-10 minutes
Steps:
- Cache ESM-1v model (~2GB)
- Run integration tests (pytest tests/integration/)
- Upload test results
What it checks: - Multi-component interactions work - ESM model loading works - Dataset loaders work - End-to-end data flow works
If it fails:
- Run uv run pytest tests/integration/ -v locally
- Check integration test failures
- Verify ESM model can be loaded
- Check dataset files exist
Job 4: Security¶
Duration: ~3-5 minutes
Steps:
- Install security tools (pip-audit, safety)
- Run pip-audit (check for CVEs in dependencies)
- Run safety scan (PyPI vulnerability database)
- Upload security reports
What it checks: - No HIGH/CRITICAL CVEs in dependencies - Dependencies are up-to-date - No known security vulnerabilities
If it fails:
- Check pip-audit.json artifact
- Update vulnerable dependencies: uv lock --upgrade
- Re-run tests to ensure updates don't break anything
Job 5: CI Success Summary¶
Duration: ~10 seconds
What it does: - Waits for all 4 jobs to complete - Fails if any job failed - Posts summary to PR
This is the required check for branch protection (ensures all gates passed).
Docker CI Workflow (docker-ci.yml)¶
Overview¶
Verifies Docker builds work without loading images (disk space constraints).
2 jobs:
- test-dev - Build development container
- test-prod - Build production container (includes model weights)
Runtime: ~10 minutes
What it checks¶
Dockerfile.devbuilds successfullyDockerfile.prodbuilds successfully (if exists)- No Docker build errors
- Layers cached for faster rebuilds
GHCR Publishing¶
Only on main branch:
- Tags images with commit SHA and latest
- Pushes to GitHub Container Registry (ghcr.io)
If it fails¶
- Check Dockerfile syntax
- Ensure base images are accessible
- Verify COPY paths exist
- Check Docker build logs in CI
Security Workflows¶
CodeQL (codeql.yml)¶
Purpose: Static Application Security Testing (SAST) for Python code
Triggers: - Push to main branches - Pull requests - Weekly schedule
Duration: ~5 minutes
What it checks: - SQL injection vulnerabilities - Command injection - Path traversal - Hard-coded credentials - Other OWASP Top 10 issues
If it fails: - Review CodeQL alerts in GitHub Security tab - Fix identified vulnerabilities - Re-run scan
Dependencies (dependencies.yml)¶
Purpose: Scan dependencies for known vulnerabilities
Triggers: - Push to any branch - Pull requests - Daily schedule (6am UTC)
Duration: ~3 minutes
What it checks: - Known CVEs in dependencies (pip-audit) - Outdated packages with security fixes
If it fails:
- Check pip-audit.json artifact
- Update vulnerable dependencies: uv lock --upgrade
- Test that updates don't break functionality
Benchmark Workflow (benchmark.yml)¶
Overview¶
Full E2E testing and Novo Nordisk parity validation
Triggers: - Weekly schedule (Sunday 2am UTC) - Manual workflow dispatch - Release tags (v*)
Duration: ~120 minutes (full suite with Harvey)
What it validates¶
- Boughter (training set):
- 10-fold CV accuracy: 67-71%
- Proper stratification
-
No data leakage
-
Jain (test set):
- Confusion matrix: [[40,17],[10,19]] (EXACT NOVO MATCH)
- Accuracy: 68.60%
-
ELISA threshold: 0.5
-
Shehata (PSR test set):
-
Accuracy: 58.29% (auto PSR threshold: 0.5495)
-
Harvey (nanobody test set):
- Accuracy: 61.33% (vs Novo 61.7% = -0.37pp)
- Status: Near-parity (our best gap across all datasets)
- 141k sequences (~90 minutes)
Why weekly, not every PR?¶
- Harvey alone takes 90 minutes
- Too expensive to run on every PR
- Weekly validation catches regressions
If parity fails¶
- Check benchmark artifacts (confusion matrices)
- Compare with archived parity results (
archivebranch:experiments/benchmarks/novo_parity/) - Investigate changes since last passing run
- Issue created automatically on failure
Branch Protection¶
Protected Branches¶
Main branches with protection:
- main (default branch)
Required Checks¶
Before merging to protected branches, PRs must pass:
- ✅ Code Quality (ruff, mypy, bandit) - ci.yml quality job
- ✅ Unit Tests (Python 3.12) - ci.yml test-unit job
- ✅ Integration Tests (Python 3.12) - ci.yml test-integration job
- ✅ Dependency Security Audit - ci.yml security job
- ✅ CI Pipeline Success - ci.yml ci-success job
- ✅ Test Development Container - docker-ci.yml test-dev job
- ✅ Test Production Container - docker-ci.yml test-prod job (if exists)
Merge Requirements¶
- 1 approval required - At least one reviewer must approve
- Dismiss stale approvals - Re-approval needed after new commits
- Conversation resolution - All review comments must be resolved
- Up-to-date branches - PR must be rebased with latest main
- No force pushes - Force push blocked on protected branches
- No deletions - Branch cannot be deleted
Setting Up Branch Protection¶
First-time setup:
- Go to: Settings → Branches
- Click "Add branch protection rule"
- Configure:
Branch name pattern: main
Enable: - ✅ Require a pull request before merging (1 approval) - ✅ Dismiss stale pull request approvals when new commits are pushed - ✅ Require status checks to pass before merging - ✅ Require branches to be up to date before merging - ✅ Require conversation resolution before merging - ✅ Include administrators
Disable: - ❌ Allow force pushes - ❌ Allow deletions
- Add required status checks:
Code Quality (ruff, mypy, bandit)Unit Tests (Python 3.12)Integration Tests (Python 3.12)Dependency Security AuditCI Pipeline Success-
Test Development Container -
Click "Create"
Note: Status checks only appear after CI runs at least once. Save the rule first, then edit after first PR.
Verifying Branch Protection¶
# Test 1: Try to push directly (should fail)
git checkout main
git push origin main
# Expected: "Protected branch update failed"
# Test 2: Create PR (should require checks)
git checkout -b test/branch-protection
git commit --allow-empty -m "test: Verify protection"
git push origin test/branch-protection
# Create PR on GitHub - should show required checks
Local Testing¶
Run Full CI Suite Locally¶
# Run all quality gates
make all
# Equivalent to: make format lint typecheck test (fast suite)
# Individual commands
make format # Ruff format
make lint # Ruff lint
make typecheck # Mypy strict
make test # Fast suite (unit + integration; skips e2e/slow/gpu)
make test-e2e # End-to-end suite (honors env flags for heavy tests)
make test-all # Full pytest (env-gated e2e may still skip without data/flags)
make coverage # Pytest with coverage report
Run Specific Test Suites¶
# Unit tests only (fast)
uv run pytest tests/unit/ -v
# Integration tests only
uv run pytest tests/integration/ -v
# E2E tests (expensive, use sparingly)
uv run pytest tests/e2e/ -v
# With coverage
uv run pytest tests/unit tests/integration \
--cov=src/antibody_training_esm \
--cov-report=term \
--cov-fail-under=70
Verify Before Pushing¶
# Full pre-push validation (recommended)
make all
# Quick validation (format + lint only)
make format lint
# Check if mypy will pass
uv run mypy src/ --strict
# Check if bandit will pass
uv run bandit -r src/
Troubleshooting¶
CI Failure: "Ruff format check failed"¶
Symptom: ruff format --check . fails
Fix:
# Auto-fix formatting
make format
# Or manually
uv run ruff format .
# Commit formatting changes
git add .
git commit -m "style: Fix ruff formatting"
CI Failure: "Mypy type checking failed"¶
Symptom: mypy src/ --strict reports type errors
Fix:
# Check errors locally
uv run mypy src/ --strict
# Fix type annotations in reported files
# Common fixes:
# - Add return type annotations
# - Add parameter type annotations
# - Import types from typing module
# Verify fixed
uv run mypy src/ --strict
CI Failure: "Coverage below threshold"¶
Symptom: coverage report --fail-under=70 fails
Fix:
# Check current coverage
uv run pytest tests/unit tests/integration \
--cov=src/antibody_training_esm \
--cov-report=term-missing
# Identify uncovered lines (look for line numbers)
# Add tests for uncovered code
# Verify coverage increased
uv run coverage report
CI Failure: "Bandit security scan failed"¶
Symptom: bandit -r src/ reports HIGH/CRITICAL issues
Fix:
# Run bandit locally
uv run bandit -r src/ -v
# Review reported issues
# Common issues:
# - Hard-coded passwords/secrets (move to env vars)
# - Unsafe pickle usage (ensure controlled environment)
# - SQL injection (use parameterized queries)
# Fix issues and verify
uv run bandit -r src/
CI Failure: "Unit tests failed"¶
Symptom: One or more unit tests failing
Fix:
# Run failing tests locally
uv run pytest tests/unit/ -v
# Run specific failing test
uv run pytest tests/unit/test_file.py::test_function -v
# Debug with print statements
uv run pytest tests/unit/ -v -s
# Drop into debugger on failure
uv run pytest tests/unit/ --pdb
CI Failure: "Integration tests failed"¶
Symptom: Integration tests failing (often ESM model loading)
Fix:
# Run integration tests locally
uv run pytest tests/integration/ -v
# Check if ESM model can be downloaded
python -c "from transformers import AutoModel; AutoModel.from_pretrained('facebook/esm1v_t33_650M_UR90S_1')"
# Clear HuggingFace cache if corrupted
rm -rf ~/.cache/huggingface
# Re-run tests
uv run pytest tests/integration/ -v
CI Failure: "pip-audit found vulnerabilities"¶
Symptom: Dependency security audit found CVEs
Fix:
# Check which dependencies are vulnerable
uv run pip-audit
# Update dependencies
uv lock --upgrade
# Test that updates don't break functionality
uv run pytest tests/
# Commit updated lock file
git add uv.lock
git commit -m "chore: Update dependencies (security fix)"
CI Failure: "Docker build failed"¶
Symptom: Docker container build fails
Fix:
# Build locally to see full error
docker-compose build dev
# Common issues:
# - Missing files referenced in COPY
# - Base image not found
# - Dependency installation fails
# Check Dockerfile syntax
docker build -f Dockerfile.dev -t test-build .
# Fix issues and rebuild
docker-compose build dev
CI Timeout: "Job exceeded time limit"¶
Symptom: CI job times out (>10 min for quality, >15 min for integration)
Possible causes: - ESM model download very slow - Large dataset causing long test runtime - Infinite loop in code
Fix:
# Check if ESM model is cached properly
# (CI should cache ~/.cache/huggingface)
# Check test runtime locally
uv run pytest tests/ --durations=10
# Identify slow tests and optimize or mark as slow
# Mark slow tests:
# @pytest.mark.slow
# def test_expensive_operation():
# ...
Monitoring & Maintenance¶
GitHub Actions Minutes¶
Free tier: 2000 minutes/month
Current usage estimate: - CI per PR: ~20 min × 20 PR/month = 400 min - Weekly benchmarks: 120 min × 4 = 480 min - Weekly deps: 3 min × 7 = 21 min - Total: ~900 min/month ✅ 45% of limit
Monitor usage: 1. Go to: Settings → Billing and plans 2. Check Actions usage 3. Alert if approaching 80% of limit
Weekly Maintenance¶
Every Monday: - [ ] Review dependency update PRs - [ ] Check security audit results
Every Sunday: - [ ] Review E2E benchmark results - [ ] Verify Novo parity still holds
Monthly Maintenance¶
- Review GitHub Actions minutes usage
- Check for workflow failures
- Rotate secrets if needed
Advanced Topics¶
Adding New Quality Gate¶
Example: Add pytest-xdist for parallel tests
-
Add to workflow:
-
Update requirements:
-
Test locally:
-
Commit workflow change
Modifying Coverage Threshold¶
Current: ≥70%
To increase to 75%:
-
Edit
.github/workflows/ci.ymlline 115: -
Ensure current coverage meets new threshold:
-
Commit change
Running Benchmarks Manually¶
# Via GitHub UI:
# Actions → E2E Benchmarking & Novo Parity → Run workflow
# Select branch: main
# Run Harvey: false (for quick test)
# Via gh CLI:
gh workflow run benchmark.yml -f run_harvey=false
Best Practices¶
Before Opening PR¶
- ✅ Run
make alllocally - ✅ Verify all tests pass
- ✅ Check coverage didn't drop
- ✅ Run bandit security scan
- ✅ Commit formatting/lint fixes
During PR Review¶
- ✅ Wait for all CI checks to pass
- ✅ Address reviewer comments
- ✅ Resolve all conversations
- ✅ Rebase if main has new commits
After PR Merge¶
- ✅ Delete branch
- ✅ Monitor main branch CI
- ✅ Check weekly benchmarks still pass
Last Updated: 2025-11-28
Branch: main