
CI/CD

Target Audience: Developers who need to understand or modify the CI pipelines

Purpose: Understand and work with CI/CD workflows, quality gates, and branch protection


When to Use This Guide

Use this guide if you're:

  • ✅ Understanding the CI pipeline (what runs on PRs, why builds fail)
  • ✅ Fixing CI failures (quality gates, test failures, security scans)
  • ✅ Modifying quality gates (adding checks, changing thresholds)
  • ✅ Setting up branch protection (required checks, merge requirements)
  • ✅ Running CI locally (validate before pushing)



Workflow Overview

Active Workflows

The repository has 5 CI/CD workflows in .github/workflows/:

| Workflow | File | Triggers | Duration | Purpose |
|---|---|---|---|---|
| CI Pipeline | ci.yml | Push, PR to dev/main | ~20 min | Quality gates + tests |
| Docker CI | docker-ci.yml | Push, PR | ~10 min | Verify Docker builds |
| E2E Benchmarks | benchmark.yml | Weekly (Sun 2am), Manual | ~120 min | Novo parity validation |
| Security | codeql.yml | Push to main, Weekly | ~5 min | CodeQL SAST scanning |
| Dependencies | dependencies.yml | Push, PR, Daily | ~3 min | Dependency security audit |

Trigger Summary

Every PR:

  • ✅ ci.yml - Quality gates, unit tests, integration tests, security
  • ✅ docker-ci.yml - Docker build verification

Scheduled:

  • ✅ benchmark.yml - Full E2E suite (Sun 2am UTC, weekly)
  • ✅ codeql.yml - CodeQL security scan (weekly)
  • ✅ dependencies.yml - Dependency update check (daily)

Manual:

  • ✅ benchmark.yml - Run E2E benchmarks on demand
  • ✅ dependencies.yml - Check for dependency updates
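
These triggers map onto the standard GitHub Actions on: block. As a composite illustration only (no single workflow uses all of these at once; the exact branch names and cron strings live in the individual workflow files):

on:
  push:
    branches: [dev, main]
  pull_request:
    branches: [dev, main]
  schedule:
    - cron: "0 2 * * 0"   # weekly: Sundays at 02:00 UTC (benchmark-style schedule)
  workflow_dispatch:      # enables manual runs from the Actions tab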


Quality Gate Workflow (ci.yml)

Overview

Primary CI pipeline that runs on every push and PR to dev or main.

Total runtime: ~20 minutes

5 jobs (the first four run in parallel; ci-success waits on them):

  1. quality - Code quality gates (ruff, mypy, bandit)
  2. test-unit - Unit tests with coverage
  3. test-integration - Integration tests
  4. security - Dependency security audit
  5. ci-success - Summary job (requires all others to pass)
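
The summary job is essentially a needs: fan-in over the other four. A minimal sketch, assuming the job IDs mirror the list above (the real ci.yml may differ in details):

ci-success:
  name: CI Pipeline Success
  runs-on: ubuntu-latest
  needs: [quality, test-unit, test-integration, security]
  if: always()    # run even when an upstream job fails, so the result is reported
  steps:
    - name: Check upstream job results
      run: |
        # Fail the summary if any required job did not succeed
        if [ "${{ contains(needs.*.result, 'failure') }}" = "true" ] || \
           [ "${{ contains(needs.*.result, 'cancelled') }}" = "true" ]; then
          echo "One or more CI jobs failed"
          exit 1
        fi
        echo "All CI jobs passed"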

Job 1: Code Quality

Duration: ~3-5 minutes

Steps:

- Ruff lint (uv run ruff check .)
- Ruff format check (uv run ruff format --check .)
- Mypy type checking (uv run mypy src/ --strict)
- Bandit security scan (uv run bandit -r src/)

What it checks:

  • Code follows style guidelines (ruff)
  • No formatting issues (ruff format)
  • 100% type coverage (mypy strict mode)
  • No security vulnerabilities in code (bandit)

If it fails:

  • Run make format to fix formatting
  • Run make lint to see linting errors
  • Run make typecheck to see type errors
  • Check bandit output for security issues

Job 2: Unit Tests

Duration: ~5-8 minutes

Steps:

- Run unit tests (pytest tests/unit/)
- Generate coverage report (--cov=src/antibody_training_esm)
- Upload coverage to Codecov
- Enforce coverage threshold (≥70%)

Coverage requirement: ≥70% (currently at 90.80%)
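
A hedged sketch of how that gate can be expressed as workflow steps (standard pytest-cov and Codecov action usage; the exact steps in ci.yml may differ):

- name: Run unit tests with coverage
  run: |
    uv run pytest tests/unit/ \
      --cov=src/antibody_training_esm \
      --cov-report=xml \
      --cov-report=term
    uv run coverage report --fail-under=70
- name: Upload coverage to Codecov
  uses: codecov/codecov-action@v4
  with:
    files: coverage.xml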

What it checks:

  • All unit tests pass
  • No test failures or errors
  • Coverage doesn't drop below 70%

If it fails:

  • Run uv run pytest tests/unit/ -v locally
  • Check which tests failed
  • Fix failing tests
  • If coverage drops, add tests for uncovered code

Job 3: Integration Tests

Duration: ~8-10 minutes

Steps:

- Cache ESM-1v model (~2GB; see the cache step sketch after this list)
- Run integration tests (pytest tests/integration/)
- Upload test results
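
A hedged sketch of what the model-cache step can look like with actions/cache (the exact cache key and path used in ci.yml may differ):

- name: Cache ESM-1v model
  uses: actions/cache@v4
  with:
    path: ~/.cache/huggingface
    key: esm1v-${{ runner.os }}-${{ hashFiles('uv.lock') }}
    restore-keys: |
      esm1v-${{ runner.os }}-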

What it checks:

  • Multi-component interactions work
  • ESM model loading works
  • Dataset loaders work
  • End-to-end data flow works

If it fails:

  • Run uv run pytest tests/integration/ -v locally
  • Check integration test failures
  • Verify ESM model can be loaded
  • Check dataset files exist

Job 4: Security

Duration: ~3-5 minutes

Steps:

- Install security tools (pip-audit, safety)
- Run pip-audit (check for CVEs in dependencies)
- Run safety scan (PyPI vulnerability database)
- Upload security reports
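
The pip-audit.json artifact mentioned below is typically produced and uploaded with steps along these lines (a sketch; step names and flags in the real workflow may vary):

- name: Run pip-audit
  run: uv run pip-audit --format json --output pip-audit.json
- name: Upload security report
  if: always()    # keep the report even when vulnerabilities fail the step above
  uses: actions/upload-artifact@v4
  with:
    name: pip-audit-report
    path: pip-audit.json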

What it checks:

  • No HIGH/CRITICAL CVEs in dependencies
  • Dependencies are up-to-date
  • No known security vulnerabilities

If it fails:

  • Check pip-audit.json artifact
  • Update vulnerable dependencies: uv lock --upgrade
  • Re-run tests to ensure updates don't break anything

Job 5: CI Success Summary

Duration: ~10 seconds

What it does:

  • Waits for all 4 jobs to complete
  • Fails if any job failed
  • Posts summary to PR

This is the required check for branch protection (ensures all gates passed).


Docker CI Workflow (docker-ci.yml)

Overview

Verifies Docker builds work without loading images (disk space constraints).

2 jobs:

  1. test-dev - Build development container
  2. test-prod - Build production container (includes model weights)

Runtime: ~10 minutes

What it checks

  • Dockerfile.dev builds successfully
  • Dockerfile.prod builds successfully (if exists)
  • No Docker build errors
  • Layers cached for faster rebuilds

GHCR Publishing

Only on main branch:

  • Tags images with commit SHA and latest
  • Pushes to GitHub Container Registry (ghcr.io)
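
A sketch of the kind of publish steps involved (the local image name here is a placeholder; the actual workflow may use a build-push action instead of raw docker commands):

- name: Log in to GHCR
  if: github.ref == 'refs/heads/main'
  uses: docker/login-action@v3
  with:
    registry: ghcr.io
    username: ${{ github.actor }}
    password: ${{ secrets.GITHUB_TOKEN }}
- name: Tag and push image
  if: github.ref == 'refs/heads/main'
  run: |
    # "local-image" is a placeholder for whatever tag the build step produced
    docker tag local-image ghcr.io/${{ github.repository }}:${{ github.sha }}
    docker tag local-image ghcr.io/${{ github.repository }}:latest
    docker push ghcr.io/${{ github.repository }}:${{ github.sha }}
    docker push ghcr.io/${{ github.repository }}:latest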

If it fails

  • Check Dockerfile syntax
  • Ensure base images are accessible
  • Verify COPY paths exist
  • Check Docker build logs in CI

Security Workflows

CodeQL (codeql.yml)

Purpose: Static Application Security Testing (SAST) for Python code

Triggers:

  • Push to main branches
  • Pull requests
  • Weekly schedule

Duration: ~5 minutes

What it checks:

  • SQL injection vulnerabilities
  • Command injection
  • Path traversal
  • Hard-coded credentials
  • Other OWASP Top 10 issues
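
CodeQL scanning follows a standard init/analyze pattern; a minimal sketch of such a job (the repository's codeql.yml may add language matrices or extra query packs):

analyze:
  runs-on: ubuntu-latest
  permissions:
    security-events: write    # required to upload results to the Security tab
  steps:
    - uses: actions/checkout@v4
    - uses: github/codeql-action/init@v3
      with:
        languages: python
    - uses: github/codeql-action/analyze@v3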

If it fails:

  • Review CodeQL alerts in GitHub Security tab
  • Fix identified vulnerabilities
  • Re-run scan

Dependencies (dependencies.yml)

Purpose: Scan dependencies for known vulnerabilities

Triggers:

  • Push to any branch
  • Pull requests
  • Daily schedule (6am UTC)

Duration: ~3 minutes

What it checks:

  • Known CVEs in dependencies (pip-audit)
  • Outdated packages with security fixes

If it fails:

  • Check pip-audit.json artifact
  • Update vulnerable dependencies: uv lock --upgrade
  • Test that updates don't break functionality


Benchmark Workflow (benchmark.yml)

Overview

Full E2E testing and Novo Nordisk parity validation

Triggers:

  • Weekly schedule (Sunday 2am UTC)
  • Manual workflow dispatch
  • Release tags (v*)

Duration: ~120 minutes (full suite with Harvey)

What it validates

  1. Boughter (training set):
     • 10-fold CV accuracy: 67-71%
     • Proper stratification
     • No data leakage

  2. Jain (test set):
     • Confusion matrix: [[40,17],[10,19]] (EXACT NOVO MATCH)
     • Accuracy: 68.60%
     • ELISA threshold: 0.5

  3. Shehata (PSR test set):
     • Accuracy: 58.29% (auto PSR threshold: 0.5495)

  4. Harvey (nanobody test set):
     • Accuracy: 61.33% (vs Novo 61.7% = -0.37pp)
     • Status: Near-parity (our best gap across all datasets)
     • 141k sequences (~90 minutes)

Why weekly, not every PR?

  • Harvey alone takes 90 minutes
  • Too expensive to run on every PR
  • Weekly validation catches regressions

If parity fails

  • Check benchmark artifacts (confusion matrices)
  • Compare with archived parity results (archive branch: experiments/benchmarks/novo_parity/)
  • Investigate changes since last passing run
  • Issue created automatically on failure (see the sketch below)
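
Automatic issue creation on failure is usually a failure-gated step; a hedged sketch of that pattern (the title, labels, and tooling in the actual benchmark.yml may differ):

- name: Open issue on parity failure
  if: failure()
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
    gh issue create \
      --title "Weekly benchmark failed: Novo parity regression" \
      --body "See run ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}" \
      --label ci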

Branch Protection

Protected Branches

Main branches with protection:

  • main (default branch)

Required Checks

Before merging to protected branches, PRs must pass:

  1. Code Quality (ruff, mypy, bandit) - ci.yml quality job
  2. Unit Tests (Python 3.12) - ci.yml test-unit job
  3. Integration Tests (Python 3.12) - ci.yml test-integration job
  4. Dependency Security Audit - ci.yml security job
  5. CI Pipeline Success - ci.yml ci-success job
  6. Test Development Container - docker-ci.yml test-dev job
  7. Test Production Container - docker-ci.yml test-prod job (if exists)

Merge Requirements

  • 1 approval required - At least one reviewer must approve
  • Dismiss stale approvals - Re-approval needed after new commits
  • Conversation resolution - All review comments must be resolved
  • Up-to-date branches - PR must be rebased with latest main
  • No force pushes - Force push blocked on protected branches
  • No deletions - Branch cannot be deleted

Setting Up Branch Protection

First-time setup:

  1. Go to: Settings → Branches
  2. Click "Add branch protection rule"
  3. Configure:

Branch name pattern: main

Enable:

  • ✅ Require a pull request before merging (1 approval)
  • ✅ Dismiss stale pull request approvals when new commits are pushed
  • ✅ Require status checks to pass before merging
  • ✅ Require branches to be up to date before merging
  • ✅ Require conversation resolution before merging
  • ✅ Include administrators

Disable:

  • ❌ Allow force pushes
  • ❌ Allow deletions

  4. Add required status checks:
     • Code Quality (ruff, mypy, bandit)
     • Unit Tests (Python 3.12)
     • Integration Tests (Python 3.12)
     • Dependency Security Audit
     • CI Pipeline Success
     • Test Development Container

  5. Click "Create"

Note: Status checks only appear after CI runs at least once. Save the rule first, then edit after first PR.

Verifying Branch Protection

# Test 1: Try to push directly (should fail)
git checkout main
git push origin main
# Expected: "Protected branch update failed"

# Test 2: Create PR (should require checks)
git checkout -b test/branch-protection
git commit --allow-empty -m "test: Verify protection"
git push origin test/branch-protection
# Create PR on GitHub - should show required checks

Local Testing

Run Full CI Suite Locally

# Run all quality gates
make all
# Equivalent to: make format lint typecheck test (fast suite)

# Individual commands
make format      # Ruff format
make lint        # Ruff lint
make typecheck   # Mypy strict
make test        # Fast suite (unit + integration; skips e2e/slow/gpu)
make test-e2e    # End-to-end suite (honors env flags for heavy tests)
make test-all    # Full pytest (env-gated e2e may still skip without data/flags)
make coverage    # Pytest with coverage report

Run Specific Test Suites

# Unit tests only (fast)
uv run pytest tests/unit/ -v

# Integration tests only
uv run pytest tests/integration/ -v

# E2E tests (expensive, use sparingly)
uv run pytest tests/e2e/ -v

# With coverage
uv run pytest tests/unit tests/integration \
  --cov=src/antibody_training_esm \
  --cov-report=term \
  --cov-fail-under=70

Verify Before Pushing

# Full pre-push validation (recommended)
make all

# Quick validation (format + lint only)
make format lint

# Check if mypy will pass
uv run mypy src/ --strict

# Check if bandit will pass
uv run bandit -r src/

Troubleshooting

CI Failure: "Ruff format check failed"

Symptom: ruff format --check . fails

Fix:

# Auto-fix formatting
make format

# Or manually
uv run ruff format .

# Commit formatting changes
git add .
git commit -m "style: Fix ruff formatting"

CI Failure: "Mypy type checking failed"

Symptom: mypy src/ --strict reports type errors

Fix:

# Check errors locally
uv run mypy src/ --strict

# Fix type annotations in reported files
# Common fixes:
# - Add return type annotations
# - Add parameter type annotations
# - Import types from typing module

# Verify fixed
uv run mypy src/ --strict

CI Failure: "Coverage below threshold"

Symptom: coverage report --fail-under=70 fails

Fix:

# Check current coverage
uv run pytest tests/unit tests/integration \
  --cov=src/antibody_training_esm \
  --cov-report=term-missing

# Identify uncovered lines (look for line numbers)
# Add tests for uncovered code

# Verify coverage increased
uv run coverage report

CI Failure: "Bandit security scan failed"

Symptom: bandit -r src/ reports HIGH/CRITICAL issues

Fix:

# Run bandit locally
uv run bandit -r src/ -v

# Review reported issues
# Common issues:
# - Hard-coded passwords/secrets (move to env vars)
# - Unsafe pickle usage (ensure controlled environment)
# - SQL injection (use parameterized queries)

# Fix issues and verify
uv run bandit -r src/

CI Failure: "Unit tests failed"

Symptom: One or more unit tests failing

Fix:

# Run failing tests locally
uv run pytest tests/unit/ -v

# Run specific failing test
uv run pytest tests/unit/test_file.py::test_function -v

# Debug with print statements
uv run pytest tests/unit/ -v -s

# Drop into debugger on failure
uv run pytest tests/unit/ --pdb

CI Failure: "Integration tests failed"

Symptom: Integration tests failing (often ESM model loading)

Fix:

# Run integration tests locally
uv run pytest tests/integration/ -v

# Check if ESM model can be downloaded
python -c "from transformers import AutoModel; AutoModel.from_pretrained('facebook/esm1v_t33_650M_UR90S_1')"

# Clear HuggingFace cache if corrupted
rm -rf ~/.cache/huggingface

# Re-run tests
uv run pytest tests/integration/ -v

CI Failure: "pip-audit found vulnerabilities"

Symptom: Dependency security audit found CVEs

Fix:

# Check which dependencies are vulnerable
uv run pip-audit

# Update dependencies
uv lock --upgrade

# Test that updates don't break functionality
uv run pytest tests/

# Commit updated lock file
git add uv.lock
git commit -m "chore: Update dependencies (security fix)"

CI Failure: "Docker build failed"

Symptom: Docker container build fails

Fix:

# Build locally to see full error
docker-compose build dev

# Common issues:
# - Missing files referenced in COPY
# - Base image not found
# - Dependency installation fails

# Check Dockerfile syntax
docker build -f Dockerfile.dev -t test-build .

# Fix issues and rebuild
docker-compose build dev

CI Timeout: "Job exceeded time limit"

Symptom: CI job times out (>10 min for quality, >15 min for integration)

Possible causes:

  • ESM model download very slow
  • Large dataset causing long test runtime
  • Infinite loop in code

Fix:

# Check if ESM model is cached properly
# (CI should cache ~/.cache/huggingface)

# Check test runtime locally
uv run pytest tests/ --durations=10

# Identify slow tests and optimize or mark as slow
# Mark slow tests:
# @pytest.mark.slow
# def test_expensive_operation():
#     ...


Monitoring & Maintenance

GitHub Actions Minutes

Free tier: 2000 minutes/month

Current usage estimate:

  • CI per PR: ~20 min × 20 PRs/month = 400 min
  • Weekly benchmarks: 120 min × 4 = 480 min
  • Daily dependency checks: 3 min × 30 ≈ 90 min
  • Total: ~970 min/month ✅ under 50% of the limit

Monitor usage:

  1. Go to: Settings → Billing and plans
  2. Check Actions usage
  3. Alert if approaching 80% of limit

Weekly Maintenance

Every Monday:

  - [ ] Review dependency update PRs
  - [ ] Check security audit results

Every Sunday:

  - [ ] Review E2E benchmark results
  - [ ] Verify Novo parity still holds

Monthly Maintenance

  • Review GitHub Actions minutes usage
  • Check for workflow failures
  • Rotate secrets if needed

Advanced Topics

Adding New Quality Gate

Example: Add pytest-xdist for parallel tests

  1. Add to workflow:

    - name: Run tests in parallel
      run: uv run pytest tests/unit/ -n auto
    

  2. Update requirements:

    uv add --dev pytest-xdist
    

  3. Test locally:

    uv run pytest tests/unit/ -n auto
    

  4. Commit workflow change

Modifying Coverage Threshold

Current: ≥70%

To increase to 75%:

  1. Edit .github/workflows/ci.yml line 115:

    uv run coverage report --fail-under=75
    

  2. Ensure current coverage meets new threshold:

    uv run pytest tests/unit tests/integration --cov=src/antibody_training_esm --cov-report=term
    # Must show ≥75%
    

  3. Commit change

Running Benchmarks Manually

# Via GitHub UI:
# Actions → E2E Benchmarking & Novo Parity → Run workflow
# Select branch: main
# Run Harvey: false (for quick test)

# Via gh CLI:
gh workflow run benchmark.yml -f run_harvey=false

Best Practices

Before Opening PR

  1. ✅ Run make all locally
  2. ✅ Verify all tests pass
  3. ✅ Check coverage didn't drop
  4. ✅ Run bandit security scan
  5. ✅ Commit formatting/lint fixes

During PR Review

  1. ✅ Wait for all CI checks to pass
  2. ✅ Address reviewer comments
  3. ✅ Resolve all conversations
  4. ✅ Rebase if main has new commits

After PR Merge

  1. ✅ Delete branch
  2. ✅ Monitor main branch CI
  3. ✅ Check weekly benchmarks still pass

Last Updated: 2025-11-28 Branch: main