Docker¶
Target Audience: Developers using Docker for development and deployment
Purpose: Run the pipeline in reproducible, portable containers for local development, testing, and deployment
When to Use This Guide¶
Use this guide if you're:
- ✅ Setting up development environment (first-time setup with Docker)
- ✅ Running tests in containers (isolated environment)
- ✅ Deploying to production (HuggingFace Spaces, cloud platforms)
- ✅ Ensuring reproducibility (lock down Python version, dependencies, models)
- ✅ Troubleshooting Docker issues (build failures, performance)
Related Documentation¶
- Workflow: Development Workflow - Non-Docker development commands
- Architecture: Architecture - System design
- Security: Security Guide - Security best practices
Why Docker?¶
Research Reproducibility¶
Problem: Scientific results must be reproducible years later on different hardware/OS.
Solution: Docker locks down:
- Python version (3.12 matches pyproject.toml)
- All dependencies with exact versions (via uv.lock)
- System libraries from the base image (plus CUDA runtime libraries if needed; the GPU driver itself stays on the host)
- Model weights (ESM-1v checkpoint)
Collaboration¶
Problem: New contributors spend hours debugging environment setup.
Solution: docker-compose up gets them running in <5 minutes.
Deployment Ready¶
Problem: Need to deploy to platforms like HuggingFace Spaces or cloud services.
Solution: Pre-built Docker image deploys directly without modification.
Clean Environment Validation¶
Problem: Need to prove package works without local hacks (sys.path, editable installs).
Solution: Docker builds from scratch, installs package via uv sync, validates tests pass.
Container Types¶
Development Container¶
Image: antibody-training-dev:latest
Purpose: Local development, testing, debugging
Features:
- Installs package in editable mode (uv sync)
- Mounts local source code as volume (hot reload)
- No model weights cached (downloads on first run)
- Smaller image size (~1.5GB)
Use cases:
- Running tests locally
- Interactive development
- Training on small datasets
Production Container¶
Image: antibody-training-prod:latest
Purpose: Deployment, published results, long-term archival
Features:
- Installs package from built wheel (non-editable)
- Bakes in ESM model weights (650M parameters, about 2.6GB as float32)
- Frozen dependency versions from uv.lock
- Larger image size (~3-4GB)
Use cases:
- Reproducing paper results
- Deploying to HuggingFace Spaces
- CI/CD for releases
- Long-term archival
Quick Start (Development)¶
Prerequisites¶
Install Docker Desktop:
- macOS/Windows: Download from https://www.docker.com/products/docker-desktop
- Linux: Install via package manager (apt install docker.io docker-compose)
Verify installation:
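The two checks here are `docker --version` and `docker-compose --version`. As a small sketch, the hypothetical helper `check_tool` (not part of the project) wraps them so that a missing tool produces a readable message instead of a bare shell error:

```shell
# Print a tool's version, or a readable message if it is not installed.
# check_tool is a hypothetical helper name used only in this sketch.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    "$1" --version
  else
    echo "$1: not found on PATH"
    return 1
  fi
}
```

Run `check_tool docker` and `check_tool docker-compose`. On installs that only ship Compose v2, `docker compose version` is the equivalent check.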
Build Development Container¶
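A minimal sketch of the build step, assuming the Compose service is named `dev` as in the other commands in this guide. The `COMPOSE` variable and the `build_dev` helper are assumptions of this sketch, not part of the project; the variable exists because newer Docker installs ship Compose as `docker compose` rather than `docker-compose`.

```shell
# Compose command; set COMPOSE="docker compose" on Compose v2 installs.
COMPOSE="${COMPOSE:-docker-compose}"

# Build the image for the dev service defined in docker-compose.yml.
# Extra arguments (for example --no-cache) are passed through to the build.
build_dev() {
  $COMPOSE build "$@" dev
}
```

Calling `build_dev` runs the plain build, and `build_dev --no-cache` forces a rebuild of every layer.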
This will:
1. Download python:3.12-slim base image (~50MB)
2. Install uv package manager
3. Install all dependencies via uv sync (~200MB)
4. Copy source code
5. Run test suite (validates build)
6. Cache everything for fast rebuilds
Run Tests¶
# Run full test suite
docker-compose run dev pytest tests/
# Run only unit tests (faster)
docker-compose run dev pytest tests/unit/
# Run with coverage
docker-compose run dev pytest tests/ --cov=src/antibody_training_esm --cov-report=term
Interactive Development¶
# Drop into bash shell
docker-compose run dev bash
# Inside container, you have access to:
pytest tests/ # Run tests
antibody-train --help # Training CLI
antibody-test --help # Testing CLI
antibody-preprocess --help # Preprocessing CLI
Hot Reload Development¶
The container mounts ./src and ./tests as volumes, so code changes on your host machine are immediately reflected in the container:
# Terminal 1: Run container
docker-compose run dev bash
# Terminal 2: Edit code on host machine
# Changes are instantly available in container
Common Workflows¶
Train Model¶
# Use default Hydra config
docker-compose run dev antibody-train
# OR override parameters (e.g., increase batch size from default 8)
docker-compose run dev antibody-train \
hardware.device=cpu training.batch_size=16
Test Trained Model¶
docker-compose run dev antibody-test \
--model experiments/checkpoints/esm1v/logreg/boughter_vh_esm1v_logreg.pkl \
--data data/test/jain/fragments/VH_only_jain.csv
Run Preprocessing¶
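The arguments of `antibody-preprocess` are not listed in this guide, so a safe first step is to ask the CLI for its options from inside the dev container. A minimal sketch, assuming the `dev` service used above; `COMPOSE` and `run_in_dev` are hypothetical names introduced only for this example:

```shell
# Compose command; set COMPOSE="docker compose" on Compose v2 installs.
COMPOSE="${COMPOSE:-docker-compose}"

# Run any command in a throwaway dev container (--rm removes it afterwards).
run_in_dev() {
  $COMPOSE run --rm dev "$@"
}
```

With this in place, `run_in_dev antibody-preprocess --help` prints the available preprocessing options.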
Run Code Quality Checks¶
# Ruff linting
docker-compose run dev ruff check src/ tests/
# Ruff formatting
docker-compose run dev ruff format src/ tests/
# Mypy type checking
docker-compose run dev mypy src/
Production Deployment¶
Build Production Container¶
Create Dockerfile.prod:
FROM python:3.12-slim
# Install system dependencies (if needed)
RUN apt-get update && apt-get install -y \
build-essential \
&& rm -rf /var/lib/apt/lists/*
# Install uv
RUN pip install uv
WORKDIR /app
# Copy locked dependencies
COPY pyproject.toml uv.lock ./
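# Install exact dependency versions (frozen); the project itself is installed after src/ is copied
RUN uv sync --frozen --no-install-project
# Export .venv/bin to PATH
ENV PATH="/app/.venv/bin:$PATH"
# Copy source code
COPY src/ ./src/
# Install the project itself as a regular (non-editable) package
RUN uv sync --frozen --no-editable
# Pre-download ESM model weights (650M parameters, about 2.6GB as float32)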
ENV HF_HOME=/app/.cache/huggingface
RUN python -c "from transformers import AutoModel, AutoTokenizer; \
AutoModel.from_pretrained('facebook/esm1v_t33_650M_UR90S_1'); \
AutoTokenizer.from_pretrained('facebook/esm1v_t33_650M_UR90S_1')"
# Set entrypoint
ENTRYPOINT ["antibody-train"]
CMD ["--help"]
Build:
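A minimal sketch of the build command, assuming `Dockerfile.prod` sits in the repository root and that the image is tagged `antibody-training-prod:1.0`, the tag used by the run examples in this guide. `build_prod` is a hypothetical helper name.

```shell
# Build the production image from Dockerfile.prod in the current directory.
# $1: image tag (defaults to 1.0, the tag used in the run examples).
build_prod() {
  docker build -f Dockerfile.prod -t "antibody-training-prod:${1:-1.0}" .
}
```

`build_prod` produces `antibody-training-prod:1.0`; passing a different tag, such as `build_prod 1.1`, keeps older images available for comparison.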
Run Production Container¶
# Print CLI help (the image's default CMD is --help)
docker run antibody-training-prod:1.0
# Run training with parameter overrides (any arguments replace the default --help)
docker run -v $(pwd)/data:/app/data antibody-training-prod:1.0 \
hardware.device=cpu
# Test model (override the antibody-train entrypoint with antibody-test)
docker run -v $(pwd)/data:/app/data --entrypoint antibody-test antibody-training-prod:1.0 \
--model /app/data/boughter_vh_esm1v_logreg.pkl \
--data /app/data/test/jain/fragments/VH_only_jain.csv
Deploy to HuggingFace Spaces¶
# Tag for HuggingFace Container Registry
docker tag antibody-training-prod:1.0 \
registry.huggingface.co/USERNAME/antibody-training:latest
# Push to HuggingFace
docker push registry.huggingface.co/USERNAME/antibody-training:latest
CI/CD Integration¶
GitHub Actions Example¶
.github/workflows/docker-test.yml:
name: Docker Test Suite
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # GitHub-hosted Ubuntu runners provide Compose v2, invoked as "docker compose"
      - name: Build dev container
        run: docker compose build dev
      - name: Run tests
        run: docker compose run dev pytest tests/ --cov=src/antibody_training_esm --cov-report=xml
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml
Push to GitHub Container Registry¶
# Tag for GHCR
docker tag antibody-training-prod:1.0 ghcr.io/USERNAME/antibody-training:1.0
docker tag antibody-training-prod:1.0 ghcr.io/USERNAME/antibody-training:latest
# Push to GHCR
docker push ghcr.io/USERNAME/antibody-training:1.0
docker push ghcr.io/USERNAME/antibody-training:latest
Troubleshooting¶
Build Fails: "Cannot connect to Docker daemon"¶
Problem: Docker Desktop isn't running.
Solution: Start Docker Desktop app and wait for it to fully start (green icon).
Build Fails: "uv sync" errors¶
Problem: Dependency resolution issues or corrupted uv.lock.
Solution:
# On host machine, regenerate lock file
uv lock --upgrade
# Rebuild container
docker-compose build dev --no-cache
Tests Fail During Build¶
Problem: Code changes broke tests.
Solution:
# Run tests locally first
uv run pytest tests/
# Fix failing tests, then rebuild
docker-compose build dev
Container Runs Out of Space¶
Problem: Docker images and volumes fill up disk.
Solution:
# Remove old images
docker image prune -a
# Remove all stopped containers
docker container prune
# Remove unused volumes
docker volume prune
"Module not found" errors in container¶
Problem: PATH doesn't include .venv/bin.
Solution: This should be automatic via ENV PATH="/app/.venv/bin:$PATH" in Dockerfile. If not working:
# Inside container, manually check PATH
echo $PATH
# Should include: /app/.venv/bin
# If missing, export manually
export PATH="/app/.venv/bin:$PATH"
Slow first-time model download¶
Problem: ESM model weights (650M parameters, about 2.6GB as float32) download on first inference.
Solution: Use production container with pre-cached weights (see "Build Production Container" above).
Best Practices¶
1. Cache Model Weights for Production¶
Avoid re-downloading the ESM model weights (650M parameters, about 2.6GB as float32) in every new container:
# In Dockerfile.prod
ENV HF_HOME=/app/.cache/huggingface
RUN python -c "from transformers import AutoModel, AutoTokenizer; \
AutoModel.from_pretrained('facebook/esm1v_t33_650M_UR90S_1'); \
AutoTokenizer.from_pretrained('facebook/esm1v_t33_650M_UR90S_1')"
Or use a named volume:
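A minimal sketch of the named-volume approach, assuming an image whose `HF_HOME` is `/app/.cache/huggingface` as in this guide. The volume name `hf-cache`, the `IMAGE` and `HF_CACHE_VOLUME` variables, and the `run_with_model_cache` helper are all assumptions made for illustration.

```shell
# Image to run and the named volume that holds the Hugging Face cache.
IMAGE="${IMAGE:-antibody-training-prod:1.0}"
HF_CACHE_VOLUME="${HF_CACHE_VOLUME:-hf-cache}"

# Run the image with the cache directory backed by the named volume.
# Docker creates the volume on first use, so weights are downloaded once
# and reused by every later container that mounts the same volume.
run_with_model_cache() {
  docker run --rm \
    -e HF_HOME=/app/.cache/huggingface \
    -v "${HF_CACHE_VOLUME}:/app/.cache/huggingface" \
    "$IMAGE" "$@"
}
```

Arguments are passed to the image's entrypoint, for example `run_with_model_cache hardware.device=cpu`. Unlike baking the weights into the image, the volume lives on one machine and has to be recreated elsewhere.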
2. Use BuildKit for Faster Builds¶
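BuildKit is the default builder in recent Docker Engine releases; on older installs it can be enabled per command through the `DOCKER_BUILDKIT` environment variable. A minimal sketch, where `build_with_buildkit` is a hypothetical helper name:

```shell
# Run docker build with BuildKit enabled for this invocation only.
# All arguments are passed straight through to docker build.
build_with_buildkit() {
  DOCKER_BUILDKIT=1 docker build "$@"
}
```

For example, `build_with_buildkit -f Dockerfile.prod -t antibody-training-prod:1.0 .` builds the production image with BuildKit's parallel, cache-aware executor.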
3. Multi-Stage Builds for Smaller Images¶
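# Build stage
FROM python:3.12-slim AS builder
RUN pip install uv
WORKDIR /app
COPY pyproject.toml uv.lock ./
# Install dependencies first so this layer is cached across source changes
RUN uv sync --frozen --no-install-project
COPY src/ ./src/
# Install the project itself into the virtual environment (non-editable)
RUN uv sync --frozen --no-editable
# Runtime stage: only the finished virtual environment is copied over
FROM python:3.12-slim
COPY --from=builder /app/.venv /app/.venv
ENV PATH="/app/.venv/bin:$PATH"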
4. Pin Base Image Versions¶
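Pinning means referencing the base image by an exact version tag or, more strictly, by content digest, so that rebuilds do not silently pick up a changed `python:3.12-slim`. A minimal sketch for looking up the digest of an image that has already been pulled; `image_digest` is a hypothetical helper name:

```shell
# Print the repository digest of a locally available image.
# $1: image reference (defaults to the base image used in this guide).
image_digest() {
  docker image inspect --format '{{index .RepoDigests 0}}' "${1:-python:3.12-slim}"
}
```

The output has the form `python@sha256:<digest>` and can be used directly in a `FROM` line. Digest pins trade automatic security updates for reproducibility, so they need to be refreshed deliberately.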
5. Regular Security Scans¶
# Scan image for vulnerabilities (docker scout replaces the removed docker scan command)
docker scout cves antibody-training-prod:1.0
# Or use GitHub Dependabot for automated scans
6. Never Bake Secrets into Images¶
# ❌ WRONG: Secret in Dockerfile
ENV HF_TOKEN=hf_xxxxxxxxxxxxx
# ✅ RIGHT: Secret via environment variable
docker run -e HF_TOKEN=$HF_TOKEN antibody-training-prod:1.0
Container Architecture¶
What's in the Container?¶
/app/
├── .venv/ # Virtual environment (from uv sync)
│ ├── bin/ # Installed commands (pytest, antibody-train, etc.)
│ └── lib/python3.12/ # Installed packages
├── src/ # Source code (mounted from host in dev)
│ └── antibody_training_esm/
├── tests/ # Tests (mounted from host in dev)
├── data/ # Data directory (mounted from host)
├── pyproject.toml # Project metadata
└── uv.lock # Locked dependencies
Environment Variables¶
| Variable | Value | Purpose |
|---|---|---|
| PATH | /app/.venv/bin:$PATH | Makes installed commands available |
| PYTHONUNBUFFERED | 1 | Force Python to print output immediately |
| HF_HOME | /app/.cache/huggingface | Cache HuggingFace model downloads |
Volume Mounts (Development)¶
| Host Path | Container Path | Purpose |
|---|---|---|
| ./src | /app/src | Hot reload source code |
| ./tests | /app/tests | Hot reload tests |
| ./data | /app/data | Persist trained models |
Performance Tips¶
Build Performance¶
Optimize layer caching:
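# Copy dependency files first (changes less frequently)
COPY pyproject.toml uv.lock ./
# Install dependencies only, so this layer survives source changes
RUN uv sync --no-install-project
# Copy source code last (changes more frequently), then install the project itself
COPY src/ ./src/
RUN uv sync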
Runtime Performance¶
Use a named volume for the model cache so that downloaded weights persist across container runs.
Limit memory/CPU if needed:
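A minimal sketch using the `--memory` and `--cpus` options of `docker run`. The limits of 8g and 4 CPUs are placeholder values, and `MEM_LIMIT`, `CPU_LIMIT`, and `run_limited` are hypothetical names chosen for this example.

```shell
# Run the production image with capped memory and CPU.
# MEM_LIMIT and CPU_LIMIT override the placeholder defaults.
run_limited() {
  docker run --rm \
    --memory "${MEM_LIMIT:-8g}" \
    --cpus "${CPU_LIMIT:-4}" \
    -v "$(pwd)/data:/app/data" \
    antibody-training-prod:1.0 "$@"
}
```

For example, `MEM_LIMIT=4g CPU_LIMIT=2 run_limited hardware.device=cpu` trains on CPU inside a 4GB, two-core budget. A container that exceeds its memory limit is killed, so the limit should leave headroom for the model weights.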
Cleanup¶
Stop All Containers¶
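A minimal sketch, assuming the containers were started through the project's Compose file. `COMPOSE` and `stop_dev` are hypothetical names introduced for this example.

```shell
# Compose command; set COMPOSE="docker compose" on Compose v2 installs.
COMPOSE="${COMPOSE:-docker-compose}"

# Stop and remove the containers and networks created by this Compose project.
stop_dev() {
  $COMPOSE down "$@"
}
```

`stop_dev` leaves images and named volumes in place; `stop_dev --volumes` also removes the volumes declared in the Compose file.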
Remove Development Image¶
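A minimal sketch, assuming the development image carries the name given earlier in this guide, `antibody-training-dev:latest`. `remove_dev_image` is a hypothetical helper name.

```shell
# Remove an image by name.
# $1: image reference (defaults to the development image from this guide).
remove_dev_image() {
  docker rmi "${1:-antibody-training-dev:latest}"
}
```

Docker refuses to remove an image that a container still uses, so stop and remove those containers first. The next dev build recreates the image, reusing any cached layers that remain.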
Full Cleanup (Nuclear Option)¶
# Remove all stopped containers, unused networks, unused images, and unused volumes
docker system prune -a --volumes
# WARNING: This affects every Docker project on your machine, not just this one!
FAQ¶
Q: Can I use this for production deployments?¶
A: Yes, use Dockerfile.prod for production deployments with:
- Frozen dependencies (uv sync --frozen)
- Pre-cached ESM model weights
- Non-editable package install
Q: Why does the first build take so long?¶
A: The first build:
1. Downloads Python 3.12 base image (~50MB)
2. Installs uv (~10MB)
3. Installs all dependencies (~200MB, including PyTorch)
4. Runs full test suite (validates build)
Subsequent builds are MUCH faster due to Docker layer caching.
Q: Can I use this on Windows?¶
A: Yes, install Docker Desktop for Windows from https://www.docker.com/products/docker-desktop
Q: Do I need to install Python on my host machine?¶
A: No! Docker provides a completely isolated Python 3.12 environment. You only need Docker Desktop.
Q: How do I update dependencies?¶
A: Update uv.lock on the host (for example with uv lock --upgrade), then rebuild the dev image with docker-compose build dev.
Q: Can I use GPU acceleration in containers?¶
A: Yes, with NVIDIA Docker runtime:
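A minimal sketch using the `--gpus` option of `docker run`, which requires the NVIDIA Container Toolkit on the host. `run_gpu` is a hypothetical helper name, and the override `hardware.device=cuda` in the usage note is an assumption modeled on the `hardware.device=cpu` override shown earlier.

```shell
# Run the production image with all host GPUs exposed to the container.
# Arguments are passed to the image's entrypoint.
run_gpu() {
  docker run --rm --gpus all \
    -v "$(pwd)/data:/app/data" \
    antibody-training-prod:1.0 "$@"
}
```

For example, `run_gpu hardware.device=cuda` would start training on the GPU if the project accepts that device value. The image also needs a CUDA-enabled PyTorch build for the GPU to be used.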
Last Updated: 2025-11-28
Branch: main