Skip to content

Ralph Wiggum Loop Protocol

Created: 2026-01-18 Author: Ray + Claude Status: Ready for Use


What is the Ralph Wiggum Technique?

The Ralph Wiggum technique (created by Geoffrey Huntley) is an iterative AI development methodology where the same prompt is run repeatedly until objective completion criteria are met. The "self-referential" part is that each iteration sees its previous work in files and git history, not that model output is fed back as input.

Core insight: Fresh context each iteration. No garbage accumulation.

while :; do cat PROMPT.md | claude ; done

Why It Works

  • Same prompt, repeated = Iteration beats one-shot perfection
  • State tracked in files = Progress persists across iterations
  • Atomic commits = Easy to audit, revert, or cherry-pick
  • Sandboxed branch = Safe experimentation
  • Fresh context = No context pollution from failed attempts

Key Quote

"Deterministically bad in an undeterministic world" - failures are predictable, enabling systematic improvement through prompt tuning.


Prerequisites

Tools Required

# Claude Code CLI
npm install -g @anthropic-ai/claude-code

# tmux (for persistent sessions)
brew install tmux  # macOS
apt install tmux   # Linux

# uv (Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh

Project Requirements

  1. State file (PROGRESS.md) - Tracks what's done/pending
  2. Prompt file (PROMPT.md) - Instructions for each iteration
  3. Debt/Bug docs (docs/_debt/, docs/_bugs/) - SSOT for sprint work
  4. Specs (docs/_specs/) only when approved for new feature work
  5. Git repo - For atomic commits and history

Setup Protocol

Step 1: Create Branch Structure

CRITICAL: Always sandbox Ralph work in a dedicated branch.

# Start from dev
git checkout dev
git pull --ff-only origin dev  # Human pre-flight on dev (not on the Ralph branch)

# Create Ralph branch
git checkout -b ralph-wiggum-<sprint>

# Push to remote for backup
git push -u origin ralph-wiggum-<sprint>

Branch hierarchy:

main (protected, production)
  └── dev (integration, manual merges)
        └── ralph-wiggum-<sprint> (autonomous work)

Step 2: File Structure

erdos-banger/
├── PROMPT.md                    # Loop prompt (root)
├── PROGRESS.md                  # State tracker (root)
├── logs/ralph/                  # Per-iteration logs (gitignored; safe to delete between runs)
├── docs/
│   ├── _ralphwiggum/
│   │   └── protocol.md          # This file
│   ├── _debt/                   # Active debt decks (SSOT for debt sprints)
│   ├── _bugs/                   # Active bug decks (SSOT for bug sprints)
│   ├── _specs/                  # Active specs + master docs
│   └── _archive/specs/          # Archived specs (implemented SSOT)

Use the provided script (avoids tee-piping Claude output; auto-finalizes sprint docs on completion):

# Launch in tmux
tmux new-session -s erdos-ralph './scripts/ralph-loop.sh'

# Monitor in another terminal
tail -f logs/ralph/iteration_*.log
watch -n5 'git log --oneline -5'

# Detach without killing: Ctrl+B, then D
# Reattach: tmux attach -t erdos-ralph
# Kill session: tmux kill-session -t erdos-ralph

Environment variables (optional): - MAX=50 - Maximum iterations (default: 50) - ITER_TIMEOUT=1200 - Per-iteration timeout in seconds (default: 600; use a higher value for large tasks)

Step 4: Manual Loop (Alternative)

If you need more control, run manually:

cd /path/to/erdos-banger
git checkout ralph-wiggum-<sprint>

# macOS requires gtimeout from coreutils:
#   brew install coreutils

MAX=50
ITER_TIMEOUT=1200
TIMEOUT=14400

TIMEOUT_CMD="${TIMEOUT_CMD:-}"
if [[ -z "$TIMEOUT_CMD" ]]; then
  if command -v timeout >/dev/null 2>&1; then
    TIMEOUT_CMD="timeout"
  elif command -v gtimeout >/dev/null 2>&1; then
    TIMEOUT_CMD="gtimeout"
  else
    echo "Missing timeout command. Install coreutils (macOS) or use Linux `timeout`." >&2
    exit 1
  fi
fi

rm -rf logs/ralph 2>/dev/null || true
mkdir -p logs/ralph

"$TIMEOUT_CMD" "$TIMEOUT" bash -c '
  for i in $(seq 1 '"$MAX"'); do
    n=$(printf "%03d" "$i")
    log="logs/ralph/iteration_${n}.log"
    echo "=== Iteration $i/'"$MAX"' ===" | tee -a "$log"
    # Avoid piping Claude output through tee (EPIPE risk); append directly (timeouts can still trigger EPIPE).
    '"$TIMEOUT_CMD"' '"$ITER_TIMEOUT"' claude --dangerously-skip-permissions -p "$(cat PROMPT.md)" >> "$log" 2>&1
    # Check if all tasks complete (no unchecked boxes)
    if ! grep -q "^\- \[ \]" PROGRESS.md; then
      echo "All tasks complete!" | tee -a "$log"
      break
    fi
    sleep 2
  done
'

erdos-banger Specific Configuration

Quality Gates (MUST PASS)

uv run pre-commit run --all-files  # All hooks
uv run ruff check .                # Lint
uv run ruff format --check .       # Format check
uv run mypy src/ tests/            # Type check
uv run pytest -m "not requires_lean and not requires_network" --cov=erdos --cov-fail-under=80

Or use the Makefile:

make ci  # Runs format-check, lint, typecheck, cov

Final Validation (Run Before Declaring Sprint Complete)

make ci intentionally skips requires_network tests to keep CI deterministic.

Before declaring an overnight sprint “done” (i.e., when PROGRESS.md has no unchecked items), run the full suite at least once:

make test-all

If any tests fail: - file a bug deck in docs/_bugs/ - add a new unchecked item to PROGRESS.md - do not declare the sprint complete

Test Markers

  • requires_lean - Skip if Lean not installed
  • requires_network - Skip for offline tests
  • Default: -m "not requires_lean and not requires_network"

Optional Extras (MCP)

Some tests are skipped unless optional dependencies are installed. Example: - MCP tests will show as “SKIPPED: mcp not installed” unless you install the mcp extra.

To run MCP tests locally:

uv sync --extra mcp
uv run pytest tests/integration/test_mcp_server.py tests/unit/test_mcp_tools.py

Atomic Commit Format

git add -A && git commit -m "$(cat <<'EOF'
[SPEC-010] Implement: arXiv metadata client

- Added src/erdos/core/arxiv_client.py
- Unit tests in tests/unit/test_arxiv_client.py
- Quality gates passed
EOF
)"

Overnight Safety Guardrails (Non-negotiable)

File Modification Boundaries

Allowed to change: - src/erdos/** - tests/** - formal/lean/** (Lean project sources; never commit build artifacts) - scripts/** - docs/_specs/** (active specs only) - docs/_bugs/**, docs/_debt/** - docs/vendor-docs/** - docs/future/** (design notes; non-normative) - docs/INDEX.md, docs/_specs/README.md (indexes) - PROMPT.md, PROGRESS.md, docs/_ralphwiggum/**

Forbidden to change (treat as immutable SSOT unless a human explicitly authorizes it): - Existing files under docs/_archive/** (write-once). Allowed: add new files when archiving completed work; do not edit after archiving. - data/erdosproblems/** (git submodule)

Dependency manifests: - pyproject.toml and uv.lock may be edited only when a spec explicitly requires it (e.g., when a spec's "Dependencies" section lists a new runtime dependency), and must be followed by uv lock --check and uv sync --frozen.

Resource Limits

  • Max files changed per iteration: 10
  • Max net-new lines per iteration: ~500
  • Max iterations per overnight run: 50 (hard stop)
  • Max wall-clock runtime per run: 4 hours (recommended)
  • Max wall-clock runtime per iteration: 10 minutes (recommended)
  • Max “stuck” retries on the same failing gate: 3
  • Treat long runs as potentially expensive (LLM usage + CI cycles). Use MAX, TIMEOUT, and ITER_TIMEOUT as your primary budget caps.
  • Monitor output under logs/ralph/ and watch git log --oneline -10 to confirm forward progress.
  • If you see repeated failures or no commits for multiple iterations, abort and investigate rather than letting the loop burn time/budget.

No-Reward-Hack Rules

  • Never delete/disable tests to make CI green.
  • Never "mock the unit under test"; mock only boundaries (network/subprocess/time).
  • Never lower coverage thresholds, relax lint rules, or bypass mypy strictness.
  • Never write documentation ABOUT work as a substitute for DOING work. If a task is too big, break it into subtasks in PROGRESS.md AND start the first subtask in the same iteration. Escalation documents (debt docs) are only valid if you've genuinely hit a blocker (missing deps, spec contradiction, repeated gate failures) - not because "it's big."
  • "Too big" is not a blocker. Break it down and keep working. The only valid blockers are: missing dependencies, spec contradictions, and quality gate failures after 3 attempts.

Push Strategy (Remote Backup)

After each atomic commit:

git push

If push fails with "branches have diverged":

git push --force-with-lease

This is safe because the Ralph branch is autonomous - you are the only committer. On the Ralph branch during autonomous execution, NEVER use git rebase or git pull - these cause merge conflicts that derail the loop.

If pushing is blocked (auth/CI outage), stop the loop and request human intervention.


TDD Enforcement (Ironclad)

The TDD Cycle

  1. RED: Write a failing test first
  2. GREEN: Write minimal code to pass
  3. REFACTOR: Clean up, keep tests green

TDD Rules (Non-negotiable)

  1. No production code without a failing test first
  2. One test at a time - Don't batch tests
  3. Minimal implementation - Only enough to pass the current test
  4. Refactor only when green - Never refactor with failing tests
  5. Tests document behavior - Tests are the specification

Spec → Test → Code Flow

docs/_debt/debt-XXX-*.md               # Read the debt deck (or a spec doc)
tests/unit/test_arxiv_client.py        # Write failing test
src/erdos/core/arxiv_client.py         # Implement to pass
make ci                                 # Verify all gates pass
git commit                              # Atomic commit

Critical Review Prompt

Before changing code/docs based on feedback (human reviews, CodeRabbit, another model, your own prior output), apply this block and validate claims against SSOT:

Review the claim or feedback (it may be from an internal or external agent).
Validate every claim from first principles. If—and only if—it's true and
helpful, update the system to align with the SSOT, implemented cleanly and
completely (Rob C. Martin discipline). Find and fix all half-measures,
reward hacks, and partial fixes if they exist. Be critically adversarial
with good intentions for constructive criticism. Ship the exact end-to-end
implementation we need.

SPEC-* Self-Review Protocol

Before picking up a new task, check if the MOST RECENTLY completed item is a SPEC-* WITHOUT a [REVIEWED] marker.

If yes, THIS ITERATION IS A REVIEW ITERATION:

  1. Read the spec doc and its acceptance criteria
  2. Verify the implementation matches ALL acceptance criteria
  3. Apply the Critical Review Prompt to your own prior work
  4. Check for half-measures:
  5. Are there TODO comments that should be resolved?
  6. Do tests cover all acceptance criteria?
  7. Does SSOT (code) match the spec?
  8. If issues found:
  9. Create a fixup task in PROGRESS.md
  10. Mark the original as [NEEDS-FIX] instead of [REVIEWED]
  11. Commit and exit
  12. If verified clean:
  13. Add [REVIEWED] marker to the spec line
  14. Commit and exit (do NOT start next task this iteration)

Stop Conditions

The loop stops when:

  1. All tasks complete - No unchecked [ ] in PROGRESS.md
  2. Iteration limit reached - MAX=50 by default
  3. Manual intervention - Ctrl+C
  4. Escalation trigger - A stop condition from PROMPT.md is hit (ambiguous spec, missing deps, repeated gate failures, etc.)

IMPORTANT: State-based verification prevents reward hacking.

# Good: Check actual state
if ! grep -q "^\- \[ \]" PROGRESS.md; then
  echo "All tasks complete"
  break
fi

# Bad: Parse model output for magic phrases (don't do this)
# if output contains "PROJECT COMPLETE"; then break

Monitoring

Watch Progress

# In another terminal/tmux pane
watch -n 5 'head -50 PROGRESS.md'

# Or check git activity
watch -n 5 'git log --oneline -10'

Check Loop Status

# See recent commits
git log --oneline -20

# See what changed
git diff HEAD~1

# Quick verification
make smoke

Post-Loop Audit

Review All Changes

# See all commits from Ralph
git log dev..ralph-wiggum-<sprint> --oneline

# See full diff
git diff dev..ralph-wiggum-<sprint> --stat

# Review specific commit
git show <commit-hash>

Final Verification

# All quality gates
make ci

# Smoke test
make smoke

# Full test suite (if Lean available)
make test-all

Merge if Good

# Option 1: Direct merge
git checkout dev
git merge ralph-wiggum-<sprint>

# Option 2: PR for review
gh pr create --base dev --head ralph-wiggum-<sprint> \
  --title "Ralph Wiggum v1.1: Implement specs 010-017" \
  --body "Automated implementation via Ralph Wiggum loop"

Revert if Bad

# Nuclear option - delete branch entirely
git checkout dev
git branch -D ralph-wiggum-<sprint>

# Or revert specific commits
git revert <bad-commit-hash>

Best Practices

Research Notes (2025–2026)

These sources inform the guardrails and the “why” behind the protocol:

  • Ralph Wiggum technique: repeated identical prompts, state in files, deterministic iteration loop. https://ghuntley.com/ralph/
  • Field report: running agents in loops overnight; emphasizes frequent commits/pushes and keeping scope bounded. https://github.com/repomirrorhq/repomirror/blob/main/repomirror.md
  • Aider edit formats: search/replace (“diff”) blocks reduce brittle line-number failures compared to unified diffs. https://aider.chat/docs/more/edit-formats.html
  • Aider benchmarks: deterministic harnesses; validate success by passing real unit tests; limit the amount of error output fed back to reduce noise. https://aider.chat/docs/benchmarks.html
  • SWE-agent: interface design (agent-computer interface) strongly impacts autonomous SE performance; a structured ACI improves navigation/edit/test execution. https://arxiv.org/abs/2405.15793
  • Context “rot”: long contexts degrade reliability; keep prompts small and use byte/line caps. https://research.trychroma.com/context-rot
  • Reward hacking: feedback loops need explicit constraints and invariants to prevent gaming metrics. https://arxiv.org/html/2402.06627v3

DO

  • Always sandbox in dedicated branch
  • Use detailed specs for each task
  • Require atomic commits
  • Set clear completion criteria
  • Set iteration limits
  • Monitor periodically
  • Audit before merging
  • Follow TDD strictly
  • Keep prompts and error logs small (avoid context rot): prefer summaries and capped error output
  • Prefer “state-based” completion checks (PROGRESS.md), not “model said done”
  • Push after each commit (remote backup)

DON'T

  • Run on main/dev branch directly
  • Skip the state file
  • Allow multi-task iterations
  • Trust without auditing
  • Use vague task descriptions
  • Run without iteration limits
  • Skip the review iteration for SPEC-* tasks
  • Let the agent mutate docs/_archive/ or data/erdosproblems/ as “fixes”
  • Let the agent change quality gates or coverage thresholds to make CI green

Troubleshooting

Loop Stops Unexpectedly

# Check if Claude is running
ps aux | grep claude

# Check tmux session
tmux list-sessions

# Restart loop in tmux
tmux attach -t erdos-ralph

Sprint Complete but Repo Dirty

If the loop stops at the end of a sprint with a “repo dirty” message:

  1. Inspect the most recent logs/ralph/iteration_*.log (and logs/ralph/recovery.log if present).
  2. If only docs/** and/or PROGRESS.md changed, finalize with a docs-only commit:
git status --porcelain=v1
git add PROGRESS.md docs
git commit -m "docs: finalize sprint state"
git push
  1. If production code/tests are dirty, stop and recover manually (don’t guess).

Quality Gates Failing

  • Loop should auto-fix on next iteration
  • If persistent, check the spec for clarity
  • May need manual intervention

Context Drift

Fresh context each iteration prevents this. If issues arise: 1. Check PROGRESS.md is being read first 2. Ensure specs are clear and complete 3. Reset to last known good commit if needed


References


Quick Start Checklist

For a step-by-step pre-flight, see docs/_ralphwiggum/LAUNCH_CHECKLIST.md.

# 1. Create sandbox branch
git checkout dev && git pull --ff-only origin dev
git checkout -b ralph-wiggum-<sprint>

# 2. Ensure PROGRESS.md and PROMPT.md exist in root
ls PROGRESS.md PROMPT.md

# 3. Ensure spec docs exist
# Debt sprint:
ls docs/_debt/README.md docs/_debt/debt-0*.md
#
# Spec work (when approved):
ls docs/_specs/*.md docs/_archive/specs/spec-0*.md

# 4. Start tmux
tmux new -s erdos-ralph

# 5. Run the loop (bounded by timeouts; logs captured)
MAX=50
TIMEOUT=14400
ITER_TIMEOUT=1200

TIMEOUT_CMD="${TIMEOUT_CMD:-}"
if [[ -z "$TIMEOUT_CMD" ]]; then
  if command -v timeout >/dev/null 2>&1; then
    TIMEOUT_CMD="timeout"
  elif command -v gtimeout >/dev/null 2>&1; then
    TIMEOUT_CMD="gtimeout"
  else
    echo "Missing timeout command. Install coreutils (macOS) or use Linux `timeout`." >&2
    exit 1
  fi
fi

rm -rf logs/ralph 2>/dev/null || true
mkdir -p logs/ralph

"$TIMEOUT_CMD" "$TIMEOUT" bash -c '
  for i in $(seq 1 '"$MAX"'); do
    n=$(printf "%03d" "$i")
    log="logs/ralph/iteration_${n}.log"
    echo "=== Iteration $i/'"$MAX"' ===" | tee -a "$log"
    # Avoid piping Claude output through tee (EPIPE risk); append directly (timeouts can still trigger EPIPE).
    '"$TIMEOUT_CMD"' '"$ITER_TIMEOUT"' claude --dangerously-skip-permissions -p "$(cat PROMPT.md)" >> "$log" 2>&1
    if ! grep -q "^\- \[ \]" PROGRESS.md; then
      echo "All tasks complete" | tee -a "$log"
      break
    fi
    sleep 2
  done
'

# 6. Monitor in another pane (optional)
watch -n 5 'git log --oneline -10'

# 7. Audit when done
git log dev..ralph-wiggum-<sprint> --oneline
git diff dev..ralph-wiggum-<sprint> --stat
make ci

# 8. Merge if good
git checkout dev && git merge ralph-wiggum-<sprint>