Ralph Wiggum Loop Protocol¶
Created: 2026-01-18 Author: Ray + Claude Status: Ready for Use
What is the Ralph Wiggum Technique?¶
The Ralph Wiggum technique (created by Geoffrey Huntley) is an iterative AI development methodology where the same prompt is run repeatedly until objective completion criteria are met. The "self-referential" part is that each iteration sees its previous work in files and git history, not that model output is fed back as input.
Core insight: Fresh context each iteration. No garbage accumulation.
Why It Works¶
- Same prompt, repeated = Iteration beats one-shot perfection
- State tracked in files = Progress persists across iterations
- Atomic commits = Easy to audit, revert, or cherry-pick
- Sandboxed branch = Safe experimentation
- Fresh context = No context pollution from failed attempts
Key Quote¶
"Deterministically bad in an undeterministic world" - failures are predictable, enabling systematic improvement through prompt tuning.
Prerequisites¶
Tools Required¶
# Claude Code CLI
npm install -g @anthropic-ai/claude-code
# tmux (for persistent sessions)
brew install tmux # macOS
apt install tmux # Linux
# uv (Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
Project Requirements¶
- State file (
PROGRESS.md) - Tracks what's done/pending - Prompt file (
PROMPT.md) - Instructions for each iteration - Debt/Bug docs (
docs/_debt/,docs/_bugs/) - SSOT for sprint work - Specs (
docs/_specs/) only when approved for new feature work - Git repo - For atomic commits and history
Setup Protocol¶
Step 1: Create Branch Structure¶
CRITICAL: Always sandbox Ralph work in a dedicated branch.
# Start from dev
git checkout dev
git pull --ff-only origin dev # Human pre-flight on dev (not on the Ralph branch)
# Create Ralph branch
git checkout -b ralph-wiggum-<sprint>
# Push to remote for backup
git push -u origin ralph-wiggum-<sprint>
Branch hierarchy:
main (protected, production)
└── dev (integration, manual merges)
└── ralph-wiggum-<sprint> (autonomous work)
Step 2: File Structure¶
erdos-banger/
├── PROMPT.md # Loop prompt (root)
├── PROGRESS.md # State tracker (root)
├── logs/ralph/ # Per-iteration logs (gitignored; safe to delete between runs)
├── docs/
│ ├── _ralphwiggum/
│ │ └── protocol.md # This file
│ ├── _debt/ # Active debt decks (SSOT for debt sprints)
│ ├── _bugs/ # Active bug decks (SSOT for bug sprints)
│ ├── _specs/ # Active specs + master docs
│ └── _archive/specs/ # Archived specs (implemented SSOT)
Step 3: Run the Loop (Recommended)¶
Use the provided script (avoids tee-piping Claude output; auto-finalizes sprint docs on completion):
# Launch in tmux
tmux new-session -s erdos-ralph './scripts/ralph-loop.sh'
# Monitor in another terminal
tail -f logs/ralph/iteration_*.log
watch -n5 'git log --oneline -5'
# Detach without killing: Ctrl+B, then D
# Reattach: tmux attach -t erdos-ralph
# Kill session: tmux kill-session -t erdos-ralph
Environment variables (optional):
- MAX=50 - Maximum iterations (default: 50)
- ITER_TIMEOUT=1200 - Per-iteration timeout in seconds (default: 600; use a higher value for large tasks)
Step 4: Manual Loop (Alternative)¶
If you need more control, run manually:
cd /path/to/erdos-banger
git checkout ralph-wiggum-<sprint>
# macOS requires gtimeout from coreutils:
# brew install coreutils
MAX=50
ITER_TIMEOUT=1200
TIMEOUT=14400
TIMEOUT_CMD="${TIMEOUT_CMD:-}"
if [[ -z "$TIMEOUT_CMD" ]]; then
if command -v timeout >/dev/null 2>&1; then
TIMEOUT_CMD="timeout"
elif command -v gtimeout >/dev/null 2>&1; then
TIMEOUT_CMD="gtimeout"
else
echo "Missing timeout command. Install coreutils (macOS) or use Linux `timeout`." >&2
exit 1
fi
fi
rm -rf logs/ralph 2>/dev/null || true
mkdir -p logs/ralph
"$TIMEOUT_CMD" "$TIMEOUT" bash -c '
for i in $(seq 1 '"$MAX"'); do
n=$(printf "%03d" "$i")
log="logs/ralph/iteration_${n}.log"
echo "=== Iteration $i/'"$MAX"' ===" | tee -a "$log"
# Avoid piping Claude output through tee (EPIPE risk); append directly (timeouts can still trigger EPIPE).
'"$TIMEOUT_CMD"' '"$ITER_TIMEOUT"' claude --dangerously-skip-permissions -p "$(cat PROMPT.md)" >> "$log" 2>&1
# Check if all tasks complete (no unchecked boxes)
if ! grep -q "^\- \[ \]" PROGRESS.md; then
echo "All tasks complete!" | tee -a "$log"
break
fi
sleep 2
done
'
erdos-banger Specific Configuration¶
Quality Gates (MUST PASS)¶
uv run pre-commit run --all-files # All hooks
uv run ruff check . # Lint
uv run ruff format --check . # Format check
uv run mypy src/ tests/ # Type check
uv run pytest -m "not requires_lean and not requires_network" --cov=erdos --cov-fail-under=80
Or use the Makefile:
Final Validation (Run Before Declaring Sprint Complete)¶
make ci intentionally skips requires_network tests to keep CI deterministic.
Before declaring an overnight sprint “done” (i.e., when PROGRESS.md has no unchecked items), run the full suite at least once:
If any tests fail:
- file a bug deck in docs/_bugs/
- add a new unchecked item to PROGRESS.md
- do not declare the sprint complete
Test Markers¶
requires_lean- Skip if Lean not installedrequires_network- Skip for offline tests- Default:
-m "not requires_lean and not requires_network"
Optional Extras (MCP)¶
Some tests are skipped unless optional dependencies are installed. Example:
- MCP tests will show as “SKIPPED: mcp not installed” unless you install the mcp extra.
To run MCP tests locally:
Atomic Commit Format¶
git add -A && git commit -m "$(cat <<'EOF'
[SPEC-010] Implement: arXiv metadata client
- Added src/erdos/core/arxiv_client.py
- Unit tests in tests/unit/test_arxiv_client.py
- Quality gates passed
EOF
)"
Overnight Safety Guardrails (Non-negotiable)¶
File Modification Boundaries¶
Allowed to change:
- src/erdos/**
- tests/**
- formal/lean/** (Lean project sources; never commit build artifacts)
- scripts/**
- docs/_specs/** (active specs only)
- docs/_bugs/**, docs/_debt/**
- docs/vendor-docs/**
- docs/future/** (design notes; non-normative)
- docs/INDEX.md, docs/_specs/README.md (indexes)
- PROMPT.md, PROGRESS.md, docs/_ralphwiggum/**
Forbidden to change (treat as immutable SSOT unless a human explicitly authorizes it):
- Existing files under docs/_archive/** (write-once). Allowed: add new files when archiving completed work; do not edit after archiving.
- data/erdosproblems/** (git submodule)
Dependency manifests:
- pyproject.toml and uv.lock may be edited only when a spec explicitly requires it (e.g., when a spec's "Dependencies" section lists a new runtime dependency), and must be followed by uv lock --check and uv sync --frozen.
Resource Limits¶
- Max files changed per iteration: 10
- Max net-new lines per iteration: ~500
- Max iterations per overnight run: 50 (hard stop)
- Max wall-clock runtime per run: 4 hours (recommended)
- Max wall-clock runtime per iteration: 10 minutes (recommended)
- Max “stuck” retries on the same failing gate: 3
Cost / Budget Awareness (Recommended)¶
- Treat long runs as potentially expensive (LLM usage + CI cycles). Use
MAX,TIMEOUT, andITER_TIMEOUTas your primary budget caps. - Monitor output under
logs/ralph/and watchgit log --oneline -10to confirm forward progress. - If you see repeated failures or no commits for multiple iterations, abort and investigate rather than letting the loop burn time/budget.
No-Reward-Hack Rules¶
- Never delete/disable tests to make CI green.
- Never "mock the unit under test"; mock only boundaries (network/subprocess/time).
- Never lower coverage thresholds, relax lint rules, or bypass mypy strictness.
- Never write documentation ABOUT work as a substitute for DOING work. If a task is too big, break it into subtasks in PROGRESS.md AND start the first subtask in the same iteration. Escalation documents (debt docs) are only valid if you've genuinely hit a blocker (missing deps, spec contradiction, repeated gate failures) - not because "it's big."
- "Too big" is not a blocker. Break it down and keep working. The only valid blockers are: missing dependencies, spec contradictions, and quality gate failures after 3 attempts.
Push Strategy (Remote Backup)¶
After each atomic commit:
If push fails with "branches have diverged":
This is safe because the Ralph branch is autonomous - you are the only committer. On the Ralph branch during autonomous execution, NEVER use git rebase or git pull - these cause merge conflicts that derail the loop.
If pushing is blocked (auth/CI outage), stop the loop and request human intervention.
TDD Enforcement (Ironclad)¶
The TDD Cycle¶
- RED: Write a failing test first
- GREEN: Write minimal code to pass
- REFACTOR: Clean up, keep tests green
TDD Rules (Non-negotiable)¶
- No production code without a failing test first
- One test at a time - Don't batch tests
- Minimal implementation - Only enough to pass the current test
- Refactor only when green - Never refactor with failing tests
- Tests document behavior - Tests are the specification
Spec → Test → Code Flow¶
docs/_debt/debt-XXX-*.md # Read the debt deck (or a spec doc)
↓
tests/unit/test_arxiv_client.py # Write failing test
↓
src/erdos/core/arxiv_client.py # Implement to pass
↓
make ci # Verify all gates pass
↓
git commit # Atomic commit
Critical Review Prompt¶
Before changing code/docs based on feedback (human reviews, CodeRabbit, another model, your own prior output), apply this block and validate claims against SSOT:
Review the claim or feedback (it may be from an internal or external agent).
Validate every claim from first principles. If—and only if—it's true and
helpful, update the system to align with the SSOT, implemented cleanly and
completely (Rob C. Martin discipline). Find and fix all half-measures,
reward hacks, and partial fixes if they exist. Be critically adversarial
with good intentions for constructive criticism. Ship the exact end-to-end
implementation we need.
SPEC-* Self-Review Protocol¶
Before picking up a new task, check if the MOST RECENTLY completed item is a SPEC-* WITHOUT a [REVIEWED] marker.
If yes, THIS ITERATION IS A REVIEW ITERATION:
- Read the spec doc and its acceptance criteria
- Verify the implementation matches ALL acceptance criteria
- Apply the Critical Review Prompt to your own prior work
- Check for half-measures:
- Are there TODO comments that should be resolved?
- Do tests cover all acceptance criteria?
- Does SSOT (code) match the spec?
- If issues found:
- Create a fixup task in PROGRESS.md
- Mark the original as
[NEEDS-FIX]instead of[REVIEWED] - Commit and exit
- If verified clean:
- Add
[REVIEWED]marker to the spec line - Commit and exit (do NOT start next task this iteration)
Stop Conditions¶
The loop stops when:
- All tasks complete - No unchecked
[ ]in PROGRESS.md - Iteration limit reached - MAX=50 by default
- Manual intervention - Ctrl+C
- Escalation trigger - A stop condition from PROMPT.md is hit (ambiguous spec, missing deps, repeated gate failures, etc.)
IMPORTANT: State-based verification prevents reward hacking.
# Good: Check actual state
if ! grep -q "^\- \[ \]" PROGRESS.md; then
echo "All tasks complete"
break
fi
# Bad: Parse model output for magic phrases (don't do this)
# if output contains "PROJECT COMPLETE"; then break
Monitoring¶
Watch Progress¶
# In another terminal/tmux pane
watch -n 5 'head -50 PROGRESS.md'
# Or check git activity
watch -n 5 'git log --oneline -10'
Check Loop Status¶
# See recent commits
git log --oneline -20
# See what changed
git diff HEAD~1
# Quick verification
make smoke
Post-Loop Audit¶
Review All Changes¶
# See all commits from Ralph
git log dev..ralph-wiggum-<sprint> --oneline
# See full diff
git diff dev..ralph-wiggum-<sprint> --stat
# Review specific commit
git show <commit-hash>
Final Verification¶
# All quality gates
make ci
# Smoke test
make smoke
# Full test suite (if Lean available)
make test-all
Merge if Good¶
# Option 1: Direct merge
git checkout dev
git merge ralph-wiggum-<sprint>
# Option 2: PR for review
gh pr create --base dev --head ralph-wiggum-<sprint> \
--title "Ralph Wiggum v1.1: Implement specs 010-017" \
--body "Automated implementation via Ralph Wiggum loop"
Revert if Bad¶
# Nuclear option - delete branch entirely
git checkout dev
git branch -D ralph-wiggum-<sprint>
# Or revert specific commits
git revert <bad-commit-hash>
Best Practices¶
Research Notes (2025–2026)¶
These sources inform the guardrails and the “why” behind the protocol:
- Ralph Wiggum technique: repeated identical prompts, state in files, deterministic iteration loop. https://ghuntley.com/ralph/
- Field report: running agents in loops overnight; emphasizes frequent commits/pushes and keeping scope bounded. https://github.com/repomirrorhq/repomirror/blob/main/repomirror.md
- Aider edit formats: search/replace (“diff”) blocks reduce brittle line-number failures compared to unified diffs. https://aider.chat/docs/more/edit-formats.html
- Aider benchmarks: deterministic harnesses; validate success by passing real unit tests; limit the amount of error output fed back to reduce noise. https://aider.chat/docs/benchmarks.html
- SWE-agent: interface design (agent-computer interface) strongly impacts autonomous SE performance; a structured ACI improves navigation/edit/test execution. https://arxiv.org/abs/2405.15793
- Context “rot”: long contexts degrade reliability; keep prompts small and use byte/line caps. https://research.trychroma.com/context-rot
- Reward hacking: feedback loops need explicit constraints and invariants to prevent gaming metrics. https://arxiv.org/html/2402.06627v3
DO¶
- Always sandbox in dedicated branch
- Use detailed specs for each task
- Require atomic commits
- Set clear completion criteria
- Set iteration limits
- Monitor periodically
- Audit before merging
- Follow TDD strictly
- Keep prompts and error logs small (avoid context rot): prefer summaries and capped error output
- Prefer “state-based” completion checks (PROGRESS.md), not “model said done”
- Push after each commit (remote backup)
DON'T¶
- Run on main/dev branch directly
- Skip the state file
- Allow multi-task iterations
- Trust without auditing
- Use vague task descriptions
- Run without iteration limits
- Skip the review iteration for SPEC-* tasks
- Let the agent mutate
docs/_archive/ordata/erdosproblems/as “fixes” - Let the agent change quality gates or coverage thresholds to make CI green
Troubleshooting¶
Loop Stops Unexpectedly¶
# Check if Claude is running
ps aux | grep claude
# Check tmux session
tmux list-sessions
# Restart loop in tmux
tmux attach -t erdos-ralph
Sprint Complete but Repo Dirty¶
If the loop stops at the end of a sprint with a “repo dirty” message:
- Inspect the most recent
logs/ralph/iteration_*.log(andlogs/ralph/recovery.logif present). - If only
docs/**and/orPROGRESS.mdchanged, finalize with a docs-only commit:
git status --porcelain=v1
git add PROGRESS.md docs
git commit -m "docs: finalize sprint state"
git push
- If production code/tests are dirty, stop and recover manually (don’t guess).
Quality Gates Failing¶
- Loop should auto-fix on next iteration
- If persistent, check the spec for clarity
- May need manual intervention
Context Drift¶
Fresh context each iteration prevents this. If issues arise: 1. Check PROGRESS.md is being read first 2. Ensure specs are clear and complete 3. Reset to last known good commit if needed
References¶
Quick Start Checklist¶
For a step-by-step pre-flight, see docs/_ralphwiggum/LAUNCH_CHECKLIST.md.
# 1. Create sandbox branch
git checkout dev && git pull --ff-only origin dev
git checkout -b ralph-wiggum-<sprint>
# 2. Ensure PROGRESS.md and PROMPT.md exist in root
ls PROGRESS.md PROMPT.md
# 3. Ensure spec docs exist
# Debt sprint:
ls docs/_debt/README.md docs/_debt/debt-0*.md
#
# Spec work (when approved):
ls docs/_specs/*.md docs/_archive/specs/spec-0*.md
# 4. Start tmux
tmux new -s erdos-ralph
# 5. Run the loop (bounded by timeouts; logs captured)
MAX=50
TIMEOUT=14400
ITER_TIMEOUT=1200
TIMEOUT_CMD="${TIMEOUT_CMD:-}"
if [[ -z "$TIMEOUT_CMD" ]]; then
if command -v timeout >/dev/null 2>&1; then
TIMEOUT_CMD="timeout"
elif command -v gtimeout >/dev/null 2>&1; then
TIMEOUT_CMD="gtimeout"
else
echo "Missing timeout command. Install coreutils (macOS) or use Linux `timeout`." >&2
exit 1
fi
fi
rm -rf logs/ralph 2>/dev/null || true
mkdir -p logs/ralph
"$TIMEOUT_CMD" "$TIMEOUT" bash -c '
for i in $(seq 1 '"$MAX"'); do
n=$(printf "%03d" "$i")
log="logs/ralph/iteration_${n}.log"
echo "=== Iteration $i/'"$MAX"' ===" | tee -a "$log"
# Avoid piping Claude output through tee (EPIPE risk); append directly (timeouts can still trigger EPIPE).
'"$TIMEOUT_CMD"' '"$ITER_TIMEOUT"' claude --dangerously-skip-permissions -p "$(cat PROMPT.md)" >> "$log" 2>&1
if ! grep -q "^\- \[ \]" PROGRESS.md; then
echo "All tasks complete" | tee -a "$log"
break
fi
sleep 2
done
'
# 6. Monitor in another pane (optional)
watch -n 5 'git log --oneline -10'
# 7. Audit when done
git log dev..ralph-wiggum-<sprint> --oneline
git diff dev..ralph-wiggum-<sprint> --stat
make ci
# 8. Merge if good
git checkout dev && git merge ralph-wiggum-<sprint>