FUTURE-009: Escalation Pathway (Centralized Multi-Agent Critique)

Status: Backlog (infrastructure exists, logic not implemented)
Priority: Medium
Created: 2026-01-19
Blocked By: Nothing (ready when needed)
Owner: Solo

Summary

Implement the escalation pathway described in docs/architecture/architecture-evolution-plan.md (lines 233-258).

What escalation means: When the default single-agent synthesis produces a low-confidence result or fails verification, escalate to deeper research and/or multi-agent critique to get a better answer.

Current state:

- suggested_escalation signal is computed in verify.py
- Escalation LOGIC is not implemented (was a stub, stub removed per DEBT-038)
- The pathway is designed, the wiring is not built

Why This Exists

From the architecture evolution plan:

Default: single orchestrator + deterministic feature extraction + strict validation. Escalation path (high-EV / low-confidence / contradictions): centralized multi-agent critique.

The system is designed to START cheap and ESCALATE when justified. This is cost-efficient and aligns with the arXiv research on agent scaling listed under References (arXiv:2512.08296).

What Escalation Would Do

When verification.suggested_escalation == True:

Option A: Deeper Research (Simple)

1. Re-run Exa with mode=deep (instead of standard)
2. Re-synthesize with more evidence
3. Return improved result

Option B: Model Upgrade (Medium)

1. Switch from Sonnet to Opus
2. Re-synthesize with stronger model
3. Return improved result

Option C: Centralized Critics (Full)

1. Run ResearchCritic (challenge the evidence)
2. Run ConsistencyCritic (check for contradictions)
3. Run CalibrationCritic (check for base rate neglect)
4. Supervisor aggregates critiques
5. Re-synthesize with critic feedback
6. Return improved result
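
A minimal sketch of how the Option C flow could be wired, assuming each critic is a plain function that takes the draft synthesis and returns a list of issues. The `Critique` type, the `supervise` function, and the string checks inside the critics are all hypothetical placeholders; real critics would presumably be model calls. ConsistencyCritic is omitted for brevity and would follow the same shape.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Critique:
    critic: str        # name of the critic that produced this critique
    issues: List[str]  # empty list means the critic raised no objections


# Hypothetical critic signature: draft synthesis text in, Critique out.
Critic = Callable[[str], Critique]


def research_critic(draft: str) -> Critique:
    # Placeholder check standing in for a model call that challenges the evidence.
    issues = [] if "http" in draft else ["no sources cited"]
    return Critique("ResearchCritic", issues)


def calibration_critic(draft: str) -> Critique:
    # Placeholder check standing in for a base-rate-neglect review.
    issues = [] if "base rate" in draft.lower() else ["no base rate mentioned"]
    return Critique("CalibrationCritic", issues)


def supervise(draft: str, critics: List[Critic]) -> str:
    """Run each critic once and merge their issues into one feedback string."""
    critiques = [critic(draft) for critic in critics]
    return "\n".join(f"[{c.critic}] {issue}" for c in critiques for issue in c.issues)


feedback = supervise("Likely yes.", [research_critic, calibration_critic])
print(feedback)
```

The aggregated feedback string is what step 5 would pass back into synthesis. An empty string means no critic objected, in which case re-synthesis could be skipped.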

Escalation Triggers (From Architecture Plan)

Trigger	Description
Verification failed	`VerificationReport.passed == False`
Low confidence	`AnalysisResult.confidence == "low"`
High EV opportunity	`abs(predicted_prob - market_prob) > threshold` AND sufficient liquidity
Missing citations	Medium/high confidence but few sources
Cross-market inconsistency	Arbitrage tool flags contradictions
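
The trigger table can be read as a single pure function that returns the names of the triggers that fired. The sketch below is illustrative only: `TriggerInputs`, the trigger names, and the default values for `ev_threshold`, `min_liquidity_usd`, and `min_sources` are assumptions, since the architecture plan leaves "threshold", "sufficient liquidity", and "few sources" unspecified here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TriggerInputs:
    # Hypothetical flattened view of VerificationReport / AnalysisResult fields.
    verification_passed: bool
    confidence: str          # "low", "medium", or "high"
    predicted_prob: float
    market_prob: float
    liquidity_usd: float
    num_sources: int
    arbitrage_flagged: bool


def escalation_triggers(
    x: TriggerInputs,
    *,
    ev_threshold: float = 0.10,         # assumed value
    min_liquidity_usd: float = 1000.0,  # assumed value
    min_sources: int = 3,               # assumed value
) -> List[str]:
    """Return the names of all triggers that fire, in table order."""
    fired: List[str] = []
    if not x.verification_passed:
        fired.append("verification_failed")
    if x.confidence == "low":
        fired.append("low_confidence")
    edge = abs(x.predicted_prob - x.market_prob)
    if edge > ev_threshold and x.liquidity_usd >= min_liquidity_usd:
        fired.append("high_ev")
    if x.confidence in ("medium", "high") and x.num_sources < min_sources:
        fired.append("missing_citations")
    if x.arbitrage_flagged:
        fired.append("cross_market_inconsistency")
    return fired


example = TriggerInputs(True, "high", 0.70, 0.50, 5000.0, 1, False)
print(escalation_triggers(example))
```

Returning the full list rather than a single boolean keeps the reason for escalation available for logging and for choosing between the escalation options.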

Current Infrastructure

Component	Status	Location
`suggested_escalation` signal	Computed	`agent/verify.py:99`
`VerificationReport` schema	Exists	`agent/schemas.py`
`AgentRunResult.escalated` field	Exists	`agent/schemas.py`
Escalation logic	NOT IMPLEMENTED	Was stub in `orchestrator.py`, removed
Critics (Research/Consistency/Calibration)	NOT IMPLEMENTED	Specced in architecture plan
`escalation.py` module	NOT IMPLEMENTED	Mentioned in SPEC-032 layout

Implementation Plan

Phase 1: Simple Escalation (Re-research)

When suggested_escalation == True, re-run research with mode=deep
Re-synthesize with new evidence
Add --escalation-budget-usd CLI flag
Add --no-escalation to disable
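
One possible shape for the Phase 1 flags and gating check, using the standard-library `argparse`. The two flag names come from this plan; the program name `agent`, the default budget of 1.00, and the `should_escalate` helper are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="agent")  # prog name is hypothetical
    parser.add_argument(
        "--escalation-budget-usd",
        type=float,
        default=1.00,  # assumed default, not specified in the plan
        help="Maximum extra spend allowed for escalation, in USD.",
    )
    parser.add_argument(
        "--no-escalation",
        action="store_true",
        help="Disable escalation entirely.",
    )
    return parser


def should_escalate(suggested_escalation: bool, args: argparse.Namespace) -> bool:
    """Escalate only if verification suggested it, it is enabled, and budget remains."""
    return (
        suggested_escalation
        and not args.no_escalation
        and args.escalation_budget_usd > 0
    )


args = build_parser().parse_args(["--escalation-budget-usd", "0.75"])
print(should_escalate(True, args))
```

Keeping the gate as a small pure function makes the gating logic easy to unit test independently of the orchestrator.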

Phase 2: Model Upgrade Option

Add --escalation-model flag (default: same model, option: opus)
Track escalation costs separately

Phase 3: Centralized Critics

Implement agent/escalation.py with critic functions
Add ResearchCritic, ConsistencyCritic, CalibrationCritic
Supervisor aggregation logic
Full cost tracking

Cost Considerations

Escalation is expensive by design. The whole point is: "spend more money when it matters."

Mode	Estimated Cost	When to Use
No escalation	$0.05-0.15	Default, most markets
Re-research (deep)	+$0.50-1.00	Low confidence, missing evidence
Model upgrade	+$0.50-2.00	Verification failed
Full critics	+$2.00-5.00	High EV opportunities only

Budget ceilings are mandatory to prevent runaway costs.
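
A budget ceiling could be enforced with a small guard that refuses any mode whose worst-case cost would not fit. In the sketch below, the per-mode maxima are the upper bounds from the table above; the `EscalationBudget` class, its method names, and the mode keys are hypothetical.

```python
# Worst-case incremental cost per mode (USD), upper bounds from the cost table.
MAX_COST_USD = {
    "re_research": 1.00,
    "model_upgrade": 2.00,
    "full_critics": 5.00,
}


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spend past the ceiling."""


class EscalationBudget:
    def __init__(self, ceiling_usd: float) -> None:
        self.ceiling_usd = ceiling_usd
        self.spent_usd = 0.0

    def can_afford(self, mode: str) -> bool:
        # Check against the worst case so a started run cannot overshoot.
        return self.spent_usd + MAX_COST_USD[mode] <= self.ceiling_usd

    def charge(self, actual_cost_usd: float) -> None:
        # Record actual spend; refuse anything that would breach the ceiling.
        if self.spent_usd + actual_cost_usd > self.ceiling_usd:
            raise BudgetExceeded(
                f"charge of {actual_cost_usd:.2f} exceeds ceiling {self.ceiling_usd:.2f}"
            )
        self.spent_usd += actual_cost_usd


budget = EscalationBudget(ceiling_usd=3.00)
budget.charge(1.50)  # e.g. a model upgrade that came in under its maximum
print(budget.can_afford("re_research"), budget.can_afford("full_critics"))
```

Checking the worst-case cost before starting a mode, then charging the actual cost afterwards, means the ceiling holds even when individual runs vary in price.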

Acceptance Criteria

[ ] Escalation triggers on suggested_escalation == True
[ ] Re-runs research in deep mode
[ ] Re-synthesizes with new evidence
[ ] --escalation-budget-usd enforced
[ ] --no-escalation disables entirely
[ ] Escalation costs tracked in AgentRunResult.total_cost_usd
[ ] Unit tests for escalation gating logic

References

docs/architecture/architecture-evolution-plan.md (lines 233-258)
docs/_specs/SPEC-032-agent-system-orchestration.md (Phase 2)
docs/_archive/debt/DEBT-038-* (stub removal rationale)
arXiv:2512.08296 - Scaling Agent Systems
arXiv:2512.20845 - Multi-Agent Reflection