FUTURE-009: Escalation Pathway (Centralized Multi-Agent Critique)
Status: Backlog (infrastructure exists, logic not implemented) Priority: Medium Created: 2026-01-19 Blocked By: Nothing (ready when needed) Owner: Solo
Summary
Implement the escalation pathway described in docs/architecture/architecture-evolution-plan.md (lines 233-258).
What escalation means: When the default single-agent synthesis produces low-confidence or failed verification results, escalate to deeper research and/or multi-agent critique to get a better answer.
Current state:
- suggested_escalation signal is computed in verify.py
- Escalation LOGIC is not implemented (was a stub, stub removed per DEBT-038)
- The pathway is designed, the wiring is not built
Why This Exists
From the architecture evolution plan:
Default: single orchestrator + deterministic feature extraction + strict validation. Escalation path (high-EV / low-confidence / contradictions): centralized multi-agent critique.
The system is designed to START cheap and ESCALATE when justified. This is cost-efficient and aligns with the arXiv research on agent scaling.
What Escalation Would Do
When verification.suggested_escalation == True:
Option A: Deeper Research (Simple)
1. Re-run Exa with mode=deep (instead of standard)
2. Re-synthesize with more evidence
3. Return improved result
Option B: Model Upgrade (Medium)
1. Switch from Sonnet to Opus
2. Re-synthesize with stronger model
3. Return improved result
Option C: Centralized Critics (Full)
1. Run ResearchCritic (challenge the evidence)
2. Run ConsistencyCritic (check for contradictions)
3. Run CalibrationCritic (check for base rate neglect)
4. Supervisor aggregates critiques
5. Re-synthesize with critic feedback
6. Return improved result
Escalation Triggers (From Architecture Plan)
| Trigger | Description |
|---|---|
| Verification failed | VerificationReport.passed == False |
| Low confidence | AnalysisResult.confidence == "low" |
| High EV opportunity | abs(predicted_prob - market_prob) > threshold AND sufficient liquidity |
| Missing citations | Medium/high confidence but few sources |
| Cross-market inconsistency | Arbitrage tool flags contradictions |
Current Infrastructure
| Component | Status | Location |
|---|---|---|
suggested_escalation signal |
Computed | agent/verify.py:99 |
VerificationReport schema |
Exists | agent/schemas.py |
AgentRunResult.escalated field |
Exists | agent/schemas.py |
| Escalation logic | NOT IMPLEMENTED | Was stub in orchestrator.py, removed |
| Critics (Research/Consistency/Calibration) | NOT IMPLEMENTED | Specced in architecture plan |
escalation.py module |
NOT IMPLEMENTED | Mentioned in SPEC-032 layout |
Implementation Plan
Phase 1: Simple Escalation (Re-research)
- When
suggested_escalation == True, re-run research withmode=deep - Re-synthesize with new evidence
- Add
--escalation-budget-usdCLI flag - Add
--no-escalationto disable
Phase 2: Model Upgrade Option
- Add
--escalation-modelflag (default: same model, option: opus) - Track escalation costs separately
Phase 3: Centralized Critics
- Implement
agent/escalation.pywith critic functions - Add
ResearchCritic,ConsistencyCritic,CalibrationCritic - Supervisor aggregation logic
- Full cost tracking
Cost Considerations
Escalation is expensive by design. The whole point is: "spend more money when it matters."
| Mode | Estimated Cost | When to Use |
|---|---|---|
| No escalation | $0.05-0.15 | Default, most markets |
| Re-research (deep) | +$0.50-1.00 | Low confidence, missing evidence |
| Model upgrade | +$0.50-2.00 | Verification failed |
| Full critics | +$2.00-5.00 | High EV opportunities only |
Budget ceilings are mandatory to prevent runaway costs.
Acceptance Criteria
- [ ] Escalation triggers on
suggested_escalation == True - [ ] Re-runs research in
deepmode - [ ] Re-synthesizes with new evidence
- [ ]
--escalation-budget-usdenforced - [ ]
--no-escalationdisables entirely - [ ] Escalation costs tracked in
AgentRunResult.total_cost_usd - [ ] Unit tests for escalation gating logic
References
docs/architecture/architecture-evolution-plan.md(lines 233-258)docs/_specs/SPEC-032-agent-system-orchestration.md(Phase 2)docs/_archive/debt/DEBT-038-*(stub removal rationale)- arXiv:2512.08296 - Scaling Agent Systems
- arXiv:2512.20845 - Multi-Agent Reflection