SPEC-030: Exa Endpoint Strategy (Cost-Bounded, Verifiable Research)

Status: 🟡 Phase 1 implemented (2026-01-18)
Priority: P1 (Research Quality + Cost Control)
Created: 2026-01-10
Owner: Solo
Effort: ~1–3 days

Summary

Standardize how the platform uses Exa endpoints (/search, /contents, /findSimilar, /answer, /research) so that:

the default path is cheap and fast,
“deep” paths are explicit and gated,
outputs are citation-forward and verifiable (minimize hallucination risk),
caching is consistent and cost tracking is observable.

SSOT for Exa endpoint behavior: ../_vendor-docs/exa-api-reference.md

Goals

One coherent policy for choosing Exa endpoints based on task type + budget.
Deterministic defaults for CLI research commands (bounded calls, bounded result count).
Citation-first outputs (URLs + domains + timestamps) so humans can audit quickly.
Cost controls:
per-command USD budget
predictable request count
Optionally verifiable quotes (lightweight “trust but verify”).

Non-Goals

No “agentic” Exa orchestration beyond the Exa /research endpoint.
No new ML sentiment models.
No permanent storage of every Exa response (news pipeline already persists what it needs).

Current State (SSOT)

Implemented Exa client capabilities

ExaClient supports:

search(...)
search_and_contents(...)
get_contents(...)
find_similar(...)
answer(...)
create_research_task(...) + wait_for_research(...)

SSOT: src/kalshi_research/exa/client.py

Existing usage patterns

Market context research uses Search (news + research paper categories) with caching (SSOT: src/kalshi_research/research/context.py).
Topic research uses Answer + SearchAndContents with caching (SSOT: src/kalshi_research/research/topic.py).
News collector uses SearchAndContents and persists results in SQLite (SSOT: src/kalshi_research/news/collector.py).

Observed gaps

We do not use Find Similar or Research endpoints in any user-facing flow yet.
There was no explicit “budget” per CLI command; cost was only reported after the fact. ✅ Fixed for kalshi research context and kalshi research topic in Phase 1.
“Answer” is helpful but can hallucinate; we currently trust citations without verification.

Principles (First Principles)

Search is retrieval; Answer/Research is synthesis.
Prefer retrieval-first for transparency and auditability.
Never trust synthesis without citations.
Answer/Research outputs must include URLs; otherwise treat as “non-authoritative”.
Budget is a feature.
If users can’t predict cost, they won’t run it at scale.
Determinism beats cleverness.
Default queries should be small, stable, and cache-friendly.

Proposed “Endpoint Selection Policy”

Define a small policy module used by CLI and future code:

ExaPolicy(mode, budget, recency, domains, max_results) -> ExaPlan

Modes

Mode	Intended Use	Endpoints	Default Budget
`fast`	quick context, low stakes	`/search` or `/search`+contents	$0.01–$0.05
`standard`	normal thesis work	`/search` + contents (`search_and_contents`) + optional `/answer`	$0.05–$0.25
`deep`	high-EV or ambiguous markets	`/research` (+ follow-up `/contents`)	$0.25–$2.00
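A minimal sketch of how the policy signature and the modes table could fit together. The field types, the choice of the upper end of each budget range as the default, and the `recency_days` and `max_results` defaults are assumptions of this sketch, not the shipped ExaPolicy.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Upper end of each mode's default budget range from the table above (USD).
# Using the upper end as the default is an assumption of this sketch.
DEFAULT_BUDGET_USD = {"fast": 0.05, "standard": 0.25, "deep": 2.00}

# Endpoints each mode may touch, following the table above.
ALLOWED_ENDPOINTS = {
    "fast": ("/search",),
    "standard": ("/search", "/answer"),
    "deep": ("/research", "/contents"),
}


@dataclass(frozen=True)
class ExaPlan:
    mode: str
    budget_usd: float
    endpoints: Tuple[str, ...]
    recency_days: Optional[int]
    domains: Tuple[str, ...]
    max_results: int
    verify_citations: bool


@dataclass(frozen=True)
class ExaPolicy:
    mode: str = "standard"
    budget: Optional[float] = None      # None means "use the mode default"
    recency: Optional[int] = None       # hypothetical: look-back window in days
    domains: Tuple[str, ...] = ()       # hypothetical: include-domain filter
    max_results: int = 5                # hypothetical default

    def plan(self) -> ExaPlan:
        if self.mode not in DEFAULT_BUDGET_USD:
            raise ValueError(f"unknown mode: {self.mode!r}")
        budget = DEFAULT_BUDGET_USD[self.mode] if self.budget is None else self.budget
        if budget <= 0:
            raise ValueError("budget must be positive")
        return ExaPlan(
            mode=self.mode,
            budget_usd=budget,
            endpoints=ALLOWED_ENDPOINTS[self.mode],
            recency_days=self.recency,
            domains=self.domains,
            max_results=self.max_results,
            # Verification defaults to on only in deep mode.
            verify_citations=(self.mode == "deep"),
        )
```

Keeping both types frozen and free of network calls means the plan can be unit-tested and reused as part of a cache key.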

Decision tree (deterministic)

Need sources + snippets? Use /search with highlights=True, text=False.
Need readable article text for extraction? Use /search with contents (search_and_contents) or /contents for top URLs.
Need a short summary with citations? Use /answer only after retrieval, and only if citations exist.
Need a multi-hop report (lots of sub-questions)? Use /research only in deep mode or when explicitly requested.

SSOT for endpoint semantics: ../_vendor-docs/exa-api-reference.md
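The decision tree above can be sketched as a pure function. The `need` labels and the parameter dictionaries are illustrative names chosen for this sketch; they are not Exa's request schema, which is defined in the vendor reference.

```python
from typing import Any, Dict, Tuple


def choose_endpoint(
    need: str,
    *,
    mode: str = "standard",
    have_citable_sources: bool = False,
    research_requested: bool = False,
) -> Tuple[str, Dict[str, Any]]:
    """Map a task need to an endpoint, following the decision tree."""
    if need == "snippets":
        # Sources + snippets: highlights only, no full text.
        return "/search", {"highlights": True, "text": False}
    if need == "full_text":
        # Readable article text: search with contents.
        return "/search", {"text": True}
    if need == "summary":
        if not have_citable_sources:
            # Answer is allowed only after retrieval, so retrieve first.
            return "/search", {"text": True}
        return "/answer", {}
    if need == "report":
        if mode == "deep" or research_requested:
            return "/research", {}
        raise ValueError("/research requires deep mode or an explicit request")
    raise ValueError(f"unknown need: {need!r}")
```

Because the function has no side effects and no randomness, the same inputs always select the same endpoint, which keeps request counts predictable.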

Citation Verification (Optional, but recommended)

When we show a quote/highlight from Exa, we can optionally verify it:

Take the citation URL.
Fetch contents (clean text) for that URL.
Confirm the quoted substring appears in the returned text.

Verification is:

On by default only for deep mode (low volume, higher budget).
Off by default for fast/standard (to avoid doubling cost).

If verification fails:

mark the citation as “unverified” in output,
do not include the quote text (only include URL + title).
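A minimal sketch of the verification steps and the failure handling above. The `fetch_text` callable stands in for a contents lookup and is hypothetical, as is `render_citation`; normalizing whitespace and case before matching is also an assumption of this sketch, made so that line wrapping in the fetched text does not cause false failures.

```python
import re
from typing import Callable, Dict, Optional


def _normalize(text: str) -> str:
    # Collapse whitespace runs and ignore case before comparing.
    return re.sub(r"\s+", " ", text).strip().casefold()


def verify_citation(
    url: str,
    quote: str,
    fetch_text: Callable[[str], Optional[str]],
) -> bool:
    """Return True only if the quote appears in the fetched page text."""
    if not quote.strip():
        return False
    page_text = fetch_text(url)  # hypothetical wrapper around a contents call
    if not page_text:
        return False
    return _normalize(quote) in _normalize(page_text)


def render_citation(
    url: str,
    title: str,
    quote: str,
    fetch_text: Callable[[str], Optional[str]],
) -> Dict[str, object]:
    """Build the output record; drop the quote text when unverified."""
    verified = verify_citation(url, quote, fetch_text)
    record: Dict[str, object] = {"url": url, "title": title, "verified": verified}
    if verified:
        record["quote"] = quote
    return record
```

Injecting `fetch_text` keeps the check testable without network access and lets callers route the fetch through the existing cache.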

CLI Surface Changes (Proposed)

1) `kalshi research context`

Add explicit policy flags:

--mode fast|standard|deep (default: standard)
--budget-usd FLOAT (default depends on mode)
--verify-citations/--no-verify-citations (default: mode-based)
--include-domains a.com,b.com / --exclude-domains ...
--max-news INT / --max-papers INT (already exist in spirit; unify naming)
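One way the mode-dependent defaults could be resolved, sketched with the standard-library `argparse` purely for illustration. The project's actual CLI framework is not stated here, and the default budget values (the upper end of each mode's range) and the helper names are assumptions.

```python
import argparse
from typing import List

# Assumed defaults: upper end of each mode's budget range (USD).
MODE_DEFAULT_BUDGET_USD = {"fast": 0.05, "standard": 0.25, "deep": 2.00}


def _domain_list(value: str) -> List[str]:
    # "a.com, b.com" -> ["a.com", "b.com"]
    return [d.strip().lower() for d in value.split(",") if d.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="kalshi research context")
    parser.add_argument("--mode", choices=["fast", "standard", "deep"], default="standard")
    parser.add_argument("--budget-usd", type=float, default=None)
    # Generates both --verify-citations and --no-verify-citations (Python 3.9+).
    parser.add_argument("--verify-citations", action=argparse.BooleanOptionalAction, default=None)
    parser.add_argument("--include-domains", type=_domain_list, default=[])
    parser.add_argument("--exclude-domains", type=_domain_list, default=[])
    return parser


def resolve_defaults(args: argparse.Namespace) -> argparse.Namespace:
    """Fill in values the user left unset, based on the chosen mode."""
    if args.budget_usd is None:
        args.budget_usd = MODE_DEFAULT_BUDGET_USD[args.mode]
    if args.verify_citations is None:
        args.verify_citations = args.mode == "deep"
    return args
```

Using `None` as the sentinel for "not provided" lets an explicit `--no-verify-citations` in deep mode override the mode default.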

2) `kalshi research topic`

Current behavior is Answer + SearchAndContents.

Adjust to:

Run retrieval first (SearchAndContents).
Run Answer second, with a prompt that includes retrieved URLs and asks Exa Answer to cite from those.
If Exa Answer can’t be constrained that way, keep current behavior but require citations and mark “unverified” until verified via contents.

Add:

--mode
--budget-usd
--no-answer (retrieve-only mode)
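The retrieval-first ordering could look roughly like the sketch below. The `retrieve` and `answer` callables and the shapes of their return values are hypothetical stand-ins for the client calls, and whether Exa Answer honors a source constraint in the prompt is not confirmed here, so every citation starts out unverified.

```python
from typing import Any, Callable, Dict, List


def research_topic(
    query: str,
    retrieve: Callable[[str], List[Dict[str, str]]],   # returns [{"url", "title"}, ...]
    answer: Callable[[str], Dict[str, Any]],           # returns {"text", "citations": [url, ...]}
    *,
    no_answer: bool = False,
) -> Dict[str, Any]:
    # Step 1: retrieval always runs first.
    sources = retrieve(query)
    result: Dict[str, Any] = {
        "query": query,
        "sources": sources,
        "answer": None,
        "authoritative": False,
        "citations": [],
    }
    if no_answer or not sources:
        return result  # retrieve-only mode, or nothing to cite from

    # Step 2: ask for a summary that cites the retrieved URLs.
    urls = [s["url"] for s in sources]
    prompt = f"{query}\n\nCite only from these sources:\n" + "\n".join(urls)
    response = answer(prompt)

    cited = response.get("citations", [])
    result["answer"] = response.get("text")
    # No citations means the answer is treated as non-authoritative.
    result["authoritative"] = bool(cited)
    result["citations"] = [
        {"url": u, "from_retrieval": u in urls, "verified": False} for u in cited
    ]
    return result
```

The `from_retrieval` flag makes it visible when the answer cites a URL outside the retrieved set, which is the case the fallback rule above is meant to catch.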

3) New: `kalshi research exa ...` (raw, optional)

Expose raw Exa operations for debugging without writing ad-hoc scripts:

kalshi research exa search "query" ...
kalshi research exa answer "query" ...
kalshi research exa research "query" ...

These commands should output JSON only (tooling-friendly).

Implementation Plan

Phase 1: Policy + budgets

✅ Add ExaPolicy and ExaBudget types (pure Python, no network).
✅ Thread mode/budget flags through research context and research topic.
✅ Enforce budgets:
track cumulative cost_dollars.total (SSOT: Exa responses include costDollars)
stop early and warn when budget would be exceeded
✅ Standardize caching keys:
include mode + all request params in the cache key
keep day-level bucketing for “news” queries (already used in context research)
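A minimal sketch of the budget enforcement described above, assuming each response is a dictionary carrying `costDollars.total` and that the caller can supply a cost estimate for the next call. The field names on this `ExaBudget` and the `run_within_budget` helper are hypothetical and may differ from the implemented types.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Tuple

logger = logging.getLogger(__name__)


@dataclass
class ExaBudget:
    limit_usd: float
    spent_usd: float = 0.0
    budget_exhausted: bool = False

    def can_afford(self, estimated_cost_usd: float) -> bool:
        # Small tolerance so float rounding does not reject an exact fit.
        ok = self.spent_usd + estimated_cost_usd <= self.limit_usd + 1e-9
        if not ok:
            self.budget_exhausted = True
        return ok

    def record(self, response: Dict[str, Any]) -> None:
        # Accumulate the cost reported by the response itself.
        self.spent_usd += float(response.get("costDollars", {}).get("total", 0.0))


def run_within_budget(
    calls: Iterable[Tuple[float, Callable[[], Dict[str, Any]]]],
    budget: ExaBudget,
) -> List[Dict[str, Any]]:
    """Run (estimated_cost, call) pairs in order; stop before overspending."""
    responses: List[Dict[str, Any]] = []
    for estimated_cost, call in calls:
        if not budget.can_afford(estimated_cost):
            logger.warning(
                "Exa budget of $%.2f would be exceeded; stopping early", budget.limit_usd
            )
            break
        response = call()
        budget.record(response)
        responses.append(response)
    return responses
```

Checking the estimate before the call, and recording the actual cost after it, keeps the stop decision conservative while the running total stays accurate.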

Phase 2: Find Similar + Deep Research (gated) — BACKLOG

Status: Intentionally deferred. Will be implemented when there's demonstrated need.

Add optional "expand" step using /findSimilar:
seed with top 1–3 URLs from Search
fetch similar URLs to diversify sources (avoid single-domain lock-in)
Add /research support behind --mode deep:
create task, poll status, return structured report
always return citations/URLs
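The expand step could be sketched as follows. The `find_similar` callable is a hypothetical stand-in for the client call, and the per-domain cap is an assumption about how "avoid single-domain lock-in" might be enforced.

```python
from typing import Callable, Dict, List, Sequence
from urllib.parse import urlparse


def expand_sources(
    seed_urls: Sequence[str],
    find_similar: Callable[[str], List[str]],   # hypothetical: URL -> similar URLs
    *,
    max_seeds: int = 3,
    max_per_domain: int = 1,
) -> List[str]:
    """Return new URLs similar to the top seeds, capped per domain."""
    seen = set(seed_urls)
    per_domain: Dict[str, int] = {}
    # Seed domains count toward the cap so expansion favors new domains.
    for url in seed_urls:
        domain = urlparse(url).netloc.lower()
        per_domain[domain] = per_domain.get(domain, 0) + 1

    expanded: List[str] = []
    for seed in list(seed_urls)[:max_seeds]:
        for candidate in find_similar(seed):
            domain = urlparse(candidate).netloc.lower()
            if candidate in seen or per_domain.get(domain, 0) >= max_per_domain:
                continue
            seen.add(candidate)
            per_domain[domain] = per_domain.get(domain, 0) + 1
            expanded.append(candidate)
    return expanded
```

Limiting the seeds to the top 1–3 URLs bounds the number of Find Similar calls, which keeps the step compatible with a per-command budget.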

Phase 3: Optional citation verification — BACKLOG

Status: Intentionally deferred. Low priority until trust issues arise.

Implement verify_citation(url, quote) -> bool using /contents.
Enable by default in deep mode.

Acceptance Criteria

Phase 1 (Complete)

[x] kalshi research context and kalshi research topic have explicit --mode and --budget-usd controls.
[x] kalshi research context/topic stop early when budget would be exceeded and set budget_exhausted=true.
[x] Other Exa-powered commands have explicit --budget-usd controls (news collect, research similar/deep, thesis flows).
[x] Policy does not introduce /research calls implicitly; /research remains behind kalshi research deep (existing).
[x] Caching remains effective (no accidental cache busting from unstable params).
[x] Unit tests cover:
[x] budget enforcement logic (no network; use mocked responses)
[x] cache key stability
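A cache-key stability check of the kind listed above might look like this sketch. The `exa_cache_key` function, its hashing scheme, and the optional `day` bucket are assumptions made for illustration and are not the implemented key format.

```python
import hashlib
import json
from datetime import date
from typing import Any, Dict, Optional


def exa_cache_key(
    endpoint: str,
    mode: str,
    params: Dict[str, Any],
    *,
    day: Optional[date] = None,
) -> str:
    """Build a key from mode + all request params, optionally bucketed by day."""
    payload: Dict[str, Any] = {"endpoint": endpoint, "mode": mode, "params": params}
    if day is not None:
        # Day-level bucketing for "news" style queries.
        payload["day"] = day.isoformat()
    # sort_keys makes the key independent of dict insertion order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def test_cache_key_is_stable() -> None:
    a = exa_cache_key("/search", "fast", {"query": "fed rates", "num_results": 5})
    b = exa_cache_key("/search", "fast", {"num_results": 5, "query": "fed rates"})
    assert a == b  # param order must not bust the cache
    assert a != exa_cache_key("/search", "deep", {"query": "fed rates", "num_results": 5})
```

List-valued parameters such as domain filters would still need to be sorted by the caller, since JSON serialization preserves list order.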

Phase 2/3 (Backlog)

[ ] Citation verification logic