Backtesting (Explanation)
Backtesting answers the question: "If I had bet on all my theses, would I have made money?"
This is the feedback loop that turns prediction market research into a learnable skill.
Why Backtest?
Without backtesting, you're flying blind:
- You might think you're good at predictions when you're actually just lucky
- You might avoid certain markets that you're actually good at
- You can't optimize position sizing without historical performance data
- You can't calculate risk metrics (e.g., Sharpe ratio)
Backtesting uses your resolved theses + historical settlement data to simulate what would have happened.
How It Works
Resolved Theses (data/theses.json)
              │
              ▼
       ThesisBacktester
              │
        ┌─────┴──────┐
        ▼            ▼
   Settlements   Price Snapshots (optional)
    (SQLite)     (SQLite)
        │            │
        └─────┬──────┘
              ▼
        BacktestResult
        - Total P&L
        - Win rate
        - Brier score
        - Sharpe ratio
Trade Simulation
For each resolved thesis, the backtester:
- Determines entry price: Uses the market_probability at thesis creation (or nearest price snapshot if available)
- Determines exit price: From settlement (YES = 1.0, NO = 0.0)
- Determines side: If your probability > 0.5, you'd bet YES; otherwise NO
- Calculates P&L: Based on entry/exit prices and position size
from dataclasses import dataclass

@dataclass
class BacktestTrade:
    ticker: str
    side: str  # "yes" or "no"
    entry_price: float  # YES price when thesis created (0-1)
    exit_price: float  # Settlement price (0 or 1)
    thesis_probability: float
    contracts: int = 1

    @property
    def pnl(self) -> float:
        """Profit/loss in cents for the whole position (all contracts)."""
        if self.side == "yes":
            return (self.exit_price - self.entry_price) * 100 * self.contracts
        else:
            # Prices are YES prices, so a NO position profits when the price falls.
            return (self.entry_price - self.exit_price) * 100 * self.contracts
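The four steps above can be sketched as one function. This is a minimal illustration rather than the backtester's actual code: `simulate_trade` and its parameters are hypothetical names, and the entry price is assumed to be a YES price between 0 and 1.

```python
def simulate_trade(
    thesis_probability: float,
    entry_price: float,
    settled_yes: bool,
    contracts: int = 1,
) -> tuple[str, float]:
    """Return (side, pnl_in_cents) for one resolved thesis.

    Hypothetical helper that mirrors the steps described in the text.
    """
    # Exit price comes from settlement: YES = 1.0, NO = 0.0.
    exit_price = 1.0 if settled_yes else 0.0

    # Side follows the thesis probability.
    side = "yes" if thesis_probability > 0.5 else "no"

    # Prices are quoted for YES, so a NO position gains when the price falls.
    if side == "yes":
        pnl = (exit_price - entry_price) * 100 * contracts
    else:
        pnl = (entry_price - exit_price) * 100 * contracts
    return side, pnl


side, pnl = simulate_trade(0.70, entry_price=0.55, settled_yes=True, contracts=10)
print(side, round(pnl, 2))
```

In this example, ten YES contracts bought at 55 cents and settled at 100 cents earn 45 cents each, 450 cents in total.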
BacktestResult Metrics
After simulating all trades, you get:
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class BacktestResult:
    thesis_id: str
    period_start: datetime
    period_end: datetime
    trades: list[BacktestTrade] = field(default_factory=list)

    # Trade statistics
    total_trades: int = 0
    winning_trades: int = 0
    losing_trades: int = 0

    # P&L
    total_pnl: float = 0.0  # Total P&L in cents
    avg_pnl: float = 0.0  # Average P&L per trade
    max_win: float = 0.0
    max_loss: float = 0.0

    # Accuracy metrics
    accuracy: float = 0.0  # % predictions correct
    brier_score: float = 0.0  # Brier score of predictions
    win_rate: float = 0.0  # % of trades profitable

    # Risk metrics
    sharpe_ratio: float = 0.0  # Simplified Sharpe
Understanding the Metrics
Win Rate vs Accuracy
These are different:
- Accuracy: Did your probability correctly predict the direction? (prob > 0.5 and YES, or prob < 0.5 and NO)
- Win Rate: Did you make money on the trade?
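With binary settlement and a side chosen by probability > 0.5, a trade is profitable exactly when the direction was right, so the two figures usually match. Neither one tells you whether you made money overall. If you buy YES at 0.90 and are right 85% of the time, accuracy and win rate are both high, but expected P&L is negative: each win pays 10 cents and each loss costs 90. What matters is your edge (the difference between your probability and the market probability).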
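The two definitions can be computed side by side. The sketch below assumes each trade is a tuple of (thesis probability, settled YES?, P&L in cents); the tuple layout and function names are illustrative, not the backtester's data model.

```python
Trade = tuple[float, bool, float]  # (thesis_probability, settled_yes, pnl_cents)


def accuracy(trades: list[Trade]) -> float:
    """Share of trades whose probability pointed at the settled outcome."""
    if not trades:
        return 0.0
    correct = sum(
        1
        for prob, settled_yes, _ in trades
        if (prob > 0.5 and settled_yes) or (prob < 0.5 and not settled_yes)
    )
    return correct / len(trades)


def win_rate(trades: list[Trade]) -> float:
    """Share of trades that ended with positive P&L."""
    if not trades:
        return 0.0
    wins = sum(1 for _, _, pnl in trades if pnl > 0)
    return wins / len(trades)


trades = [(0.7, True, 45.0), (0.8, False, -60.0), (0.3, False, 40.0)]
print(accuracy(trades), win_rate(trades))
```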
Brier Score
Aggregate measure of prediction quality:
brier = mean((forecast - outcome)² for all trades)
- 0.0 = perfect
- 0.25 = always forecasting 50% (no information)
- Lower is better
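The formula above translates directly into a few lines of Python. This is a sketch of the standard Brier score, not the project's own implementation; `brier_score` is an illustrative name, and outcomes are assumed to be encoded as 1 for YES and 0 for NO.

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared difference between forecast probabilities and outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        return 0.0
    squared_errors = [(f - o) ** 2 for f, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Always forecasting 50% scores 0.25 regardless of what happens.
print(brier_score([0.5, 0.5], [1, 0]))  # 0.25
```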
Sharpe Ratio
Risk-adjusted return:
sharpe = mean(pnls) / std(pnls)
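Higher is better. A Sharpe > 1.0 is generally considered good for annualized returns; this simplified version is computed per trade, with no risk-free rate and no annualization, so it is most useful for comparing your own backtests against each other.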
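A minimal sketch of this simplified ratio follows. Two choices here are assumptions, since the text does not specify them: it uses the sample standard deviation (`statistics.stdev`), and it returns 0.0 when there are fewer than two trades or no variation in P&L.

```python
import statistics


def simplified_sharpe(pnls: list[float]) -> float:
    """Mean P&L divided by the standard deviation of P&L (not annualized)."""
    if len(pnls) < 2:
        return 0.0  # assumption: undefined with fewer than two trades
    std = statistics.stdev(pnls)  # sample std; pstdev would be the population variant
    if std == 0:
        return 0.0  # assumption: avoid division by zero when all trades are equal
    return statistics.mean(pnls) / std


print(round(simplified_sharpe([10.0, -10.0, 30.0, 10.0]), 4))
```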
CLI Usage
First, ensure you have settlement data:
uv run kalshi data sync-settlements --db data/kalshi.db
Then run the backtest:
uv run kalshi research backtest \
  --start 2024-01-01 \
  --end 2024-12-31 \
  --db data/kalshi.db
Position Sizing
The backtester uses a configurable default_contracts parameter. In practice, you'd want to:
- Size positions proportional to edge size
- Account for Kelly criterion
- Consider bankroll management
The current implementation uses fixed sizing for simplicity.
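As an illustration of edge-proportional sizing, here is a sketch of fractional Kelly for a binary contract. It is not part of the backtester: `kelly_contracts` and its parameters are hypothetical, the side is picked by comparing your probability with the market price, and fees are ignored. For a contract that costs c and pays 1, with win probability q, the Kelly fraction is (q - c) / (1 - c).

```python
def kelly_contracts(
    thesis_probability: float,
    market_price: float,
    bankroll_cents: float,
    fraction: float = 0.5,
) -> tuple[str, int]:
    """Return (side, contracts) using fractional Kelly sizing (hypothetical helper)."""
    if not 0.0 < market_price < 1.0:
        raise ValueError("market_price must be strictly between 0 and 1")

    if thesis_probability > market_price:
        side, cost, win_prob = "yes", market_price, thesis_probability
    else:
        # Buying NO costs (1 - YES price) and wins when the market settles NO.
        side, cost, win_prob = "no", 1.0 - market_price, 1.0 - thesis_probability

    kelly = (win_prob - cost) / (1.0 - cost)
    stake_cents = max(kelly, 0.0) * fraction * bankroll_cents
    contracts = int(stake_cents // (cost * 100))
    return side, contracts


print(kelly_contracts(0.70, 0.55, bankroll_cents=10_000))
```

With a 70% thesis against a 55-cent market, full Kelly is 0.15 / 0.45, about a third of the bankroll; halving it and dividing by the 55-cent cost gives 30 contracts.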
Spread Costs
The current backtester does not model spread/slippage costs. Entry prices use:
- thesis.market_probability (when no snapshots are available), or
- the closest snapshot midpoint ((yes_bid + yes_ask) / 2) when snapshots are available.
Note: ThesisBacktester has an include_spreads flag, but it is not currently applied in the simulation logic.
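Because the flag is not applied, the following is only a sketch of what a spread-aware entry price could look like, not the backtester's behavior. It assumes a taker buys YES at yes_ask and buys NO at 1 - yes_bid, which in YES-price terms means entering at yes_bid; `entry_price_with_spread` is a hypothetical name.

```python
def entry_price_with_spread(side: str, yes_bid: float, yes_ask: float) -> float:
    """Return the entry price in YES terms for a taker who crosses the spread."""
    if side == "yes":
        return yes_ask  # pay the ask to buy YES
    if side == "no":
        return yes_bid  # assumed: buying NO at (1 - yes_bid) is entering at the YES bid
    raise ValueError(f"unknown side: {side!r}")


yes_bid, yes_ask = 0.52, 0.56
midpoint = (yes_bid + yes_ask) / 2

# Extra cost per contract relative to the midpoint, as a price between 0 and 1.
yes_cost = entry_price_with_spread("yes", yes_bid, yes_ask) - midpoint
no_cost = midpoint - entry_price_with_spread("no", yes_bid, yes_ask)
print(round(yes_cost, 4), round(no_cost, 4))
```

Either side gives up half the spread compared with a midpoint fill, so midpoint-based results are optimistic by roughly that amount per trade.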
Price Snapshots for Timing
If you have historical price snapshots in your database, the backtester can use them to get more accurate entry prices. This matters because:
- The market_probability at thesis creation might not reflect what you'd actually pay
- With snapshots, it finds the closest price to when you created the thesis
if snapshots and settlement.ticker in snapshots:
    entry_price = self._get_price_at_time(
        snapshots[settlement.ticker],
        thesis.created_at,
    )
else:
    entry_price = thesis.market_probability
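The closest-snapshot lookup could work roughly as follows. This is a guess at the idea, not the real `_get_price_at_time`: the `Snapshot` class, its field names, and `closest_midpoint` are hypothetical stand-ins.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Snapshot:
    """Hypothetical price snapshot; the real model may use different fields."""
    taken_at: datetime
    yes_bid: float
    yes_ask: float


def closest_midpoint(snapshots: list[Snapshot], when: datetime) -> float:
    """Return the bid/ask midpoint of the snapshot nearest in time to `when`."""
    if not snapshots:
        raise ValueError("no snapshots available")
    nearest = min(snapshots, key=lambda s: abs(s.taken_at - when))
    return (nearest.yes_bid + nearest.yes_ask) / 2


snapshots = [
    Snapshot(datetime(2024, 3, 1, 10, 0), yes_bid=0.50, yes_ask=0.54),
    Snapshot(datetime(2024, 3, 1, 12, 0), yes_bid=0.60, yes_ask=0.64),
]
created_at = datetime(2024, 3, 1, 10, 45)
print(round(closest_midpoint(snapshots, created_at), 4))
```

Here the 10:00 snapshot is 45 minutes from the thesis timestamp and the 12:00 one is 75 minutes away, so the entry price is the 10:00 midpoint.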
Void Settlements
Markets that settle as "void" are skipped in backtesting - they don't affect P&L (your money would be returned).
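Skipping voids can be expressed as a mapping from settlement result to exit price, with void mapping to no price at all. The result strings ("yes", "no", "void") and the function name are assumptions for illustration; the actual Settlement model may encode results differently.

```python
from typing import Optional


def exit_price_for(result: str) -> Optional[float]:
    """Map a settlement result to an exit price, or None if the trade is skipped."""
    normalized = result.strip().lower()
    if normalized == "yes":
        return 1.0
    if normalized == "no":
        return 0.0
    if normalized == "void":
        return None  # stake is returned, so there is no trade to score
    raise ValueError(f"unknown settlement result: {result!r}")


results = ["yes", "void", "no"]
exit_prices = [p for p in map(exit_price_for, results) if p is not None]
print(exit_prices)  # [1.0, 0.0]
```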
Key Code
- Backtester: src/kalshi_research/research/backtest.py
- CLI command: src/kalshi_research/cli/research.py
- Settlement model: src/kalshi_research/data/models.py
The CLI prints a summary plus a per-thesis results table (P&L, win rate, Brier score, Sharpe).
See Also
- Thesis System - How predictions are tracked
- Calibration Analysis - Deeper accuracy metrics
- Usage: Research - CLI commands