Backtest Data Sources

Studio backtests use historical market data and supported edge data to evaluate Studio strategy specs. The goal is to approximate the information and execution conditions a strategy would have faced at the time.

Data coverage can vary by venue, market, and time period. Studio should treat unavailable data as a limitation, not as a blank check to assume perfect conditions.

Market data

Studio currently works with supported historical data from:

Source | Used for
Kalshi | Prediction market prices, spreads, timing, fills, and market close behavior where supported.
Polymarket | Prediction market strategy research where supported by the current Studio surface.

Market data helps answer:

  • what price was available,
  • whether a spread was tradable,
  • whether a market was active enough,
  • how a position would have changed over time,
  • whether the market was near close,
  • whether the strategy traded too often or too rarely.

Edge data

Edge data is external information a strategy can use as a signal.

Current edge data includes:

Source | Typical strategies
Historic Coinbase data | Crypto-linked prediction markets, momentum, mean reversion, cross-signal strategies.
National Weather Service data | Weather-linked prediction markets, forecast fades, observation-based rules.

Edge data helps test whether an external signal would have improved the strategy at the time, not just in hindsight.

Coinbase edge data

Coinbase historical data is useful when the prediction market references crypto prices or crypto-linked outcomes.

Example strategy ideas:

  • BTC spot moves faster than the prediction market price.
  • ETH price momentum confirms an event-market drift.
  • SOL volatility rises while the contract price remains stale.
  • A market overreacts to a short-term move and mean reverts.

Example prompt:

Use Coinbase BTC historical data as the edge signal. Enter only when BTC spot has moved more than 1% in 15 minutes and the prediction market has not repriced by more than 3 cents.
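
The rule in this prompt can be sketched as a small predicate. This is an illustration only, not Studio's strategy spec format: the Snapshot type, its field names, and the should_enter function are hypothetical, and prices are assumed to be expressed in whole cents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """What the strategy could see at decision time (hypothetical field names)."""
    btc_now: float              # Coinbase BTC spot at decision time, USD
    btc_15m_ago: float          # Coinbase BTC spot 15 minutes earlier, USD
    market_now_cents: int       # prediction market price at decision time
    market_15m_ago_cents: int   # prediction market price 15 minutes earlier


def should_enter(s: Snapshot, min_btc_move: float = 0.01,
                 max_reprice_cents: int = 3) -> bool:
    """True when BTC moved more than 1% and the market repriced by at most 3 cents."""
    # Relative BTC move over the 15-minute window, in either direction.
    btc_move = abs(s.btc_now - s.btc_15m_ago) / s.btc_15m_ago
    # Absolute change in the prediction market price over the same window.
    reprice_cents = abs(s.market_now_cents - s.market_15m_ago_cents)
    return btc_move > min_btc_move and reprice_cents <= max_reprice_cents
```

The sketch ignores the direction of the move. A real rule would also need to decide which side to trade and would have to confirm that both prices were actually available at the decision timestamp.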

National Weather Service edge data

NWS data is useful for weather-linked contracts and strategies that depend on forecasts, alerts, or observations.

Example strategy ideas:

  • fade markets that overreact to forecast noise,
  • compare forecast-implied probability to market price,
  • avoid trading when weather data is stale,
  • stop opening risk as the observation window approaches resolution.

Example prompt:

Use NWS forecast and observation data. Sell YES when the market trades more than 10 cents above the forecast-implied probability and the spread is under 5 cents.
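
As a rough sketch, the rule in this prompt could be checked as follows. The function name and parameters are hypothetical, prices are assumed to be in cents, and the market price is assumed to be the best bid, since a seller of YES would trade against the bid. A missing forecast returns False rather than being treated as a probability of zero.

```python
from typing import Optional


def should_sell_yes(bid_cents: int, ask_cents: int,
                    forecast_prob: Optional[float],
                    min_edge_cents: float = 10.0,
                    max_spread_cents: int = 5) -> bool:
    """True when the bid is more than 10 cents above fair value and the spread is under 5 cents."""
    if forecast_prob is None:
        # No forecast available: skip instead of assuming a value.
        return False
    fair_cents = forecast_prob * 100.0   # forecast-implied fair price in cents
    edge_cents = bid_cents - fair_cents  # how far the market sits above fair value
    spread_cents = ask_cents - bid_cents
    return edge_cents > min_edge_cents and spread_cents < max_spread_cents
```

For example, a 72 bid and 75 ask against a forecast-implied probability of 0.60 gives a 12 cent edge and a 3 cent spread, so the rule fires. How a probability is derived from NWS forecast data is outside the scope of this sketch.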

Data timing

A serious backtest must not use future information (lookahead bias). Edge data should be evaluated as if the strategy knew only what was available at each decision timestamp.

AI agents should watch for these issues:

  • using a final outcome as if it were known early,
  • using revised data without acknowledging revision risk,
  • ignoring data staleness,
  • treating an unavailable signal as zero,
  • comparing markets that were not open at the same time,
  • assuming liquidity that did not exist.
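
One common way to guard against the first few issues is a point-in-time ("as of") lookup, which returns only the most recent value published at or before the decision timestamp. The sketch below is a minimal illustration and not a description of how Studio implements this; the function name is hypothetical, the series is assumed to be sorted by timestamp, and each timestamp is assumed to be the time the value became available rather than the period it describes.

```python
from bisect import bisect_right
from typing import Optional, Sequence, Tuple


def value_as_of(series: Sequence[Tuple[float, float]],
                decision_ts: float) -> Optional[Tuple[float, float]]:
    """Return the latest (timestamp, value) with timestamp <= decision_ts, or None."""
    timestamps = [ts for ts, _ in series]
    # Index of the first entry strictly after decision_ts.
    i = bisect_right(timestamps, decision_ts)
    # Nothing was published yet: report None rather than a default such as zero.
    return series[i - 1] if i > 0 else None
```

Returning the timestamp along with the value lets the caller check staleness, and returning None keeps an unavailable signal distinct from a signal of zero.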

Missing or stale data

When data is missing or stale, conservative behavior is preferred:

  • skip the entry,
  • keep existing risk bounded,
  • cancel maker quotes if the signal is required,
  • flag the backtest as limited,
  • ask the human whether to use a simpler rule.

Do not fill gaps with optimistic assumptions.
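
The conservative behavior above can be sketched as an entry gate that skips the trade and records a limitation note. All names here are hypothetical, and the 300 second staleness threshold is an arbitrary placeholder, not a Studio default.

```python
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    ENTER_ALLOWED = "enter_allowed"
    SKIP_ENTRY = "skip_entry"


def gate_entry(signal_ts: Optional[float], now_ts: float,
               max_age_s: float = 300.0) -> Tuple[Action, Optional[str]]:
    """Return (action, limitation note) based on signal availability and age."""
    if signal_ts is None:
        return Action.SKIP_ENTRY, "signal missing"
    if signal_ts > now_ts:
        # A signal stamped after the decision time would be future information.
        return Action.SKIP_ENTRY, "signal timestamp is in the future"
    if now_ts - signal_ts > max_age_s:
        return Action.SKIP_ENTRY, "signal stale"
    return Action.ENTER_ALLOWED, None
```

Collecting the notes across a run gives a simple basis for flagging the backtest as limited and for disclosing missing data in the report.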

Choosing data for a strategy

Use market data when the strategy is based on book behavior:

  • spread capture,
  • liquidity filters,
  • price momentum,
  • price mean reversion,
  • close-time behavior.

Use edge data when the thesis depends on outside information:

  • crypto price movement,
  • weather forecast disagreement,
  • scheduled observation updates,
  • external volatility.

Use both when the strategy needs a market reaction and an external signal. Example prompt:

Only enter if Coinbase confirms the move and the prediction market has not already priced it in.

AI agent checklist

Before running or reporting a backtest, confirm:

  • the market data source is supported,
  • the edge data source is supported,
  • the data window is meaningful,
  • the strategy does not rely on future information,
  • stale data causes skips or pauses,
  • missing data is disclosed,
  • the result is not presented as a guarantee.