Backtest Strategies
Backtesting is how Turbine Studio helps you reject bad ideas before they reach a live runner. It replays a Studio strategy against supported historical data and reports simulated fills, PnL, drawdown, trade count, and execution assumptions.
A backtest is not a prediction. It is a structured argument about what would have happened under a specific model and a specific data window.
What backtesting answers
Backtesting is useful for questions like:
- Did the strategy trade often enough to evaluate?
- Did profits depend on one market or one outlier?
- Did the strategy require unrealistic fills?
- Did fees or spreads erase the apparent edge?
- Did drawdown exceed your risk tolerance?
- Did external data improve the rule or just add complexity?
- Does the strategy still work with smaller position limits?
Backtesting is also useful when the answer is no. A rejected idea is a good outcome if it prevents a weak strategy from going live.
What Studio models
Studio backtests focus on prediction-market execution realities:
- venue market prices where supported,
- spread-aware fills,
- maker and taker behavior where the strategy type supports it,
- position limits,
- price bounds,
- fees where available,
- market close behavior,
- external edge-data timing,
- stale-data skips,
- simulated equity curve and trade log.
The exact simulation model can vary by venue, market, data availability, and strategy type. Studio surfaces assumptions in the result so you can decide how much weight to give the backtest.
Supported data
Studio currently uses historical data from:
- Kalshi,
- Polymarket where supported,
- edge data, including historical Coinbase data,
- edge data, including National Weather Service data.
See Data Sources for details.
How to run a backtest
- Build or open a Studio strategy.
- Confirm the market selector and risk limits.
- Ask Studio to backtest.
- Review summary metrics.
- Inspect trade count, fills, drawdown, and assumptions.
- Revise one thing at a time.
- Backtest again only when the revision has a reason.
Example prompt:
Backtest this strategy over the longest supported window. Focus on fills, spreads, fees, drawdown, and whether the strategy depends on stale edge data.
What to look for
Trade count
Too few simulated trades do not give enough evidence to judge a strategy. A beautiful result from three trades should be treated as a hypothesis, not a system.
Drawdown
Drawdown shows how painful the path may have been. A positive PnL with unacceptable drawdown may still be a bad fit.
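As a rough illustration of how drawdown can be read off a simulated equity curve, the sketch below computes the largest peak-to-trough decline as a fraction of the running peak. The function name and the sample curve are hypothetical, and Studio's own drawdown calculation may differ.

```python
def max_drawdown(equity):
    """Return the largest peak-to-trough decline as a fraction of the peak."""
    if not equity:
        return 0.0
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)          # highest equity seen so far
        if peak > 0:
            worst = max(worst, (peak - value) / peak)
    return worst


# Hypothetical equity curve: ends higher than it starts, but dips on the way.
curve = [1000.0, 1100.0, 880.0, 990.0, 1210.0]
print(f"{max_drawdown(curve):.1%}")  # prints 20.0%
```

The sample curve ends with a gain, yet the path includes a 20% decline from its peak. That is the kind of result that can be profitable on paper and still a poor fit.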
Fill quality
If the result depends on perfect entries or thin markets, reduce confidence.
Fees and spread
Prediction market edges can be small. Fees and spread can turn a good-looking signal into a bad live strategy.
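The arithmetic can be sketched with a single contract priced between 0 and 1. The example assumes a flat per-contract fee and a taker buy at the ask. Both are simplifications for illustration, not a description of any venue's fee schedule or of Studio's fill model, and all numbers are made up.

```python
def edge_vs_mid(fair_value, bid, ask):
    """Apparent edge if you could trade at the midpoint (usually you cannot)."""
    return fair_value - (bid + ask) / 2


def taker_buy_edge(fair_value, ask, fee_per_contract):
    """Expected profit per contract when buying at the ask and paying a fee."""
    return fair_value - ask - fee_per_contract


# Hypothetical quote and model estimate.
fair_value, bid, ask = 0.55, 0.50, 0.54
fee = 0.02  # assumed flat fee per contract

print(f"{edge_vs_mid(fair_value, bid, ask):+.3f}")     # prints +0.030
print(f"{taker_buy_edge(fair_value, ask, fee):+.3f}")  # prints -0.010
```

A signal that looks like three cents of edge against the midpoint becomes a one-cent expected loss once the spread is crossed and the assumed fee is paid.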
Data timing
Edge data must be available at the time the strategy would have acted. If the rule depends on data that arrives late or is stale, the backtest should show skipped actions or weaker results.
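One way to picture this rule is a lookup that only returns an edge-data value if it was already available at decision time and is not older than a maximum age. This is a minimal sketch under those assumptions. The function name, the `(available_at, value)` layout, and the 15-minute limit are hypothetical and do not describe Studio's internals.

```python
from datetime import datetime, timedelta


def usable_edge_value(observations, decision_time, max_age):
    """Return the newest value available at decision_time, or None to skip.

    observations: list of (available_at, value) pairs.
    """
    # Ignore anything that only became available after the decision (no lookahead).
    known = [(t, v) for t, v in observations if t <= decision_time]
    if not known:
        return None
    available_at, value = max(known, key=lambda item: item[0])
    # Treat data older than max_age as stale and skip the action.
    if decision_time - available_at > max_age:
        return None
    return value


# Hypothetical readings from an external feed.
obs = [
    (datetime(2024, 1, 1, 12, 0), 70.0),
    (datetime(2024, 1, 1, 12, 30), 72.0),
]
limit = timedelta(minutes=15)

print(usable_edge_value(obs, datetime(2024, 1, 1, 12, 10), limit))  # prints 70.0
print(usable_edge_value(obs, datetime(2024, 1, 1, 13, 0), limit))   # prints None
```

The first call uses the 12:00 reading even though a newer one exists in the history, because the 12:30 reading had not arrived yet. The second call returns `None` because the newest known reading is 30 minutes old.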
Market concentration
If one market drove most of the PnL, test whether the thesis generalizes.
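A quick concentration check can be run on a trade log by summing PnL per market and dividing by the total. The sketch below assumes the log can be reduced to `(market, pnl)` pairs. The market names and amounts are invented, and the actual trade log format in Studio may look different.

```python
from collections import defaultdict


def pnl_share_by_market(trades):
    """Return each market's share of total PnL, or {} if total PnL is not positive."""
    totals = defaultdict(float)
    for market, pnl in trades:
        totals[market] += pnl
    total = sum(totals.values())
    if total <= 0:
        return {}  # shares are not meaningful without a positive total
    return {market: pnl / total for market, pnl in totals.items()}


# Hypothetical trade log entries: (market, realized PnL).
trades = [("MKT-A", 50.0), ("MKT-B", 15.0), ("MKT-A", 40.0), ("MKT-C", -5.0)]
shares = pnl_share_by_market(trades)
top_market = max(shares, key=shares.get)
print(top_market, f"{shares[top_market]:.0%}")  # prints MKT-A 90%
```

When one market accounts for most of the result, as in this made-up log, rerunning the backtest without that market is a simple way to see whether the thesis holds elsewhere.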
Using backtests as filters
Backtests should help filter ideas:
| Result | Good response |
|---|---|
| Low trade count | Label the result as inconclusive. |
| High PnL, huge drawdown | Reduce risk or reject the strategy. |
| Profit disappears after fees | Reject or redesign the entry rule. |
| Depends on stale edge data | Add stale-data pauses or reject. |
| Works only with loose limits | Do not deploy without explicit approval. |
| Stable under conservative limits | Consider small live deployment. |
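
The table above can be sketched as a small triage function. Everything here is an assumption made for illustration: the field names, the 30-trade minimum, and the 25% drawdown limit are hypothetical and should be replaced with thresholds that match your own risk tolerance.

```python
from dataclasses import dataclass


@dataclass
class BacktestSummary:
    trade_count: int
    pnl_before_fees: float
    pnl_after_fees: float
    max_drawdown: float                      # fraction of peak equity, e.g. 0.2
    depends_on_stale_edge_data: bool
    profitable_under_conservative_limits: bool


def triage(summary, min_trades=30, max_acceptable_drawdown=0.25):
    """Map a summary to the responses in the table; thresholds are assumed."""
    responses = []
    profitable = summary.pnl_after_fees > 0
    if summary.trade_count < min_trades:
        responses.append("label as inconclusive")
    if profitable and summary.max_drawdown > max_acceptable_drawdown:
        responses.append("reduce risk or reject")
    if summary.pnl_before_fees > 0 and not profitable:
        responses.append("reject or redesign the entry rule")
    if summary.depends_on_stale_edge_data:
        responses.append("add stale-data pauses or reject")
    if profitable and not summary.profitable_under_conservative_limits:
        responses.append("do not deploy without explicit approval")
    # Only suggest going live when nothing above fired.
    if profitable and summary.profitable_under_conservative_limits and not responses:
        responses.append("consider small live deployment")
    # An empty list means no row of the table matched; this sketch does not decide those cases.
    return responses


thin = BacktestSummary(3, 90.0, 80.0, 0.05, False, True)
print(triage(thin))  # prints ['label as inconclusive']
```

More than one row can apply to the same result, so the function returns a list rather than a single verdict.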
What backtests do not prove
Backtests do not guarantee:
- live fills,
- future market behavior,
- continued data availability,
- venue uptime,
- execution speed,
- liquidity,
- settlement outcomes,
- profitability.
They are one input. Use them alongside strategy reasoning, market knowledge, and conservative live limits.
AI agent guidance
When reporting a backtest, an AI agent should explain:
- the strategy tested,
- the data window,
- the venue and market scope,
- the most important assumptions,
- the key metrics,
- the weakest part of the result,
- whether the idea should be rejected, revised, or tested live at small size.
Do not summarize a backtest as "good" without naming why. Do not recommend deployment when the result is fragile.
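For agents that produce structured output, the checklist above could be enforced with a small record type that refuses incomplete reports. This is only a sketch: the class, field names, and verdict strings are hypothetical and are not part of any Studio API.

```python
from dataclasses import dataclass

# Allowed conclusions, mirroring "rejected, revised, or tested live at small size".
VERDICTS = ("reject", "revise", "test live at small size")


@dataclass
class BacktestReport:
    strategy: str
    data_window: str
    venue_and_market_scope: str
    assumptions: list
    key_metrics: dict
    weakest_part: str
    verdict: str

    def problems(self):
        """Return a list of gaps; an empty list means the report is complete."""
        found = []
        for name in ("strategy", "data_window", "venue_and_market_scope", "weakest_part"):
            if not getattr(self, name).strip():
                found.append(f"missing {name}")
        if not self.assumptions:
            found.append("missing assumptions")
        if not self.key_metrics:
            found.append("missing key_metrics")
        if self.verdict not in VERDICTS:
            found.append("verdict must be one of: " + ", ".join(VERDICTS))
        return found


# Placeholder values only; a bare "good" with no weakest part is flagged.
draft = BacktestReport(
    strategy="example strategy",
    data_window="example window",
    venue_and_market_scope="example venue, example markets",
    assumptions=["taker fills at the quoted ask"],
    key_metrics={"trade_count": 42},
    weakest_part="",
    verdict="good",
)
print(len(draft.problems()))  # prints 2
```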