Backtest Strategies

Backtesting is how Turbine Studio helps you reject bad ideas before they reach a live runner. It replays a Studio strategy against supported historical data and reports simulated fills, PnL, drawdown, trade count, and execution assumptions.

A backtest is not a prediction. It is a structured argument about what would have happened under a specific model and a specific data window.

What backtesting answers

Backtesting is useful for questions like:

Did the strategy trade often enough to evaluate?
Did profits depend on one market or one outlier?
Did the strategy require unrealistic fills?
Did fees or spreads erase the apparent edge?
Did drawdown exceed your risk tolerance?
Did external data improve the rule or just add complexity?
Does the strategy still work with smaller position limits?

Backtesting is also useful when the answer is no. A rejected idea is a good outcome if it prevents a weak strategy from going live.

What Studio models

Studio backtests focus on prediction-market execution realities:

venue market prices where supported,
spread-aware fills,
maker and taker behavior where the strategy type supports it,
position limits,
price bounds,
fees where available,
market close behavior,
external edge-data timing,
stale-data skips,
simulated equity curve and trade log.

The exact simulation model can vary by venue, market, data availability, and strategy type. Studio surfaces assumptions in the result so you can decide how much weight to give the backtest.

Supported data

Studio currently uses historical data from:

Kalshi,
Polymarket where supported,
edge data, including historical Coinbase data and National Weather Service data.

See Data Sources for details.

How to run a backtest

Build or open a Studio strategy.
Confirm the market selector and risk limits.
Ask Studio to backtest.
Review summary metrics.
Inspect trade count, fills, drawdown, and assumptions.
Revise one thing at a time.
Backtest again only when the revision has a reason.

Example prompt:

Backtest this strategy over the longest supported window. Focus on fills, spreads, fees, drawdown, and whether the strategy depends on stale edge data.

What to look for

Trade count

A strategy with too few simulated trades may not provide enough evidence to judge. A beautiful result from three trades should be treated as a hypothesis, not a system.
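To make the trade-count point concrete, here is a minimal sketch that puts an approximate 95% Wilson score interval around a win rate. It assumes trades are independent and that win rate is a meaningful summary of the strategy, neither of which Studio guarantees. The function name and the sample counts are hypothetical.

```python
import math


def wilson_interval(wins: int, trades: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate Wilson score interval for a win rate (z=1.96 is roughly 95%)."""
    if trades <= 0:
        raise ValueError("trades must be positive")
    p = wins / trades
    denom = 1 + z * z / trades
    center = (p + z * z / (2 * trades)) / denom
    half = z * math.sqrt(p * (1 - p) / trades + z * z / (4 * trades * trades)) / denom
    # Clip to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical results: a perfect record on 3 trades vs. 60% wins on 300 trades.
for wins, trades in [(3, 3), (180, 300)]:
    low, high = wilson_interval(wins, trades)
    print(f"{wins}/{trades} wins: plausible win rate {low:.2f} to {high:.2f}")
```

With three trades the interval stays very wide even when every trade won, which is why such a result is better treated as a hypothesis.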

Drawdown

Drawdown shows how painful the path may have been. A strategy with positive PnL but unacceptable drawdown may still be a bad fit.
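As an illustration, maximum drawdown can be computed from an equity curve as the largest peak-to-trough decline, expressed as a fraction of the peak. This is a minimal sketch of the standard definition, not necessarily the exact calculation Studio reports. The equity values are hypothetical.

```python
from typing import Sequence


def max_drawdown(equity: Sequence[float]) -> float:
    """Return the largest peak-to-trough decline as a fraction of the running peak."""
    if not equity:
        return 0.0
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        if peak > 0:
            worst = max(worst, (peak - value) / peak)
    return worst


# Hypothetical equity curve in dollars, one point per simulated fill.
curve = [1000.0, 1100.0, 880.0, 990.0, 1210.0]
print(f"final PnL: {curve[-1] - curve[0]:+.2f}")
print(f"max drawdown: {max_drawdown(curve):.1%}")
```

The curve ends with a profit, yet the path included a 20% decline from the 1100 peak to 880. Whether that is acceptable depends on your risk limits, not on the final PnL.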

Fill quality

If the result depends on perfect entries or thin markets, reduce your confidence in it.

Fees and spread

Prediction market edges can be small. Fees and spread can turn a good-looking signal into a bad live strategy.
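A short worked example shows how this happens. The sketch below assumes a contract that pays $1 on a YES outcome, a taker buy at the ask, and a flat per-contract fee. Real fee schedules vary by venue, so the fee, the prices, and the function name are all hypothetical.

```python
def net_edge_per_contract(fair_value: float, ask: float, fee: float) -> float:
    """Expected profit per contract for a taker buy at the ask, held to settlement.

    All values are in dollars per contract that pays $1 on a YES outcome.
    """
    return fair_value - ask - fee


bid, ask = 0.50, 0.54
mid = (bid + ask) / 2
fair_value = 0.55  # hypothetical model estimate of the YES probability
fee = 0.02         # hypothetical flat fee per contract

gross_edge = fair_value - mid  # what a mid-price backtest would credit
net_edge = net_edge_per_contract(fair_value, ask, fee)

print(f"edge vs mid:            {gross_edge:+.3f}")
print(f"edge after spread, fee: {net_edge:+.3f}")
```

Measured against the mid price the signal looks like a 3-cent edge, but after crossing the spread and paying the assumed fee it becomes a 1-cent expected loss per contract.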

Data timing

Edge data must be available at the time the strategy would have acted. If the rule depends on data that arrives late or is stale, the backtest should show skipped actions or weaker results.
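One way to picture this rule is a point-in-time lookup: at each decision, use only the latest observation that had already arrived, and skip the action if that observation is too old. The sketch below is an illustration under those assumptions, not Studio's implementation. The function name, the 15-minute staleness threshold, and the sample readings are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def usable_edge_value(
    decision_time: datetime,
    observations: list[tuple[datetime, float]],
    max_age: timedelta,
) -> Optional[float]:
    """Return the latest value known at decision_time, or None if missing or stale."""
    # Ignore anything that arrived after the decision: using it would be look-ahead.
    known = [(ts, value) for ts, value in observations if ts <= decision_time]
    if not known:
        return None
    ts, value = max(known, key=lambda item: item[0])
    if decision_time - ts > max_age:
        return None  # stale: the strategy should skip this action
    return value


t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
observations = [(t0, 61.0), (t0 + timedelta(minutes=10), 64.0)]
max_age = timedelta(minutes=15)

# At 12:05 only the 12:00 reading exists, so the 12:10 value must not be used.
print(usable_edge_value(t0 + timedelta(minutes=5), observations, max_age))
# At 13:00 the newest reading is 50 minutes old, so the action is skipped.
print(usable_edge_value(t0 + timedelta(minutes=60), observations, max_age))
```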

Market concentration

If one market drove most of the PnL, test whether the thesis generalizes.
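A simple check is to group PnL by market and see what remains after removing the best one. The sketch below assumes a trade log reduced to (market, PnL) pairs. The market names and numbers are hypothetical and do not reflect Studio's trade log format.

```python
from collections import defaultdict
from typing import Iterable


def pnl_by_market(trades: Iterable[tuple[str, float]]) -> dict[str, float]:
    """Sum realized PnL per market from (market, pnl) pairs."""
    totals: dict[str, float] = defaultdict(float)
    for market, pnl in trades:
        totals[market] += pnl
    return dict(totals)


# Hypothetical trade log: (market id, PnL in dollars).
trades = [
    ("MKT-A", 120.0),
    ("MKT-B", -15.0),
    ("MKT-A", 60.0),
    ("MKT-C", 10.0),
    ("MKT-B", 5.0),
]

totals = pnl_by_market(trades)
total_pnl = sum(totals.values())
top_market = max(totals, key=totals.get)
pnl_without_top = total_pnl - totals[top_market]

print(f"total PnL: {total_pnl:+.2f}")
print(f"top market: {top_market} ({totals[top_market]:+.2f})")
print(f"PnL without top market: {pnl_without_top:+.2f}")
```

Here the whole profit comes from one market and the rest nets to zero, which suggests the thesis has not yet been shown to generalize.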

Using backtests as filters

Backtests should help filter ideas:

Result	Good response
Low trade count	Label the result inconclusive and do not generalize from it.
High PnL, huge drawdown	Reduce risk or reject the strategy.
Profit disappears after fees	Reject or redesign the entry rule.
Depends on stale edge data	Add stale-data pauses or reject.
Works only with loose limits	Do not deploy without explicit approval.
Stable under conservative limits	Consider small live deployment.
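
The table above can be sketched as a small triage function. The field names, the thresholds (30 trades, 25% drawdown), and the order of the checks are assumptions chosen for illustration. Studio does not prescribe them, and you should set your own.

```python
from dataclasses import dataclass


@dataclass
class BacktestSummary:
    """Hypothetical summary of one backtest; not a Studio data structure."""
    trade_count: int
    net_pnl: float                     # PnL after fees
    max_drawdown: float                # fraction of peak equity, e.g. 0.20
    depends_on_stale_data: bool
    holds_under_conservative_limits: bool


def triage(s: BacktestSummary, min_trades: int = 30, max_dd: float = 0.25) -> str:
    """Map a backtest summary to one of the responses in the table."""
    if s.trade_count < min_trades:
        return "inconclusive"
    if s.net_pnl <= 0:
        return "reject or redesign the entry rule"
    if s.depends_on_stale_data:
        return "add stale-data pauses or reject"
    if s.max_drawdown > max_dd:
        return "reduce risk or reject"
    if not s.holds_under_conservative_limits:
        return "do not deploy without explicit approval"
    return "consider small live deployment"


summary = BacktestSummary(
    trade_count=3,
    net_pnl=50.0,
    max_drawdown=0.05,
    depends_on_stale_data=False,
    holds_under_conservative_limits=True,
)
print(triage(summary))
```

The trade-count check comes first on purpose: with three trades, none of the other metrics carries enough evidence to act on.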

What backtests do not prove

Backtests do not guarantee:

live fills,
future market behavior,
continued data availability,
venue uptime,
execution speed,
liquidity,
settlement outcomes,
profitability.

They are one input. Use them alongside strategy reasoning, market knowledge, and conservative live limits.

AI agent guidance

When reporting a backtest, an AI agent should explain:

the strategy tested,
the data window,
the venue and market scope,
the most important assumptions,
the key metrics,
the weakest part of the result,
whether the idea should be rejected, revised, or tested live at small size.

Do not summarize a backtest as "good" without naming why. Do not recommend deployment when the result is fragile.
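
One way an agent could enforce this checklist is to fill a fixed structure before writing the summary. The sketch below is a hypothetical shape, not a Studio API. The class name, field names, and every example value are placeholders.

```python
from dataclasses import dataclass

ALLOWED_RECOMMENDATIONS = ("reject", "revise", "test live at small size")


@dataclass
class BacktestReport:
    """Hypothetical report shape covering the seven items an agent should explain."""
    strategy: str
    data_window: str
    venue_and_markets: str
    assumptions: list[str]
    key_metrics: dict[str, str]
    weakest_part: str
    recommendation: str

    def __post_init__(self) -> None:
        if self.recommendation not in ALLOWED_RECOMMENDATIONS:
            raise ValueError(f"recommendation must be one of {ALLOWED_RECOMMENDATIONS}")
        if not self.weakest_part.strip():
            raise ValueError("name the weakest part of the result")

    def render(self) -> str:
        lines = [
            f"Strategy: {self.strategy}",
            f"Data window: {self.data_window}",
            f"Venue and markets: {self.venue_and_markets}",
            "Assumptions: " + "; ".join(self.assumptions),
            "Key metrics: " + ", ".join(f"{k}={v}" for k, v in self.key_metrics.items()),
            f"Weakest part: {self.weakest_part}",
            f"Recommendation: {self.recommendation}",
        ]
        return "\n".join(lines)


# Placeholder values for illustration only; not real backtest output.
report = BacktestReport(
    strategy="example threshold strategy",
    data_window="example window",
    venue_and_markets="example venue, example market set",
    assumptions=["taker fills at the ask", "flat per-contract fee"],
    key_metrics={"trades": "3", "max_drawdown": "5%"},
    weakest_part="only three simulated trades",
    recommendation="revise",
)
print(report.render())
```

Because the structure rejects an empty weakest part and any recommendation outside the three allowed values, a bare "good" cannot pass as a summary.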