Failure 1 — Slippage modeled as zero
The most common backtest sin: assuming you fill at the displayed mid-price. Real markets fill you at the ask when you're buying and at the bid when you're selling, and sometimes worse on size. Backtests that ignore the bid-ask spread routinely overstate returns by 100–300 basis points per year.
The fix: model slippage as at least the half-spread on every fill (a full spread per round trip), measured at the trade size you're modeling. For thinly traded instruments at non-trivial size, slippage can reach 1–3% per round trip, enough to flip a profitable strategy into a losing one.
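As a minimal sketch of the half-spread rule, the function below shifts a mid-price by half the quoted spread and widens that cost with order size. The names `fill_price`, `participation` (order size divided by typical volume), and `impact_coeff` are illustrative assumptions, and the linear size penalty is a placeholder rather than a calibrated impact model.

```python
def fill_price(mid: float, spread: float, side: str,
               participation: float = 0.0, impact_coeff: float = 1.0) -> float:
    """Estimate a fill price from the mid: half-spread plus a size penalty.

    participation is order size / typical volume (an assumed input).
    impact_coeff is a hypothetical knob; 1.0 doubles the half-spread
    when the order equals typical volume.
    """
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    half_spread = spread / 2.0
    slippage = half_spread * (1.0 + impact_coeff * participation)
    # Buyers pay above mid, sellers receive below mid.
    return mid + slippage if side == "buy" else mid - slippage


def round_trip_cost_bps(mid: float, spread: float,
                        participation: float = 0.0,
                        impact_coeff: float = 1.0) -> float:
    """Cost of buying then immediately selling, in basis points of mid."""
    buy = fill_price(mid, spread, "buy", participation, impact_coeff)
    sell = fill_price(mid, spread, "sell", participation, impact_coeff)
    return (buy - sell) / mid * 1e4


# A 10-cent spread on a $100 instrument costs about 10 bps per round trip
# at negligible size under this model.
print(round(round_trip_cost_bps(100.0, 0.10), 6))
```

Subtracting this cost from every simulated round trip is usually enough to show whether a strategy's edge is larger than the spread it has to cross.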
Failure 2 — Survivorship bias
Backtests over historical data often run only on instruments that exist today. Companies that delisted, ETFs that were closed, currencies that were repegged — they're not in the backtest because they're not in the dataset. The backtest sees only winners.
This systematically inflates returns. The fix is sourcing point-in-time datasets that include delisted instruments — more expensive, but the only honest path.
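To make the difference concrete, here is a small sketch of building the tradable universe as of a historical date rather than as of today. The `Listing` record and the tickers are hypothetical; a real point-in-time dataset would supply the listing and delisting dates.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Listing:
    symbol: str
    listed: date
    delisted: Optional[date] = None  # None means still trading


def universe_on(listings: List[Listing], as_of: date) -> List[str]:
    """Symbols that were actually tradable on the given date."""
    return [
        l.symbol
        for l in listings
        if l.listed <= as_of and (l.delisted is None or as_of < l.delisted)
    ]


# Hypothetical tickers for illustration only.
listings = [
    Listing("AAA", date(2000, 1, 3)),
    Listing("BBB", date(2001, 6, 1), delisted=date(2009, 6, 1)),
    Listing("CCC", date(2015, 3, 2)),
]

print(universe_on(listings, date(2005, 1, 3)))   # includes the later-delisted BBB
print(universe_on(listings, date(2024, 1, 2)))   # the survivor-only view
```

A backtest that picks its 2005 universe from the 2024 list never sees BBB, which is exactly the bias described above.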
Failure 3 — Lookahead bias
The backtest accidentally uses information that wouldn't have been available at the trade time. The classic version: using same-day high/low in a strategy that triggers at the open. Subtle versions include using earnings announcements before they were released, or fundamental data that was revised after the original publication.
The fix: rigorous separation of as-of timestamps. Every data point must be tagged with the moment it became publicly available, and the strategy can only see data with as-of <= now during backtest.
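One way to enforce the as-of rule is to store two timestamps per data point and filter on the publication time only. This is a sketch under assumed names (`DataPoint`, `available_at`, `latest`); the sample values are made up to show how a later revision stays invisible until it was published.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class DataPoint:
    name: str
    value: float
    event_time: datetime     # the period the value describes
    available_at: datetime   # when the value became publicly available


def visible(points: List[DataPoint], now: datetime) -> List[DataPoint]:
    """Only data with as-of <= now may reach the strategy."""
    return [p for p in points if p.available_at <= now]


def latest(points: List[DataPoint], name: str, now: datetime) -> Optional[DataPoint]:
    """Most recently published visible value for a series, or None."""
    candidates = [p for p in visible(points, now) if p.name == name]
    return max(candidates, key=lambda p: p.available_at, default=None)


# Made-up example: a quarterly figure published in May and revised in August.
quarter_end = datetime(2020, 3, 31)
points = [
    DataPoint("eps", 1.00, quarter_end, datetime(2020, 5, 1)),
    DataPoint("eps", 0.80, quarter_end, datetime(2020, 8, 1)),
]

june_view = latest(points, "eps", datetime(2020, 6, 1))
print(june_view.value)  # the original figure, not the later revision
```

Filtering on `event_time` instead of `available_at` is the usual way revised fundamentals leak into a backtest.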
Failure 4 — Adversarial market makers
In a backtest, the strategy fills at modeled prices. In live trading, market makers see the order coming and adjust their quotes, so part of the strategy's edge is gone before the fill happens.
This is invisible in standard backtests. The only way to test for it is to run the strategy in paper trading against real-time market data with realistic latency — and see whether fills come in at expected prices or worse. DayTrading Swarm's paper-trading mode does this; many platforms simulate fills at backtest prices, which understates the adversarial drag.
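A simple way to quantify that drag during paper trading is to compare each fill with the price the backtest would have assumed. The sketch below is an assumption about how such a log might be summarized; `shortfall_bps` and the sample fills are hypothetical and do not describe any platform's API.

```python
from statistics import fmean
from typing import List, Tuple


def shortfall_bps(expected: float, filled: float, side: str) -> float:
    """Signed slippage in basis points; positive means worse than expected."""
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    sign = 1.0 if side == "buy" else -1.0
    return sign * (filled - expected) / expected * 1e4


def mean_shortfall_bps(fills: List[Tuple[float, float, str]]) -> float:
    """Average shortfall over (expected, filled, side) records."""
    return fmean(shortfall_bps(e, f, s) for e, f, s in fills)


# Hypothetical paper-trading log: (expected price, fill price, side).
fills = [
    (100.00, 100.02, "buy"),   # paid more than modeled
    (50.00, 49.99, "sell"),    # received less than modeled
]
print(round(mean_shortfall_bps(fills), 6))
```

If the average shortfall is consistently positive and larger than the modeled slippage, the backtest's fill assumptions are too generous.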
Failure 5 — Parameter overfit
The strategy was tuned to maximize backtested returns by adjusting parameters (lookback windows, threshold values, holding periods). The tuning fits the strategy to the historical noise rather than to a real edge. Live performance reverts to the underlying signal strength, which is much lower than the tuned backtest suggests.
Detection: out-of-sample testing. Hold out at least 30% of the historical data, tune on the remaining 70% or less, and evaluate on the held-out portion only after tuning is finished. If out-of-sample performance is materially worse than in-sample, the strategy is overfit and won't survive live deployment.
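The split itself is short, but it has to be chronological: shuffling before splitting would leak future information into the tuning set. A minimal sketch, assuming `rows` is already sorted oldest first and using the hypothetical name `chronological_split`:

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(rows: Sequence[T],
                        holdout_fraction: float = 0.30) -> Tuple[List[T], List[T]]:
    """Split time-ordered rows into (tuning set, held-out set).

    The held-out set is the most recent slice; rows must be sorted
    oldest first (an assumption this function does not check).
    """
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")
    n_test = max(1, round(len(rows) * holdout_fraction))
    cut = len(rows) - n_test
    return list(rows[:cut]), list(rows[cut:])


# Ten periods of made-up returns, oldest first.
returns = [0.01, -0.02, 0.005, 0.01, 0.0, 0.02, -0.01, 0.015, -0.005, 0.01]
tune, held_out = chronological_split(returns)
print(len(tune), len(held_out))
```

Parameters are chosen using `tune` only; `held_out` is evaluated once, at the end.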
The realistic backtest checklist
- Slippage modeled as at least half-spread, scaled to size
- Trading costs included (commissions, exchange fees, regulatory fees)
- Point-in-time data including delisted instruments
- Strict as-of separation — no lookahead
- Out-of-sample validation with held-out test set
- Walk-forward testing — re-train periodically vs. one-shot training on all history
- Adversarial paper-trading period of at least 30 days against real-time data before any live capital
Strategies that survive all 7 are rare and worth deploying. Strategies that survive 3 of 7 are the norm in retail algo platforms — and explain why most retail algo trading underperforms expectations.
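The walk-forward item on the checklist can be sketched as a generator of rolling train and test index ranges. The window sizes and the name `walk_forward_windows` are assumptions chosen for illustration; real window lengths depend on the strategy's horizon.

```python
from typing import Iterator, Tuple


def walk_forward_windows(n: int, train_size: int,
                         test_size: int) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges over n time-ordered observations.

    Each step re-trains on the latest train_size observations, tests on
    the next test_size, then rolls forward by test_size.
    """
    if train_size <= 0 or test_size <= 0:
        raise ValueError("window sizes must be positive")
    start = 0
    while start + train_size + test_size <= n:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


for train, test in walk_forward_windows(n=10, train_size=4, test_size=2):
    # Every test range starts right after its training range ends.
    print(list(train), list(test))
```

Concatenating the test-range results gives a performance record in which every trade was made with parameters fitted only on earlier data.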
Frequently asked questions
- How do I model slippage if I don't know my fill prices in advance?
- Use the half-spread as a baseline, scaled by your trade size relative to typical volume. For instruments where you'd be a meaningful fraction of typical volume, double the half-spread as a margin of safety.
- Where do I get point-in-time data?
- Norgate Data, Quandl (now Nasdaq Data Link), and Refinitiv (now part of LSEG) all offer point-in-time datasets. Free sources are usually survivor-only. The cost difference (free vs. $50–$200/month) is meaningful but worth it for any serious strategy development.
- How long should the out-of-sample test period be?
- At least 30% of total available history, drawn from the most recent period. The most recent data is the most demanding test because market microstructure changes over time.
- Is paper trading required before live deployment?
- Strongly recommended. The 30-day paper trading period is when you discover slippage modeling errors, latency issues, and adversarial-market drag. The paper-trading P&L tells you what to expect from live deployment within reasonable bounds.
- How do I know if my strategy is overfit?
- Compare the in-sample and out-of-sample Sharpe ratios. If the out-of-sample ratio is less than 60% of the in-sample ratio, you're overfit. The fewer parameters your strategy has, the less likely it is to be overfit; the more parameters it has, the more rigorous the out-of-sample testing has to be.
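The Sharpe comparison in the last answer could be sketched as follows. This assumes per-period returns, a zero risk-free rate, and 252 periods per year; `sharpe_ratio` and `looks_overfit` are hypothetical helper names, and the 60% threshold is the rule of thumb stated above.

```python
import math
import statistics
from typing import Sequence


def sharpe_ratio(returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    stdev = statistics.stdev(returns)
    if stdev == 0:
        raise ValueError("returns have zero variance")
    return statistics.fmean(returns) / stdev * math.sqrt(periods_per_year)


def looks_overfit(in_sample: Sequence[float], out_of_sample: Sequence[float],
                  min_ratio: float = 0.60) -> bool:
    """True if out-of-sample Sharpe is below min_ratio of in-sample Sharpe."""
    is_sharpe = sharpe_ratio(in_sample)
    if is_sharpe <= 0:
        # The ratio test is meaningless without a positive in-sample Sharpe.
        raise ValueError("in-sample Sharpe must be positive")
    return sharpe_ratio(out_of_sample) < min_ratio * is_sharpe


# Made-up daily returns for illustration.
in_sample = [0.01, 0.02, -0.005, 0.015]
out_of_sample = [0.001, -0.01, 0.002, -0.004]
print(looks_overfit(in_sample, out_of_sample))
```

With only a handful of observations the Sharpe estimate is very noisy, so in practice the comparison is only informative over long return series.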