The backtest trap
Every strategy looks good in backtests. That's not a compliment.
A backtest that uses all available data for both development and evaluation is guaranteed to "work" — because you've essentially solved the same test you're grading yourself on. The strategy isn't predicting anything. It's describing what already happened.
The problem isn't the backtest tool. It's the process. Most traders develop a strategy, run it on five years of data, adjust the parameters until the equity curve looks the way they want, and call it done. What they've actually built is a model that describes the past. Not one that predicts the future.
If every strategy you build passes your process, your process isn't filtering anything.
What out-of-sample testing actually means
Before I start developing any strategy, I lock a portion of the price history away. No parameters are tuned against it. No entries, exits, or filters are tested on it. That data doesn't exist until development is finished.
Once the strategy is built and validated on the in-sample window, I run it once — just once — on the out-of-sample data. Whatever the result is, that's the result. No re-optimisation. No "let me just adjust that one parameter." If it fails OOS, the strategy is rejected. That's the whole point.
The OOS window is typically the most recent 20–30% of the available history. This matters: you want the strategy to have seen less data, not more, during development. More data during development just means more opportunity to overfit.
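The split itself is mechanical. Here is a minimal sketch in Python, assuming a chronologically ordered series of closes; the function name and the 25% default are illustrative, not part of any actual EdgeLab code:

```python
def split_in_out_of_sample(prices, oos_fraction=0.25):
    """Return (in_sample, out_of_sample).

    The OOS slice is always the MOST RECENT portion of the history,
    so development only ever sees the older data.
    """
    if not 0 < oos_fraction < 1:
        raise ValueError("oos_fraction must be between 0 and 1")
    cut = int(len(prices) * (1 - oos_fraction))
    return prices[:cut], prices[cut:]

# Placeholder for ~4 years of daily closes.
daily_closes = list(range(1000))
in_sample, out_of_sample = split_in_out_of_sample(daily_closes)
print(len(in_sample), len(out_of_sample))  # 750 250
```

Everything up to the cut is fair game during development; everything after it stays untouched until the single forward test.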
Why a high rejection rate is a feature
I reject more than 80% of the strategies I develop. That number isn't something I'm embarrassed about — it's the mechanism.
If you're not rejecting most of what you build, you're not testing. You're curating. The goal is to fail most strategies quickly, cheaply, and before you trade real money on them — not to find a way to make every strategy look good enough to publish.
The strategies that survive have passed a filter specifically designed to catch overfit curves. The ones that fail are exactly what they look like: backtests that worked on the data they were tuned on, and nothing else.
The three tests every EdgeLab strategy passes
Out-of-sample performance is the first filter — but not the only one. After OOS, every strategy goes through two more validation steps:
1. Sensitivity analysis. Entry and exit parameters are shifted by ±20–30% in every direction. If performance collapses when a parameter moves slightly from its optimised value, the edge is fragile. Robust strategies show stable performance across a range of parameter values — not just at the exact setting that looked best in the backtest.

2. Monte Carlo simulation. Trade order is randomised across thousands of scenarios. This stress-tests drawdowns in ways the actual historical sequence didn't produce. A strategy that shows a 6% max drawdown in the backtest might show a 14% drawdown in a Monte Carlo scenario — and that's important to know before you trade it.

3. Logic review. Does the strategy make conceptual sense? Is there a structural reason the edge should exist going forward — or is it just a pattern that happened to show up in the historical data? Strategies that pass the quantitative tests but fail the logic review don't get published either.
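The sensitivity test amounts to re-running the backtest on a grid of shifted parameter values. A sketch of the idea — `sensitivity_sweep`, the `backtest` callable it expects (parameters in, one score out), and the toy scoring function are all assumptions for illustration, not EdgeLab internals:

```python
import itertools

def sensitivity_sweep(backtest, base_params, shift=0.25, steps=(-1, 0, 1)):
    """Re-run the backtest with every numeric parameter shifted by ±shift.

    Returns a list of (params, score) pairs. A robust edge scores
    similarly across the whole grid, not only at the optimised centre.
    """
    names = list(base_params)
    results = []
    for combo in itertools.product(steps, repeat=len(names)):
        params = {n: base_params[n] * (1 + shift * s)
                  for n, s in zip(names, combo)}
        results.append((params, backtest(params)))
    return results

# Toy backtest whose score degrades smoothly away from the optimum —
# the profile a robust strategy should show.
def toy_backtest(p):
    return 2.0 - abs(p["lookback"] - 20) * 0.01 - abs(p["stop"] - 2.0) * 0.1

grid = sensitivity_sweep(toy_backtest, {"lookback": 20, "stop": 2.0})
scores = [score for _, score in grid]
```

If `min(scores)` sits close to `max(scores)`, the edge survives parameter drift; a cliff anywhere on the grid is a rejection signal.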
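The Monte Carlo step reduces to shuffling the per-trade returns and re-measuring the drawdown of each resequenced equity curve. A self-contained sketch with an illustrative trade list rather than real results:

```python
import random

def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a fraction."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

def monte_carlo_drawdowns(trade_returns, n_runs=2000, start=1.0, seed=42):
    """Shuffle trade order n_runs times; record each run's max drawdown."""
    rng = random.Random(seed)
    trades = list(trade_returns)
    drawdowns = []
    for _ in range(n_runs):
        rng.shuffle(trades)
        equity, curve = start, [start]
        for r in trades:
            equity *= 1 + r
            curve.append(equity)
        drawdowns.append(max_drawdown(curve))
    return drawdowns

# Toy trade list: 40 small winners, 10 larger losers.
dd = sorted(monte_carlo_drawdowns([0.01] * 40 + [-0.02] * 10))
worst_case = dd[int(0.95 * len(dd))]  # 95th-percentile drawdown
```

The historical sequence is just one ordering out of thousands; the tail of this distribution — not the backtest's single number — is the drawdown you should plan around.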
What to do if you're building strategies now
If you're running strategies without a locked OOS window, you're not testing. You're fitting. The fix isn't more data or a better backtest tool — it's committing to the process before you start, not after the equity curve already looks good.
Set aside the last 20% of your available history before you open the backtest. Don't look at it. Don't reference it. Develop your strategy entirely on the in-sample data, then run a single forward test on the OOS window when you're done. Whatever comes out, that's your real result.
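One way to make the "run it once" rule mechanical rather than a matter of willpower is to wrap the held-out data in an object that refuses early or repeated access. The `OOSWindow` class below is a hypothetical illustration of that discipline, not a real library:

```python
class OOSWindow:
    """Guards held-out data: readable only after development is
    declared finished, and only once. (Hypothetical sketch.)"""

    def __init__(self, data):
        self._data = list(data)
        self._dev_finished = False
        self._consumed = False

    def finish_development(self):
        self._dev_finished = True

    def run_once(self, strategy):
        if not self._dev_finished:
            raise RuntimeError("OOS data is locked until development is finished")
        if self._consumed:
            raise RuntimeError("OOS test already run; no re-optimisation allowed")
        self._consumed = True
        return strategy(self._data)

history = list(range(100))
oos = OOSWindow(history[-20:])   # last 20% set aside before any tuning
oos.finish_development()
result = oos.run_once(lambda data: sum(data) / len(data))
```

Any attempt to peek before `finish_development()` or to run a second "let me just adjust that one parameter" pass raises an error instead of silently contaminating the test.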
If that process kills most of what you build — it's working.