How do you experiment as a systematic trader?
As far as I know, you’ve got two options when it comes to testing systematic trading strategies:
- You can find out the actual performance of your strategy
- You can find out the likely performance of your strategy
Look-Ahead Bias aka Peeking Bias
If you could travel back in time, you’d probably be quite tactical about your “creative” work wherever you landed. Google? Your idea. Predicting future events to become a local deity? Hold my feather crown. In trading, you might want to hold off on all that.
Look-ahead bias is a type of backtesting bias introduced by allowing future knowledge to affect your decisions around historical scenarios or events. As a trader running backtests, you introduce this bias when your trade decisions act upon knowledge that would not have been available at the time the original trade decision was taken.
What does this look like in practice? A popular example is executing an intra-day trade on the basis of the day’s closing price, when that closing price is not actually known until the end of the day.
Even if you use backtesting software that’s designed against look-ahead bias, you need to be careful. A subtle but potentially serious mistake is to use the entire simulation period to calculate a trade parameter (for example, a portfolio optimization parameter) which is then retrospectively applied at the beginning of the simulation. This error is so common that you must always double check for it. And triple check if your backtest looks really good.
Overfitting Bias aka Over-Optimization Bias
If you run a backtest producing annual returns in the ballpark of thousands of percent, don’t quit your day job – you’ve likely stumbled across overfitting bias. Apart from being comedic firewood for your favourite FX forum, these backtests are useless for real, systematic trading purposes.
Check out the following plots to see overfitting as backtesting bias in action. The blue squares in the figure below are an artificially generated quadratic function with some noise added to distort the underlying signal. The lines represent various models fitted to the data points. The red line is a linear regression line; the green, blue and orange lines are quadratic, cubic and quartic functions respectively. Apart from the linear regression line, these all do a decent job of modelling the data, for this region in the parameter space. The pink line is a high-order polynomial regression: notice that it fits this data best of all:
But do these models hold up out-of-sample? What I’m really asking is, how well do they generalize to data that was not used in the model-fitting process?
Well, the next plot shows the performance of the quadratic, cubic and quartic functions in a new region of the observed variable space, meaning an out-of-sample data set. In this case, the quadratic function is clearly the best performer, and we know that it most closely matches the underlying generating function – this is an example of a well-fit model.
The other models do a pretty crummy job of predicting the value of the function for this new, unseen region of parameter space, even though they looked pretty attractive on the in-sample data.
The best model on the in-sample data set, the high-order polynomial, does a terrible job of modeling this out-of-sample region. In fact, in order to see it, we have to look at a completely different portion of the y-axis, and even use a logarithmic scale to make sense of it:
This model is predicting hugely negative values of our function when we know that it could never generate a single negative value (thanks to the quadratic term in the underlying function). The function looks nothing like a quadratic function: it is more like a hyperbolic function. Or bad modern art.
This misrepresentation of the underlying process is a classic example of overfitting, and it’ll have you banging your head against the wall a lot in your early days of algo trading. In fact, you’ll face this problem every day in your strategy development; with experience, you just learn how to eat its punches.
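If you want to reproduce the flavour of those plots yourself, here is a minimal sketch. Everything in it is an assumption for illustration: the noise level, the sample sizes, the choice of degree 10 as the "high-order" model, and the decision to treat x values beyond the fitted range as the out-of-sample region. It is not the code that produced the figures.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_signal(x):
    """Assumed generating function: a plain quadratic."""
    return x ** 2

# In-sample region: noisy observations of the quadratic on [0, 10].
x_in = np.linspace(0, 10, 40)
y_in = true_signal(x_in) + rng.normal(0, 5, size=x_in.size)

# Out-of-sample region: new, unseen x values on [10, 15].
x_out = np.linspace(10, 15, 20)
y_out = true_signal(x_out) + rng.normal(0, 5, size=x_out.size)

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on the given points."""
    return float(np.mean((model(x) - y) ** 2))

# Fit each polynomial degree on the in-sample data only,
# then score it on both regions.
results = {}
for degree in (1, 2, 3, 4, 10):
    model = Polynomial.fit(x_in, y_in, degree)
    results[degree] = (mse(model, x_in, y_in), mse(model, x_out, y_out))

for degree, (err_in, err_out) in results.items():
    print(f"degree {degree:>2}: in-sample MSE {err_in:12.2f}   "
          f"out-of-sample MSE {err_out:16.2f}")
```

The in-sample error can only shrink as you add polynomial terms, which is exactly why it flatters the high-order model. The out-of-sample column is the one worth reading.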
Overfitting bias affects strategies that are tested on in-sample data. The same data is used to optimize and then test the strategy. Common sense will tell you that a strategy will perform well on the data with which it was optimized – that’s the whole point of optimization! What’s more, exhaustively searching the parameter space and choosing a local performance maximum will undoubtedly lead to overfitting and failure in an out-of-sample test.
It is crucial to understand that the purpose of the in-sample data set is not to measure the performance of a strategy. The in-sample data is used to develop the strategy and find parameter values that may be suitable. At best, you should consider the in-sample results to be indicative of whether the strategy can be profitable at all.
Avoid using the in-sample results to benchmark likely future performance of a strategy.
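To see why the in-sample number is a poor benchmark, here is a rough sketch of the optimise-then-test trap. The assumptions are mine: the price series is synthetic pure noise (so no rule has a genuine edge), the moving-average rule and the names `rule_returns` and `sharpe` are hypothetical, and the 1500/500 split is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumption: daily log-returns that are pure noise, so there is no edge to find.
returns = rng.normal(0.0, 0.01, size=2000)
prices = 100.0 * np.exp(np.cumsum(returns))

def rule_returns(prices, returns, lookback):
    """Hypothetical rule: hold the asset for the next bar when today's close
    is above its `lookback`-bar moving average, otherwise stay flat."""
    ma = np.convolve(prices, np.ones(lookback) / lookback, mode="valid")
    signal = (prices[lookback - 1:] > ma).astype(float)
    # The signal formed at the close of bar t earns the return of bar t + 1.
    return signal[:-1] * returns[lookback:]

def sharpe(r):
    """Annualised Sharpe ratio, assuming 252 bars per year and a zero risk-free rate."""
    sd = r.std()
    return 0.0 if sd == 0 else float(np.sqrt(252) * r.mean() / sd)

split = 1500                      # first 1500 bars in-sample, the rest out-of-sample
lookbacks = range(5, 105, 5)      # exhaustive search over the parameter grid

in_sample = {n: sharpe(rule_returns(prices[:split], returns[:split], n))
             for n in lookbacks}
out_sample = {n: sharpe(rule_returns(prices[split:], returns[split:], n))
              for n in lookbacks}

# Pick the parameter that looked best on the data used to optimise it.
best = max(in_sample, key=in_sample.get)
print(f"best in-sample lookback: {best}")
print(f"in-sample Sharpe:        {in_sample[best]:.2f}")
print(f"out-of-sample Sharpe:    {out_sample[best]:.2f}")
```

Because the data is noise by construction, whichever lookback wins in-sample has no reason to keep winning on the held-out bars. Try a few different seeds and watch the "best" parameter move around.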
Look again at the figures above. The model with the best in-sample performance was the high-order polynomial shown by the pink line. That said, its out-of-sample performance was as enviable as stepping on a plug. The quadratic, cubic and fourth-order models all performed reasonably well in the in-sample test, but the quadratic model was the clear star performer in the out-of-sample test. Obviously, you can infer little about performance on unseen data using in-sample testing.
Here’s the really insidious part…
When you fit a model (a trading strategy) to a noisy data set (and financial data is a rave), you risk fitting your model to the noise, rather than the underlying signal. The underlying signal is the anomaly or price effect that you believe provides profitable trading opportunities, and this signal is what you are actually trying to capture with your model.
Noise gets between you and the money. It’s a random process, and it’s unlikely to repeat itself exactly the same way. If you fit your model to the noise, you’ll end up with a random model. Unless you enjoy paying for your broker’s 12oz rib-eye steak, this isn’t something you should ever trade.
So what’s the overarching lesson from all this?
Well, in-sample data is only useful in the following ways:
- Finding out whether a strategy can be profitable and under what conditions
- Determining which parameters have a significant impact on performance
- Determining sensible ranges over which parameters might be optimized
- Debugging the strategy, that is, ensuring trades are being entered as expected
To keep overfitting in check while you develop a strategy, a few habits help:
- Keep trades simple. The fewer fittable parameters, the better.
- Favour trades that can be rationalised in a sentence over blindly data mining for trading rules.
- Optimise for robustness, not in-sample performance (more on this later).
- Avoid the temptation to be precise in your model specification. Market data is noisy and fickle, and any signal is weak.
- Avoid trades that will, at best, marginally cover retail trading costs, such as scalping.
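One rough way to picture "optimising for robustness" is to prefer a parameter that sits on a plateau of decent results over one that sits on an isolated spike. The sketch below is an illustrative heuristic only, not a prescribed method: the scores are made-up numbers, and `plateau_choice` is a hypothetical helper that rates each parameter by the worst score among itself and its immediate neighbours.

```python
import numpy as np

# Made-up in-sample scores (say, Sharpe ratios) for a grid of lookback values.
params = [10, 20, 30, 40, 50, 60, 70, 80, 90]
scores = [0.2, 0.3, 1.5, 0.1, 0.8, 0.9, 0.85, 0.7, 0.3]

def plateau_choice(params, scores, half_width=1):
    """Return the parameter whose neighbourhood has the highest worst-case score.

    Edge parameters without a full neighbourhood are skipped.
    """
    scores = np.asarray(scores, dtype=float)
    best_param, best_floor = None, -np.inf
    for i in range(half_width, len(scores) - half_width):
        floor = scores[i - half_width : i + half_width + 1].min()
        if floor > best_floor:
            best_param, best_floor = params[i], float(floor)
    return best_param, best_floor

# Naive choice: the single highest score, even though its neighbours are poor.
peak_param = params[int(np.argmax(scores))]

# Plateau choice: a slightly lower score that survives small parameter changes.
plateau_param, plateau_floor = plateau_choice(params, scores)

print(f"peak choice:    {peak_param}")
print(f"plateau choice: {plateau_param} (worst neighbour score {plateau_floor})")
```

The spike at 30 collapses if the parameter drifts by one grid step in either direction, which is a hint that it was fitted to noise. The plateau around 60 gives up some headline performance in exchange for results that do not depend on getting the parameter exactly right.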



This is a good post on robust trading. If a parameter works across a wide range of values during the backtest, the strategy is more robust; otherwise it is more fragile.