Benchmarking backtest results against random strategies

Posted on Oct 18, 2015 by Kris Longmore
Picture this: A developer has coded up a brilliant strategy, taking great care not to over-optimize. There is no look-ahead bias and the developer has accounted for data-mining bias. The out of sample backtest looks great. Is it time to go live?
 
 I would’ve said yes, until I read Ernie Chan’s Algorithmic Trading and realised that I hadn’t adequately accounted for randomness. Whenever we compute a performance metric from a backtest, we face the problem of a finite sample size. We can’t know the true value of the performance metric, and the value we computed may or may not be representative of this true value. We may have been simply fooled by randomness into thinking we had a profitable strategy. Put another way, was the strategy’s performance simply due to being in the market at the right time?
There are a number of empirical methods that can be used to address this issue. Chan describes three in his book mentioned above, and there are probably others. I am going to implement the approach described by Lo, Mamaysky and Wang (2000), who simulated sets of random trades with the same number of positions in each direction as the backtested strategy, the same average holding period, and entry dates distributed randomly over the price series used in the backtest. These random strategies are run a large number of times and a frequency histogram of the performance metric of interest is constructed. The strategy's backtest performance is then compared with this histogram to reveal whether it is in fact better than random, that is, whether it has genuine predictive power.
 
 One pitfall that springs to mind is curve-fitting bias. If the comparison was done on the same data used to optimize the strategy, you would absolutely expect the strategy to outperform the random trader almost every time; otherwise something probably went wrong in the optimization process. Therefore, this method is valid when used on out of sample data only. The strategy should not have ever ‘seen’ this data before. Violating this principle would very likely lead to overly-optimistic results.
 
 I think that this method has real value when (and this is why I implemented it) the developer cherry picks a portfolio of strategies depending on their performance in an out of sample test. For example, I optimized a strategy on a dozen different markets separately in the long and short directions. I then tested the portfolio of strategies on out of sample data and selected only those that performed well for the live portfolio. Using the proposed method of benchmarking in this scenario is essentially a counter for the selection bias introduced by cherry picking the strategies for the final portfolio.
 
 This is the approach I used in the investigation:
 
  1. Constructed random strategies that mirror the trade distribution and frequency of the original strategy
  2. Ran the random strategy 5,000 times and constructed a histogram of the profit factor
  3. Compared the profit factor of the strategy against the distribution of randomly obtained profit factors
Here’s the code for a random strategy written for the Zorro trading platform in Lite-C. It can be set up to match the trade frequency and duration of any strategy by modifying the parameters in the switch-case statement (which controls trade duration and the total number of trades) and the random number generator in the trade function (which controls the frequency of trades).
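In outline, the random trader looks something like the sketch below. The bar period, entry probability and holding period are placeholders that need to be tuned to mirror the strategy being benchmarked, and this stripped-down version doesn't enforce an exact quota of long and short trades the way Lo et al. do.

// Sketch of a random-entry strategy for Zorro in Lite-C.
// BarPeriod, the 0.90 entry threshold and LifeTime are placeholders
// to be matched to the strategy being benchmarked.
function run()
{
	BarPeriod = 1440;       // daily bars (placeholder)

	// trade on roughly 5% of bars; random() is uniform on -1..+1
	if(NumOpenTotal == 0 && random() > 0.90)
	{
		LifeTime = 10;      // hold for 10 bars (placeholder for the average holding period)
		if(random() > 0)    // coin flip for trade direction
			enterLong();
		else
			enterShort();
	}
}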

I ran this strategy 5,000 times and extracted the profit factor of each run using a short Unix script that I ran with Cygwin.


This feels like a crude approach, but at the present time it is aligned with my level of programming skill. In order to use this on a random strategy with different assets, the lines extracted from the performance report would need to be modified accordingly. I do believe that Zorro can be programmatically set to run a certain number of times using the NumTotalCycles parameter. The profit factor of each run could then be calculated within the script and stored for the histogram. I haven't quite mastered this approach, but it would simplify things a great deal. I'll update this post accordingly once I've gotten my head around this technique.
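For what it's worth, here's a sketch of how I expect that would look: NumTotalCycles repeats the whole simulation, is(EXITRUN) flags the end of each cycle, and the profit factor (gross wins divided by gross losses) is appended to a CSV file that can be binned into a histogram directly. The file name is arbitrary and I haven't verified this end to end yet.

// Sketch: repeat the random strategy 5,000 times within Zorro
// and log each cycle's profit factor.
function run()
{
	NumTotalCycles = 5000;  // repeat the entire simulation 5,000 times
	BarPeriod = 1440;       // placeholder, as before

	// random entries, as in the script above
	if(NumOpenTotal == 0 && random() > 0.90)
	{
		LifeTime = 10;
		if(random() > 0) enterLong();
		else enterShort();
	}

	if(is(EXITRUN))         // last bar of the current cycle
	{
		var PF = WinTotal/max(LossTotal,0.001); // profit factor, guarding against zero losses
		file_append("Log\\RandomPF.csv", strf("%i,%.3f\n", TotalCycle, PF));
	}
}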
In the next post, I’ll explore some results obtained with this approach.

References
Chan, Ernest P., 2013, Algorithmic Trading: Winning Strategies and Their Rationale, John Wiley & Sons.
Lo, Andrew W., Harry Mamaysky, and Jiang Wang, 2000, Foundations of Technical Analysis: Computational Algorithms, Statistical Inference, and Empirical Implementation, Journal of Finance, Vol. LV, No. 4.

Comments



Emlyn
January 27, 2016 at 8:49 pm

Hi Robot Master,
Nice post on the application of random portfolios for strategy evaluation.
In your example application, you say that you pick the optimal strategy from a range of strategies based on out of sample performance. If this is the case, then you need to compare the statistics of this optimal strategy to the histogram of the ‘optimal’ profit factor.
A simple way to do this is, on each of your 5000 runs, to have the same number of random strategies running as in the original portfolio of strategies from which you choose. Then for each of your 5000 runs, pick the optimal random strategy and record its profit factor. This is the correct distribution to compare your actual optimal strategy’s profit factor against.
See David Aronson’s 2011 textbook on technical analysis or look up White’s (2000) Reality Check method (see Hansen (2005) for an improved version).

February 4, 2016 at 10:02 pm

Thanks Emlyn! One limitation with the approach I documented above is that in order for it to work, one has to assume that no data mining bias has taken place. This would be almost impossible to achieve in reality.
If I understood you correctly, picking the optimal random strategy from each of the 5000 runs would be equivalent to White’s bootstrap method? This seems like a much simpler implementation than keeping track of all the tested strategy variants and detrending and sampling from their equity curves, as is my understanding of White’s Reality Check.
Even if it is not precisely equivalent, I can see that your suggestion is a more robust benchmark to use than the one I wrote about. Thanks for sharing it!

