# Benchmarking backtest results against random strategies

Posted on Oct 18, 2015

Picture this: A developer has coded up a brilliant strategy, taking great care not to over-optimize. There is no look-ahead bias and the developer has accounted for data-mining bias. The out of sample backtest looks great. Is it time to go live?

I would’ve said yes, until I read Ernie Chan’s *Algorithmic Trading* and realised that I hadn’t adequately accounted for randomness. Whenever we compute a performance metric from a backtest, we face the problem of a finite sample size. We can’t know the true value of the performance metric, and the value we computed may or may not be representative of it. We may simply have been fooled by randomness into thinking we had a profitable strategy. Put another way: was the strategy’s performance simply due to being in the market at the right time?

There are a number of empirical methods that can be used to address this issue. Chan describes three in his book, and there are probably others. I am going to implement the approach described by Lo, Mamaysky and Wang (2000), who simulated sets of random trades constrained to have the same number in each direction and the same average holding period as the backtest, distributed randomly over the price series used in the backtest. These random strategies are run a large number of times and a frequency histogram of the performance metric of interest is constructed. The strategy’s backtest performance is then compared with this histogram to reveal whether it is in fact better than random and does have predictive power.
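The Lo, Mamaysky and Wang randomization can be sketched outside any trading platform. Here is a minimal Python sketch, assuming a fixed holding period, ignoring overlapping positions and transaction costs, and using a toy price series in place of real market data (all function names here are my own, purely illustrative):

```python
import random

def random_trade_pnl(prices, n_long, n_short, hold_bars, rng):
    """Place n_long/n_short trades at random entry bars, each held for a
    fixed number of bars, and return the random strategy's total P&L."""
    pnl = 0.0
    last_entry = len(prices) - hold_bars - 1
    for direction, n_trades in ((1, n_long), (-1, n_short)):
        for _ in range(n_trades):
            entry = rng.randrange(0, last_entry)
            pnl += direction * (prices[entry + hold_bars] - prices[entry])
    return pnl

# Build the null distribution from 5,000 random runs on a toy price series,
# matching the trade counts and duration of the (hypothetical) strategy.
rng = random.Random(42)
prices = [100 + 0.1 * i + rng.gauss(0, 1) for i in range(1000)]
null_pnls = [random_trade_pnl(prices, n_long=114, n_short=0,
                              hold_bars=4, rng=rng)
             for _ in range(5000)]
```

A histogram of `null_pnls` is the benchmark distribution the strategy's own result would be compared against.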

One pitfall that springs to mind is curve-fitting bias. If the comparison were done on the same data used to optimize the strategy, you would absolutely expect the strategy to outperform the random trader almost every time; otherwise something probably went wrong in the optimization process. This method is therefore valid only on out of sample data: the strategy should never have ‘seen’ this data before. Violating this principle would very likely lead to overly optimistic results.

I think that this method has real value when (and this is why I implemented it) the developer cherry-picks a portfolio of strategies depending on their performance in an out of sample test. For example, I optimized a strategy on a dozen different markets, separately in the long and short directions. I then tested the portfolio of strategies on out of sample data and selected only those that performed well for the live portfolio. Used in this scenario, the proposed benchmarking method essentially counters the selection bias introduced by cherry-picking the strategies for the final portfolio.

This is the approach I used in the investigation:

- Constructed random strategies that mirror the trade distribution and frequency of the original strategy
- Ran the random strategy 5,000 times and constructed a histogram of the profit factor
- Compared the profit factor of the strategy against the distribution of randomly obtained profit factors
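The comparison step can be made concrete with a couple of small helpers. This is a sketch with hypothetical numbers (the function names and the +1 smoothing in the empirical estimate are my own choices, not part of any platform):

```python
def profit_factor(trade_pnls):
    """Gross profit divided by gross loss; values above 1.0 mean the
    strategy made more than it lost."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    return gross_profit / gross_loss if gross_loss > 0 else float("inf")

def empirical_p_value(strategy_pf, random_pfs):
    """Fraction of random runs whose profit factor is at least as good as
    the strategy's (+1 correction so the estimate is never exactly zero).
    A small value suggests the result isn't just being-in-the-market luck."""
    at_least = sum(1 for pf in random_pfs if pf >= strategy_pf)
    return (at_least + 1) / (len(random_pfs) + 1)

# Hypothetical numbers: a strategy PF of 1.3 against ten random-run PFs.
random_pfs = [0.8, 0.9, 1.0, 1.1, 1.2, 0.95, 1.05, 0.85, 1.15, 0.9]
p = empirical_p_value(1.3, random_pfs)  # 1/11, since no random run reaches 1.3
```

With 5,000 random runs the estimate becomes much finer-grained than this toy ten-run example.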

Here’s the code for a random strategy written for the Zorro trading platform in Lite-C. It can be set up to match the trade frequency and duration of any strategy by modifying the parameters in the switch-case function (controlling trade duration and total number of trades) and the random number generator in the trade function (controlling the frequency of trades).

```c
//Randomly generate entries to match a strategy's simulation period,
//number of trades and trade duration
#define ASSETLOOP while(asset(loop("GBP/JPY", "GBP/USD", "USD/CAD")))
#define TradesLong AlgoVar[0]
#define TradesShort AlgoVar[1]

function getOpt(string param)
{
  switch (Asset)
  {
    case "GBP/JPY":
      switch (param)
      {
        case "strategyTradesLong": return 114;
        case "strategyTradesShort": return 0;
        case "strategyDuration": return 4;
      }
      break;
    case "GBP/USD":
      switch (param)
      {
        case "strategyTradesLong": return 128;
        case "strategyTradesShort": return 0;
        case "strategyDuration": return 4;
      }
      break;
    case "USD/CAD":
      switch (param)
      {
        case "strategyTradesLong": return 0;
        case "strategyTradesShort": return 113;
        case "strategyDuration": return 4;
      }
      break;
  }
  return 0;
}

function tradeRandom()
{
  var StrategyTradesLong = getOpt("strategyTradesLong"); //Number of trades in the actual strategy
  var StrategyTradesShort = getOpt("strategyTradesShort");
  int StrategyDuration = getOpt("strategyDuration"); //Average trade length in the actual strategy

  ExitTime = StrategyDuration;
  var NumOpen = NumOpenLong + NumOpenShort;

  if (random() > 0) { //prevents trades being taken on the same bars each run of the strategy
    if (NumOpen == 0 and TradesLong < StrategyTradesLong and random() > 0) { //may need to modify this line to get the correct random trade frequency
      enterLong();
      TradesLong += 1;
    }
    if (NumOpen == 0 and TradesShort < StrategyTradesShort and random() < 0) { //may need to modify this line to get the correct random trade frequency
      enterShort();
      TradesShort += 1;
    }
  }
}

function run()
{
  set(LOGFILE);
  BarPeriod = 1440;
  StartDate = 2010; //Set start and end dates to match those of the actual strategy's simulation
  EndDate = 2014;
  LookBack = 0;

  if (is(INITRUN)) TradesLong = 0;
  if (is(INITRUN)) TradesShort = 0;

  ASSETLOOP while(algo(loop("RAND")))
  {
    if (Algo == "RAND") tradeRandom();
  }
}
```

I ran this strategy 5,000 times and extracted the profit factor of each run using this shell script, which I ran under Cygwin.

```shell
#!/bin/bash
i=0
while [ $i -lt 5000 ]
do
  cd "C:/~/Zorro"
  ./Zorro -run random_trades
  cd "C:/~/Zorro/Log"
  sed -n 56p random_trades.txt >> RandomTrades_GBPJPY.txt
  sed -n 57p random_trades.txt >> RandomTrades_GBPUSD.txt
  sed -n 58p random_trades.txt >> RandomTrades_USDCAD.txt
  i=`expr $i + 1`
done
```

This feels like a crude approach, but at present it matches my level of programming skill. To use it on a random strategy with different assets, the lines extracted from the performance report would need to be modified accordingly. I believe Zorro can be set programmatically to run a certain number of times using the NumTotalCycles parameter, with the profit factor calculated on each cycle and stored in a histogram directly. I haven’t quite mastered this approach, but it would simplify things a great deal. I’ll update this post accordingly once I’ve gotten my head around the technique.
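In the meantime, the extracted lines can be aggregated with a small parser. This is a hedged sketch that assumes, hypothetically, that each extracted report line ends with the numeric profit factor; the parsing would need adjusting to match Zorro's actual performance-report layout:

```python
def load_profit_factors(path):
    """Read one profit factor per line from a file of extracted report
    lines. Assumes (hypothetically) that each line ends with the numeric
    value; adjust to the actual performance-report layout."""
    values = []
    with open(path) as f:
        for line in f:
            tokens = line.split()
            if tokens:
                try:
                    values.append(float(tokens[-1]))
                except ValueError:
                    pass  # skip lines that don't end in a number
    return values

def summarize(values):
    """Minimum, maximum and mean of the collected profit factors."""
    return min(values), max(values), sum(values) / len(values)
```

For example, `load_profit_factors("RandomTrades_GBPJPY.txt")` would return the 5,000 random-run profit factors ready for histogramming.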

In the next post, I’ll explore some results obtained with this approach.

**References**

Chan, Ernest, 2013, *Algorithmic Trading: Winning Strategies and Their Rationale*, John Wiley & Sons

Lo, Andrew, Mamaysky, Harry and Wang, Jiang, 2000, *Foundations of Technical Analysis: Computational Algorithms, Statistical Inference, and Empirical Implementation*, Journal of Finance, Volume LV, Number 4

## (5) Comments

[…] the first part of this article, I described a procedure for empirically testing whether a trading strategy has predictive power by […]

[…] inspiration for a great deal of my own research. My earlier posts about accounting for randomness (here and here) were inspired by the first chapter of Algorithmic Trading. Ernie works in MATLAB, but […]

Hi Robot Master,

Nice post on the application of random portfolios for strategy evaluation.

In your example application, you say that you pick the optimal strategy from a range of strategies based on out of sample performance. If this is the case, then you need to compare the statistics of this optimal strategy to the histogram of the ‘optimal’ profit factor.

A simple way to do this is, on each of your 5000 runs, to have the same number of random strategies running as in the original portfolio of strategies from which you choose. Then for each of your 5000 runs, pick the optimal random strategy and record its profit factor. This is the correct distribution to compare your actual optimal strategy’s profit factor against.

See David Aronson’s 2011 textbook on technical analysis or look up White’s (2000) Reality Check method (see Hansen (2005) for an improved version).
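The best-of-k procedure described in this comment can be sketched as follows, using a toy lognormal stand-in for a single random strategy's profit factor (the distribution and all numbers here are illustrative only; a real test would use the simulated random trades):

```python
import random

def best_of_k_null(k, n_runs, simulate_pf, rng):
    """Null distribution for a cherry-picked portfolio: each run simulates
    k independent random strategies and records only the best profit
    factor, mimicking selection of the best out-of-sample performer."""
    return [max(simulate_pf(rng) for _ in range(k)) for _ in range(n_runs)]

def toy_pf(r):
    # Toy stand-in for one random strategy's profit factor, centred near 1.
    return r.lognormvariate(0.0, 0.3)

rng = random.Random(1)
single_null = [toy_pf(rng) for _ in range(5000)]
best_of_12_null = best_of_k_null(12, 5000, toy_pf, rng)
# The best-of-12 distribution sits well to the right of the single-run
# one, so the hurdle a cherry-picked strategy must clear is higher.
```

Comparing the cherry-picked strategy against `best_of_12_null` rather than `single_null` is what corrects for the selection bias.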

Thanks Emlyn! One limitation with the approach I documented above is that in order for it to work, one has to assume that no data mining bias has taken place. This would be almost impossible to achieve in reality.

If I understood you correctly, picking the optimal random strategy from each of the 5000 runs would be equivalent to White’s bootstrap method? This seems like a much simpler implementation than keeping track of all the tested strategy variants and detrending and sampling from their equity curves, as is my understanding of White’s Reality Check.

Even if it is not precisely equivalent, I can see that your suggestion is a more robust benchmark to use than the one I wrote about. Thanks for sharing it!

[…] randomness to the sample and then comparing performance is analogous to the approach I use to benchmark my systems against a random trader with a similar trade […]