Benchmarking backtest results against random strategies

Picture this: a developer has coded up a brilliant strategy, taking great care not to over-optimize. There is no look-ahead bias, and the developer has accounted for data-mining bias. The out-of-sample backtest looks great. Is it time to go live?
I would’ve said yes, until I read Ernie Chan’s Algorithmic Trading and realised that I hadn’t adequately accounted for randomness. Whenever we compute a performance metric from a backtest, we face the problem of a finite sample size. We can’t know the true value of the performance metric, and the value we computed may or may not be representative of it. We may simply have been fooled by randomness into thinking we had a profitable strategy. Put another way, was the strategy’s performance simply due to being in the market at the right time?
There are a number of empirical methods for addressing this issue. Chan describes three in the book mentioned above, and there are probably others. I am going to implement the approach described by Lo, Mamaysky and Wang (2000): simulate sets of random trades, constrained to have the same number of trades in each direction and the same average holding period as the backtest, and distributed randomly over the price series used in the backtest. These random strategies are run a large number of times, and a frequency histogram of the performance metric of interest is constructed. The strategy’s backtest performance is then compared with this histogram to see whether it is in fact better than random, that is, whether it has predictive power.
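To make the procedure concrete, here is a minimal Python sketch of the random-trade idea. Python is used purely for illustration; this is not the Zorro implementation used in this post. The sketch assumes a fixed holding period rather than a matching average, allows random trades to overlap, and runs on a synthetic random-walk price series. All function names and the 1,000-bar series are hypothetical.

```python
import random

def profit_factor(trade_pnls):
    """Gross profit divided by gross loss (infinite if nothing was lost)."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    return gross_profit / gross_loss if gross_loss > 0 else float("inf")

def random_trades(prices, n_long, n_short, holding_period, rng):
    """P&L of randomly timed trades with the backtest's trade counts and holding period."""
    last_entry = len(prices) - holding_period  # latest entry must leave room to exit
    pnls = []
    for direction in [1] * n_long + [-1] * n_short:
        entry = rng.randrange(last_entry)
        pnls.append(direction * (prices[entry + holding_period] - prices[entry]))
    return pnls

def random_profit_factors(prices, n_long, n_short, holding_period, n_runs, seed=0):
    """Profit factor of each of n_runs independent random strategies."""
    rng = random.Random(seed)
    return [
        profit_factor(random_trades(prices, n_long, n_short, holding_period, rng))
        for _ in range(n_runs)
    ]

# Synthetic driftless random walk standing in for a real daily price series
walk = random.Random(42)
prices = [100.0]
for _ in range(1000):
    prices.append(prices[-1] + walk.gauss(0, 1))

# 114 long trades held for 4 bars (illustrative values); 200 runs to keep it quick
pfs = random_profit_factors(prices, n_long=114, n_short=0, holding_period=4, n_runs=200)
print(f"median random profit factor: {sorted(pfs)[len(pfs) // 2]:.2f}")
```

The list `pfs` is the raw material for the histogram: each entry is the profit factor of one random strategy that traded as often, and for as long, as the real one.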
One pitfall that springs to mind is curve-fitting bias. If the comparison were done on the same data used to optimize the strategy, you would expect the strategy to outperform the random trader almost every time; otherwise something probably went wrong in the optimization process. This method is therefore valid only when used on out-of-sample data, which the strategy must never have ‘seen’ before. Violating this principle would very likely lead to overly optimistic results.
I think that this method has real value when the developer cherry-picks a portfolio of strategies based on their performance in an out-of-sample test (and this is why I implemented it). For example, I optimized a strategy on a dozen different markets, separately in the long and short directions. I then tested the portfolio of strategies on out-of-sample data and selected only those that performed well for the live portfolio. Benchmarking against random strategies in this scenario is essentially a counter to the selection bias introduced by cherry-picking the strategies for the final portfolio.
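To see why cherry-picking calls for a benchmark at all, here is a small Python sketch (illustrative only, with hypothetical names and numbers). It builds 24 candidate strategies, a dozen markets in each direction, none of which has any edge: every trade's P&L is zero-mean noise. The best of the 24 will still tend to show a profit factor above the typical one, purely through selection.

```python
import random
import statistics

def profit_factor(trade_pnls):
    """Gross profit divided by gross loss (infinite if nothing was lost)."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    return gross_profit / gross_loss if gross_loss > 0 else float("inf")

def no_edge_profit_factor(n_trades, rng):
    """Profit factor of a strategy whose trade P&Ls are pure zero-mean noise."""
    return profit_factor([rng.gauss(0, 1) for _ in range(n_trades)])

rng = random.Random(1)
n_candidates = 24  # a dozen markets, long and short
n_trades = 100     # hypothetical number of out-of-sample trades per candidate

pfs = [no_edge_profit_factor(n_trades, rng) for _ in range(n_candidates)]
print(f"typical (median) profit factor: {statistics.median(pfs):.2f}")
print(f"best of {n_candidates}: {max(pfs):.2f}")
```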
 This is the approach I used in the investigation:
  1. Constructed random strategies that mirror the trade distribution and frequency of the original strategy
  2. Ran the random strategy 5,000 times and constructed a histogram of the profit factor
  3. Compared the profit factor of the strategy against the distribution of randomly obtained profit factors
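Step 3 boils down to asking what fraction of the random runs did at least as well as the strategy. Here is a minimal Python sketch, again for illustration only. The profit factors are made-up numbers, not results from this investigation, and the function names are hypothetical.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count values per bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    return dict(Counter(int(v // bin_width) for v in values))

def fraction_at_least(random_pfs, strategy_pf):
    """Share of random runs whose profit factor matches or beats the strategy's."""
    return sum(pf >= strategy_pf for pf in random_pfs) / len(random_pfs)

# Made-up profit factors from ten random runs (a real test would use thousands)
random_pfs = [0.8, 0.9, 1.0, 1.1, 1.2, 0.95, 1.05, 1.3, 0.7, 1.0]
strategy_pf = 1.25  # hypothetical out-of-sample profit factor of the real strategy

# Text histogram of the random profit factors
for k, count in sorted(histogram(random_pfs, 0.25).items()):
    print(f"{k * 0.25:.2f}-{(k + 1) * 0.25:.2f}: {'#' * count}")

print(f"random runs at least as good: {fraction_at_least(random_pfs, strategy_pf):.0%}")
```

A small fraction suggests that the strategy's result would be hard to reach by luck alone; how small is small enough remains a judgment call.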
Here’s the code for a random strategy, written in Lite-C for the Zorro trading platform. It can be set up to match the trade frequency and duration of any strategy by modifying the parameters in the getOpt function (which sets the trade duration and total number of trades for each asset) and the random-number conditions in the tradeRandom function (which control the frequency of trades).
//Randomly generate entries to match a strategy's simulation period, number of trades and trade duration
#define ASSETLOOP while(asset(loop("GBP/JPY", "GBP/USD", "USD/CAD")))
#define TradesLong AlgoVar[0]
#define TradesShort AlgoVar[1]

//Per-asset trade counts and average trade duration taken from the actual strategy's backtest
function getOpt(string param) {
    switch (Asset) {
        case "GBP/JPY":
        switch (param) {
            case "strategyTradesLong"  : return 114;
            case "strategyTradesShort" : return 0;
            case "strategyDuration"    : return 4;
        }
        case "GBP/USD":
        switch (param) {
            case "strategyTradesLong"  : return 128;
            case "strategyTradesShort" : return 0;
            case "strategyDuration"    : return 4;
        }
        case "USD/CAD":
        switch (param) {
            case "strategyTradesLong"  : return 0;
            case "strategyTradesShort" : return 113;
            case "strategyDuration"    : return 4;
        }
    }
    return 0; //unknown asset or parameter
}

function tradeRandom() {
    var StrategyTradesLong = getOpt("strategyTradesLong"); //Number of trades in the actual strategy
    var StrategyTradesShort = getOpt("strategyTradesShort");
    int StrategyDuration = getOpt("strategyDuration"); //Average trade length in the actual strategy
    ExitTime = StrategyDuration; //close each trade after this many bars
    var NumOpen = NumOpenLong + NumOpenShort;
    if (random() > 0) { //prevents trades being taken on the same bars each run of the strategy
        if (NumOpen == 0 and TradesLong < StrategyTradesLong and random() > 0) { //may need to modify this line to get correct random trade frequency
            enterLong(); //entry call assumed
            TradesLong += 1;
        }
        if (NumOpen == 0 and TradesShort < StrategyTradesShort and random() < 0) { //may need to modify this line to get correct random trade frequency
            enterShort(); //entry call assumed
            TradesShort += 1;
        }
    }
}

function run() {
    BarPeriod = 1440;
    StartDate = 2010; //Set start and end dates to match those from the actual strategy's simulation
    EndDate = 2014;
    LookBack = 0;
    if (is(INITRUN)) TradesLong = 0;
    if (is(INITRUN)) TradesShort = 0;
    //asset and algo loops assumed from the ASSETLOOP macro and the Algo check
    ASSETLOOP
    while(algo(loop("RAND"))) {
        if(Algo == "RAND")
            tradeRandom();
    }
}

I ran this strategy 5,000 times and extracted the profit factor of each run using this Unix shell script, which I ran under Cygwin.

# Run the random strategy 5,000 times, appending each asset's line of the performance report to its own file
i=0
while [ $i -lt 5000 ]
do
    cd "C:/~/Zorro"
        ./Zorro -run random_trades
    cd "C:/~/Zorro/Log"
        sed -n 56p random_trades.txt  >> RandomTrades_GBPJPY.txt
        sed -n 57p random_trades.txt  >> RandomTrades_GBPUSD.txt
        sed -n 58p random_trades.txt  >> RandomTrades_USDCAD.txt
    i=`expr $i + 1`
done

This feels like a crude approach, but it let me move fast, which counts for a lot. To use the script with a random strategy that trades different assets, the line numbers extracted from the performance report (56 to 58 here) would need to be modified accordingly. Alternatively, Zorro can be programmed to run a set number of times using the NumTotalCycles variable; the profit factor from each cycle can then be calculated and stored in a histogram.

In the next post, I’ll explore some results obtained with this approach.

Chan, Ernest, 2013, Algorithmic Trading, John Wiley and Sons.
Lo, Andrew, Mamaysky, Harry and Wang, Jiang, 2000, Foundations of Technical Analysis: Computational Algorithms, Statistical Inference, and Empirical Implementation, Journal of Finance, Volume LV, Number 4.

7 thoughts on “Benchmarking backtest results against random strategies”

  1. Hi Robot Master,
    Nice post on the application of random portfolios for strategy evaluation.
    In your example application, you say that you pick the optimal strategy from a range of strategies based on out-of-sample performance. If this is the case, then you need to compare the statistics of this optimal strategy to the histogram of the ‘optimal’ profit factor.
    A simple way to do this is, on each of your 5000 runs, to have the same number of random strategies running as in the original portfolio of strategies from which you choose. Then, for each of your 5000 runs, pick the optimal random strategy and record its profit factor. This is the correct distribution to compare your actual optimal strategy’s profit factor against.
    See David Aronson’s 2011 textbook on technical analysis or look up White’s (2000) Reality Check method (see Hansen (2005) for an improved version).

    • Thanks Emlyn! One limitation with the approach I documented above is that in order for it to work, one has to assume that no data mining bias has taken place. This would be almost impossible to achieve in reality.
      If I understood you correctly, picking the optimal random strategy from each of the 5000 runs would be equivalent to White’s bootstrap method? This seems like a much simpler implementation than keeping track of all the tested strategy variants and detrending and sampling from their equity curves, as is my understanding of White’s Reality Check.
      Even if it is not precisely equivalent, I can see that your suggestion is a more robust benchmark to use than the one I wrote about. Thanks for sharing it!

  2. It looks like it does not work. This is the message from my Zorro platform:

    ‘tradeRandom’ undeclared identifier

    A1 compiling………..
    Error 055: GBP/JPY no 2010 history (History\GBPJPY.t6)
    Error 055: No bars generated

    • That’s odd…tradeRandom is defined before run in the code. Maybe try declaring the function as void rather than function; perhaps the latter no longer works in recent Zorro versions.

