Can you apply factors to trade performance?

Posted on Sep 10, 2019 by Kris Longmore

When tinkering with trading ideas, have you ever wondered whether a certain variable might be correlated with the success of the trade?

For instance, maybe you wonder if your strategy tends to do better when volatility is high?

In this case, you can get very binary feedback by, say, running backtests with and without a volatility filter.

But this can mask interesting insights that might surface if the relationship could be explored in more detail.

Zorro has some neat tools that allow us to associate data of interest with particular trading decisions, and then export that data for further analysis. Here’s how it works:

Zorro implements a TRADE struct, a data container that holds information about each position throughout the life of our simulation. We can also add our own data to this struct via the TradeVar array, which we can populate with values associated with a particular trade.

Zorro stores this array, along with all the other information about each and every position, as members of the TRADE struct. We can access the TRADE struct members in two ways: inside a trade management function (TMF) and inside a trade enumeration loop.

Here’s an example of exporting the last estimated volatility at the time a position was entered, along with the return associated with that position. This is a simple, long-only moving average crossover strategy, with data loaded from Alpha Vantage:

// Example of exporting data from a Zorro simulation.

#define VOL TradeVar[0]  // readable alias for the trade variable that stores entry volatility

// Trade management function: stores the value passed to enterLong() with the trade
int recordVol(var volatility)
{
    VOL = volatility;
    return 16;  // run the TMF only at trade entry and exit
}

function run()
{
    StartDate = 2007;
    EndDate = 2019;
    BarPeriod = 1440;
    LookBack = 200;
    MaxLong = MaxShort = 1;

    string Name;
    while(Name = loop("AAPL", "MSFT", "GOOGL", "IBM", "MMM", "AMZN", "CAT", "CL"))
    {
        assetHistory(Name, FROM_AV);
        asset(Name);  // select the asset for this loop iteration
        Spread = Commission = Slippage = 0;

        vars close = series(priceClose());
        vars smaFast = series(SMA(close, 10));
        vars smaSlow = series(SMA(close, 50));
        var vol = Moment(series(ROCP(close, 1)), 50, 2);  // rolling 50-period second moment (variance) of returns

        if(crossOver(smaFast, smaSlow))
            enterLong(recordVol, vol);
        else if(crossUnder(smaFast, smaSlow))
            exitLong();

        plot("volatility", vol, NEW, BLUE);
    }

    // after the simulation has finished, write one csv row per trade
    if(is(EXITRUN))
    {
        int count = 0;
        char line[100];
        string filename = "Log\\vol.csv";
        if(file_length(filename))
        {
            printf("\nFound existing file. Deleting.");
            file_delete(filename);
        }
        printf("\n writing vol file...");
        sprintf(line, "Asset, EntryDate, TradeReturn, EntryVol");
        file_append(filename, line);
        for(all_trades)
        {
            sprintf(line, "\n%s, %i, %.6f, %.5f", Asset, ymd(TradeDate), (-2*TradeIsShort+1)*(TradePriceClose-TradePriceOpen)/TradePriceOpen, VOL);
            file_append(filename, line);
            count++;
        }
        printf("\nTrades: %i", count);
    }
}

The general pattern for accomplishing this is:

  1. Define a meaningful name for the element of the TradeVar array that we’ll use to hold our volatility data (the #define VOL line).
  2. Define a Trade Management Function to expose the TRADE struct and use it to assign our variable to our TradeVar (the recordVol function). A return value of 16 tells Zorro to run the TMF only when the position is entered and exited.
  3. Calculate the variable of interest in the Zorro script. Here we calculate a rolling 50-day volatility estimate from returns using Moment (strictly, the second moment, i.e. the variance).
  4. Pass the TMF and the variable of interest to Zorro’s enter function (the enterLong call).
  5. In the EXITRUN (the last thing Zorro does after finishing a simulation), loop through all the positions using a trade enumeration loop and write the details, along with the volatility calculated just prior to entry, to a csv file.

Running this script results in a small csv file, vol.csv, being written to Zorro’s Log folder. It holds one row per trade, with the columns Asset, EntryDate, TradeReturn and EntryVol.

Once we’ve got that data, we can easily read it into our favourite data analysis tool for a closer look. Here, I’ll read it into R and use the tidyverse libraries to dig deeper. (This will be very cursory. You could and should go a lot deeper if this were a serious strategy.)

First, read the data in, and process it by adding a couple of columns that might be interesting:


# analysis of entry volatility and trade return
library(tidyverse)  # provides ggplot2, dplyr and the %>% pipe used below

path <- "C:/Zorro/Log/"
file <- "vol.csv" 
df <- read.csv(paste0(path, file), header=TRUE, stringsAsFactors=FALSE, strip.white=TRUE)

# make some additional columns
df$AbsTradeReturn <- abs(df$TradeReturn)
df$Result <- factor(ifelse(df$TradeReturn>0, "win", "loss"))

If we head the resulting data frame, we find that it looks like this:

#    Asset EntryDate TradeReturn EntryVol AbsTradeReturn Result
# 1  MSFT  20190906   -0.011359  0.00019       0.011359   loss
# 2  MSFT  20190904    0.017583  0.00020       0.017583    win
# 3  MSFT  20190829   -0.015059  0.00020       0.015059   loss
# 4    CL  20190828   -0.010946  0.00017       0.010946   loss
# 5  MSFT  20190819   -0.036269  0.00017       0.036269   loss
# 6 GOOGL  20190712    0.052325  0.00023       0.052325    win

Sweet! Looks like we’re in business!

Now we can start to answer some interesting questions. First, is volatility at the time of entry related to the magnitude of the trade return? Intuitively we’d expect this to be the case, as higher volatility implies larger price swings and therefore larger absolute trade returns:

# is volatility related to the magnitude of the trade return?
ggplot(data=df[, c("AbsTradeReturn", "EntryVol")], aes(x=EntryVol, y=AbsTradeReturn)) +
  geom_point(alpha=0.4) +
  geom_smooth(method="lm", se=TRUE)

Nice! Just what we’d expect to see.
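
To put a number on that relationship rather than relying on the plot alone, one option is a rank correlation. The following is only a sketch under stated assumptions: `toy` is a small made-up data frame standing in for the exported trades (it is not the real vol.csv), and Spearman's correlation is just one reasonable choice of measure, not something the analysis above depends on.

```r
# Made-up values for illustration only; swap in df from vol.csv for real use
toy <- data.frame(
  EntryVol       = c(0.00010, 0.00015, 0.00020, 0.00025, 0.00030, 0.00040),
  AbsTradeReturn = c(0.005, 0.012, 0.009, 0.020, 0.018, 0.035)
)

# Spearman works on ranks, so it is unaffected by the scale of the vol estimate
# and is less sensitive to a few extreme trades than Pearson correlation
ct <- cor.test(toy$EntryVol, toy$AbsTradeReturn, method = "spearman")

rho <- unname(ct$estimate)  # rank correlation coefficient
print(rho)
print(ct$p.value)
```

With the real data you would pass `df$EntryVol` and `df$AbsTradeReturn` instead of the toy columns.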

Does this relationship hold for each individual asset that we traded?

# what about by asset?
ggplot(data=df[, c("AbsTradeReturn", "EntryVol", "Asset")], aes(x=EntryVol, y=AbsTradeReturn)) +
  geom_point(alpha=0.4) +
  geom_smooth(method="lm", se=TRUE) +
  facet_wrap(~Asset)

Looks like the relationship generally holds at the asset level, but note that we have a small sample size so take the results with a grain of salt:

# note that we have a small sample size:
df %>%
  group_by(Asset) %>%
  summarise(n = n())

# Asset     n
# <chr> <int>
# 1 AAPL     36
# 2 AMZN     30
# 3 CAT      38
# 4 CL       47
# 5 GOOGL    36
# 6 IBM      37
# 7 MMM      32
# 8 MSFT     37

Is volatility related to the actual trade return?

# is volatility related to the actual trade return?
ggplot(data=df[, c("TradeReturn", "EntryVol")], aes(x=EntryVol, y=TradeReturn)) +
  geom_point(alpha=0.4) +
  geom_smooth(method="lm", se=TRUE)

Looks like it might be. But this was a long-only strategy that made money in a period where everything went up, so I wouldn’t read too much into this without controlling for that effect.

Is there a significant difference in the entry volatility for winning and losing trades?

# what's the spread of volatility for winning and losing trades?
ggplot(data=df[, c("EntryVol", "Result")], aes(x=Result, y=EntryVol)) +
  geom_boxplot()
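
A plot gives a visual answer to the "significant difference" question; a rank-based test is one way to attach a p-value to it. This is a minimal sketch on made-up numbers: the `toy` data frame is hypothetical, not the exported trades, and the Wilcoxon rank-sum test is an assumption about what a sensible test might be here, not part of the original analysis.

```r
# Made-up entry volatilities for four losing and four winning trades
toy <- data.frame(
  EntryVol = c(0.00012, 0.00018, 0.00021, 0.00030,
               0.00016, 0.00019, 0.00025, 0.00041),
  Result   = factor(c("loss", "loss", "loss", "loss",
                      "win", "win", "win", "win"))
)

# Wilcoxon rank-sum test: do the two groups differ in location?
# It makes no normality assumption, which suits skewed volatility data
wt <- wilcox.test(EntryVol ~ Result, data = toy)

print(wt$statistic)
print(wt$p.value)
```

On the real data, the same call with `data = df` would compare entry volatility across the win and loss groups.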

Finally, we can treat our volatility variable as a “factor” to which our trade returns are exposed. Is this factor useful in predicting trade returns?

First, we’ll need some functions for bucketing our trade results by factor quantile:

# factor functions
get_factor_quantiles <- function(factor_df, n_quantiles = 5, q_type = 7) {
  # number of distinct assets (not used further here)
  n_assets <- factor_df %>% ungroup %>% select(Asset) %>% n_distinct()
  factor_df %>%
    mutate(rank = rank(factor, ties.method='first'),
           quantile = get_quantiles(factor, n_quantiles, q_type))
}

# assign each factor value to a quantile bucket numbered 1 to n_quantiles
get_quantiles <- function(factors, n_quantiles, q_type) {
  cut(factors, quantile(factors, seq(0,1,1/n_quantiles), type = q_type), labels = FALSE, include.lowest = TRUE)
}

If we bucket our results by factor quantile, do any buckets account for significantly more profit and loss? Are there any other interesting relationships?

# if we bucket the vol, do any buckets account for more profit/loss?
factor_df <- df[, c("Asset", "EntryVol", "TradeReturn")]
names(factor_df)[names(factor_df) == "EntryVol"] <- "factor"

quantiles <- get_factor_quantiles(factor_df)
r <- quantiles %>%
  group_by(quantile) %>%
  summarise(MeanTradeReturn = mean(TradeReturn))

ggplot(data=r, aes(quantile, MeanTradeReturn)) +
  geom_col() +
  ggtitle("Returns by volatility quantile")

Looks like there might be something to that fifth quantile (but of course beware the small sample size).

We can retrieve the cutoff value for the fifth quantile by sorting our factor and finding the value four-fifths the length of the resulting vector:

# fifth bin cutoff
sorted <- sort(factor_df$factor)
sorted[as.integer(4/5 * length(sorted))]  # value four-fifths of the way along the sorted vector
# [1] 0.00043
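
An alternative way to get a similar cutoff is to ask `quantile()` for the 80th percentile directly, using the same `type = 7` that `get_factor_quantiles` defaults to. The sketch below runs on made-up factor values (`factor_values` is hypothetical, not the real `factor_df$factor`), so the numbers it prints will not match the cutoff above.

```r
# Made-up factor values for illustration only
factor_values <- c(0.00010, 0.00012, 0.00015, 0.00017, 0.00019,
                   0.00020, 0.00023, 0.00030, 0.00043, 0.00060)

# 80th percentile: the lower edge of the fifth bucket when n_quantiles = 5
cutoff <- unname(quantile(factor_values, probs = 0.8, type = 7))
print(cutoff)

# values strictly above the cutoff are the ones cut() places in the top bucket
top_bucket <- factor_values[factor_values > cutoff]
print(top_bucket)
```

Because `quantile()` interpolates between neighbouring observations, its result can differ slightly from the value picked by indexing into the sorted vector.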


There you have it. This was a simple example of exporting potentially relevant data from a Zorro simulation and reading it into a data analysis package for further research.

How might you apply this approach to more serious strategies? What data do you think is potentially relevant? Tell us your thoughts in the comments.

Comment (1)

September 13, 2019 at 12:26 pm

Hi,
I’m thinking along these lines: 1. Calculate the maximum favourable excursion minus the maximum adverse excursion, and the high minus the low, for each bar, then use mutual information to measure the dependence. 2. Apply the same method to each closed trade. Is this possible? Does it make sense?
