When tinkering with trading ideas, have you ever wondered whether a certain variable might be correlated with the success of the trade?
For instance, maybe you wonder whether your strategy tends to do better when volatility is high.
In this case, you can get very binary feedback by, say, running backtests with and without a volatility filter.
But this can mask interesting insights that might surface if the relationship could be explored in more detail.
Zorro has some neat tools that allow us to associate data of interest with particular trading decisions, and then export that data for further analysis. Here’s how it works:
Zorro implements a TRADE struct for holding information related to a particular position. This struct is a data container which holds information about each trade throughout the life of our simulation. We can also add our own data to this struct via the TradeVar array, which we can populate with values associated with a particular trade. Zorro stores this array, along with all the other information about each position, as members of the TRADE struct. We can access the TRADE struct members in two ways: inside a trade management function (TMF) and inside a trade enumeration loop.
Here’s an example of exporting the last estimated volatility at the time a position was entered, along with the return associated with that position (this is a simple, long-only moving average crossover strategy; data is loaded from Alpha Vantage):
/* Example of exporting data from a Zorro simulation. */

// alias for the TRADE struct's first user variable
#define VOL TradeVar[0]

// trade management function: records the volatility passed in at entry
int recordVol(var volatility)
{
	VOL = volatility;
	return 16;	// run the TMF only when the position is entered and exited
}

function run()
{
	set(PLOTNOW);
	StartDate = 2007;
	EndDate = 2019;
	BarPeriod = 1440;
	LookBack = 200;
	MaxLong = MaxShort = 1;

	string Name;
	while(Name = loop("AAPL", "MSFT", "GOOGL", "IBM", "MMM", "AMZN", "CAT", "CL"))
	{
		assetHistory(Name, FROM_AV);
		asset(Name);
		Spread = Commission = Slippage = 0;

		vars close = series(priceClose());
		vars smaFast = series(SMA(close, 10));
		vars smaSlow = series(SMA(close, 50));
		var vol = Moment(series(ROCP(close, 1)), 50, 2); // rolling 50-period standard deviation of returns

		if(crossOver(smaFast, smaSlow))
		{
			enterLong(recordVol, vol);	// pass the TMF and the volatility to record
		}
		else if(crossUnder(smaFast, smaSlow))
		{
			exitLong();
		}

		plot("volatility", vol, NEW, BLUE);
	}

	if(is(EXITRUN))
	{
		int count = 0;
		char line[100];
		string filename = "Log\\vol.csv";

		if(file_length(filename))
		{
			printf("\nFound existing file. Deleting.");
			file_delete(filename);
		}

		printf("\n writing vol file...");
		sprintf(line, "Asset, EntryDate, TradeReturn, EntryVol");
		file_append(filename, line);

		// trade enumeration loop: write one row per closed trade
		for(closed_trades)
		{
			sprintf(line, "\n%s, %i, %.6f, %.5f",
				Asset, ymd(TradeDate),
				(-2*TradeIsShort+1)*(TradePriceClose-TradePriceOpen)/TradePriceOpen,
				VOL);
			file_append(filename, line);
			count++;
		}

		printf("\nTrades: %i", count);
	}
}
The general pattern for accomplishing this is:
- Define a meaningful name for the element of the TradeVar array that we’ll use to hold our volatility data (the #define at the top of the script).
- Define a trade management function to expose the TRADE struct and use it to assign our variable to our TradeVar element (the recordVol function). A return value of 16 tells Zorro to run the TMF only when the position is entered and exited.
- Calculate the variable of interest in the Zorro script. Here we calculate the rolling 50-day standard deviation of returns (the Moment call in the run function).
- Pass the TMF and the variable of interest to Zorro’s enterLong call.
- In the EXITRUN (the final run Zorro performs after the simulation finishes), loop through all the closed positions using a trade enumeration loop and write their details, along with the volatility calculated just prior to entry, to a CSV file.
Running this script results in a small CSV file being written to Zorro’s Log folder.
Once we’ve got that data, we can easily read it into our favourite data analysis tool for a closer look. Here, I’ll read it into R and use the tidyverse libraries to dig deeper. (This will be very cursory; you could and should go a lot deeper if this were a serious strategy.)
First, read the data in, and process it by adding a couple of columns that might be interesting:
library(ggplot2)
library(tidyverse)

# analysis of entry volatility and trade profit
path <- "C:/Zorro/Log/"
file <- "vol.csv"

df <- read.csv(paste0(path, file), header=TRUE, stringsAsFactors=FALSE, strip.white=TRUE)

# make some additional columns
df$AbsTradeReturn <- abs(df$TradeReturn)
df$Result <- factor(ifelse(df$TradeReturn>0, "win", "loss"))
If we head() the resulting data frame, we find that it looks like this:
head(df)
#   Asset EntryDate TradeReturn EntryVol AbsTradeReturn Result
# 1  MSFT  20190906   -0.011359  0.00019       0.011359   loss
# 2  MSFT  20190904    0.017583  0.00020       0.017583    win
# 3  MSFT  20190829   -0.015059  0.00020       0.015059   loss
# 4    CL  20190828   -0.010946  0.00017       0.010946   loss
# 5  MSFT  20190819   -0.036269  0.00017       0.036269   loss
# 6 GOOGL  20190712    0.052325  0.00023       0.052325    win
Sweet! Looks like we’re in business!
Now we can start to answer some interesting questions. First, is volatility at the time of entry related to the magnitude of the trade return? Intuitively we’d expect this to be the case, as higher volatility implies larger price swings and therefore larger absolute trade returns:
# is volatility related to the magnitude of the trade return?
ggplot(data=df[, c("AbsTradeReturn", "EntryVol")], aes(x=EntryVol, y=AbsTradeReturn)) +
  geom_point(alpha=0.4) +
  geom_smooth(method="lm", se=TRUE)
Nice! Just what we’d expect to see.
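If we wanted to put a rough number on that relationship (a quick extra check on my part, not something the plot itself gives us), a simple correlation test would do, keeping in mind the sample-size caveats further below:

# quantify the association between entry volatility and absolute trade return
cor.test(df$EntryVol, df$AbsTradeReturn)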
Does this relationship hold for each individual asset that we traded?
# what about by asset?
ggplot(data=df[, c("AbsTradeReturn", "EntryVol", "Asset")], aes(x=EntryVol, y=AbsTradeReturn)) +
  geom_point(alpha=0.4) +
  geom_smooth(method="lm", se=TRUE) +
  facet_wrap(~Asset)
Looks like the relationship generally holds at the asset level, but note that we have a small sample size so take the results with a grain of salt:
# note that we have a small sample size:
df %>%
  group_by(Asset) %>%
  count()
#   Asset     n
#   <chr> <int>
# 1 AAPL     36
# 2 AMZN     30
# 3 CAT      38
# 4 CL       47
# 5 GOOGL    36
# 6 IBM      37
# 7 MMM      32
# 8 MSFT     37
Is volatility related to the actual trade return?
# is volatility related to the actual trade return?
ggplot(data=df[, c("TradeReturn", "EntryVol")], aes(x=EntryVol, y=TradeReturn)) +
  geom_point(alpha=0.4) +
  geom_smooth(method="lm", se=TRUE)
Looks like it might be. But this was a long-only strategy that made money in a period where everything went up, so I wouldn’t read too much into this without controlling for that effect.
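One rough sketch of such a control (this is just an illustration, not a rigorous treatment) is to pull the entry year out of the yyyymmdd EntryDate column and include it, along with the asset, as dummy variables in a simple linear model, then see whether entry volatility still carries any weight:

# crude control for the period effect: derive the entry year from the yyyymmdd EntryDate
df$EntryYear <- factor(df$EntryDate %/% 10000)

# trade return on entry volatility, with year and asset dummies
summary(lm(TradeReturn ~ EntryVol + EntryYear + Asset, data=df))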
Is there a significant difference in the entry volatility for winning and losing trades?
# what's the spread of volatility for winning and losing trades?
ggplot(data=df[, c("EntryVol", "Result")], aes(x=Result, y=EntryVol)) +
  geom_boxplot()
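The boxplot only gives a visual answer to the "significant difference" question. For a more formal check (again, an extra step of mine, and the small-sample caveat applies), a non-parametric rank-sum test is one simple option:

# test whether entry volatility differs between winning and losing trades
wilcox.test(EntryVol ~ Result, data=df)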
Finally, we can treat our volatility variable as a “factor” to which our trade returns are exposed. Is this factor useful in predicting trade returns?
First, we’ll need some functions for bucketing our trade results by factor quantile:
# factor functions
get_factor_quantiles <- function(factor_df, n_quantiles = 5, q_type = 7) {
  # number of distinct assets in the factor data (not used further below)
  n_assets <- factor_df %>%
    ungroup %>%
    select(Asset) %>%
    n_distinct()

  # rank each observation by its factor value and assign it to a quantile bucket
  factor_df %>%
    mutate(rank = rank(factor, ties.method='first'),
           quantile = get_quantiles(factor, n_quantiles, q_type))
}

get_quantiles <- function(factors, n_quantiles, q_type) {
  # bucket the factor values using quantile breakpoints
  cut(factors, quantile(factors, seq(0, 1, 1/n_quantiles), type = q_type), FALSE, TRUE)
}
If we bucket our results by factor quantile, do any buckets account for significantly more profit and loss? Are there any other interesting relationships?
# if we bucket the vol, do any buckets account for more profit/loss?
factor_df <- df[, c("Asset", "EntryVol", "TradeReturn")]
names(factor_df)[names(factor_df) == "EntryVol"] <- "factor"

quantiles <- get_factor_quantiles(factor_df)

r <- quantiles %>%
  group_by(quantile) %>%
  summarise(MeanTradeReturn=mean(TradeReturn))

ggplot(data=r, aes(quantile, MeanTradeReturn)) +
  geom_col() +
  ggtitle("Returns by volatility quantile")
Looks like there might be something to that fifth quantile (but of course beware the small sample size).
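To see just how thin those buckets are (an extra check, not shown in the original chart), we can count the trades in each quantile and attach a rough standard error to the mean return:

# trades per quantile, with mean return and a rough standard error
quantiles %>%
  group_by(quantile) %>%
  summarise(n = n(),
            MeanTradeReturn = mean(TradeReturn),
            SE = sd(TradeReturn)/sqrt(n()))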
We can retrieve the cutoff value for the fifth quantile by sorting our factor and finding the value four-fifths the length of the resulting vector:
# fifth bin cutoff
sorted <- sort(factor_df$factor)
sorted[as.integer(4/5*length(sorted))]
# [1] 0.00043
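An equivalent and more direct route, assuming the default quantile type used in get_factor_quantiles, is to ask quantile() for the 80th percentile, which is the breakpoint that cut() used for the fifth bucket (it may differ very slightly from the sorted-vector approximation above):

# lower edge of the fifth bucket (type 7 matches the default in get_factor_quantiles)
quantile(factor_df$factor, 0.8, type = 7)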
Conclusion
There you have it. This was a simple example of exporting potentially relevant data from a Zorro simulation and reading it into a data analysis package for further research.
How might you apply this approach to more serious strategies? What data do you think is potentially relevant? Tell us your thoughts in the comments.
Want a broader understanding of algorithmic trading? See why it’s the only sustainable approach to profiting from the markets and how you can put it to work for you inside the free Algo Basics PDF.
Hi,
I’m thinking along these lines: (1) calculate the maximum favourable excursion minus the maximum adverse excursion, and the high minus the low, for each bar, then use mutual information to measure the dependence between them; (2) apply the same method for each closed trade. Is this possible? Does it make sense?