The post Kalman Filter Pairs Trading with Zorro and R appeared first on Robot Wealth.

So far in this series, we have:

- Implemented a Kalman filter in R
- Implemented a simple pairs trading algorithm in Zorro
- Connected Zorro and R and exchanged data between the two platforms

In this fourth and final post, we’re going to put it all together and develop a pairs trading script that uses Zorro for all the simulation aspects (data handling, position tracking, performance reporting and the like) and our Kalman implementation for updating the hedge ratio in real-time.

You can download the exact script used in this post for free at the bottom of the page. Let’s go!

Encapsulating our Kalman routine in a function makes it easy to call from our Zorro script – it reduces the call to a single line of code.

Save the following R script, which implements the iterative Kalman operations using data sent from Zorro, in your Zorro strategy folder:

```r
###### KALMAN FILTER #######

delta <- 0.0001
Vw <- delta/(1-delta)*diag(2)
Ve <- 0.01

R <- matrix(rep(0, 4), nrow=2)
P <- matrix(rep(0, 4), nrow=2)

kalman_iterator <- function(y, x, beta) {
  beta <- matrix(c(beta, 0), nrow=1)
  x <- matrix(c(x, 1), nrow=1)

  R <<- P + Vw                        # state cov prediction
  y_est <- x[1, ] %*% beta[1, ]       # measurement prediction
  Q <- x[1, ] %*% R %*% x[1, ] + Ve   # measurement variance prediction

  # error between observation of y and prediction
  e <- y - y_est
  K <- R %*% t(x) / drop(Q)           # Kalman gain

  # state update
  beta <- beta[1, ] + K * e[1, ]
  P <<- R - K %*% x[1, ] %*% R

  return(list(beta[1], e, Q))
}
```

Recall that this implementation of the Kalman filter is *almost* parameterless. There are, however, two parameters that impact the speed at which the hedge ratio is updated by the Kalman algorithm: `delta` (line 3) and `Ve` (line 5).

You can experiment with these parameters, but note that changes here will generally require changes in the Zorro script, such as the spacing between trade levels (more on this below).

Experimentation is a good thing (it’s useful to understand how these parameters impact the algorithm), but a nice, stable pair trade should be relatively robust to changes in these parameters. A pair that depends on just the right values of these parameters is one I’d think twice about trading.

Having said that, a sensible use of these parameters is to adjust the trade frequency of your pairs in line with transaction costs and risk management approach (not to optimise the strategy’s backtested performance).

Here’s our simple pairs trading script modified to call the Kalman iterator function to update the hedge ratio. To experiment with this Zorro script you’ll need:

- an Alpha Vantage API key (we load price history directly from Alpha Vantage)
- to set up trading conditions in a Zorro assets list (although if you don’t want to model costs, you don’t need to do this)

```c
/* KALMAN PAIRS TRADING */

#include <r.h>

#define Asset1 "GDX"
#define Asset2 "GLD"
#define MaxTrades 1
#define Spacing 1
// #define COSTS

int Portfolio_Units = 1000; //units of the portfolio to buy/sell (more --> better fidelity to hedge ratio)

var calculate_spread(var hedge_ratio)
{
    var spread = 0;
    asset(Asset1);
#ifndef COSTS
    Spread = Commission = Slippage = 0;
#endif
    spread += priceClose();
    asset(Asset2);
#ifndef COSTS
    Spread = Commission = Slippage = 0;
#endif
    spread -= hedge_ratio*priceClose();
    return spread;
}

function run()
{
    set(PLOTNOW);
    setf(PlotMode, PL_FINE);
    StartDate = 20060525;
    EndDate = 2019;
    BarPeriod = 1440;
    LookBack = 1;
    MaxLong = MaxShort = MaxTrades;

    // ---------------------------------------
    // Startup and data loading
    // ---------------------------------------
    if(is(INITRUN))
    {
        // start R and source the kalman iterator function
        if(!Rstart("kalman.R", 2))
        {
            print("Error - can't start R session!");
            quit();
        }

        // load data from Alpha Vantage
        string Name;
        int n = 0;
        while(Name = loop(Asset1, Asset2))
        {
            assetHistory(Name, FROM_AV);
            n++;
        }
    }

    // ---------------------------------------
    // calculate hedge ratio and trade levels
    // ---------------------------------------
    asset(Asset1);
#ifndef COSTS
    Spread = Commission = Slippage = 0;
#endif
    vars prices1 = series(priceClose());
    asset(Asset2);
#ifndef COSTS
    Spread = Commission = Slippage = 0;
#endif
    vars prices2 = series(priceClose());

    static var beta;
    if(is(INITRUN)) beta = 0;

    // use kalman iterator to calculate parameters
    Rset("y", prices1[0]);
    Rset("x", prices2[0]);
    Rset("beta", beta);
    Rx("kalman <- kalman_iterator(y, x, beta)");
    beta = Rd("kalman[[1]][1]");
    vars e = series(Rd("kalman[[2]]"));
    var Q = Rd("kalman[[3]]");

    // set up trade levels
    var Levels[MaxTrades];
    int i;
    for(i=0; i<MaxTrades; i++)
    {
        Levels[i] = (i+1)*Spacing*sqrt(Q);
    }

    // ---------------------------------------
    // trade logic
    // ---------------------------------------
    // enter positions at defined levels
    for(i=0; i<MaxTrades; i++)
    {
        if(crossUnder(e, -Levels[i]))
        {
            asset(Asset1);
            Lots = Portfolio_Units;
            enterLong();
            asset(Asset2);
            Lots = Portfolio_Units * beta;
            enterShort();
        }
        if(crossOver(e, Levels[i]))
        {
            asset(Asset1);
            Lots = Portfolio_Units;
            enterShort();
            asset(Asset2);
            Lots = Portfolio_Units * beta;
            enterLong();
        }
    }

    // exit positions at defined levels
    for(i=1; i<MaxTrades-1; i++)
    {
        if(crossOver(e, -Levels[i]))
        {
            asset(Asset1);
            exitLong(0, 0, Portfolio_Units);
            asset(Asset2);
            exitShort(0, 0, Portfolio_Units * beta);
        }
        if(crossUnder(e, Levels[i]))
        {
            asset(Asset1);
            exitShort(0, 0, Portfolio_Units);
            asset(Asset2);
            exitLong(0, 0, Portfolio_Units * beta);
        }
    }

    // ---------------------------------------
    // plots
    // ---------------------------------------
    plot("beta", beta, NEW, PURPLE);
    if(abs(e[0]) < 20)
    {
        plot("error", e, NEW, BLUE);
        int i;
        for(i=0; i<MaxTrades; i++)
        {
            plot(strf("#level_%d", i), Levels[i], 0, BLACK);
            plot(strf("#neglevel_%d", i), -Levels[i], 0, BLACK);
        }
    }
}
```

As in our original vectorised backtest, this strategy is always in the market. It enters a long position when the prediction error of the Kalman filter drops below its minus one standard deviation level, and holds it until the prediction error crosses above its plus one standard deviation level, at which point the trade is reversed and a short position held.

This is not the optimal way to trade a spread, so we’ve left the door open to trade at multiple levels (the `MaxTrades` define) with a user-specified spacing between levels (the `Spacing` define).

*But before we get to that, there’s an important box we need to tick…*

Before we go further, we’ll aim to reproduce the results we got in the vectorised backtest we wrote in R way back in the first post of this series. That way, we can validate that our Zorro setup is working as expected.

This is an important (and easily overlooked) step because we’ll surely tinker with the strategy implementation (Zorro is *really* useful for efficiently doing that sort of experimentation), and we need to have confidence in our setup before we make any changes, do further research, and make decisions based on what we find.

*If you’ve ever had to rewind a whole bunch of research because of a faulty implementation at the outset, you know what I’m talking about….*

We’d expect some differences since Zorro provides an event-driven sequential backtester with very different assumptions to my hacky vectorised backtest. But we should see consistency in the hedge ratio, the positions taken, and the shape of the equity curve.

Here’s the Zorro output when we trade at one standard deviation of the prediction errors:

The hedge ratio, prediction errors, positions and equity curve shape all look very similar to the original vectorised R version.

We also ran a more aggressive version through our vectorised backtester, which traded at half a standard deviation of the prediction errors. Here’s what that looks like in Zorro (simply change the spacing define to `#define Spacing 0.5`):

Again, virtually identical to the output of our vectorised backtest.

*I’m calling that a win. Time to move on to some fun stuff.*

There are a bunch of things we can try with our pairs trading implementation. A few of them include:

- Exiting positions when the prediction error crosses zero
- Limiting the hold time of individual positions (that is, closing out early if the spread hasn’t converged fast enough)
- Entering at multiple levels
- Using more or less aggressive entry level spacing

Here’s an example of trading quite aggressively every 0.25 standard deviations of the prediction error, up to a maximum of eight levels:

Of course, when you trade like this you’re going to pay a ton in fees. But it gives you a taste of the sorts of things you can experiment with using this framework.

This concludes our mini-series on pairs trading with Zorro and R via the Kalman filter. We saw how you might:

- Implement the Kalman filter in R
- Implement a pairs trading algorithm in Zorro
- Make Zorro and R talk to one another
- Put it all together in an integrated pairs trading strategy

We’d love to know what you thought of the series in the comments. In particular, can you suggest any pairs you’d like to see us test? Can you suggest any improvements to the pairs trading algorithm itself? Are there any other approaches you’d like us to implement or test?

Thanks for reading!


The post Integrating R with the Zorro Backtesting and Execution Platform appeared first on Robot Wealth.

The goal is to get the best of both worlds and use our dynamic hedge ratio within the Zorro script.

Rather than implement the Kalman filter in Lite-C, it’s much easier to make use of Zorro’s R bridge, which facilitates easy communication between the two applications. In this post, we’ll provide a walk-through of configuring Zorro and R to exchange data with one another.

While Zorro and R are useful as standalone tools, they have different strengths and weaknesses.

Zorro was built to simulate trading strategies, and it does this very well. It’s fast and accurate. It lets you focus on your strategies by handling the nuts and bolts of simulation behind the scenes. It implements various tools of interest to traders, such as portfolio optimization and walk-forward analysis, and was designed to prevent common bugs, like lookahead bias.

Zorro does a lot, **but it can’t do everything.**

An overlooked aspect of the software is its ability to integrate R and its thousands of add-on libraries. From machine learning and artificial intelligence to financial modeling, optimization, and graphics, R packages have been developed to cover all these fields and more. And since R is widely used in academia, when a researcher develops a new algorithm or tool it is often implemented as an open source R package long before it appears in commercial or other open-source software.

Zorro’s R bridge unlocks these tools for your trading applications and combines them with Zorro’s fast and accurate market simulation features.

In this post, I’ll show you how to set up and use Zorro’s R bridge. Once that’s out of the way, we’ll be in a position to put all the pieces together and run a simulation of our pairs trade that uses the Kalman filter we wrote for R.

Zorro’s R bridge is designed to enable a Zorro script to control and communicate with an R environment running on the same machine. The assumption is that the user will want to send market data (sometimes lots of it) from Zorro to R for processing, and then return the output of that processing, usually consisting of just one or a small number of results, back to Zorro.

Lite-C is generally much faster than R code, so it’s preferable to perform as much computation on the Zorro side as possible, reserving R for computations that are difficult or inconvenient to implement in Zorro. Certainly, you’ll want to avoid doing any looping in R. Having said that, vector and matrix operations are no problem for R, and might even run quicker than in Lite-C.

Zorro orders time series data differently to most platforms – newest elements *first*. R’s functions generally expect time series with newest elements *last*. Fortunately, Zorro implements the `rev` function for reversing the order of a time series, which we’ll need to use prior to sending data across to R. I’ll show you an example of how this works.

Finally, debugging R bridge functions requires a little care. For example, executing an R statement with a syntax error from Zorro will cause the R session to fall over and subsequent commands to also fail – sometimes silently. For basic debugging, you can return R output to Zorro’s GUI or use a debugging tool, as well as use an R bridge function for checking that the R session is still “alive” (more on these below). But it always pays to execute R commands in the R console before setting them loose from a Lite-C script.

Assuming you have Zorro installed, here’s a walk-through of configuring Zorro and R to talk to one another.

Go to http://cran.r-project.org and install R.

Open *Zorro/Zorro.ini* (or *Zorro/ZorroFix.ini* if using the persistent version of the configuration file) and enter the path to *RTerm.exe* for the `RTermPath` variable. This tells Zorro how to start an R session.

Here’s an example of the location of *RTerm.exe*:

And here’s how the `RTermPath` setting in *Zorro.ini* might look:

Of course, the path to *RTerm.exe* will be specific to your machine.

In *Zorro/Strategy*, you’ll find a script named *RTest.c*. Open a Zorro instance, select this script, and press *Test*. If R is installed correctly and your *Zorro.ini* settings are correct, you should get output that looks like this:

If that worked as expected, then you’re ready to incorporate R functionality in your Zorro scripts. If the test script failed, most likely the path specified in *Zorro.ini* is incorrect.

*Next, we’ll run through a brief tutorial with examples on using the R bridge functions.*

**Add the `r.h` header file to your Zorro script**

To use the R bridge in your script, you need to include the `r.h` header file. Simply add this line at the beginning of your Zorro script:

```c
#include <r.h>
```

In order to use the other R bridge functions, run `Rstart()` in the Zorro script’s `INITRUN`. Here’s the function’s general form:

```c
Rstart(string source, int debuglevel)
```

Both parameters are optional.

`source` is an R file that is sourced at start up, and loads any predefined functions, variables or data that your R session will use.

We can also specify the optional `debuglevel` argument, which takes an integer value of either 0, 1, or 2 (0 by default) defining the verbosity of any R output, such as errors and warnings:

- **0:** output fatal errors only
- **1:** output fatal errors, warnings and notes
- **2:** output every R message (this is like the output you see in the R console).

You can use Microsoft’s Debug View tool to see the output of the R session. There’s a more convenient way to display the output of the R session directly in the Zorro GUI too – more on this shortly.

`Rstart()` returns zero if the R session could not be started, and returns non-zero otherwise. Therefore, we can use `Rstart()` to check that the R session started.

This next script attempts to start a new R session via `Rstart()`, but raises the alarm and quits if unsuccessful.

```c
#include <r.h>

function run()
{
    if(!Rstart("", 2))
    {
        print("Error - could not start R session!");
        quit();
    }
}
```

`Rrun()` checks the status of the R session and returns 0 if the session has been terminated or has failed, 1 if the session is ready for input, and 2 if the session is busy with a computation or operation. Use it regularly!

The R session will terminate upon encountering any fatal error (a syntax error, unexpected data, or other issues that crop up in real time). But here’s the thing: *if the R session is terminated, the R bridge simply stops sending messages and silently ignores further commands.*

That means that your script will only throw an error if some Lite-C computation depends on data that wasn’t received back from the R bridge.

It’s a bad idea to assume that this will be picked up, so use `Rrun()` to check the status of your R connection – typically you’ll want to do this at the end of every bar in a backtest, and possibly prior to critical computations, raising an appropriate error when a failure is detected.

The script below builds on the previous example to also include a call to `Rrun()` every bar:

```c
#include <r.h>

function run()
{
    if(!Rstart("", 2))
    {
        print("Error - could not start R session!");
        quit();
    }

    if(!Rrun())
    {
        print("Error - R session has been terminated!");
        quit();
    }
}
```

`Rx(string code, int mode)` is a powerful function – it enables the execution of a line of R code directly from a Lite-C script. We simply provide the R code as a string (the `code` argument, which can be up to 1,000 characters in length). Optionally, we can provide `mode`, which specifies how `Rx()` passes control back to Zorro during execution of `code` in R.

Normally, the Zorro GUI is unresponsive while the R bridge is busy; `mode` can modify this behaviour. It takes the following values:

- **0:** Execute code synchronously (that is, freeze Zorro until the computation is finished). This is the default behaviour.
- **1:** Execute code asynchronously, returning immediately and continuing to execute the Lite-C script. Since the R bridge can only handle one request at a time, you’ll need to use `Rrun()` to determine when the next command can be sent to the R session. This is useful when you want to run R and Zorro computations in parallel.
- **2:** Execute code asynchronously, enabling the user to access the Zorro GUI buttons, and returning 1 when `code` has finished executing and 0 when an error is encountered or the **[Stop]** button on the Zorro GUI is pressed. This is useful when your R computations take a long time, and you think you might want to interrupt them with the **[Stop]** button.
- **3:** Execute code asynchronously, like `mode = 2`, but also printing R output to Zorro’s message window. The verbosity of this output is controlled by the `debuglevel` argument to `Rstart()`; in order to output everything (that is, mimic the output of the R console), set `debuglevel` to 2. This is a convenient alternative to using the Debug Tool mentioned above.

Here’s a script that runs two lines of R code: one line that generates a vector of random normal numbers and calculates its mean; and another that prints the mean, returning the value to the Zorro GUI.

```c
#include <r.h>

function run()
{
    if(!Rstart("", 2)) //enable verbose output
    {
        print("Error - could not start R session!");
        quit();
    }

    Rx("x <- mean(rnorm(100, 0, 1))", 0); //default mode: wait until R code completes
    Rx("print(x)", 3); //execute asynchronously, print output to debug view and Zorro GUI window

    if(!Rrun())
    {
        print("Error - R session has been terminated!");
        quit();
    }
}
```

Here’s the output in the Zorro GUI:

You can see that with every iteration of the `run` function, Zorro tells R to generate a new vector of random numbers – hence the changing mean.

To send data from Zorro to R, use `Rset(string name, data_type, data_length)`. On the R side, the data will be stored in a variable named `name`.

The actual usage of `Rset()` depends on what type of data is being sent from Zorro: a single int, a single float, or an array (or series) of float type variables. The latter can be sent to R as either a vector or a two-dimensional matrix.

When sending a single int or float to R, we simply specify the name of that variable. For sending arrays, we need to specify a pointer to the array and either the number of elements (for sending the array to R as a vector) or the number of rows and columns (for sending the array to R as a matrix).

Specifying a pointer is not as scary as it sounds; in Lite-C we can simply use the name of the array or series, as these are by definition pointers to the actual variables.

Here are some examples of sending the different data types from Zorro to R:

```c
#include <r.h>

function run()
{
    if(!Rstart("", 2)) //enable verbose output
    {
        print("Error - could not start R session!");
        quit();
    }

    // make some variables
    int today = dow();
    var last_return = 0.003;
    var my_params[5] = {2.5, 3.0, 3.5, 4.0, 4.5};

    // send those variables to R
    Rset("my_day", today);
    Rset("last_ret", last_return);
    Rset("params", my_params, 5); //specify number of elements

    // operate on those variables in the R session
    Rx("if(my_day == 1) x <- last_ret * params[1] else x <- 0", 0); //note params[1] is my_params[0] due to R's 1-based indexing and C's 0-based
    Rx("print(x)", 3);

    if(!Rrun())
    {
        print("Error - R session has been terminated!");
        quit();
    }
}
```

In lines 11-14, we create some arbitrary variables named `today` (an int), `last_return` (a float) and `my_params` (an array of float). In lines 16-19, we send those variables to the R session, assigning them to R objects named `my_day`, `last_ret`, and `params` respectively. When we send the array `my_params` to the R session, we have to specify the number of elements in the array.

In line 22, we perform an operation on the variables in our R session. Note that R’s indexing is one-based, while C’s is zero-based, so if we want to access the value associated with `my_params[0]` in the R session, we need to use `params[1]`.

Here’s an example of the output:

Sending price data (or other time series, such as returns, indicators, and the like) follows a process like the one shown above, but there are one or two issues you need to be aware of.

First, during the lookback period, the values of such time series are undefined. Sending an undefined value via the R bridge will cause a fatal error and the subsequent termination of the R session. To get around this issue, we can wrap our calls to `Rset()` in an `if` condition that evaluates to true outside the lookback period: `if(!is(LOOKBACK))`.

The other problem is that Zorro’s time series are constructed with the newest values *first*. R functions expect time series data in chronological order with the newest elements *last*. That means that we need to reverse the order of our Zorro time series before sending them to R.

This is fairly painless, since Zorro implements the `rev()` function for that very purpose. Simply provide `rev()` with the time series to be reversed, and optionally the number of values to be sent to R (if this argument is omitted, `LookBack` values are used instead).

Here’s an example of sending price data to R that deals with these two issues:

```c
#include <r.h>

function run()
{
    if(!Rstart("", 2)) //enable verbose output
    {
        print("Error - could not start R session!");
        quit();
    }

    vars Close = series(priceClose());
    int size = 20;
    vars revClose = rev(Close, size);

    if(!is(LOOKBACK))
    {
        printf("\n#########\nZorro's most recent close:\n%.5f", Close[0]);
        Rset("closes", revClose, size);
        Rx("last_close <- round(closes[length(closes)],5)", 0);
        printf("\nR's most recent close:\n");
        Rx("print(last_close, 5)", 3);
    }

    if(!Rrun())
    {
        print("Error - R session has been terminated!");
        quit();
    }
}
```

Here’s an example of the output:

The three functions `Ri()`, `Rd()` and `Rv()` evaluate a given R expression, much like `Rx()`, but they return the result of the expression back to the Zorro session as an int, a float, or a vector respectively. We can supply any variable, valid R code or function, so long as it evaluates to the correct variable type.

`Ri()` and `Rd()` work in much the same way: we only need to supply an R expression as a string, and the functions return the result of the expression. This means that in the Lite-C script, we can set a variable using the output of `Ri()` or `Rd()`. For example, to define the variable `my_var` and use it to store the mean of the R vector `my_data`, we would do:

```c
var my_var = Rd("mean(my_data)");
```

`Rv()` works in a slightly different way. We supply as arguments an R expression that evaluates to a vector, a pointer to the Lite-C var array to be filled with the results of the R expression, and the number of elements in the vector.

Here’s an example where we fill the float array `my_vector` with the output of R’s `rnorm()` function (which produces a vector of normally distributed random variables of a given length, mean and standard deviation):

```c
var my_vector[10];
Rv("rnorm(10, 0, 1)", my_vector, 10);
```

Here’s an example where we put both of these together – we populate a vector in our Lite-C script with some random numbers generated in R. Then we send that vector back to R to calculate the mean before printing the results. Of course this is a very convoluted way to get some random numbers and their mean, but it illustrates the point:

```c
#include <r.h>

function run()
{
    if(!Rstart("", 2)) //enable verbose output
    {
        print("Error - could not start R session!");
        quit();
    }

    var my_vector[10]; // initialise array of float

    if(!is(LOOKBACK))
    {
        Rv("rnorm(10, 0, 1)", my_vector, 10);
        Rset("my_data", my_vector, 10);
        var my_mean = Rd("mean(my_data)");

        int i;
        printf("\n#################");
        for(i=0; i<10; i++)
        {
            printf("\nmy_vector[%i]: %.3f", i, my_vector[i]);
        }
        printf("\nmean: %.3f", my_mean);
    }

    if(!Rrun())
    {
        print("Error - R session has been terminated!");
        quit();
    }
}
```

And here’s the output:

The intent of Zorro’s R bridge is to:

- Facilitate the sending of large amounts of data from Zorro to R,
- Enable analysis of this data in R by executing R code from the Lite-C script, and
- Return single numbers or vectors from R to Zorro.

With that in mind, it makes sense to do as much of the data acquisition, cleaning and processing on the Lite-C side as possible. Save the R session for analysis that requires the use of specialized packages or functions not available in Zorro.

In particular, avoid executing loops in R (these can be painfully slow). But if operations can be vectorised, they may be more efficiently performed in R.

It is wise to test the R commands you supply to `Rx()`, `Ri()`, `Rd()`, and `Rv()` in an R console prior to running them in a Lite-C script. Any syntax error or bad data will cause the R session to terminate and all subsequent R commands to fail – potentially without raising a visible error. For that reason, use the `Rrun()` function regularly (at least once per bar) and keep an eye on the Debug View tool’s output, or the Zorro GUI.

A frozen Zorro instance is often indicative of an incomplete R command, such as a missing bracket. Such a mistake will not throw an error, but R will wait for the final bracket, causing Zorro to freeze.

Another common error is to attempt to load an R package that hasn’t been installed. This will cause the R session to terminate, so make sure your required packages are all installed before trying to load them. The source of the resulting error may not be immediately obvious, so keep an eye on the Debug View tool’s output.

Depending on your setup, *the packages available to your R terminal may not be the same as those available in your R Studio environment* (if you’re using that particular IDE).

Here’s a short R script that specifies some arbitrary required packages, checks if they are installed, and attempts to install them from CRAN if they are not already installed:

```r
required.packages <- c('deepnet', 'caret', 'kernlab')
new.packages <- required.packages[!(required.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages, repos='https://cran.us.r-project.org')
```

You can include this script in the file specified as the `source` parameter to `Rstart()` to ensure that your required packages are always present.

Another common issue with the R bridge arises from passing *backslashes* in file names from Lite-C to R. R uses *forward slashes* instead. You can modify these manually, or use Zorro’s `slash()` function, which automatically converts all backslashes in a string to forward slashes. For example, `slash(ZorroFolder)` returns the file path to the Zorro folder as a string, with forward slashes instead of backslashes.

OK, that was a lengthy tutorial, but it will be worth it!

So far we’ve used fairly simple R functions – stuff that you can easily do in Lite-C, like calculating the mean of a bunch of numbers. But in the next post, we’ll put together our Zorro pairs trading script that makes use of the Kalman filter that we wrote in R.

**More importantly, if you can master the R bridge functions we’ve discussed, you’ll be able to use any R tool directly in your trading scripts.**


The post Pairs Trading in Zorro appeared first on Robot Wealth.

You know, light reading…

We saw that while R makes it easy to implement a relatively advanced algorithm like the Kalman filter, there are drawbacks to using it as a backtesting tool.

Setting up anything more advanced than the *simplest* possible vectorised backtesting framework is tough going and error-prone. Plus, it certainly isn’t simple to experiment with strategy design – for instance, incorporating costs, trading at multiple levels, using a timed exit, or incorporating other trade filters.

To be fair, there are good native R backtesting solutions, such as Quantstrat. But in my experience none of them let you experiment as efficiently as the Zorro platform.

And as an independent trader, the ability to move fast – writing proof of concept backtests, invalidating bad ideas, exploring good ones in detail, and ultimately moving to production efficiently – is quite literally a superpower.

*I’ve already invalidated 3 ideas since starting this post*

The downside with Zorro is that it would be pretty nightmarish implementing a Kalman filter in its native Lite-C code. But thanks to Zorro’s R bridge, I can use the R code for the Kalman filter that I’ve already written, with literally only a couple of minor tweaks. *We can have the best of both worlds!*

This post presents a script for a pairs trading algorithm using Zorro. We’ll stick with a static hedge ratio and focus on the pairs trading logic itself. In the next post, I’ll show you how to configure Zorro to talk to R and thus make use of the Kalman filter algorithm.

*Let’s get to it.*

Even the briefest scan of the pairs trading literature reveals many approaches to constructing spreads. For example, using:

- Prices
- Log-prices
- Ratios
- Factors
- Cointegration
- Least squares regression
- Copulas
- State space models

Ultimately, the goal is to find a spread that is both mean-reverting and volatile enough to make money from.

In my view, *how* you construct the spread is much less important than its ability to make money. From personal experience, I know that the tendency is to get hung up on the “correct” way to implement a pairs trade. Such a thing doesn’t exist — I’ve seen money-printing pairs trading books that younger me, being more hung up on “correctness”, would have scoffed at.

Instead, understand that pairs trading is ultimately a numbers game and that universe selection is more important than the specifics of the algorithm. Sure, you can tweak your implementation to squeeze a little more out of it, and even find pockets of conditional or seasonal mean-reversion, but the specifics of the implementation are unlikely to be the ultimate source of alpha.

Anyway, that’s for you to mull over and keep in mind as you read this series. Right now, we’re just going to present one version of a pairs trade in Zorro.

This is a pairs trade that uses a price-based spread for its signals. First, here’s the code that calculates the spread, given two tickers `Y` and `X` and a hedge ratio `beta`. The spread is simply \(Y - \beta X\).

Here’s the code:

```c
/* Price-based spread in Zorro */

#define Y "GDX"
#define X "GLD"

var calculate_spread(var hedge_ratio)
{
    var spread = 0;
    asset(Y);
    spread += priceClose();
    asset(X);
    spread -= hedge_ratio*priceClose();
    return spread;
}

function run()
{
    set(PLOTNOW);
    StartDate = 20100101;
    EndDate = 20191231;
    BarPeriod = 1440;
    LookBack = 100;

    // load data from Alpha Vantage in INITRUN
    if(is(INITRUN))
    {
        string Name;
        while(Name = loop(Y, X))
        {
            assetHistory(Name, FROM_AV);
        }
    }

    // calculate spread
    var beta = 0.4;
    vars spread = series(calculate_spread(beta));

    // plot
    asset(Y);
    var asset1Prices = priceClose();
    asset(X);
    plot(strf("%s-LHS", Y), asset1Prices, MAIN, RED);
    plot(strf("%s-RHS", X), priceClose(), 0|AXIS2, BLUE);
    plot("spread", spread, NEW, BLACK);
}
```

Using GDX and GLD as our Y and X tickers respectively and a hedge ratio of 0.4, Zorro outputs the following plot:

The spread looks like it was reasonably stationary during certain subsets of the simulation, but between 2011 and 2013 it trended – not really a desirable property for a strategy based on mean-reversion.

Even in this period in late 2013, where one could imagine profiting from a mean-reversion strategy, the spread wasn’t very well behaved. The buy and sell levels are far from obvious in advance:

One way to tame the spread is to apply a rolling z-score transformation. That is, take a window of data, say 100 days, and calculate its mean and standard deviation. The z-score of the next point is the raw value less the window’s mean, divided by its standard deviation. Applying this in a rolling fashion is a one-liner in Zorro:

vars ZScore = series(zscore(spread[0], 100));
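
For intuition, here’s the same rolling z-score sketched in Python. This is not Zorro’s implementation; the use of a population standard deviation is my assumption, and Zorro’s windowing convention may differ slightly:

```python
import statistics

def rolling_zscore(series, window):
    """Rolling z-score: (latest value - window mean) / window std."""
    out = []
    for i in range(window - 1, len(series)):
        win = series[i - window + 1 : i + 1]
        mu = statistics.mean(win)
        sd = statistics.pstdev(win)  # population std; other conventions use the sample std
        out.append((series[i] - mu) / sd if sd > 0 else 0.0)
    return out

spread = [1.0, 2.0, 3.0, 4.0, 5.0]
print(rolling_zscore(spread, 3))
```

Note that a steadily trending spread produces a constant positive z-score, which is why the transformation tames trends rather than removing them entirely.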

Our z-scored spread then looks like this:

The z-scored spread has some nice properties. In particular, it tends to oscillate between two extrema, eliminating the need to readjust buy and sell levels (although we’d need to decide on the actual values to use). On the other hand, it does introduce an additional parameter, namely the window length used in its calculation.

To implement the rest of our pairs trade, we need to decide the z-score levels at which to trade and implement the logic for buying and selling the spread.

Zorro makes that fairly easy for us. Here’s the complete backtesting framework code:

```c
/* Price-based spread trading in Zorro */

#define Y "GDX"
#define X "GLD"
#define MaxTrades 5
#define Spacing 0.5
// #define COSTS

int ZSLookback = 100;
int Portfolio_Units = 100; // units of the portfolio to buy/sell (more --> better fidelity to dictates of hedge ratio)

var calculate_spread(var hedge_ratio)
{
  var spread = 0;
  asset(Y);
#ifndef COSTS
  Spread = Commission = Slippage = 0;
#endif
  spread += priceClose();
  asset(X);
#ifndef COSTS
  Spread = Commission = Slippage = 0;
#endif
  spread -= hedge_ratio*priceClose();
  return spread;
}

function run()
{
  set(PLOTNOW);
  setf(PlotMode, PL_FINE);
  StartDate = 20100101;
  EndDate = 20191231;
  BarPeriod = 1440;
  LookBack = ZSLookback;
  MaxLong = MaxShort = MaxTrades;

  // load data from Alpha Vantage in INITRUN
  if(is(INITRUN))
  {
    string Name;
    while(Name = loop(Y, X))
    {
      assetHistory(Name, FROM_AV);
    }
  }

  // calculate spread
  var beta = 0.4;
  vars spread = series(calculate_spread(beta));
  vars ZScore = series(zscore(spread[0], ZSLookback));

  // set up trade levels
  var Levels[MaxTrades];
  int i;
  for(i=0; i<MaxTrades; i++)
  {
    Levels[i] = (i+1)*Spacing;
  }

  // -------------------------------
  // trade logic
  // -------------------------------

  // exit on cross of zero line
  if(crossOver(ZScore, 0) or crossUnder(ZScore, 0))
  {
    asset(X);
    exitLong();
    exitShort();
    asset(Y);
    exitLong();
    exitShort();
  }

  // entering positions at Levels
  for(i=0; i<MaxTrades; i++)
  {
    if(crossUnder(ZScore, -Levels[i])) // buying the spread (long Y, short X)
    {
      asset(Y);
      Lots = Portfolio_Units;
      enterLong();
      asset(X);
      Lots = Portfolio_Units * beta;
      enterShort();
    }
    if(crossOver(ZScore, Levels[i])) // shorting the spread (short Y, long X)
    {
      asset(Y);
      Lots = Portfolio_Units;
      enterShort();
      asset(X);
      Lots = Portfolio_Units * beta;
      enterLong();
    }
  }

  // exiting positions at Levels
  for(i=1; i<=MaxTrades-1; i++)
  {
    if(crossOver(ZScore, -Levels[i])) // covering long spread (exiting long Y, exiting short X)
    {
      asset(Y);
      exitLong(0, 0, Portfolio_Units);
      asset(X);
      exitShort(0, 0, Portfolio_Units * beta);
    }
    if(crossUnder(ZScore, Levels[i])) // covering short spread (exiting short Y, exiting long X)
    {
      asset(Y);
      exitShort(0, 0, Portfolio_Units);
      asset(X);
      exitLong(0, 0, Portfolio_Units * beta);
    }
  }

  // plots
  if(!is(LOOKBACK))
  {
    plot("zscore", ZScore, NEW, BLUE);
    for(i=0; i<MaxTrades; i++)
    {
      plot(strf("#level_%d", i), Levels[i], 0, BLACK);
      plot(strf("#neglevel_%d", i), -Levels[i], 0, BLACK);
    }
    plot("spread", spread, NEW, BLUE);
  }
}
```

The trade levels are controlled by the `MaxTrades` and `Spacing` values near the top of the script. These are implemented as `#define` statements to make it easy to change these values, enabling fast iteration.

As implemented here, with `MaxTrades` equal to 5 and `Spacing` equal to 0.5, Zorro will generate trade levels every 0.5 standard deviations above and below the zero line of our z-score.

The levels themselves are generated in the short `for` loop that fills the `Levels` array.

The trade logic is quite simple:

- Buy the spread if the z-score crosses under a negative level
- Short the spread if the z-score crosses over a positive level
- If we’re long the spread, cover a position if the z-score crosses over a negative level
- If we’re short the spread, cover a position if the z-score crosses under a positive level
- Cover whenever z-score crosses the zero line
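
As a cross-check of those rules, here’s a small Python sketch that restates them as a target position: the number of spread units to hold for a given z-score. This is my stateless simplification, not the Zorro code itself (which fires entries and exits on crossings), but the net position it implies is the same:

```python
def target_units(zscore, levels):
    """Units of spread to hold for a given z-score: one unit per crossed level.

    Negative z-scores mean buying the spread (positive units),
    positive z-scores mean shorting it (negative units),
    and a z-score between the innermost levels means flat.
    """
    units = 0
    for lvl in levels:
        if zscore <= -lvl:
            units += 1   # long one unit for each negative level crossed
        elif zscore >= lvl:
            units -= 1   # short one unit for each positive level crossed
    return units

levels = [0.5, 1.0, 1.5, 2.0, 2.5]   # MaxTrades = 5, Spacing = 0.5
print(target_units(-1.2, levels))     # long two units
print(target_units(0.3, levels))      # flat
```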

By default, we’re trading 100 units of the spread at each level. We’re trading in and out of the spread as the z-score moves around and crosses our levels. If the z-score crosses more than one level in a single period, we’d be entering positions for each crossed level at market.

Essentially, it’s a bet on the mean-reversion of the z-scored spread translating into profitable buy and sell signals in the underlyings.

The strategy returns a Sharpe ratio of about 0.6 before costs (you can enable costs by uncommenting the `#define COSTS` line, but you’ll need to set up a Zorro assets list with cost details, or tell Zorro about costs via script) and the following equity curve:

There you have it – a Zorro framework for price-based pairs trading. More than this particular approach to pairs trading itself, I hope that I’ve demonstrated Zorro’s efficiency for implementing such frameworks quickly. And once they’re implemented, you can run experiments, iterate on the design, and assess the utility of the trading strategy efficiently.

For instance, it’s trivial to:

- Add more price levels, tighten them up, or space them out
- Get feedback on the impact of changing the z-score window length
- Explore what happens when you change the hedge ratio
- Change the simulation period
- Swap out GLD and GDX for other tickers

You can even run Zorro on the command line and pass most of the parameters controlling these variables as command line arguments – which means you can write a batch file to run hundreds of backtests and really get into some serious data mining – if that’s your thing.

All that aside, in the next post I want to show you how to incorporate the dynamic estimate of the hedge ratio into our Zorro pairs trading framework by calling the Kalman filter implemented in R directly from our Zorro script.

**You can grab the code for the Kalman Filter we used in the previous post for free below:**

The post Pairs Trading in Zorro appeared first on Robot Wealth.


Anyone who’s tried pairs trading will tell you that real financial series don’t exhibit truly stable, cointegrating relationships.

If they did, pairs trading would be the easiest game in town. But the reality is that relationships are constantly evolving and changing. At some point, we’re forced to make uncertain decisions about how best to capture those changes.

One way to incorporate both uncertainty and dynamism in our decisions is to use the Kalman filter for parameter estimation.

The Kalman filter is a state space model for estimating an unknown (‘hidden’) variable using observations of related variables and models of those relationships. The Kalman filter is underpinned by Bayesian probability theory and enables an estimate of the hidden variable in the presence of noise.

There are plenty of tutorials online that describe the mathematics of the Kalman filter, so I won’t repeat those here (this article is a wonderful read). Instead, this Kalman Filter Example post will show you how to implement the Kalman filter framework to provide a *dynamic estimate of the hedge ratio in a pairs trading strategy*. I’ll provide just enough math as is necessary to follow the implementation.

For this Kalman Filter example, we need four variables:

- A vector of our observed variable
- A vector of our hidden variable
- A state transition model (which describes how the hidden variable evolves from one state to the next)
- An observation model (a matrix of coefficients for the other variable – we use a hedge coefficient and an intercept)

For our hedge ratio/pairs trading application, the observed variable is one of our price series \(p_1\) and the hidden variable is our hedge ratio, \(\beta\). The observed and hidden variables are related by the familiar spread equation: \[p_1 = \beta * p_2 + \epsilon\] where \(\epsilon\) is noise (in our pairs trading framework, we are essentially making bets on the mean reversion of \(\epsilon\)). In the Kalman framework, the other price series, \(p_2\) provides our observation model.

We also need to define a state transition model that describes the evolution of \(\beta\) from one time period to the next. If we assume that \(\beta\) follows a random walk, then our state transition model is simply \[\beta_t = \beta_{t-1} + \omega\]

Here’s the well-known iterative Kalman filter algorithm.

For every time step:

- Predict the next state of the hidden variable given the current state and the state transition model
- Update the state covariance prediction
- Predict the next value of the observed variable given the prediction for the hidden variable and the observation model
- Update the measured covariance prediction
- Calculate the error between the observed and predicted values of the observed variable
- Calculate the Kalman gain
- Update the estimate of the hidden variable
- Update the state covariance
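
Those steps can be sketched as a minimal scalar Kalman iteration in Python. This is an illustration only, with the hedge ratio as the sole state (no intercept) and illustrative values for `Vw` and `Ve`:

```python
def kalman_step(y, x, beta, P, Vw=1e-4, Ve=1e-3):
    """One iteration of the Kalman filter for y = beta * x + noise,
    with a random-walk state transition beta_t = beta_{t-1} + omega."""
    beta_pred = beta              # 1. state prediction (random walk: unchanged)
    R = P + Vw                    # 2. state covariance prediction
    y_est = x * beta_pred         # 3. measurement prediction
    Q = x * R * x + Ve            # 4. measurement variance prediction
    e = y - y_est                 # 5. prediction error
    K = R * x / Q                 # 6. Kalman gain
    beta_new = beta_pred + K * e  # 7. state update
    P_new = R - K * x * R         # 8. state covariance update
    return beta_new, P_new, e, Q

# feed in a few (y, x) observations where y is roughly 2 * x
beta, P = 0.0, 1.0
for y, x in [(4.1, 2.0), (6.2, 3.0), (8.0, 4.0)]:
    beta, P, e, Q = kalman_step(y, x, beta, P)
print(round(beta, 3))  # converges toward a hedge ratio of about 2
```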

To start the iteration, we need initial values for the covariances of the measurement and state equations. Methods exist to estimate these from data, but for our purposes we will start with some values that result in a relatively slowly changing hedge ratio. To make the hedge ratio change faster, increase the values of `delta` and `Ve` in the R code below. The initial estimates of these values are the closest thing to ‘parameters’ that we have in our Kalman filter framework.

Here’s some R code for implementing the Kalman filter.

The two price series used are daily adjusted closing prices for the “Hello world” of pairs trading: GLD and GDX (you can download the data at the end of this post).

First, read in and take a look at the data:

```r
library(xts)

path <- "C:/Path/To/Your/Data/"
assets <- c("GLD", "GDX")
df1 <- xts(read.zoo(paste0(path, assets[1], ".csv"), tz="EST", format="%Y-%m-%d", sep=",", header=TRUE))
df2 <- xts(read.zoo(paste0(path, assets[2], ".csv"), tz="EST", format="%Y-%m-%d", sep=",", header=TRUE))
xy <- merge(df1$Close, df2$Close, join="inner")
colnames(xy) <- assets
plot(xy, legend.loc=1)
```

Here’s what the data look like:

Looks OK at first glance.

Here’s the code for the iterative Kalman filter estimate of the hedge ratio:

```r
x <- xy[, assets[1]]
y <- xy[, assets[2]]
x$int <- rep(1, nrow(x))

delta <- 0.0001
Vw <- delta/(1-delta)*diag(2)
Ve <- 0.001

R <- matrix(rep(0, 4), nrow=2)
P <- matrix(rep(0, 4), nrow=2)

beta <- matrix(rep(0, nrow(y)*2), ncol=2)
y_est <- rep(0, nrow(y))
e <- rep(0, nrow(y))
Q <- rep(0, nrow(y))

for(i in 1:nrow(y))
{
  if(i > 1)
  {
    beta[i, ] <- beta[i-1, ] # state transition
    R <- P + Vw # state cov prediction
  }

  y_est[i] <- x[i, ] %*% beta[i, ] # measurement prediction
  Q[i] <- x[i, ] %*% R %*% t(x[i, ]) + Ve # measurement variance prediction

  # error between observation of y and prediction
  e[i] <- y[i] - y_est[i]
  K <- R %*% t(x[i, ]) / Q[i] # Kalman gain

  # state update
  beta[i, ] <- beta[i, ] + K * e[i]
  P <- R - K %*% x[i, ] %*% R
}

beta <- xts(beta, order.by=index(xy))
plot(beta[2:nrow(beta), 1], type='l', main = 'Kalman updated hedge ratio')
plot(beta[2:nrow(beta), 2], type='l', main = 'Kalman updated intercept')
```

And here is the resulting plot of the dynamic hedge ratio:

The value of this particular Kalman filter example is immediately apparent – you can see how drastically the hedge ratio changed over the years.

We could use that hedge ratio to construct our signals for a trading strategy, but we can actually use the other by-products of the Kalman filter framework to generate them directly *(hat tip to Ernie Chan for this one):*

The prediction error (`e` in the code above) is equivalent to the deviation of the spread from its predicted value. Some simple trade logic could be to buy and sell our spread when this deviation is very negative and positive, respectively.

We can relate the actual entry levels to the standard deviation of the prediction error. The Kalman routine also computes the standard deviation of the error term for us: it is simply the square root of `Q` in the code above.

Here’s a plot of the trading signals at one standard deviation of the prediction error (we need to drop a few leading values as the Kalman filter takes a few steps to warm up):

```r
# plot trade signals
e <- xts(e, order.by=index(xy))
sqrtQ <- xts(sqrt(Q), order.by=index(xy))
signals <- merge(e, sqrtQ, -sqrtQ)
colnames(signals) <- c("e", "sqrtQ", "negsqrtQ")
plot(signals[3:length(index(signals))], ylab='e', main = 'Trade signals at one-standard deviation', col=c('blue', 'black', 'black'), lwd=c(1,2,2))
```

Cool! Looks OK, except the number of signals greatly diminishes in the latter half of the simulation period. Later, we might come back and investigate a more aggressive signal, but let’s press on for now.

At this point, we’ve got a time series of trade signals corresponding to the error term being greater than one standard deviation from its (estimated) mean. We could run a vectorised backtest by calculating positions corresponding to these signals, then determine the returns of holding those positions.

In fact, let’s do that next:

```r
# vectorised backtest
sig <- ifelse((signals[1:length(index(signals))]$e > signals[1:length(index(signals))]$sqrtQ) & (lag.xts(signals$e, 1) < lag.xts(signals$sqrtQ, 1)), -1,
       ifelse((signals[1:length(index(signals))]$e < signals[1:length(index(signals))]$negsqrtQ) & (lag.xts(signals$e, 1) > lag.xts(signals$negsqrtQ, 1)), 1, 0))
colnames(sig) <- "sig"

## trick for getting only the first signals
sig[sig == 0] <- NA
sig <- na.locf(sig)
sig <- diff(sig)/2
plot(sig)

## simulate positions and pnl
sim <- merge(lag.xts(sig,1), beta[, 1], x[, 1], y)
colnames(sim) <- c("sig", "hedge", assets[1], assets[2])
sim$posX <- sim$sig * -1000 * sim$hedge
sim$posY <- sim$sig * 1000
sim$posX[sim$posX == 0] <- NA
sim$posX <- na.locf(sim$posX)
sim$posY[sim$posY == 0] <- NA
sim$posY <- na.locf(sim$posY)
pnlX <- sim$posX * diff(sim[, assets[1]])
pnlY <- sim$posY * diff(sim[, assets[2]])
pnl <- pnlX + pnlY
plot(cumsum(na.omit(pnl)), main="Cumulative PnL, $")
```

*Just a quick explanation of my hacky backtest…*

The ugly nested `ifelse` statement creates a time series of trade signals where sells are represented as -1, buys as 1 and no signal as 0. The buy signal is the prediction error crossing under its -1 standard deviation level from above; the sell signal is the prediction error crossing over its +1 standard deviation level from below.

The problem with this signal vector is that we can get consecutive sell signals and consecutive buy signals. We don’t want to muddy the waters by holding more than one position at a time, so we use a little trick: first replace any zeroes with `NA`, then use the `na.locf` function to fill forward the `NA` values with the last real value. We then recover the original (non-consecutive) signals by taking the `diff` and dividing by 2.

If that seems odd, just write down on a piece of paper a few signals of -1, 1 and 0 in a column and perform on them the operations described. You’ll quickly see how this works.
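
If you’d rather not reach for pen and paper, here’s the same trick sketched in Python, with `None` standing in for `NA` and a simple forward-fill standing in for `na.locf`:

```python
def first_signals(sig):
    """Collapse repeated -1/1 signals into single entries, as in the R trick:
    zeros -> NA, forward-fill, then diff/2."""
    # replace zeros with None and forward-fill the last real value
    filled, last = [], None
    for s in sig:
        last = s if s != 0 else last
        filled.append(last)
    # diff/2 is nonzero only where the filled series flips sign
    out = [0]
    for prev, cur in zip(filled, filled[1:]):
        if prev is None or cur is None:
            out.append(0)
        else:
            out.append((cur - prev) // 2)
    return out

# repeated sells collapse; only the -1 -> 1 flip survives as a buy
print(first_signals([0, -1, 0, -1, 1, 0, 1]))
```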

Then, we calculate our positions in each asset according to our spread and signals, taking care to lag our signals so that we don’t introduce look-ahead bias. We’re trading 1,000 units of our spread per trade. Our estimated profit and loss is just the sum of the price differences multiplied by the positions in each asset.

Here’s the result:

Looks interesting!

But recall that our trading signals were few and far between in the latter half of the simulation? If we plot the signals, we see that we were actually holding the spread for well over a year at a time:

I doubt we’d want to trade the spread this way, so let’s make our signals more aggressive:

```r
# more aggressive trade signals
signals <- merge(e, .5*sqrtQ, -.5*sqrtQ)
colnames(signals) <- c("e", "sqrtQ", "negsqrtQ")
plot(signals[3:length(index(signals))], ylab='e', main = 'Trade signals at half a standard deviation', col=c('blue', 'black', 'black'), lwd=c(1,2,2))
```

Better! A smarter way to do this would probably be to adapt the trade level (or levels) to the recent volatility of the spread – I’ll leave that as an exercise for you.

These trade signals lead to this impressive and highly dubious equity curve:

Why is it dubious?

Well, you probably noticed that there are some pretty out-there assumptions in this backtest. To name the most obvious:

- We’re trading at the daily closing price with no market impact or slippage
- We’re trading for free

My gut feeling is that this would need a fair bit of work to cover costs of trading – but that gets tricky to assess without a more accurate simulation tool.

You can see that it’s a bit of a pain to backtest – particularly if you want to incorporate costs. To be fair, there are native R backtesting solutions that are more comprehensive than my quick-n-dirty vectorised version. But in my experience none of them lets you move quite as fast as the Zorro platform, which also allows you to go from backtest to live trading with almost the click of a button.

You can see that R makes it quite easy to incorporate an advanced algorithm* (well, at least I think it’s advanced; our clever readers probably disagree).* But tinkering with the strategy itself – for instance, incorporating costs, trading at multiple standard deviation levels, using a timed exit, or incorporating other trade filters – is a recipe for a headache, not to mention a whole world of unit testing and bug fixing.

On the other hand, Zorro makes tinkering with the trading aspects of the strategy easy. Want to get a good read on costs? That’s literally a line of code. Want to filter some trades based on volatility? Yeah, you might need two lines for that. What about trading the spread at say half a dozen levels and entering and exiting both on the way up and on the way down? OK, you might need four lines for that.

The downside with Zorro is that it would be pretty nightmarish implementing a Kalman filter in its native Lite-C code. But thanks to Zorro’s R bridge, I can use the R code for the Kalman filter example that I’ve already written, with literally only a couple of minor tweaks. We can have the best of both worlds.

*Which leads to my next post…*

In Kalman Filter Example part 2, I’ll show you a basic pairs trading script in Zorro, using a more vanilla method of calculating the hedge ratio. After that, I’ll show you how to configure Zorro to talk to R and thus make use of the Kalman filter algorithm.

*I’d love to know if this series is interesting for you, and what else you’d like to read about on Robot Wealth. Let us know in the comments.*

The post Kalman Filter Example: Pairs Trading in R appeared first on Robot Wealth.


*ah, I see a blue star pattern on my chart… a good omen.*

The problem is that such an approach is *inherently subjective* since price action almost never matches perfectly with the idealized version of price patterns you see in every beginner’s guide to trading. It is up to you, the individual, to determine whether a particular chart formation matches closely enough with a particular pattern for it to be considered valid.

This is quite tricky! It’s *very difficult* to codify a trading system based on their use. By extension, it is difficult to test the efficacy of these patterns in an objective, evidence-based manner.

That won’t stop smart people from trying, though. An attempt to do just this was made by MIT’s Andrew Lo, Harry Mamaysky and Jiang Wang back in 2000 [1] and, perhaps surprisingly, they found statistically significant evidence that *some* patterns provide useful incremental information in *some* markets.

Lo, Mamaysky and Wang were also generally enthusiastic about using automated detection methods. There are many possible approaches to pattern detection algorithms, of which Zorro implements one: a function that calculates the Frechet distance.

*So, let’s dive in and explore it!*

**The Frechet distance between two curves is a measure of their similarity —** it’s often described like so:

Suppose a man is walking his dog and that he is forced to walk on a particular path and his dog on another path. Both the man and the dog are allowed to control their speed independently but are not allowed to go backwards. Then, the Fréchet distance of the two paths is the minimal length of a leash that is necessary to keep man and dog joined throughout their walk.

*there are a lot more variables in the real world….*

If the Frechet distance between two curves is small, it follows that the curves are similar. Conversely, a large Frechet distance implies that the curves are not similar.

So, we can leverage the Frechet distance as a pattern detection algorithm by comparing sections of the price curve to a curve corresponding to a pattern of interest, for example, a triangle. A small Frechet distance implies that the section of the price curve that was analyzed is similar to the pre-defined pattern.
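
Zorro’s exact algorithm isn’t public, but the standard discrete Fréchet distance gives the flavour. Here’s a Python sketch of the classic dynamic-programming formulation; using the absolute difference of y-values as the pointwise distance is my simplification:

```python
def discrete_frechet(p, q):
    """Discrete Frechet distance between two curves given as lists of values,
    using |p[i] - q[j]| as the pointwise distance."""
    n, m = len(p), len(q)
    ca = [[-1.0] * m for _ in range(n)]  # memo table of partial leash lengths

    def c(i, j):
        if ca[i][j] >= 0:
            return ca[i][j]
        d = abs(p[i] - q[j])
        if i == 0 and j == 0:
            ca[i][j] = d
        elif i == 0:
            ca[i][j] = max(c(0, j - 1), d)
        elif j == 0:
            ca[i][j] = max(c(i - 1, 0), d)
        else:
            # man and dog each step forward, or one waits; leash never shortens
            ca[i][j] = max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d)
        return ca[i][j]

    return c(n - 1, m - 1)

triangle = [1, 8, 2, 7, 3, 6, 4, 5]
print(discrete_frechet(triangle, triangle))      # identical curves: distance 0
print(discrete_frechet(triangle, [1, 1, 1, 1]))  # a flat line: large distance
```

A price curve that traces out a triangle would score a small distance against the `triangle` template, and a large one against dissimilar shapes.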

There have been a whole bunch of algorithms proposed over the years for calculating the Frechet distance (it was first described in 1906), and Zorro implements a simple variant that enables the comparison of some part of a price curve with a known and previously described pattern. Zorro’s `frechet()` function returns a number between approximately 0 and 80 that measures the similarity of the part of the price curve being analyzed and the pattern.

**Note that this is proportional to the inverse of the Frechet distance, in that a larger similarity measure implies a smaller Frechet distance.**

`frechet()` takes the following arguments:

- A series (usually asset prices) to be compared with a predefined pattern
- An integer, `TimeFrame`, which sets the number of price bars to use in the comparison, that is, the horizontal length of the pattern in the price curve (setting this to zero tells Zorro to simply use the same length as the predefined pattern)
- A var, `Scale`, specifying the vertical size of the pattern in the price chart (setting this to a negative number inverts the pattern)
- An array of positive numbers specifying the shape of the pattern to be detected; the final value of the array must be zero, which is used by the algorithm to signal the termination of the pattern

There are several complications and considerations to be aware of in setting these arguments, so let’s go through each of them in more detail, starting with the array specifying the shape of the pattern.

If we want to detect chart patterns, the first thing we need to define is the shape of that pattern. Note that in describing our pattern in an array, we only need to be concerned with its *shape*. We can deal with its *size* (both horizontally and vertically) using other `frechet()` arguments. Therefore don’t focus too much on the absolute values of the numbers that describe the pattern – their relative values are much more important here.

To define a pattern in an array, think of an \(x, y\) coordinate plane. The array indexes are the \(x\) values; the numbers stored in each index are the corresponding \(y\) values. We then map our pattern as a series of \(x,y\) pairs.

Here’s an example for a triangle pattern:

Remembering that zero terminates the pattern, the corresponding array would consist of the numbers 1, 8, 2, 7, 3, 6, 4, 5, 0. We would define such an array as

var Triangle[9] = {1, 8, 2, 7, 3, 6, 4, 5, 0};

The obvious question that arises from this approach is how well does the algorithm detect patterns that we would consider a triangle, but which deviate from the idealized triangle shown above?

For example, what about asymmetry in the legs of the triangle? That is, what if the early legs take longer to complete than later legs? In the example above, all the legs take the same amount of time. What about triangles that don’t terminate at the apex?

By way of example, the following would probably fit the definition of a triangle:

But now our array would be given by 1, 3, 5, 8, 6, 4, 2, 5, 7, 6, 4, 3, 6, 0. That is,

var Triangle[14] = {1,3,5,8,6,4,2,5,7,6,4,3,6,0};

Would these two patterns return different Frechet similarities when applied to the same price curve?

The answer is yes. But, bear in mind that a pattern corresponding to the first triangle will still be *somewhat* similar to the second triangle. In practice, this means that in order to use this approach effectively, we would need to cover our bases and check for multiple variants of the intended pattern, perhaps using some sort of confirmation between different variations of the pattern. We’ll see in the example below how much variation we see in our similarity measure for different variants of the same intended pattern.

The `Scale` parameter controls the vertical height of the pattern being searched for in the price curve. This is the same as stretching the predefined pattern in the direction of price on a price chart. In most cases, it will make sense to set the height of the pattern to the range of the price action over the period of interest. We do this automatically by setting the `Scale` parameter based on the maximum and minimum values of the price series over the time period of interest, via Zorro’s `MaxVal()` and `MinVal()` functions:

Scale = MaxVal(Data, TimeFrame) - MinVal(Data, TimeFrame);

The `TimeFrame` parameter controls the pattern’s horizontal length and corresponds to the number of bars over which to apply the pattern. This is the same as stretching the predefined pattern in the direction of time on a price chart. This parameter requires a little more thought because there are no hard and fast rules regarding how long these patterns should take to form. Again, we must deal with the inherent subjectivity of the method.

Rather than constraining our pattern detection algorithm to a single time period, why not simply look over multiple time periods?

We could do this by calling `frechet()` within a `for()` loop that increments the `TimeFrame` parameter on every iteration, like so:

```c
for(i=5; i<100; i+=5)
{
  frechet(Price, i, 100*PIP, Triangle);
}
```

This will search for a pattern called “Triangle” that is 100 pips high over multiple time ranges, from 5 bars to 100 bars. In practice, we don’t really need to cover every incremental number of bars (1, 2, 3, 4, etc.) because patterns evolving over similar time horizons will tend to return similar `frechet()` values. For example, a pattern evolving over 10 bars will be similar to the same pattern evolving over 11 bars. In the example above, we increment our search length by 5 bars.

We are now in a position to start detecting patterns and analyzing their usefulness as trading signals.

The code below plots the Frechet similarity metric for our symmetric and asymmetric triangles, and their inverses, over a number of time horizons. We use the `strf()` function to enable us to pass a variable (in this case, the integer `i`) into a string (in this case, the name of the plot) so that we can plot the different Frechet similarities from within the `for()` loop:

```c
/* PLOT FRECHET SIMILARITY */

function run()
{
  set(PLOTNOW);
  StartDate = 20150712;
  EndDate = 20150826;
  BarPeriod = 1440;
  LookBack = 100;

  asset("SPX");
  vars Price = series(priceClose());

  static var Tri_Sym[9] = {1,8,2,7,3,6,5,4,0};
  static var Tri_Asym[14] = {1,3,5,8,6,4,2,5,7,6,4,3,6,0};

  int i;
  for(i=10; i<=30; i+=10)
  {
    plot(strf("Tri_Sym_%d", i), frechet(Price, i, MaxVal(Price,i) - MinVal(Price,i), Tri_Sym), NEW, RED);
    plot(strf("Tri_Asym_%d", i), frechet(Price, i, MaxVal(Price,i) - MinVal(Price,i), Tri_Asym), 0, BLUE);
    plot(strf("Tri_Sym_Inv_%d", i), frechet(Price, i, -(MaxVal(Price,i) - MinVal(Price,i)), Tri_Sym), NEW, BLACK);
    plot(strf("Tri_Asym_Inv_%d", i), frechet(Price, i, -(MaxVal(Price,i) - MinVal(Price,i)), Tri_Asym), 0, GREEN);
  }

  PlotWidth = 800;
  PlotScale = 15;
  PlotHeight1 = 500;
  PlotHeight2 = 125;
}
```

Here we zoom into an area of the S&P500 index that saw a fairly obvious triangle develop from early July through to mid-August 2015:

You can see that most variants of our Frechet algorithm detected the triangle at some point during its evolution. In particular, the inverted asymmetric triangle measured over 20 days did a particularly good job of recognizing the pattern, reaching a similarity score of approximately 50 as the triangle approached its apex.

Looking more closely at other regions will reveal that the algorithm is far from perfect, sometimes scoring patterns that we would rather exclude relatively highly. This makes it difficult to differentiate the “true” patterns on the basis of some threshold similarity score. To overcome that, we could perhaps continue to refine our pattern definitions or implement a series of confirmations from different variations, but that would get tedious fairly quickly.

Here’s an example of a simple trading strategy that looks for our asymmetric triangle pattern across several time horizons. Again using the `strf()` function, we switch to a new `Algo` for each time horizon. I read somewhere that gold is particularly prone to triangle formations, so we’ll use the GLD Exchange Traded Fund. I also read that triangles are allegedly indicators of a strong breakout in any direction.

*I don’t know whether this has any basis in fact, but let’s go with it for the purpose of the exercise.*

On the basis of the alleged triangle behavior, when one is detected we bracket the market at a quarter of the 20-day ATR. We leave our pending orders for a maximum of 10 days, but no longer than the time horizon used to detect the pattern. Likewise, we close our trades after a maximum of 20 days, but no longer than the time horizon.

There’s much more you could do here, for example cancelling the remaining pending order when the opposite one is executed. [2] The code uses Alpha Vantage’s API for getting the required GLD historical data, so you’ll need to set this up in your Zorro.ini file if you don’t already have this data.

```c
/* FRECHET TRADING */

var threshold = 30;

function run()
{
  set(PLOTNOW);
  StartDate = 2007;
  EndDate = 2017;
  BarPeriod = 1440;
  AssetList = "AssetsIB";

  if(is(INITRUN))
    assetHistory("GLD", FROM_AV);

  asset("GLD");
  vars Price = series(priceClose());

  var Tri_Asym[14] = {1,3,5,8,6,4,2,5,7,6,4,3,6,0};

  int i;
  for(i=10; i<=50; i+=10)
  {
    algo(strf("_%d_Asym", i));
    if(frechet(Price, i, MaxVal(Price,i)-MinVal(Price,i), Tri_Asym) > threshold)
    {
      Entry = 0.25*ATR(20);
      EntryTime = min(i, 10);
      LifeTime = min(i, 20);
      if(NumOpenLong == 0) enterLong();
      if(NumOpenShort == 0) enterShort();
    }
  }

  PlotWidth = 800;
  PlotHeight1 = 500;
  PlotHeight2 = 125;
}
```

Here’s the equity curve:

You’ll find that such a trading strategy is difficult to apply directly to a universe of potential assets.

The effect of randomness, combined with the difficulty in refining pattern definitions sufficiently, invites overfitting to the parameters of the `frechet()` function, not to mention selection bias. You’ll also be faced with the decision of whether to trade a pattern detected at multiple time horizons.

Maybe a system of confirmations from numerous pattern variations would help, but perhaps a more practical application for the trader interested in patterns is to use `frechet()` to scan a universe of assets and issue email or SMS alerts, or display a message listing assets where a pattern was detected for further manual analysis. Maybe we’ll cover this in a future blog post; if you’re interested, let me know in the comments below.

Here are some rather idealized patterns to get you started. I leave it up to you to experiment with various departures from their idealized forms.

```c
var rectangle[5] = {1,2,1,2,0};
var cup[10] = {6,3,2,1,1,1,2,3,6,0};
var zigzag[5] = {1,7,2,8,0};
var headShldrs[17] = {1,2,3,3,3,4,5,6,6,5,4,3,3,3,2,1,0};
var triangle_symmetric[9] = {1,8,2,7,3,6,5,4,0};
var triangle_asymmetric[14] = {1,3,5,8,6,4,2,5,7,6,4,3,6,0};
```
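If you’d like to experiment with pattern matching outside Zorro, the similarity measure underneath is the discrete Fréchet distance. Here’s a minimal Python sketch of the Eiter–Mannila dynamic programme for 1-D curves (my own illustration, not Zorro’s implementation; Zorro’s `frechet()` additionally rescales the price window via its `Scale` argument and reports similarity as a percentage):

```python
def discrete_frechet(p, q):
    """Discrete Frechet distance between two 1-D curves.

    Dynamic programme over all monotone traversals of both curves;
    smaller values mean the curves are more alike."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(p[i] - q[j])  # pointwise distance between curve samples
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # advance along p, along q, or along both, and take the
                # cheapest option; the distance is the worst point en route
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]
```

Identical curves score 0; the worse the best possible alignment of the two curves, the larger the distance. In practice you’d normalise the price window to the template’s value range before comparing, which is what `frechet()`’s scale argument takes care of.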

**Want a more robust and profitable approach to trading? Gain a broader understanding of how we use algorithms to trade systematically and make our capital grow by downloading the free Algo Basics PDF below.**

After that, check out our other blog post where we outline how we approach the markets in a way that allows us to trade for a living.

The post Pattern Recognition with the Frechet Distance appeared first on Robot Wealth.

The post Can you apply factors to trade performance? appeared first on Robot Wealth.

For instance, maybe you wonder if your strategy tends to do better when volatility is high?

In this case, you can get very binary feedback by, say, running backtests with and without a volatility filter.

But this can mask interesting insights that might surface if the relationship could be explored in more detail.

Zorro has some neat tools that allow us to associate data of interest with particular trading decisions, and then export that data for further analysis. Here’s how it works:

Zorro implements a `TRADE` struct for holding information related to a particular position. This struct is a data container which holds information about each trade throughout the life of our simulation. We can also add our own data to this struct via the `TradeVar` array, which we can populate with values associated with a particular trade.

Zorro stores this array, along with all the other information about each and every position, as members of the `TRADE` struct. We can access the `TRADE` struct members in two ways: inside a trade management function (TMF) and inside a trade enumeration loop.

Here’s an example of exporting the last estimated volatility at the time a position was entered, along with the return associated with that position *(this is a simple, long-only moving average crossover strategy; data is loaded from Alpha Vantage):*

```c
/* Example of exporting data from a Zorro simulation. */

#define VOL TradeVar[0]

int recordVol(var volatility)
{
  VOL = volatility;
  return 16;
}

function run()
{
  set(PLOTNOW);
  StartDate = 2007;
  EndDate = 2019;
  BarPeriod = 1440;
  LookBack = 200;
  MaxLong = MaxShort = 1;

  string Name;
  while(Name = loop("AAPL", "MSFT", "GOOGL", "IBM", "MMM", "AMZN", "CAT", "CL"))
  {
    assetHistory(Name, FROM_AV);
    asset(Name);
    Spread = Commission = Slippage = 0;
    vars close = series(priceClose());
    vars smaFast = series(SMA(close, 10));
    vars smaSlow = series(SMA(close, 50));
    var vol = Moment(series(ROCP(close, 1)), 50, 2); // rolling 50-period standard deviation of returns

    if(crossOver(smaFast, smaSlow))
      enterLong(recordVol, vol);
    else if(crossUnder(smaFast, smaSlow))
      exitLong();

    plot("volatility", vol, NEW, BLUE);
  }

  if(is(EXITRUN))
  {
    int count = 0;
    char line[100];
    string filename = "Log\\vol.csv";
    if(file_length(filename))
    {
      printf("\nFound existing file. Deleting.");
      file_delete(filename);
    }
    printf("\n writing vol file...");
    sprintf(line, "Asset, EntryDate, TradeReturn, EntryVol");
    file_append(filename, line);
    for(closed_trades)
    {
      sprintf(line, "\n%s, %i, %.6f, %.5f", Asset, ymd(TradeDate),
        (-2*TradeIsShort+1)*(TradePriceClose-TradePriceOpen)/TradePriceOpen, VOL);
      file_append(filename, line);
      count++;
    }
    printf("\nTrades: %i", count);
  }
}
```

The general pattern for accomplishing this is:

- Define a meaningful name for the element of `TradeVar` that we’ll use to hold our volatility data (the `#define VOL TradeVar[0]` line).
- Define a trade management function to expose the `TRADE` struct and use it to assign our variable to our `TradeVar` (the `recordVol` function). A return value of 16 tells Zorro to run the TMF only when the position is entered and exited.
- Calculate the variable of interest in the Zorro script. Here we calculate the rolling 50-day standard deviation of returns (the `Moment` call).
- Pass the TMF and the variable of interest to Zorro’s `enter` function (the `enterLong(recordVol, vol)` call).
- In the `EXITRUN` (the last thing Zorro does after finishing a simulation), loop through all the positions using a trade enumeration loop and write the details, along with the volatility calculated just prior to entry, to a csv file.

Running this script results in a small csv file being written to Zorro’s Log folder. A sample of the data looks like this:

Once we’ve got that data, we can easily read it into our favourite data analysis tool for a closer look. Here, I’ll read it into R and use the `tidyverse` libraries to dig deeper. *(This will be very cursory. You could and should go a lot deeper if this were a serious strategy.)*

First, read the data in, and process it by adding a couple of columns that might be interesting:

```r
library(ggplot2)
library(tidyverse)

# analysis of entry volatility and trade profit
path <- "C:/Zorro/Log/"
file <- "vol.csv"
df <- read.csv(paste0(path, file), header=TRUE, stringsAsFactors=FALSE, strip.white=TRUE)

# make some additional columns
df$AbsTradeReturn <- abs(df$TradeReturn)
df$Result <- factor(ifelse(df$TradeReturn>0, "win", "loss"))
```

If we `head` the resulting data frame, we find that it looks like this:

```r
head(df)
#   Asset EntryDate TradeReturn EntryVol AbsTradeReturn Result
# 1  MSFT  20190906   -0.011359  0.00019       0.011359   loss
# 2  MSFT  20190904    0.017583  0.00020       0.017583    win
# 3  MSFT  20190829   -0.015059  0.00020       0.015059   loss
# 4    CL  20190828   -0.010946  0.00017       0.010946   loss
# 5  MSFT  20190819   -0.036269  0.00017       0.036269   loss
# 6 GOOGL  20190712    0.052325  0.00023       0.052325    win
```

*Sweet! Looks like we’re in business!*

Now we can start to answer some interesting questions. First, is volatility at the time of entry related to the magnitude of the trade return? Intuitively we’d expect this to be the case, as higher volatility implies larger price swings and therefore larger absolute trade returns:

```r
# is volatility related to the magnitude of the trade return?
ggplot(data=df[, c("AbsTradeReturn", "EntryVol")], aes(x=EntryVol, y=AbsTradeReturn)) +
  geom_point(alpha=0.4) +
  geom_smooth(method="lm", se=TRUE)
```

Nice! Just what we’d expect to see.

Does this relationship hold for each individual asset that we traded?

```r
# what about by asset?
ggplot(data=df[, c("AbsTradeReturn", "EntryVol", "Asset")], aes(x=EntryVol, y=AbsTradeReturn)) +
  geom_point(alpha=0.4) +
  geom_smooth(method="lm", se=TRUE) +
  facet_wrap(~Asset)
```

Looks like the relationship generally holds at the asset level, but note that we have a small sample size so take the results with a grain of salt:

```r
# note that we have a small sample size:
df %>%
  group_by(Asset) %>%
  count()

#   Asset     n
#   <chr> <int>
# 1 AAPL     36
# 2 AMZN     30
# 3 CAT      38
# 4 CL       47
# 5 GOOGL    36
# 6 IBM      37
# 7 MMM      32
# 8 MSFT     37
```

Is volatility related to the actual trade return?

```r
# is volatility related to the actual trade return?
ggplot(data=df[, c("TradeReturn", "EntryVol")], aes(x=EntryVol, y=TradeReturn)) +
  geom_point(alpha=0.4) +
  geom_smooth(method="lm", se=TRUE)
```

Looks like it might be. But this was a long-only strategy that made money in a period where everything went up, so I wouldn’t read too much into this without controlling for that effect.

Is there a significant difference in the entry volatility for winning and losing trades?

```r
# what's the spread of volatility for winning and losing trades?
ggplot(data=df[, c("EntryVol", "Result")], aes(x=Result, y=EntryVol)) +
  geom_boxplot()
```

Finally, we can treat our volatility variable as a “factor” to which our trade returns are exposed. Is this factor useful in predicting trade returns?

First, we’ll need some functions for bucketing our trade results by factor quantile:

```r
# factor functions
get_factor_quantiles <- function(factor_df, n_quantiles = 5, q_type = 7) {
  n_assets <- factor_df %>%
    ungroup %>%
    select(Asset) %>%
    n_distinct()

  factor_df %>%
    mutate(rank = rank(factor, ties.method='first'),
           quantile = get_quantiles(factor, n_quantiles, q_type))
}

get_quantiles <- function(factors, n_quantiles, q_type) {
  cut(factors, quantile(factors, seq(0,1,1/n_quantiles), type = q_type), FALSE, TRUE)
}
```
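For comparison, here’s the same bucketing idea in plain Python (a hypothetical helper of my own; it uses rank-based equal-count buckets, so edge behaviour differs slightly from R’s `quantile()` interpolation types):

```python
def bucket_by_quantile(values, n_quantiles=5):
    # assign each value to an equal-count quantile bucket, 1..n_quantiles,
    # with ties broken by original order (like ties.method='first' in R)
    order = sorted(range(len(values)), key=lambda i: values[i])
    buckets = [0] * len(values)
    for rank, i in enumerate(order):
        buckets[i] = rank * n_quantiles // len(values) + 1
    return buckets
```

For example, `bucket_by_quantile([0.1, 0.5, 0.3, 0.9], 2)` puts the two smallest values in bucket 1 and the two largest in bucket 2; averaging trade returns within each bucket then gives the same kind of quantile plot as the R code above.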

If we bucket our results by factor quantile, do any buckets account for significantly more profit and loss? Are there any other interesting relationships?

```r
# if we bucket the vol, do any buckets account for more profit/loss?
factor_df <- df[, c("Asset", "EntryVol", "TradeReturn")]
names(factor_df)[names(factor_df) == "EntryVol"] <- "factor"
quantiles <- get_factor_quantiles(factor_df)

r <- quantiles %>%
  group_by(quantile) %>%
  summarise(MeanTradeReturn=mean(TradeReturn))

ggplot(data=r, aes(quantile, MeanTradeReturn)) +
  geom_col() +
  ggtitle("Returns by volatility quantile")
```

Looks like there might be something to that fifth quantile (but of course beware the small sample size).

We can retrieve the cutoff value for the fifth quantile by sorting our factor and taking the value at four-fifths of the length of the sorted vector:

```r
# fifth bin cutoff
sorted <- sort(factor_df$factor)
sorted[as.integer(4/5*length(sorted))]
# [1] 0.00043
```

There you have it. This was a simple example of exporting potentially relevant data from a Zorro simulation and reading it into a data analysis package for further research.

How might you apply this approach to more serious strategies? What data do you think is potentially relevant? Tell us your thoughts in the comments.

*Want a broader understanding of algorithmic trading? See why it’s the only sustainable approach to profiting from the markets and how you can use it to your success inside the free Algo Basics PDF below….*


The post Time is NOT the Enemy: Grow Your Capital by Showing Up appeared first on Robot Wealth.

This assumption can lead us down long and unnecessary rabbit holes and away from the more mundane fundamentals that account for 80% of our day-to-day trading decisions.

When you run a trading business you quickly get to the meat of *practical* market theory – the 80-90% that matters. From our experience, there are **two fundamental concepts** you need to know that are absolutely vital to profiting from the markets.

These concepts are:

- The Time Value of Money
- The Principle of No-Arbitrage

This post is the first in a series of *Quant Basics* where we’ll explore these fundamentals, as well as others. We’ll focus solely on the Time Value of Money today as it’s the cornerstone of any profitable trading approach, including what we do here at Robot Wealth!

So down tools on those deep neural networks for a second. Let’s look at why this first concept is so important for your success as a retail trader.

This is easily the most fundamental concept in finance — let’s break it down.

Simply put, $100 today is worth more to you than $100 received in a year’s time. *Call me captain obvious.*

If you have $100 today you can do potentially valuable things with it now. You could start a business, or you could invest it in a financial asset like a share of a company. By doing this you expect to get positive returns on your $100 for taking on this risk, so it seems like a good idea that is likely to pay off over the *long-run*.

But what if you don’t have a *long run*?

What if you are going to need that $100 in a year’s time and you want to make sure you preserve that capital, but you also want to put it to work to make more money in the meantime?

In that case, you can lend it to someone who has a slightly longer investment horizon than you. In exchange for them having the use of your money, they will pay you a small amount of (theoretically) guaranteed interest when they give you your money back.

This is what your bank does. When you deposit money in the bank, you’re really *lending* it to the bank. They take various well-diversified risks with that money and they pay interest to their depositors for the use of that money. If they manage things appropriately they will receive a return greater than the interest they pay to depositors – producing a profit.
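The arithmetic behind all this is just compounding and discounting. A minimal sketch (the function names are mine, and the 5% rate below is purely an assumed example, not a quote of any real deposit rate):

```python
def future_value(principal, annual_rate, years):
    # what principal grows to with annual compounding
    return principal * (1 + annual_rate) ** years

def present_value(amount, annual_rate, years):
    # what a future amount is worth today, discounted at the same rate
    return amount / (1 + annual_rate) ** years
```

At an assumed 5% annual rate, $100 today grows to $105 in a year, while $100 promised in a year is worth only about $95.24 today. That gap is the time value of money in one line of arithmetic.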

This simple idea that money has a time value is fundamental to *all of finance*.

What’s in all this for you?

Well, if you have money now you can use it to make more money in the future if you know what to do with it.

*(Stuffing your money under the mattress is a poor choice when someone with ideas and plans is willing to pay you for the use of that money…)*

The Time Value of Money principle gives us a baseline for our risk-taking. If we can get a certain guaranteed yield in the bank then we should only be taking on extra *unguaranteed* risk if we are confident that our expected rewards for taking that risk are greater than the baseline amount we get from the bank.

**In essence, trading is risk-taking.**

As investors, we are rewarded for taking on certain long-term risks, which include buying assets that are sensitive to disappointment in economic growth, inflation and interest rates. Here are some risks associated with various financial products:

So to put our capital to work we get long assets which are exposed to these risks. Getting long lots of them dramatically reduces portfolio volatility through diversification – *which is a good idea!*

**We primarily do this by harvesting risk premia.**

In extremely simple and general terms:

- Buying stocks is a good idea
- Buying bonds is a good idea
- Buying real estate is a good idea
- Selling a little bit of volatility is a good idea.

**By getting those in your portfolio you buy yourself the maximum chance of making money under most conditions. This also adds a portfolio “tailwind” to your active strategies.**

Crucially, this also means that you’ll always be trading one way or another, even when your more active strategies inevitably stop working….which *will* happen.

We incorporate this in our trading at Robot Wealth. Our risk premia strategy has returned around 17% at a CAGR of 26% since going live 9 months ago:

We provide this strategy to our Bootcamp participants, too

You can read more about why and how we harvest risk premia here.

The good news is that there’s no need to make this approach more complicated than it has to be — **80% of success here is just showing up.**

The precise way you execute the above risk-taking matters a lot less than the fact that *you do it at all. *If you find technical details like volatility and covariance management daunting, know that this makes up just 20%.

The 80% is just buying the right assets and keeping hold of them – which is simple to understand and implement, and will put you in a strong position when you get involved in more active, riskier trades.

Harvesting risk premia is a simple way of taking risk to earn above-baseline interest on your capital, especially as a small-time retail trader. It’s the most sensible way to use the time value of money to your advantage, before getting involved in more active strategies. You just need to show up and have a long time horizon.

*That’s Quant Basic number one.*

**In the next Quant Basics post we’ll investigate the pricing efficiency of the markets, why it’s hard to trade given this efficiency, and how you can do it anyway!**

**In the meantime, you can learn more about the fundamentals of algo trading by downloading the free PDF below:**


The post A Quant’s Approach to Drawdown: The Cold Blood Index appeared first on Robot Wealth.

Specifically, they’d:

- do the best job possible of designing and building their trading strategy to be robust to a range of future market conditions
- chill out and let the strategy do its thing, understanding that drawdowns are business-as-usual
- go and look for other opportunities to trade.

*Of course, at some point, you have to retire strategies. Alpha doesn’t persist forever.*

In our own trading, we don’t systematise this decision process. We weigh up the evidence and make discretionary judgements. All things being equal we tend to allow things a lot of space to work out.

However, in this post, we review a systematic approach which can aid this decision making…

In particular, we concentrate on the following question: *is my strategy’s live drawdown consistent with its backtest?*

Let’s dive in and explore *The Cold Blood Index!*

Johan Lotter, the brains behind the Zorro development platform, proposed an empirical approach to the problem of reconciling backtest returns with live returns.

Put simply, his approach compares a drawdown experienced in live trading to the backtested equity curve, and he called this approach the **Cold Blood Index (CBI)**.

Apart from sounding like something you’d use to rank your favourite reptiles, we’re going to break down the CBI and find out what it can tell you about your drawdowns in live trading — especially when panic alarms are busy going off in your lizard brain.

You can see Johan’s blog post from 2015 for the original article.

*Let’s break it down….*

Paying homage to its creator, we’ll utilise Zorro to illustrate the CBI in action.

*Don’t worry if any of these details are hard to grasp. Follow the bigger picture and revisit the finer nuances later.*

Say you have been trading a strategy live for \(t\) days and are in a drawdown of length \(l\) and depth \(D\).

You want to know how this compares with the backtest. Most people will want to use this to decide whether their strategy is ‘broken’, but remember that *backtests are far from a 100% accurate representation of future performance. *So be careful how you use this thing.

The CBI is an estimate of the probability of experiencing the current drawdown if the strategy *hasn’t* deviated from its backtest.

- A high CBI indicates that the current drawdown is *not* unexpected, meaning the strategy probably hasn’t deviated from its backtest.
- A low CBI indicates that the system that produced the backtest is very unlikely to have produced the current drawdown, meaning the live strategy *has* deviated from the backtest.

Our *null hypothesis* (default stance) is that the live strategy’s current drawdown could have been produced by the backtested strategy.

The CBI is the *p*-value used to evaluate the statistical test of our null hypothesis.

**More simply, the CBI is the probability that the current drawdown would be equal to (or more extreme than) its observed value if the strategy hadn’t deviated.**

Typically, we don’t have access to the entire population of data points, but only a sample or subset of the total population. So, statistical hypothesis testing is a process that tests claims about the population on the basis of evidence gleaned from the sample we DO have.

Since we don’t have the full population we can never be *totally sure* of any conclusion we draw (just like virtually all things in trading), but we CAN accept or reject the claims on the basis of the strength of the evidence.

The strength of the evidence is encapsulated in the **p-value**. But we also need a statistical test for calculating it.

**In this case, our statistical test is empirical in nature – it is derived directly from the backtest equity curve. **

Deriving an empirical statistical test involves calculating the distribution of the phenomenon we are testing – in this case, the depth of drawdowns of length \(l\) within a live trading period \(t\) – from the sample data (our backtest).

Then, we compare the phenomenon’s observed value (\(D\), the depth of our current drawdown of length \(l\)) with the distribution obtained from the sample data, deriving the *p*-value directly from the observed value’s position on the sample distribution.

Naturally, let’s start with the *worst-case scenario.*

Take the simple case where our trading time, \(t\) is the same as the length of our current drawdown, \(l\).

To calculate the empirical distribution, we simply take a window of length \(l\) and place it at the first period in the backtest balance curve.

That is, the window initially covers the backtest balance curve from the first bar to bar \(l\).

Then, we simply calculate the difference in balance across the window, \(G\), and record it. Then, we slide the window by one period at a time until we reach the end of the balance curve, calculating the change in balance across each window as we go.

At the completion of this process, we have a total of \(M\) values for balance changes across windows of length \(l\) (\(M\) is equal to the length of the backtest minus \(l\) plus one). Of these \(M\) values, \(N\) will show a greater drawdown than our current drawdown, \(D\). Then, the CBI, here denoted \(P\), is simply \[P = \frac{N}{M}\]

Here’s the code for calculating the CBI and plotting the empirical distribution from the backtest, for this special case where the strategy is underwater from the first day of live trading:

```c
/* Cold Blood Index
   Special case of drawdown length equal to trade time
   That is, strategy underwater since inception */

int TradeDays = 60;  // Days since live start and in drawdown
var DrawDown = 20;   // Current drawdown depth in account currency
string BalanceFile = "Log\\simple_portfolio.dbl";

void Histogram(string Name,var Value,var Step,int Color)
/* plots a histogram given a value and bin width */
{
  var Bucket = floor(Value/Step);
  plotBar(Name,Bucket,Step*Bucket,1,SUM+BARS+LBL2,Color);
}

void main()
{
  var HistStep = 10; // bin width of histogram
  plotBar("Live Drawdown",DrawDown/HistStep,DrawDown,80,BARS|LBL2,BLACK); // mark current drawdown in histogram

  // import balance curve
  int CurveLength = file_length(BalanceFile)/sizeof(var);
  var *Balances = file_content(BalanceFile);

  // get number of samples
  int M = CurveLength - TradeDays + 1;

  // sliding window calculations
  var GMin=0, N=0; // define N as a var to prevent integer truncation in calculation of P
  int i;
  for(i=0; i<M; i++)
  {
    var G = Balances[i+TradeDays-1] - Balances[i];
    if(G <= -DrawDown) N += 1.;
    if(G < GMin) GMin = G;
    Histogram("G", G, HistStep, RED);
  }
  var P = N/M;

  printf("\nTest period: %i days",CurveLength);
  printf("\nWorst test drawdown: %.f",-GMin);
  printf("\nSamples: %i\nSamples worse than observed: %i",M,(int)N);
  printf("\nCold Blood Index: %.1f%%",100*P);
}
```
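If you’d like to sanity-check the sliding-window logic outside Zorro, here’s the same calculation as a Python sketch (my own function; `balances` is any list of balance-curve values):

```python
def cbi_simple(balances, trade_days, drawdown):
    # slide a window of length trade_days across the balance curve and
    # count windows whose balance change is at least as bad as -drawdown
    M = len(balances) - trade_days + 1
    N = sum(1 for i in range(M)
            if balances[i + trade_days - 1] - balances[i] <= -drawdown)
    return N / M
```

A monotonically rising curve gives a CBI of 0 for any positive drawdown, while a curve with deep swings pushes the value up, exactly as the histogram above suggests.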

To use this script, you first need to save the profit and loss time series data from a backtest (also, the backtest will need to use the same money management approach as used in live trading).

Do that by setting Zorro’s `LOGFILE` and `BALANCE` flags, which automatically save the backtest’s balance curve in Zorro’s Log folder.

Having saved the balance curve, make sure the string `BalanceFile` in the CBI script above points to the saved file.

Here’s an example. Say we had been trading our strategy live and we were concerned about the performance of the *EUR/USD: rsi* component. It’s been trading for 60 days now, and that component is showing a drawdown of $20.

Plugging those values into the script above and loading that component’s backtested balance curve gives the following histogram:

The script also outputs some pertinent information to the Zorro GUI window. Firstly, that of 1,727 sample windows, 73 were worse than our observed drawdown. The CBI is then calculated as \(73/1727 \approx 0.04\), which falls below a threshold confidence level of 0.05 that some individuals might use (remember, a smaller CBI provides stronger evidence of a “broken” strategy). But any such threshold is somewhat arbitrary.

We can also run the CBI script with various values of `TradeDays` and `DrawDown` to get an idea of what sort of drawdown would induce changes in the p-value.

The implementation of CBI above is for the special case where the strategy has been experiencing a drawdown since the first day of live trading.

Of course, this (hopefully) won’t always be the case!

**For drawdowns that come after some new equity high, calculation of the empirical distribution is a little trickier.**

Why?

Because we now have to consider the total trading time \(t\) as well as the drawdown time \(l\).

Reproducing this distribution of drawdowns faithfully would require traversing the balance curve using nested windows: an outer window of length \(t\) and an inner window of length \(l\) traversing the outer window, period by period, at every step of the outer window’s journey across the curve.

That is, for a backtest of length \(y\), we now have \((y-t+1)*(t-l+1)\) windows to evaluate.
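To get a feel for how much work that is, here’s the window arithmetic as a quick sketch (the example numbers in the note below are illustrative only):

```python
def nested_window_count(y, t, l):
    # outer windows of length t across a curve of length y, times
    # inner windows of length l within each outer window
    return (y - t + 1) * (t - l + 1)

def single_window_count(y, l):
    # windows evaluated by the single rolling-window shortcut
    return y - l + 1
```

For an assumed 2000-day backtest with \(t = 100\) and \(l = 60\), the nested approach evaluates 77,941 windows versus 1,941 for the single rolling window, which is why the combinatorial shortcut below is attractive.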

Rather than perform that cumbersome operation, we can apply the same single rolling window process that was used in the simple case, combined with some combinatorial probability, to arrive at a formula for the CBI, which we denote \(P\), in terms of the previously defined parameters \(M\) and \(N\), plus \(T = t - l + 1\):

\[P = 1 - \frac{(M-N)!\,(M-T)!}{M!\,(M-N-T)!}\]

The obvious problem with this equation is that it potentially requires evaluating factorials on the order of \((10^3)!\), which is approximately \(10^{2500}\), far exceeding the maximum range of `var` type variables.

To get around that inconvenience, we can take advantage of the relationships \[\ln(1000 \cdot 999 \cdot 998 \cdots 1) = \ln(1000) + \ln(999) + \ln(998) + \dots + \ln(1)\] and \[e^{\ln(x)} = x\] to rewrite our combinatorial equation thus:

\[P = 1 - e^{x}\] where \[x = \ln\left(\frac{(M-N)!\,(M-T)!}{M!\,(M-N-T)!}\right) = \ln((M-N)!) + \ln((M-T)!) - \ln(M!) - \ln((M-N-T)!)\]

Now we can deal with those factorials using a function that recursively sums the logarithms of their constituent integers, which is much more tractable.
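In a language with a log-gamma function, the same trick is even simpler, since \(\ln(n!) = \mathrm{lgamma}(n+1)\). A Python sketch of the general-case formula (the helper names are mine):

```python
import math

def log_factorial(n):
    # ln(n!) via the log-gamma function, avoiding huge intermediate values
    return math.lgamma(n + 1)

def cbi_general(M, N, T):
    # P = 1 - (M-N)! (M-T)! / ( M! (M-N-T)! ), evaluated in log space
    x = (log_factorial(M - N) + log_factorial(M - T)
         - log_factorial(M) - log_factorial(M - N - T))
    return 1.0 - math.exp(x)
```

For small inputs you can check it against direct factorials: with \(M=5\), \(N=2\), \(T=2\), the ratio is \(3! \cdot 3! / (5! \cdot 1!) = 0.3\), giving a CBI of 0.7.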

Here’s a script that verifies the equivalence of the two approaches for small integers, as well as its output:

```c
var logsum(int n)
{
  if(n <= 1) return 0;
  else return log(n)+logsum(n-1);
}

int factorial(int n)
{
  if (n <= 1) return 1;
  if (n >= 10)
  {
    printf("%d is too big", n);
    return 0;
  }
  return n*factorial(n-1);
}

void main()
{
  int M = 5;
  int N = 3;
  printf("\nEvaluate x = (%d-%d)!/%d! using\nfactorial and log transforms", M, N, M);
  printf("\nBy evaluating factorials directly,\nx = %f", (var)factorial(M-N)/factorial(M));
  printf("\nBy evaluating sum of logs,\nx = %f", exp(logsum(M-N) - logsum(M)));
}

/* OUTPUT:
Evaluate x = (5-3)!/5! using
factorial and log transforms
By evaluating factorials directly,
x = 0.016667
By evaluating sum of logs,
x = 0.016667
*/
```

Here’s the script for the general case of the CBI (courtesy Johan Lotter):

```c
/* Cold Blood Index
   General case of the CBI where DrawDownDays != TradeDays */

int TradeDays = 100;    // Days since live start
int DrawDownDays = 60;  // Length of drawdown
var DrawDown = 20;      // Current drawdown depth in account currency
string BalanceFile = "Log\\simple_portfolio.dbl";

var logsum(int n)
{
  if(n <= 1) return 0;
  else return log(n)+logsum(n-1);
}

void main()
{
  // import balance curve
  int CurveLength = file_length(BalanceFile)/sizeof(var);
  var *Balances = file_content(BalanceFile);

  // calculate parameters and check sufficient length
  int M = CurveLength - DrawDownDays + 1;
  int T = TradeDays - DrawDownDays + 1;
  if(T < 1 || M <= T)
  {
    printf("Not enough samples!");
    return;
  }

  // sliding window calculations
  var GMin=0, N=0; // define N as a var to prevent integer truncation in calculation of P
  int i = 0;
  for(; i < M; i++)
  {
    var G = Balances[i+DrawDownDays-1] - Balances[i];
    if(G <= -DrawDown) N += 1.;
    if(G < GMin) GMin = G;
  }

  var P;
  if(TradeDays > DrawDownDays)
    P = 1. - exp(logsum(M-N)+logsum(M-T)-logsum(M)-logsum(M-N-T));
  else
    P = N/M;

  printf("\nTest period: %i days",CurveLength);
  printf("\nWorst test drawdown: %.f",-GMin);
  printf("\nM: %i N: %i T: %i",M,(int)N,T);
  printf("\nCold Blood Index: %.1f%%",100*P);
}
```

Using the same drawdown length and depth as in the simple case, but now having traded for a total of 100 days, our new CBI value is 83%, which provides *no evidence* to suggest that the component has deteriorated.

For comparing drawdowns in live trading to those in a backtest, the CBI is useful. **But we can do better by incorporating statistical resampling techniques.**

Drawdown is a function of the *sequence* of winning and losing trades. However, a backtest represents just one realization of the numerous possible winning and losing sequences that could arise from a trading system with certain returns characteristics.

As such, the CBI presented above considers just one of many possible returns sequences that could arise from a given trading system.

So, we can make the CBI more robust by incorporating the algorithm into a Monte Carlo routine, such that many unique balance curves are created by randomly sampling the backtested trade results, and running the CBI algorithm separately on each curve.

The code for this Resampled Cold Blood Index is shown below, including calculation of the 5th, 50th and 95th percentiles of the resampled CBI values.

```c
/* Resampled Cold Blood Index
   General case of the Resampled CBI where DrawDownDays != TradeDays */

int TradeDays = 100;    // Days since live start
int DrawDownDays = 60;  // Length of drawdown
var DrawDown = 20;      // Current drawdown depth in account currency
string BalanceFile = "Log\\simple_portfolio.dbl";

var logsum(int n)
{
  if(n <= 1) return 0;
  else return log(n)+logsum(n-1);
}

void main()
{
  int CurveLength = file_length(BalanceFile)/sizeof(var);
  printf("\nCurve Length: %d", CurveLength);
  var *Balances = file_content(BalanceFile);

  var P_array[5000];
  int k;
  for (k=0; k<5000; k++)
  {
    // create a new balance curve by resampling the original
    var randomBalances[2000];
    randomize(BOOTSTRAP, randomBalances, Balances, CurveLength);

    int M = CurveLength - DrawDownDays + 1;
    int T = TradeDays - DrawDownDays + 1;
    if(T < 1 || M <= T)
    {
      printf("Not enough samples!");
      return;
    }

    var GMin=0., N=0.;
    int i=0;
    for(; i < M; i++)
    {
      var G = randomBalances[i+DrawDownDays-1] - randomBalances[i];
      if(G <= -DrawDown) N += 1.;
      if(G < GMin) GMin = G;
    }

    var P;
    if(TradeDays > DrawDownDays)
      P = 1. - exp(logsum(M-N)+logsum(M-T)-logsum(M)-logsum(M-N-T));
    else
      P = N/M;

    P_array[k] = P;
  }

  var fifth_perc = Percentile(P_array, k, 5);
  var med = Percentile(P_array, k, 50);
  var ninetyfifth_perc = Percentile(P_array, k, 95);
  printf("\n5th percentile CBI: %.1f%%",100*fifth_perc);
  printf("\nMedian CBI: %.1f%%",100*med);
  printf("\n95th percentile CBI: %.1f%%",100*ninetyfifth_perc);
}
```

Using the same drawdown length, drawdown depth and trade time as we evaluated in the single-balance curve example, we now find that our median resampled CBI is around 98%.

It turns out that the value obtained by only evaluating the backtest balance curve was closer to the 5th percentile (that is, the lower limit) of resampled values.

While this is not significant in this example (regardless of the method used, it is clear that the component has not deteriorated) this could represent valuable information if things were more extreme.

The usefulness of the resampled CBI declines for increasing backtest length, but it is easily implemented, comes at little additional compute time (thanks in part to Lite-C’s blistering speed), and provides additional insight into strategy deterioration by considering the random nature of individual trade results.

Note however that this method would break down in the case of a strategy that exhibited significant serially correlated returns, since resampling the backtest balance curve would destroy those relationships.

So should you actually use the CBI to decide when to pull a strategy? As interesting as this approach is, probably not….

In nearly all cases **this is giving the right answer to the wrong question:**

“Should I pull out of this strategy because its live performance looks different to the backtest?”

To labour the points we made in the first post in this series, experienced traders know that it’s a bad idea to set performance expectations based on a backtest. Manageable deviations from that backtest performance usually won’t trigger any alarm bells that inspire interference with the strategy.

Instead, we make tea and look for other trades.

To succeed in trading you have to realise and accept the randomness and efficiency of the markets — part of which means sitting through rather uncomfortable drawdowns which won’t show up in R&D. Being disappointed by live performance vs your exciting backtest is very much the norm…. you just have to take it on the chin and trust the soundness of your development process. The markets are too efficient and chaotic to care about meeting our expectations.

*We call this “Embracing the Mayhem”.*

*Embracing the Mayhem *is just one of the 7 Trading Fundamentals we teach inside our Bootcamps. These Fundamentals show the approach we use to trade successfully with Robot Wealth, which is normally only learned after years of expensive trial, error and frustration.

**But you can skip all that — you can get access to a bunch of these Fundamental videos for free by entering your email below:**

The post A Quant’s Approach to Drawdown: The Cold Blood Index appeared first on Robot Wealth.

The post A Quant’s Approach to Drawdown: Part 1 appeared first on Robot Wealth.

Systems go, let’s trade it.

Imagine this new strategy **enters a drawdown… maybe a lengthy one… maybe from day one!**

How would you react to such a letdown?

A common response to a long or sharp drawdown is to defer to our self-preservative instincts and pull the strategy entirely. Maybe you’d compare your poor live performance to your promising backtest, turn red and frisbee your laptop out the window.

If you enjoy making money trading you’ll need to do better.

A relatively experienced systematic trader, who we’ll call *Jack the Quant,* would look at this drawdown scenario a bit differently.

**Yes I know Jack is actually a terribly drawn Napoleon Dynamite**

Anyway, Jack would rely on:

- an understanding (and acceptance) of the nature of the markets
- a sound systematic research process
- generally *chilling the heck out*

We’re going to quickly talk about the latter two. You can learn about all three in depth here (or here for existing RW members).

Firstly, he’d go make some tea and relax.

Jack knows it would be trivial to show that a strategy with a true Sharpe ratio of 1.5 stands a reasonable chance of having a two-year drawdown. Jack also knows that the strategy he backtested to a Sharpe of 1.5 isn’t likely to deliver him that performance once live.
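Jack’s first claim is easy to check with a quick Monte Carlo. The sketch below is illustrative only: the function names are mine, and daily returns are modelled as i.i.d. Gaussian with a given annualised Sharpe ratio, which is a simplifying assumption rather than a market model. It estimates the chance that such a strategy spends longer than a given number of trading days underwater at some point in its life.

```python
import random

def longest_underwater(returns):
    # Longest stretch (in observations) the cumulative equity curve
    # spends below its previous high-water mark.
    equity = peak = 0.0
    peak_i = longest = 0
    for i, r in enumerate(returns):
        equity += r
        if equity >= peak:
            peak, peak_i = equity, i
        else:
            longest = max(longest, i - peak_i)
    return longest

def p_drawdown_longer_than(sharpe, threshold_days, years=10,
                           n_sims=1000, seed=42):
    # Monte Carlo estimate of the chance that a strategy with the given
    # annualised Sharpe suffers at least one drawdown lasting longer
    # than threshold_days over the horizon. Assumes i.i.d. Gaussian
    # daily returns with unit daily volatility.
    rng = random.Random(seed)
    mu = sharpe / 252 ** 0.5      # daily mean return for unit daily vol
    n_days = int(years * 252)
    hits = 0
    for _ in range(n_sims):
        rets = [rng.gauss(mu, 1.0) for _ in range(n_days)]
        if longest_underwater(rets) > threshold_days:
            hits += 1
    return hits / n_sims
```

Running something like `p_drawdown_longer_than(1.5, 504)` (504 trading days is roughly two years) puts a rough number on Jack’s intuition; lowering the Sharpe or the threshold pushes the probability up, as you’d expect.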

This implies that in order to decide whether the performance of Jack’s strategy is “unexpected”, he might have to *wait years*.

That’s assuming he has an accurate picture of what the “expected” performance might be – which he almost certainly *doesn’t* have.

Quite the conundrum, *isn’t it?*

This is why having a **sound research process to begin with** is so important to your success and sanity as a trader.

A sensible research process lets you gather enough evidence to make a smart bet that your strategy has a good chance of paying off in the long run, even if your strategy isn’t too hot out of the gates.

Satisfied with that evidence, Jack’s approach is to:

- size the strategy small
- chill out and let it do its thing for a couple of years
- in the meantime go hunting for other trades.

This might surprise most beginners who would assume a systematic trader would have some kind of automated process for decision making during drawdown. But, Jack’s approach stems from his understanding that trading is hard and that losing money sometimes is completely normal. Sometimes (or actually, most times), the solution is just to relax and keep moving.

Besides, Jack is smart and has structured his portfolio sensibly so that his risk premia tailwind acts as a buffer for when his alpha trades don’t go his way, which is *inevitable*. In the end, he still has trades on and is making money.

This is the less frenzied, more liveable way to run a trading operation.

So be like Jack — copy his well-rounded approach and keep your eyes open for new trades.

You bet!

Jack would down his tea quick smart and pull the pin if live trading demonstrated that his backtest assumptions had been violated so badly that his edge was destroyed. Maybe he was a tad optimistic in his execution assumptions during the research phase, and the live market showed that unexpected frictions are too much of a hurdle to surmount.

He might also pull out if the basis for the trade ceased to exist. For instance, say the trade was based on forced flows from ETF rebalancing. If the ETF Jack was trading changed its mandate in such a way that impacted those forced flows, he’d stop trading the strategy.

But that’s a pretty clear cut example. Often trading *isn’t clear cut at all*.

Say you were running a convergence trade betting that two related financial assets were linked through shared risk factors. It’s a complex question to decide if the dynamics of those risk factors have changed permanently.

In that case again, as in most cases, the best answer is to size it small, chill out, and find more stuff to trade.

Trading can be uncomfortable, you’ll often lose money before you make it….

….this is part of the game, and that’s OK! We just need to make more than we lose. We show you this inside our Bootcamps.

There’s always a temptation to compare live trading results with the backtest to decide whether or not to pull out. But this violates one of the most important tenets of successful systematic trading:

**Don’t use a backtest to define performance objectives!**

Think about it. You’ve spent days or weeks or longer researching and developing the strategy. Consider how many decision points you encountered along the way. Which universe of assets to trade. Whether to choose this parameter, that parameter, or both. No matter how careful and considered your development process, some amount of optimistic bias creeps in with each decision. This is a fact of life. Unavoidable. It’s part and parcel of strategy development. We *can’t not* introduce data mining bias. Which means that we can’t trust our backtest to set future performance objectives for us.

Don’t get me wrong, backtesting is probably the most powerful tool in your arsenal. The ability to acquire empirical evidence that an idea worked or not in the past is crucial. Backtesting gives you this ability.

Problems arise, however, when you expect live trading performance to match the backtest.

Again, sensible ol’ Jack knows this.

*Not entirely….*

There’s always the temptation to try and systematise decisions (they don’t call it systematic trading for nothing). The decision to pull a strategy based on live performance compared to a backtest is no exception. It’s comfortable to know that there’s an algorithm or a set of rules that’s got our back.

This is generally *not* a good idea.

Generally speaking, to repeat what we talked about earlier, a good systematic trader is likely to approach drawdown by:

- adopting a sound research process and sizing small – before drawdown ever occurs
- chilling out and letting the strategy go to work, even during said drawdown
- always looking for other opportunities to trade

Find yourself faced with a strategy that’s a bit underwater? Does it fly in the face of your backtest? Are you frustrated?

**It’s all part of the journey.**

*But all that said, maybe there IS some small utility in quantifying this approach *— we’ll investigate this in **part 2** of this series.

While we’re putting that together you can learn more about where exactly algo trading CAN help you trade more profitably by downloading the free PDF below….

The post A Quant’s Approach to Drawdown: Part 1 appeared first on Robot Wealth.

The post Run Trading Like a Business, Not a Board Game appeared first on Robot Wealth.

“Why would I need to *recalibrate* to a practical approach? *What do you think I’m doing??”*

Maybe you’re doing fine, but many aspiring traders arrive at the Robot’s doorstep fantasising about hunting down that one perfect alpha, building complex ML models, mass data mining their way to bloodshot eyes, or some other sophisticated (and crucially, *theoretical*) means of untangling the markets with supreme intellectual capacity.

**Many aspiring traders won’t make money — spending years chasing their own tails over such low ROI pursuits.**

“Oh, is that a neural network?!”

Rather than focusing on a tried and tested approach, in practice, many aspiring traders follow the approaches with the *weakest history of success, *wrongly assuming that this is how you make money in the markets.

To make your life easier, we suggest you bookmark the fun distractions (at least until you have the luxury of being so experimental) in favour of the simpler, lower-hanging fruit that will grow your account.

Let’s see what this shift looks like.

Say you have *X* dollars of trading capital. As traders, what we’re trying to do is **maximise the chance of turning $X into more than $X.**

This doesn’t seem like a revolutionary way of looking at things – but it seems to me that most traders aren’t focusing *ruthlessly* on this problem, certainly not at the expense of the fun intellectual challenges.

For instance, a *fun* approach might be to pull some EUR/USD exchange rate data or price data from the E-mini S&P futures, and mine for buy and sell rules that would have proved profitable in the past.

Another *fun* approach may be to try out a lot of indicators and rules in some backtesting software until something promising comes up. Maybe throw in an automated, cloud-based data mining setup. Hell, why not a genetic optimisation algorithm or neural network?

These are fun and exciting games to play, and I understand why traders gravitate towards them.

Individual retail traders tend to enjoy tackling complex problems. They also tend to assume that greater complexity equates to greater market potential. These two factors combined mean you’ll often walk right by the *more effective activities that will make your money grow.*

**At Robot Wealth our focus is using our time, resources and capital to get a return on our money – and so should you if you want to win.**

We teach this approach using several *fundamental truths* in our Bootcamps.

The one we’ll talk briefly about now is *Run it Like a Business* — because it’s the fundamental that becomes most immediately apparent when you make this shift from *fun-and-theoretical* to *money-making-and-practical* (you can get primers on the other 6 at the bottom of this post).

*Not that making money isn’t fun or anything…yeesh. *

When you shift focus to ruthlessly turning $x into >$x, a lot of real-world issues and logistical/infrastructure problems begin bubbling to the surface. Problems which hadn’t become relevant before, probably because growing your money by the most realistic, mundane means wasn’t your sole objective.

It’s like fantasising about buying your dream car. You imagine driving it, and the sense of accomplishment in attaining it. Yet part of that fantasy likely doesn’t include the higher-octane fuel needs, pricey insurance policy and diligent maintenance required just to get it out of your driveway — these issues only present themselves once you’re in the driver’s seat, not in fantasy land.

Let’s think about what you need to run your ~~Lambo~~ systematic trading setup.

You need:

- capital to trade
- strategies to trade
- systems, infrastructure and processes for trading, performance reporting, reconciliation, accounting, backtesting etc
- skills, knowledge and experience

Like any business, you need to set up your infrastructure and continuously refine and improve it. **Much of the results you get from your trading will come from how effectively you operate this infrastructure, not how secretive and special your alpha is.**

You have several resources in order to achieve this. First, and most obviously, is your **time**.

But, there are a lot of things to do! How will you split your time between:

- Researching ideas and developing trading strategies
- Development or setup of the other essential software and infrastructure
- The actual trading: monitoring, tweaking, and reconciling systems; executing semi-automated strategies etc
- Accounting and other admin work
- Reading about the markets and technologies and improving your knowledge and skills

Essentially, the work of an entire trading floor is in your hands. But don’t panic.

You’ll be relieved to know that you *don’t* need to make everything from scratch yourself and *you shouldn’t*. Leave that to the people who play *the fun games* at the expense of trading returns. You can buy software or pay others to make it for you.

Of course, there are some things you probably can’t do yourself even if you wanted to, such as streaming market data.

Okay, now that we’ve knocked those off, the task becomes….

How do you make the best use of your resources of time and money to improve your trading setup in order to maximise the return you make on your trading capital?

One of the main lessons we communicate in *Embrace the Mayhem* is that the market is very efficient and trading is hard.

So the prospect of trading in order to turn $X into more than $X should somewhat intimidate you. If you don’t find it intimidating then you are underestimating the efficiency of the market!

The path is straight and narrow, but your resources are finite, so **you need to be prioritising the highest ROI activities.**

High ROI activities include:

- Implementing new trading strategies within a proven framework. An example might be to implement a portfolio of pairs trades in the equity market.
- Scaling existing strategies to new instruments or markets. For example, porting the pair trading setup to a different international equity market.
- Well planned iterative research, set up in such a way that you can test and invalidate ideas quickly. This is the kind of research we show in the Bootcamps.

Low ROI activities include:

- Large-scale data mining exercises, or any research that requires a large amount of effort to be expended before the idea can be invalidated
- Looking for totally unique alpha ideas when you could be implementing simple trades within a proven framework
- Building your own backtesting platform. You probably think you’ll learn a ton doing this and you’re not wrong about that – but it’s going to suck a huge amount of your time on something that you can buy in cost-effectively.
- Building your own execution platform
- Re-writing stuff that already works in a different language.

Of course, there’s room for fun and creativity in every business. But, if you’re serious about trading, you want to prioritise the activities that are most effective and give yourself the *maximum chance of return* on your capital, time and expenses.

It’ll surprise you how few solo retail traders genuinely do this. Which is why most of them go round in circles for most of their trading careers without getting any closer to making real money.

I’ll repeat this point again because it’s so vital to your success as a trader:

**Many of the results you get from your trading will come from how effectively you operate and refine your trading business, focusing on high ROI activities, and not how secretive, complex and special your alpha is.**

You want to *Run it Like a Business.*

When you start thinking like this you’re going to look at your “business” and realise it’s not going to be a very viable one on day 1.

*That’s okay. *

It’s going to have to be a labour of love to start with. Chances are you don’t have the capital, strategies, systems, skills, experience and other assets that you need to make this viable. Not today, anyway.

This is fine. Perfectly normal.

You *don’t* go from 0 to 100 on day 1. And you *don’t* need an exact step-by-step plan.

Work hard, concentrate on the fundamentals and keep your eyes open to the opportunities that will present themselves if you do this work well with clear eyes and an open heart.

Concentrate on:

- Growing your skills, intuition and experience. And building a library of trading frameworks, strategies, approaches and tools.
- Saving money to grow your trading capital
- Growing a network of other traders
- And prioritising effective high ROI work over vanity projects.

Real trading isn’t some board game or puzzle, it’s a **business** which needs multiple roles filled and processes refined.

You don’t have many resources as a solo trader, and you are competing with some of the best minds in the game. Why waste your time and energy on the stuff that won’t move you any closer to your goals?

Focus ruthlessly on turning $X into more than $X (running it like a business) – the next problems for you to tackle will soon make themselves known.

*Run it Like a Business* is **just one** of the 7 Fundamentals we teach inside the first week of Algo Bootcamp, along with:

**and The (only 2) Ways to Make Money (we don’t have a nice picture for that one yet…)**

Want a walkthrough of **all of these?**

On the run-up to our next Algo Bootcamp release, you can discover the approach behind the above concepts — sent directly to you via email. **Simply join the Bootcamp waiting list below.**

The post Run Trading Like a Business, Not a Board Game appeared first on Robot Wealth.
