Integrating R with the Zorro Backtesting and Execution Platform

In the last two posts, we implemented a Kalman filter in R for calculating a dynamic hedge ratio, and presented a Zorro script for backtesting and trading price-based spreads using a static hedge ratio.

The goal is to get the best of both worlds and use our dynamic hedge ratio within the Zorro script.

Rather than implement the Kalman filter in Lite-C, it’s much easier to make use of Zorro’s R bridge, which facilitates easy communication between the two applications. In this post, we’ll provide a walk-through of configuring Zorro and R to exchange data with one another.

Why integrate Zorro and R?

While Zorro and R are useful as standalone tools, they have different strengths and weaknesses.

Zorro was built to simulate trading strategies, and it does this very well. It’s fast and accurate. It lets you focus on your strategies by handling the nuts and bolts of simulation behind the scenes. It implements various tools of interest to traders, such as portfolio optimization and walk-forward analysis, and was designed to prevent common bugs, like lookahead bias.

Zorro does a lot, but it can’t do everything.

An overlooked aspect of the software is its ability to integrate R and its thousands of add-on libraries. From machine learning and artificial intelligence to financial modeling, optimization, and graphics, R packages have been developed to cover all these fields and more. And since R is widely used in academia, when a researcher develops a new algorithm or tool it is often implemented as an open source R package long before it appears in commercial or other open-source software.

Zorro’s R bridge unlocks these tools for your trading applications and combines them with Zorro’s fast and accurate market simulation features.

In this post, I’ll show you how to set up and use Zorro’s R bridge. Once that’s out of the way, we’ll be in a position to put all the pieces together and run a simulation of our pairs trade that uses the Kalman filter we wrote for R.

Some notes on usage

Zorro’s R bridge is designed to enable a Zorro script to control and communicate with an R environment running on the same machine. The assumption is that the user will want to send market data (sometimes lots of it) from Zorro to R for processing, and then return the output of that processing, usually consisting of just one or a small number of results, back to Zorro.

Lite-C is generally much faster than R code, so it's preferable to perform as much computation on the Zorro side as possible, reserving R for computations that are difficult or inconvenient to implement in Zorro. Certainly, you'll want to avoid doing any looping in R. Having said that, vector and matrix operations are no problem for R, and might even run quicker than in Lite-C.

Zorro orders time series data differently from most platforms: newest elements first. R's functions generally expect time series with the newest elements last. Fortunately, Zorro implements the rev() function for reversing the order of a time series, which we'll need to use before sending data across to R. I'll show you an example of how this works.

Finally, debugging R bridge functions requires a little care. For example, executing an R statement with a syntax error from Zorro will cause the R session to fall over, and subsequent commands will also fail, sometimes silently. For basic debugging, you can return R output to Zorro's GUI or view it with a debugging tool, and you can use an R bridge function to check that the R session is still "alive" (more on these below). But it always pays to execute R commands in the R console before setting them loose from a Lite-C script.

How to make Zorro and R talk to each other

Assuming you have Zorro installed, here’s a walk-through of configuring Zorro and R to talk to one another.

1. Install R

Go to http://cran.r-project.org and install R.

2. Tell Zorro the path to the R terminal

Open Zorro/Zorro.ini (or Zorro/ZorroFix.ini if using the persistent version of the configuration file) and enter the path to RTerm.exe for the RTermPath variable. This tells Zorro how to start an R session.

Here’s an example of the location of RTerm.exe:

And here's how the RTermPath setting in Zorro.ini might look:

Of course, the path to RTerm.exe will be specific to your machine.

3. Run the test script

In Zorro/Strategy, you'll find a script named RTest.c. Open a Zorro instance, select this script, and press Test. If R is installed correctly and your Zorro.ini settings are correct, you should get output that looks like this:

If that worked as expected, then you’re ready to incorporate R functionality in your Zorro scripts. If the test script failed, most likely the path specified in Zorro.ini is incorrect.

Using the R bridge

Next, we’ll run through a brief tutorial with examples on using the R bridge functions. 

1. Add r.h header file to a Zorro script

To use the R bridge in your script, you need to include the r.h header file. Simply add this line at the beginning of your Zorro script:

#include <r.h>

2. Start an R session from your Zorro script

In order to use the other R bridge functions, first call Rstart() in the Zorro script's INITRUN. Here's the function's general form: Rstart(string source, int debuglevel)

Both parameters are optional. source is an R file that is sourced at start up, and loads any predefined functions, variables or data that your R session will use.

We can also specify the optional debuglevel argument, which takes an integer value of either 0, 1, or 2 (0 by default) defining the verbosity of any R output, such as errors and warnings:

  • 0: output fatal errors only
  • 1: output fatal errors, warnings and notes
  • 2: output every R message (this is like the output you see in the R console).

You can use Microsoft’s Debug View tool to see the output of the R session. There’s a more convenient way to display the output of the R session directly in the Zorro GUI too – more on this shortly.

Rstart() returns zero if the R session could not be started, and non-zero otherwise, so we can use its return value to check that the session started.

This next script attempts to start a new R session via Rstart(), but raises the alarm and quits if unsuccessful.

#include <r.h>

function run()
{
    if(is(INITRUN)) // start R once, in the initial run
    {
        if(!Rstart("", 2)) 
        {
            printf("Error - could not start R session!");
            quit();
        }
    }
        
}

3. Check that the R session is still running

Rrun() checks the status of the R session and returns 0 if the session has been terminated or has failed, 1 if the session is ready for input, and 2 if the session is busy with a computation or operation. Use it regularly!
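Because the busy state (2) is non-zero, a test like if(!Rrun()) only fires when the session has actually died. The plain C sketch below makes that mapping explicit. The enum and function names are hypothetical and are not defined in Zorro's r.h; the status values are taken from the description above.

```c
/* Hypothetical names for the three status codes that Rrun() is described
   as returning. Zorro's r.h does not necessarily define these constants. */
enum RSessionStatus {
    R_SESSION_TERMINATED = 0, /* session has failed or been terminated */
    R_SESSION_READY      = 1, /* ready to accept the next command */
    R_SESSION_BUSY       = 2  /* still executing a previous command */
};

/* Return a human-readable description of an Rrun()-style status code. */
const char *r_status_text(int status)
{
    switch (status) {
    case R_SESSION_TERMINATED: return "terminated or failed";
    case R_SESSION_READY:      return "ready for input";
    case R_SESSION_BUSY:       return "busy";
    default:                   return "unknown status";
    }
}

/* Only status 0 means the session is gone; 1 (ready) and 2 (busy) are
   both "alive", which is why a plain !Rrun() check is a liveness test. */
int r_session_alive(int status)
{
    return status == R_SESSION_READY || status == R_SESSION_BUSY;
}
```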

The R session will terminate upon encountering any fatal error, which can result from a syntax error, unexpected data, or other problems that only show up in real time. But here's the thing: if the R session is terminated, the R bridge simply stops sending messages and silently ignores further commands.

That means that your script will only throw an error if some Lite-C computation depends on data that wasn’t received back from the R bridge.

It’s a bad idea to assume that this will be picked up, so use Rrun() to check the status of your R connection – typically you’ll want to do this at the end of every bar in a backtest, and possibly prior to critical computations, raising an appropriate error when a failure is detected.

The script below builds on the previous example to also include a call to Rrun()  every bar:

#include <r.h>

function run()
{
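    if(is(INITRUN)) // start R once, in the initial run
    {
        if(!Rstart("", 2)) 
        {
            printf("Error - could not start R session!");
            quit();
        }
    }
    
    if(!Rrun()) // check every bar that the R session is still alive
    {
        printf("Error - R session has been terminated!");
        quit();
    }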
        
}

4. Execute R code

Rx(string code, int mode) is a powerful function: it enables the execution of a line of R code directly from a Lite-C script. We simply provide the R code as a string (the code argument, which can be up to 1,000 characters in length). Optionally, we can provide mode, which specifies how Rx() passes control back to Zorro during execution of code in R.

Normally, the Zorro GUI is unresponsive while the R bridge is busy; mode can modify this behaviour. It takes the following values:

  • 0: Execute code synchronously (that is, freeze Zorro until the computation is finished). This is the default behaviour.
  • 1: Execute code asynchronously, returning immediately and continuing to execute the Lite-C script. Since the R bridge can only handle one request at a time, you'll need to use Rrun() to determine when the next command can be sent to the R session. This is useful when you want to run R and Zorro computations in parallel.
  • 2: Execute code asynchronously, enabling the user to access the Zorro GUI buttons, and returning 1 when code has finished executing and 0 when an error is encountered or the [Stop] button on the Zorro GUI is pressed. This is useful when your R computations take a long time, and you think you might want to interrupt them with the [Stop] button.
  • 3: Execute code asynchronously, like mode = 2, but also printing R output to Zorro's message window. The verbosity of this output is controlled by the debuglevel argument to Rstart(); in order to output everything (that is, mimic the output of the R console), set debuglevel to 2. This is a convenient alternative to using the Debug View tool mentioned above.
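Mode 1 implies a polling pattern: send a command, carry on with other work, and check Rrun() until it stops reporting busy. The plain C sketch below illustrates that loop under stated assumptions: mock_rrun() is a hypothetical stand-in for Rrun() that reports busy a fixed number of times, so the logic can run without Zorro or R. In a real Zorro script you would more likely check Rrun() once per bar, or between other computations, rather than spin in a tight loop.

```c
/* Hypothetical mock of the R session status so the polling logic can run
   without Zorro or R: it reports "busy" (2) mock_busy_polls times, then
   "ready" (1). */
static int mock_busy_polls = 3;

static int mock_rrun(void)
{
    if (mock_busy_polls > 0) {
        mock_busy_polls--;
        return 2; /* busy */
    }
    return 1; /* ready */
}

/* Poll status_fn until the session is ready (1) or has died (0).
   Returns 1 if ready, 0 if terminated; *polls receives the number of
   status checks performed. */
int wait_until_ready(int (*status_fn)(void), int *polls)
{
    int status;
    *polls = 0;
    do {
        status = status_fn();
        (*polls)++;
        /* ... Zorro-side work could be done here between checks ... */
    } while (status == 2);
    return status == 1;
}
```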

Here’s a script that runs two lines of R code: one line that generates a vector of random normal numbers and calculates its mean; and another that prints the mean, returning the value to the Zorro GUI.

#include <r.h>

function run()
{
    if(is(INITRUN)) // start R once, in the initial run
    {
        if(!Rstart("", 2)) //enable verbose output
        {
            printf("Error - could not start R session!");
            quit();
        }
    }
    
    Rx("x <- mean(rnorm(100, 0, 1))", 0); //default mode: wait until R code completes
    Rx("print(x)", 3); //execute asynchronously, print output to debug view and Zorro GUI window 
        
    if(!Rrun())
    {
        printf("Error - R session has been terminated!");
        quit();
    }
}

Here’s the output in the Zorro GUI:

You can see that with every iteration of the run function, Zorro tells R to generate a new vector of random numbers – hence the changing mean.

5. Send data from Zorro to R

To send data from Zorro to R, use Rset(string name, data_type, data_length).

On the R side, the data will be stored in a variable named name.

The actual usage of Rset() depends on what type of data is being sent from Zorro: a single int, a single float, or an array (or series) of float type variables. The latter can be sent to R as either a vector or a two-dimensional matrix.

When sending a single int or float to R, we simply specify the name of that variable.

For sending arrays, we need to specify a pointer to the array and either the number of elements (for sending the array to R as a vector) or the number of rows and columns (for sending the array to R as a matrix).

Specifying a pointer is not as scary as it sounds; in Lite-C we can simply use the name of the array or series, as these are by definition pointers to the actual variables.
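To see why the array name is enough, here is a plain C sketch (C rather than Lite-C, although Lite-C's var is likewise a double) of a hypothetical function with the same shape as the vector form of Rset(): a pointer to the first element plus an element count. Passing an array by name is the same as passing the address of its first element.

```c
#include <stddef.h>

/* Hypothetical receiver with the same shape as the vector form of Rset():
   a pointer to the first element and the number of elements. */
double mean_of(const double *data, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return n > 0 ? sum / (double)n : 0.0;
}
```

Calling mean_of(my_params, 5) with the five-element array from the next example returns 3.5, exactly as mean_of(&my_params[0], 5) would.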

Here are some examples of sending the different data types from Zorro to R:

#include <r.h>

function run()
{
    if(is(INITRUN)) // start R once, in the initial run
    {
        if(!Rstart("", 2)) //enable verbose output
        {
            printf("Error - could not start R session!");
            quit();
        }
    }
    
    // make some variables
    int today = dow();
    var last_return = 0.003;
    var my_params[5] = {2.5, 3.0, 3.5, 4.0, 4.5};
    
    // send those variables to R
    Rset("my_day", today);
    Rset("last_ret", last_return);
    Rset("params", my_params, 5); //specify number of elements
    
    // operate on those variables in the R session
    Rx("if(my_day == 1) x <- last_ret * params[1] else x <- 0", 0); //note params[1] is my_params[0] due to R's 1-based indexing and C's 0-based
    Rx("print(x)", 3);
    
    if(!Rrun())
    {
        printf("Error - R session has been terminated!");
        quit();
    }   
}

Under the comment // make some variables, we create some arbitrary variables named today (an int), last_return (a float) and my_params (an array of float). Under the comment // send those variables to R, we send those variables to the R session, assigning them to R objects named my_day, last_ret, and params respectively. When we send the array my_params to the R session, we have to specify the number of elements in the array.

Under the comment // operate on those variables in the R session, we perform an operation on the variables in our R session. Note that R's indexing is one-based, while C's is zero-based, so if we want to access the value associated with my_params[0] in the R session, we need to use params[1].

Here’s an example of the output:

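Off-by-one mistakes are easy to make when building R expressions from Lite-C indexes. A minimal plain C sketch, assuming you want to read element i of a Zorro-side array from its copy in R: the hypothetical helper below writes the R subscript as i + 1.

```c
#include <stdio.h>

/* Hypothetical helper: build an R expression that reads element c_index
   (0-based, as in C and Lite-C) of the R vector r_name. R is 1-based, so
   the subscript written is c_index + 1. Returns what snprintf returns. */
int r_element_expr(char *buf, size_t size, const char *r_name, int c_index)
{
    return snprintf(buf, size, "%s[%d]", r_name, c_index + 1);
}
```

For example, r_element_expr(buf, sizeof buf, "params", 0) writes params[1], which matches the my_params[0] case above.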

Example: sending price data to the R session

Sending price data (or other time series, such as returns, indicators, and the like) follows a process like the one shown above, but there are one or two issues you need to be aware of.

First, during the lookback period, the values of such time series are undefined. Sending an undefined value via the R bridge will cause a fatal error and the subsequent termination of the R session. To get around this issue, we can wrap our calls to Rset() in an if condition which evaluates to True outside the lookback period: if(!is(LOOKBACK)).

The other problem is that Zorro’s time series are constructed with the newest values first. R functions expect time series data in chronological order with the newest elements last. That means that we need to reverse the order of our Zorro time series before sending them to R.

This is fairly painless, since Zorro implements the rev() function for that very purpose. Simply provide rev() with the time series to be reversed, and optionally the number of values to be sent to R (if this argument is omitted, LookBack values are used instead).
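The idea behind rev() can be sketched in plain C. The function below is hypothetical and is not Zorro's implementation; it assumes a plain double array in which index 0 holds the current bar, as in a Zorro series, and copies the n most recent values into chronological order.

```c
#include <stddef.h>

/* Copy the n most recent values of a newest-first series (index 0 is the
   current bar) into out[] oldest-first, newest-last, the order R's time
   series functions expect. Illustrative only; not Zorro's rev(). */
void reverse_recent(const double *newest_first, size_t n, double *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = newest_first[n - 1 - i];
}
```

For a series {105, 104, 103, 102, 101} (newest first) and n = 3, the output is {103, 104, 105}: the newest value lands in the last position, which is where R code such as closes[length(closes)] looks for it.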

Here’s an example of sending price data to R that deals with these two issues:

#include <r.h>

function run()
{
    if(is(INITRUN)) // start R once, in the initial run
    {
        if(!Rstart("", 2)) //enable verbose output
        {
            printf("Error - could not start R session!");
            quit();
        }
    }
    
    vars Close = series(priceClose());
    int size = 20;
    vars revClose = rev(Close, size);
    
    if(!is(LOOKBACK))
    {
        printf("\n#########\nZorro's most recent close:\n%.5f", Close[0]);
        Rset("closes", revClose, size);
        Rx("last_close <- round(closes[length(closes)],5)", 0);
        printf("\nR's most recent close:\n");
        Rx("print(last_close, 5)", 3);
    }
        
    if(!Rrun())
    {
        printf("Error - R session has been terminated!");
        quit();
    }
        
}

Here’s an example of the output:

6. Return data from R to Zorro

The three functions Ri(), Rd(), and Rv() evaluate a given R expression, much like Rx(), but they return the result of the expression to the Zorro session as an int, a float, or a vector, respectively. We can supply any variable, valid R code, or function call to Ri(), Rd(), and Rv(), so long as it evaluates to the correct type.

Ri() and Rd() work in much the same way: we only need to supply an R expression as a string, and the functions return the result of the expression. This means that in the Lite-C script, we can set a variable using the output of Ri() or Rd().

For example, to define the variable my_var and use it to store the mean of the R vector my_data, we would do:

var my_var = Rd("mean(my_data)");

Rv() works slightly differently. Its arguments are an R expression that evaluates to a vector, a pointer to the Lite-C var array that will be filled with the results, and the number of elements in the vector.

Here’s an example where we fill the float array my_vector with the output of R’s rnorm() function (which produces a vector of normally distributed random variables of a given length, mean and standard deviation):

var my_vector[10];
Rv("rnorm(10, 0, 1)", my_vector, 10);

Here’s an example where we put both of these together – we populate a vector in our Lite-C script with some random numbers generated in R. Then we send that vector back to R to calculate the mean before printing the results. Of course this is a very convoluted way to get some random numbers and their mean, but it illustrates the point:

#include <r.h>

function run()
{
    if(is(INITRUN)) // start R once, in the initial run
    {
        if(!Rstart("", 2)) //enable verbose output
        {
            printf("Error - could not start R session!");
            quit();
        }
    }
    
    var my_vector[10]; // initialise array of float
    
    if(!is(LOOKBACK))
    {
        Rv("rnorm(10, 0, 1)", my_vector, 10);
        Rset("my_data", my_vector, 10);
        var my_mean = Rd("mean(my_data)");
        
        int i;
        printf("\n#################");
        for(i=0; i<10; i++)
        {
            printf("\nmy_vector[%i]: %.3f", i, my_vector[i]);
        }
        printf("\nmean: %.3f", my_mean);
        
    }
        
    if(!Rrun())
    {
        printf("Error - R session has been terminated!");
        quit();
    }
        
}

And here’s the output:

A General Workflow and Common Errors

The intent of Zorro’s R bridge is to:

  1. Facilitate the sending of large amounts of data from Zorro to R,
  2. Enable analysis of this data in R by executing R code from the Lite-C script, and
  3. Return single numbers or vectors from R to Zorro.

With that in mind, it makes sense to do as much of the data acquisition, cleaning and processing on the Lite-C side as possible. Save the R session for analysis that requires the use of specialized packages or functions not available in Zorro.

In particular, avoid executing loops in R (these can be painfully slow). But if operations can be vectorized, they may be more efficiently performed in R.

It is wise to test the R commands you supply to Rx(), Ri(), Rd(), and Rv() in an R console before running them in a Lite-C script. Any syntax error or bad data will cause the R session to terminate and all subsequent R commands to fail, potentially without raising a visible error. For that reason, use the Rrun() function regularly (at least once per bar) and keep an eye on the Debug View tool's output, or the Zorro GUI.

A frozen Zorro instance is often indicative of an incomplete R command, such as a missing bracket. Such a mistake will not throw an error, but R will wait for the final bracket, causing Zorro to freeze.
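A cheap guard against this kind of freeze is to check the code string before sending it. The plain C sketch below is a hypothetical heuristic, not part of Zorro: it checks that (), [] and {} are balanced and that no string literal is left open, treating # outside a string as the start of an R comment. It cannot catch syntax errors in general, so testing in the R console remains the real safeguard.

```c
#include <stddef.h>

/* Hypothetical pre-flight check for one line of R code. Returns 1 if (),
   [] and {} are balanced and correctly nested outside string literals and
   no string is left open, 0 otherwise. Backslash escapes inside strings
   are skipped; backtick-quoted names are not handled. */
int r_brackets_balanced(const char *code)
{
    char stack[256];
    size_t depth = 0;
    char quote = 0; /* open string delimiter, or 0 outside strings */

    for (const char *p = code; *p; p++) {
        char c = *p;
        if (quote) {
            if (c == '\\' && p[1]) { p++; continue; } /* skip escaped char */
            if (c == quote) quote = 0;
            continue;
        }
        if (c == '#') break; /* rest of the line is an R comment */
        if (c == '"' || c == '\'') { quote = c; continue; }
        if (c == '(' || c == '[' || c == '{') {
            if (depth == sizeof stack) return 0; /* too deeply nested */
            stack[depth++] = c;
        } else if (c == ')' || c == ']' || c == '}') {
            char open = c == ')' ? '(' : c == ']' ? '[' : '{';
            if (depth == 0 || stack[--depth] != open) return 0;
        }
    }
    return depth == 0 && quote == 0;
}
```

The same logic could be applied to a string before passing it to Rx(), although porting it to Lite-C may require minor adjustments to the C syntax used here.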

Another common error is to attempt to load an R package that hasn’t been installed. This will cause the R session to terminate, so make sure your required packages are all installed before trying to load them. The source of the resulting error may not be immediately obvious, so keep an eye on the Debug View tool’s output.

Depending on your setup, the packages available to your R terminal may not be the same as those available in your RStudio environment (if you're using that particular IDE).

Here’s a short R script that specifies some arbitrary required packages, checks if they are installed, and attempts to install them from CRAN if they are not already installed:

required.packages <- c('deepnet', 'caret', 'kernlab')
new.packages <- required.packages[!(required.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages, repos='https://cran.us.r-project.org')

You can include this script in the file specified as the source parameter to Rstart() to ensure that your required packages are always present.

Another common issue with the R bridge arises from passing backslashes in file names from Lite-C to R. R uses forward slashes instead. You can modify these manually, or use Zorro’s slash() function, which automatically converts all backslashes in a string to forward slashes. For example, slash(ZorroFolder) returns the file path to the Zorro folder as a string, with forward slashes instead of backslashes.
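Backslashes cause trouble because, inside an R string literal, a backslash starts an escape sequence, so a Windows path such as C:\Zorro can break when pasted into R code. The conversion slash() performs can be sketched in plain C as below; to_forward_slashes() is a hypothetical helper, not Zorro's implementation, and the example path in the test is made up.

```c
/* Hypothetical helper: replace every backslash in path with a forward
   slash, in place, so the path can be embedded in R code. Returns path. */
char *to_forward_slashes(char *path)
{
    for (char *p = path; *p; p++)
        if (*p == '\\')
            *p = '/';
    return path;
}
```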

Conclusion

OK, that was a lengthy tutorial, but it will be worth it!

So far we've used fairly simple R functions, stuff that you can easily do in Lite-C, like calculating the mean of a bunch of numbers. But in the next post, we'll put together our Zorro pairs trading script that makes use of the Kalman filter that we wrote in R.

More importantly, if you can master the R bridge functions we’ve discussed, you’ll be able to use any R tool directly in your trading scripts.


3 thoughts on “Integrating R with the Zorro Backtesting and Execution Platform”

  1. Maybe something that will be covered in a next piece, but your latter statement:
    “required.packages <- c('deepnet', 'caret', 'kernlab')
    new.packages <- required.packages[!(required.packages %in% installed.packages()[,"Package"])]
    if(length(new.packages)) install.packages(new.packages, repos='https://cran.us.r-project.org')"

    How should that be incorporated in the Z-code?

    The manual has an example (#2), but no example of the content of "MySignals.r" or "MyObjects.bin" is provided so Butterworth filter (pass.filt) from the dplR-library to work is challenging and this implementation differs from the one in Zorro. Any clues are welcome.

    • Hi Norbert, that code you’re talking about is R code, not Zorro code. It belongs in the R script that you source at the outset. It’s just a convenient way to make sure you have all the required R packages installed and loaded, as this is a common bug (and one that will cause R to fail silently when called from a Zorro script).

      Hope that helps.
