# How to Run Trading Algorithms on Google Cloud Platform in 6 Easy Steps

Earlier this year, I attended the Google Next conference in San Francisco and gained some first-hand perspective into what’s possible with Google’s cloud infrastructure. Since then, I’ve been leaning on Google Cloud Platform (GCP) to run my trading algorithms (and more) and it has become an important tool in my workflow.

In this post, I’m going to show you how to set up a GCP cloud compute instance to act as a server for hosting a trading algorithm. I’ll also discuss why such a setup can be a good option and when it might pay to consider alternatives. But cloud compute instances are just a tiny fraction of the whole GCP ecosystem, so before we go any further, let’s take a high-level look at the various components that make up GCP.

# What is Google Cloud Platform?

GCP consists of a suite of cloud storage, compute, analytics and development infrastructure and services. Google says that GCP runs on the very same infrastructure that Google uses for its own products, such as Google Search. This suite of services and infrastructure goes well beyond simple cloud storage and compute resources, providing some very handy and affordable machine learning, big data, and analytics tools.

GCP consists of:

• Google Compute Engine: on-demand virtual machines and an application development platform.
• Google Storage: scalable object storage; like an (almost) infinite disk drive in the cloud.
• BigTable and Cloud SQL: scalable NoSQL and SQL databases hosted in the cloud.
• Big Data Tools:
    • BigQuery: big data warehouse geared up for analytics
    • DataFlow: data processing management
    • DataProc: managed Spark and Hadoop service
    • DataLab: analytics and visualization platform, like a Jupyter notebook in the cloud.
    • Data Studio: for turning data into nice visualizations and reports
• Cloud Machine Learning: train your own models in the cloud, or access Google’s pre-trained neural network models for video intelligence, image classification, speech recognition, text processing and language translation.
• Cloud Pub/Sub: send and receive messages between independent applications.
• Management and Developer Tools: monitoring, logging, alerting and performance analytics, plus command line/powershell tools, hosted git repositories, and other tools for application development.
• More that I haven’t mentioned here!

The services and infrastructure generally play nicely with each other and with the standard open source tools of development and analytics. For example, DataLab integrates with BigQuery and Cloud Machine Learning and runs Python code. Google have tried to make GCP a self-contained, one-stop-shop for development, analytics, and hosting. And from what I have seen, they are succeeding.

## Introduction

Google Compute Engine (GCE) provides virtual machines (VMs) that run on hardware located in Google’s global network of data centres (a VM is simply an emulation of a computer system that provides the functionality of a physical computer). You can essentially use a VM just like you would a normal computer, without actually owning the requisite hardware. In the example below, I used a VM instance to:

• Host and run some software applications (Zorro and R) that execute the code for the trading system.
• Connect to a broker to receive market data and execute trades (in this case, using the Interactive Brokers IB Gateway software).

GCE allows you to quickly launch an instance using predefined CPU, RAM and storage specifications, as well as to create your own custom machine. You can also select from several pre-defined ‘images’, which consist of the operating system (both Linux and Windows options are available), its configuration and some standard software. What’s really nice is that GCE enables you to create your own custom image that includes the software and tools specific to your use case. This means that you don’t have to upload your software and trading infrastructure each time you want to launch a new instance – you can simply create an instance from an image that you saved previously.

Before we get into a walk-through of setting up an instance and running a trading algorithm, I will touch on the advantages and disadvantages of GCE for this use case, as well as the cost.

## Pros and Cons of Running Trading Algorithms on Google Compute Engine

There’s a lot to like about using GCE for managing one’s trading infrastructure. But of course, there will always be edge cases where other solutions will be more suitable. I can only think of one (see below), but if you come up with more, I’d love to hear about them in the comments.

### Pros:

• GCE abstracts away the need to maintain or update infrastructure, which allows the trader to focus on high-value tasks instead, like performance monitoring and further research.
• The cost of a cloud compute instance capable of running a trading algorithm is very reasonable (I’ll go into specifics below). In addition, you only pay for what you use, but can always increase the available resources if needed.
• Imaging: it is possible to create an ‘image’ of your operating system configuration and any applications/packages necessary to run your algorithm. You can start a new compute instance with that image without having to manually install applications and configure the operating system. This is a big time-saver.
• Scalability: if you find that you need more compute resources, you can add them easily, although doing so will interrupt your algorithm.
• Security: Google claim to have excellent security and employ a team of over 750 experts in that field, and take measures to protect the physical security of their data centres and the cybersecurity of their servers and software.
• Uptime: Google commits to providing 99.95% uptime for GCE. If that level of uptime isn’t met in any particular month, Google issues credit against future billing cycles.
• Access to other services: since the GCP services all play nicely together, you can easily access storage, data management, and analytical tools to complement or extend a compute instance, or indeed to build a bigger workflow on GCP that incorporates data management, research and analytics.
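To put the uptime figure in the list above into perspective, here is a rough sketch of the arithmetic. It assumes a 30-day month and treats the commitment as a simple percentage of elapsed minutes; how Google actually measures downtime is defined in its service terms, which this sketch does not attempt to reproduce.

```python
def allowed_downtime_minutes(uptime_pct, days_in_month=30):
    """Minutes per month an instance could be down while still meeting the uptime target."""
    minutes_in_month = days_in_month * 24 * 60
    return minutes_in_month * (1 - uptime_pct / 100)

# 99.95% uptime over an assumed 30-day month
downtime = allowed_downtime_minutes(99.95)
print(f"{downtime:.1f} minutes of permitted downtime per month")
```

In other words, the commitment still leaves room for roughly twenty minutes of downtime a month, which is worth keeping in mind if your algorithm holds positions that need active management.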

### Cons:

• If your trading algorithm is latency sensitive, GCE may not be the best solution. While you do have some choice over where your algorithm is physically hosted, this won’t be ideal if latency is a significant concern. For the vast majority of independent traders, this is unlikely to be a deal-breaker, but it is certainly worth mentioning.

I was almost going to list security as a disadvantage, since it can be easy to assume that if security is not handled in-house, then it is a potential issue. However, one would think that Google would do security much better than any individual could possibly do (at least, that’s what you’d think after reading Google’s spiel on security), and that therefore it makes sense to include the outsourcing of security as an advantage. This issue might get a little more complicated for a trading firm which may prefer to keep security in-house, but for most individuals it probably makes sense to outsource it to an expert.

## Pricing

GCE is surprisingly affordable. The cost of hosting and running my algorithm is approximately 7.6 cents per hour, which works out to around $55 per month (if I leave the instance running 24/7) including a sustained use discount, which is applied automatically. Factoring in the $300 of free credit I received for signing up for GCP, the first year’s operation will cost me about $360. This price could come down significantly, depending on the infrastructure I use, as I’ll explain below.

I used an n1-standard-1 machine from GCE’s list of standard machine types. This machine type utilizes a single CPU and allocates 3.75 GB of memory, and I attached a 50GB persistent disk. This was enough to run my trading algorithm via the Zorro trading automation software (which requires Windows), executed through Interactive Brokers via the IB Gateway.

The algorithm in question generates trading signals (for a portfolio of three futures markets) by processing hourly data with callouts to a feedforward neural network written as an R script, and it monitors tick-wise price data for micro-management of individual trades. The machine type I chose handled this job reasonably well, despite recommendations from Google’s automated monitoring that I assign some additional resources. These recommendations generally arise as a result of retraining my neural network, a task that proved to be more resource intensive than the actual trading. Thankfully, this only happens periodically and I have so far chosen to ignore Google’s recommendations without apparent negative consequence.

I used a Windows Server 2016 image (since my trading application runs on Windows only) and a 50GB persistent disk, which is the minimum required to run such an image. The Windows Server image accounts for the lion’s share of the cost – approximately $29 per month.
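The cost figures above can be checked with a few lines of arithmetic. This is only a sketch: the 730 hours per month is an assumed average (8,760 hours in a year divided by 12), and the function names are made up for illustration.

```python
HOURS_PER_MONTH = 730  # assumed average: 8,760 hours per year / 12

def monthly_cost(hourly_rate_usd, hours=HOURS_PER_MONTH):
    """Cost of leaving an instance running for a whole month."""
    return hourly_rate_usd * hours

def first_year_cost(monthly_usd, free_credit_usd):
    """Twelve months of billing less any sign-up credit, floored at zero."""
    return max(12 * monthly_usd - free_credit_usd, 0.0)

monthly = monthly_cost(0.076)          # 7.6 cents per hour, as quoted in the text
yearly = first_year_cost(55.0, 300.0)  # rounded monthly figure and credit from the text
print(f"monthly: ${monthly:.2f}, first year: ${yearly:.0f}")
```

The same two functions make it easy to see how the bill changes if you only run the instance during market hours, or if the hourly rate changes with a different machine type.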

### Step 2: Navigate to Google Compute Engine

Firstly, from the GCP homepage, navigate to your GCP Console via one of the options shown below:

Then, navigate to the Compute Engine dashboard like so:

### Step 3: Create a new VM instance

Simply click on Create in the VM Instances screen on your GCE dashboard, like so:

Then fill out the specs for your new instance. The specs I used look like this (you can see the cost estimate on the right):

I used one of GCP’s US east-coast data centres since IB’s servers are located in the New York area. My algorithm isn’t latency sensitive, but every little bit helps.

After clicking Create, the instance will take a few moments to spin up, then it will appear in your VM dashboard like so:

### Step 4: Set up access control

Next, you need to set up a password for accessing the instance. Click the arrow next to RDP and select Set Windows password like so:

### Step 5: Connect and test

You can connect directly from the VM dashboard using Google’s remote desktop application for Chrome by clicking RDP (ensuring the correct VM is selected), or download the Windows RDP:

Once connected to the instance, it should look like a normal (although somewhat Spartan) Windows desktop. To test that it can connect to Interactive Brokers (IB), we are going to connect to IB’s test page. But first, we have to adjust some default internet settings. To do this, open Internet Explorer. Select the Settings cog in the top right of the browser then Internet Options, then Security, then Trusted Sites. Click the Sites button and add https://www.interactivebrokers.com to the list of trusted sites. Then save the changes. Here’s a visual from my instance:

Now, connect to IB’s test page to check that your instance can communicate with IB’s servers. Simply navigate to https://www.interactivebrokers.com/cgi-bin/conn_test.pl in Internet Explorer. If the instance is connecting properly, you should see a page that looks like this:

You can now upload your trading software and algorithm to your instance by simply copying and pasting from your home computer, or download any required software from the net. Note that to copy and paste from your home computer, you will need to access the instance using Windows RDP, not Chrome RDP (this may change with future updates to Chrome RDP).

### Gotcha: changing permissions of root directory for Windows Server:

I found that I wasn’t able to install R packages from script due to restrictions on accessing certain parts of the Windows file structure. To resolve this, I followed these steps:

• In Windows Explorer, navigate to the R installation directory and right-click it, then choose Properties.
• Go to the Security tab.
• Click Advanced, then Change Permissions.
• Choose This folder, subfolders and files under Applies to:
• Choose Full Control under Basic Permissions.
• Click OK.

### Step 6: Don’t forget to stop the instance!

If you need to stop trading your algorithm, it is usually a good idea to stop the instance so that you aren’t charged for compute resources that you aren’t using. Do so from the VM dashboard:

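So long as you don’t delete the instance, you can always restart it from the same state at which it was stopped, meaning you don’t have to re-upload your software and scripts. You are not billed for compute time while an instance is stopped, although resources that remain attached to it, such as its persistent disk, continue to incur storage charges.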

On the other hand, if you delete your instance and later want to restart, you will have to create a whole new instance and re-upload all your trading infrastructure. That’s where images come in handy: you can save an image of your setup, and then start an identical instance from the console. I’ll show you how to do that in another post.

# Concluding Remarks

## GCP on the Command Line

In this post I’ve demonstrated how to set up and run instances using the GCP Console. The same can be achieved using the gcloud command-line tool, which is worth learning if you start using GCP extensively, thanks to the boost in productivity that comes with familiarity.
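As a rough illustration of what the command-line route looks like, the sketch below builds the gcloud commands for creating and stopping an instance similar to the one used in this post, and prints them rather than running them. The instance name `trading-vm`, the zone `us-east1-b` and the `windows-2016` image family are assumptions chosen for the example; check the values available in your own project before using them.

```python
import shlex

def create_instance_cmd(name, zone="us-east1-b", machine_type="n1-standard-1",
                        image_family="windows-2016", image_project="windows-cloud",
                        boot_disk_size="50GB"):
    """Build (but do not run) a `gcloud compute instances create` command."""
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",
        f"--image-family={image_family}",
        f"--image-project={image_project}",
        f"--boot-disk-size={boot_disk_size}",
    ]

def stop_instance_cmd(name, zone="us-east1-b"):
    """Build (but do not run) a `gcloud compute instances stop` command."""
    return ["gcloud", "compute", "instances", "stop", name, f"--zone={zone}"]

create_cmd = create_instance_cmd("trading-vm")  # hypothetical instance name
stop_cmd = stop_instance_cmd("trading-vm")
print(shlex.join(create_cmd))
print(shlex.join(stop_cmd))

# To execute for real (requires an installed, authenticated gcloud):
# import subprocess
# subprocess.run(create_cmd, check=True)
```

Printing the commands first is a deliberate choice: you can paste them into a terminal and inspect them before anything is created or billed.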

## Going Further

There’s a lot that can be done on GCP, including big data analytics and machine learning. We can also apply some simpler workflows to make our lives easier, such as creating custom images as mentioned above, or integrating with cloud storage infrastructure for managing historical data and using Data Studio for monitoring performance via attractive dashboard-style interfaces. I’m in a good position to show you the ropes on how to use these tools in your trading workflow, so if there is something in particular that you would like me to showcase, let me know in the comments.

Happy Googling!

Recently, Yahoo Finance – a popular source of free end-of-day price data – made some changes to their server which wreaked a little havoc on anyone relying on it for their algos or simulations. Specifically, Yahoo Finance switched from HTTP to HTTPS and changed the data download URLs. No doubt this is a huge source of frustration, as many backtesting and trading scripts that relied on such data will no longer work.

Users of the excellent R package quantmod, however, are in luck! The package’s author, Joshua Ulrich, has already addressed the change in a development version of quantmod. You can update your quantmod package to the development version that addresses this issue using this command in R:

    devtools::install_github("joshuaulrich/quantmod", ref="157_yahoo_502")

Of course, you need the devtools package installed, so run install.packages("devtools") first if you don’t already have it.

Once the package updates, quantmod::getSymbols(src = "yahoo") should work just as it did prior to the updates on the Yahoo Finance server. I can verify that this worked for me.

Of course, if you don’t want to update quantmod to a version that lives on a Git branch, you can wait until the changes are merged into master and do

    devtools::install_github("joshuaulrich/quantmod")

I don’t know when that will happen, but I have been using the branch version for a few days now, and all appears to be working as expected.

Update: A user suggested making use of the quantmod::adjustOHLC() function as the adjusted close of Yahoo data is currently incomplete, and doesn’t account for dividends. Example usage:


# Dual Momentum: A Review

I recently read Gary Antonacci’s book Dual Momentum Investing: An Innovative Strategy for Higher Returns with Lower Risk, and it was clear to me that this was an important book to share with the Robot Wealth community. It is important not only because it describes a simple approach to exploiting the “premier anomaly” (Fama and French, 2008), but because it is ultimately about approaching the markets with a critical, inquisitive mindset, while not taking oneself too seriously. I think we can all do with a dose of that sometimes.

Gary’s style is unique: this is the work of a free and critical thinker who is not afraid to question the status quo. While articulately drawing from a range of sources, from Shakespeare to Bacon and Einstein to Buffett (even Thomas Conrad’s 1970 book Hedgemanship: How to Make Money in Bear Markets, Bull Markets and Chicken Markets While Confounding Professional Money Managers and Attracting a Better Class of Women, which has got to be the greatest title in the history of trading books), Gary comes across as playful and slightly eccentric (which is wonderfully refreshing in a book about the markets). He derides economists who take themselves and their models too seriously (the line “for followers of CAPM, the real world was an annoying special case” almost made me fall off my chair), and importantly, he does this from the perspective of someone who has earned the right to do so through hard-won practical experience.

In this post I’ll describe some of the highlights from the book, including a description of Dual Momentum and Gary’s modular approach for exploiting it. I’ll also describe a variation of this approach and of course illustrate the performance of both approaches with some equity curves. Robot Wealth members have access to the code for implementing these systems and a research framework for additional experimentation.

There is a lot more to Gary’s book than the results I’m showing here, including detailed discussions around the existing momentum literature, the basis for momentum (it’s always nice to have a tangible reason for an anomaly to exist before committing capital to it) and suggestions for other Dual Momentum implementations. The book is highly recommended.

# Momentum, Relative and Absolute

The seminal work of Jegadeesh and Titman (1993) showed that relative momentum – that is, the returns of an asset in comparison to other assets – provides profitable trading opportunities which are largely robust to the parameters of the trading strategy that might be used to exploit them. They showed that the returns of relative momentum outperformed benchmark returns. However, in order to harvest this out-performance, one must typically endure significant volatility, often only marginally better than that of the benchmark itself. For many active investors and managers, the reward may not justify the risk.

Antonacci (2012) published an extremely simple yet highly effective extension to relative momentum. He also looked at the absolute momentum of an asset – that is, the momentum of the asset relative to itself – and found that by combining the two types of momentum, he could reap the rewards of relative momentum investing while vastly reducing the volatility of the approach.

Antonacci’s Dual Momentum is extremely simple to implement and manage, requiring at most a few positional adjustments each month. This, combined with its history of low-volatility out-performance, makes it in my opinion the perfect place to start for people who are new to trading, prior to investigating more complex strategies. It won’t shoot the lights out and make you rich overnight, but it has proven itself over time to be a robust way to outperform the market in the long term.

So why isn’t everyone doing this? My theory is that it is too simple. Most folks who decide they want to beat the markets like an intellectual challenge. They like to apply advanced quantitative methods or perform in-depth research into the fundamentals. Of course, I really like doing this too, and these methods can be handsomely profitable. But they take a huge amount of effort and usually a lot of frustration to get them right. Dual Momentum is like low hanging fruit. When you’re starting out, it makes sense to work on something that is both simple and robust, even if it doesn’t exactly satisfy that intellectual itch, and at least start getting some results. One learns a lot in the process.

# Dual Momentum Explained

Before we go any further, I’ll explain what is meant by “Dual Momentum” and how it might be applied.

Dual Momentum is about selecting assets that have both historically outperformed and also themselves generated a positive return. The first step in applying Dual Momentum is to compare the assets of interest against one another. If an asset has a higher return than another over the time period of interest, then it has positive relative momentum. We select the assets which have positive relative momentum for further analysis. Relative momentum thus acts as the initial filter.

Next, we look at the absolute momentum of individual assets. That is, we look at the performance of individual assets compared only to themselves. In simple terms, if an asset has a positive return over the time period of interest, its absolute momentum is positive, and if its return is negative, its absolute momentum is negative. Taking our assets with positive relative momentum, we would only consider buying those assets whose absolute momentum is also positive.

It is possible for an asset to have positive relative momentum and negative absolute momentum. For example, if the whole market was going down, the best performer in such a bear market would have positive relative momentum, but it might have negative absolute momentum. That is, it might have lost less than its peers. The Dual Momentum approach would prevent us buying such assets. Likewise, an asset might be going up and have positive absolute momentum, but if other assets performed better, it would have negative relative momentum. The Dual Momentum approach would force us into the assets that had both gone up and outperformed their peers.
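The two filters described above can be sketched in a few lines. The returns below are invented for illustration, and the sketch makes one assumption the text leaves open: an asset is treated as having positive relative momentum when its return beats the average return of the group.

```python
def classify_dual_momentum(returns):
    """Flag each asset's relative and absolute momentum over the formation period.

    `returns` maps asset name -> formation-period return (e.g. 0.10 for +10%).
    Assumption: relative momentum is positive when the asset beats the group average.
    """
    group_average = sum(returns.values()) / len(returns)
    flags = {}
    for asset, ret in returns.items():
        relative = ret > group_average   # did it outperform its peers?
        absolute = ret > 0               # did it actually go up?
        flags[asset] = {"relative": relative, "absolute": absolute,
                        "buy": relative and absolute}
    return flags

# Hypothetical formation-period returns
mixed_market = {"A": 0.10, "B": 0.02, "C": -0.04}
bear_market = {"A": -0.05, "B": -0.12, "C": -0.20}

for label, market in (("mixed", mixed_market), ("bear", bear_market)):
    buys = [a for a, f in classify_dual_momentum(market).items() if f["buy"]]
    print(label, buys)
```

In the bear-market case, asset A lost the least and so has positive relative momentum, but its negative absolute momentum keeps it off the buy list, which is exactly the situation described above.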

In the description above, I referred to the long side only, but of course Dual Momentum could be applied to the short side in the same way. In my experience, a long only Dual Momentum strategy seems to perform better when applied to equities (which makes sense given the long-term upward bias in that asset class), but you may be able to apply it in a different way that I haven’t thought of, or apply it to a different asset class, in order to take advantage of the short side too.

# Implementing Dual Momentum

There are many ways to build a strategy that implements the Dual Momentum approach. Gary’s research shows that momentum works best when applied to geographically diversified equity indices. This research was backed up by Geczy and Samonov (2015), by way of a 215-year backtest!

We can’t easily invest directly in an index, so in this examination of Dual Momentum, we used Exchange Traded Funds (ETFs) that track the indices of interest. While this analysis is constrained by the formation date of the individual ETFs, we found that the results are generally in line with Gary’s results over the same period. Gary’s backtests using indices date back to 1974 and demonstrate a greater degree of out-performance than is evident across the relatively short (~10 years) backtest used here.

Below are the results of an ETF-based version of the modular approach described in Antonacci (2012), as well as a sector-rotation approach. Transaction costs and ETF distributions have not been included in these simulations; however, they would likely not have a significant impact on results given the typical holding period and trade frequency.

## Modular Dual Momentum

The modular approach to Dual Momentum is the one described in Antonacci (2012). This approach dictates that every month, we compare two related sectors or two parts of a single sector and select the better performer over the formation period (the prior twelve months). If the better performer has positive absolute momentum, we buy that asset. If the better performer has negative absolute momentum, we hold treasury bonds or investment grade bonds.
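A minimal sketch of this monthly decision rule might look like the following. It is not the code used for the results in this post: the price series are made up, the function names are hypothetical, and `"BONDS"` simply stands in for whichever treasury or investment grade bond holding you choose.

```python
def total_return(prices, lookback=12):
    """Simple return over the last `lookback` periods of a list of month-end prices."""
    if len(prices) < lookback + 1:
        raise ValueError("need at least lookback + 1 prices")
    return prices[-1] / prices[-1 - lookback] - 1.0

def module_holding(prices_a, prices_b, names=("A", "B"), lookback=12, safe="BONDS"):
    """Pick the better performer of a two-asset module, or the safe asset
    if the better performer's own return over the formation period is not positive."""
    ret_a = total_return(prices_a, lookback)
    ret_b = total_return(prices_b, lookback)
    winner, winner_ret = (names[0], ret_a) if ret_a >= ret_b else (names[1], ret_b)
    return winner if winner_ret > 0 else safe

# Hypothetical month-end prices: 13 observations give one 12-month return
us_up, intl_up = [100.0] * 12 + [108.0], [50.0] * 12 + [52.0]
us_down, intl_down = [100.0] * 12 + [90.0], [50.0] * 12 + [47.0]

print(module_holding(us_up, intl_up, names=("US", "INTL")))
print(module_holding(us_down, intl_down, names=("US", "INTL")))
```

Running the rule once a month on fresh month-end prices gives the holding for the following month.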

Gary provides some examples of “modules” to which one might apply Dual Momentum:

• Equities (US equities – international equities)
• Bonds (credit bonds – high yield bonds)
• Economic stress (gold – treasury bonds)
• REITs (mortgage REITs – credit REITs)

We could also look within other sectors for more ideas for modularizing Dual Momentum.

Download my R code – used to generate the following three Modular Dual Momentum equity curves – here.

Here are the results for the equities and bonds modules from 2007 to March 2017. The ETFs used were SPY/CWI and CSJ/HYG respectively. We can see that over this period, Modular Dual Momentum resulted in returns that were comparable with the best performing component but with a fraction of the maximum drawdown. The results are also consistent with Gary’s finding that Dual Momentum tends to underperform during strong bull regimes or when the market rebounds, but thanks to its ability to weather bear markets, has outperformed in the long run. In his paper, Antonacci (2012) provides a backtest that extends back to 1974 and which better captures this long-term outperformance (as stated above, in this ETF implementation, we are constrained by the time that the ETFs have been in existence).

Finally, here are the performance charts of a Modular Dual Momentum portfolio consisting of a 60-40 split between equities and bonds modules.

## Dual Momentum Sector Rotation

This is a twist on Antonacci’s modular approach, and is actually the approach I prefer. We choose a universe of ETFs that represent various sectors, regions and asset classes, rank them based on their return over the formation period, and buy up to the best three ETFs whose absolute momentum is also positive. Essentially, this approach results in a sector rotation strategy that leverages the benefits of Dual Momentum. Robot Wealth members have access to the complete research environment for reproducing and experimenting with this Dual Momentum Sector Rotation strategy. Join up here.
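The selection step of this rotation can be sketched as a rank-then-filter operation. The returns below are invented, and the sketch says nothing about position sizing or what to do with unfilled slots, which are left open here.

```python
def sector_rotation_picks(formation_returns, max_holdings=3):
    """Rank assets by formation-period return and keep up to `max_holdings`
    of the best performers whose own return is also positive."""
    ranked = sorted(formation_returns.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, ret in ranked[:max_holdings] if ret > 0]

# Hypothetical formation-period returns for a small universe
returns = {
    "US large cap": 0.12,
    "Europe": 0.07,
    "Japan": -0.02,
    "Gold": 0.15,
    "7-10Y Treasuries": 0.03,
}
weak_market = {"US large cap": -0.10, "Europe": -0.20, "Gold": 0.05, "7-10Y Treasuries": 0.02}

print(sector_rotation_picks(returns))
print(sector_rotation_picks(weak_market))
```

Because the ranking is done before the absolute momentum filter, a weak market naturally produces fewer than three holdings rather than forcing capital into falling assets.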

The results below are for a Dual Momentum Sector Rotation for the following sectors:

• Stocks in the Russell 1000
• Stocks in the S&P 500
• Stocks listed on the FTSE
• European equities
• Japanese equities
• Asia-Pacific (ex-Japan) equities
• 7-10 Year Treasury Bonds
• 1-3 Year Treasury Bonds
• Gold

And the results using a 6-month formation period:

No doubt you can see why I prefer this approach over the modular one! Although to be fair, the parameters that I used are among the best parameter sets for this particular universe of ETF sectors. A good question to ask now would be “how robust is the strategy to changes in the parameter space?”. After all, the goal of strategy research is to discover robust strategies that perform well in the future, not to create the best backtests. We will soon be adding our suite of robustness testing tools to the members’ area to help answer this question.

# Important Caveats

Readers please note that my implementations have some important differences from the approach that Gary describes in his book and on his website. Gary’s equities implementation is called Global Equities Momentum (GEM) and you can see its performance and read the fine print here. One of the key differences is that the trend of the US market determines the trend of all equities indices. In other words, if the return of the S&P500 is less than the return of short-term treasuries, we hold bonds regardless of the performance of foreign stocks. The reasoning for this is research referenced in Gary’s book that shows the US stock market leading global equities markets. In the short backtest posted here, there is only one month when this makes a difference, but it may be more significant in the long term.

Also, Gary explicitly advises against a sector rotation model. In his backtests back to 1974, he found that sector rotation outperformed the S&P500, but underperformed his GEM model. If you look at Gary’s results here, you can see that sector rotation outperformed GEM only intermittently (including the last few years of the simulation), with GEM coming out well on top over time. However, the sector rotation Gary describes on his website here is, I think, very different to my implementation in that it examines the individual sectors of the US market only. My implementation is more of a “global macro sector rotation”.

Gary is quite explicit that based on his research, the best application of Dual Momentum is the one presented in his book, or a similar one that focuses on equities, which have historically offered the highest risk premium. Readers should certainly familiarize themselves with Gary’s FAQ page in order to get the full story, or better yet, read the book!

# Conclusion

This article provided a description of Dual Momentum and presented results for two different implementations of Dual Momentum using ETFs. We found that in general, the results of our approximately 10-year ETF-based simulations were in line with Gary’s much longer index-based simulations, although the latter better demonstrate the long-term outperformance of the Dual Momentum approach. I recommend reading the FAQ page on Gary’s website, or better yet, the book, to get the full story.

Robot Wealth members have access to the code that was used to generate the backtests shown above, which forms part of a larger research environment that can be easily modified and extended, for example by varying the instruments used in each module, thinking up other modules, and varying strategy parameters like the formation period and the number of ETFs held in the sector rotation version. There is lots of other useful content in the members’ area too, including a machine learning research framework, educational material for learning algorithmic trading, and an active and exclusive forum of like-minded individuals. We would love to have you in the community – register here.

# References

Antonacci G. 2012, Risk Premia Harvesting through Dual Momentum

Fama, E. and K. French, 2008, Dissecting Anomalies, The Journal of Finance, 63, pg. 1653-1678.

Geczy, C and Samonov, M. 2015, 215 Years of Global Multi-Asset Momentum: 1800-2014 (Equities, Sectors, Currencies, Bonds, Commodities and Stocks)

Jegadeesh, N. and Titman, S. 1993, Returns to Buying Winners and Selling Losers: Implications for Stock Market Efficiency, The Journal of Finance, Vol 48, Issue 1, pp. 65-91

# Back to Basics Part 3: Backtesting in Algorithmic Trading

This is the final post in our 3-part Back to Basics series. You may be interested in checking out the other posts in this series:

Nearly all research related to algorithmic trading is empirical in nature. That is, it is based on observations and experience. Contrast this with theoretical research which is based on assumptions, logic and a mathematical framework. Often, we start with a theoretical approach (for example, a time-series model that we assume describes the process generating the market data we are interested in) and then use empirical techniques to test the validity of our assumptions and framework. But we would never commit money to a mathematical model that we assumed described the market without testing it using real observations, and every model is based on assumptions (to my knowledge no one has ever come up with a comprehensive model of the markets based on first principles logic and reasoning). So, empirical research will nearly always play a role in the type of work we do in developing trading systems.

So why is that important?

Empirical research is based on observations that we obtain through experimentation. Sometimes we need thousands of observations in order to carry out an experiment on market data, and since market data arrives in real time, we might have to wait a very long time to run such an experiment. If we mess up our experimental setup or think of a new idea, we would have to start the process all over again. Clearly this is a very inefficient way to conduct research.

A much more efficient way is to simulate our experiment on historical market data using computers. In the context of algorithmic trading research, such a simulation of reality is called a backtest. Backtesting allows us to test numerous variations of our ideas or models quickly and efficiently and provides immediate feedback on how they might have performed in the past. This sounds great, but in reality, backtesting is fraught with difficulties and complications, so I decided to write an article that I hope illustrates some of these issues and provides some guidance on how to deal with them.

# Why Backtest?

Before I get too deeply into backtesting theory and its practical application, let’s back up and talk about why we might want to backtest at all. I’ve already said that backtesting allows us to carry out empirical research quickly and efficiently. But why do we even need to do that? Everyone knows that we should just buy when the Relative Strength Index drops below 30, right?

OK, so that was obviously a rhetorical question. But I just wanted to highlight one of the subtly dangerous modes of thinking that can creep in if we are not careful. Now, I know that for the vast majority of Robot Wealth readers, I am preaching to the converted here, but over the last couple of years I’ve worked with a lot of individuals who have come to trading from non-mathematical or non-scientific backgrounds who struggle with this very issue, sometimes unconsciously. This is a good place to address it, so here goes.

In the world of determinism (that is, well-defined cause and effect), natural phenomena can be represented by tractable mathematical equations. Engineers and scientists reading this will be well-versed, for example, in Newton’s laws of motion. These laws quantify a physical consequence given a set of initial conditions and are solvable by anyone with a working knowledge of high school level mathematics. The markets, however, are not deterministic (at least not in the sense that the information we can readily digest describes the future state of the market). That seems obvious, right? The RSI dropping below 30 does not portend an immediate price rise. And if price does rise, it is not because the RSI dropped below 30. Sometimes prices will rise following this event, sometimes they will fall and sometimes they will do nothing. We can never tell for sure, and often we can’t describe the underlying cause, beyond more people buying than selling. Most people can accept that fact. However, I have observed time and again a paradox: namely, that the same person who accepts that markets are not deterministic will believe in a set of trading rules because they read them in a book or on the internet.

I have numerous theories about why this is the case, but one that stands out is that it is simply easy to believe things that are nice to believe. Human nature is like that. This particular line of thinking is extraordinarily attractive because it implies that if you do something (simple) over and over again, you will make a lot of money. But that’s a dangerous trap to fall into. And you can fall into it even if your rational self knows that the markets are not deterministic, simply by not questioning the assumptions underlying that trading system you read about.

I’m certainly not saying that all DIY traders fall into this trap, but I have noticed it on more than a few occasions. If you’re new to this game, or you’re struggling to be consistently profitable, maybe this is a good thing to think about.

I hope it is clear now why backtesting is important. Some trading rules will make you money; most won’t. But the ones that do make money don’t work because they accurately describe some natural system of physical laws. They work because they capture a market characteristic that over time produces more profit than loss. You never know for sure if a particular trade is going to work out, but sometimes you can conclude that in the long run, your chances of coming out in front are pretty good. Backtesting on past data is the one tool that can help provide a framework in which to conduct experiments and gather information that supports or detracts from that conclusion.

# Simulation versus Reality

You might have noticed that in the descriptions of backtesting above I used the words “simulation of reality” and “how they might have performed in the past”. These are very important points! No simulation of reality is ever exactly the same as reality itself. Statistician George Box famously said “All models are wrong, but some are useful” (Box, 1976). It is critical that our simulations be accurate enough to be useful. Or more correctly, we need our simulations to be fit for purpose. After all, a simulation of a monthly ETF rotation strategy may not need all the bells and whistles of a simulation of high frequency statistical arbitrage trading. The point is that any simulation must be accurate enough that it supports the decision-making process for a particular application, and by “decision-making process” I mean the decisions around allocating to a particular trading strategy.

So how do we go about building a backtesting environment that we can use as a decision-support tool? Unfortunately, backtesting is not a trivial matter and there are a number of pitfalls and subtle biases that can creep in and send things haywire. But that’s OK: in my experience, the people who are attracted to algorithmic trading are usually up for a challenge!

At its most simple level, backtesting requires that your trading algorithm’s performance be simulated using historical market data, and the profit and loss of the resulting trades aggregated. This sounds simple enough, but in practice it is incredibly easy to get inaccurate results from the simulation, or to contaminate it with bias such that it provides an extremely poor basis for making decisions. Dealing with these two problems requires that we consider:

1. The accuracy of our simulation; and
2. Our experimental methodology and framework for drawing conclusions from its results

Both these aspects need to be considered in order to have any level of confidence in the results of a backtest. I can’t emphasize enough just how important it is to ensure these concepts are taken care of adequately; compromising them can invalidate the results of the experiment. Most algorithmic traders spend vast amounts of time and effort researching and testing ideas and it is a tragic waste of time if not done properly. The next sections explore these concepts in more detail.

# Simulation Accuracy

If a simulation is not an accurate reflection of reality, what value is it? Backtests need to be as true a reflection of real trading as necessary to make them fit for their intended purpose. In theory, they should generate the very same trades with the same results as live trading the same system during the same time period.

In order to understand the accuracy of a backtest, you need to understand how the results are simulated and what the limitations of the simulation are. The reality is that no model can precisely capture the phenomena being simulated, but it is possible to build a model that is useful for its intended purposes. It is imperative that we create models that are as accurate a reflection of reality as possible, but equally that we are aware of the limitations. Even the most realistic backtesting environments have limitations.

Backtesting accuracy can be affected by:

• The parameters that describe the trading conditions (spread, slippage, commission, swap) for individual brokers or execution infrastructure. Most brokers or trading setups will result in different conditions, and conditions are rarely static. For example, the spread of a market (the difference between the prices at which the asset can be bought and sold) changes as buyers and sellers submit new orders and amend old ones. Slippage (the difference between the target and actual prices of trade execution) is impacted by numerous phenomena including market volatility, market liquidity, the order type and the latency inherent in the trade execution path. The method of accounting for these time-varying trading conditions can have a big impact on the accuracy of a simulation. The most appropriate method will depend on the strategy and its intended use. For example, high-frequency strategies that are intended to be pushed to their limits of capacity would benefit from modelling the order book for liquidity. That approach might be overkill for a monthly ETF rotation strategy being used to manage an individual’s retirement savings.
• The granularity (sampling frequency) of the data used in the simulation, and its implications. Consider a simulation that relies on hourly open-high-low-close (OHLC) data. This would result in trade entry and exit parameters being evaluated on every hour using only four data points from within that hour. What happens if a take profit and a stop loss were evaluated as being hit during the same OHLC bar? It isn’t possible to know which one was hit first without looking at the data at a more granular level. Whether this is a problem will depend on the strategy itself and its entry and exit parameters.
• The accuracy of the data used in the simulation. No doubt you’ve heard the modelling adage “Garbage in, garbage out.” If a simulation runs on poor data, obviously the accuracy of the results will deteriorate. Some of the vagaries of data include the presence of outliers or bad ticks, missing records, misaligned time stamps or wrong time zones, and duplicates. Financial data can have its own unique set of issues too. For example, stock data may need to be adjusted for splits and dividends. Some data sets are contaminated with survivorship bias, containing only stocks that avoided bankruptcy and thus building in an upward bias in the aggregate price evolution. Over-the-counter products, like forex and CFDs, can trade at different prices at different times depending on the broker. Therefore a data set obtained from one source may not be representative of the trade history of another source. Again, the extent to which these issues are problems depends on the individual algorithm and its intended use.
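To make the granularity problem concrete, here is a minimal Python sketch that classifies how a long position's stop loss and take profit interact with a single OHLC bar. The `Bar` class and both function names are hypothetical, invented for this illustration. The sketch ignores gaps (a bar that opens beyond a level), and the "assume the stop was hit first" convention is just one pessimistic choice among several.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float


def resolve_long_exit(bar: Bar, stop: float, target: float) -> str:
    """Classify how a long position's exit levels interact with one OHLC bar.

    Returns "stop", "target", "none", or "ambiguous" when both levels lie
    inside the bar's range and the bar alone cannot say which came first.
    """
    hit_stop = bar.low <= stop
    hit_target = bar.high >= target
    if hit_stop and hit_target:
        return "ambiguous"
    if hit_stop:
        return "stop"
    if hit_target:
        return "target"
    return "none"


def conservative_exit_price(bar: Bar, stop: float, target: float) -> Optional[float]:
    """Pessimistic convention: treat an ambiguous bar as a stop-out."""
    outcome = resolve_long_exit(bar, stop, target)
    if outcome in ("stop", "ambiguous"):
        return stop
    if outcome == "target":
        return target
    return None  # neither level was touched; the position stays open


bar = Bar(open=100.0, high=103.0, low=97.0, close=101.0)
print(resolve_long_exit(bar, stop=98.0, target=102.0))
```

Counting how often a backtest lands in the "ambiguous" branch is a cheap way to judge whether hourly bars are fine for a given strategy or whether finer data is needed.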

The point of the above is that the accuracy of a simulation is affected by many different factors. These should be understood in the context of the strategy being simulated and its intended use so that sensible decisions can be made around the design of the simulation itself. Just like any scientific endeavour, the design of the experiment is critical to the success of the research!
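As an illustration of the first factor, the sketch below applies spread, slippage and commission to a single round-trip long trade. The function name and the cost conventions are assumptions made for this example: half the spread is paid on each side, slippage is always adverse, and commission is charged per unit per side. A real simulator would usually let these inputs vary over time rather than treating them as constants.

```python
def net_pnl(entry_mid, exit_mid, units, spread, slippage, commission_per_unit):
    """Net P&L of a round-trip long trade quoted at mid prices.

    Illustrative assumptions: half the spread is paid on entry and half on
    exit, slippage moves both fills against us, and commission is charged
    per unit on each side of the trade.
    """
    entry_fill = entry_mid + spread / 2 + slippage  # we buy above mid
    exit_fill = exit_mid - spread / 2 - slippage    # we sell below mid
    gross = (exit_fill - entry_fill) * units
    commission = 2 * commission_per_unit * units
    return gross - commission


# Same trade, with and without trading costs.
frictionless = net_pnl(100.0, 101.0, 10, 0.0, 0.0, 0.0)
with_costs = net_pnl(100.0, 101.0, 10, 0.02, 0.01, 0.005)
print(frictionless, with_costs)
```

The gap between the two numbers is small for one trade, but it scales with trade count, which is why strategies that trade frequently are the most sensitive to how these conditions are modelled.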

As a practical matter, it is usually a good idea to check the accuracy of your simulations before deploying them to their final production state. This can be done quite easily by running the strategy live on a small account for some period of time, and then simulating the strategy on the actual market data on which it was run.  Regardless of how accurate you thought your simulator was, you will likely see (hopefully) small deviations between live trading and simulated trading, even when the same data is being used. Ideally the deviations will be within a sensible tolerance range for your strategy, that is, a range that does not significantly alter your decisions around allocating to the strategy. If the deviations do cause you to rethink a decision, then the simulator was likely not accurate enough for its intended purpose.
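A rough sketch of such a check is shown below. It assumes you have per-trade profit and loss from the live account and from the simulator, matched trade for trade. The function name `reconcile` and the way tolerance is expressed (as a fraction of the absolute live total) are choices made for this example, not a standard.

```python
def reconcile(live_pnl, sim_pnl, tolerance):
    """Compare per-trade P&L from live trading with the simulated P&L.

    Returns (mean_abs_deviation, worst_deviation, within_tolerance), where
    within_tolerance is True if the gap in total P&L is no larger than
    `tolerance` times the absolute live total.
    """
    if not live_pnl:
        raise ValueError("no trades to compare")
    if len(live_pnl) != len(sim_pnl):
        raise ValueError("trade counts differ: the simulator did not reproduce the same trades")
    deviations = [sim - live for live, sim in zip(live_pnl, sim_pnl)]
    mean_abs = sum(abs(d) for d in deviations) / len(deviations)
    worst = max(deviations, key=abs)
    total_gap = abs(sum(sim_pnl) - sum(live_pnl))
    within = total_gap <= tolerance * abs(sum(live_pnl))
    return mean_abs, worst, within


live = [10.0, -5.0, 8.0, -2.0]  # made-up numbers for illustration
sim = [10.5, -4.5, 8.5, -1.5]
print(reconcile(live, sim, tolerance=0.25))
```

A mismatch in the number of trades is treated as an error rather than a deviation, because it means the simulator generated different trades, which is a more serious discrepancy than slightly different fills.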

# Development Methodology

In addition to simulation accuracy, the experimental methodology itself can compromise the results of our simulations by introducing bias. Many of these biases are subtle yet profound: they can and will very easily creep into a trading strategy research and development process and can have disastrous effects on live performance. Accounting for these biases is critical and needs to be considered at every stage of the development process.

Robot Wealth’s course Fundamentals of Algorithmic Trading details a workflow that you can adopt to minimize these effects as well as an effective method of maintaining records which will go a long way to helping identify and minimize bias in its various forms. For now, I will walk through and explain the various biases that can creep in and their effect on a trading strategy. For a detailed discussion of these biases and a highly interesting account of the psychological factors that make accounting for these biases difficult, refer to David Aronson’s Evidence Based Technical Analysis (2006).

## Look-Ahead Bias, Also Known As Peeking Bias

This form of bias is introduced by allowing future knowledge to affect trade decisions. That is, trade decisions are affected by knowledge that would not have been available at the time the trade decision was taken. A good simulator will be engineered to prevent this from happening, but it is surprisingly easy to allow this bias to creep in when designing our own backtesting tools. A common example is executing an intra-day trade on the basis of the day’s closing price, when that closing price is not actually known until the end of the day.

When using custom-built simulators, you will typically need to guard against this bias yourself. I commonly use the statistical package R and the Python programming language for various modelling and testing purposes, and when I do, I need to consider this bias in more detail. On the positive side, it is easy to identify when a simulation doesn’t properly account for it, because the resulting equity curve will typically look like the one shown below, since it is easy to predict an outcome if we know about the future!

Another way this bias can creep in more subtly is when we use an entire simulation to calculate a parameter and then retrospectively apply it to the beginning of the next run of the simulation. Portfolio optimization parameters are particularly prone to this bias.
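The effect of look-ahead bias is easy to reproduce in a few lines of Python. The sketch below runs on pure noise, so by construction no strategy has an edge. The data, the seed and the two toy "strategies" are all made up for illustration. The only difference between them is whether the position for a period uses that period's own return or the previous one.

```python
import random


def sign(x):
    """Return 1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def backtest(returns, positions):
    """Total return: sum of position * return, one position per period."""
    return sum(p * r for p, r in zip(positions, returns))


random.seed(42)
returns = [random.gauss(0.0, 0.01) for _ in range(1000)]  # pure noise, no edge

# Biased: the position for period t is chosen using the return of period t.
peeking_positions = [sign(r) for r in returns]

# Correct: the position for period t uses only the return of period t-1.
lagged_positions = [0] + [sign(r) for r in returns[:-1]]

peeking_pnl = backtest(returns, peeking_positions)
lagged_pnl = backtest(returns, lagged_positions)
print(f"peeking: {peeking_pnl:.4f}  lagged: {lagged_pnl:.4f}")
```

The peeking version earns the absolute value of every return, which is the best any strategy with positions of -1, 0 or 1 could possibly do. In hand-rolled vectorised backtests, the bug is usually nothing more than a missing one-period shift like the one in `lagged_positions`.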

## Curve-Fitting Bias, Also Known As Over-Optimization Bias

This is the bias that allows us to create magical backtests that produce annual returns on the order of hundreds of percent. Such backtests are of course completely useless for any practical trading purpose.

The following plots show curve-fitting in action. The blue dots are an artificially generated linear function with some noise added to distort the underlying signal. It has the form $$y = mx + b + \epsilon$$ where $$\epsilon$$ is noise drawn from a normal distribution, and we have 20 points in our data set. Regression models of varying complexity were fit to the first 10 points of the data set (the in-sample set) and these models were tested against the unseen data consisting of the remaining 10 points not used for model building (the out-of-sample set). You can see that as we increase the complexity of the model, we get a better fit on the in-sample data, but the out-of-sample performance deteriorates drastically. Further, the more complex the model, the worse the deterioration out of sample.

The more complex models are actually fitting to the noise in the data rather than the signal, and since noise is by definition random, the models that predict based on what they know about the in-sample noise perform horrendously out-of-sample. When we fit a trading strategy to an inherently noisy data set (and financial data is extremely noisy), we run the risk of fitting our strategy to the noise, and not to the underlying signal. The underlying signal is the anomaly or price effect that we believe provides profitable trading opportunities, and this signal is what we are actually trying to capture with our model. If we fit our model to the noise, we will essentially end up with a random trading model, which of course is of little use to anyone, except perhaps your broker.
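For readers who prefer Python, the experiment described above can be sketched using only the standard library. This is not the code used to produce the plots. The slope, intercept, noise level, seed and polynomial degrees are all assumed values, and the small `polyfit` helper solves the normal equations directly, which is adequate for a toy problem of this size but no substitute for a numerical library.

```python
import random


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    n = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    m = [
        [sum(x ** (i + j) for x in xs) for j in range(n)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(n)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back-substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (m[i][n] - s) / m[i][i]
    return coeffs


def predict(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))


def mse(coeffs, xs, ys):
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


random.seed(1)
slope, intercept, noise_sd = 2.0, 1.0, 1.0  # assumed values for y = mx + b + noise
raw_x = list(range(20))
x = [(v - 4.5) / 4.5 for v in raw_x]  # rescale so the in-sample x lies in [-1, 1]
y = [slope * v + intercept + random.gauss(0.0, noise_sd) for v in raw_x]

x_in, y_in = x[:10], y[:10]    # in-sample: first 10 points
x_out, y_out = x[10:], y[10:]  # out-of-sample: remaining 10 points

results = {}
for degree in (1, 3, 6):
    coeffs = polyfit(x_in, y_in, degree)
    results[degree] = (mse(coeffs, x_in, y_in), mse(coeffs, x_out, y_out))
    print(f"degree {degree}: in-sample MSE {results[degree][0]:.3f}, "
          f"out-of-sample MSE {results[degree][1]:.3f}")
```

The in-sample error can only fall as the degree rises, because each model contains the simpler ones as special cases. That is exactly why in-sample fit on its own says nothing about whether a model has captured signal or noise.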

## Data-Mining Bias, Also Known As Selection Bias

Data mining bias is another significant source of over-estimated model performance. It takes a number of forms and is largely unavoidable – therefore rather than eliminating it completely, we need to be aware of it and take measures to account for it. Most commonly, you will introduce it when you select the best performer from a number of possible algorithms, algorithm variants, variables or markets to take forward with your development process. If you try enough strategies and markets, you will eventually (relatively often actually) find one that performs well due to luck alone.

For example, say you develop a trend following strategy for the forex markets. The strategy does exceptionally well in its backtest on the EUR/USD market, but fails on the USD/JPY market. By selecting to trade the EUR/USD market, you have introduced selection bias into your process, and the estimate of the strategy’s performance on the EUR/USD market is now upwardly biased. You must either temper your expectations from the strategy, or test it on a new, unseen sample of EUR/USD data and use that result as your unbiased estimate of performance.

It is my experience that beginners to algorithmic trading typically suffer more from the effects of curve-fitting bias during the early stage of their journey. That may be because these effects tend to be more obvious and intuitive. Selection bias on the other hand can be just as severe as curve-fitting bias, but is not as intuitively easy to understand. A common mistake is to test multiple variants of a strategy on an out-of-sample data set and then select the best one based on the out-of-sample performance. While this is not necessarily a mistake per se, the mistake is treating the performance of the selected variant in the out-of-sample period as an unbiased estimate of future performance. This may not be the end of the world if only a small number of variants were compared, but what if we looked at hundreds or even thousands of different variants, as we might do if we were using machine learning methods? Surely among hundreds of out-of-sample comparisons, at least one would show a good result by chance alone? In that case, how can we have confidence in our selected strategy?
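A quick simulation shows how strong this effect can be. The sketch below generates "strategies" whose returns are pure noise, so their true edge is zero, and reports the best Sharpe ratio found as the number of variants grows. The function names, the seed, the 250-period sample and the 1% volatility are arbitrary assumptions for illustration.

```python
import random
import statistics


def sharpe(returns):
    """Per-period Sharpe ratio (mean / standard deviation), not annualised."""
    return statistics.mean(returns) / statistics.stdev(returns)


def best_of_random_strategies(n_strategies, n_periods, rng):
    """Simulate zero-edge strategies; return (best, average) Sharpe ratio."""
    sharpes = []
    for _ in range(n_strategies):
        rets = [rng.gauss(0.0, 0.01) for _ in range(n_periods)]
        sharpes.append(sharpe(rets))
    return max(sharpes), statistics.mean(sharpes)


rng = random.Random(7)
for n in (1, 10, 100, 1000):
    best, avg = best_of_random_strategies(n, 250, rng)
    print(f"{n:>5} variants: best Sharpe {best:+.3f}, average {avg:+.3f}")
```

The average stays near zero, as it should, while the best variant tends to look better as more candidates are tried. The performance of the selected winner is therefore an upwardly biased estimate, even though every candidate here is worthless by construction.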

This raises the question of how to account for this data-mining bias effect. In practical terms, you will quickly run into difficulty if you try to test your strategy on new data after each selection or decision point since the amount of data at your disposal is finite. Other methods to account for data mining bias include comparing the performance of the strategy with a distribution of random performances, White’s Reality Check and its variations, and Monte Carlo Permutation Tests.
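To give a flavour of the last of these, here is a minimal sketch of one simple form of Monte Carlo permutation test. It asks how often a strategy's positions would have done at least as well if the returns were randomly shuffled against them. The function name, the defaults and the toy data are assumptions for this example. Note that shuffling destroys any serial dependence in the returns, and that this version tests a single strategy, so it does not by itself correct for having selected the best of many.

```python
import random


def permutation_p_value(returns, positions, n_permutations=1000, seed=0):
    """One-sided Monte Carlo permutation test on a strategy's total return.

    Null hypothesis: positions carry no information about returns, so pairing
    them with shuffled returns should do as well as the observed pairing.
    """
    rng = random.Random(seed)
    observed = sum(p * r for p, r in zip(positions, returns))
    shuffled = list(returns)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        permuted = sum(p * r for p, r in zip(positions, shuffled))
        if permuted >= observed:
            at_least_as_good += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_good + 1) / (n_permutations + 1)


data_rng = random.Random(3)
returns = [data_rng.gauss(0.0, 0.01) for _ in range(200)]
foresight = [1 if r > 0 else -1 for r in returns]        # deliberately cheating
coin_flip = [data_rng.choice((-1, 1)) for _ in returns]  # no information at all

p_foresight = permutation_p_value(returns, foresight)
p_coin_flip = permutation_p_value(returns, coin_flip)
print(p_foresight, p_coin_flip)
```

A small p-value says the observed pairing of positions and returns is hard to reproduce by chance. The cheating strategy gets one for obvious reasons, while the coin-flip strategy has no systematic reason to.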

# Conclusion

This final Back to Basics article described why we take an experimental approach to algorithmic trading research and detailed the main barriers to obtaining useful, accurate and meaningful results from our experiments. A robust strategy is one that exploits real market anomalies, inefficiencies, or characteristics; however, it is surprisingly easy to find strategies that appear robust in a simulator, but whose performance turns out to be due to inaccurate modelling, randomness or one of the biases described above. Clearly, we need a systematic approach to dealing with each of these barriers and ideally a workflow with embedded tools, checks and balances that account for these issues. I would go as far as to say that in order to do any serious algorithmic trading research, or at least to do it in an efficient and meaningful way, we need a workflow and a research environment that addresses each of the issues detailed in this article. That’s exactly what we provide in Fundamentals of Algorithmic Trading, which was written to teach new or aspiring algorithmic traders how to operate the tools of the trade and, even more importantly, how to operate them effectively via such a workflow. If you would like to learn such a systematic approach to robust strategy development, head over to the Courses page to find out more.

# References

Aronson, D. 2006, Evidence Based Technical Analysis, Wiley, New York

Box, G. E. P. 1976, “Science and Statistics”, Journal of the American Statistical Association 71: 791-799

# Appendix – R Code for Demonstrating Overfitting

If the concept of overfitting is new to you, you might like to download the R code below that I used to generate the data and the plots from the overfitting demonstration above. It can be very useful for one’s understanding to play around with this code, perhaps generating larger in-sample/out-of-sample data sets, using a different model of the underlying generative process (in particular applying more or less noise), and experimenting with model fits of varying complexity. Enjoy!