Simulating Variable FX Swaps in Zorro and Python

One of the ongoing research projects inside the Robot Wealth community involves an FX strategy with some multi-week hold periods. Such a strategy can be significantly impacted by the swap, or the cost of financing the position. These costs change over time, and we decided that for the sake of more accurate simulations, we would incorporate these changes into our backtests.

This post shows you how to simulate variable FX swaps in both Python and the Zorro trading automation software platform.

What is Swap?

The swap (also called the roll) is the cost of financing an FX position. It is typically derived from the central bank interest rate differential of the two currencies in the exchange rate being traded, plus some additional fee for your broker. Most brokers apply it on a daily basis, and typically apply three times the regular amount on a Wednesday to account for the weekend. Swap can be both credited to and debited from a trader’s account, depending on the actual position taken.
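As a rough illustration of the daily accrual described above, here is a minimal Python sketch. The function names, the 365-day count, and the rule of zero swap on weekends are assumptions made for the example; brokers differ on all of these.

```python
from datetime import date, timedelta

def swap_days(day):
    """Days of swap booked at the end of `day` (assumed broker convention)."""
    weekday = day.weekday()  # Monday == 0
    if weekday == 2:         # Wednesday carries the weekend: triple swap
        return 3
    if weekday >= 5:         # Saturday/Sunday: nothing booked
        return 0
    return 1

def daily_swap(notional_quote, annual_rate_diff, day):
    """Swap in quote currency; positive is a credit, negative a debit.

    annual_rate_diff is a decimal (0.0365 means 3.65% per year).
    """
    return notional_quote * annual_rate_diff / 365.0 * swap_days(day)

# One calendar week starting Monday 1 January 2018
week = [date(2018, 1, 1) + timedelta(days=i) for i in range(7)]
total_days = sum(swap_days(d) for d in week)
```

Over a full week the triple Wednesday makes the booked days add up to seven, so the position is financed for every calendar day even though nothing is booked on the weekend.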

Why is it Important?

Swap can have a big impact on strategies with long hold periods, such as the typical momentum strategy. Therefore, accurately accounting for it is important in such cases.  Zorro’s default swap calculation relies on a constant derived from the Assets List used in the simulation, which is fine for most situations, but might lead to unrealistic results when the hold period is very long.

Simulating Variable Swaps in Zorro and in Python

Here’s some code for simulating historical swaps. It takes historical central bank data from the Bank for International Settlements, via Quandl. I’ve included code for the historical interest rates of the G8 countries – to get others, you just need the relevant Quandl code.

For the Zorro version, you’ll also need Zorro S, as the Quandl bridge is not available in the free version of Zorro. However, at the end of this article, I’ve also included a Python script for downloading the data from Quandl that you can save and then import into your backtesting platform. The advantage of the Zorro version is that you can access the relevant data from within a trading script via direct link to the Quandl API. That’s super convenient and all but eliminates the need to do any data wrangling at all. The advantage of the Python version is that it is completely free, but using the data in a trading script requires a little more messing around.

The Zorro Version

In order to access data from Quandl within Zorro, you’ll need a Quandl API key (get it from the Quandl website) and enter it in your ZorroFix.ini or Zorro.ini file.

Here’s the Zorro script:


One major thing to remember is that your FX broker won’t charge/pay swaps based on the exact interest rate differential. In practice, they might take some additional fat for themselves, or even adjust their actual swaps on the basis of perceived upside/downside volatility – and these may not even be symmetrical! The short story is that the broker’s cut will vary by broker, FX pair, and even by direction! You can verify that yourself by searching various brokers’ websites for their current swap rates.

So the upshot of all that is that if you want to include an additional broker fee in your simulation, recognise that it will be an estimate, do some research on what brokers are currently charging, and err on the conservative side. In the code above, the broker fee is set in line 44; you can also set this to zero if you like.

The trickiest part is converting the interest rate differential of the base-quote currencies to Zorro’s RollLong and RollShort variables – but the advantage is that once you get that right, Zorro will take care of simulating the roll for you – you literally won’t have to do another thing! These variables represent the swap in account currency per 10,000 traded FX units. Most of that conversion is taken care of in the calculate_roll_long() and calculate_roll_short() functions in the code above. But these functions output the swap in units of the quote currency, not the account currency. This requires some more conversion.
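To make the conversion concrete, here is a hedged Python sketch of the arithmetic. It is not the Zorro script's calculate_roll_long() or calculate_roll_short(); the function name, the 365-day count, the way the broker fee is applied, and all input values are assumptions for illustration.

```python
def roll_per_10k(base_rate, quote_rate, price, quote_to_account,
                 broker_fee=0.0, is_long=True):
    """Daily roll in account currency per 10,000 units of the base currency.

    base_rate, quote_rate, broker_fee: annual rates in percent.
    price: units of quote currency per unit of base currency.
    quote_to_account: units of account currency per unit of quote currency.
    """
    # A long position earns the base rate and pays the quote rate;
    # a short position is the reverse. The fee always works against the trader.
    diff = (base_rate - quote_rate) if is_long else (quote_rate - base_rate)
    net_annual = (diff - broker_fee) / 100.0
    roll_in_quote = 10000 * price * net_annual / 365.0
    return roll_in_quote * quote_to_account

# Hypothetical inputs: EUR at 0.0%, USD at 1.5%, EUR/USD at 1.20, AUD/USD at 0.75
usd_to_aud = 1.0 / 0.75
roll_long = roll_per_10k(0.0, 1.5, 1.20, usd_to_aud, broker_fee=0.5, is_long=True)
roll_short = roll_per_10k(0.0, 1.5, 1.20, usd_to_aud, broker_fee=0.5, is_long=False)
```

With a non-zero fee the long and short rolls are no longer mirror images: the debit on one side is larger than the credit on the other.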

The code also contains an example of converting the EUR/USD roll for an account denominated in AUD. This is accomplished from line 46.

Here’s the output of running the script. You can see how the swap for long and short trades has changed over time. At some point in 2014, it became a less expensive proposition to sell the EUR against the USD rather than buy it. You can also see that the value of the swap is constantly changing; that’s because the calculation considers the contemporaneous exchange rate of the account currency (AUD) against the quote currency (USD) of the pair being traded.

Figure: variable FX swap

The Python Version

Here’s a Python script for downloading the same data set as used above (albeit with a longer history) from Quandl, and a function for calculating the swap. This time, the function calculates the swap per standard FX lot, which is 100,000 units of the base currency (the Zorro script above calculates the swap per 10,000 units, which is required for Zorro’s RollLong and RollShort variables).

Plotting the historical effective fed funds rate, you can see that the data set might have some problems prior to about 1985. You may need to smooth the data or remove outliers to use it effectively.
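One simple way to tame isolated spikes is a rolling median. The sketch below uses only the standard library; the window length and the sample values are made up for illustration, and a rolling median is just one of several reasonable choices.

```python
from statistics import median

def rolling_median(values, window=5):
    """Centred rolling median; the window shrinks at the edges of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed

# Made-up rate series (percent) with one obvious outlier
rates = [5.0, 5.1, 5.0, 19.0, 5.2, 5.1, 5.3]
smoothed = rolling_median(rates, window=3)
```

The spike at 19.0 is replaced by a neighbouring value, while the rest of the series is left almost unchanged.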

Figure: fed funds rate

We can simulate and plot the historical swap of the AUD/CAD exchange rate as follows:

Again, you can see some potential data issues prior to about 1990.


The cost of financing a long-term FX position can have a significant impact on the overall result of the trade. This post demonstrated a simple and inexpensive way to simulate the historical variable financing costs for FX.

*Data is the basis of everything we do as quant traders. Inside the Robot Wealth community, we show our members how to use this and other data for trading systems research in a way that goes much deeper than the basics we touched on here. But data is just one of the many algorithmic trading fundamentals we cover inside Class to Quant. Not only are our members improving their trading performance with our beginner to advanced courses, but together they’re building functioning strategies inside our community as part of our Algo Laboratory projects. If you’re interested and want to find out more, try Class to Quant for 30 days risk free. I’d love to meet you inside.

Fun with the Cryptocompare API

Cryptocompare is a platform providing data and insights on pretty much everything in the crypto-sphere, from market data for cryptocurrencies to comparisons of the various crypto-exchanges, to recommendations for where to spend your crypto assets. The user experience is quite pleasant, as you can see from the screenshot of their real-time coin comparison table:

Figure: cryptocurrency prices

As nice as the user-interface is, what I really like about Cryptocompare is its API, which provides programmatic access to a wealth of crypto-related data. It is possible to drill down and extract information from individual exchanges, and even to take aggregated price feeds from all the exchanges that Cryptocompare is plugged into – and there are quite a few!

Interacting with the Cryptocompare API

When it comes to interacting with Cryptocompare’s API, there are already some nice Python libraries that take care of most of the heavy lifting for us. For this post, I decided to use a library called cryptocompare. Check it out on GitHub here.

You can install the current stable release by doing pip install cryptocompare, but I installed the latest development version direct from GitHub, as only that version had support for minutely price history at the time of writing.

To install the dev version from GitHub, do:

This version will limit you to one month’s worth of daily price data and one week’s worth of hourly data. If you’re feeling adventurous, you can install the version that I forked into my own GitHub account and modified to increase those limits. To do that, you’ll need to do:

Now that we’ve got our library of API functions, let’s take a look at what we can do with Cryptocompare!

List all Available Coins

To get a list of all the coins available on Cryptocompare, we can use the following Python script:

At the time of writing, this returned a list of 2,609 coins! By comparison, there are around 2,800 stocks listed on the New York Stock Exchange.

Coins by Market Capitalisation

Let’s focus on the biggest players in crypto-world: the coins with the largest market capitalisation.

We can get price data for a list of coins using the function cryptocompare.get_price()  and if we specify full=True , the API will return a whole bunch of data for each coin in the list, including last traded price, 24-hour volume, number of coins in circulation, and of course market capitalisation.

Cryptocompare’s API will only allow us to pass it a list of coins that contains no more than 300 characters at any one time. To get around that limitation, we’ll pass lists of 50 coins at a time, until we’ve passed our entire list of all available coins.
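The batching idea can be sketched in a few lines of plain Python. The coin symbols here are placeholders, and joining them with commas to measure the request length is an assumption about how the list is sent to the API.

```python
def batches(items, size=50):
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Placeholder symbols standing in for the full list of available coins
coins = [f"C{i}" for i in range(120)]
chunks = list(batches(coins, 50))

# Length of each batch once joined into a single comma-separated string
request_lengths = [len(",".join(chunk)) for chunk in chunks]
```

Each chunk would then be passed to the price function in turn and the results merged into one dictionary.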

The API returns a JSON string, which we can interpret as a dictionary in Python. Note that the outer-most keys in the resulting dictionary are 'RAW' and 'DISPLAY', which hold the raw data and the data formatted for display, respectively. In our case, we prefer to work with the raw data, so we’ll keep it and discard the rest.

Here’s the code for accomplishing all that:

coin_data  now contains a whole bunch of dictionaries-within-dictionaries that hold our data. Each outer key corresponds to a coin symbol, and looks like this:

That 'USD'  key is common to all the coins in coin_data  and it specifies the counter-currency in which prices are displayed. That key is going to be troublesome when we turn our dictionary into a more analysis-friendly data structure, like a pandas DataFrame , so let’s get rid of it:
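A minimal sketch of that step, using a tiny hand-made dictionary with invented values in place of the real API response:

```python
# Invented values; the real dictionary holds many more fields per coin
coin_data = {
    "BTC": {"USD": {"PRICE": 100.0, "MKTCAP": 1000.0}},
    "ETH": {"USD": {"PRICE": 10.0, "MKTCAP": 500.0}},
}

# Lift the inner dictionary up one level so each coin maps straight to its fields
coin_data = {symbol: fields["USD"] for symbol, fields in coin_data.items()}
```

After this, each coin symbol maps directly to its field dictionary, which is the shape pandas expects when building a DataFrame with one row per coin.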

Now we can go ahead and create a DataFrame  from our coin_data  dictionary and sort it by market capitalization:
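One way to do this with pandas is sketched below, again with invented values. Using orient="index" puts one coin per row; whether the original script took exactly this route is an assumption.

```python
import pandas as pd

# Invented values; keys are coin symbols, inner keys are API field names
coin_dict = {
    "BTC": {"PRICE": 100.0, "MKTCAP": 3.0},
    "ETH": {"PRICE": 10.0, "MKTCAP": 2.0},
    "XYZ": {"PRICE": 1.0, "MKTCAP": 9.0},
}

# One row per coin, one column per field, largest market cap first
coin_frame = pd.DataFrame.from_dict(coin_dict, orient="index")
coin_frame = coin_frame.sort_values("MKTCAP", ascending=False)
```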

All good so far, but interrogating this data by doing coin_data['MKTCAP'].head(20)  reveals that the coin with the highest market cap is something called AMO:

Wouldn’t we expect that honour to go to Bitcoin, with symbol BTC? And what about all those other coins that you’ve probably never heard of? What’s going on here?

It turns out that Cryptocompare includes data for coins that haven’t yet gone to ICO, and it appears that in such cases, the market capitalisation calculation is done using the pre-ICO price of the coin, and its total possible supply of coins.

That’s going to skew things quite significantly, so let’s exclude any coins from our list that haven’t traded in the last 24 hours. We can get this information from the TOTALVOLUME24H  field, which is the total amount the coin has been traded in 24 hours against all its trading pairs:
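The filter itself is a one-line boolean mask in pandas. The frame below is a made-up stand-in for the real data, with a hypothetical coin XYZ that has a large market cap but no trading volume.

```python
import pandas as pd

# Made-up stand-in for the coin DataFrame
coin_frame = pd.DataFrame(
    {"MKTCAP": [9.0, 3.0, 2.0], "TOTALVOLUME24H": [0.0, 50.0, 20.0]},
    index=["XYZ", "BTC", "ETH"],
)

# Keep only coins that actually traded in the last 24 hours
traded = coin_frame[coin_frame["TOTALVOLUME24H"] > 0]
traded = traded.sort_values("MKTCAP", ascending=False)
```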

coin_data['MKTCAP'].head()  now looks a lot more sensible:

Get Market Data for Top Coins

We can get the last month’s historical daily data for the 100 top coins by market cap, stored as a dictionary of DataFrames, by doing the following:

And we can access the data for any coin in the dictionary by doing df_dict[coin], where coin is the symbol of the coin we are interested in, such as ‘BTC’. Now that we have our data, we can do some fun stuff!

You will need to use the version of cryptocompare from my GitHub repo (see above) in order to get enough data to reproduce the examples below. In that case, once you’ve downloaded my version, just replace line 5 in the script above with

Some Basic Analysis

First, let’s pull out all the closing prices from each DataFrame  in our dictionary:
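A sketch of this step, assuming each DataFrame has a column named close and a date index (both are assumptions about the shape of the downloaded data), with invented prices:

```python
import pandas as pd

dates = pd.date_range("2017-01-01", periods=3, freq="D")

# Invented prices standing in for the downloaded history
df_dict = {
    "BTC": pd.DataFrame({"close": [1000.0, 1010.0, 990.0]}, index=dates),
    "ETH": pd.DataFrame({"close": [8.0, 8.4, 8.2]}, index=dates),
}

# One column of closing prices per coin, aligned on the shared date index
closes = pd.DataFrame({coin: frame["close"] for coin, frame in df_dict.items()})
```

Because pandas aligns on the index, coins with shorter histories simply get missing values for the dates they lack.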

Plot some prices from 2017, an interesting year for cryptocurrency, to say the least:

Plot some returns series from the same period:

A correlation matrix:
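The returns and correlation steps can be sketched as follows. The prices are constructed so that two hypothetical coins move together and a third moves against them; they are not real market data.

```python
import pandas as pd

# Constructed prices: BTC and ETH move in lockstep, XRP moves the opposite way
closes = pd.DataFrame({
    "BTC": [100.0, 110.0, 99.0, 108.9],
    "ETH": [10.0, 11.0, 9.9, 10.89],
    "XRP": [1.0, 0.9, 0.99, 0.891],
})

# Simple period-on-period returns; the first row has no prior price, so drop it
returns = closes.pct_change().dropna()

# Pairwise Pearson correlation of the returns series
corr = returns.corr()
```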

And finally, a scatterplot matrix showing distributions on the diagonal:

There’s lots more interesting analysis you can do with data from Cryptocompare, before we even do any backtesting, for example:

  • Value of BTC and other major coins traded through the biggest exchanges over time – which exchanges are dominating?
  • Top coins traded by fiat currency – do some fiats gravitate towards certain cryptocurrencies?
  • Are prices significantly different at the same time across exchanges – that is, are arbitrage opportunities present?


In this post, I introduced the Cryptocompare API and some convenient Python tools for interacting with it. I also alluded to the depth and breadth of data available: over 2,000 coins, some going back several years, broken down by exchange and even counter-currency. I also showed you some convenient base-Python and pandas data structures for managing and interrogating all that data. In future blog posts, we’ll use this data to backtest some crypto trading strategies.



ETF Rotation Strategies in Zorro

At Robot Wealth we get more questions than even the most sleep-deprived trader can handle. So whilst we develop the algo equivalent of Siri and brag about how we managed to get 6 hours downtime last night, we thought we’d start a new format of blog posts — answering your most burning questions.

Lately our Class to Quant members have been looking to implement rotation-style ETF and equities strategies in Zorro, but just like your old high-school essays, starting is the biggest barrier. These types of strategies typically scan a universe of instruments and select one or more to hold until the subsequent rebalancing period. Zorro is my go-to choice for researching and even executing such strategies: its speed makes scanning even large universes of stocks quick and painless, and its scripting environment facilitates fast prototyping and iteration of the algorithm itself – once you’ve wrestled it for a while (get our free Zorro for Beginners video course here).

I’m going to walk you through a general design paradigm for constructing strategies like this with Zorro, and demonstrate the entire process with a simple rotation algorithm based on Gary Antonacci’s Dual Momentum. By the end you should have the skills needed to build a similar strategy yourself. Let’s begin!

ETF Rotation Strategy Design Paradigm

To construct a rotation style strategy in Zorro, we’d follow these general design steps:

  1. Construct your universe of instruments by adding them to an assets list CSV file. There are examples in Zorro’s History folder, and I’ll put one together for you below.
  2. Set up your rebalancing period using Zorro’s time and date functions.
  3. Tell Zorro to reference the asset list you just created using the assetList  command.
  4. Loop through each instrument in the list and perform whatever calculations or analysis your strategy requires for the selection of which instruments to hold.
  5. Compare the results of the calculations/analysis performed in the prior step and construct the positions for the next period.

That’s pretty much it! Of course, the details of each step might differ slightly depending on the algorithm, and you will also need some position sizing and risk management, but in general, following these steps will get you 90% of the way there.

Not happy trading with a 90% complete strategy? No problem, let’s look at what this looks like in practice.

An Example

This example is based on Gary Antonacci’s Dual Momentum. We will simplify Gary’s slightly more nuanced version to the following: if US equities outperformed global equities and their return was positive, hold US equities. If global equities outperformed US equities and their return was positive, hold global equities. Otherwise, hold bonds.

Gary has done a mountain of research on Dual Momentum and found that it has outperformed for decades. In particular, it has tended to kick you out of equities during extended bear markets, while still getting you in for most of the bull markets. Check out Gary’s website for more information and consider getting hold of a copy of his book – you can read my review here.

Our simplified version of the strategy will use a universe of three ETFs that track US equities, global equities and short-term bonds. We will use the returns of these ETFs for both generation of our trading signals and actual trading (Gary’s approach is slightly more nuanced than that – again check out his website and book for more details).

Step 1: Construct our asset list

Our asset list contains the universe of instruments we wish to scan. In our case, we only need three ETFs. We’ll choose SPY for our US equities instrument, EFA for our global equities and SHY for our bonds ETF.

Zorro’s asset lists are CSV files that contain a bunch of parameters about the trading conditions of each instrument. This information is used in Zorro’s simulations, so it’s important to make it as accurate as possible. In many cases, Zorro can populate these files for us automatically by simply connecting to a broker, but in others, we need to do it manually (explained in our Zorro video course).

Our asset list for this strategy will look like this:

You can see that most of the parameters are actually the same for each instrument, so we can use copy and paste to make the construction of this file less tedious than it would otherwise be. For other examples of such files, just look in Zorro’s History folder.

Save this file as a CSV file called AssetsDM.csv and place it in your History folder (which is where Zorro will go looking for it shortly).

Step 2: Set up rebalance period

Here we are going to rebalance our portfolio every month. We decided to avoid the precise start/end of the month and rebalance on the third trading day of the month. You can experiment with this parameter to get a feel for how much it affects the strategy.
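Outside Zorro, the same rebalance test can be sketched in Python. The function name is hypothetical, and the calendar below treats every weekday as a trading day, which ignores exchange holidays.

```python
from datetime import date, timedelta

def is_nth_trading_day(trading_days, i, nth=3):
    """True if trading_days[i] is the nth trading day of its calendar month."""
    today = trading_days[i]
    same_month = [d for d in trading_days[: i + 1]
                  if (d.year, d.month) == (today.year, today.month)]
    return len(same_month) == nth

# January and February 2018, weekdays only (holidays ignored for simplicity)
start = date(2018, 1, 1)
trading_days = [start + timedelta(days=n) for n in range(59)]
trading_days = [d for d in trading_days if d.weekday() < 5]

rebalance_days = [d for i, d in enumerate(trading_days)
                  if is_nth_trading_day(trading_days, i)]
```

Changing nth is the equivalent of experimenting with the rebalance day mentioned above.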

Simply wrap the trading logic in the following if()  statement:

Step 3: Tell Zorro about your new asset list

In the initial run of the script, we want Zorro to reference the newly created asset list. Also, if we don’t have data for these instruments, we want to download it in the initial run. We’ll use Alpha Vantage end-of-day data, which can be accessed directly from within Zorro scripts. These lines of code take care of that for us:

Note that this assumes you’ve entered your Alpha Vantage API key in the Zorro.ini or ZorroFix.ini configuration files, which live in Zorro’s base directory. If you don’t have an Alpha Vantage API key head over to the Alpha Vantage website to claim one.

Step 4: Loop through instruments and perform calculations

For our dual momentum strategy, we need to know the return of each instrument over the portfolio formation period. So we can loop through each asset in our list, calculate the return, and store it in an array for later use.
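In Python terms, the loop amounts to something like the sketch below. The prices and the lookback are invented, and the formation period is measured in bars of whatever resolution the price series uses.

```python
def formation_return(closes, lookback):
    """Simple return over the last `lookback` bars of a price series."""
    return closes[-1] / closes[-1 - lookback] - 1.0

# Invented closing prices, oldest first
prices = {
    "SPY": [100.0, 102.0, 104.0, 110.0],
    "EFA": [50.0, 49.0, 51.0, 52.0],
    "SHY": [80.0, 80.1, 80.2, 80.4],
}

lookback = 3
assets = list(prices)

# One return per asset, stored in the same order as the asset list
returns = [formation_return(prices[name], lookback) for name in assets]
```

Keeping the returns in the same order as the asset list is what later lets an index into one array identify the instrument in the other.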

If you intend on using Zorro’s optimizer, perform the loop operation using a construct like:

If you don’t intend on using the optimizer, you can safely use the more convenient while(loop(Assets))  construct.

The reason we don’t use the latter in an optimization run is that the loop() function is handled differently in Zorro’s Train mode, and will actually run a separate simulation for each instrument in the loop. This is perfect when we want to trade a particular algorithm across multiple, known instruments – something like a moving average crossover traded on each stock in the S&P500, where we want to optimize the moving average periods separately for each instrument. But in an algorithm that compares and selects instruments from a universe of instruments, optimizing some parameter set on each one individually wouldn’t make sense.

This is actually a really common mistake when developing these types of strategies in Zorro, but if you understand the behavior of loop() in Zorro’s Train mode, it’s one that you probably won’t make again.

Here’s the code for performing the looped return calculations:

Step 5: Compare the results and construct the positions for the next period

Recalling our dual momentum trading logic, we firstly check if US equities outperformed global equities. If so, we then check that their absolute return was positive. If so, then we hold US equities. If global equities outperformed US equities, we check that their absolute return was positive. If so, then we hold global equities. If neither US equities nor global equities had a positive return, we hold bonds.

If you stop and think about that logic, we are really just holding the instrument with the highest return in the formation period, with the added condition that for the equities instruments, they also had a positive absolute return. We could implement that trading logic like so:
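Stripped of order handling, the selection rule can be sketched in Python as below. The function name is hypothetical and the return figures are invented; the Zorro script also has to manage entries and exits, which this leaves out.

```python
def dual_momentum_pick(returns, bonds="SHY"):
    """Symbol to hold next period under the simplified rule.

    Hold the top-ranked instrument if it is the bond ETF or if its own
    return is positive; otherwise fall back to bonds.
    """
    best = max(returns, key=returns.get)
    if best == bonds or returns[best] > 0:
        return best
    return bonds

bull = dual_momentum_pick({"SPY": 0.05, "EFA": 0.02, "SHY": 0.01})
bear = dual_momentum_pick({"SPY": -0.05, "EFA": -0.02, "SHY": -0.03})
```

In the second call EFA is the top-ranked instrument, but its return is negative, so the rule falls back to bonds.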

This is probably the most confusing part of the script, so let’s talk about it in some detail. Firstly, the line

returns an array of the indexes of the Returns  array, sorted from lowest to highest. Say our Returns  array held the numbers 4, -2, 2. Our array idx  would contain 1, 2, 0 because the item at Returns[1]  is the lowest number, followed by the number at Returns[2] , with Returns[0]  being the highest number. This might seem confusing, but it will provide us with a convenient way to access the highest ranked instrument directly from the Assets array, which holds the names of the instruments in the order called by our loop()  function.
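The same index sort is easy to reproduce in Python, which may help if the idea is unfamiliar. The asset order here is assumed for the example.

```python
returns = [4, -2, 2]
assets = ["SPY", "EFA", "SHY"]  # assumed loop order

# Indexes of `returns`, ordered from the lowest return to the highest
idx = sorted(range(len(returns)), key=lambda i: returns[i])

# The last index points at the highest-ranked instrument
top_asset = assets[idx[-1]]
```

Here idx comes out as 1, 2, 0, matching the worked example above, and the last entry picks the first asset in the list.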

In lines 5-17, we firstly use this feature to exit any open positions that aren’t the highest ranked asset – provided those lower ranked assets aren’t bonds. Remember, we might want to hold a bond position, even if it isn’t the highest ranked asset. So we won’t exit any open bond positions just yet.

Next, in line 20, we switch to the highest ranked instrument. If that instrument is bonds, we don’t bother checking the absolute return condition (it doesn’t apply to bonds) and go long. If that instrument is one of the equities ETFs, we check the absolute return condition. If that turns out to be true, we enter a long position in that ETF, then switch to bonds and exit any open position we may have been holding.

Finally, if the absolute return condition on our top-ranked equities ETFs wasn’t true, we switch to bonds and enter a long position.

Position Sizing

In this case we are simply going to be fully invested with all of our starting capital and any accrued profits in the currently selected instrument. Here’s the code for accomplishing that:
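The sizing arithmetic is simple enough to show in a few lines of Python. The function name and the numbers are illustrative, and rounding down to whole shares is an assumption.

```python
import math

def full_investment_shares(starting_capital, accrued_profit, price):
    """Whole shares affordable when all equity goes into one unleveraged position."""
    equity = starting_capital + accrued_profit
    return math.floor(equity / price)

# Illustrative numbers: $10,000 start, $2,500 accrued profit, ETF priced at $250
shares = full_investment_shares(10000, 2500, 250.0)
```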

Note that this is only possible because we are trading these instruments with no leverage (leverage is defined in the asset list above). If we were using leverage, we’d obviously have to reduce the amount of margin invested in a given position.

Putting it all Together

Finally, here’s the complete code listing for our simple Dual Momentum algorithm. In order for the script to run, remember to save a copy of the asset list in Zorro’s History folder, and enter your Alpha Vantage API key in the Zorro.ini or ZorroFix.ini configuration files.

Over the simulation period, the strategy returns a Sharpe Ratio of 0.52. That’s pretty healthy for something that trades so infrequently. In terms of gross returns, the starting capital of $10,000 was almost tripled, and the maximum drawdown was approximately $4,700. One of the main limitations of the strategy is that by design, it is highly concentrated, taking only a single position at a time.

Here’s the equity curve:


Rotation style strategies* require a slightly different design approach than strategies for which the tradable subset of instruments is static. By following the five broad design steps described here, you can leverage Zorro’s speed, power and flexibility to develop these types of strategies. Good luck and happy profits!

*This is just one of the many algorithmic trading fundamentals we cover inside Class to Quant. Not only are our members improving their trading performance with our beginner to advanced courses, but together they’re building functioning strategies inside our community as part of our Algo Laboratory projects. If you’re interested and want to find out more, try Class to Quant for 30 days risk free. I’d love to meet you inside.

Where to from here?

  • Check out my review of Gary Antonacci’s Dual Momentum, and explore some other variations written in R
  • Get our free Zorro for Beginners video series, and go from beginner to Zorro trader in just 90 minutes
  • If you’re ready to go deeper and get more practical tips and tricks on building robust trading systems, as well as joining our strong community of traders, check out our flagship offer Class to Quant.

Deep Learning for Trading Part 4: Fighting Overfitting with Dropout and Regularization

This is the fourth in a multi-part series in which we explore and compare various deep learning tools and techniques for market forecasting using Keras and TensorFlow.

In Part 1, we introduced Keras and discussed some of the major obstacles to using deep learning techniques in trading systems, including a warning about attempting to extract meaningful signals from historical market data. If you haven’t read that article, it is highly recommended that you do so before proceeding, as the context it provides is important.

Part 2 provides a walk-through of setting up Keras and TensorFlow for R using either the default CPU-based configuration, or the more complex and involved (but well worth it) GPU-based configuration under the Windows environment.

Part 3 is an introduction to the model building, training and evaluation process in Keras. We train a simple feed forward network to predict the direction of a foreign exchange market over a time horizon of one hour and assess its performance.


In the last post, we trained a densely connected feed forward neural network to forecast the direction of the EUR/USD exchange rate over a time horizon of one hour. We landed on a model that predicted slightly better than random on out of sample data. We also saw in our learning plots that our network started to overfit badly at around 40 epochs. In this post, I’m going to demonstrate some tools to help fight overfitting and push your models further. Let’s get started.


Regularization is a commonly used technique to mitigate overfitting of machine learning models, and it can also be applied to deep learning. Regularization essentially constrains the complexity of a network by penalizing larger weights during the training process, that is, by adding a term to the loss function that grows as the weights increase.

Keras implements two common types of regularization:  

  • L1, where the additional cost is proportional to the absolute value of the weight coefficients
  • L2, where the additional cost is proportional to the square of the weight coefficients

These are incredibly easy to implement in Keras: simply pass regularizer_l1(regularization_factor) or regularizer_l2(regularization_factor) to the kernel_regularizer argument in a Keras layer instance (details on how to do this below), where regularization_factor * abs(weight_coefficient) or regularization_factor * weight_coefficient^2 is added to the total loss, depending on the type of regularization chosen.
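To see what those penalty terms amount to numerically, here is a small Python sketch that computes them for a handful of made-up weights. It illustrates the arithmetic only and is not Keras code.

```python
def l1_penalty(weights, factor):
    """Sum of factor * |w| over all weights."""
    return factor * sum(abs(w) for w in weights)

def l2_penalty(weights, factor):
    """Sum of factor * w^2 over all weights."""
    return factor * sum(w * w for w in weights)

weights = [0.5, -1.0, 2.0]  # made-up weight coefficients
l1 = l1_penalty(weights, 0.01)
l2 = l2_penalty(weights, 0.01)
```

The squared term means L2 punishes the single large weight far more heavily than the small ones, whereas L1 grows in proportion to each weight's size.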

Note that in Keras speak, 'kernel'  refers to the weights matrix created by a layer. Regularization can also be applied to the bias terms via the argument bias_regularizer  and the output of a layer by activity_regularizer .

Getting smarter with our learning rate

When we add regularization to a network, we might find that we need to train it for more epochs in order to reach convergence. This implies that the network might benefit from a higher learning rate during early stages of model training.

However, we also know that sometimes a network can benefit from a smaller learning rate at later stages of the training process. Think of it like the model’s loss being stuck partway down towards the global minimum, bouncing from one side of the loss surface to the other with each weight update. By reducing the learning rate, we can make the subsequent weight updates less dramatic, which enables the loss to ‘fall’ further down towards the true global minimum.

By using another Keras callback, we can automatically adjust our learning rate downwards when training reaches a plateau:

This tells Keras to reduce the learning rate by a factor of 0.9 whenever validation accuracy doesn’t improve for patience  epochs. Also note the epsilon  parameter, which controls the threshold for measuring the new optimum. Setting this to a higher value results in fewer changes to the learning rate. This parameter should be on a scale that is relevant to the metric being tracked, validation accuracy in this case.
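The logic behind such a callback can be sketched in plain Python. This is a simplified stand-in written for illustration, not the Keras implementation; the class name and the accuracy figures are invented.

```python
class ReduceOnPlateau:
    """Multiply the learning rate by `factor` after `patience` epochs without improvement."""

    def __init__(self, lr, factor=0.9, patience=3, epsilon=1e-4):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.epsilon = epsilon      # minimum gain that counts as a new optimum
        self.best = float("-inf")
        self.wait = 0

    def step(self, val_acc):
        if val_acc > self.best + self.epsilon:
            self.best = val_acc     # new optimum: reset the counter
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr *= self.factor
                self.wait = 0
        return self.lr

scheduler = ReduceOnPlateau(lr=0.01, factor=0.9, patience=3)
history = [scheduler.step(acc) for acc in [0.50, 0.52, 0.52, 0.52, 0.52, 0.53]]
```

With these inputs the rate stays at 0.01 until the third stalled epoch, then drops by the factor of 0.9. A larger epsilon makes it harder for an epoch to register as a new optimum.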

Putting it together

Here’s the code for an L2 regularized feed forward network with both  reduce_lr_on_plateau and model_checkpoint callbacks (data import and processing is the same as in the previous post):

Plotting the training curves now gives us three plots – loss, accuracy and learning rate:

This particular training process resulted in an out of sample accuracy of 53.4%, slightly better than our original unregularized model. You can experiment with more or less regularization, as well as applying regularization to the bias terms and layer outputs.


Dropout is another commonly used tool to fight overfitting. Whereas regularization is used throughout the machine learning ecosystem, dropout is specific to neural networks. Dropout is the random zeroing (“dropping out”) of some proportion of a layer’s outputs during training. The theory is that this helps prevent pairs or groups of nodes from learning random relationships that just happen to reduce the network loss on the training set (that is, result in overfitting). Hinton and his colleagues, the discoverers of dropout, showed that it is generally superior to other forms of regularization and improves model performance on a variety of tasks. Read the original paper here.

Dropout is implemented in Keras as its own layer, layer_dropout(), which applies dropout on its inputs (that is, on the outputs of the previous layer in the stack). We need to supply the fraction of outputs to drop out, which we pass via the rate parameter. In practice, dropout rates between 0.2 and 0.5 are common, but the optimal values for a particular problem and network configuration need to be determined through appropriate cross validation.
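For intuition, here is a plain-Python sketch of what a dropout layer does to a vector of outputs during training. It uses the common "inverted" form, which scales the surviving outputs up so the expected total is unchanged. The function name and the seed are arbitrary choices for the example.

```python
import random

def dropout(outputs, rate, rng):
    """Zero each output with probability `rate`; scale survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [x / keep if rng.random() >= rate else 0.0 for x in outputs]

rng = random.Random(42)          # fixed seed so the sketch is repeatable
activations = [1.0] * 10000      # stand-in for a layer's outputs
dropped = dropout(activations, 0.2, rng)

zero_fraction = dropped.count(0.0) / len(dropped)
mean_output = sum(dropped) / len(dropped)
```

Roughly a fifth of the outputs are zeroed while the mean stays close to its original value. At prediction time no outputs are dropped.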

At the risk of getting ahead of ourselves, when applying dropout to recurrent architectures (which we’ll explore in a future post), we need to apply the same pattern of dropout at every timestep, otherwise dropout tends to hinder performance rather than enhance it.

Here’s an example of how we build a feed forward network with dropout in Keras:

Training the model using the same procedure as we used in the L2-regularized model above, including the reduce learning rate callback, we get the following training curves:

One of the reasons dropout is so useful is that it enables the training of larger networks by reducing their propensity to overfit. Here are the training curves for a similar model but this time eight layers deep:

Notice that it doesn’t overfit significantly worse than the shallower model. Also notice that it didn’t really learn any new, independent relationships from the data – this is evidenced by the failure to beat the previous model’s validation accuracy. Perhaps 53% is the upper out of sample accuracy limit for this data set and this approach to modeling it.

With dropout, you can also afford to use a larger learning rate, which means it is a good idea to make use of the reduce_lr_on_plateau  callback and kick off training with a higher learning rate, which can always be decayed as learning stalls.

Finally, one important consideration when using dropout is constraining the size of the network weights, particularly when a large learning rate is used early in training. In the Hinton et al. paper linked above, constraining the weights was shown to improve performance in the presence of dropout.

Keras makes that easy thanks to the kernel_constraint  parameter of layer_dense() :
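A common choice of constraint is max-norm, which rescales a unit's incoming weight vector whenever its length exceeds a limit. Assuming that is the constraint in use here (the text above does not name it), the operation can be sketched in plain Python with made-up weights:

```python
import math

def max_norm(weights, max_value):
    """Rescale `weights` so their Euclidean norm does not exceed `max_value`."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= max_value:
        return list(weights)     # already within the limit: leave unchanged
    scale = max_value / norm
    return [w * scale for w in weights]

constrained = max_norm([3.0, 4.0], 2.5)   # norm 5.0 is halved to 2.5
untouched = max_norm([0.3, 0.4], 2.5)     # norm 0.5 is left alone
```

The direction of the weight vector is preserved; only its length is capped.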

This model provided an ever-so-slight bump in validation accuracy:

And quite a stunning test-set equity curve:

Interestingly, every experiment I performed in writing this post resulted in a positive out of sample equity curve. The results were all slightly different, even when using the same model setup, which reflects the non-deterministic nature of the training process (two identical networks trained on the same data can result in different weights, depending on the initial, pre-training weights of each network). Some equity curves were better than others, but they were all positive.

Here are some examples:

With L2-weight regularization and no dropout:

With a dropout rate of 0.2 applied at each layer, no regularization, and no weight constraints:

Of course, as mentioned in the last post, the edge of these models disappears when we apply retail spreads and broker commissions, but the frictionless equity curves demonstrate that deep learning, even using a simple feed-forward architecture, can extract predictive information from historical price action, at least for this particular data set, and that tools like regularization and dropout can make a difference to the quality of the model’s predictions.

What’s next?

Before we get into advanced model architectures, in the next unit I’ll show you:

  1. One of the more cutting edge architectures to get the most out of a densely connected feed forward network.
  2. How to interrogate and visualize the training process in real time.


This post demonstrated how to fight overfitting with regularization and dropout using Keras’ sequential model paradigm. While we further refined our previously identified slim edge in predicting the EUR/USD exchange rate’s direction, in practical terms, traders with access to retail spreads and commission will want to consider longer holding times to generate more profit per trade, or will need a more performant model to make money with this approach.

Where to from here?

  • To find out why AI is taking off in finance, check out these insights from my days as an AI consultant to the finance industry 
  • If this walk-through was useful for you, you might like to check out another how-to article on running trading algorithms on Google Cloud Platform
  • If the technical details of neural networks are interesting for you, you might like our introductory article 
  • Be sure to check out Part 1, Part 2, and Part 3 of this series on deep learning applications for trading.
  • If you’re ready to go deeper and get more practical tips and tricks on building robust trading systems, consider becoming a Robot Wealth member