# R

Holding data in a tidy format works wonders for one's productivity. Here we will explore the tidyr package, which is all about creating tidy data. In particular, let's develop an understanding of the `tidyr::pivot_longer` and `tidyr::pivot_wider` functions for switching between different formats of tidy data.

In this video, you'll learn:

- What tidy data looks like
- Why it's a sensible approach
- The difference between long and wide tidy data
- How to efficiently switch between the two formats
- When and why you'd use each of the two formats

What's tidy data?

Tidy data is data where:

- Every column is a variable.
- Every row is an observation.
- Every cell is a single value.

Why do we care?

It turns out there are huge benefits to thinking about the “shape” of your data and the best way to structure and manipulate it for your problem. Tidy data is a standard way of shaping data that facilitates analysis. In particular, tidy data works very well with the tidyverse tools, which means less time spent transforming and cleaning data and more time spent solving problems. In...
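To make the long versus wide distinction concrete, here is a minimal sketch in plain Python (standard library only) rather than tidyr itself. The ticker and price data are hypothetical, and `to_long` and `to_wide` are illustrative helper names that loosely mirror what `tidyr::pivot_longer` and `tidyr::pivot_wider` do; they are not those functions' implementations.

```python
# Hypothetical wide data: one row per date, one column per ticker.
wide = [
    {"date": "2020-01-02", "AAPL": 75.1, "MSFT": 160.6},
    {"date": "2020-01-03", "AAPL": 74.4, "MSFT": 158.6},
]

def to_long(rows, id_col, names_to, values_to):
    """Stack every non-id column into (name, value) pairs, one row per pair."""
    long_rows = []
    for row in rows:
        for col, value in row.items():
            if col != id_col:
                long_rows.append(
                    {id_col: row[id_col], names_to: col, values_to: value}
                )
    return long_rows

def to_wide(rows, id_col, names_from, values_from):
    """Spread (name, value) pairs back into one column per distinct name."""
    by_id = {}
    for row in rows:
        # Create the output row for this id on first sight, then add a column.
        out = by_id.setdefault(row[id_col], {id_col: row[id_col]})
        out[row[names_from]] = row[values_from]
    return list(by_id.values())

long = to_long(wide, "date", "ticker", "price")
round_trip = to_wide(long, "date", "ticker", "price")
```

In the long layout each row holds one date, one ticker and one price, so adding a new ticker adds rows rather than columns. Going long and then wide again recovers the original table.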

In this post, we are going to construct snapshots of historic S&P 500 index constituents from freely available data on the internet. Why? Well, one of the biggest challenges in looking for opportunities amongst a broad universe of stocks is choosing what stock "universe" to look at. One approach to dealing with this is to pick the stocks that are currently in the S&P 500 index. Unfortunately, the stocks that are currently in the S&P 500 index weren't all there last year. A third of them weren't there ten years ago...

If we create a historical data set by picking current S&P 500 index constituents, then we will be including historical data for smaller stocks that weren't in the index at that time. These are all going to be stocks that did very well, historically, or else they wouldn't have gotten in the index! So this universe selection technique biases our stock returns higher. The average past return of current SPX constituents is higher than the average past return of historic SPX constituents, due to this upward bias. It's easy...

To say we're living through extraordinary times would be an understatement. We saw the best part of 40% wiped off stock indexes in a matter of weeks, unprecedented co-ordinated central bank intervention on a global scale, and an unfolding health crisis that for many has already turned into a tragedy. As an investor or trader, what do you do? You manage your exposures the best you can, dial everything down, and go hunting for the opportunities that inevitably present themselves in a stressed out market. We've been hunting pretty much since this thing kicked off - and we want to show you what we found. And, more importantly, the tools and approach we used to find them. To that end, we are opening the gates to our Robot Wealth Pro community, a tight-knit network of independent traders with whom we share our firm's research, data, systematic trading strategies, and real-time ideas. We normally insist that you go through an introductory Bootcamp before joining our Pro team, but these are extraordinary times and we want to get after these opportunities as...

The vector autoregression (VAR) framework is common in econometrics for modelling correlated variables with bi-directional relationships and feedback loops. If you google "vector autoregression" you'll find all sorts of academic papers related to modelling the effects of monetary and fiscal policy on various aspects of the economy. This is only of passing interest to traders. However, if we consider that the VAR framework finds application in the modelling of correlated time series, the implication being that correlation implies a level of forecasting utility, then perhaps we could model a group of related financial instruments and make predictions that we can translate into trading decisions? So we'll give that a try. But first, a brief overview of VAR models.

Overview of VAR models

The univariate autoregression (AR) is a model of a time series as a function of past values of itself:

\[ Y_t = \alpha + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} \]

That's an AR(2) model because it uses two previous values in the time series \(Y\) to estimate the next value. The name of the game is figuring out how many...
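As a rough illustration of the AR(2) recursion above, the sketch below computes forecasts in plain Python. The coefficients and starting values are hypothetical, chosen only to show the mechanics, and `ar2_next` and `ar2_forecast` are illustrative helper names rather than functions from any package.

```python
def ar2_next(y, alpha, beta1, beta2):
    """One-step AR(2) value: alpha + beta1 * Y_{t-1} + beta2 * Y_{t-2}."""
    return alpha + beta1 * y[-1] + beta2 * y[-2]

def ar2_forecast(history, alpha, beta1, beta2, steps):
    """Iterate the recursion, feeding each forecast back in as a lagged value."""
    path = list(history)
    for _ in range(steps):
        path.append(ar2_next(path, alpha, beta1, beta2))
    # Return only the newly forecast values, not the supplied history.
    return path[len(history):]

# Hypothetical coefficients and starting values, for illustration only.
alpha, beta1, beta2 = 0.5, 0.6, 0.2
history = [1.0, 2.0]  # Y_{t-2} = 1.0, Y_{t-1} = 2.0
forecast = ar2_forecast(history, alpha, beta1, beta2, steps=3)
```

Multi-step forecasts reuse earlier forecasts as lagged inputs, so any error in the coefficients compounds as the horizon grows. In practice the coefficients would be estimated from data rather than chosen by hand.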

Way back in November 2007, literally weeks after SPX put in its pre-GFC all-time high, Friedman, Hastie and Tibshirani published their Graphical Lasso algorithm for estimation of the sparse inverse covariance matrix.

Are you suggesting that Friedman and his titans of statistical learning somehow caused the GFC by publishing their Graphical Lasso algorithm?

Not at all. I'm just setting you up to demonstrate the fallacy of mistaking correlation for causation (thanks for playing along). Seeing patterns where there are none is part of what it means to be human. Of course, Friedman and his gang of statisticians didn't cause the GFC. But they did help us deal with our psychological flaws by providing us with a powerful tool for detecting spurious correlations. Their tool allows one to figure out if variables are correlated with one another directly, or whether any measured connection is merely due to a common connection to something else. Confusing? Let's look at an example.

Consider the two stocks ABB, a multinational industrial automation company, and PUK, a multinational life insurance company. Over the period 2015 to...

In the last two posts, we implemented a Kalman filter in R for calculating a dynamic hedge ratio, and presented a Zorro script for backtesting and trading price-based spreads using a static hedge ratio. The goal is to get the best of both worlds and use our dynamic hedge ratio within the Zorro script. Rather than implement the Kalman filter in Lite-C, it's much easier to make use of Zorro's R bridge, which facilitates easy communication between the two applications. In this post, we'll provide a walk-through of configuring Zorro and R to exchange data with one another.

Why integrate Zorro and R?

While Zorro and R are useful as standalone tools, they have different strengths and weaknesses. Zorro was built to simulate trading strategies, and it does this very well. It’s fast and accurate. It lets you focus on your strategies by handling the nuts and bolts of simulation behind the scenes. It implements various tools of interest to traders, such as portfolio optimization and walk-forward analysis, and was designed to prevent common bugs, like lookahead bias. Zorro does...

In our previous post, we looked into implementing a Kalman filter in R for calculating the hedge ratio in a pairs trading strategy. You know, light reading... We saw that while R makes it easy to implement a relatively advanced algorithm like the Kalman filter, there are drawbacks to using it as a backtesting tool. Setting up anything more advanced than the simplest possible vectorised backtesting framework is tough going and error-prone. Plus, it certainly isn't simple to experiment with strategy design - for instance, incorporating costs, trading at multiple levels, using a timed exit, or incorporating other trade filters. To be fair, there are good native R backtesting solutions, such as Quantstrat. But in my experience none of them let you experiment as efficiently as the Zorro platform. And as an independent trader, the ability to move fast - writing proof of concept backtests, invalidating bad ideas, exploring good ones in detail, and ultimately moving to production efficiently - is quite literally a superpower. I've already invalidated 3 ideas since starting this post. The downside with Zorro is that...

This Kalman Filter Example post is the first in a series where we deploy the Kalman Filter in pairs trading. Be sure to follow our progress in Part 2: Pairs Trading in Zorro, and Part 3: Putting It All Together. Anyone who's tried pairs trading will tell you that real financial series don't exhibit truly stable, cointegrating relationships. If they did, pairs trading would be the easiest game in town. But the reality is that relationships are constantly evolving and changing. At some point, we're forced to make uncertain decisions about how best to capture those changes. One way to incorporate both uncertainty and dynamism in our decisions is to use the Kalman filter for parameter estimation. The Kalman filter is a state space model for estimating an unknown ('hidden') variable using observations of related variables and models of those relationships. The Kalman filter is underpinned by Bayesian probability theory and enables an estimate of the hidden variable in the presence of noise. There are plenty of tutorials online that describe the mathematics of the Kalman filter, so I won't...

This is the third in a multi-part series in which we explore and compare various deep learning tools and techniques for market forecasting using Keras and TensorFlow. In Part 1, we introduced Keras and discussed some of the major obstacles to using deep learning techniques in trading systems, including a warning about attempting to extract meaningful signals from historical market data. If you haven’t read that article, it is highly recommended that you do so before proceeding, as the context it provides is important. Read Part 1 here. Part 2 provides a walk-through of setting up Keras and TensorFlow for R using either the default CPU-based configuration, or the more complex and involved (but well worth it) GPU-based configuration under the Windows environment. Read Part 2 here. Part 3 is an introduction to the model building, training and evaluation process in Keras. We train a simple feed forward network to predict the direction of a foreign exchange market over a time horizon of one hour and assess its performance. Now that you can train your deep learning models on a GPU, the fun can really start....

This is the second in a multi-part series in which we explore and compare various deep learning tools and techniques for market forecasting using Keras and TensorFlow. In Part 1, we introduced Keras and discussed some of the major obstacles to using deep learning techniques in trading systems, including a warning about attempting to extract meaningful signals from historical market data. If you haven't read that article, it is highly recommended that you do so before proceeding, as the context it provides is important. Read Part 1 here. Part 2 provides a walk-through of setting up Keras and TensorFlow for R using either the default CPU-based configuration, or the more complex and involved (but well worth it) GPU-based configuration under the Windows environment. Stay tuned for Part 3 of this series which will be published next week.

CPU vs GPU for Deep Learning

No doubt you know that a computer's Central Processing Unit (CPU) is its primary computation module. CPUs are designed and optimized for rapid computation on small amounts of data and as such, elementary arithmetic operations on a few numbers...