The post Practical Pairs Trading appeared first on Robot Wealth.

While you can, in theory, create mean reverting portfolios from as many instruments as you like, this post will largely focus on the simplest case: pairs trading.

Pairs trading involves buying and selling a portfolio consisting of two instruments. The instruments are linked in some way, for example they might be stocks from the same business sector, currencies exposed to similar laws of supply and demand, or other instruments exposed to the same or similar risk factors. We are typically long one instrument and short the other, making a bet that the value of this long-short portfolio (the spread) has deviated from its equilibrium value and will revert back towards that value.

One of the major attractions of pairs trading is that we can achieve market neutrality, or something close to it. Because the long and short positions offset each other, pairs trading can be somewhat immune to movements of the overall market, thus eliminating or reducing market risk – theoretically at least.

The fact that we can construct artificial spreads with mean-reverting properties is one of the major attractions of this style of trading. But there are some drawbacks too.

For starters, if a series was mean reverting in the past, it may not be mean reverting in the future. Constructed spreads typically mean revert when random, non-structural events affect the value of the components. A good spread combined with a good trading strategy will capture these small opportunities for profit consistently. On the other hand, when a *structural shift* occurs – such as a major revaluation of one asset but not the other – such a strategy will usually get burned quite badly.

How can we mitigate that risk? Predicting the breakdown of a spread is very difficult, but a sensible way to reduce the risk of this approach is to trade a diverse range of spreads. As with other types of trading, diversity tends to be your friend. I like to have an economic reason for the relationship that links the components, but to be totally honest, I’ve seen pairs trading work quite well between instruments that I couldn’t figure a relationship for. If you understand the relationship, you may be in a better position to judge when and why the spread might break down. Then again, you might not either. Breakdowns tend to happen suddenly and without a lot of warning.

Probably the best example of what not to do with a pairs trading strategy is the Long Term Capital Management meltdown. I won’t go into the details here, but there is plenty written about this incident and it makes a fascinating and informative case study.

When two or more non-stationary series can be combined to make a stationary series, the component series are said to be **cointegrated**. One of the challenges of pairs trading is to determine the coefficients that define this stationary combination. In pairs trading, that coefficient is called the hedge ratio, and it describes the amount of instrument B to purchase or sell for every unit of instrument A. The hedge ratio can refer to a dollar value of instrument B, or the number of units of instrument B, depending on the approach taken. Here, we will largely focus on the latter approach.
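To build some intuition for what cointegration looks like, here's a small Python sketch on synthetic (purely hypothetical) data: two non-stationary series share a common random-walk trend, and the right linear combination removes that trend, leaving a stationary spread.

```python
import random
import statistics

random.seed(42)

# common stochastic trend: a random walk
trend = [0.0]
for _ in range(999):
    trend.append(trend[-1] + random.gauss(0, 1))

# two non-stationary "prices" driven by the same trend;
# b carries twice the exposure, each with its own stationary noise
a = [t + random.gauss(0, 1) for t in trend]
b = [2*t + random.gauss(0, 1) for t in trend]

# a hedge ratio of 2 removes the shared trend -> stationary spread
spread = [bi - 2*ai for ai, bi in zip(a, b)]

print(statistics.stdev(b))       # large: b wanders with the trend
print(statistics.stdev(spread))  # small: the spread oscillates around zero
```

Neither `a` nor `b` is stationary on its own, but the combination `b - 2a` is — that coefficient of 2 is exactly the hedge ratio the tests below try to estimate from data.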

The Cointegrated Augmented Dickey Fuller (CADF) test finds a hedge ratio by running a linear regression between two series, forms a spread using that hedge ratio, then tests the stationarity of that spread. In the following examples, we use some R code that runs a linear regression between the price series of Exxon Mobil and Chevron to find a hedge ratio and then tests the resulting spread for stationarity.

First, download some data, and plot the resulting price series (here we use the Robot Wealth data pipeline, a tool we wrote for members that efficiently fetches prices and other data from a variety of sources):

```r
# Preliminaries
library(urca)
source("data_pipeline.R")

prices <- load_data(c('XOM', 'CVX'), start='2014-01-01', end='2017-01-01',
                    source='av', return_data='Adjusted', save_indiv=TRUE)

plot(prices, col=c('blue', 'red'), main='XOM and CVX')
legend('bottomright', col=c('blue', 'red'), legend=c('XOM', 'CVX'), lty=1, bty='n')
```

Next, we create a scatter plot of our price series, which will indicate whether there is an underlying relationship between them, and fit a linear regression model using ordinary least squares:

```r
# scatter plot of prices
plot(coredata(prices[, 'XOM.Adjusted']), coredata(prices[, 'CVX.Adjusted']),
     col='blue', xlab='XOM', ylab='CVX')

# linear regression of prices
fit <- lm(coredata(prices[, 'CVX.Adjusted']) ~ coredata(prices[, 'XOM.Adjusted']))
summary(fit)

# output:
# Call:
# lm(formula = coredata(prices[, "CVX.Adjusted"]) ~ coredata(prices[,
#     "XOM.Adjusted"]))
#
# Residuals:
#     Min      1Q  Median      3Q     Max
# -9.6122 -2.2196 -0.3546  2.4747  9.3366
#
# Coefficients:
#                                     Estimate Std. Error t value Pr(>|t|)
# (Intercept)                        -41.39747    1.88181  -22.00   <2e-16 ***
# coredata(prices[, "XOM.Adjusted"])   1.67875    0.02309   72.71   <2e-16 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Residual standard error: 3.865 on 754 degrees of freedom
# Multiple R-squared:  0.8752, Adjusted R-squared:  0.875
# F-statistic:  5287 on 1 and 754 DF,  p-value: < 2.2e-16

# add the regression line, clipped to the data range
clip(min(coredata(prices[, 'XOM.Adjusted'])), max(coredata(prices[, 'XOM.Adjusted'])),
     min(coredata(prices[, 'CVX.Adjusted'])), max(coredata(prices[, 'CVX.Adjusted'])))
abline(fit$coefficients, col='red')
```

The line of best fit over the whole data set has a slope of 1.68, which we’ll use as our hedge ratio. Notice, however, that while that line is the *global* line of best fit, there are clusters where it isn’t the best fit *locally*. We’ll likely find that points in those clusters occur close to each other in time, which implies that a dynamic hedge ratio may be useful. We’ll return to this idea later, but for now, we’ll use the slope of the global line of best fit.

Next, we construct and plot a spread using the hedge ratio we found above:

```r
# construct and plot spread
hedge <- fit$coefficients[2]
spread <- prices[, 'CVX.Adjusted'] - hedge*coredata(prices[, 'XOM.Adjusted'])
plot(spread, main='CVX-XOM Spread', col='black')
```

The resulting spread is clearly much more mean-reverting than either of the underlying price series, but let’s test that observation using the ADF test:

```r
# ADF test
adf <- ur.df(spread, type="drift", selectlags='AIC')
summary(adf)

# ###############################################
# # Augmented Dickey-Fuller Test Unit Root Test #
# ###############################################
#
# Test regression drift
#
# Call:
# lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
#
# Residuals:
#     Min      1Q  Median      3Q     Max
# -3.8638 -0.5122  0.0037  0.5102  7.0391
#
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)
# (Intercept) -1.081908   0.375640  -2.880  0.00409 **
# z.lag.1     -0.026377   0.009032  -2.920  0.00360 **
# z.diff.lag  -0.023002   0.036596  -0.629  0.52984
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Residual standard error: 0.9464 on 751 degrees of freedom
# Multiple R-squared:  0.01263, Adjusted R-squared:  0.01
# F-statistic: 4.803 on 2 and 751 DF,  p-value: 0.008459
#
# Value of test-statistic is: -2.9204 4.3109
#
# Critical values for test statistics:
#       1pct  5pct 10pct
# tau2 -3.43 -2.86 -2.57
# phi1  6.43  4.59  3.78
```

Observe that the value of the test statistic is -2.92, which is significant at the 95% level. Thus we reject the null of a random walk and assume our series is stationary.

You might have noticed that since we used ordinary least squares (OLS) to find our hedge ratio, we will get a different result depending on which price series we use for the dependent (y) variable, and which one we choose for the independent (x) variable. The two hedge ratios are *not* simply the inverse of one another, as one might reasonably and intuitively expect. When using this approach, it is a good idea to test both spreads using the ADF test and choose the one with the most negative test statistic, as it is the more strongly mean reverting option (at least historically).
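The asymmetry is easy to demonstrate: the product of the two OLS slopes equals the squared correlation, so the two hedge ratios are inverses of one another only when the fit is perfect. A quick Python sketch on synthetic, hypothetical data (the idea carries straight over to the R workflow above):

```python
import random

random.seed(1)

def ols_slope(x, y):
    """Slope of the OLS regression of y on x: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x)/n, sum(y)/n
    cov = sum((a-mx)*(b-my) for a, b in zip(x, y))/n
    var = sum((a-mx)**2 for a in x)/n
    return cov/var

# hypothetical, correlated price-like series
x = [100 + 0.05*i + random.gauss(0, 2) for i in range(500)]
y = [0.85*xi + random.gauss(0, 2) for xi in x]

beta_yx = ols_slope(x, y)   # hedge ratio regressing y on x
beta_xy = ols_slope(y, x)   # hedge ratio regressing x on y

# the product equals r-squared, which is < 1 unless the fit is perfect,
# so beta_xy is NOT simply 1/beta_yx
print(beta_yx * beta_xy)
```

The closer the two series track each other, the closer this product gets to 1 and the less the choice of dependent variable matters.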

As an alternative to OLS, we can use total least squares (TLS), which accounts for the variance in both series, whereas OLS accounts for the variance in only one (this is why we get different hedge ratios depending on our choice of dependent variable). TLS is symmetrical, and will give the same result regardless of our choice of dependent variable. In practical terms, the hedge ratios obtained by OLS and TLS usually won’t differ greatly, but when they *do* differ, that difference is likely to be significant. So it is worth including the TLS approach in your analysis.

Implementing TLS in R is straightforward using principal components analysis (PCA). Here’s the syntax:

```r
# TLS for computing hedge ratio
pca <- princomp(~ coredata(prices[, 'CVX.Adjusted']) + coredata(prices[, 'XOM.Adjusted']))
tls_hedge <- pca$loadings[1, 1]/pca$loadings[2, 1]

tls_spread <- prices[, 'CVX.Adjusted'] - tls_hedge*coredata(prices[, 'XOM.Adjusted'])
plot(tls_spread, main='CVX-XOM Spread', col='black')
```

The TLS approach results in a hedge ratio of 1.86 and a spread that is not all that different from the one built with our OLS hedge ratio. If we flipped the dependent and independent variables, our hedge ratio would simply be the inverse of the one we just calculated.
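That symmetry can be verified directly. For two series, the TLS slope is the direction of the first principal component of the 2×2 covariance matrix (which is what `princomp` computes above), and it has a closed form. Here's an illustrative Python sketch on synthetic data showing that swapping the variables exactly inverts the slope:

```python
import math
import random

random.seed(7)

def tls_slope(x, y):
    """Total least squares slope of y on x: the direction of the first
    principal component of the 2x2 covariance matrix, in closed form."""
    n = len(x)
    mx, my = sum(x)/n, sum(y)/n
    sxx = sum((a-mx)**2 for a in x)/n
    syy = sum((b-my)**2 for b in y)/n
    sxy = sum((a-mx)*(b-my) for a, b in zip(x, y))/n
    d = math.sqrt((syy - sxx)**2 + 4*sxy**2)
    return (syy - sxx + d) / (2*sxy)

# hypothetical correlated series
x = [100 + 0.05*i + random.gauss(0, 2) for i in range(500)]
y = [0.85*xi + random.gauss(0, 2) for xi in x]

b_yx = tls_slope(x, y)
b_xy = tls_slope(y, x)
print(b_yx * b_xy)  # 1.0 up to floating point: TLS is symmetric
```

Contrast this with the OLS case, where the product of the two slopes is the squared correlation rather than 1.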

Let’s see if it results in a more significant ADF test result:

```r
# ADF test
adf <- ur.df(tls_spread, type="drift", selectlags='AIC')
summary(adf)

# output:
# ###############################################
# # Augmented Dickey-Fuller Test Unit Root Test #
# ###############################################
#
# Test regression drift
#
# Call:
# lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
#
# Residuals:
#     Min      1Q  Median      3Q     Max
# -4.0533 -0.5579  0.0100  0.5905  7.3852
#
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)
# (Intercept) -1.779796   0.545639  -3.262  0.00116 **
# z.lag.1     -0.031884   0.009693  -3.289  0.00105 **
# z.diff.lag  -0.022874   0.036562  -0.626  0.53176
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Residual standard error: 1.057 on 751 degrees of freedom
# Multiple R-squared:  0.01577, Adjusted R-squared:  0.01315
# F-statistic: 6.017 on 2 and 751 DF,  p-value: 0.002556
#
# Value of test-statistic is: -3.2894 5.4476
#
# Critical values for test statistics:
#       1pct  5pct 10pct
# tau2 -3.43 -2.86 -2.57
# phi1  6.43  4.59  3.78
```

Our test statistic is slightly more negative than that resulting from the spread constructed using OLS.

The speed at which our spread mean reverts has implications for its efficacy in a trading strategy. Faster mean reversion implies more excursions away from, and subsequent reversion back to the mean. One estimate of this quantity is the **half-life of mean reversion**, which is defined for a continuous mean reverting process as the average time it takes the process to revert half-way to the mean.

We can calculate the half-life of mean reversion of our spread using the following `half_life()` function:

```r
# note: Lag() comes from the quantmod package
half_life <- function(series) {
  delta_P <- diff(series)
  mu <- mean(series)
  lag_P <- Lag(series) - mu
  model <- lm(delta_P ~ lag_P)
  lambda <- model$coefficients[2]
  H <- -log(2)/lambda
  return(H)
}

H <- half_life(tls_spread)
```

In this example, our half-life of mean reversion is 21 days.
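The logic of the `half_life()` function is worth a sanity check: regress the spread's one-day changes on the lagged deviation from the mean to get the mean-reversion coefficient \(\lambda\), then take \(H = -\ln(2)/\lambda\). Here's a minimal Python sketch on a deterministic, noise-free mean-reverting series (with the mean known, so the fit is exact):

```python
import math

mu = 10.0
phi = 0.9            # each step, the deviation from the mean shrinks by 10%
y = [20.0]
for _ in range(200):
    y.append(mu + phi*(y[-1] - mu))

# regress diff(y) on the lagged deviation (y[t] - mu); no-intercept
# slope gives lambda, which here is exactly phi - 1 = -0.1
dy = [b - a for a, b in zip(y, y[1:])]
dev = [v - mu for v in y[:-1]]
lam = sum(d*e for d, e in zip(dev, dy)) / sum(e*e for e in dev)

half_life = -math.log(2)/lam
print(half_life)   # about 6.93 bars
```

A spread that decays 10% toward its mean each bar takes roughly seven bars to close half the gap, which matches the intuition behind the formula.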

The Johansen test is another test for cointegration that generalizes to more than two variables. It is worth being familiar with in order to build mean reverting portfolios consisting of three or more instruments. Such a tool has the potential to greatly increase the universe of portfolios available and by extension the diversification we could potentially achieve. It’s also a data-miner’s paradise, with all the attendant pitfalls.

The `urca` package implements the Johansen test via the `ca.jo()` function. The test can be specified in several different ways, but here’s a sensible specification and the resulting output (it results in a model with a constant offset, but no drift) using just the CVX and XOM price series:

```r
jt <- ca.jo(prices, type="trace", K=2, ecdet="const", spec="longrun")
summary(jt)
```

```
######################
# Johansen-Procedure #
######################

Test type: trace statistic, without linear trend and constant in cointegration

Eigenvalues (lambda):
[1]  1.580429e-02  2.304504e-03 -3.139249e-20

Values of teststatistic and critical values of test:

          test 10pct  5pct  1pct
r <= 1 |  1.74  7.52  9.24 12.97
r = 0  | 13.75 17.85 19.96 24.60

Eigenvectors, normalised to first column:
(These are the cointegration relations)

                XOM.Adjusted.l2 CVX.Adjusted.l2    constant
XOM.Adjusted.l2       1.0000000        1.000000   1.0000000
CVX.Adjusted.l2      -0.4730001       -1.350994  -0.7941696
constant            -36.1562621       48.872924 -23.2604096

Weights W:
(This is the loading matrix)

               XOM.Adjusted.l2 CVX.Adjusted.l2      constant
XOM.Adjusted.d     -0.03890275     0.003422549 -1.424199e-16
CVX.Adjusted.d     -0.01169526     0.006491211  2.416743e-17
```

**Here’s how to interpret the output of the Johansen test:**

Firstly, the test actually has more than one null hypothesis. The first null hypothesis is that there are no cointegrating relationships between the series; it corresponds to the `r = 0` test statistic above.

The second null hypothesis is that there is at most one cointegrating relationship between the series; it corresponds to the `r <= 1` test statistic. If we had more price series, there would be further null hypotheses testing for at most two, three, four, and so on, cointegrating relationships between the series.

If we can reject all these hypotheses, then we are left with the number of cointegrating relationships being equal to the number of price series.

In this case, we can’t reject *either* null hypothesis at even the 10% level! This would seem to indicate that our series don’t combine to produce a stationary spread, despite what we found earlier. **What’s going on here?**

This raises an important issue, so let’s spend some time exploring this before continuing.

First, the tests for finding statistically significant cointegrating relationships should not be applied mechanically, without at least some understanding of the underlying process. It turns out that in the `ca.jo()` implementation of the Johansen test, the critical values reported may be arbitrarily high, thanks to both the minimum number of lags required (2) and the statistical uncertainty associated with the test itself. This can result in unjustified acceptance of the null hypothesis. For this reason, we might prefer to use the CADF test, but that’s not an option with more than two variables.

In practice, I would be a little slow to reject a portfolio on the basis of a failed Johansen test, so long as there was a good reason for a link between the component series, and we could create a promising backtest. In practice, you’ll find that dealing with a changing hedge ratio is more of an issue than the statistical significance of the Johansen test. More on this in another post.

Returning to the output of the Johansen test, the reported eigenvectors are our hedge ratios. In this case, for every unit of XOM, we hold an opposite position of 0.47 units of CVX. To be consistent with our previously constructed spreads, taking the inverse gives us the hedge ratio in terms of units of XOM for every unit of CVX, which works out to be about 2.11 in this case. Here’s how we extract that value and construct and plot the resulting spread:

```r
# Johansen spread
# jt@V[2, 1] is the CVX coefficient of the first eigenvector (about -0.47),
# so jo_hedge is about -2.11 and the '+' below gives CVX minus 2.11 units of XOM
jo_hedge <- 1/jt@V[2, 1]
jo_spread <- prices[, 'CVX.Adjusted'] + jo_hedge*coredata(prices[, 'XOM.Adjusted'])
plot(jo_spread, main='CVX-XOM Spread', col='black')
```

For completeness, here’s an example of using the Johansen test on a portfolio of three instruments, adding ConocoPhillips (COP, another energy company) to our existing pair:

```r
# 3-series Johansen example
portfolio <- load_data(c('XOM', 'CVX', 'COP'), start='2014-01-01', end='2017-01-01',
                       source='yahoo', return_data='Adjusted', save_indiv=TRUE)

plot(portfolio, col=c('blue', 'red', 'black'), main='XOM, CVX and COP')
legend('bottomright', col=c('blue', 'red', 'black'),
       legend=c('XOM', 'CVX', 'COP'), lty=1, bty='n')

jt <- ca.jo(portfolio, type="eigen", K=2, ecdet="const", spec="longrun")
summary(jt)
```

```
######################
# Johansen-Procedure #
######################

Test type: maximal eigenvalue statistic (lambda max), without linear trend and constant in cointegration

Eigenvalues (lambda):
[1]  1.696099e-02  6.538698e-03  2.249865e-03 -3.753129e-18

Values of teststatistic and critical values of test:

          test 10pct  5pct  1pct
r <= 2 |  1.70  7.52  9.24 12.97
r <= 1 |  4.95 13.75 15.67 20.20
r = 0  | 12.90 19.77 22.00 26.81

Eigenvectors, normalised to first column:
(These are the cointegration relations)

                XOM.Adjusted.l2 COP.Adjusted.l2 CVX.Adjusted.l2    constant
XOM.Adjusted.l2      1.00000000       1.0000000        1.000000   1.0000000
COP.Adjusted.l2      0.05672729      -0.5610875        1.169714   0.3599442
CVX.Adjusted.l2     -0.49143408      -0.7078675       -1.764155  -1.2310314
constant           -37.60449105      19.0946477       31.146100   9.2824197

Weights W:
(This is the loading matrix)

               XOM.Adjusted.l2 COP.Adjusted.l2 CVX.Adjusted.l2      constant
XOM.Adjusted.d     -0.04560861     0.003903014    -0.001297751 -1.251443e-15
COP.Adjusted.d     -0.01296028     0.003347451    -0.003035065 -5.016783e-16
CVX.Adjusted.d     -0.02437064     0.009714253    -0.001897541 -5.504833e-16
```

```r
# Johansen spread from the first eigenvector
jo_hedge <- jt@V[, 1]
jo_spread <- portfolio %*% jo_hedge[1:3]
jo_spread <- xts(jo_spread, order.by=index(portfolio))
plot(jo_spread, main='CVX-XOM-COP Spread', col='black')
```

While it may seem tempting to check thousands of instrument combinations for cointegrating relationships, be aware of these two significant issues:

- Doing this indiscriminately will incur significant data-mining bias. Remember that if you run 100 statistical significance tests on random data, on average about 5 will pass at the 95% confidence level through chance alone.
- The Johansen test creates portfolios using all components. This can result in significant transaction costs, as for every instrument added, further brokerage and spread crossing costs are incurred. One alternative is to explore the use of sparse portfolios – see for example this paper.
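The first point is worth making concrete. Under a true null hypothesis, p-values are uniformly distributed, so a fixed fraction of tests will "pass" by luck alone. A quick, purely illustrative Python simulation:

```python
import random

random.seed(0)

n_tests = 10_000
# under the null hypothesis, a test's p-value is uniform on [0, 1],
# so each test has a 5% chance of a false positive at the 95% level
false_positives = sum(1 for _ in range(n_tests) if random.random() < 0.05)

print(false_positives / n_tests)  # close to 0.05
```

Scan thousands of instrument pairs and you should *expect* hundreds of "cointegrated" spreads that are nothing but noise, which is why an economic rationale for the relationship matters.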

In practice, I use statistical tests such as CADF and Johansen to help find potential hedge ratios for price series that I think have a good chance of generating a profit in a pairs trading strategy. Specifically, **I don’t pay a great deal of attention to the statistical significance of the results.** The markets are noisy, chaotic and dynamic, and something as precise as a test for statistical significance proves to be a suboptimal basis for making trading decisions.

In practice, I’ve seen this play out over and over again: spreads that failed a test of statistical significance generated positive returns out of sample, while spreads that passed with flying colours lost money hand over fist. Intuitively this makes a lot of sense – if trading were as easy as running tests for statistical significance, anyone with undergraduate-level math (or the ability to copy and paste code from the internet) would be raking money from the markets. Clearly, other skills and insights are needed.

In future posts, I’ll show you some backtesting tools for running experiments on pairs trading. Thanks for reading!

We just launched our new **FX Bootcamp**, where you can team up, build and trade a live retail algo FX portfolio from scratch in just 16 weeks. Not only will you end up with a diverse, robust portfolio by the end, but you will have gained the exact approach we’ve used to trade for a living for decades between us.

Interested?

**Enrolment to FX Bootcamp will close midnight, Friday July 12th.** Join your new team of retail traders and get started!


The post Bond. Treasury Bond appeared first on Robot Wealth.

The Federal Reserve publishes the yield-to-maturity of US Treasury bonds. However, the actual returns earned by investors are not publicly available. Nor are they readily and intuitively discerned from historical yields, since *“a bond’s return equals its yield only if its yield stays constant and if all coupons (cash payments) are reinvested at that same yield”* (Tuckman and Angel, 2013, p.95).

Recently, Laurens Swinkels of Erasmus University in the Netherlands estimated such a return series using publicly available data for US government bonds with 10-year maturity. The working paper accompanying the data is not yet available, but he has generously published his data in an Excel spreadsheet, including formulas for the return estimation process. You can find the data here.

One nice result of this data is that it enables the construction of a proxy for a bond price series, which can be thought of as representing a constant exposure to 10-year treasuries, and thus facilitates *ex-post* analysis of various timing models.

This idea of a constant exposure is much like an allocation to the ETF IEF or the ZN futures contract, in that it is essentially a dynamic trading strategy that maintains a roughly constant time to maturity. However, IEF has only been around since 2002, and ZN first traded back in 1982. Swinkels’ data set, on the other hand, goes back all the way to 1962, so it offers the opportunity to explore bond trading strategies over more than 55 years.

A treasury is a debt obligation issued by the US government to finance its spending. US Treasury securities are among the most liquid securities in the world and are perceived as one of the safest investments globally. In practical terms, this means that the demand for treasuries increases when something happens to scare investors. You can see this playing out for example in the recent price action of the TLT ETF, which holds long-duration treasuries:

Treasury bonds (also known as *notes*) entitle the holder to regular interest payments until *maturity*, at which time the principal, or the amount loaned in exchange for the bond, is returned. Treasury *bills*, on the other hand, have short maturities (one year or less) and only pay out at maturity.

The idea of “buying a single bond” is easy to understand. The coupons are just priced on a fixed rate and are never going to change. Say you buy a 10 year bond. It turns into a 9 year bond, then an 8 year, a 7 year, and eventually a 1 year, paying you a fixed amount at known intervals along the way. Then it matures and your principal is returned. Therefore, if you hold your bond to maturity you know exactly what yield you are going to get on the original capital you paid for it.

Maintaining an “exposure” to bonds is more nuanced. Conceptually, the value of the exposure’s cash flows changes based on current rate conditions and time to maturity of the actual securities held. And rate conditions can change differently at different points on the curve at the same time. This all gets quite complex – hence the value of Swinkels’ data set.

First, here’s the cumulative return from holding a constant exposure to bonds since 1962 (Column F from Swinkels’ data):

That’s a total return of about 4,400%.

Of course, one needs to take care interpreting this figure. This is the return you would have achieved if you could have maintained a constant time-to-maturity, re-invested all of your profits, and done so cost-free. So it’s not overly useful for estimating an investor’s actual returns, but it is useful for getting some insight into how US Treasuries, as an asset class, have performed over this time scale, particularly for comparison with other assets.
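A total return of 4,400% sounds enormous, but compounded over roughly 56 years (assuming the sample runs from 1962 to about 2018) it works out to a fairly ordinary annual figure. A quick Python check:

```python
total_return = 44.0   # 4,400% expressed as a multiple of capital gained
years = 56            # assumed span of the sample, 1962 to about 2018

# compound annual growth rate: (final/initial)^(1/years) - 1
cagr = (1 + total_return)**(1/years) - 1
print(round(cagr, 4))  # about 7% per year
```

That kind of steady compounding, rather than spectacular yearly returns, is what makes the buy-and-hold benchmark so hard for a timing model to beat.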

Before we get started exploring timing models, it bears stating that in this analysis, we’re not so much interested in actual returns, as there will necessarily be many assumptions baked into the analysis (like those mentioned above). But we are *really *interested in whether timing models can beat a benchmark under the same assumptions.

I’m not holding out a lot of hope that we can do better than buy and hold, as bonds have done so well for so long. The reality of trying to time a bond exposure is that whenever you are “out” of bonds you are giving up exposure to the yield – that is, you forgo interest payments, which probably account for the majority of the total returns (although I haven’t verified this). In line with our Robot Wealth mantra to ‘trade humble’, I very much doubt the ability of any simple timing model to overcome this hurdle.

But you never know. In part 2 of this series, we’ll look at various factors in an attempt to outperform a buy and hold bond exposure. We’ll start by looking at the usual momentum and value factors, but if you have any ideas you’d like us to take a look at, let us know in the comments.


The post Shannon Entropy: A Genius Gambler’s Guide to Market Randomness appeared first on Robot Wealth.

The purpose of this post is to scratch the surface of the markets from an information theoretic perspective, using tools developed by none other than the father of the digital age, Claude Shannon. Specifically, we’re going to tinker with the concept of Shannon Entropy.

Shannon (the man, not the entropy) was one of those annoying people who excels at everything he touches. Most notably, he was the first to show how Boolean algebra could be applied to the design of electrical circuits (in his Master’s thesis at the age of 21, no less). Later, around 1948, he founded Information Theory, which leverages his unique-at-the-time understanding that computers could express numbers, words, pictures, even audio and video as strings of binary digits.

Not being one to let his genius go to waste, he and his buddy Ed Thorp secretly used wearable computers to beat roulette in Vegas casinos by synchronising their computers with the spins of the wheel. They also beat the casinos by counting cards at Blackjack.

Between Information Theory and digital circuit design (and less so his gambling escapades), Shannon’s work essentially ushered in the digital world we find ourselves in today.

Measured in bits, Shannon Entropy is a measure of the information content of data, where *information content* refers more to what the data *could* contain, as opposed to what it *does* contain. In this context, information content is really about quantifying predictability, or conversely, randomness.

This concept of information is somewhat counter-intuitive. In everyday usage, we equate the word *information* with *meaning* – information has some meaning, otherwise it’s not really information. In Shannon’s Information Theory, *information* relates to the effort or cost to describe some variable, and Shannon Entropy is the minimum number of bits that are needed to do so, on average.

This sounds a bit whacky, but will become clearer as we introduce some equations and examples. The key is to divorce the information theoretical definition of *information* from our everyday concept of *meaning*.

Say we have some random variable, like a coin toss. Your friend tosses the coin and hides the result from you. You can discern the outcome of any individual coin toss by asking just one binary (yes-no) question: *was the outcome a head?*

The number of binary questions we need to ask to describe each outcome is one. Here and in the examples below, consider that ‘the cost of encapsulating information’ is analogous to ‘the number of questions required to describe a random variable’.

Next, consider a deck of cards with the jokers removed. Our friend shuffles the deck, draws a card and records its suit without showing us. Our friend then replaces the card, reshuffles the deck and repeats. Our friend’s record of the suits drawn from the deck is then a random variable with four equally probable outcomes. What is the minimum number of binary questions we must now ask, on average, to ascertain the suit of each draw?

Our first question could be *is the suit red?* Then, if the answer was *yes*, we might ask *is the suit a diamond?* Otherwise, we might ask *is the suit a spade?* If you think about it, regardless of the suit and the binary questions asked, we always need two questions to arrive at the correct answer.

Now consider a deck stacked with one suit and with some cards of the other suits removed. Say we have 26 hearts, 10 diamonds, 8 spades and 8 clubs. The outcome of a card draw is no longer completely random in the sense that the outcomes are no longer equally likely. We can use that knowledge to reduce the number of questions we need, on average, to arrive at the correct suit.

The probability of a card being a heart, diamond, spade or club is now 0.5, 0.19, 0.15 and 0.15 respectively. If our first question is *Is the suit a heart?*, then in 50% of cases we will have arrived at the correct suit after a single question. If the answer is *No*, our next question would be *Is the suit a diamond?* Now, in almost 70% of cases, we will have our answer within two questions. In the remaining 30% of cases, we’ll need a third question.

Now, the average number of questions is simply the probability-weighted sum of the number of questions each outcome requires, that is

\[\frac{26}{52} * 1 + \frac{10}{52} * 2 + \frac{8+8}{52}*3 \approx 1.81\]

Recall that when all suits had an equal probability of occurrence, we always needed to ask two questions to ascertain any particular draw’s suit. This equal-weight case corresponds to the system that *maximizes randomness and minimizes order*. When we add some order to the system by making one outcome more likely, we reduce the average number of questions needed to ascertain the suit, in this case to 1.81.
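The same arithmetic in Python, for the stacked deck (26 hearts, 10 diamonds, 8 spades, 8 clubs) with the question order described above:

```python
# probability of each suit, paired with the number of questions
# needed to identify it under the hearts -> diamonds -> spades order
outcomes = [
    (26/52, 1),  # hearts: answered by the first question
    (10/52, 2),  # diamonds: answered by the second question
    (8/52, 3),   # spades: resolved by the third question
    (8/52, 3),   # clubs: implied when the third answer is "no"
]

avg_questions = sum(p*q for p, q in outcomes)
print(round(avg_questions, 2))  # 1.81
```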

The phenomenon we’ve just seen is analogous to Shannon Entropy, which measures the cost or effort required to describe a variable. Like the number of questions we need to arrive at the correct suit, Shannon Entropy decreases when order is imposed on a system and increases when the system is more random. **Entropy is maximized (and predictability minimized) when all outcomes are equally likely.**

Shannon Entropy, \(H\) is given by the following equation:

\[H = -\sum_{i=1}^np_i\log_2 p_i\]

Where \(n\) is the number of possible outcomes, and \(p_i\) is the probability of the \(i^{th}\) outcome occurring.
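Here's the equation in Python, checked against the examples above (fair coin, fair four-suit deck, and the stacked deck):

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)), skipping p = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))                  # fair coin: 1.0 bit
print(shannon_entropy([0.25]*4))                    # fair deck: 2.0 bits
print(shannon_entropy([26/52, 10/52, 8/52, 8/52]))  # stacked deck: about 1.79 bits
```

Notice the stacked deck's entropy (about 1.79 bits) sits slightly below the 1.81 questions we computed earlier: entropy is the theoretical lower bound on the average number of yes-no questions, and our simple questioning scheme gets close to, but not quite at, that bound.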

Why did Shannon choose to use the logarithm in his equation? Surely there are more intuitive ways to measure information and randomness? Certainly when I first looked at this equation, I wondered where it came from. It turns out that randomness is a tricky thing to quantify, and there are several approaches in addition to Shannon’s. The choice of measure is really informed by the properties we wish our measure to take on, rather than the properties of the phenomenon being measured. In this case, it is partially to do with how we might perceive the measure of information to change (for example, to double the number of possible states of a binary string, we simply add a bit, which is equivalent to incrementing the base 2 logarithm of the number of possible states).

More than that, using the logarithm also means that a very likely outcome does not contribute much to the randomness measure (since in this case the \(\log_2 p_i\) term approaches zero), and that a very unlikely outcome also does not contribute much (since in this case the \(p_i\) term approaches zero). In addition, the additive property of logarithms simplifies combining entropies from different systems.
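That additive property is easy to verify numerically: for two *independent* systems, the entropy of the joint outcome equals the sum of the individual entropies. A quick Python check, using a fair coin together with an independent fair four-suit draw:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)), skipping p = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

coin = [0.5, 0.5]
suits = [0.25]*4

# joint distribution of two independent variables: product of probabilities
joint = [pc*ps for pc in coin for ps in suits]

print(shannon_entropy(joint))                          # 3.0 bits
print(shannon_entropy(coin) + shannon_entropy(suits))  # 1.0 + 2.0 = 3.0 bits
```

Describing both outcomes at once costs exactly one bit for the coin plus two bits for the suit, which is what makes bits such a natural unit for information.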

Zorro implements a Shannon Entropy indicator, but it’s tucked away in the *Indicators* section of the manual, one of dozens of functions listed on that page, and it’s easy to miss.

Zorro’s is quite a clever implementation that works by converting a price curve into binary information: either the current value is higher than the previous one, or it is not. The function then detects and counts every combination of price changes in the curve of a given length. For example, we can check for patterns of two consecutive price changes, of which there are four possible binary combinations (up-up, down-down, up-down, down-up). Zorro then determines the relative frequency of these binary combinations, which are of course the empirically determined \(p\) values for use in the Shannon Entropy equation.

Once the \(p\)’s are known, Zorro simply implements the Shannon Entropy equation and returns the calculated value for \(H\), in bits. That means that the maximum \(H\), which corresponds to a perfectly random system, is equal to the pattern size. In our example of analyzing patterns of length 2, \(H = 2\) implies that all the patterns were equally likely to occur, and thus the system is purely random.
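The binary-pattern approach described above is straightforward to sketch outside of Zorro. The following Python function is an illustration of the general technique (not Zorro's actual code): difference the series into up (1) / not-up (0) moves, count every overlapping pattern of the given length, and apply Shannon's formula to the relative frequencies:

```python
import math
from collections import Counter

def pattern_entropy(prices, pattern_len=2):
    """Entropy (bits) of binary up/down patterns in a price series."""
    # Convert prices to binary moves: 1 if higher than previous, else 0
    moves = [1 if b > a else 0 for a, b in zip(prices, prices[1:])]
    # Collect every overlapping pattern of the requested length
    windows = [tuple(moves[i:i + pattern_len])
               for i in range(len(moves) - pattern_len + 1)]
    counts = Counter(windows)
    n = len(windows)
    # Relative frequencies are the empirically determined p values
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A perfectly alternating series uses only two of the four possible
# 2-patterns, so H sits well below the maximum of 2 bits
print(pattern_entropy([1, 2, 1, 2, 1, 2, 1, 2], 2))  # 1.0
```

A purely random series would use all four 2-patterns equally often and approach the maximum of 2 bits, just as the post describes.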

Of course, deviations from randomness are of interest to traders, because less randomness implies more predictability.

Before we dive into some examples, let’s take a look at the arguments of the `ShannonEntropy()` function:

- The function’s first argument is a data series, usually a price curve. Remember, the function differences this series for us so we can simply supply raw prices.
- Next, we supply an integer time period over which to analyze the price curve for randomness.
- Finally, we supply the length of the patterns that we are interested in, from 2 to 8, remembering that there are \(2^x\) possible patterns, where \(x\) is the pattern length.

Here’s a simple implementation that plots the Shannon Entropy for the SPY ETF from 2000 to 2016 for all pattern sizes from 2 to 5 measured at the daily time scale (the script gets its data from the Alpha Vantage API; if you don’t have access, comment out the `assetHistory()` and `asset()` calls and select an asset from your Zorro GUI):

```c
/* SHANNON ENTROPY */

function run()
{
	set(PLOTNOW);
	StartDate = 2000;
	EndDate = 2016;
	BarPeriod = 1440;
	LookBack = 80;

	PlotHeight1 = 250;
	PlotHeight2 = 125;
	PlotWidth = 800;

	if(is(INITRUN))
		assetHistory("SPY", FROM_AV);
	asset("SPY");

	int period = LookBack;
	vars Closes = series(priceClose());

	int patterns;
	for(patterns = 2; patterns <= 5; patterns++)
	{
		var H = ShannonEntropy(Closes, period, patterns);
		plot(strf("H_%d", patterns), H, NEW|BARS, BLUE);
	}
}
```

And the resulting plot:

While the markets are clearly highly random, we can see regular departures from perfect randomness across multiple pattern sizes. That’s good news! If the markets aren’t completely random, then there is hope for us traders!

The next obvious question is how could we apply this measure of randomness in our trading? In reality, you’re unlikely to derive a trading strategy from the calculation of \(H\) in isolation (more on this below), but perhaps there is merit in applying it as an additional trade filter. For example, if you’ve found an edge in a particular market, perhaps it makes sense to apply it selectively during periods of predictability. Like all backwards-looking measures however, we need to consider that the past may not be like the future, and we need to be careful about optimizing the lookback period used in our analysis.

While the past may not be like the future, there is a principle related to entropy that may provide clues about the future state of a system. This principle states that complex systems tend to evolve so as to maximize entropy production under present constraints. This principle of maximum entropy has found application in physics, biology, statistics and other fields. Perhaps we can apply it to the markets too.

If the principle of maximum entropy does indeed apply to the markets, we would expect that given an existent series of price changes, the next value in the series should tend to maximize the entropy of the system. That means that if we know the market direction that maximizes its entropy, we have a clue as to which way the market is more likely to move. To test this, we can simulate possible future price movements and work out which scenarios tend to maximize the system’s entropy at any given time.
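The one-step version of this idea can be sketched in Python (an illustration only, not the Zorro implementation; here the hypothetical next price is appended at the end of the list, whereas Zorro's series run newest-first, but the logic is identical):

```python
import math
from collections import Counter

def pattern_entropy(prices, pattern_len=2):
    """Shannon entropy (bits) of binary up/down patterns in a price series."""
    moves = [1 if b > a else 0 for a, b in zip(prices, prices[1:])]
    windows = [tuple(moves[i:i + pattern_len])
               for i in range(len(moves) - pattern_len + 1)]
    n = len(windows)
    return -sum((c / n) * math.log2(c / n) for c in Counter(windows).values())

def max_entropy_forecast(prices, pattern_len=2):
    """Forecast the next move as the direction (up or down) that would
    maximize the entropy of the extended series. Only direction matters,
    so the +1/-1 offsets are arbitrary."""
    h_up = pattern_entropy(prices + [prices[-1] + 1], pattern_len)
    h_dn = pattern_entropy(prices + [prices[-1] - 1], pattern_len)
    return "up" if h_up > h_dn else "down"

# A perfectly alternating series: continuing the alternation keeps entropy
# low, so the maximum-entropy principle forecasts a break in the pattern
print(max_entropy_forecast([1, 2, 1, 2, 1, 2, 1]))  # down
```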

Below is some code to accomplish this. Firstly, we need to set the `PEEK` flag, which enables the `price()` functions to access future data. We calculate the Shannon Entropy of our price series as before. But this time, we need to create two additional arrays that correspond to the two possible entropy states at the next time period: one array corresponds to an up move, and the other to a down move. We fill these arrays in a `for()` loop, copying the existing series into our new arrays starting from index 1. Then, we place either a higher or lower price compared to the current price at index 0 of each array.

Now we’ve got arrays from which we can calculate the two possible entropy states at the next time period (remember that entropy is calculated from binary price patterns, therefore only the direction of the next move is important, not its magnitude). Then, we simply calculate the entropy of both states, and the one that returns the higher value is our prediction.

I’m going to apply this idea to one of the most efficient (and therefore random) markets around: the foreign exchange markets. Rather than run a traditional backtest, in this case I just want to count the number of correct and incorrect forecasts made by our maximum entropy predictor, and sum the total number of pips that would be collected in the absence of the realities of trading (slippage, commission and the like). The idealized backtest section at the end of the script accomplishes this.

Here’s the code:

```c
/* SHANNON ENTROPY */

function run()
{
	set(PLOTNOW|PEEK);
	StartDate = 2004;
	EndDate = 2016;
	BarPeriod = 1;
	LookBack = 100;

	int patterns = 2;
	int period = LookBack;
	vars Closes = series(priceClose());

	// create possible future entropy states
	var up[501], down[501]; // set these large to enable experimenting with lookback - can't be set by variable
	int i;
	for(i = 1; i < period + 1; i++)
	{
		up[i] = Closes[i];
		down[i] = Closes[i];
	}
	up[0] = Closes[0] + 1;
	down[0] = Closes[0] - 1;

	// get entropy of next state
	var entropyUp = ShannonEntropy(up, period + 1, patterns);
	var entropyDn = ShannonEntropy(down, period + 1, patterns);

	// idealized backtest
	static int win, loss;
	static var winTot, lossTot;
	if(is(INITRUN))
		win = loss = winTot = lossTot = 0;
	if(entropyUp > entropyDn and priceClose(-1) > priceClose(0)) { win++; winTot += priceClose(-1) - priceClose(0); }
	else if(entropyUp < entropyDn and priceClose(-1) < priceClose(0)) { win++; winTot += priceClose(0) - priceClose(-1); }
	else if(entropyUp > entropyDn and priceClose(-1) < priceClose(0)) { loss++; lossTot += priceClose(0) - priceClose(-1); }
	else if(entropyUp < entropyDn and priceClose(-1) > priceClose(0)) { loss++; lossTot += priceClose(-1) - priceClose(0); }

	if(is(EXITRUN))
		printf("\n\n%W: %.2f%%\nWins: %d Losses: %d\nWinTot: %.0f LossTot: %.2f\nPips/Trade: %.1f",
			100.*win/(win + loss), win, loss, winTot/PIP, lossTot/PIP, (winTot - lossTot)/PIP/(win + loss));

	ColorUp = ColorDn = 0;
	plot("PipsWon", (winTot - lossTot)/PIP, MAIN|BARS, BLUE);
	PlotWidth = 1000;
}
```

You can have some fun experimenting with this script. Try some different assets, different bar periods and different pattern sizes. You’ll see some interesting things in relation to randomness – some of which may go against the existing common wisdom.

As an example, when we run this script using 1-minute bars, we nearly always get a very slightly positive result, usually on the order of 51% correct predictions. The plots of pips collected for the major currency pairs from 2004-2016 are shown below (from left to right, top to bottom, AUD/USD, USD/JPY, USD/CAD, EUR/USD, NZD/USD, GBP/USD):

While there is very likely a tiny edge here, it is just that: tiny. In reality, you’ll never make money by predicting the direction of the next minute’s price change correctly 51% of the time! However, what if you could predict the direction over a longer time frame and improve the accuracy of your prediction? Could you make money from that?
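Some rough arithmetic illustrates why a 51% hit rate at the 1-minute scale isn’t tradeable on its own. The move size and cost figures below are assumptions for illustration, not measurements from the backtest:

```python
# Hypothetical numbers (assumed for illustration): average absolute
# 1-minute move of 1 pip, and a round-trip cost of 1 pip
p_win = 0.51          # directional accuracy
avg_move_pips = 1.0   # assumed average absolute 1-minute move
cost_pips = 1.0       # assumed round-trip spread + commission

# Expected pips per trade from accuracy alone, then after costs
gross = p_win * avg_move_pips - (1 - p_win) * avg_move_pips
net = gross - cost_pips

print(f"gross edge: {gross:.2f} pips/trade; net after costs: {net:.2f}")
```

The gross edge of a couple of hundredths of a pip per trade is swamped by any realistic transaction cost, which is why a longer horizon or better accuracy is needed.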

Here’s an idea for extending our directional forecast based on maximum entropy to two time steps into the future. Now we have four possible scenarios (up-up, down-down, up-down, down-up) to assess. In the idealized backtester, we now go long when the two-ahead entropy is maximized by the up-up case, and short when it is maximized by the down-down case.

```c
/* SHANNON MULTI-STEP AHEAD FORECAST */

function run()
{
	set(PLOTNOW|PEEK);
	StartDate = 2010;
	EndDate = 2019;
	BarPeriod = 1;
	LookBack = 100;

	int patterns = 3;
	int period = LookBack;
	vars Closes = series(priceClose());

	var upup[501], dndn[501], updn[501], dnup[501];
	int i;
	for(i = 1; i < period + 1; i++)
	{
		upup[i] = Closes[i];
		dndn[i] = Closes[i];
		updn[i] = Closes[i];
		dnup[i] = Closes[i];
	}
	upup[1] = Closes[0] + 1; upup[0] = upup[1] + 1;
	dndn[1] = Closes[0] - 1; dndn[0] = dndn[1] - 1;
	updn[1] = Closes[0] + 1; updn[0] = updn[1] - 1;
	dnup[1] = Closes[0] - 1; dnup[0] = dnup[1] + 1;

	var entropyUpUp = ShannonEntropy(upup, period + 2, patterns);
	var entropyDnDn = ShannonEntropy(dndn, period + 2, patterns);
	var entropyUpDn = ShannonEntropy(updn, period + 2, patterns);
	var entropyDnUp = ShannonEntropy(dnup, period + 2, patterns);

	static int win, loss;
	static var winTot, lossTot;
	if(is(INITRUN))
		win = loss = winTot = lossTot = 0;
	if(max(max(max(entropyUpUp, entropyDnDn), entropyUpDn), entropyDnUp) == entropyUpUp and priceClose(-2) > priceClose(0))
		{ win++; winTot += priceClose(-2) - priceClose(0); }
	else if(max(max(max(entropyUpUp, entropyDnDn), entropyUpDn), entropyDnUp) == entropyDnDn and priceClose(-2) < priceClose(0))
		{ win++; winTot += priceClose(0) - priceClose(-2); }
	else if(max(max(max(entropyUpUp, entropyDnDn), entropyUpDn), entropyDnUp) == entropyUpUp and priceClose(-2) < priceClose(0))
		{ loss++; lossTot += priceClose(0) - priceClose(-2); }
	else if(max(max(max(entropyUpUp, entropyDnDn), entropyUpDn), entropyDnUp) == entropyDnDn and priceClose(-2) > priceClose(0))
		{ loss++; lossTot += priceClose(-2) - priceClose(0); }

	if(is(EXITRUN))
		printf("\n\n%W: %.2f%%\nWins: %d Losses: %d\nWinTot: %.0f LossTot: %.0f\nPips/Trade: %.1f",
			100.*win/(win + loss), win, loss, winTot/PIP, lossTot/PIP, (winTot - lossTot)/PIP/(win + loss));

	ColorUp = ColorDn = 0;
	plot("pips", (winTot - lossTot)/PIP, MAIN|BARS, BLUE);
	PlotWidth = 1000;
}
```

And here’s a plot of the number of pips collected over the past few years, excluding trading frictions, on EUR/USD:

In this post, we looked at the markets from a slightly different perspective, through the lens of Information Theory. In particular, we saw how Shannon Entropy is a measure of the degree of order or predictability within a system, with increasing entropy corresponding to more randomness and maximum entropy occurring when all outcomes are equally likely. We saw that financial markets are highly random (in general displaying a Shannon Entropy close to that of a perfectly random system), but that they do depart from randomness regularly. They may even do so differently depending on the time horizon and granularity over which they are analyzed.

We also saw an attempt to use Shannon Entropy in a standalone trading system via the principle of maximum entropy production. While such a system does appear to have a small edge, in reality, it will be difficult to make a consistent trading system from these predictions alone as transaction costs will usually swamp the edge.

While this is all very interesting from an academic perspective, if you can think of a practical application to trading, we’d love to hear about it in the comments.

---

We recently released our new ebook, **Embrace the Mayhem**.

Embrace the Mayhem is an honest yet exciting guide to what really works as a retail trader — based on the skills and approach we currently use to trade full-time.

It is the result of all the hard-won lessons myself and James have learned from over 20 years in professional and retail markets.

All the costly mistakes and frustrating misguidance we’ve dealt with.

All the wins and losses we’ve celebrated or grieved over.

In an area brimming with deception and confusion, Embrace the Mayhem is the retail trader’s survival guide and playbook all in one.

**Find out more about Embrace the Mayhem and get it here.**

**If you are already a Robot Wealth member, please get the ebook via this page (the same price, but you won’t unnecessarily open two accounts).**

The post Shannon Entropy: A Genius Gambler’s Guide to Market Randomness appeared first on Robot Wealth.

The post Super Fast Cross-Platform Data I/O with Feather appeared first on Robot Wealth.

It’s a binary file format for storing data frames – the near-universal data container of choice for data science.

Have I already mentioned that reading and writing feather files is fast?

Check this out. Here I’ve created a pandas data frame with one million rows and ten columns. Here’s how long it took to write that data frame to disk using both feather and gzip:

Yes, you read that correctly: 94 milliseconds for feather versus 33 seconds for gzip!

Here’s the read time for each format:

The other thing I like about feather is that it is agnostic to your choice of the two main weapons of data science, namely Python pandas and R. In fact, the format was born out of a collaboration between two of the giants of data science – Wes McKinney, originator of the pandas project, and Hadley Wickham, to whom the R community owes a debt of gratitude for the tidyverse suite of tools. Apparently, these guys got together and lamented the lack of interoperability between Python and R for data science, and did something about it.

Noting that pandas and R data frames share many similarities, the feather format was developed in order to provide a common storage format for both. In practical terms, that means that you can store your data on disk in a format that is easy and fast to read into whatever platform you happen to be using.

This cross-platform approach really resonated with me. I often hear from readers who are agonising over the decision of which tool to use for analysing the markets – usually it’s a decision between Python and R, and you’d be amazed at how often I hear from people who get hung up on this decision. But it is *completely* the wrong thing to agonise over. Python and R are not mutually exclusive. Starting with one does not mean that you’re forever handcuffed to it and forbidden from using the other. They’re both wonderful tools in their own right, so why not skill up in both?

Of course, language agnosticism is nothing new. The Jupyter notebook has for a while supported both Python and R code – even in the same notebook. And now with feather you can store your data in a format that can easily be retrieved regardless of which language you happen to use. A highly personalised pick-and-choose approach to data science is the future, where you can use the tools that you like best for particular tasks, regardless of the language they were developed for or the preferences of your colleagues.

Using these sorts of tools, you have the power to implement whatever workflow is best for you or for your particular project, and even collaborate with people who like to work differently. For example, James loves the tidyverse suite of tools in R (quote: “*it’s basically an in-memory SQL implementation, but with nicer grammar*”), while I tend to do things faster in pandas. And we collaborate just fine doing things our own way. The point is, your choice of tool is much less important than you probably think it is. We have smart people like Wes and Hadley to thank for that luxury. Make the most of it.

There are two main drawbacks:

First, a feather file takes up more disk space than the same file compressed using gzip. Here’s a size comparison:

```
$ ls -lah
total 92M
drwxr-xr-x 1 Kris 197612    0 Jun  8 17:27 ./
drwxr-xr-x 1 Kris 197612    0 Jun  8 17:27 ../
-rw-r--r-- 1 Kris 197612  77M Jun  8 17:09 test_df.feather
-rw-r--r-- 1 Kris 197612  15M Jun  8 17:09 test_df.gzip.csv
```

You can see that the feather file takes up about five times the space of the gzip file. So you’re probably not going to choose feather for long term data storage. Its primary use cases are fast input/output and cross-platform interoperability.

Secondly, feather won’t work with non-standard indexes, like date-time objects – pandas will throw an error, as in the notebook below. Thankfully the solution is simple: before feathering, reset the index to the default integer sequence and save the actual index as a column. Then, when reading back in, you simply set that column as the index:

On Python, you can install feather using conda from the conda-forge channel:

```
conda install feather-format -c conda-forge
```

You can also install it using pip:

```
pip install -U feather-format
```

Also, make sure you’re using the latest pandas (0.24.x at the time of writing). There’s also a library called `feather` for working with this format in Python if you don’t want to use the pandas wrappers.

On R, simply install the feather library, then call `library(feather)`.

The syntax to use feather is similar on Python pandas and R. First, pandas:

```python
import pandas as pd
df = pd.read_feather('myfile.feather')
```

On R:

```r
library(feather)
df <- read_feather('myfile.feather')
```

I read somewhere that R’s native RData format would read and write quicker than feather format. So if you’re on R and have no reason to save your data frames to a format that’s compatible with Python, you may find that RData is the better choice. However, I tested this quickly in a Jupyter notebook running R and didn’t find this to be the case at all:

I’m not sure why my results were so different to what I read online. Perhaps the efficiency of RData and feather scales differently depending on the size of the data frame. Perhaps feather has improved since the time of the article that I read. Perhaps a single observation isn’t enough to draw any conclusions.

In any event, knowing about the feather format has given me a not-insignificant productivity boost. I hope it is of some use to you too.


The post Optimising MetaTrader for Algorithmic Trading appeared first on Robot Wealth.

If we had the choice, we’d rather trade directly with the broker through a dedicated API rather than through a third-party platform, but often that’s not an option. One thing that my life as a trader has taught me is that it’s better to move fast in order to get trading strategies into the market with a solution that’s “good enough” rather than spending valuable R&D time on developing “optimal” solutions – which usually end up changing anyway. So we suck it up and make the best of the tools at our disposal. It’s all about being smart with priorities.

We do our Spot FX trading through Darwinex. Not only is their business model aligned with their clients’ desire to make money through trading (which is not as common as you might think in FX land), but they were generous enough to provide a fantastic deal on brokerage fees for our members. Darwinex is developing an API to enable fast, direct automated trading and we very much look forward to making use of that in the future. But for now, we’re executing our FX strategies via the MetaTrader platform. Specifically, we’re running our strategies through the Zorro Automated Trading software, which in turn controls MetaTrader.

The main benefit of this stack is convenience – we get all the power and flexibility of Zorro for doing automated trading, without having to build any additional market connectivity infrastructure. This makes the path from idea generation to research to market feedback to live trading about as short as one could hope.

There are a few drawbacks to this approach. One is the additional execution latency that comes with executing through MetaTrader (the Zorro link adds almost nothing however). None of our strategies are latency sensitive, so we’re not worried about this right now.

Another drawback is that MetaTrader is something of a resource hog. And since you’ll generally want to run an automated trading portfolio on a commercial server, resource intensive software leads to bigger servers, which leads to added costs.

Getting Zorro set up and connected to MetaTrader is a bit of a pain the first time around. I’ll give you a walk through of that process in another blog post. But right now, I want to show you some simple tips and tricks for reducing the resource footprint of the MetaTrader platform – and hopefully saving you a few bucks along the way. Here goes.

MetaTrader is capable of plotting a lot of information about a lot of different markets all at the same time. Which is awesome if you’re this guy:

But we don’t need this for automated trading. And you can save a decent amount of computer resources by getting rid of it. Under the *Tools* menu, select *Options*, then click on the *Charts* tab in the pop-up. Then, enter *1* for *Max bars in history* and *Max bars in chart:*

By default, MetaTrader will automatically scroll its charts as time marches on. Again, we don’t need this feature for our purposes. Disable it like so:

More unnecessary (for our purposes) bloat. Turn off MetaTrader’s news feed by selecting *Tools* –> *Options* as above, then select the *Server* tab and uncheck the *Enable News* box:

Disable sounds by selecting the *Events* tab and unchecking the *Enable* box:

MetaTrader can subscribe to data feeds that you don’t need, hogging valuable resources. Right click in the *Market Watch* window and select *Hide All*:

The *Market Watch* window will then look something like this, showing that you’re only pulling data for one or a few products:

Now that only a few tickers will remain, you may need to add back the ones you wish to trade. Do this by once again right-clicking in the *Market Watch* window and selecting *Symbols*, like so:

Another pop-up menu appears listing all the products that you can subscribe to. Anything with a greyed-out box next to it is currently unsubscribed; while anything with a gold box is subscribed. Double click on the boxes to toggle them back and forth:

We don’t need to display charts for all the tickers we intend trading in order to use Zorro to send trades to MetaTrader. Being subscribed via *Market Watch* (see above) is enough. *But you do need one chart.*

Because of the way MetaTrader works, Zorro is able to control it by “attaching” some software to a chart. I’ll cover this in another blog post, but the wash up is that this chart needs to remain open in order for Zorro to work properly.

If you close all the charts you don’t need you’ll save some resources, and if you minimise the one chart that you do need, you can save a tiny bit more by reducing the amount of visual rendering.

Your platform should now look like this:

The terminal outputs a bunch of information from the platform: useful things like real-time trade profit and account margin, and less useful things like “signals” that you can subscribe to from within the platform that will trade on your behalf.

Admittedly, turning this off won’t make a huge difference, but it’s easy to do and there’s no downside. Disable the terminal by clicking this button on the toolbar (you can turn it back on anytime by clicking the same button):

To get the benefit of all those changes, you may need to restart your MetaTrader platform.

At the time of writing, we were running our Zorro-MetaTrader automated fx portfolio on a virtual machine provided by Google Cloud Platform (here’s an article that will help you get something similar up and running). We’re running an n1 standard instance with 3.5GB of memory, 1 virtual CPU and 50GB of persistent disk (which we wouldn’t need if we weren’t bound to using Windows – I’ve heard that Zorro for Linux is coming, but we make do with what we have). We’re running a MetaTrader platform set up as described in this post. We’ve also got five Zorro instances interacting with it.

How’s the performance? Well, Google’s resource monitor tells us that we’re over-utilising the machine and that we should upgrade, but we haven’t run into any problems yet. Our CPU monitoring looks like this:

Like all trading tools, MetaTrader has its pros and cons. For fast and easy access to retail foreign exchange markets, it’s hard to beat.

By performing some simple optimisations, we can significantly reduce the resource load of the platform, thereby reducing the specs of our trading server, improving its performance and saving some dollars along the way.


The post Humility and The Pain Trade appeared first on Robot Wealth.

*Being humble* is fundamental to everything we do at Robot Wealth: in our own trading, and in collaborative research in our Bootcamps.

The more we trade, the more we are humbled by the markets.

The more we are humbled by the markets, the simpler our trading becomes.

The financial markets are very efficient. Good traders inherently understand this, because good traders know it is hard to make money trading. Understanding this point is a critical starting point for a trader’s success.

At Robot Wealth, we call this *Embracing the Mayhem*.

**Our new ebook, now available here.**

You must understand the efficiency and the randomness in the market, and you *must accept it*.

*Embracing the Mayhem* is the first step.

Next, you must understand the games available to you as a trader. You must pick the ones with the best chance of a positive outcome, and you must play those games in an effective way.

There are two types of games available which will pay you:

- Collecting Risk Premia
- Exploiting Market Inefficiencies.

You get rewarded over the long term for taking on certain risks that others find unattractive *(risk premia)*, or you get rewarded for exploiting fleeting inefficiencies in the market *(alphas)*.

The first is uncomfortable.

The second is hard. Alphas come and go. The market is extremely efficient.

Relying on alpha generation to provide all your trading returns is an overconfident bet.

We believe that:

Every trader should structure their portfolio so that they are likely to make money even when their alpha signals or active views are wrong.

*Embracing the Mayhem* of the markets leads us to a healthy paranoia.

We *should* be paranoid about losing our active edges. It is sensible to be worried about this. It’s a hard game to play, and it makes perfect sense to structure things so that we can make money even if we lose our alpha.

One of the best and simplest ways to do this is to maintain a constant allocation to long-term risk premia.

*Being uncomfortable is better than going broke.*

In our first Bootcamp we developed a low-frequency, long-only strategy to harvest market risk premia.

The main performance drivers of this strategy are:

- **Asset Selection** – picking positive carry assets with exposures to diverse global risk premia which have been historically well rewarded
- **Risk Management** – minimising unrewarded risk through diversification and volatility management.

The strategy is a simple, but sophisticated, risk parity portfolio with some very subtle active tilts.

It is a humble portfolio. It makes little attempt to predict the premium we receive for certain risks at certain times.

We know we’re paid over the long run for taking exposures to certain short term risks. So we look to have exposure to as many of these as we can, and to equalise our risk across them.

Short term pain is the price we pay for long term gain. If we try to sidestep the risk, we are likely to sidestep the premium too.

US Treasury Bonds are a wonderful asset class. They’re a positive carry asset with long term negative correlation to global risk. They are exposed to the risk of real interest rates and inflation increasing.

One of the (self-imposed) constraints on our risk premia portfolio was that we want to trade ETFs only. US Treasury Bond ETFs are less volatile than the other ETFs in the portfolio, so our risk parity sizing algorithm assigned a significant share of the total dollar exposure of the portfolio to them.
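To see why a risk parity sizing scheme behaves this way, here is a naive inverse-volatility sketch. This is not the Bootcamp algorithm, and the tickers and volatility numbers are assumed for illustration only:

```python
# Naive inverse-volatility ("risk parity") sizing: lower-volatility
# assets receive larger dollar weights, so each asset contributes
# roughly equal risk to the portfolio.
# Annualised volatilities below are assumed values, not measurements.
annual_vol = {"SPY": 0.15, "TLT": 0.12, "GLD": 0.16, "VNQ": 0.18}

inverse = {ticker: 1 / vol for ticker, vol in annual_vol.items()}
total = sum(inverse.values())
weights = {ticker: v / total for ticker, v in inverse.items()}

for ticker, w in weights.items():
    print(f"{ticker}: {w:.1%}")
# TLT, the least volatile asset here, receives the largest dollar weight
```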

This alarmed many of our Bootcamp participants.

As we were putting the finishing touches to the strategy in November 2018, the following was a common concern:

“I’m concerned about the dollar weight of fixed income in our portfolio. The 30-year bond bull market is coming to an end. Bonds are trading at a premium. Rates are likely to go up. We are buying expensive bonds that are likely to tank. Bond’s risk/reward isn’t good at all.”

*^ Paraphrased from real participant comments.*

*Embracing the Mayhem* starts with understanding that the market is extremely efficient.

*Maybe you think you have unique, valuable insight into the economic and monetary system? Maybe you think you can use this to predict future bond prices?*

You probably don’t. And you probably can’t.

And neither can I, Warren Buffett, Jimmy Buffett, Ray Dalio, or anyone else.

But don’t worry about that, we don’t need to play those prediction games to make money. We can play different games.

**So what’s the humble thing to do here?**

The humble thing is to understand that the current market price is as good an estimate of fair value as you’re likely to come up with, without access to privileged information.

The humble thing is to understand that the risk you are trying to sidestep is *exactly* the risk we are being rewarded for by holding the assets. Taking on that risk is the whole trade!

Holding interest rate risk gets rewarded over the long run *because *people are worried that rates are going up, and that their bonds are going to look comparatively expensive.

Taking on interest rate risk in the face of this concern is the trade that gets rewarded here.

Trying to sidestep the risk when it looks the riskiest is… not.

**If we try to sidestep the risk, we also sidestep the premium.**

We received those comments at the beginning of November 2018. US Treasuries had been in a mini bear market since 2016 and it was becoming a consensus view that the risk/reward of US Treasuries was not looking good.

Six months later, TLT is up over 12%.

We didn’t include TLT in our risk premia collection strategy because we had some kind of grand insight on rates, or any other special ability to predict the future.

We included it because:

- it is a positive carry asset
- it is exposed to interest rate and inflation risk
- it has excellent diversification properties in a portfolio of global risk assets.

**We included it because being long TLT gives us plenty of room to be wrong and still make money.**

Don’t try to be a hero. **Be humble**, at least in the way you approach the markets.

Give yourself plenty of room to be wrong and still make money.

**Disclaimer:** None of this is investment advice. The author is long a “sh-t ton” of US Treasuries across the curve.

**Want to join us and the community in our upcoming Algo Bootcamp? Click here to find out more.**


The post Trading Lessons from Gamblers appeared first on Robot Wealth.

Like many sociopaths, Frank Wallace was a fan of the philosophy of Ayn Rand. He wrote a book called *“Poker: A Guaranteed Income for Life”*, started a cult based on Rand’s philosophy of objectivism, and got convicted of tax fraud.

*“Poker: A Guaranteed Income for Life”* is about getting edges in poker by playing weak players. The main character, John Finn, gets himself invited to a variety of home poker games. He creates a carefree atmosphere at the games, he befriends and encourages the weaker players, and slowly and carefully, he extracts their money.

In short, he’s a con artist. But he’s a con artist who has correctly identified that the most reliable way to make money playing poker is to play in easy games.

This lesson can be applied to trading.

**The smartest decision you can make in trading is to play easy games.**

What are the easy games in trading?

- Passively harvesting risk premia is an easy game
- Relative return prediction (cross-sectional strategies) is easier than making absolute return predictions (time series strategies)
- Immature markets (e.g. cryptocurrencies) offer more alpha opportunities than mature markets (e.g. FX)
- Capital-constrained statistical arbitrage can be very profitable for smaller traders and scales well within and across asset classes
- Cross-exchange and cross-timezone statistical arbitrage strategies offer opportunities for those willing to do the admin work and carry risk over market close.

You’ll notice that these “easy games” all have something a little distasteful about them. Harvesting risk premia involves taking on short term risks and skews in exchange for long term expected returns. Cross-timezone statistical arbitrage involves tedious admin work and risks which are hard to hedge. Trading cryptocurrencies involves credit risk, regulatory risk, jump risk and a hundred other risks you’ve never heard of before.

The easy games might not be the ones that you want to play. They might not be the most fun. But they are the most profitable. And having lots of money *is* fun.

Joseph Buchdahl has written a number of excellent books on fixed-odds sports betting.

As you read his books in chronological order, you observe his progression from an optimistic sports bettor who believes he can out-handicap the betting markets to one who accepts that the market price itself is the best prediction available.

This leads the sports bettor to the realisation that, if she wants to make money, she needs to play a different game.

She might not be able to make absolute predictions that are better than the market – but she might be able to systematically exploit short term pricing inefficiencies on the way to price equilibrium.

*This will sound like a familiar journey to the successful trader…*

Maybe you’ve tried to predict the fair value of a company by analysing financial information? Or you’ve tried to calculate the fair value of an exchange rate by looking at economic data?

The market typically humbles such earnest efforts. The current market price is almost certainly as good a prediction as any you can make with public data, however slick your Excel modelling skills may be…

**Just as the successful sports bettor doesn’t try to out-predict the market line, the successful trader doesn’t try to make heroic market predictions.**

The successful trader plays a different game entirely.

There are short-term relative pricing inefficiencies that are still pervasive in the markets.

A good example is the over-reaction and under-reaction we see, on average, before or after certain events. These effects can be exploited systematically over a large number of independent bets; the Post-Earnings Announcement Drift (PEAD) trade is a textbook example.

Other examples include short-term relative misalignment in prices which can be exploited through systematic convergence trading, and predictable moves in option implied volatility around scheduled information announcements.

The trader doesn’t need complex predictive models to exploit any of these effects. The bets are obvious, and she just needs to systematically take the bets and let her edge play out over time.

This isn’t particularly sexy. It doesn’t involve exciting, novel research. Nobody wants to talk to you about these trades. But this is a profitable game to play.

*“I hope to break even this month… I could use the money” –* Professional Gambler

Every successful professional gambler is a grinder. They don’t bet the farm, because they know the most important thing is to not blow up.

Rather than looking for “life-changing bets” they take a large number of small consistent bets, allowing a small edge to realise itself over time.

They grind it out.

The same is true in trading. Over-sizing your trades will kill you. Shooting for the moon is almost guaranteed to lose you money over the long run. (Look at the price of far out of the money options if you don’t believe me!)

The successful trader grinds it out every day.

**Every day he bets small and bets often and takes as many different bets as he can.**

There’s a lot you can learn from successful gamblers. Just make sure you get some sunshine every now and then.

The post Trading Lessons from Gamblers appeared first on Robot Wealth.

]]>The post Backtesting Bias: <br>Feels Good, Until You Blow Up appeared first on Robot Wealth.

]]>Knowing exactly what causes exploitable inefficiencies would make predicting market behaviour and building profitable trading strategies a fairly cushy gig, right?

If you’re an engineer or scientist reading this, you are probably nodding along, hoping I’ll say the financial markets show some kind of *domino effect* for capitalists. That you can model them with the kinds of analytical methods you’d throw at a construction project or a petri dish. But unfortunately, trying to shoehorn the markets into formulas is a futile exercise… like stuffing Robot Wealth’s frontman Kris into a suit.

Because the markets aren’t strictly deterministic, testing your new and exciting strategy ideas is a bit tricky. We’d all love to know for sure whether our ideas will be profitable before we throw real money at them. But, since you can’t realistically apply mathematical equations to your strategy to derive its future performance, you’ll need to resort to the next best thing — **experimentation.**

By experimenting with your trading strategy during development, you’re left wide open to some *fatal errors* when assessing its potential future performance. These can, and will, cost you time, money and many, many headaches.

**In this post, you’re going to learn how to identify and avoid the more common, often expensive simulation biases so you can build more robust, more profitable systematic trading strategies. Let’s go!**

As far as I know, you’ve got two options when it comes to testing systematic trading strategies:

- You can find out the **actual** performance of your strategy
- You can find out the **likely** performance of your strategy

The first option simply involves throwing real money at a live version of your strategy and seeing if it does anything exciting. The second is **testing your strategy**, usually on past data, before following option 1 later down the pipeline. If, like us, you *don’t* enjoy needlessly throwing away wads of your capital, you’ll first test your strategy via simulation, assessing whether it is likely to perform as well in the future as it did in the past.

In financial trading, such a simulation of past performance is called a **backtest.**

Despite what you’ll read online in what I’ll call *retail trading folklore*, backtesting your strategies is not as simple as aligning signals with entry and exit prices and summoning the results. Simply doing a backtest is one thing, but gaining accurate, actionable data that will help you keep more of your capital is another.

Such simplistic approaches will undoubtedly lead to *some* **backtest bias**.

In this post, we’re going to focus on a few ways that biases in your **development methodology** can hold you back from successful trading. Many of these effects are subtle yet profound: they can and will creep into your development process and can have disastrous effects on strategy performance.

If you could travel back in time, you’d probably be quite tactical about your “creative” work wherever you landed. Google? Your idea. *Predicting* future events to become a local deity? Hold my feather crown.

In trading, you might want to hold off on all that. Look-ahead bias is introduced by allowing future knowledge to affect your decisions around historical scenarios or events. As a trader running backtests, this bias impacts your trade decisions by **acting upon knowledge that** **would not have been available** at the time the original trade decision was taken.

What does this look like in practice?

A popular example is executing an intra-day trade on the basis of the day’s closing price, when that closing price is not actually known until the end of the day.

Even if you use backtesting software that’s designed to guard against look-ahead bias, you need to be careful. A subtle but potentially serious mistake is to use the entire simulation period to calculate a trade parameter (for example, a portfolio optimization parameter) which is then retrospectively applied at the beginning of the simulation.
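
That second pitfall is easy to demonstrate. Here is a minimal sketch (using a synthetic random-walk price series, not any particular market) contrasting a parameter computed over the whole simulation with one computed only from data available at each bar:

```python
import numpy as np

rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 1, 500))  # synthetic random-walk prices

# Look-ahead bug: a parameter (here, the mean price) computed over the
# ENTIRE simulation period, then applied retrospectively from bar 0.
full_sample_mean = prices.mean()
biased_signal = prices < full_sample_mean  # uses information from the future

# Correct approach: at each bar, use only data available up to that bar.
expanding_mean = np.array([prices[: t + 1].mean() for t in range(len(prices))])
unbiased_signal = prices < expanding_mean

# The two signal series disagree on many bars; that disagreement is the
# footprint of the look-ahead bias.
print(int((biased_signal != unbiased_signal).sum()))
```

In a real backtest the fix is the same idea: any fitted or estimated parameter should be computed on an expanding or rolling window of past data, never on the full history.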

This error is so common that you must always double check for it. And triple check if your backtest looks really good.

If you run a backtest producing annual returns in the ballpark of **thousands of percent**, don’t quit your day job – you’ve likely stumbled across overfitting bias. Apart from being comedic firewood for your favourite FX forum, these backtests are useless for real, systematic trading purposes.

Check out the following plots to see overfitting in action. The blue squares in the figure below are an artificially generated quadratic function with some noise added to distort the underlying signal. The lines represent various models fitted to the data points. The red line is a linear regression line; the green, blue and orange lines are quadratic, cubic and quartic functions respectively. Apart from the linear regression line, these all do a decent job of modelling the data, for this region in the parameter space.

The pink line is a high-order polynomial regression: notice that it fits this data best of all:

But do these models hold up out-of-sample? What I’m really asking is, how well do they generalize to data that was not used in the model-fitting process?

Well, the next plot shows the performance of the quadratic, cubic and quartic functions in a new region of the observed variable space, meaning an out-of-sample data set. In this case, the quadratic function is *clearly the best performer*, and we know that it most closely matches the underlying generating function – **this is an example of a well-fit model.**

The other models do a pretty crummy job of predicting the value of the function for this new, unseen region of parameter space, even though they looked pretty attractive on the in-sample data.

The best model on the in-sample data set, the high-order polynomial, does a terrible job of modeling this out-of-sample region. In fact, in order to see it, we have to look at a completely different portion of the y-axis, and even use a logarithmic scale to make sense of it:

This model is predicting hugely negative values of our function when we know that it could *never* generate a single negative value (thanks to the quadratic term in the underlying function). The function looks nothing like a quadratic function: it is more like a hyperbolic function. Or bad modern art.

This misrepresentation of the underlying process is a classic example of overfitting, and it’ll have you banging your head against the wall a lot in your early days of algo trading. In fact, you’ll face this problem *every day* in your strategy development; you just learn to roll with the punches as you gain experience.

Overfitting bias affects strategies that are tested on in-sample data. The same data is used to optimize and then test the strategy. Common sense will tell you that a strategy will perform well on the data with which it was optimized – *that’s the whole point of optimization!* What’s more, exhaustively searching the parameter space and choosing a local performance maximum will undoubtedly lead to overfitting and failure in an out-of-sample test.
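
The polynomial experiment above is straightforward to reproduce. The sketch below uses my own synthetic quadratic data (not the exact data behind the plots), fits models of increasing order on the in-sample region, and scores each on an unseen region:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    """The underlying generating process: a quadratic, as in the plots above."""
    return 2 * x**2 + 3 * x + 5

x_in = np.linspace(0, 10, 40)                  # in-sample region
y_in = f(x_in) + rng.normal(0, 20, x_in.size)  # noisy observations
x_out = np.linspace(10, 15, 20)                # unseen region
y_out = f(x_out)

def oos_rmse(degree):
    # Fit on in-sample data only, then score on the out-of-sample region.
    coeffs = np.polyfit(x_in, y_in, degree)
    preds = np.polyval(coeffs, x_out)
    return np.sqrt(np.mean((preds - y_out) ** 2))

for degree in (1, 2, 9):
    print(degree, round(oos_rmse(degree), 1))
```

The quadratic fit generalises best, while the degree-9 fit, like the high-order polynomial in the plots, tends to produce wild out-of-sample errors despite matching the in-sample data most closely.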

It is crucial to understand that the purpose of the in-sample data set is *not* to measure the performance of a strategy. The in-sample data is used to **develop** the strategy and find parameter values that may be suitable. At best, you should consider the in-sample results to be an optimistic estimate of what the strategy might achieve.

*Avoid using the in-sample results to benchmark likely future performance of a strategy.*

Look again at the figures above. The model with the best in-sample performance was the high-order polynomial shown by the pink line. That said, its out-of-sample performance was as enviable as stepping on a plug. The quadratic, cubic and quartic models all performed reasonably well in the in-sample test, but the quadratic model was the clear star performer in the out-of-sample test. Clearly, you can infer little about performance on unseen data from in-sample testing.

Here’s the real insidious part….

When you fit a model (a trading strategy) to a noisy data set (and financial data is a rave), **you risk fitting your model to the noise, rather than the underlying signal.** The underlying signal is the anomaly or price effect that you believe provides profitable trading opportunities, and this signal is what you are actually trying to capture with your model.

Noise gets between you and the money. It’s a random process, and it’s unlikely to repeat itself exactly the same way. If you fit your model to the noise, you’ll end up with a random model. Unless you enjoy paying for your broker’s 12oz rib-eye steak, this isn’t something you should ever trade.

So what’s the overarching lesson from all this?

Well, in-sample data is only useful in the following ways:

- Finding out whether a strategy can be profitable and under what conditions
- Determining which parameters have a significant impact on performance
- Determining sensible ranges over which parameters might be optimized
- Debugging the strategy, that is, ensuring trades are being entered as expected

Given the topic of this post, you’ll notice something missing from that list: *measuring the performance of a trading strategy.*

Any estimate of performance you derive from an in-sample test is plagued with overfitting bias and is likely to be an optimistic estimate – unless your entire development process is watertight…but that’s a story for another time.

The solution to overfitting bias is adopting a sensible approach to the markets and strategy development. This includes:

- Keeping trades simple. The fewer fittable parameters, the better.
- Favouring trades that can be rationalised in a sentence over blindly data mining for trading rules.
- Optimising for robustness, not in-sample performance (more on this later).
- Avoiding the temptation to be overly precise in your model specification. Market data is noisy and fickle, and any signal is weak.
- Avoiding trades that will, at best, marginally cover retail trading costs, such as scalping.

**Like an annoying uncle who won’t stop shouting at the 6 o’clock news, I’ll repeat myself: don’t use in-sample data to measure your strategy’s performance unless you have been very very very careful!**

Like taxes, this one is unavoidable. So, rather than spending an eternity trying to eliminate this bias entirely, just be aware of it and accept that, generally, your strategies won’t perform as well in the markets as they did in your simulations.

You’ll commonly introduce data-mining bias when selecting the best performer from a bunch of strategy variants, variables or markets to continue developing. If you’re persistent enough in trying strategies and markets, you’ll eventually find one that performs well simply due to luck.

Think of it this way. Say you develop a trend following strategy in FX. The strategy crushes its backtest on EUR/USD but flops on USD/JPY. Any sensible person would trade the EUR/USD market, right? Sure, but you’ve just introduced selection bias into your process. Now, your estimate of the strategy’s performance is upwardly biased. Should you throw the strategy in the bin? Not necessarily. Maybe it performed well on EUR/USD specifically for good reason. But nevertheless, some selection bias has crept into your development process.

There are statistical tests to account for data mining bias, including comparing the performance of the strategy with a distribution of random performances. You can find examples of this in the Robot Wealth Advanced Course.
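
To illustrate the general idea (this is a sketch of one common approach, not the exact method taught in the Advanced Course), we can compare the strategy's Sharpe ratio against a null distribution of skill-free variants of the same return stream:

```python
import numpy as np

rng = np.random.default_rng(2)

# Daily returns of the selected strategy variant. Placeholder data here;
# in practice these come from your backtest.
strat_returns = rng.normal(0.0005, 0.01, 252)

def annualised_sharpe(r):
    return np.mean(r) / np.std(r) * np.sqrt(252)

# Null distribution: "skill-free" strategies with the same volatility and
# trade frequency, built by randomly flipping the sign of each return.
null_sharpes = np.array([
    annualised_sharpe(strat_returns * rng.choice([-1, 1], strat_returns.size))
    for _ in range(1000)
])

# p-value: how often a skill-free strategy matches ours purely by luck.
p_value = np.mean(null_sharpes >= annualised_sharpe(strat_returns))
print(round(p_value, 3))
```

The more variants and markets you searched before settling on the strategy, the more demanding this comparison needs to be, since your best result was itself selected from many draws.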

You can also use out-of-sample data, but this quickly becomes problematic as we only have a finite amount of historical data on which to develop.

So your best bet to overcome selection bias is simply adopting a sensible, measured approach to strategy development, as outlined above.

As a rule of thumb, you want to build **robust trading strategies** that exploit real market anomalies or inefficiencies. What’s more, you want to do this with an approach grounded in **simplicity.** The more complex your approach, the more likely you are to fall into the traps we’ve talked about above. Either way, it’s surprisingly easy to find strategies that appear to do well, but whose performance turns out to be due to luck or randomness. That’s part of the game.

You have probably noticed that I introduced the concept of a “sensible approach to strategy development”. But what does that look like? We’ve covered it at a conceptual level, which is useful in its own right. That said, we think that teaching our approach in detail is much better accomplished through our Algo Bootcamps. In Bootcamp, you can learn first-hand by watching us apply the process in real time and participating in the development of real trading strategies. There are some very smart people inside our community, too, who can keep you on the right track.

This blog post originally appears as an excerpt from our *Algorithmic* *Trading with Zorro* course, which will be available in the coming weeks.

The post Backtesting Bias: <br>Feels Good, Until You Blow Up appeared first on Robot Wealth.

]]>The post Momentum Is Dead! Long Live Momentum! appeared first on Robot Wealth.

]]>As you might expect, we found evidence suggesting that risk premia are time-varying. If we could somehow predict this variation, we could use that prediction to adjust the weightings of our portfolio and quite probably improve the strategy’s performance.

This might sound simple enough, but we actually found compelling evidence both *for* and *against* our ability to time risk premia returns.

We’re always telling our Bootcamp participants that developing trading and investment strategies requires the considered balancing of evidence in the face of uncertainty. In this case, we decided that there was enough evidence to suggest that we could weakly predict time-varying risk premia returns, at least to the extent that slight weight adjustments in accordance with these predictions might provide value.

The strategy was already decent enough, so we were loath to add additional complexity that could bite us later. There was compelling evidence that our predictions could add value. But there was also a troubling deterioration in the quality of these predictions over time. In the end, we added only a very slight weight adjustment on the basis of these predictions.

Why am I telling you all this?

Well, I am *really* curious as to whether you would have made the same decision as we did. In this post, I’ll provide a bunch of our findings and let you make up your own mind. The best decision for us at the time was to only incorporate a very small timing aspect in our risk premia strategy and move on to something else. But I don’t think everyone would agree. This stuff is always context-dependent, and we all have a different context, but still, I’d love to hear what you would have done in the comments.

As I mentioned above, our strategy was already looking quite decent before we started exploring ways to time the market. Here’s a long-term backtest, before costs (many of our ETFs weren’t around for the entirety of this backtest, so we had to create synthetic asset data from indexes, mutual funds, and other relevant sources):

The strategy had a backtested Sharpe ratio of 1.22 and a Compound Annual Growth Rate (CAGR) of 6.6%. If we could lever it up 2x costlessly (which of course we can’t), the CAGR bumps up to over 12%:

Over the same period, the S&P500 delivered a CAGR of around 8.3% at a Sharpe of approximately 0.6.

Two anomalies seem to pop up over and over again in the markets: *momentum* and *value*. The AQR paper *Value and Momentum Everywhere* is a good summary for the uninitiated. Essentially, the authors demonstrate a momentum and value effect *within* every asset class they look at, as well as *across* asset classes. This suggests that we might be able to use relative momentum and value rankings across the assets in our risk premia universe as a simple prediction of future returns.

We ended up ignoring the value effect for now (we ran out of time, and the strategy was good enough to get into the market at the end of the Bootcamp, but we’ll likely revisit this in the future), and instead focused on the momentum effect across our risk premia universe.

The thing with momentum is that we don’t really know exactly what it is or how to calculate it. So we deferred to the simplest approach we could think of to estimate it: the rate of change of price over some formation period.

We performed a classic rank-based factor analysis by:

- Calculating our momentum estimate.
- Ranking each of our assets according to this estimate.
- Looking at subsequent returns over some holding period for each rank.

So our momentum analysis is really subject to two parameters: the formation period used in the momentum estimate, and the holding period used to assess the momentum factor’s relationship with future returns.

We looked at all combinations of 1, 3, 6, 9 and 12 month formation periods and 1, 3, 6, 9 and 12 month hold periods. We found a clear and persistent momentum effect, at least on average over the whole sample.
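
The three steps above can be sketched as follows, using randomly generated monthly returns for eight hypothetical assets (so the printed numbers are illustrative only), here with a six-month formation period and a three-month holding period:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# Monthly returns for 8 hypothetical assets (randomly generated placeholders).
months = pd.date_range("2000-01-01", periods=240, freq="MS")
rets = pd.DataFrame(rng.normal(0.005, 0.04, (240, 8)),
                    index=months,
                    columns=[f"asset_{i}" for i in range(8)])

formation, hold = 6, 3  # formation and holding periods, in months

# 1. Momentum estimate: trailing `formation`-month compound return.
momentum = (1 + rets).rolling(formation).apply(np.prod, raw=True) - 1

# 2. Cross-sectional rank each month (1 = highest momentum, 8 = lowest).
ranks = momentum.rank(axis=1, ascending=False)

# 3. Forward return over the subsequent `hold` months, for each rank.
fwd = ((1 + rets).rolling(hold).apply(np.prod, raw=True) - 1).shift(-hold)

pooled = pd.DataFrame({"rank": ranks.stack(), "fwd": fwd.stack()}).dropna()
print(pooled.groupby("rank")["fwd"].mean())  # mean forward return by rank
```

On real asset data, a downward slope in that final table from rank 1 to rank 8 is what a cross-sectional momentum effect looks like.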

Here’s a selection of factor rank bar plots showing the mean future return by momentum rank (1 representing the highest momentum, 8 the lowest):

We actually saw some sort of momentum effect in every combination of formation and holding period that we looked at.

Next, we tried to quantify the strength of this cross-sectional momentum effect. We did that by looking at the difference in annualised returns between the top and bottom *n* assets by momentum rank (again, for a combination of formation and holding periods).

Here are some plots that show the difference in annualised returns between the top and the bottom *n* assets by momentum rank. The formation period (in months) is on the x-axis. The holding period (in months) is on the y-axis. The colour represents the magnitude of outperformance of the top-ranked asset.

First, for *n = 2:*

And for *n = 4*:

We see that pretty much across the board, *assets with higher recent momentum tend to outperform those with lower recent momentum*, again on average over the whole sample of our data.

We can also see that the effect is *greater the shorter the holding period.* This is unsurprising, but from a strategy development point of view it is somewhat disappointing, because the shorter the holding period, the more frequently we’d need to adjust our positions to capitalise on the effect, and the higher our cost of trading. Nothing comes for free, apparently.

To summarise our findings to this point, we see a strong momentum effect for formation periods of 3 to 12 months. And the effect is stronger for shorter holding periods.

You might, therefore, be convinced (and indeed many of our Bootcamp participants were) that we should only hold the assets in our risk premia universe with the highest 3-12 month momentum.

But so far we’ve only looked at the *mean* momentum outperformance over the entire 20-year data set. Markets are dynamic and noisy, and looking at summary statistics like the mean can hide important information.

Therefore, before we made any decisions, we looked at the consistency of momentum outperformance over time.

Here are some plots of 3-month and 6-month momentum outperformance over time for formation periods 3 to 12.

The dots represent mean outperformance of top-ranked assets over the holding period *annualised over the given year*. The lines are LOESS curves.

These plots suggest that the momentum train has been running out of steam for a number of years now. That is, we see a clear decline in the cross-sectional momentum effect over the sample period.

The momentum effect over the *whole sample* is significant. But the *decaying performance* suggests caution in trading the effect aggressively.

At this point, many of our Bootcamp participants were wondering why we’d still bother looking at momentum given the clear decaying performance in the previous charts.

But we were still taking it seriously because it has worked *exceptionally* well for *as long as we have history available.* There is a real question, though, as to whether the increased turnover and potential reduction in diversification as we rotate into high-momentum assets are justified given the decaying performance.

Here’s a backtest for a strategy which, every month:

- ranks each asset according to its trailing six-month returns
- selects the top four assets and weights each in inverse proportion to its volatility over the previous three months
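
Those rebalance rules can be sketched as follows. The data here is randomly generated placeholder data, so this illustrates the weight calculation only, not the backtested performance:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)

# Monthly returns for 8 hypothetical assets (placeholder data).
months = pd.date_range("2010-01-01", periods=120, freq="MS")
rets = pd.DataFrame(rng.normal(0.005, 0.04, (120, 8)),
                    index=months,
                    columns=[f"asset_{i}" for i in range(8)])

def rebalance_weights(rets, date, n_top=4):
    """Weights for one monthly rebalance: top-n assets by trailing
    six-month return, each weighted in inverse proportion to its
    trailing three-month volatility."""
    history = rets.loc[:date]
    momentum = (1 + history.tail(6)).prod() - 1  # trailing 6-month return
    vol = history.tail(3).std()                  # trailing 3-month volatility
    top = momentum.nlargest(n_top).index         # select the top n assets
    inv_vol = 1.0 / vol[top]
    return inv_vol / inv_vol.sum()               # normalise to sum to 1

weights = rebalance_weights(rets, months[-1])
print(weights.round(3))
```

Running this at each month-end gives the binary in/out rotation whose results are discussed below.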

This backtest has a before-cost CAGR of 8.9% at a Sharpe ratio of 1.14. This is a higher return than our baseline strategy at a similar, though slightly lower, Sharpe ratio – probably due to a reduction in diversification.

We can get some insight into what this strategy is doing by looking at its asset weights over time:

Compare this to the asset weights of our baseline strategy:

Chalk and cheese. The momentum strategy has higher returns and better drawdown control. Its lower Sharpe comes by way of increased concentration (reduced diversification), and it turns out that it has over *5x the turnover*.

We weren’t overly impressed by this trade-off. Specifically, we weren’t sold on the idea that there’s enough evidence to convince us to run a momentum strategy at the expense of diversification (what do you think? Let us know in the comments). However, despite its decay over the last decade or two, the historic momentum outperformance is *remarkable*. You won’t see a much bigger anomaly than that. We could therefore certainly entertain overweighting assets with high relative momentum and underweighting those with low relative momentum, based on the evidence we’ve seen to date.

Intuitively, we prefer a more subtle way to incorporate the momentum effect, one that adjusts portfolio weights slightly based on our estimate of relative (cross-sectional) momentum. That way, we’re always holding *some* of each asset in our universe, but we might be underweight when an asset class has been underperforming relative to the others.

It’s possible to get super-complicated with this (Black-Litterman, Bootstrapping, etc.). Knowing that any improvement is likely to be marginal above our already-decent strategy, we decided not to try anything too complicated here. We simply adjusted our baseline asset weights slightly depending on the relative momentum factor.
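
One simple way to implement such a tilt is sketched below. This is illustrative only: the baseline weights, the tilt size and the linear rank-to-scaler mapping are all assumptions for the example, not our production choices.

```python
import pandas as pd

# Baseline strategic weights for a hypothetical four-asset universe
# (illustrative numbers, not our actual portfolio).
baseline = pd.Series({"stocks": 0.40, "bonds": 0.40, "gold": 0.10, "tips": 0.10})

# Trailing cross-sectional momentum estimates (placeholder values).
momentum = pd.Series({"stocks": 0.08, "bonds": 0.02, "gold": -0.03, "tips": 0.01})

tilt = 0.25  # maximum proportional over/underweight; a design choice

# Map momentum ranks onto a symmetric scaler in [1 - tilt, 1 + tilt],
# then renormalise so the tilted weights still sum to 1. Every asset
# keeps a positive weight, so there is no binary in/out decision.
ranks = momentum.rank(ascending=False)  # 1 = strongest momentum
scaler = 1 + tilt * (1 - 2 * (ranks - 1) / (len(ranks) - 1))
tilted = baseline * scaler
tilted /= tilted.sum()
print(tilted.round(3))
```

The attraction of this shape is exactly the trade-off described above: we keep constant exposure to every risk premium and most of the diversification benefit, while leaning modestly towards recent relative winners.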

Here’s how that backtests:

This gives a CAGR of 7.4% at a Sharpe of 1.3. Here are the asset weights:

The portfolio is consistently well diversified and we’ve increased returns and Sharpe ratio over the baseline strategy. However, it turns over about 2x more than the baseline strategy. We feel that this is a much more attractive trade-off. Do you?

In our recent Bootcamp, we took a deep dive on the momentum effect and tried to make a sensible decision about incorporating it into our risk premia strategy. The evidence for the momentum effect includes:

- A wealth of empirical evidence in favour of momentum over many years
- On average, a clear and persistent momentum effect (noting that *when looking at averages, much detail is hidden*)
- On average, clear outperformance of top-ranked assets over bottom-ranked assets

The evidence against includes:

- Outperformance is more pronounced for shorter hold periods, which implies more frequent rebalancing and higher costs
- Recent deterioration across the board

How can we weigh up this evidence in the context of our risk premia strategy? Here’s a summary of our thought process:

- We are confident that exposure to risk premia is a good idea that is rewarded over the long term.
- Every time we are not in the market we are giving up exposure to that risk premia.
- So we need to be pretty confident in our timing ability to get out of the market.
- We are not *that* confident.
- We can see an obvious and clear momentum effect (at least in the past).
- This effect has deteriorated in the past decade or two.
- We give up quite a lot to access the momentum effect if we make binary decisions to get in or out of an asset. Specifically, we give up exposure to certain risk premia at certain times, and we give up the diversification benefit to our portfolio variance.
- We also increase turnover significantly.
- We can, to an extent, have our cake and eat it too by adjusting baseline asset weights based on the cross-sectional momentum factor, rather than making binary in-out decisions.
- Using this approach, we give up some of the momentum effect, but we retain the benefits of diversification as well as constant exposure to the risk premia.

That thought process seems to logically suggest that the momentum adjustment approach makes the most sense in the context of our risk premia strategy. This also implies that we need to give some thought to *how* we implement these adjustments.

In the end, we decided that a simple adjustment to the baseline weights based on our estimate of the momentum factor is sufficient – it affords us simple and easy access to the momentum effect without compromising our exposure to risk premia or the benefits of diversification. Previously, we alluded to some more complex approaches for adjusting these weights, such as Black-Litterman, which would probably allow us to squeeze out a couple more drops of performance.

But that’s not the best use of our time given the bigger picture of our broader trading operation. First and foremost, we’re not building a risk premia strategy – we’re helping our members and Bootcamp participants build out their trading capability. At the early stages, we stand to gain *a lot* from adding additional edges to our portfolio. We would probably gain *something* from a more complex momentum tilt on our risk premia strategy, but it’s going to be nowhere near as beneficial as diversifying across strategies. So we opted for a simple approach that gets us into the market and hot on the trail of active alpha strategies to add to the portfolio.

This bigger picture will change. When our portfolio is more mature, it will likely make a lot of sense to revisit the risk premia strategy and try to squeeze a little more out. There might well come a time when this is our biggest or most sensible opportunity. But that time isn’t now.

- Our strategy is based on exposure to risk premia for the long term.
- We think we might be able to gain some benefit from trying to time our risk premia exposures.
- Momentum has been a remarkable anomaly for a long time.
- Its performance has deteriorated for a decade or two.
- Strategy design is all about weighing evidence in the face of uncertainty.
- Context matters both at the strategy level and the bigger picture trading operation level.
- For our specific context, we found a way to incorporate momentum timing into our risk premia strategy with sensible trade-offs.

One of the most fun things about independent trading is not only weighing the strategy-level evidence that you collect yourself, but deciding what it actually means for *your* specific situation. No one can tell you what the right answer is – partly because it doesn’t exist, and partly because everyone’s context is different. *You* have to make a decision at some point and take action based on *your* best judgment. It’s the ultimate exercise in backing yourself and taking responsibility for your own decisions. That’s also why I think trading isn’t for everyone – not everyone is comfortable taking on that level of responsibility. But if you do, then trading is the best game in town.

The tricky part about weighing evidence and making smart trading decisions in the face of uncertainty is that *it takes experience to do it well*. You get that experience by getting kicked around in the markets for a few years – which isn’t particularly enjoyable or financially rewarding. In our Bootcamp program, we pass on the experience and intuition that we fought hard for over many years, minus the battle scars that we picked up along the way.

If that sounds like something you could benefit from, join the waiting list for our next Bootcamp program.

The post Momentum Is Dead! Long Live Momentum! appeared first on Robot Wealth.


*This article is part of a series derived from our most recent Algo Boot Camp, in which we developed a strategy for harvesting risk premia. We have allocated proprietary capital to the strategy, and many of our members are trading it too.*

*In our Boot Camps, we develop trading strategies in collaboration with the Robot Wealth community over an eight-week period. The Boot Camp format is proving incredibly useful for teaching our members how to research, develop and execute real trading strategies, and how to think about the markets. They get to watch us do it every step of the way, and see every decision we make.*

*In our next Boot Camp, we’ll be developing a portfolio of active FX strategies. Find out more about Robot Wealth’s Algo Boot Camps, including how you can be a part of the next one, here.*

Trading and investing doesn’t have to be complicated. Check out this chart:

The blue line shows returns from US Stocks from 1900 to today. That’s a 48,000x increase in nominal value.

The yellow line shows returns from US Bonds from 1900 to today. That’s a 300x increase in nominal value.
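As a sanity check, those multiples imply long-run compound growth rates we can back out in a couple of lines of Python (assuming a span of roughly 119 years, 1900 to 2019 – the exact end date is an assumption):

```python
def cagr(total_multiple, years):
    """Compound annual growth rate implied by a total return multiple."""
    return total_multiple ** (1.0 / years) - 1.0

YEARS = 119  # assumed span: 1900 to roughly 2019
stocks_cagr = cagr(48_000, YEARS)
bonds_cagr = cagr(300, YEARS)
print(f"Stocks: {stocks_cagr:.1%} p.a., Bonds: {bonds_cagr:.1%} p.a.")
```

Roughly 9.5% a year for stocks versus about 4.9% for bonds – a few percent of extra compounding, repeated for over a century, is what drives the enormous gap between the two lines.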

So it’s pretty obvious what we need to do in order to make money in the markets. Assuming I have a fairly long investment horizon, I buy the stocks, I buy the bonds. I go to the beach.

But of course…it’s not quite that simple. Unless you’re a robot.

If you are a human with normal human fears and feelings, and with lifestyle and income uncertainties, then we can’t discuss the rewards of buying stocks and bonds without discussing the risks.

The reason that stocks tend to go up in value over the long run is that they have a tendency to go down in value – sometimes quite considerably – in the short and medium term.

Look again at the chart above. Notice the logarithmic y-axis. That’s the best way to look at long term asset prices. But it does tend to misrepresent what the experience of holding US stock exposure over that period would actually have been like.

Check out this chart, which takes the blip in the red square, corresponding to the GFC, and plots the S&P 500 in dollar terms.

That 50% decline looks benign in the long-term chart, but how would you really feel if your million-dollar stock portfolio was suddenly worth $500k?

Obviously, it’s not very much fun to watch half of your asset value crumble in front of your eyes.
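That peak-to-trough decline has a name – drawdown – and it’s easy to measure. Here’s a minimal sketch using a toy price series (illustrative numbers, not real data):

```python
def max_drawdown(prices):
    """Largest peak-to-trough decline of a price series,
    returned as a negative fraction (e.g. -0.5 for a 50% fall)."""
    peak = prices[0]
    worst = 0.0
    for p in prices:
        peak = max(peak, p)                  # running high-water mark
        worst = min(worst, p / peak - 1.0)   # decline from that peak
    return worst

# Toy series: runs up to 100, halves to 50, then partially recovers
series = [60, 80, 100, 70, 50, 90]
print(max_drawdown(series))  # -0.5: the decline that looks benign on a log axis
```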

It bears repeating: the reason stocks go up in the long term is that they tend to go down (sometimes violently) in the short and medium term. That’s a highly unattractive quality for an investment asset – so holders demand some kind of *reward* or *premium* for taking on that risk.

But it’s not just stocks. Any asset whose fundamental value is dependent on uncertain factors – or “risk” – tends to increase in value over the long term, more than the interest you would receive on the same amount of money. Rather than saying that investors are compensated for investing in particular assets, we instead say that investors are compensated for *taking on risk* – hence the concept of “risk premia.”

Under this paradigm, investing becomes an exercise in risk management. And good risk management requires a decent understanding of the risks being taken, coupled with some intuition around why reward should flow to the investor for taking on a particular risk.

If this sounds weird, consider that pretty much any investment you might make is based around you anticipating some reward or payoff, knowing that there’s some level of risk involved. For instance, say you purchase a government bond. In this case, you know with a fairly high level of certainty what the reward will be at maturity. The risks that you bear in making this investment are the chance of the government defaulting, as well as the volatility in the price of the bond between the purchase time and maturity (this is risky in the sense that if you needed to liquidate prior to maturity, volatility exposes you to the risk of making a loss on the sale).

If you instead invested in a stock, you may have a much less certain idea of the expected reward. In addition, the risks associated with stock investing are usually greater than buying bonds – just look at the historical volatility of stock indexes compared with bond markets.

The different risk-reward profiles of these investments (including their uncertainty) should give pause to the investor to consider their approach. Is one investment superior to the other? Should you put all your eggs in one basket? Is there an optimal allocation into both investments?

These questions are really the crux of risk premia investing and no doubt you can see that an understanding of the risks associated with each investment is key to any investment decision.
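To make the “optimal allocation” question concrete, here’s a toy two-asset sketch in Python. Every number here – expected returns, volatilities, correlation – is an assumption for illustration, not an estimate from real data:

```python
import math

# Illustrative inputs -- assumed for the sketch, not estimated from real data
MU = {"stocks": 0.09, "bonds": 0.05}    # expected annual returns
VOL = {"stocks": 0.16, "bonds": 0.06}   # annual volatilities
CORR = -0.2                             # assumed stock-bond correlation

def portfolio_stats(w_stocks):
    """Expected return and volatility of a two-asset stocks/bonds mix."""
    w_bonds = 1.0 - w_stocks
    ret = w_stocks * MU["stocks"] + w_bonds * MU["bonds"]
    var = ((w_stocks * VOL["stocks"]) ** 2
           + (w_bonds * VOL["bonds"]) ** 2
           + 2 * w_stocks * w_bonds * VOL["stocks"] * VOL["bonds"] * CORR)
    return ret, math.sqrt(var)

# Scan stock weights from 0% to 100%, keeping the best return-per-unit-risk mix
best_w = max((w / 100 for w in range(101)),
             key=lambda w: portfolio_stats(w)[0] / portfolio_stats(w)[1])
```

Under these assumed inputs, the best return-per-unit-risk mix holds mostly bonds with a modest equity allocation. Change the assumptions and the answer moves – which is exactly why understanding the risks matters.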

In a practical sense, being long risk premia means buying and holding assets that are exposed to various *risk factors.*

A risk factor is simply a class of risks that explain (or partially explain) the reward associated with buying and holding an asset. One model of common risk factors might include:

- **real interest rates**: the risk of exposure to changing inflation-adjusted interest rates – in simple terms, the risk of incurring opportunity cost; all investable assets carry this risk
- **inflation**: the risk that cash received from an investment won’t be worth what you thought it would, thanks to prices rising relative to the value of cash
- **credit**: the risk that a counter-party is unable to meet the terms of an agreement
- **liquidity**: the risk that there won’t be a counter-party to whom to sell your asset without incurring significant costs
- **growth**: the risk of uncertainty in economic growth, and of macroeconomic conditions changing unexpectedly
- **political**: risks associated with changing regulation and political instability

We can think of a particular asset as being composed of various risk factors. For instance a US government bond is mostly going to be exposed to inflation risk. It carries little to no credit risk, since the US Treasury is almost sure to pay you back. A stock, on the other hand, is going to be exposed to all sorts of risk including economic, political, inflation and liquidity risk.
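One way to make “assets as bundles of risk factors” concrete is to represent each asset as a bag of factor loadings. The loadings below are illustrative guesses for the sketch, not estimates:

```python
# Toy "assets as risk factors" mapping -- loadings are illustrative
# assumptions, not estimated from data
FACTOR_EXPOSURES = {
    "us_treasury": {"real_rates": 0.6, "inflation": 0.4},
    "us_equity": {"growth": 0.45, "political": 0.15, "inflation": 0.15,
                  "liquidity": 0.15, "credit": 0.10},
}

def portfolio_factor_exposure(holdings):
    """Dollar-weighted factor exposure of a portfolio of assets."""
    total = {}
    for asset, weight in holdings.items():
        for factor, loading in FACTOR_EXPOSURES[asset].items():
            total[factor] = total.get(factor, 0.0) + weight * loading
    return total

# A classic 60/40 portfolio viewed through the factor lens
exposure = portfolio_factor_exposure({"us_equity": 0.6, "us_treasury": 0.4})
```

The same portfolio that looks like “just stocks and bonds” in dollar terms resolves into a blend of growth, inflation, real-rate and other factor exposures.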

Here’s a chart that demonstrates this concept of “assets as risk factors:”

Being “long risk premia” is equivalent to being long some combination of risk factors. *But what is the optimal combination?* Does an optimal combination even exist from a trader’s perspective? We explored these questions throughout the Boot Camp, but we can start the discussion by thinking about the different conditions that generally give rise to premia for taking on different types of risk.

Asking “when are different risks rewarded?” is the same as asking “when do different assets go up in value?”

If you think about it, it makes sense that different risks tend to be rewarded under different conditions. For instance, during a market crash or recession, bonds have tended to outperform in the past. That’s another way of saying that taking on inflation risk is rewarded. During an equities bull market, taking on economic risk is rewarded through rising stock prices and dividends.

Here’s a summary of the types of risks that have generally been rewarded under various market conditions:

h/t: GestaltU – Dynamic Asset Allocation for Practitioners

| Years | Environment | Factors Most Rewarded |
|---|---|---|
| 1980 – 1991 | Post-Inflation | Inflation |
| 1992 – 1999 | Equity Bull Market | Credit, Growth, Liquidity |
| 2000 – 2003 | Tech Collapse | Real Interest Rates, Liquidity |
| 2004 – 2007 | Equity Bull Market | Growth, Political |
| 2008 – 2011 | GFC | Real Interest Rates |
| 2011 – 2018 | Long Equity Recovery | Real Interest Rates, Credit |
As with all things related to the markets, hindsight is a wonderful thing. Anyone can look back and work out which risks were rewarded in the past. The real trick is predicting which risks will be rewarded in the future.

Of course, no one has a crystal ball, so there’s always uncertainty around our forecasts. *Often we are more interested in managing this uncertainty than we are in absolute returns.*

Therefore, the goal often becomes to construct a portfolio of various risk factors in pursuit of a good trade-off between future reward and uncertainty.

There are two broad approaches to constructing portfolios of risk factors:

The first approach – strategic, or permanent, allocation – involves constructing a portfolio that aims to deliver a certain level of performance regardless of the prevailing conditions. The Bridgewater All Weather fund is an example of this approach:

The investment objective and policy of the Fund are to provide attractive returns with relatively limited risks, with no material bias to perform better or worse in any particular type of economic environment. The portfolio is expected to perform approximately as well in rising or falling inflation periods, or in periods of strong or weak economic growth.

- The All Weather Story, Bridgewater, 2016

Such portfolios typically have a proportionately large dollar exposure to long- and intermediate-term government bonds, a smaller dollar exposure to equities, and a minor allocation to gold and possibly other commodities. But many variations on this theme exist.

The significant exposure to low-volatility, positive-carry fixed income assets tends to give these sorts of portfolios a relatively smooth performance curve, at the expense of the additional upside that’s possible from a larger exposure to equities.
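One common way to arrive at that kind of bond-heavy sizing is naive risk parity – weighting each asset inversely to its volatility. A minimal sketch, with assumed volatilities (equities 16%, government bonds 6%, gold 15% – illustrative figures only):

```python
def inverse_vol_weights(vols):
    """Naive risk parity: weight each asset inversely to its volatility,
    normalised so the weights sum to one."""
    inv = {asset: 1.0 / v for asset, v in vols.items()}
    total = sum(inv.values())
    return {asset: x / total for asset, x in inv.items()}

# Assumed annual volatilities -- not estimated from real data
weights = inverse_vol_weights({"equities": 0.16, "bonds": 0.06, "gold": 0.15})
```

With these inputs, the low-volatility bonds take more than half the dollar weight – which is why all-weather-style portfolios look bond-heavy in dollar terms even when their *risk* is spread more evenly.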

The second approach – tactical allocation – involves moving into and out of various risk exposures based on some signal or forecast. The well-known Dual Momentum strategy is a simple, yet extreme, example of this approach, as it shifts the entire allocation between US equities, international equities and government bonds. Most variants of tactical allocation instead re-weight the portfolio to be overweight certain assets at certain times, while still maintaining some allocation to other factors.
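For concreteness, the core Dual Momentum rule can be sketched in a few lines. This is a simplification of the well-known idea – relative momentum picks the stronger equity market, absolute momentum switches to bonds when that market isn’t beating T-bills – not Antonacci’s exact specification:

```python
def dual_momentum_signal(us_12m, intl_12m, tbill_12m):
    """Simplified Dual Momentum: compare trailing 12-month returns.
    Relative momentum picks the stronger equity market; absolute momentum
    drops to bonds if the winner doesn't beat T-bills. A sketch of the
    well-known rule, not the exact published specification."""
    if us_12m >= intl_12m:
        winner, winner_ret = "US_EQUITIES", us_12m
    else:
        winner, winner_ret = "INTL_EQUITIES", intl_12m
    return winner if winner_ret > tbill_12m else "BONDS"

# US equities strongest and beating T-bills -> hold US equities
print(dual_momentum_signal(0.12, 0.08, 0.02))  # US_EQUITIES
```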

Many variations on the tactical allocation theme exist, and a significant proportion of the funds management industry is based on this approach.

But here’s the thing about tactical allocation: it’s *hard*. The premise of this approach is that a skilled manager can outperform a permanent allocation through clever timing and selection of factors. In practice, it’s all too easy to mis-time active decisions and wind up with something that underperforms a more strategic, low-turnover portfolio. So you can end up spending a lot of time and effort for little, or even negative, reward.

Throughout the recent Boot Camp, we explored both approaches and developed an algorithm that combines them to manage our own long risk premia portfolio. In upcoming articles, we’ll share some of the insights we gained along the way.

The post Harvesting Risk Premia appeared first on Robot Wealth.
