# quantitative analysis

Holding data in a tidy format works wonders for one's productivity. Here we will explore the tidyr package, which is all about creating tidy data. In particular, let's develop an understanding of the tidyr::pivot_longer and tidyr::pivot_wider functions for switching between different formats of tidy data. In this video, you'll learn:

- What tidy data looks like
- Why it's a sensible approach
- The difference between long and wide tidy data
- How to efficiently switch between the two formats
- When and why you'd use each of the two formats

What's tidy data? Tidy data is data where:

- Every column is a variable.
- Every row is an observation.
- Every cell is a single value.

Why do we care? It turns out there are huge benefits to thinking about the “shape” of your data and the best way to structure and manipulate it for your problem. Tidy data is a standard way of shaping data that facilitates analysis. In particular, tidy data works very well with the tidyverse tools, which means less time spent transforming and cleaning data and more time spent solving problems. In...
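tidyr is an R package, but the long/wide round trip it describes is easy to sketch in pandas, whose `melt` and `pivot` play roles analogous to `pivot_longer` and `pivot_wider`. The data below is made up purely for illustration:

```python
import pandas as pd

# Wide format: one row per ticker, one column per year's return (illustrative data)
wide = pd.DataFrame({
    "ticker": ["SPY", "QQQ"],
    "2019": [0.29, 0.38],
    "2020": [0.16, 0.48],
})

# Wide -> long: one row per (ticker, year) observation,
# analogous to tidyr::pivot_longer(cols = -ticker)
long = wide.melt(id_vars="ticker", var_name="year", value_name="ret")

# Long -> wide again, analogous to tidyr::pivot_wider()
wide_again = long.pivot(index="ticker", columns="year", values="ret").reset_index()
```

The long format has one observation per row (tidy for grouped summaries and plotting); the wide format is often what you want for modelling or human-readable tables.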

If you want to make money trading, you're going to need a way to identify when an asset is likely to be cheap and when it is likely to be expensive. You want to be a net buyer of the cheap stuff and a net seller of the expensive stuff. Thanks, Captain Obvious. You're welcome. How does this relate to equity options? If we take the (liquid) US equity options market as an example, there are an absolute ton of options contracts you could be trading. 95% of them are sufficiently fairly valued that you won't make much money trading them once you've paid all the costs to buy and sell them and hedge your risk. The remaining 5% are worth looking for. Options have a positive dependency on volatility, so in looking for "cheap" or "expensive" options, we're really looking for cheap or expensive "volatility". So we ask the following questions: when does the forward volatility "implied" by options prices tend to be lower than the volatility that realises in the subsequent stock price process? We would look to buy...
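Implied volatility comes from an options pricing model; the realised side of the comparison can be estimated from the subsequent price path. A minimal sketch of annualised realised volatility from daily closes (the prices here are invented for illustration):

```python
import math

# Hypothetical daily closing prices (illustration only)
closes = [100.0, 101.2, 100.5, 102.1, 101.8, 103.0]

# Daily log returns
rets = [math.log(b / a) for a, b in zip(closes, closes[1:])]

# Sample standard deviation of daily returns
mean = sum(rets) / len(rets)
var = sum((r - mean) ** 2 for r in rets) / (len(rets) - 1)

# Annualise assuming ~252 trading days per year
realised_vol = math.sqrt(var) * math.sqrt(252)
```

Comparing a number like this (measured over the option's life) against the implied volatility quoted at the start of that window is the basic shape of the cheap/expensive question.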

There are two good reasons to buy put options: because you think they are cheap, or because you want downside protection. In the latter case, you are looking to use the skewed payoff profile of the put option to protect a portfolio against large downside moves without capping your upside too much. The first reason requires a pricing model, or at least an understanding of when and under what conditions put options tend to be cheap. The second doesn't necessarily. We'll assume that we're going to have to pay a premium to protect our portfolio, and that not losing a large amount of money is more important than the exact price we pay for it. Let's run through an example… We have a portfolio comprised entirely of 100 shares of SPY, about $29k worth. We can plot a payoff profile for our whole portfolio, showing the dollar P&L from our portfolio at various SPY prices. At the time of writing, SPY closed at $287.05.

```r
if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse, rvest, slider, tidyquant, alphavantager, kableExtra)
SPYprice <- 287.05
```
...
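The payoff-profile idea can be sketched outside R too. Here is a minimal Python version for the 100-share SPY position, with and without a protective put; the strike and premium are hypothetical illustration values, not market quotes:

```python
# Payoff profile sketch: 100 shares of SPY bought at $287.05,
# optionally protected with a long put.
shares = 100
entry = 287.05
strike = 280.0   # hypothetical protective-put strike
premium = 5.00   # hypothetical premium per share

def stock_pnl(price):
    """Dollar P&L of the unhedged share position at a given SPY price."""
    return shares * (price - entry)

def protected_pnl(price):
    """Dollar P&L of shares plus a long put (payoff minus premium paid)."""
    put_payoff = max(strike - price, 0.0)
    return stock_pnl(price) + shares * (put_payoff - premium)

# Evaluate across a grid of terminal SPY prices
for p in [240, 260, 280, 300, 320]:
    print(p, round(stock_pnl(p), 2), round(protected_pnl(p), 2))
```

Below the strike, the protected portfolio's loss is capped at shares × (strike − entry − premium); above it, the upside is the stock's, less the premium paid.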

We've been working on visualisation tools to make option pricing models intuitive to the non-mathematician. Fundamental to such an exercise is a way to model the random nature of asset price processes. The Geometric Brownian Motion (GBM) model is a ubiquitous way to do this. We can represent the price of an asset at time [latex] t [/latex] as the state [latex] x(t) [/latex] of a GBM process. [latex] x(t) [/latex] satisfies the Ito differential equation [latex display="true"] dx(t) = \mu x(t) dt + \sigma x(t) dw(t) [/latex] where [latex] w(t) [/latex] is a Wiener process, [latex] \mu [/latex] is the drift and [latex] \sigma [/latex] the volatility. Future prices follow a log-normal distribution: the log-price is normally distributed with mean and standard deviation [latex display="true"] \left[\left(\mu - \frac{\sigma^2}{2}\right)t + \log{x_0},\ \sigma\sqrt{t}\right] [/latex] OK, nerd, but how do I get the probability distribution of future prices from the starting price of the asset and assumptions about the return distribution? I couldn't work that out quickly, so I asked Wolfram Mathematica that question. PDF[GeometricBrownianMotionProcess[\[Mu], \[Sigma], Subscript[x, 0]][t], x] [latex display="true"] \frac{\exp\left(-\frac{\left(\log{x} - \log{x_0} - t\left(\mu - \frac{\sigma^2}{2}\right)\right)^2}{2t\sigma^2}\right)}{\sqrt{2\pi}\sqrt{t}\,x\sigma}, \quad x > 0...
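A quick sanity check of that log-normal result: simulate GBM terminal prices using the exact solution and compare the sample mean of the log-prices against the theoretical value (μ − σ²/2)t + log x₀. Parameter values are arbitrary illustration choices:

```python
import math
import random

# Illustrative parameters: drift, volatility, starting price, horizon
mu, sigma, x0, t = 0.05, 0.2, 100.0, 1.0
random.seed(42)

n = 20000
log_prices = []
for _ in range(n):
    z = random.gauss(0.0, 1.0)
    # Exact GBM solution: x(t) = x0 * exp((mu - sigma^2/2) t + sigma sqrt(t) z)
    log_prices.append(math.log(x0) + (mu - 0.5 * sigma**2) * t
                      + sigma * math.sqrt(t) * z)

sample_mean = sum(log_prices) / n
theoretical_mean = math.log(x0) + (mu - 0.5 * sigma**2) * t
```

With 20,000 draws the Monte Carlo mean sits very close to the closed-form mean, which is the distribution the Mathematica PDF above describes.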

When tinkering with trading ideas, have you ever wondered whether a certain variable might be correlated with the success of the trade? For instance, maybe you wonder if your strategy tends to do better when volatility is high? In this case, you can get very binary feedback by, say, running backtests with and without a volatility filter. But this can mask interesting insights that might surface if the relationship could be explored in more detail. Zorro has some neat tools that allow us to associate data of interest with particular trading decisions, and then export that data for further analysis. Here's how it works: Zorro implements a TRADE struct for holding information related to a particular position. This struct is a data container which holds information about each trade throughout the life of our simulation. We can also add our own data to this struct via the TRADEVAR array, which we can populate with values associated with a particular trade. Zorro stores this array, along with all the other information about each and every position, as members of the TRADE struct....
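Zorro's TRADE struct and TRADEVAR array are lite-C constructs, but the underlying idea - attach arbitrary per-trade diagnostics at decision time, then export them for analysis - translates to any language. A hypothetical Python sketch (the field names here are invented, not Zorro's):

```python
import csv
import io
from dataclasses import dataclass, field

# Hypothetical analogue of a per-trade container, with a dict playing
# the role of the TRADEVAR array for user-defined values.
@dataclass
class TradeRecord:
    symbol: str
    entry_price: float
    exit_price: float
    custom: dict = field(default_factory=dict)  # e.g. volatility at entry

trades = [
    TradeRecord("SPY", 285.0, 290.0, {"entry_vol": 0.18}),
    TradeRecord("SPY", 291.0, 289.5, {"entry_vol": 0.31}),
]

# Export trades plus the custom variables for analysis elsewhere
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["symbol", "entry", "exit", "entry_vol"])
for tr in trades:
    writer.writerow([tr.symbol, tr.entry_price, tr.exit_price,
                     tr.custom["entry_vol"]])
csv_text = buf.getvalue()
```

Once the per-trade data is in a flat file like this, exploring questions such as "does my edge depend on entry volatility?" becomes a standard analysis task.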

In our inaugural Algo Bootcamp, we teamed up with our super-active community of traders and developed a long-only, always-in-the-market strategy for harvesting risk premia. It holds a number of different ETFs, varying their relative weighting on a monthly basis. We're happy with it. However, the perennial question remains: can we do better? As you might expect, we found evidence suggesting that risk premia are time-varying. If we could somehow predict this variation, we could use that prediction to adjust the weightings of our portfolio and quite probably improve the strategy's performance. This might sound simple enough, but we actually found compelling evidence both for and against our ability to time risk premia returns. We're always telling our Bootcamp participants that developing trading and investment strategies requires the considered balancing of evidence in the face of uncertainty. In this case, we decided that there was enough evidence to suggest that we could weakly predict time-varying risk premia returns, at least to the extent that slight weight adjustments in accordance with these predictions might provide value. The strategy was already decent enough, so...

What if you had a tool that could help you decide when to apply mean reversion strategies and when to apply momentum to a particular time series? That's the promise of the Hurst exponent, which helps characterise a time series as mean reverting, trending, or a random walk. For a brief introduction to Hurst, including some Python code for its calculation, check out our previous post. Even if you have read that post previously, it is worth checking out again, as we have updated our method for calculating Hurst and believe the new implementation is more accurate. It would be great if we could plug some historical time series data into the Hurst algorithm and know whether we expect the time series to mean revert or trend. But as is usually the case when we apply such tools to the financial domain, it isn't quite that straightforward. In the last post, we noted that Hurst gives different results depending on how it is calculated; this raises the question of how to choose a calculation method intelligently so that we avoid choosing...
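For concreteness, here is one common estimation approach - the scaling of the standard deviation of lagged differences, sd(lag) ∝ lag^H - which is only one of the several methods the post alludes to, and not necessarily the updated one it describes:

```python
import math
import random

def hurst(series, max_lag=20):
    """Estimate the Hurst exponent from the scaling of the standard
    deviation of lagged differences: sd(lag) ~ lag^H."""
    log_lags, log_sd = [], []
    for lag in range(2, max_lag):
        diffs = [series[i + lag] - series[i] for i in range(len(series) - lag)]
        m = sum(diffs) / len(diffs)
        sd = math.sqrt(sum((d - m) ** 2 for d in diffs) / len(diffs))
        log_lags.append(math.log(lag))
        log_sd.append(math.log(sd))
    # Least-squares slope of log(sd) against log(lag) is the H estimate
    n = len(log_lags)
    mx, my = sum(log_lags) / n, sum(log_sd) / n
    return (sum((x - mx) * (y - my) for x, y in zip(log_lags, log_sd))
            / sum((x - mx) ** 2 for x in log_lags))

# A pure random walk should give H close to 0.5;
# H < 0.5 suggests mean reversion, H > 0.5 suggests trending
random.seed(1)
walk = [0.0]
for _ in range(5000):
    walk.append(walk[-1] + random.gauss(0, 1))
```

Running `hurst(walk)` on the simulated random walk recovers a value near 0.5, the benchmark against which mean-reverting and trending series are judged.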

This post comes to you from Dr Tom Starke, a good friend of Robot Wealth. Tom is a physicist, quant developer and experienced algo trader with keen interests in machine learning and quantum computing. I am thrilled that Tom is sharing his knowledge and expertise with the Robot Wealth community. Over to you, Tom. Unlike most other businesses, algorithmic trading has the advantage of giving you almost instant feedback on how good you are at your business. For anyone who is numerically inclined, this is a very attractive proposition. I have seen articles written about this subject, but they have never really addressed many of the issues I have come across on my journey. In this post I would like to talk about this a little, as an inspiration, or perhaps a deterrent, for all the people who read this and are considering making money that way. Nothing could be more amazing: a system that runs by itself and consistently spits out cash to finance prolonged stays in Bali, South America, or with your mom if that’s what you’re after. However,...

In the first Mean Reversion and Cointegration post, I explored mean reversion of individual financial time series using techniques such as the Augmented Dickey-Fuller test, the Hurst exponent and the Ornstein-Uhlenbeck equation for a mean reverting stochastic process. I also presented a simple linear mean reversion strategy as a proof of concept. In this post, I’ll explore artificial stationary time series and present a more practical trading strategy for exploiting mean reversion. Again, this work is based on Ernie Chan's Algorithmic Trading, which I highly recommend and have used as inspiration for a great deal of my own research. Go easy on my design abilities... In presenting my results, I have purposefully shown equity curves from mean reversion strategies that go through periods of stellar performance as well as periods so bad that they would send most traders broke. Rather than cherry-pick the good performance, I want to demonstrate what I think is of utmost importance in this type of trading: namely, that the nature of mean reversion for any financial time series is constantly changing. At times this dynamism can...
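One of the techniques mentioned - fitting an Ornstein-Uhlenbeck process to estimate the speed, and hence half-life, of mean reversion - can be sketched with a simple lagged regression. The series below is simulated, not real price data:

```python
import math
import random

# Simulate a mean-reverting (OU-like) series:
# dy = theta * (mu - y) * dt + noise, with illustrative parameters
random.seed(7)
theta, mu_level, dt = 0.3, 100.0, 1.0
y = [100.0]
for _ in range(2000):
    y.append(y[-1] + theta * (mu_level - y[-1]) * dt + random.gauss(0, 1))

# Regress y[t] - y[t-1] on y[t-1]; for an OU process the slope is
# approximately -theta, giving a half-life of ln(2) / theta
x = y[:-1]
d = [b - a for a, b in zip(y[:-1], y[1:])]
mx, md = sum(x) / len(x), sum(d) / len(d)
slope = (sum((xi - mx) * (di - md) for xi, di in zip(x, d))
         / sum((xi - mx) ** 2 for xi in x))
half_life = math.log(2) / -slope
```

The recovered half-life (here around ln(2)/0.3 ≈ 2.3 bars) is a common input for choosing lookback windows in mean reversion strategies; on real data it drifts over time, which is exactly the dynamism the post describes.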

In the last article, I described an application of the k-means clustering algorithm for classifying candlesticks based on the relative position of their open, high, low and close. This was a simple enough exercise, but now I tackle something more challenging: isolating information that is both useful and practical to real trading. I'll initially try two approaches:

- Investigate whether there are any statistically significant patterns in certain clusters following others
- Investigate the distribution of next-day returns following the appearance of a candle from each cluster

The insights gained from this analysis will hopefully inform the next direction of this research. Data preliminaries: In the last article, I classified twelve months of daily candles (June 2014 - July 2015) into eight clusters. To simplify the analysis and ensure that enough instances of each cluster are observed, I'll reduce the number of clusters to four and extend the history to cover 2008-2015. I'll exclude my 2015 data for now in case I need a final, unseen test set at some point in the future. Here's a subset of the candles over the entire price history (2008-2014, 2015...
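The core idea - featurising candles by where their open and close sit within the bar's range, then clustering - can be sketched in a few lines. The candles and the tiny k-means below are illustrative stand-ins, not the article's actual data or implementation:

```python
import random

# Made-up daily candles as (open, high, low, close)
candles = [
    (10.0, 11.0, 9.8, 10.9),   # closes near its high (bullish shape)
    (10.5, 10.6, 9.5, 9.6),    # closes near its low (bearish shape)
    (10.0, 10.9, 9.9, 10.8),
    (10.4, 10.5, 9.4, 9.5),
]

def features(c):
    """Position of open and close within the bar's high-low range."""
    o, h, l, cl = c
    rng = h - l
    return ((o - l) / rng, (cl - l) / rng)

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: returns final centers and point groupings."""
    random.seed(seed)
    centers = random.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            groups[i].append(p)
        centers = [tuple(sum(v) / len(g) for v in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

pts = [features(c) for c in candles]
centers, groups = kmeans(pts, k=2)
```

With cluster labels in hand, the two analyses above reduce to counting cluster-to-cluster transitions and tabulating next-day returns grouped by cluster label.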