dplyr

Posted on May 28, 2020 by Kris Longmore

When data is too big to fit into memory, one approach is to break it into smaller pieces, operate on each piece, and then join the results back together. Here's how to apply that approach to calculating rolling mean pairwise correlations for a large stock universe.

Background

We've been using the problem of calculating mean rolling correlations of ETF constituents as a test case for solving in-memory computation limitations in R. We're interested in this calculation as a research input to a statistical arbitrage strategy that leverages ETF-driven trading in the constituents. We wrote about an early foray into this trade. Previously, we introduced this problem, along with the concept of profiling code for performance bottlenecks, here. We can do the calculation in memory without any trouble for a regular ETF, say XLF (the SPDR financial sector ETF), but we quickly run into problems if we want to look at SPY. In this post, we're going to explore one workaround for R's in-memory limitations: splitting the problem into smaller pieces and recombining them to get our desired result.

The problem

When...
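The split-apply-combine pattern described above can be sketched in a few lines of R. This is a minimal illustration only, using a made-up long dataframe `prices` with `ticker`, `date`, and `return` columns (not the actual dataset from the post):

```r
library(dplyr)

# Toy long-format data: one row per ticker per day (values are illustrative)
set.seed(1)
prices <- tibble(
  ticker = rep(c("A", "B", "C"), each = 5),
  date   = rep(seq.Date(as.Date("2020-01-01"), by = "day", length.out = 5), 3),
  return = rnorm(15)
)

# 1. Split into smaller pieces (one per ticker here; in practice, per chunk of pairs)
pieces <- split(prices, prices$ticker)

# 2. Operate on each piece independently (a toy summary stands in for the
#    rolling correlation calculation)
results <- lapply(pieces, function(df) {
  df %>% summarise(ticker = first(ticker), mean_return = mean(return))
})

# 3. Join the results back together
combined <- bind_rows(results)
combined
```

Each piece fits comfortably in memory on its own, which is the whole point: the peak memory footprint is set by the largest piece, not by the full problem.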

Posted on May 22, 2020 by Kris Longmore

Recently, we wrote about calculating mean rolling pairwise correlations between the constituent stocks of an ETF. The tidyverse tools dplyr and slider solve this somewhat painful data wrangling operation about as elegantly and intuitively as possible.

Why did you want to do that?

We're building a statistical arbitrage strategy that relies on indexation-driven trading in the constituents. We wrote about an early foray into this trade - we're now taking the concepts a bit further.

But what about the problem of scaling it up?

When we performed this operation on the constituents of the XLF ETF, our largest intermediate dataframe consisted of around 3 million rows, easily within the capabilities of modern laptops. XLF currently holds 68 constituent stocks, so for any day we have [latex] \frac{68 \times 67}{2} = 2,278 [/latex] correlations to estimate (67 because we exclude the diagonal of the correlation matrix, and we halve the count because we only need its upper or lower triangle). We calculated five years of rolling correlations at roughly 250 trading days per year, so we had [latex] 5 \times 250 \times 2,278 = 2,847,500 [/latex] correlations in total. Piece of cake.

The problem gets a lot...
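The back-of-envelope count above is easy to verify in R:

```r
n <- 68                                 # XLF constituent count
pairs_per_day <- n * (n - 1) / 2        # unique pairs: one triangle of the
                                        # correlation matrix, no diagonal
total <- 5 * 250 * pairs_per_day        # five years of ~250 trading days

pairs_per_day                           # 2278
total                                   # 2847500
```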

Posted on May 20, 2020 by Kris Longmore

Working with modern APIs, you will often have to wrangle data in JSON format. This article presents some tools and recipes for working with JSON data with R in the tidyverse. We'll use purrr::map functions to extract and transform our JSON data, and we'll provide intuitive examples of the cross-overs and differences between purrr and dplyr.

```r
library(tidyverse)
library(here)
library(kableExtra)

pretty_print <- function(df, num_rows) {
  df %>%
    head(num_rows) %>%
    kable() %>%
    kable_styling(full_width = TRUE, position = 'center') %>%
    scroll_box(height = '300px')
}
```

Load JSON as nested named lists

This data has been converted from raw JSON to nested named lists using jsonlite::fromJSON with the simplify argument set to FALSE (that is, all elements are converted to named lists). The data consists of market data for SPY options with various strikes and expiries. We got it from the options data vendor Orats, whose data API I enjoy almost as much as their orange website. If you want to follow along, you can sign up for a free trial of the API, and load the data directly from the Orats API with the...
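As a taste of the purrr workflow on nested named lists, here's a minimal sketch. The list structure below is purely illustrative - it mimics the shape of `fromJSON(..., simplify = FALSE)` output, not the actual Orats schema:

```r
library(purrr)
library(tibble)

# Toy nested named lists, as jsonlite::fromJSON would produce with
# simplification turned off (field names are made up for illustration)
chains <- list(
  list(strike = 300, expiry = "2020-06-19", bid = 5.1),
  list(strike = 305, expiry = "2020-06-19", bid = 3.8)
)

# map_dbl pulls one numeric field out of each list element by name
strikes <- map_dbl(chains, "strike")
strikes

# map_dfr applies a function to each element and row-binds the results
# into a single tibble - a common "rectangle the JSON" move
chains_df <- map_dfr(chains, ~ as_tibble(.x))
chains_df
```

From here the result is an ordinary dataframe, so the usual dplyr verbs take over - which is exactly the purrr-to-dplyr hand-off the article explores.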

Posted on May 18, 2020 by Kris Longmore

How might we calculate rolling correlations between constituents of an ETF, given a dataframe of prices? For problems like this, the tidyverse really shines. There are a number of ways to solve this problem … read on for our solution, and let us know if you'd approach it differently!

First, we load some packages and some data that we extracted earlier. xlfprices.RData contains a dataframe, prices_xlf, of constituents of the XLF ETF and their daily prices. You can get this data from our GitHub repository. The dataset isn't entirely accurate, as it contains prices of today's constituents and doesn't account for historical changes to the makeup of the ETF. But that won't matter for our purposes.

```r
library(tidyverse)
library(lubridate)
library(glue)
library(here)

theme_set(theme_bw())

load(here::here("data", "xlfprices.RData"))

prices_xlf %>%
  head(10)
```

```
## # A tibble: 10 x 10
##    ticker date        open  high   low close  volume dividends closeunadj inSPX
##    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>   <dbl>     <dbl>      <dbl> <lgl>
##  1 AFL    2019-11-29  54.8  55.1  54.8  54.8 1270649         0       54.8 TRUE
##  2 AIG    2019-11-29  52.8  53.2  52.6  52.7 2865501         0       52.7 TRUE
##...
```
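One building block for this kind of problem is a rolling correlation between two return series, which the slider package handles neatly. A sketch with simulated returns (the 20-observation window is arbitrary, and the data here is random rather than XLF prices):

```r
library(slider)

set.seed(42)
x <- rnorm(100)   # stand-in daily returns for stock 1
y <- rnorm(100)   # stand-in daily returns for stock 2

# Rolling 20-observation correlation; .complete = TRUE returns NA
# until a full window of data is available
roll_cor <- slide2_dbl(x, y, cor, .before = 19, .complete = TRUE)

head(roll_cor, 21)   # first 19 values are NA, then correlations appear
```

Doing this for every pair of constituents, then averaging by date, gives the mean rolling pairwise correlation the post builds up to.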

Posted on May 14, 2020 by Robot James

In this post, we're going to show how a quant trader can manipulate stock price data using the dplyr R package.

Getting set up and loading data

Load the dplyr package via the tidyverse package.

```r
if (!require('tidyverse')) install.packages('tidyverse')
library(tidyverse)
```

First, load some price data. energystockprices.RDS contains a data frame of daily price observations for 3 energy stocks.

```r
prices <- readRDS('energystockprices.RDS')
prices
```

We've organised our data so that:

- Every column is a variable.
- Every row is an observation.

In this data set:

- We have 13,314 rows in our data frame.
- Each row represents a daily price observation for a given stock.
- For each observation, we measure the open, high, low and close prices, and the volume traded.

This is a very helpful way to structure your price data. We'll see how we can use the dplyr package to manipulate price data for quant analysis.

The main dplyr verbs

There are 6 main functions to master in dplyr:

- filter() picks out observations (rows) by some filter criteria
- arrange() reorders the observations (rows)
- select() picks out the variables (columns)
- mutate() creates new variables (columns) by...
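To make the verbs concrete, here's a quick sketch chaining the first four together on a toy OHLCV dataframe. The tickers and values are made up for illustration, so you don't need `energystockprices.RDS` to run it:

```r
library(dplyr)

# Made-up daily price observations in the long format described above
prices <- tibble(
  ticker = c("XOM", "XOM", "CVX", "CVX"),
  date   = as.Date(c("2020-01-02", "2020-01-03", "2020-01-02", "2020-01-03")),
  close  = c(70.3, 70.9, 121.4, 120.5),
  volume = c(1.0e6, 1.2e6, 8.0e5, 9.0e5)
)

result <- prices %>%
  filter(ticker == "XOM") %>%     # pick out observations (rows)
  arrange(desc(date)) %>%         # reorder the observations
  select(ticker, date, close) %>% # pick out the variables (columns)
  mutate(log_close = log(close))  # create a new variable

result
```

Because every verb takes a dataframe and returns a dataframe, they compose naturally with the pipe - which is what makes dplyr so pleasant for this kind of analysis.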

Posted on May 12, 2020 by Robot James

In this post, we are going to construct snapshots of historic S&P 500 index constituents from freely available data on the internet.

Why? Well, one of the biggest challenges in looking for opportunities amongst a broad universe of stocks is choosing which stock "universe" to look at. One approach is to pick the stocks that are currently in the S&P 500 index. Unfortunately, the stocks that are currently in the S&P 500 index weren't all there last year. A third of them weren't there ten years ago...

If we create a historical data set by picking current S&P 500 index constituents, then we will be including historical data for smaller stocks that weren't in the index at that time. These are all going to be stocks that did very well, historically, or else they wouldn't have gotten into the index! So this universe selection technique biases our stock returns higher. The average past returns of current SPX constituents are higher than the average past returns of historic SPX constituents, due to this upward bias. It's easy...

Posted on Apr 30, 2020 by Robot James

In the eye of the recent storm, with VIX up over 50, many traders were looking to "short the VIX" using products like TVIX. “Surely it's going to come back down?” Well, yeah, it will, eventually, but that doesn't mean that you can profitably short VIX products. First, some basics…

What is VIX?

VIX is a benchmark index for SPX volatility. It shows the SPX options market's expected value of volatility over the next 30 days. You can think of it as “What does the SPX options market think SPX volatility will be over the next 30 days?” Technically, it is calculated from the price of a 30-day variance swap and then converted to volatility terms. So the VIX index is just a mathematical calculation… you can't trade it. So what can you trade?

VX Futures

The CBOE lists VX futures, which settle to the cash value of the VIX index at their expiration date. For example, the 20 May VX contract settles to the value of the VIX index on 20 May. (Give or take… there are a few technicalities...

Posted on Mar 31, 2020 by Kris Longmore

To say we're living through extraordinary times would be an understatement. We saw the best part of 40% wiped off stock indexes in a matter of weeks, unprecedented coordinated central bank intervention on a global scale, and an unfolding health crisis that for many has already turned into a tragedy.

As an investor or trader, what do you do? You manage your exposures the best you can, dial everything down, and go hunting for the opportunities that inevitably present themselves in a stressed-out market. We've been hunting pretty much since this thing kicked off - and we want to show you what we found. And, more importantly, the tools and approach we used to find it.

To that end, we are opening the gates to our Robot Wealth Pro community, a tight-knit network of independent traders with whom we share our firm's research, data, systematic trading strategies, and real-time ideas. We normally insist that you go through an introductory Bootcamp before joining our Pro team, but these are extraordinary times and we want to get after these opportunities as...