slider Archives - Robot Wealth

# slider

Posted on May 28, 2020 by

When data is too big to fit into memory, one approach is to break it into smaller pieces, operate on each piece, and then join the results back together. Here's how to do that to calculate rolling mean pairwise correlations for a large stock universe.

Background

We've been using the problem of calculating mean rolling correlations of ETF constituents as a test case for working around in-memory computation limits in R. We're interested in this calculation as a research input to a statistical arbitrage strategy that leverages ETF-driven trading in the constituents. We wrote about an early foray into this trade. We previously introduced this problem, along with the idea of profiling code to find performance bottlenecks, here.

We can do the calculation in memory without any trouble for a smaller ETF such as XLF (the SPDR financial sector ETF), but we quickly run into problems if we want to look at SPY. In this post, we explore one workaround for R's in-memory limitations: splitting the problem into smaller pieces and recombining them to get the desired result.

The problem

When...

Posted on May 22, 2020 by
Recently, we wrote about calculating mean rolling pairwise correlations between the constituent stocks of an ETF. The tidyverse tools dplyr and slider handle this somewhat painful data-wrangling operation about as elegantly and intuitively as possible.

Why did you want to do that?

We're building a statistical arbitrage strategy that relies on indexation-driven trading in the constituents. We wrote about an early foray into this trade; we're now taking the concepts a bit further.

But what about the problem of scaling it up?

When we performed this operation on the constituents of the XLF ETF, our largest intermediate dataframe had around 3 million rows, easily within the capabilities of a modern laptop. XLF currently holds 68 constituent stocks, so for any day we have $\frac{68 \times 67}{2} = 2{,}278$ correlations to estimate. The factor of 67 arises because we exclude the diagonal of the correlation matrix (each stock's correlation with itself). We halve the product because the matrix is symmetric, so we only need its upper or lower triangle. We calculated five years of rolling correlations at roughly 250 trading days per year, so we had $5 \times 250 \times 2{,}278 = 2{,}847{,}500$ correlations in total. Piece of cake.

The problem gets a lot...
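The pair count above is the number of ways to choose 2 stocks from $n$, that is $\binom{n}{2} = \frac{n(n-1)}{2}$. The sketch below checks that arithmetic in two ways: with the closed form, and by walking the strict upper triangle of the correlation matrix.

Some assumptions to note:

- The posts themselves work in R. This plain Python is only an illustration, and the helper names `n_pairs` and `total_correlations` are hypothetical.
- The 500-stock universe is a hypothetical, roughly index-sized example to show how quickly the count grows. It is not a figure from the post.

```python
from math import comb

def n_pairs(n_stocks: int) -> int:
    """Unique pairs: off-diagonal entries in one triangle of an n x n correlation matrix."""
    return n_stocks * (n_stocks - 1) // 2

def total_correlations(n_stocks: int, years: int, days_per_year: int = 250) -> int:
    """Correlations to estimate when every trading day gets a full set of pairs."""
    return years * days_per_year * n_pairs(n_stocks)

# Same count by brute force: walk the strict upper triangle (j > i skips the diagonal)
upper_triangle = sum(1 for i in range(68) for j in range(i + 1, 68))

print(n_pairs(68), comb(68, 2), upper_triangle)  # 2278 2278 2278
print(total_correlations(68, years=5))           # 2847500

# Hypothetical 500-stock universe over the same five-year window
print(n_pairs(500), total_correlations(500, years=5))  # 124750 155937500
```

The count grows with the square of the number of stocks. Going from 68 to a hypothetical 500 stocks multiplies the number of correlations by roughly 55.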
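The May 28 excerpt describes breaking the calculation into smaller pieces, operating on each, and recombining the results. It doesn't say how the data is divided, so the sketch below splits by stock pair. This is one possible choice, not the authors' R/slider method.

The sketch generates pairs lazily and computes trailing-window rolling correlations for one chunk of pairs at a time. Only about `chunk_size × n_days` correlations are held at once, rather than `n_pairs × n_days`.

Some assumptions to note:

- The daily returns themselves fit in memory, as a dict of equal-length lists.
- `chunked_mean_rolling_cor`, `in_chunks` and the tickers `S0`…`S5` are hypothetical names.
- The data in the usage example is simulated.

```python
import math
import random
from itertools import combinations, islice

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rolling_cor(x, y, window):
    """Trailing-window correlation; None until a full window is available."""
    out = [None] * len(x)
    for t in range(window - 1, len(x)):
        lo = t - window + 1
        out[t] = pearson(x[lo:t + 1], y[lo:t + 1])
    return out

def in_chunks(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

def chunked_mean_rolling_cor(returns, window, chunk_size):
    """Mean pairwise rolling correlation per day, computed one chunk of pairs at a time.

    returns: dict mapping ticker -> list of daily returns, all the same length.
    """
    n_days = len(next(iter(returns.values())))
    total = [0.0] * n_days  # running sum of pairwise correlations for each day
    count = [0] * n_days    # running number of correlations added for each day
    pairs = combinations(sorted(returns), 2)  # generated lazily, never all at once
    for chunk in in_chunks(pairs, chunk_size):
        # Operate on one piece: rolling correlations for this chunk's pairs only
        piece = [rolling_cor(returns[a], returns[b], window) for a, b in chunk]
        # Combine: fold the piece into the running totals; it is replaced next loop
        for series in piece:
            for t, c in enumerate(series):
                if c is not None:
                    total[t] += c
                    count[t] += 1
    return [total[t] / count[t] if count[t] else None for t in range(n_days)]

# Usage with simulated returns for six hypothetical tickers
random.seed(0)
sim = {f"S{i}": [random.gauss(0, 0.01) for _ in range(60)] for i in range(6)}
mean_cor = chunked_mean_rolling_cor(sim, window=20, chunk_size=4)
```

The pieces are combined through running sums and counts rather than by averaging each chunk's mean. This keeps every pair equally weighted even when the last chunk is smaller (here 15 pairs split as 4, 4, 4, 3) or some windows are incomplete.

Splitting by date range instead would also work. In that case, each chunk would need the preceding `window - 1` days of overlap so that its first rolling windows are complete.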