Way back in November 2007, literally weeks after SPX put in its pre-GFC all-time high, Friedman, Hastie and Tibshirani published their Graphical Lasso algorithm for estimation of the sparse inverse covariance matrix.
Are you suggesting that Friedman and his fellow titans of statistical learning somehow caused the GFC by publishing their Graphical Lasso algorithm?
Not at all. I’m just setting you up to demonstrate the fallacy of mistaking correlation with causation (thanks for playing along).
Seeing patterns where there are none is part of what it means to be human. Of course, Friedman and his gang of statisticians didn’t cause the GFC. But they did help us deal with our psychological flaws by providing us with a powerful tool for detecting spurious correlations.
Their tool allows one to figure out if variables are correlated with one another directly, or whether any measured connection is merely due to a common connection to something else.
Confusing? Let's look at an example.
Consider the two stocks ABB, a multinational industrial automation company, and PUK, a multinational life insurance company. Over the period 2015 to 2018, these companies’ returns had a correlation of around 0.6 – which suggests a significant relationship.
Now consider the classical pairs trade, where we bet on the convergence of related financial instruments following a dislocation in their prices. How would you feel about trading ABB and PUK as a pair? Would you be willing to bet that if they diverged, they’d come back together thanks to their significant correlation over the sample period?
Of course, the answer is no. You intuitively know that this isn’t a sensible bet. You know that these stocks aren’t moving together because they’re related to one another, but because they’re huge firms with similar beta to the broader market.
Said differently, even though they’re correlated, they don’t explain one another’s returns. A third variable, the S&P500 (to which they both have a strong relationship), is the main driver of their correlation.
ABB and PUK are therefore conditionally independent given the S&P500.
Conditionally independent means that there is no direct relationship between ABB and PUK when you account for the effects of other variables.
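To make this concrete, here's a quick sketch in R (hypothetical variable names; it assumes you have aligned daily return vectors `abb`, `puk` and `spx`). We regress each stock's returns on the market and then correlate the residuals:

```r
# remove the market effect from each stock's returns
res_abb <- residuals(lm(abb ~ spx))
res_puk <- residuals(lm(puk ~ spx))

cor(abb, puk)          # raw correlation: around 0.6 over the sample period
cor(res_abb, res_puk)  # given the market: near zero if conditionally independent
```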
This is a common feature of correlation estimates from stock returns data. Such estimates are often misleading due to spurious correlations and the existence of confounding variables related to both returns series.
Where does the Graphical Lasso fit into this?
Given a series of stock returns:
- The Graphical Lasso can be used to estimate an inverse covariance matrix.
- The elements of the inverse covariance matrix are proportional to the partial correlation coefficients between a pair of stocks.
- The partial correlation of two variables is a measure of their relationship given all the other variables in the data set.
That is, the Graphical Lasso can help us remove effects such as market beta and recover real, direct relationships between stocks.
Does this tell us anything we didn’t already know?
To be fair, you don’t really need a fancy machine-learning algorithm to tell you that industrial automation companies aren’t directly related to life insurance providers. Intuition and common sense will do.
However, Graphical Lasso can still be useful:
- It can provide validation of relationships determined through other means (including intuition)
- It can highlight hidden relationships that don’t necessarily surface through other means
- It can quantify the strength of a relationship, since the magnitude of the partial correlation is informative
- It can help us construct visualisation tools, such as the interactive network graph we'll build shortly, which help us reason about and understand a large universe of stocks
Further, it makes intuitive sense that in a large universe, most stocks would be conditionally independent. Therefore, we'd favour an inverse covariance matrix that highlights strong relationships and zeroes out correlations that are explained by a third variable. The Graphical Lasso algorithm allows us to refine this sparsity condition by tuning its only parameter. More on this shortly.
The inverse covariance matrix’s relationship to partial correlation
There are some great resources that explore in excruciating detail the math behind the Graphical Lasso and the inverse covariance matrix.
There’s little point repeating that material here, but I do think there’s value in clarifying something that tripped me up. When I first used this tool, I assumed that the terms in the inverse covariance matrix were equivalent to the partial correlation between the two corresponding variables.
This is wrong.
The inverse covariance matrix is proportional to the partial correlation. There’s another step needed to transform an element in the inverse covariance matrix to the corresponding partial correlation. But perhaps surprisingly, information about how to do this transformation is in somewhat short supply.
After much searching, I found the details buried at about the eight-minute mark of the twenty-second (!) lecture of Ryan Tibshirani’s Statistical Machine Learning course taught at Carnegie Mellon in the Spring of 2017.
(Fun fact: Ryan is the son of Robert Tibshirani, one of the authors of the original graphical lasso paper. Imagine the dinner conversations in that household.)
On the off chance that you don’t have time to rip through a post-graduate course in statistical learning, here’s the critical information:
If \( R \) is a matrix of partial correlations and \( \Omega \) is the corresponding inverse covariance matrix, then the \( j^{th}, k^{th} \) element of \( R \) is given by:
\( R_{j,k} = \frac{-\Omega_{j,k}}{\sqrt{\Omega_{j,j}\Omega_{k,k}}} \)

What's super interesting about this relationship is that the partial correlation is proportional to the negative of the corresponding element of the inverse covariance matrix (note the sign of the numerator in the equation above). Thus, if one were to assume that the elements of the inverse covariance matrix directly corresponded to the partial correlations, one would end up with anti-correlation where there was positive correlation, and vice versa. Quite the error!
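A tiny numerical check (my own sketch, not from the lecture) makes the sign flip concrete. For two variables with correlation 0.6:

```r
Sigma <- matrix(c(1, 0.6, 0.6, 1), nrow=2)  # covariance matrix with correlation 0.6
Omega <- solve(Sigma)                       # inverse covariance (precision) matrix

Omega[1, 2]                                     # -0.9375: negative, despite the positive correlation
-Omega[1, 2] / sqrt(Omega[1, 1] * Omega[2, 2])  # 0.6: the transform recovers the correct sign and magnitude
```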
Applying the Graphical Lasso to stock data using R
Now for the fun part.
We’re going to take a universe of US equities and apply the Graphical Lasso algorithm to estimate an inverse covariance matrix. Then, we’ll apply the transform given by the equation above to construct a sparse matrix of partial correlations.
We can think of this sparse matrix as representing a network with edges (connections) between nodes (stocks) that have some sort of relationship, independent of any of the other variables.
Thinking of our matrix in this way leads us to the concept of a network graph which we can use as a visual tool to aid our understanding of and ability to reason about a large universe of stocks.
Our data consists of daily returns for roughly the top 1,100 US stocks by market cap between 2010 and 2019. Each returns series is standardised to have zero mean and unit variance.
First, we use the DBSCAN clustering algorithm to group stocks into clusters based on their loadings on statistical factors obtained from Principal Components Analysis (PCA). In our graph, we will colour stocks according to their cluster. All going well, we should see more connections between stocks within the same cluster.
We’ll gloss over the code for performing the clustering operations here – the subject of another blog post perhaps.
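That said, here's a minimal sketch of the idea (my own, with placeholder parameters; it assumes a days-by-stocks matrix `returns` of standardised daily returns):

```r
library(dbscan)

# each stock's loadings on the first few principal components place it in "factor space"
pca <- prcomp(returns)
loadings <- pca$rotation[, 1:5]

# cluster stocks by their factor loadings; DBSCAN labels unclassifiable stocks as cluster 0
clustering <- dbscan(loadings, eps=0.1, minPts=5)
cl <- data.frame(ticker=colnames(returns), cluster=clustering$cluster)
```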
Next, we calculate a covariance matrix of stock returns.
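With the same assumed `returns` matrix, that's a one-liner; and because the returns are standardised, the covariance matrix is also the correlation matrix:

```r
# covariance of standardised returns (equivalently, their correlation matrix)
S <- cov(returns)
```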
Since I can’t share our stock price database, you’ll find the covariance matrix and the output of the clustering algorithm linked below (in exchange for the princely sum of your email address).
I'll provide the code for you to reproduce the analysis from this point. We'll use the `glasso` package, which implements the Graphical Lasso algorithm; the `igraph` package, which contains tools for building network graphs; and the `threejs` and `htmlwidgets` packages for creating interactive plots.
The first thing we need to do is load these and a few other packages and the data:
# install and load required packages
required.packages <- c('glasso', 'colorRamps', 'igraph', 'RColorBrewer', 'threejs', 'htmlwidgets')
new.packages <- required.packages[!(required.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages, repos='http://cran.us.r-project.org')

library(glasso)
library(colorRamps)
library(igraph)
library(RColorBrewer)
library(threejs)
library(htmlwidgets)

# load data
load("./clusters_covmat.RData")
This will load the covariance matrix into the variable `S` and a dataframe of tickers and their corresponding clusters into the variable `cl`.
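A quick sanity check on the loaded objects never hurts (my habit, not part of the original code):

```r
dim(S)    # square: one row and column per stock
head(cl)  # tickers alongside their cluster labels
```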
Then, to apply the Graphical Lasso, we choose a value for `rho`, the regularisation parameter that controls the degree of sparsity in the resulting inverse covariance matrix. Higher values lead to greater sparsity.

In our application, there is no "correct" value of `rho`, but it can be tuned for your use case. For instance, if you wanted to isolate the strongest relationships in your data, you would choose a higher value of `rho`. If you were interested in preserving more tenuous connections, perhaps to identify stocks with connections to multiple groups, you'd choose a lower value of `rho`. Finding a sensible value requires experimentation.
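One way to get a feel for it (a sketch of my own) is to fit the model over a grid of `rho` values and count the edges each produces:

```r
# count non-zero partial correlations for a few candidate values of rho
for (r in c(0.5, 0.65, 0.75, 0.9)) {
  fit <- glasso(S, rho=r)
  n_edges <- (sum(fit$wi != 0) - nrow(S)) / 2  # off-diagonal non-zeros; each edge counted twice
  cat("rho =", r, "->", n_edges, "edges\n")
}
```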
It's also not a bad idea to check for symmetry in the resulting inverse covariance matrix. Asymmetry can arise from numerical computation and rounding errors, which can cause problems later, depending on what you want to do with the matrix.
# estimate the precision (inverse covariance) matrix using glasso
rho <- 0.75
invcov <- glasso(S, rho=rho)
P <- invcov$wi
colnames(P) <- colnames(S)
rownames(P) <- rownames(S)

# check symmetry; enforce it if numerical errors have broken it
if(!isSymmetric(P)) {
  P[lower.tri(P)] <- t(P)[lower.tri(P)]
}
Next, we calculate the partial correlation matrix and set the terms on the diagonal to zero – this prevents stocks from having connections with themselves in the network graph we'll construct shortly:
# calculate partial correlation matrix from the precision matrix
parr.corr <- matrix(nrow=nrow(P), ncol=ncol(P))
for(k in 1:nrow(parr.corr)) {
  for(j in 1:ncol(parr.corr)) {
    parr.corr[j, k] <- -P[j, k]/sqrt(P[j, j]*P[k, k])
  }
}
colnames(parr.corr) <- colnames(P)
rownames(parr.corr) <- rownames(P)

# zero the diagonal so stocks don't connect to themselves
diag(parr.corr) <- 0
Now if you run `View(parr.corr)` in RStudio, you'll see a very sparse partial correlation matrix. In fact, only about 6,000 of its roughly 1.35 million elements contain non-zeroes! The non-zero elements represent a connection between two stocks, with the strength of the connection determined by the magnitude of the partial correlation. Here's a snapshot that gives you an idea of the level of sparsity:
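You can also quantify the sparsity directly rather than eyeballing it (a quick check of my own):

```r
sum(parr.corr != 0)   # around 6,000 non-zero elements
mean(parr.corr != 0)  # as a fraction of all ~1.35 million entries
```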
The partial correlation matrix can be used to build a network graph, where stocks are represented as nodes and non-zero elements are represented as edges between two stocks.
The `igraph` package has some fantastic tools for building, manipulating and displaying graphs. We'll only use a fraction of the package's features here, but if you're interested in getting to know it, check out Katya Ognyanova's tutorial (it's really excellent and got me up and running with `igraph` in a matter of hours).
This next block of code constructs the network graph, assigns a colour to each node according to its cluster and drops any node with no connections.
# build network graph from the sparse partial correlation matrix
stock_graph <- graph_from_adjacency_matrix(parr.corr, mode="undirected", weighted=TRUE)

# colour nodes by cluster
V(stock_graph)$cluster <- as.numeric(cl$cluster)
num_clusters <- length(unique(cl$cluster))
cols <- colorRamps::primary.colors(n=num_clusters+1)
cols <- cols[2:length(cols)]  # hack to replace black colour with something else
V(stock_graph)$color <- cols[V(stock_graph)$cluster+1]

# drop vertices with no edges
isolated <- which(degree(stock_graph) == 0)
stock_graph <- delete.vertices(stock_graph, isolated)
And finally, we can construct, save and view (in a browser) an interactive network graph:
# make interactive graph
stock_graph_js <- graphjs(
  g=stock_graph,
  # can choose other layout algorithms; `?layout` for a list.
  # layout weights must be non-negative, and partial correlations can be negative, hence abs()
  layout_with_fr(stock_graph, weights=30*abs(E(stock_graph)$weight), dim=3),
  # vertex.shape=names(V(stock_graph)),  # plot nodes as tickers rather than circles
  vertex.size=0.7,
  vertex.frame.color="white",
  vertex.frame.width=0.2,
  vertex.label=names(V(stock_graph)),  # label nodes with tickers
  brush=TRUE,          # enable highlighting clicked nodes and their connections
  showLabels=TRUE,     # show node labels on hover
  edge.alpha=0.6,      # edge opacity - lower helps when there are dense connections
  bg="black",          # background colour
  main="Network graph from Graphical Lasso"
)

# save graph
graph_filename <- paste0("./network_graph_rho_", rho, ".html")
saveWidget(stock_graph_js, file=graph_filename)

# open in browser
browseURL(graph_filename)
Here’s the resulting network graph. You can interact with it by:
- Clicking and dragging to rotate the graph
- Scrolling your mouse wheel to zoom in and out
- Hovering on a node to see the name of the stock
- Clicking on a node to highlight its connections
Pretty cool hey?
What insights do you get from exploring the graph? Most obviously we see:
- A large lime green cluster with strong intra-cluster partial correlations corresponding to banks, asset managers and insurance companies.
- Another darker green cluster with strong intra-cluster partial correlations corresponding to REITs.
- An orange cluster with strong intra-cluster partial correlations corresponding to utilities.
- Stocks coloured red were unclassified by our clustering process. But where a connection exists, it seems to make sense. For instance, the Graphical Lasso identified a connection between WYNN and LVS, which both operate resorts in Las Vegas.
- The smaller purple cluster consists of residential construction companies.
- There’s another darker purple cluster, but its intra-cluster connections are weaker, resulting in small, dispersed groups of stocks from this cluster. It consists of oil and gas companies.
- The small aqua coloured cluster corresponds to Canadian banks.
- Finally, the small peach coloured cluster corresponds to global banks listed as ADRs in the US.
This is all very interesting. But isn’t it telling us what we already know?
Yes and no.
The results all make intuitive sense. You don’t need a fancy algorithm to tell you that the casinos of Las Vegas are exposed to similar risk factors.
But what about the stocks that have been filtered out of our graph? There are more residential construction companies in the market than appear in that little purple cluster, for instance. Is this filtering valuable?
In our recent Machine Learning and Big Data Bootcamp, we built an equity pairs trading universe selection model that whittles down a list of several million potential pairs to around twenty to trade in the next period. One of the inputs to that model was a sparse partial correlation matrix estimated using Graphical Lasso.
We found that in fourteen of seventeen years, pairs with a non-zero partial correlation outperformed the wider universe of potential pairs in the next period:
This plot shows the unleveraged annualised returns to a simple pairs trading algorithm for pairs whose constituent stocks had a non-zero partial correlation (pinkish-red bars) versus returns to pairs in the wider universe (greenish-blue bars). For each year, the partial correlation matrix was estimated on the prior three years’ returns.
This is just one input to a bigger model, but clearly it’s a useful one. What’s more, we found that this added value beyond the obvious things such as pairing stocks in the same industry group.
We can also make a less sparse graph in order to explore inter-cluster relationships. In this case, we'll use a smaller value of `rho` (0.65) and we'll drop any nodes that have zero or only one connection:
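Reproducing that variant only requires changing `rho` and tightening the pruning step (a sketch; the rest of the pipeline is unchanged):

```r
rho <- 0.65  # re-run the estimation and partial correlation steps with this value

# after rebuilding stock_graph, drop nodes with fewer than two connections
poorly_connected <- which(degree(stock_graph) <= 1)
stock_graph <- delete.vertices(stock_graph, poorly_connected)
```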
This is also interesting. We can see here certain stocks bridging the gaps between clusters. For instance, we have BHP (an Australian natural resources company listed as an ADR in the US) connected to other mining stocks, oil and gas companies and industrial manufacturers, such as CAT, which produces a lot of heavy equipment used in BHP’s vast operations.
We also see numerous connections to financial services companies from stocks in other sectors – basic materials, tech and industrials among them. Perhaps this is indicative of the central role of financial services in the modern economy.
Parting thoughts
Modern statistical learning techniques continue to transform the way in which we interact with data and the insights we can tease out. I find it quite astounding that the Graphical Lasso algorithm is able to take us from a noisy covariance matrix subject to all sorts of estimation errors, to a sparse matrix of partial correlations – the relationships between variables that remain when all the correlations with the other confounding variables have been stripped out.
One application is the universe selection problem in the context of equities pairs trading. Can you think of other use cases for the Graphical Lasso? I would love to hear them in the comments.
Hi, Kris. Thanks for the blog post helping me recap the Graphical Lasso. Is it possible to use it for feature selection, eliminating correlated features?
Hey Steven, that’s an interesting use case. I think Graphical Lasso could provide some insight into this problem, as it would help you figure out which features were correlated given other confounding variables. I think from a feature selection perspective you’d be more interested in figuring out which variables are the confounding ones and why – Graphical Lasso could help fill in one piece of the puzzle, but you’d want to look at that particular problem from multiple angles.
Hi, yes, using GL for feature selection is a good idea. Is GL a substitute for mutual information?
Hi Kris, great post! Do you think this can be used to select the subset of less correlated assets for a Markowitz-style optimisation – the part "left as an exercise for the reader" 🙂 in this post of jcl's? https://financial-hacker.com/get-rich-slowly/
If so, how would you get the uncorrelated assets? Are they the ones that don't have connections, or the ones that didn't even get plotted? Or both?
Thanks again for your content!
They'd be the ones that don't have connections – i.e. zeros in the precision matrix. Although that's not really going to help maximise anti-correlation – for that, I'd start with some simple algorithm that looked at pairwise correlations and took a subset that minimised their average.
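For what it's worth, a greedy version of that simple algorithm might look like this (a hypothetical sketch, not something from the post; `C` is a correlation matrix with tickers as column names):

```r
# repeatedly drop the asset with the highest average absolute correlation
# to the rest, until n_keep assets remain
select_uncorrelated <- function(C, n_keep) {
  while (ncol(C) > n_keep) {
    avg_cor <- colMeans(abs(C))   # the diagonal adds the same constant to every column
    worst <- which.max(avg_cor)   # the asset most correlated with the rest, on average
    C <- C[-worst, -worst, drop=FALSE]
  }
  colnames(C)  # tickers of the retained assets
}
```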