Data analysis plays a central role in making sense of financial markets. But how can you verify the conclusions others draw, or better yet, uncover your own insights?
Microsoft Excel remains one of the most powerful and accessible tools for financial data analysis, allowing anyone—from beginners to seasoned analysts—to explore data, test hypotheses, and make informed decisions.
Here’s an example of using Excel to analyse financial data, test conclusions, and uncover the hidden stories within the numbers.
Let’s get to it.
You may have seen a lot of charts like this recently:
That chart shows a proxy of S&P 500 “valuation” on the x-axis and the subsequent total returns of an S&P 500 investment on the y-axis.
The conclusions people tend to draw from this chart are:
- there is an obvious and strong relationship between valuation and expected future returns (cheap = good, expensive = bad)
- valuation estimates are currently historically high; therefore, expected returns of the S&P 500 are historically low.
We should always be wary of drawing strong conclusions from stuff people share on the internet or in sell-side research.
There are a few reasons to be sceptical of the strong conclusions people tend to make on seeing this:
- the chart might just be wrong (people screw up financial data analysis all the time)
- 10 years is a really long time horizon
- all of the 10-year total returns are actually positive
- why are there so many points? How many 10-year periods has the index even existed for?!
The good news is that, with a few simple skills, we don’t have to believe what randos on the internet say.
Even if we can’t write code, we can use Microsoft Excel and free online tools to investigate these things ourselves. We’re currently working on a new course to teach you how to do this.
In this particular case, it turns out Prof. Robert Shiller of the Yale Department of Economics makes all the data we need for this analysis available here: https://shillerdata.com/
The data looks like this:
One obvious initial observation is that the data goes back to 1871! The S&P 500 index officially began in 1957, so the earlier part of this dataset is reconstructed from other available data. You can read the docs if you’re interested in how he did this. (I’m not particularly interested. I’m happy just knowing not to trust that data.)
To recreate the chart above, we just need observations of:
- an S&P500 valuation metric (to plot on the x-axis)
- the total returns of the index over the subsequent 10-year period (to plot on the y-axis)
We’re going to use the Cyclically Adjusted Price-to-Earnings (CAPE) ratio for our valuation metric. (Because that’s what we have in the dataset, and every reasonable way of estimating valuation will look roughly the same.)
And we’re going to use the 10-year Annualized Real (after inflation) return as our return measurement.
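In Excel this is a formula dragged down a column; the same calculation can be sketched in Python/pandas. Here a toy monthly series stands in for the Shiller data, so the name `real_tr_index` is an assumption, not a column from the actual sheet:

```python
import numpy as np
import pandas as pd

# Toy monthly real total-return index standing in for the Shiller data
# (the actual sheet provides the real prices and dividends to build one).
real_tr_index = pd.Series(np.linspace(100, 200, 241))  # 20 years, monthly

# 10-year (120-month) forward annualized real return for each month
horizon = 120
fwd_10y = (real_tr_index.shift(-horizon) / real_tr_index) ** (12 / horizon) - 1
```

The last 120 rows have no forward window and come out as NaN, mirroring the blank cells at the bottom of the corresponding Excel column.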
We can select those columns in Excel and insert a scatterplot. It looks like this:
It looks similar to the chart we were trying to recreate.
Ours goes negative on the y-axis because we’re plotting real (after inflation) total returns, whereas I assume the original chart is just plotting total returns of the index.
Now we can start to pull this apart if we want. And Excel is pretty good for this kind of exploratory analysis.
The first thing we can address is that many observations are basically the same.
Each of those points is a monthly observation of the valuation proxy (x-axis), and S&P500 returns ten years following that observation (y-axis).
Ten years is 120 months. So, subsequent observations in our data will have 119 months in common.
Our dataset has 1726 monthly data points, but most of them are basically the same thing.
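The overlap arithmetic is easy to check directly; a minimal sketch:

```python
# Two adjacent 10-year (120-month) windows starting one month apart
window = 120
months = list(range(window + 1))
first = set(months[0:window])        # months 0..119
second = set(months[1:window + 1])   # months 1..120

shared = len(first & second)         # months the two windows have in common
```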
We can instead restrict our observations to unique non-overlapping ones like this:
This requires reducing our data down to only the unique data points. That’s not hard, but it involves a little bit of work. So first we can do something similar and easier by reducing the length of our return window.
Our observations are monthly, so if we plot next month’s returns vs CAPE, then all our observations are unique. That’s gonna look a lot noisier (because anything can happen in a month), but we might still expect to see an effect if one exists.
Essentially, we are doing this:
To do this, we need to calculate next month’s excess real returns in the table our data lives in and plot it against CAPE, as before.
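The new column is the same shift-and-divide pattern as before, just with a one-month window. A pandas sketch with a toy table (the column names are assumptions, not the Shiller sheet’s):

```python
import pandas as pd

# Toy monthly data standing in for the Shiller columns
df = pd.DataFrame({
    "real_tr_index": [100.0, 102.0, 101.0, 103.0],
    "cape": [25.0, 26.0, 24.5, 27.0],
})

# Next month's return, aligned to this month's row (like the Excel column)
df["next_month_return"] = df["real_tr_index"].shift(-1) / df["real_tr_index"] - 1
```

Plotting `next_month_return` against `cape` then gives the monthly scatterplot.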
Unsurprisingly, that looks like a big blob. (Anything with monthly returns on the y-axis will look like a big blob.)
But there is a weak negative linear relationship there.
Now that we’ve done the easy thing, we can collect more evidence by looking at the slightly harder thing: longer non-overlapping periods. Let’s stick with the 10-year forward window and look at decades.
To do this, we can use the Excel data model to pull our data from the Shiller website and then manipulate and filter it.
We pull the data from here and use the Excel data model to clean and format the columns correctly.
This will populate a table in Excel with the Shiller dataset that we can manipulate.
When the source dataset is updated, we can refresh our spreadsheet, and it will pull and process the latest data for us.
Now, we can pull that data into a different data model, add the decade to each observation, and filter the data so we only get the first observation each decade.
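The same decade filter can be sketched in pandas (the exact date range and column names here are assumptions):

```python
import pandas as pd

# Monthly dates roughly spanning the Shiller dataset (an assumption)
df = pd.DataFrame({"date": pd.date_range("1871-01-01", "2014-12-01", freq="MS")})

# Tag each observation with its decade, then keep the first row per decade
df["decade"] = df["date"].dt.year // 10 * 10
first_per_decade = df.groupby("decade", as_index=False).first()
```

One row per decade is all that survives, which is exactly why the sample shrinks so dramatically.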
Now we have unique observations.
The problem is that we now only have 15 observations! Ten years is a long time, and we simply don’t have that many unique non-overlapping ten-year periods.
And we certainly don’t have many unique non-overlapping ten-year periods that are similar to the current market structure and competitive environment.
But we never have enough data, and we have to do our best. So, let’s recreate that chart with our 15 non-overlapping 10-year observations.
We’d be a lot less sure about the effect, looking at this scatterplot. But it does still go down and to the left!
We can do the same thing with annual return observations. (We get 10 times as many points doing this.)
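With annual observations the same trick gives roughly ten times the sample: keep one observation per year and compute one-year forward returns. A sketch with a toy index standing in for the real data:

```python
import numpy as np
import pandas as pd

# Toy monthly real total-return index (stand-in for the Shiller data)
idx = pd.Series(np.linspace(100, 160, 121))  # 10 years, monthly

annual = idx.iloc[::12]                          # one observation per year
next_year_return = annual.shift(-1) / annual - 1  # 1-year forward returns
```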
We keep seeing evidence of a weak predictive effect that makes sense.
It’s nowhere near as strong as the original scatterplot suggested – but we’d probably bet on it being a thing.
Now, whenever we see an effect, we should ask what, other than our pet theory, might be causing it to appear. In particular, a lot has changed over this time period: the market today looks nothing like it did in 1900.
And, indeed, if we plot a time series of our valuation metric, it looks kinda drifty.
(Excuse the ugly x-axis. I’m lazy.)
It’s not really reasonable, I don’t think, to assume that CAPE 20 would “mean” the same thing in 2024 as it did in 1900.
One cheap and dirty way we can make that metric a bit less drifty and more comparable over time is to standardize it by its values over a recent rolling window.
For example, here I’ve standardized it as a rolling 10-year z-score. (Not necessarily cos I think that’s the right thing to do – I just want to make a point.)
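That standardization can be sketched in pandas with a toy drifty series (the 120-month window is the only choice taken from the text; everything else is an assumption):

```python
import numpy as np
import pandas as pd

# Toy drifty series standing in for CAPE
rng = np.random.default_rng(1)
cape = pd.Series(15 + np.cumsum(rng.normal(0.02, 0.5, 2000)))

# Standardize by a 10-year (120-month) rolling mean and std
window = 120
cape_z = (cape - cape.rolling(window).mean()) / cape.rolling(window).std()
```

Note the first 119 values are undefined (NaN), so the standardized series only starts ten years into the data.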
Now it looks a lot more stationary. It stays in the same range. It doesn’t drift off. This is unsurprising cos we forced it to look like that.
Now, we can plot next year’s returns against this standardized z-score.
If we still see an effect when we do this, it would make us more confident in the valuation effect. If we don’t, it won’t destroy our confidence because we’ve made some pretty arbitrary and dubious scaling choices here.
Indeed, at least with this scaling choice, we don’t see the effect we are looking for.
That’s ok. That’s the nature of work like this. We’re just exploring, trying to break things. We try to look at things from as many different angles as we can and see how much of the limited evidence lines up.
Excel is great for this kind of work because it’s very tactile and hands-on. We can filter our data to explore what these scatterplots look like for different periods in our dataset. We’re just trying to collect as many views as we can, weighing them up.
What do I think?
I think the evidence (and economic sense) supports the idea that high valuations are correlated with lower expected returns. But it’s nowhere near as clear-cut as the initial scatterplot suggests. We simply don’t have enough data, and the market is constantly changing underneath us, making it hard for us to draw strong inferences.
Would you sit out of stock index investments based on this? I wouldn’t.