This post presents an analysis of the SPY returns process using the QuantConnect research platform.
QuantConnect is a strategy development platform that lets you research ideas, import data, create algorithms, and trade in the cloud, all in one place. For this research, I’ve used their online research notebook, which came preinstalled with all the libraries and data (including intraday data) I needed to complete the analysis at no cost. It’s very impressive indeed.
I love those old predictions of what the future would look like. It’s funny how people viewed the future back then. You can see how their preconceived notions about the world were simply extrapolated forward.
Consider a flying wooden car. They guessed that cars would be flying by then (Elon, do it already!), but they missed the fact that wood would no longer be the preferred material for building cars. Take a look at this excellent article from the Independent.
The tale of beginner traders can be similar. They are deeply entrenched in their beliefs of how the market works, and they try to extrapolate based on those beliefs, never once questioning if they were right in the first place.
- Noob trader: I need to find the optimal ATR levels to place my trailing stop losses
- Quant trader: I need to research if the assumptions of a stop loss make sense in my trading strategy
Let’s see how a quant might approach starting to analyse a returns process…
The Backtest Cycle of Doom
When you’re doing research in financial data, you shouldn’t start from the perspective of trying to find a tradable strategy. Not right away, anyway. That can quickly lead you into what we call “The Backtest Cycle of Doom”.
You’ll eventually find a set of rules that made pretend money in your pretend trading, but it will almost certainly be nothing more than an overfit strategy. That’s not how a quant would approach the research phase.
A quant trader would start researching just for the sake of understanding, not goal seeking. They are trying to understand what drives the returns process, and they’re fully aware that they’ll never completely understand it.
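To make the cycle concrete, here is a minimal sketch on made-up data (not SPY). It "tests" 200 random long/short rules on pure noise and keeps the best backtest. The day count, the number of rules, and the `annualised_sharpe` helper are hypothetical choices for illustration only.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

N_DAYS = 500         # days in each sample (hypothetical)
N_STRATEGIES = 200   # number of rule variations "tested" (hypothetical)


def annualised_sharpe(returns):
    """Mean over standard deviation, scaled by sqrt(252) trading days."""
    return statistics.mean(returns) / statistics.stdev(returns) * 252 ** 0.5


# Pure-noise "market": zero-mean daily returns, so no rule has a real edge.
in_sample = [random.gauss(0.0, 1.0) for _ in range(N_DAYS)]
out_sample = [random.gauss(0.0, 1.0) for _ in range(N_DAYS)]

results = []
for _ in range(N_STRATEGIES):
    # Each "strategy" is just a random long (+1) or short (-1) position per day.
    positions = [random.choice((-1, 1)) for _ in range(N_DAYS)]
    is_sharpe = annualised_sharpe([p * r for p, r in zip(positions, in_sample)])
    oos_sharpe = annualised_sharpe([p * r for p, r in zip(positions, out_sample)])
    results.append((is_sharpe, oos_sharpe))

# Keep the rule with the best backtest, as the cycle of doom would.
best_in_sample, best_out_of_sample = max(results)
print(f"best in-sample Sharpe:    {best_in_sample:.2f}")
print(f"same rule, out-of-sample: {best_out_of_sample:.2f}")
```

Because the data is noise, none of these rules has an edge, yet the winning backtest still looks attractive in-sample. Its out-of-sample figure is just another random draw.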
The Holy Grail Delusion
Beginner traders often think that the big banks, the top hedge funds, the Illuminati, etc. are hiding some secret algorithm that predicts the markets at will and prints money on demand.
A good quant trader knows that any edge they might have in the market is very noisy and will only manifest itself after a long period of time, so they stop chasing the holy grail, and they diversify sensibly across many different edges.
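A rough way to put numbers on "a long period of time" is sketched below. It assumes the common approximation that the t-statistic of a Sharpe ratio is roughly the annualised Sharpe ratio times the square root of the number of years, which in turn assumes independent, identically distributed returns. The `years_to_significance` helper and the Sharpe values are hypothetical.

```python
def years_to_significance(annual_sharpe, t_stat=2.0):
    """Years of data needed before the Sharpe ratio's t-statistic reaches t_stat.

    Assumes t ~= annual_sharpe * sqrt(years), i.e. i.i.d. returns.
    """
    return (t_stat / annual_sharpe) ** 2


for sharpe in (0.5, 1.0, 2.0):
    print(f"Sharpe {sharpe}: ~{years_to_significance(sharpe):.0f} years")
```

Under these assumptions, a single edge with a Sharpe ratio of 0.5 needs around 16 years before it stands out from noise, which is one way to see why combining several independent edges is attractive.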
What follows is an example of basic analysis of the SPY returns process…
Example Research With QuantConnect Code
Using the QuantConnect ecosystem in a typical quant workflow.
We will be looking at:
- Distribution of returns
- Comparing SPY returns to a normal distribution
- Researching possible seasonal patterns in SPY returns
- Auto-correlation of returns
Note: This code is meant to be used within the QuantConnect research environment.
# Import dependencies
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

plt.style.use('ggplot')  # There is a positive correlation between chart prettiness and risk-adjusted returns
plt.rcParams['figure.figsize'] = [10, 7]

# QuantBook Analysis Tool
# Load SPY historical data
qb = QuantBook()
spy = qb.AddEquity("SPY")
history = qb.History(qb.Securities.Keys, 5000, Resolution.Daily)  # 5000 days of SPY daily data

# Drop pandas level: move the index into columns and drop the symbol column
history = history.reset_index().drop('symbol', axis=1)

# Calculate SPY returns in percent; the first row has no prior close, so fill it with 0
history['returns'] = (history['close'].pct_change() * 100).fillna(0)
1. Analysing the return distribution
Now that we have SPY daily returns, let’s quickly see what we’re dealing with.
history['returns'].describe()

count    5000.000000
mean        0.030071
std         1.235997
min       -11.638806
25%        -0.443536
50%         0.061797
75%         0.573180
max        11.360371
Name: returns, dtype: float64
Let’s look at the extreme values of returns, i.e. the max and min.
history[history['returns'] == history['returns'].min()]
history[history['returns'] == history['returns'].max()]
Within our sample, the recent corona drawdown produced the biggest single-day drop, and the biggest single-day up move came in 2008.
Let’s look at the distribution of daily returns for the SPY
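# Note: distplot is deprecated in newer seaborn releases, where histplot(..., kde=True) replaces it
sns.distplot(history['returns'], label='Distribution of SPY returns')
plt.legend()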
One thing that really stands out in the return distributions of financial assets is the tails (read: high kurtosis, if you are a nerd). This will become more evident once we compare it to a true normal distribution.
2. Comparing to a normal distribution
Let’s first create some random data and plot its distribution.
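import pandas as pd  # needed for pd.Series if the notebook has not already imported pandas

# Sample from a normal distribution with roughly the same standard deviation as SPY daily returns
random = np.random.normal(scale=1.23, size=500000)
sns.distplot(random, label='Returns sampled from normal distribution', color='blue')
plt.legend()
random_series = pd.Series(random)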
There it is, a beautiful, well-behaved normal distribution. Let’s see how this compares to our SPY returns distribution.
sns.distplot(history['returns'], label='Distribution of SPY returns')
sns.distplot(random, label='Returns sampled from normal distribution')
plt.legend()
Now the high kurtosis of the SPY returns becomes even more apparent.
So far we’ve learned that:
- SPY returns do resemble random returns
- but they have big tails in their distribution
- which means we can expect outsized moves to the upside and downside, more so than a normal distribution would suggest.
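The "big tails" point can also be checked numerically rather than by eye. The sketch below computes excess kurtosis (the fourth standardised moment minus 3, which is about 0 for a normal distribution) on synthetic samples. The Student-t sample is only a stand-in for fat-tailed returns, not SPY data, and the `excess_kurtosis` helper is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable


def excess_kurtosis(x):
    """Sample excess kurtosis: fourth standardised moment minus 3."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)


# Normal sample with the same scale used in the post
normal_sample = rng.normal(scale=1.23, size=500_000)

# Student-t with 5 degrees of freedom: a synthetic stand-in for fat-tailed returns
fat_tailed_sample = rng.standard_t(df=5, size=500_000)

print("normal:    ", excess_kurtosis(normal_sample))
print("fat-tailed:", excess_kurtosis(fat_tailed_sample))
```

On the real data, `history['returns'].kurt()` gives pandas' bias-corrected version of the same quantity, and a value well above 0 is the numerical counterpart of the tails seen in the plot.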
Now let’s look at a simple workflow for researching seasonal patterns in our financial data.
3. Researching possible seasonal patterns
So far we’ve looked at daily resolution returns data.
Now let’s look at hourly resolution data. Usually, this would cost you a bit of time and effort to get hold of, but it’s not a problem if you’re using QuantConnect.
from datetime import timedelta  # needed if the notebook has not already imported it

spy_hour = qb.History(qb.Securities.Keys, timedelta(days=5000), Resolution.Hour)  # 5000 days of SPY hourly data
spy_hour = spy_hour.reset_index().drop('symbol', axis=1)

# Extract the hour from each bar's timestamp
spy_hour['hour'] = spy_hour['time'].apply(lambda x: x.hour)

# Calculate hourly close-to-close returns in percent
spy_hour['returns'] = (spy_hour['close'].pct_change() * 100).fillna(0)
spy_hour.head()
Looking at close-to-close returns, going long overnight seems to be a good idea. The hour-10 bar is the first bar of each session here, so its close-to-close return is measured from the previous session’s last close and includes the overnight gap. A few research papers go into this effect in depth.
Here is one that we’ve looked at in our Trading Through Armageddon Bootcamp: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3546173
If we use open-to-close returns, the overnight effect in the hour-10 bar should no longer be there.
# Open-to-close return of each hourly bar, in percent
spy_hour['OC'] = (spy_hour['close'] - spy_hour['open']) / spy_hour['open'] * 100
sns.barplot(x='hour', y='OC', data=spy_hour)
Using open-to-close returns, we can see that the effect on the hour-10 bar vanishes. This is a good indicator that what we saw previously was an effect of the excess overnight return anomaly.
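The difference between the two return definitions is easy to see on a toy example. The sketch below uses six made-up hourly bars (not SPY data) where the price never moves intraday but gaps up 1% overnight. The column names `CC` and `by_hour` are hypothetical, while `OC` and `hour` follow the code above.

```python
import pandas as pd

# Two toy sessions of three hourly bars each: flat intraday, 1% gap up overnight
bars = pd.DataFrame({
    "time": pd.to_datetime([
        "2020-01-02 10:00", "2020-01-02 11:00", "2020-01-02 12:00",
        "2020-01-03 10:00", "2020-01-03 11:00", "2020-01-03 12:00",
    ]),
    "open":  [100.0, 100.0, 100.0, 101.0, 101.0, 101.0],
    "close": [100.0, 100.0, 100.0, 101.0, 101.0, 101.0],
})
bars["hour"] = bars["time"].dt.hour

# Close-to-close: the first bar of a session is compared with yesterday's last close
bars["CC"] = (bars["close"].pct_change() * 100).fillna(0)

# Open-to-close: each bar is compared with its own open, so the gap is excluded
bars["OC"] = (bars["close"] - bars["open"]) / bars["open"] * 100

by_hour = bars.groupby("hour")[["CC", "OC"]].mean()
print(by_hour)
```

All of the overnight move lands in the hour-10 row of the close-to-close column, while the open-to-close column stays at zero for every hour, which mirrors what the two bar plots show.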
A more intuitive way for traders to visualize this would be to plot the cumulative performance of the overnight returns against that of the intraday returns.
Keep in mind, this part is not there to make a trading strategy; it’s simply a more intuitive way of visualizing the effect.
# The hour-10 bar's close-to-close return carries the overnight gap; all other bars are intraday
spy_hour.query("hour == 10")['returns'].cumsum().plot(label='Overnight Returns')
spy_hour.query("hour != 10")['returns'].cumsum().plot(label='Intraday Returns')
plt.legend()
This indicates that there might be something there. It’s definitely interesting to see that SPY has gained more value outside trading hours than during them.
So that seemed to work pretty well, but how will you know when it didn’t?
# Remember our daily dataset
history['DOM'] = history['time'].apply(lambda x: x.day)  # day of month, 1 to 31
sns.barplot(x='DOM', y='returns', data=history)
Now if you squint really hard and close one eye, you might see a pattern here, and there might very well be one, but I find it hard to get excited about this.
To me it looks pretty random. There is no wrong answer here, and you could dig further into this one, but keep in mind that with day-of-month “seasonality” each bucket holds only a small fraction of the sample, so you will have far fewer data points to base your decisions on.
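A quick back-of-the-envelope calculation shows how thin each day-of-month bucket is. It uses the sample size and standard deviation from the summary statistics above and assumes, as a simplification, that observations are spread evenly over 31 buckets and are independent.

```python
import math

n_days = 5000      # daily observations in the sample
daily_std = 1.236  # standard deviation of daily returns, in percent
n_buckets = 31     # one bucket per day of the month (simplification: equal sizes)

obs_per_bucket = n_days / n_buckets
standard_error = daily_std / math.sqrt(obs_per_bucket)

print(f"~{obs_per_bucket:.0f} observations per bucket")
print(f"standard error of each bucket mean: ~{standard_error:.3f}%")
```

With roughly 161 observations per bucket, the standard error of a bucket's mean return is close to 0.1%, about three times the average daily return of 0.03%. Differences between bars of that size are well within what noise alone would produce.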
4. Looking for auto-correlation (trend) in the return process
Autocorrelation is the degree of similarity between a time series and a lagged version of itself over successive time intervals.
Essentially, what traders call a “trend” should be visible in an ACF plot as a positive relation of the time series with its lagged version.
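The definition above can be sketched in a few lines of NumPy. The series here are synthetic AR(1) processes, not SPY returns, and the `autocorr` helper and the 0.3 coefficients are hypothetical choices for illustration.

```python
import numpy as np


def autocorr(x, lag=1):
    """Sample autocorrelation at the given lag (the usual biased estimator)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[lag:] * d[:-lag]) / np.sum(d * d))


rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable
noise = rng.normal(size=5000)

# "Trending" series: each value carries over 30% of the previous one
trending = np.zeros(5000)
for t in range(1, 5000):
    trending[t] = 0.3 * trending[t - 1] + noise[t]

# "Mean-reverting" series: each value pushes against the previous one
reverting = np.zeros(5000)
for t in range(1, 5000):
    reverting[t] = -0.3 * reverting[t - 1] + noise[t]

print("noise:    ", autocorr(noise))
print("trending: ", autocorr(trending))
print("reverting:", autocorr(reverting))
```

The noise series should come out near 0, the trending series near +0.3, and the mean-reverting series near -0.3, which is the kind of signature to look for at lag 1 in an ACF plot.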
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.stattools import acf

acf_plot = plot_acf(history['returns'])
Over the whole dataset, we see negative auto-correlation at lag 1. This implies that negative returns have tended to be followed by positive returns, and vice versa… at least on average over all our data. Everything else looks like a wash.
I don’t really love summary plots… I like viewing time series…
So I want to calculate the lag 1 autocorrelation over a lookback window and see if the autocorrelation structure changes at all over time. We have a post geared towards beginners on how windowing works; check it out here.
# acf() returns the autocorrelations for lags 0..nlags, so index 1 is the lag 1 value
history['rolling_lag_1'] = history['returns'].rolling(window=100).apply(lambda x: acf(x, nlags=1)[1], raw=True)
history['rolling_lag_1'].plot(label='Rolling 100 window ACF lag 1')
plt.axhline(history['rolling_lag_1'].mean(), label='Mean', linestyle='--', color='green')
plt.legend()
We can see that the rolling lag 1 autocorrelation is anything but fixed; it varies over time.
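To see why a rolling view can reveal what a summary plot hides, here is a sketch on a synthetic series (not SPY) whose lag 1 autocorrelation flips sign halfway through. It uses a plain NumPy estimator inside `rolling().apply` rather than statsmodels. The `lag1_autocorr` helper and the ±0.4 coefficients are hypothetical.

```python
import numpy as np
import pandas as pd


def lag1_autocorr(window):
    """Lag 1 sample autocorrelation of one rolling window (NumPy array)."""
    d = window - window.mean()
    return np.sum(d[1:] * d[:-1]) / np.sum(d * d)


rng = np.random.default_rng(2)  # fixed seed so the sketch is repeatable
n = 2000

# First half positively autocorrelated, second half negatively autocorrelated
phi = np.where(np.arange(n) < n // 2, 0.4, -0.4)
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi[t] * x[t - 1] + eps[t]

series = pd.Series(x)
rolling_ac = series.rolling(window=100).apply(lag1_autocorr, raw=True)

first_half_mean = rolling_ac.iloc[100:1000].mean()
second_half_mean = rolling_ac.iloc[1100:].mean()
print("first half: ", first_half_mean)
print("second half:", second_half_mean)
```

Averaged over the whole series, the two regimes largely cancel out, so a single full-sample ACF plot would suggest there is little structure. The rolling estimate separates them.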
Visually inspecting the plot, there seems to be some relation between how far the lag 1 autocorrelation diverges from 0 and major stock market events.
A very good proxy for major stock market events would be market volatility itself, so let’s investigate whether there is any relationship between volatility and the rolling lag 1 autocorrelation of returns.
# Rolling 100-day standard deviation of daily returns (in percent), annualised with sqrt(252)
history['annVol'] = history['returns'].rolling(window=100).std() * np.sqrt(252)
history['annVol'].plot()
Now that we have our volatility, we can run a scatter plot on it and see how it relates to the rolling autocorrelation.
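The mechanics of pairing the two rolling series could look like the sketch below. It runs on random placeholder returns rather than SPY data, so it shows only the steps, not the result. The `toy` and `pairs` names and the `lag1_autocorr` helper are hypothetical, and the column names follow the code above.

```python
import numpy as np
import pandas as pd


def lag1_autocorr(window):
    """Lag 1 sample autocorrelation of one rolling window (NumPy array)."""
    d = window - window.mean()
    return np.sum(d[1:] * d[:-1]) / np.sum(d * d)


rng = np.random.default_rng(3)  # fixed seed so the sketch is repeatable

# Placeholder for the daily returns column (random noise, in percent)
toy = pd.DataFrame({"returns": rng.normal(scale=1.23, size=1500)})

toy["rolling_lag_1"] = toy["returns"].rolling(window=100).apply(lag1_autocorr, raw=True)
toy["annVol"] = toy["returns"].rolling(window=100).std() * np.sqrt(252)

# Keep only rows where both rolling values exist, then measure their correlation
pairs = toy[["annVol", "rolling_lag_1"]].dropna()
correlation = pairs["annVol"].corr(pairs["rolling_lag_1"])
print(len(pairs), correlation)

# In a notebook, the scatter itself could be drawn with:
# pairs.plot.scatter(x="annVol", y="rolling_lag_1")
```

One caveat with this kind of plot: neighbouring points come from heavily overlapping 100-day windows, so they are far from independent and the effective sample is much smaller than the number of dots.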
There seems to be a negative relationship between the volatility of SPY and its rolling lag 1 autocorrelation.
Don’t expect to see a smack-in-the-face, screaming effect in finance. It’s going to be subtle at best, and not there at worst.
But let’s see how this relationship held up through time.
We’ll now plot the same thing, but this time subsetting our data so it doesn’t get skewed by a few major events. The way you subset your data is a bit ad hoc. Just remember that there is a trade-off between the number of subsets you make and the level of noise in each limited subset. Here I’m just using 5-year periods.
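One possible way to form 5-year subsets is sketched below. The exact bucketing used for the plots is not shown in this post, so the calendar blocks here are an assumption, and the data is random placeholder values rather than SPY. The `toy`, `period`, and `per_period` names are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)  # fixed seed so the sketch is repeatable

# Business days as a stand-in for trading days
dates = pd.bdate_range("2000-01-01", "2019-12-31")
toy = pd.DataFrame({
    "time": dates,
    "annVol": rng.uniform(5, 40, size=len(dates)),         # placeholder values
    "rolling_lag_1": rng.normal(0, 0.1, size=len(dates)),  # placeholder values
})

# Label each row with the first year of its 5-year block: 2000, 2005, 2010, 2015
toy["period"] = (toy["time"].dt.year // 5) * 5

# Correlation between volatility and rolling autocorrelation within each block
per_period = {
    int(period): block["annVol"].corr(block["rolling_lag_1"])
    for period, block in toy.groupby("period")
}
for period, corr in per_period.items():
    print(period, round(corr, 3))
```

Shorter blocks give more subsets to compare but fewer observations in each, which is the trade-off described above.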
As expected, this is a noisy relationship, but it does seem to exist, and it’s quite persistent.
Here’s what we can summarize from this post.
- Always check your assumptions and beliefs
- Make as few assumptions as possible
- Stop searching for the perfect edge
- Approach the research with the intent of understanding, not finding a good backtest
- We see potentially interesting daily auto-correlation effects in SPY and an interesting relationship to volatility.