In this post, we explain how to use SQL to extend back ETF price data with total return data from mutual funds or indexes.
On Zero to Robot Master Bootcamp, we teach how to build a portfolio of three automated systematic trading strategies.
One of them is a long term Risk Premia Harvesting strategy which trades asset class ETFs.
ETFs are useful instruments for analysing long term (tradeable) performance of various asset classes – but many have been introduced only relatively recently and have limited data available.
It’s great to analyse the ETF data (cos that’s what we’re going to trade) but it’s also useful to look at longer-term history.
To this end, we’re using mutual fund and index data (adjusted for estimated expenses and other trading frictions) to “extend back” our ETF price data.
It’s often useful to be able to do this kind of transformation “in place”, so here’s how we did it in SQL.
Extending ETF Price data using SQL
We’re going to extend ETF data as follows:
- US Equities – Extend VTI with VFINX mutual fund
- EMEA Equities – Extend VEA with total returns from XMSC MSCI Index and subtract expected expenses
- EM Equities – Extend VWO with total returns from MXEA MSCI Index
- US Treasury notes – Extend IEF with VFITX mutual fund
- US Treasury bonds – Extend TLT with VUSTX mutual fund
- Emerging market bonds – Extend EMB with returns from T Row Price EM Bond fund (the most tenuous extension)
- Gold – Extend GLD with total returns from the GCSI Gold Index, extended back further with cost adjusted spot prices.
Our ETF price data is structured like this:
Each row represents a day of split- and dividend-adjusted OHLC price data for a given ticker.
The data we are going to use to extend back these prices are structured similarly…
Again, we have one row for each index per day.
returns
contains the periodic simple returns for the index.cumreturns
contains the cumulative returns for the index, calculated ascumprod(1+returns)
For clarity, here’s cumreturns
plotted for the stock indexes in Google Data Studio.
You can see all series start at 1 in 1995.
Now let’s extend our ETF data back.
- Get the first traded date for the ETFs
- Calculate the ratio of the close price on the first date, with
cumreturns
on the index on that date - Multiply
cumreturns
by this factor on the index data - Append the index data to the ETF prices.
Here’s the SQL code:
-- Map the ETFs with the indexes to extend them WITH extend AS ( SELECT 'VTI' as ETFticker, 'EQ_US' as indexticker UNION ALL SELECT 'VEA' as ETFticker, 'EQ_NONUS_DEV' as indexticker UNION ALL SELECT 'VWO' as ETFticker, 'EQ_EMER' as indexticker UNION ALL SELECT 'GLD' as ETFticker, 'GOLD' as indexticker UNION ALL SELECT 'IEF' as ETFticker, 'TN_US' as indexticker UNION ALL SELECT 'TLT' as ETFticker, 'TB_US' as indexticker UNION ALL SELECT 'EMB' as ETFticker, 'EQ_EMER' as indexticker ), -- Get the first traded date we have for the ETFs mindates AS ( SELECT ticker, MIN(date) as mindate FROM `rw-algotrader.master_daily_adj.ETF_d*` WHERE ticker in (SELECT ETFticker FROM extend) AND date >= '1990-01-01' GROUP BY ticker ), -- Get the overlap prices and scaling factor for the index data overlapprice AS ( SELECT ex.ETFticker, ex.indexticker, md.mindate as overlapdate, ep.close as etfclose, ip.cumreturns as indexclose, ep.close / ip.cumreturns as scalingfactor FROM `rw-algotrader.master_daily_adj. ETF*` ep INNER JOIN mindates md ON ep.ticker = md.ticker AND ep.date = md.mindate INNER JOIN extend ex ON ep.ticker = ex.ETFticker INNER JOIN `rw-algotrader.load_assetclass.assetclass_indexes` ip ON ex.indexticker = ip.ticker AND md.mindate = ip.date WHERE ep.date >= '1990-01-01' ) select ticker, date, open, high, low, close, volume FROM ( -- ETF prices SELECT ticker, date, open, high, low, close, volume FROM `rw-algotrader.master_daily_adj.ETF_d*` ep WHERE ticker IN (SELECT ETFticker FROM extend) AND date > '1990-01-01' UNION ALL -- Extend back with scaled historic cumulative returns select ov.ETFticker as ticker, ip.date, ip.cumreturns * ov.scalingfactor as open, ip.cumreturns * ov.scalingfactor as high, ip.cumreturns * ov.scalingfactor as low, ip.cumreturns * ov.scalingfactor as close, 0 as volume FROM `rw-algotrader.load_assetclass.assetclass_indexes` ip INNER JOIN overlapprice ov ON ip.ticker = ov.indexticker WHERE ip.date < ov.overlapdate ) x ORDER BY x.date
Let’s plot the extended data:
We now have a total return series which:
- Matches the (adjusted) close price of the ETF for the period in which the ETF was available
- Extends the series back with mutual fund or (cost-adjusted) index data.
Conclusion
It is extremely useful to be table to do basic financial data manipulation in SQL.
Code has been provided to allow you to extend ETF prices back in time with total return data from another source.
If you enjoyed this you’ll probably like these too…
How to Fill Gaps in Large Stock Data Universes Using tidyr and dplyr