In this post, we explain how to use SQL to extend back ETF price data with total return data from mutual funds or indexes.
On Zero to Robot Master Bootcamp, we teach how to build a portfolio of three automated systematic trading strategies.
One of them is a long term Risk Premia Harvesting strategy which trades asset class ETFs.
ETFs are useful instruments for analysing long term (tradeable) performance of various asset classes – but many have been introduced only relatively recently and have limited data available.

It’s great to analyse the ETF data (cos that’s what we’re going to trade) but it’s also useful to look at longer-term history.
To this end, we’re using mutual fund and index data (adjusted for estimated expenses and other trading frictions) to “extend back” our ETF price data.
It’s often useful to be able to do this kind of transformation “in place”, so here’s how we did it in SQL.
Extending ETF Price data using SQL
We’re going to extend ETF data as follows:
- US Equities – Extend VTI with VFINX mutual fund
- EMEA Equities – Extend VEA with total returns from XMSC MSCI Index and subtract expected expenses
- EM Equities – Extend VWO with total returns from MXEA MSCI Index
- US Treasury notes – Extend IEF with VFITX mutual fund
- US Treasury bonds – Extend TLT with VUSTX mutual fund
- Emerging market bonds – Extend EMB with returns from T Row Price EM Bond fund (the most tenuous extension)
- Gold – Extend GLD with total returns from the GCSI Gold Index, extended back further with cost adjusted spot prices.
Our ETF price data is structured like this:

Each row represents a day of split- and dividend-adjusted OHLC price data for a given ticker.
The data we are going to use to extend back these prices are structured similarly…

Again, we have one row for each index per day.
returnscontains the periodic simple returns for the index.cumreturnscontains the cumulative returns for the index, calculated ascumprod(1+returns)
For clarity, here’s cumreturns plotted for the stock indexes in Google Data Studio.

You can see all series start at 1 in 1995.
Now let’s extend our ETF data back.
- Get the first traded date for the ETFs
- Calculate the ratio of the close price on the first date, with
cumreturnson the index on that date - Multiply
cumreturnsby this factor on the index data - Append the index data to the ETF prices.
Here’s the SQL code:
-- Map the ETFs with the indexes to extend them
WITH extend AS (
SELECT 'VTI' as ETFticker, 'EQ_US' as indexticker UNION ALL
SELECT 'VEA' as ETFticker, 'EQ_NONUS_DEV' as indexticker UNION ALL
SELECT 'VWO' as ETFticker, 'EQ_EMER' as indexticker UNION ALL
SELECT 'GLD' as ETFticker, 'GOLD' as indexticker UNION ALL
SELECT 'IEF' as ETFticker, 'TN_US' as indexticker UNION ALL
SELECT 'TLT' as ETFticker, 'TB_US' as indexticker UNION ALL
SELECT 'EMB' as ETFticker, 'EQ_EMER' as indexticker
),
-- Get the first traded date we have for the ETFs
mindates AS (
SELECT
ticker,
MIN(date) as mindate
FROM `rw-algotrader.master_daily_adj.ETF_d*`
WHERE ticker in (SELECT ETFticker FROM extend)
AND date >= '1990-01-01'
GROUP BY ticker
),
-- Get the overlap prices and scaling factor for the index data
overlapprice AS (
SELECT
ex.ETFticker,
ex.indexticker,
md.mindate as overlapdate,
ep.close as etfclose,
ip.cumreturns as indexclose,
ep.close / ip.cumreturns as scalingfactor
FROM `rw-algotrader.master_daily_adj. ETF*` ep
INNER JOIN mindates md ON ep.ticker = md.ticker AND ep.date = md.mindate
INNER JOIN extend ex ON ep.ticker = ex.ETFticker
INNER JOIN `rw-algotrader.load_assetclass.assetclass_indexes` ip ON ex.indexticker = ip.ticker AND md.mindate = ip.date
WHERE ep.date >= '1990-01-01'
)
select
ticker,
date,
open,
high,
low,
close,
volume
FROM (
-- ETF prices
SELECT
ticker,
date,
open,
high,
low,
close,
volume
FROM `rw-algotrader.master_daily_adj.ETF_d*` ep
WHERE ticker IN (SELECT ETFticker FROM extend)
AND date > '1990-01-01'
UNION ALL
-- Extend back with scaled historic cumulative returns
select
ov.ETFticker as ticker,
ip.date,
ip.cumreturns * ov.scalingfactor as open,
ip.cumreturns * ov.scalingfactor as high,
ip.cumreturns * ov.scalingfactor as low,
ip.cumreturns * ov.scalingfactor as close,
0 as volume
FROM `rw-algotrader.load_assetclass.assetclass_indexes` ip
INNER JOIN overlapprice ov ON ip.ticker = ov.indexticker
WHERE ip.date < ov.overlapdate
) x
ORDER BY x.date
Let’s plot the extended data:

We now have a total return series which:
- Matches the (adjusted) close price of the ETF for the period in which the ETF was available
- Extends the series back with mutual fund or (cost-adjusted) index data.
Conclusion
It is extremely useful to be table to do basic financial data manipulation in SQL.
Code has been provided to allow you to extend ETF prices back in time with total return data from another source.
If you enjoyed this you’ll probably like these too…
How to Fill Gaps in Large Stock Data Universes Using tidyr and dplyr

