How to extend ETF prices with mutual fund data using SQL

In this post, we explain how to use SQL to extend back ETF price data with total return data from mutual funds or indexes.

 

On Zero to Robot Master Bootcamp, we teach how to build a portfolio of three automated systematic trading strategies.

One of them is a long term Risk Premia Harvesting strategy which trades asset class ETFs.

ETFs are useful instruments for analysing long term (tradeable) performance of various asset classes – but many have been introduced only relatively recently and have limited data available.

It’s great to analyse the ETF data (cos that’s what we’re going to trade) but it’s also useful to look at longer-term history.

To this end, we’re using mutual fund and index data (adjusted for estimated expenses and other trading frictions) to “extend back” our ETF price data.

It’s often useful to be able to do this kind of transformation “in place”, so here’s how we did it in SQL.

Extending ETF Price data using SQL

We’re going to extend ETF data as follows:

  • US Equities – Extend VTI with VFINX mutual fund
  • EMEA Equities – Extend VEA with total returns from XMSC MSCI Index and subtract expected expenses
  • EM Equities – Extend VWO with total returns from MXEA MSCI Index
  • US Treasury notes – Extend IEF with VFITX mutual fund
  • US Treasury bonds – Extend TLT with VUSTX mutual fund
  • Emerging market bonds – Extend EMB with returns from T Row Price EM Bond fund (the most tenuous extension)
  • Gold – Extend GLD with total returns from the GCSI Gold Index, extended back further with cost adjusted spot prices.

Our ETF price data is structured like this:

Each row represents a day of split- and dividend-adjusted OHLC price data for a given ticker.

The data we are going to use to extend back these prices are structured similarly…

Again, we have one row for each index per day.

  • returns contains the periodic simple returns for the index.
  • cumreturns contains the cumulative returns for the index, calculated as cumprod(1+returns)

For clarity, here’s cumreturns plotted for the stock indexes in Google Data Studio.

You can see all series start at 1 in 1995.

Now let’s extend our ETF data back.

  • Get the first traded date for the ETFs
  • Calculate the ratio of the close price on the first date, with cumreturns on the index on that date
  • Multiply cumreturns by this factor on the index data
  • Append the index data to the ETF prices.

Here’s the SQL code:

-- Map the ETFs with the indexes to extend them
WITH extend AS (
  SELECT 'VTI' as ETFticker, 'EQ_US' as indexticker UNION ALL
  SELECT 'VEA' as ETFticker, 'EQ_NONUS_DEV' as indexticker UNION ALL
  SELECT 'VWO' as ETFticker, 'EQ_EMER' as indexticker UNION ALL
  SELECT 'GLD' as ETFticker, 'GOLD' as indexticker UNION ALL
  SELECT 'IEF' as ETFticker, 'TN_US' as indexticker UNION ALL
  SELECT 'TLT' as ETFticker, 'TB_US' as indexticker UNION ALL
  SELECT 'EMB' as ETFticker, 'EQ_EMER' as indexticker
),

-- Get the first traded date we have for the ETFs
mindates AS (
  SELECT 
    ticker, 
    MIN(date) as mindate
  FROM `rw-algotrader.master_daily_adj.ETF_d*` 
  WHERE ticker in (SELECT ETFticker FROM extend)
    AND date >= '1990-01-01'
  GROUP BY ticker
),

-- Get the overlap prices and scaling factor for the index data
overlapprice AS (
  SELECT 
    ex.ETFticker,
    ex.indexticker,
    md.mindate as overlapdate,
    ep.close as etfclose,
    ip.cumreturns as indexclose,
    ep.close / ip.cumreturns as scalingfactor
   FROM `rw-algotrader.master_daily_adj. ETF*` ep
   INNER JOIN mindates md ON ep.ticker = md.ticker AND ep.date = md.mindate
   INNER JOIN extend ex ON ep.ticker = ex.ETFticker
   INNER JOIN `rw-algotrader.load_assetclass.assetclass_indexes` ip ON ex.indexticker = ip.ticker AND md.mindate = ip.date
   WHERE ep.date >= '1990-01-01' 
)

select 
  ticker,
  date,
  open,
  high,
  low,
  close,
  volume
FROM (
  
  -- ETF prices
  SELECT 
    ticker,
    date,
    open,
    high,
    low,
    close,
    volume
  FROM `rw-algotrader.master_daily_adj.ETF_d*` ep
  WHERE ticker IN (SELECT ETFticker FROM extend) 
    AND date > '1990-01-01'

  UNION ALL
  
  -- Extend back with scaled historic cumulative returns
  select
    ov.ETFticker as ticker,
    ip.date,
    ip.cumreturns * ov.scalingfactor as open,
    ip.cumreturns * ov.scalingfactor as high,
    ip.cumreturns * ov.scalingfactor as low,
    ip.cumreturns * ov.scalingfactor as close,
    0 as volume
  FROM `rw-algotrader.load_assetclass.assetclass_indexes` ip
  INNER JOIN overlapprice ov ON ip.ticker = ov.indexticker
  WHERE ip.date < ov.overlapdate 
) x
ORDER BY x.date

Let’s plot the extended data:

We now have a total return series which:

  • Matches the (adjusted) close price of the ETF for the period in which the ETF was available
  • Extends the series back with mutual fund or (cost-adjusted) index data.

Conclusion

It is extremely useful to be table to do basic financial data manipulation in SQL.

Code has been provided to allow you to extend ETF prices back in time with total return data from another source.

If you enjoyed this you’ll probably like these too…

How To Get Historical S&P 500 Constituents Data For Free

Financial Data Manipulation in dplyr for Quant Traders

How to Fill Gaps in Large Stock Data Universes Using tidyr and dplyr

Leave a Comment