Traders typically have many ideas for trading strategies – more than they can ever implement in practice!
Therefore it’s useful to be able to move quickly in the early research phase. You want to disprove things as quickly as possible so that you can move onto the next thing.
Obviously there is immense value in reliable and easy data access. You don’t want to be wrangling large data sets every time you want to do a quick piece of analysis.
Yahoo Finance provides a decent historical stock price dataset that includes adjustments for dividends and corporate actions. A while back, they discontinued their API. But using some simple HTTP calls, we can pull that data directly into a research session.
The approach presented here is something I use regularly. It’s free, scales really well (up to Yahoo’s rate limits), and presents you with a tidy dataframe ready for analysis. It’s great for moving fast in the early phase of research.
The only caveat is that Yahoo could change things on their end at any time, making this obsolete. But for now, it works a treat.
First, load the libraries we need:
library(tidyverse)
library(purrr)
library(httr)
library(glue)
The following functions do all the heavy lifting.
The one you’ll call is yahoo_prices
and you simply pass it a list of tickers, and a from_date
and to_date
in the format YYYY-MM-DD
.
You can either stick these functions in a file and source
them in your R session, or copy and paste them into a notebook environment. One day, I’ll put this and some other stuff I use regularly into an R package.
#' Parse a Yahoo! Finance prices unofficial API response object to dataframe
#' @param response: response object from call to https://query1.finance.yahoo.com/v7/finance/download
#' return dataframe of price data
parse_yahoo_prices <- function(response) {
response %>%
content(as = "text", encoding = "UTF-8") %>%
read.table(
text = .,
sep=",",
fill = TRUE,
header = TRUE,
stringsAsFactors = FALSE
) %>%
mutate(Date = as.Date(Date))
}
#' Get historical prices from Yahoo! Finance unofficial API for a single ticker
#' @param ticker: price ticker eg "TSLA"
#' @param from_date: start date in format "YYYY-MM-DD"
#' @param to_date: end date in format "YYYY-MM-DD"
#' @return dataframe of price data
single_ticker_prices_yahoo <- function(ticker, from_date, to_date = Sys.Date()) {
base_url = "https://query1.finance.yahoo.com/v7/finance/download"
period1 = as.numeric(as.POSIXct(from_date, format = "%Y-%m-%d"))
period2 = as.numeric(as.POSIXct(to_date, format = "%Y-%m-%d"))
resp <- httr::GET(
glue::glue("{base_url}/{ticker}"),
query = list(
period1 = period1,
period2 = period2,
interval = "1d",
events = "history",
includeAdjustedClose = "true"
))
parse_yahoo_prices(resp)
}
#' Get historical prices from Yahoo! Finance unofficial API for a single ticker
#' @param tickers: list of tickers eg c("TSLA", "GOOG")
#' @param from_date: start date in format "YYYY-MM-DD"
#' @param to_date: end date in format "YYYY-MM-DD"
#' @return dataframe of price data
yahoo_prices <- function(tickers, from_date, to_date) {
# Helper function for adding a Ticker column and arranging columns
fun <- function(ticker, from_date = from_date, to_date = to_date) {
single_ticker_prices_yahoo(
ticker,
from_date = from_date,
to_date = to_date
) %>%
mutate(Ticker = ticker) %>%
relocate(Ticker, .after = Date)
}
# get a long dataframe
# prefer combining into a long dataframe as we bind rows (ie use map_dfr) not
# columns - no problem if we have different number of rows per ticker
tickers %>%
map_dfr(~fun(ticker = .x, from_date = from_date, to_date = to_date)) %>%
arrange(Date)
}
Here’s an example of how you’d use it and what the data it returns looks like. Notice that it returns a long (tidy) dataframe – perfect for analysis.
tickers <- c("TSLA", "AAPL", "CAT", "GOOG")
prices <- yahoo_prices(
tickers,
from_date = "2020-01-01",
to_date = "2023-09-07"
)
head(prices)
Date | Ticker | Open | High | Low | Close | Adj.Close | Volume | |
---|---|---|---|---|---|---|---|---|
<date> | <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <int> | |
1 | 2019-12-31 | TSLA | 27.0000 | 28.08600 | 26.80533 | 27.88867 | 27.88867 | 154285500 |
2 | 2019-12-31 | AAPL | 72.4825 | 73.42000 | 72.38000 | 73.41250 | 71.61503 | 100805600 |
3 | 2019-12-31 | CAT | 147.4300 | 148.23000 | 146.78999 | 147.67999 | 135.14485 | 1952500 |
4 | 2019-12-31 | GOOG | 66.5055 | 66.90000 | 66.45425 | 66.85100 | 66.85100 | 19236000 |
5 | 2020-01-02 | TSLA | 28.3000 | 28.71333 | 28.11400 | 28.68400 | 28.68400 | 142981500 |
6 | 2020-01-02 | AAPL | 74.0600 | 75.15000 | 73.79750 | 75.08750 | 73.24902 | 135480400 |
We can plot the data to make sure it looks as expected:
# Set chart options
options(repr.plot.width = 14, repr.plot.height=7)
theme_set(theme_bw())
theme_update(text = element_text(size = 20))
# plot
prices %>%
ggplot(aes(x = Date, y = Adj.Close)) +
geom_line() +
facet_wrap(~Ticker, ncol = 2) +
labs(
title = "Adjusted US stock prices"
)
Thanks for reading and I hope these little functions speed up your research work.
Why don’t you use the getSymbols function from the quantmod package?
It already comes as a time series object in xts & zoo classes….
quantmod::getSymbols()
https://www.quantmod.com/documentation/getSymbols.html
Yahoo has a history of changing the API (which is unofficial and unsupported). Quantmod is great, but I prefer to have my own utility for this that I can modify when Yahoo changes again. But there’s absolutely nothing wrong with using Quantmod either.