Price data from Yahoo Finance in R – the Easy Way!

Traders typically have many ideas for trading strategies – more than they can ever implement in practice!

Therefore it’s useful to be able to move quickly in the early research phase. You want to disprove things as quickly as possible so that you can move onto the next thing.

Obviously there is immense value in reliable and easy data access. You don’t want to be wrangling large data sets every time you want to do a quick piece of analysis.

Yahoo Finance provides a decent historical stock price dataset that includes adjustments for dividends and corporate actions. A while back, they discontinued their API. But using some simple HTTP calls, we can pull that data directly into a research session.

The approach presented here is something I use regularly. It’s free, scales really well (up to Yahoo’s rate limits), and presents you with a tidy dataframe ready for analysis. It’s great for moving fast in the early phase of research.

The only caveat is that Yahoo could change things on their end at any time, making this obsolete. But for now, it works a treat.

First, load the libraries we need:

library(tidyverse)
library(purrr)
library(httr)
library(glue)

The following functions do all the heavy lifting.

The one you’ll call is yahoo_prices and you simply pass it a list of tickers, and a from_date and to_date in the format YYYY-MM-DD.

You can either stick these functions in a file and source them in your R session, or copy and paste them into a notebook environment. One day, I’ll put this and some other stuff I use regularly into an R package.

#' Parse a Yahoo! Finance prices unofficial API response object to dataframe
#' @param response: response object from call to https://query1.finance.yahoo.com/v7/finance/download
#' return dataframe of price data
parse_yahoo_prices <- function(response) {
  response %>%
    content(as = "text", encoding = "UTF-8") %>% 
    read.table(
      text = ., 
      sep=",", 
      fill = TRUE, 
      header = TRUE, 
      stringsAsFactors = FALSE
    ) %>%
    mutate(Date = as.Date(Date))
}

#' Get historical prices from Yahoo! Finance unofficial API for a single ticker
#' @param ticker: price ticker eg "TSLA"
#' @param from_date: start date in format "YYYY-MM-DD"
#' @param to_date: end date in format "YYYY-MM-DD"
#' @return dataframe of price data
single_ticker_prices_yahoo <- function(ticker, from_date, to_date = Sys.Date()) {
  base_url = "https://query1.finance.yahoo.com/v7/finance/download"

  period1 = as.numeric(as.POSIXct(from_date, format = "%Y-%m-%d"))
  period2 = as.numeric(as.POSIXct(to_date, format = "%Y-%m-%d"))

  resp <- httr::GET(
    glue::glue("{base_url}/{ticker}"), 
    query = list(
      period1 = period1, 
      period2 = period2, 
      interval = "1d", 
      events = "history", 
      includeAdjustedClose = "true"
    ))

  parse_yahoo_prices(resp)
}

#' Get historical prices from Yahoo! Finance unofficial API for a single ticker
#' @param tickers: list of tickers eg c("TSLA", "GOOG")
#' @param from_date: start date in format "YYYY-MM-DD"
#' @param to_date: end date in format "YYYY-MM-DD"
#' @return dataframe of price data
yahoo_prices <- function(tickers, from_date, to_date) {
  # Helper function for adding a Ticker column and arranging columns
  fun <- function(ticker, from_date = from_date, to_date = to_date) {
    single_ticker_prices_yahoo(
      ticker, 
      from_date = from_date, 
      to_date = to_date
    ) %>%
      mutate(Ticker = ticker) %>% 
      relocate(Ticker, .after = Date)
  }
    
  # get a long dataframe 
  # prefer combining into a long dataframe as we bind rows (ie use map_dfr) not 
  # columns - no problem if we have different number of rows per ticker
  tickers %>%
    map_dfr(~fun(ticker = .x, from_date = from_date, to_date = to_date)) %>%
    arrange(Date)
}

Here’s an example of how you’d use it and what the data it returns looks like. Notice that it returns a long (tidy) dataframe – perfect for analysis.

tickers <- c("TSLA", "AAPL", "CAT", "GOOG")
prices <- yahoo_prices(
    tickers, 
    from_date = "2020-01-01", 
    to_date = "2023-09-07"
)

head(prices)
A data.frame: 6 × 8
DateTickerOpenHighLowCloseAdj.CloseVolume
<date><chr><dbl><dbl><dbl><dbl><dbl><int>
12019-12-31TSLA 27.0000 28.08600 26.80533 27.88867 27.88867154285500
22019-12-31AAPL 72.4825 73.42000 72.38000 73.41250 71.61503100805600
32019-12-31CAT 147.4300148.23000146.78999147.67999135.14485 1952500
42019-12-31GOOG 66.5055 66.90000 66.45425 66.85100 66.85100 19236000
52020-01-02TSLA 28.3000 28.71333 28.11400 28.68400 28.68400142981500
62020-01-02AAPL 74.0600 75.15000 73.79750 75.08750 73.24902135480400

We can plot the data to make sure it looks as expected:

# Set chart options
options(repr.plot.width = 14, repr.plot.height=7)
theme_set(theme_bw())
theme_update(text = element_text(size = 20))

# plot
prices %>% 
  ggplot(aes(x = Date, y = Adj.Close)) +
  geom_line() +
  facet_wrap(~Ticker, ncol = 2) +
  labs(
    title = "Adjusted US stock prices"
  )

Thanks for reading and I hope these little functions speed up your research work.

6 thoughts on “Price data from Yahoo Finance in R – the Easy Way!”

    • Yahoo has a history of changing the API (which is unofficial and unsupported). Quantmod is great, but I prefer to have my own utility for this that I can modify when Yahoo changes again. But there’s absolutely nothing wrong with using Quantmod either.

      Reply

Leave a Comment