Optimising the rsims package for fast backtesting in R

Posted on Aug 20, 2021 by Kris Longmore
2 comments
157 Views

rsims is a new package for fast, quasi event-driven backtesting in R. You can find the source on GitHub, docs here, and an introductory blog post here.

Our use case for rsims was accurate but fast simulation of trading strategies. I’ve had a few questions about how I made the backtester as fast as it is – after all, it uses a giant for loop, and R is notoriously slow for such operations – so here’s a post about how I optimised rsims for speed.

Approach

I firstly wrote the backtester with a focus on ease of understanding. I wanted something that worked as expected, and that I could think about without too much effort. At this stage, I wasn’t thinking much about speed or efficiency.

Once I had that working, I focused on finding bottlenecks with profvis. It’s a good idea to resist the urge to optimise before you know where you’ll get the most bang for your buck (speaking from experience here).

Then I simply tackled those bottlenecks one at a time until I reached a point of diminishing returns (which is a little subjective and dependent on one’s use case).

There’s one thing I’d do differently if I were starting over: write the tests first, rather than after the fact. In hindsight, this would have saved a ton of time. To be honest, it’s something that I say after every little development effort, but this time I’ve really learned my lesson.

Profiling with profvis

The original code looked a lot different from the current version. It had lots of data frames and dplyr pipelines for operating on data:

positions_from_no_trade_buffer <- function(current_positions, current_prices, current_theo_weights, cap_equity, num_assets, trade_buffer) {
  current_weights <- current_positions*current_prices/cap_equity
  target_positions <- current_positions
  for(j in 1:num_assets) {
    if(is.na(current_theo_weights[j]) || current_theo_weights[j] == 0) {
      target_positions[j] <- 0
      next
    }
    # note: we haven't truncated to nearest whole coin, as coins are divisible (unlike shares)
    if(current_weights[j] < current_theo_weights[j] - trade_buffer) {
      target_positions[j] <- (current_theo_weights[j] - trade_buffer)*cap_equity/current_prices[j]
    } else if(current_weights[j] > current_theo_weights[j] + trade_buffer) {
      target_positions[j] <- (current_theo_weights[j] + trade_buffer)*cap_equity/current_prices[j]
    }
  }

  unlist(target_positions)
}

cash_backtest_original <- function(backtest_df_long, trade_buffer = 0., initial_cash = 10000, commission_pct = 0, capitalise_profits = FALSE) {

  # Create wide data frames 
  wide_prices <- backtest_df_long %>%
    pivot_wider(date, names_from = 'ticker', values_from = 'price')

  wide_theo_weights <- backtest_df_long %>%
    pivot_wider(date, names_from = 'ticker', values_from = 'theo_weight')

  # get tickers for later
  tickers <- colnames(wide_prices)[-1]

  # initial state
  num_assets <- ncol(wide_prices) - 1  # -1 for date column
  current_positions <- rep(0, num_assets)
  previous_theo_weights <- rep(0, num_assets)
  row_list <- list() 
  cash <- initial_cash

  # backtest loop
  for(i in 1:nrow(wide_prices)) {
    current_date <- wide_prices[i, 1] %>% pull() %>% as.Date()
    current_prices <- wide_prices[i, -1] %>% as.numeric()
    current_theo_weights <- wide_theo_weights[i, -1] %>% as.numeric()

    # update equity
    equity <- sum(current_positions * current_prices, na.rm = TRUE) + cash
    cap_equity <- ifelse(capitalise_profits, equity, min(initial_cash, equity))  # min reflects assumption that we don't top up strategy equity if in drawdown

    # update positions based on no-trade buffer
    target_positions <- positions_from_no_trade_buffer(current_positions, current_prices, current_theo_weights, cap_equity, num_assets, trade_buffer)

    # calculate position deltas, trade values and commissions
    trades <- target_positions - current_positions
    trade_value <- trades * current_prices
    commissions <- abs(trade_value) * commission_pct

    # adjust cash by value of trades
    cash <- cash - sum(trade_value, na.rm = TRUE) - sum(commissions, na.rm = TRUE)
    current_positions <- target_positions
    position_value <- current_positions * current_prices
    equity <- sum(position_value, na.rm = TRUE) + cash

    # Create data frame and add to list
    row_df <- data.frame(
       Ticker = c('Cash', tickers),
       Date = rep(current_date, num_assets + 1),
       Close = c(0, current_prices),
       Position = c(0, current_positions),
       Value = c(cash, position_value),
       Trades = c(-sum(trade_value), trades),
       TradeValue = c(-sum(trade_value), trade_value),
       Commission = c(0, commissions)
    )

    row_list[[i]] <- row_df

    previous_theo_weights <- current_theo_weights
  }

  # Combine list into dataframe
  bind_rows(row_list)
}

This version of cash_backtest takes a long data frame of prices and weights by date, which is a very convenient format for data analysis. We can make such a data frame of randomly generated prices and weights:

library(tidyverse)

# function for generating prices from GBM process
gbm_sim <- function(nsim = 100, t = 25, mu = 0, sigma = 0.1, S0 = 100, dt = 1./365) {
  # matrix of random draws - one for each day for each simulation
  epsilon <- matrix(rnorm(t*nsim), ncol = nsim, nrow = t)

  # get GBM paths
  gbm <- exp((mu - sigma * sigma / 2) * dt + sigma * epsilon * sqrt(dt)) - 1

  # convert to price paths
  gbm <- apply(rbind(rep(S0, nsim), gbm + 1), 2, cumprod)

  gbm
}

# generate prices and weights
years <- 20
universe <- 100
x <- 1
tickers <- vector() 
repeat{
  tickers[[x]] <- paste0(sample(LETTERS, 5, replace = TRUE), collapse = "")
  x <- x + 1
  if(x == universe + 1) 
    break
}
stopifnot(n_distinct(tickers) == universe)
date <- seq(as.numeric(as.Date("1980-01-01")), as.numeric(as.Date("1980-01-01"))+(years*365))
prices <- cbind(date, gbm_sim(nsim = universe, t = years*365, mu = 0.1, sigma = 0.1))
colnames(prices) <- c("date", tickers)
weights <- cbind(date, rbind(rep(0, universe), matrix(rnorm(years*365*universe), nrow = years*365)))
colnames(weights) <- c("date", tickers)

backtest_df_long <- prices %>% 
  as.data.frame() %>% 
  mutate(date = as.Date(date, origin ="1970-01-01")) %>% 
  pivot_longer(-date, names_to = "ticker", values_to = "price") %>% 
  left_join(
    weights %>% 
      as.data.frame() %>% 
      mutate(date = as.Date(date, origin ="1970-01-01")) %>% 
      pivot_longer(-date, names_to = "ticker", values_to = "theo_weight"),
    by = c("date", "ticker")
  )

head(backtest_df_long)
#> # A tibble: 6 x 4
#>   date       ticker price theo_weight
#>   <date>     <chr>  <dbl>       <dbl>
#> 1 1980-01-01 TVEZI    100           0
#> 2 1980-01-01 XIERO    100           0
#> 3 1980-01-01 XGYMU    100           0
#> 4 1980-01-01 PMVPF    100           0
#> 5 1980-01-01 KNCIP    100           0
#> 6 1980-01-01 JEBOY    100           0

backtest_df_long has prices and weights for 100 tickers over 7301 days:

backtest_df_long %>% 
  summarise(
    num_days = n_distinct(date),
    num_tickers = n_distinct(ticker)
  )
#> # A tibble: 1 x 2
#>   num_days num_tickers
#>      <int>       <int>
#> 1     7301         100

To get some insight into how quickly the backtest runs and where the bottlenecks are, profvis is your friend:

library(profvis)

profvis({cash_backtest_original(backtest_df_long)}, interval = 0.01)

(To get deeper insight, you can extract the function logic as a series of expressions and pass these directly to profvis – but this shortcut is fine for our purposes)

profvis tells us that cash_backtest_original took about 7.5 seconds to run and that most of the time was spent messing around with data frames:

In R, data frames are very convenient for doing data analysis, but they can be slow, particularly when you repeatedly index or subset them. Matrixes tend to be much faster, which suggests that we can get a big win from switching our data frames to matrixes.

There are some considerations. For example, a matrix can only store data of the same type, so we need to convert our convenient, human-readable date column to a Unix timestamp. Later, we’ll also rely on upstream data alignment and we’ll index by row number rather than date, which requires more work by the user, including checking that data alignment is in fact correct. Everything is a trade-off, no free lunches and all that.

Here’s how the code would look with the data frame to matrix conversions – the main changes are around storing each day’s trade information in a list of matrixes, then converting that back to a data frame at the end (this time I’ll pass the function contents directly to profvis as a series of expressions):

profvis({
  # Create matrixes (data.matrix automatically coerces dates to numeric Unix timestamps) 
  wide_prices <- backtest_df_long %>%
    pivot_wider(date, names_from = 'ticker', values_from = 'price') %>% 
    data.matrix()

  wide_theo_weights <- backtest_df_long %>%
    pivot_wider(date, names_from = 'ticker', values_from = 'theo_weight') %>% 
    data.matrix()

  # get tickers for later
  tickers <- colnames(wide_prices)[-1]

  # initial state
  num_assets <- ncol(wide_prices) - 1  # -1 for date column
  current_positions <- rep(0, num_assets)
  previous_theo_weights <- rep(0, num_assets)
  row_list <- list() 
  cash <- initial_cash

  # backtest loop
  for(i in 1:nrow(wide_prices)) {
    current_date <- wide_prices[i, 1]
    current_prices <- wide_prices[i, -1]
    current_theo_weights <- wide_theo_weights[i, -1]

    # update equity
    equity <- sum(current_positions * current_prices, na.rm = TRUE) + cash
    cap_equity <- ifelse(capitalise_profits, equity, min(initial_cash, equity))  # min reflects assumption that we don't top up strategy equity if in drawdown

    # update positions based on no-trade buffer
    target_positions <- positions_from_no_trade_buffer(current_positions, current_prices, current_theo_weights, cap_equity, num_assets, trade_buffer)

    # calculate position deltas, trade values and commissions
    trades <- target_positions - current_positions
    trade_value <- trades * current_prices
    commissions <- abs(trade_value) * commission_pct

    # adjust cash by value of trades
    cash <- cash - sum(trade_value, na.rm = TRUE) - sum(commissions, na.rm = TRUE)
    current_positions <- target_positions
    position_value <- current_positions * current_prices
    equity <- sum(position_value, na.rm = TRUE) + cash

    # store date in matrix as numeric (then convert date back to date format later)
    row_mat <- matrix(
      data = c(
        rep(as.numeric(current_date), num_assets + 1),
        c(1, current_prices),
        c(cash, current_positions),
        c(cash, position_value),
        c(-sum(trade_value), trades),
        c(-sum(trade_value), trade_value),
        c(0, commissions)
      ),
      nrow = num_assets + 1,
      ncol = 7,
      byrow = FALSE,
      dimnames = list(
        # tickers are row names
        c("Cash", tickers),
        # column names
        c("Date", "Close", "Position", "Value", "Trades", "TradeValue", "Commission")
      )
    )

    row_list[[i]] <- row_mat

    previous_theo_weights <- current_theo_weights
  } 
  # Combine list of matrixes into dataframe
  do.call(rbind, row_list) %>%
    tibble::as_tibble(rownames = "ticker") %>%
    dplyr::mutate(
      Date = as.Date(Date, origin ="1970-01-01")
    )
}, interval = 0.01)

On my machine, this took about 500 milliseconds, a 15x speedup.

According to profvis, there are a couple of minor bottlenecks. One is the storing of each day’s trade details in a list: row_list[[i]] <- row_mat.

A simple trick for speeding up this operation is to preallocate row_list by replacing row_list <- list() with row_list <- vector(mode = "list", length = nrow(wide_prices)).

This gave me a slight speedup overall (450 vs 500ms) and removed this as a bottleneck.

We all know that R plays really nicely with vectorisation. If we can make use of vectorised functions where possible, we can usually speed things up considerably. One place we can do this is the positions_from_no_trade_buffer function, which in its original form is a series of if-else statements.

A vectorised version, making use of R’s vectorised ifelse function would look like this:

positions_from_no_trade_buffer_vec <- function(current_positions, current_prices, current_theo_weights,cap_equity, num_assets, trade_buffer) {
  current_weights <- current_positions*current_prices/cap_equity
  target_positions <- ifelse(
    is.na(current_theo_weights) | current_theo_weights == 0,
    0,
    ifelse(
      current_weights < current_theo_weights - trade_buffer,
      (current_theo_weights - trade_buffer)*cap_equity/current_prices,
      ifelse(
        current_weights > current_theo_weights + trade_buffer,
        (current_theo_weights + trade_buffer)*cap_equity/current_prices,
        current_positions
      )
    )
  )

  as.numeric(target_positions)  # convert to unamed vector
}

When I use the vectorised version, I get another small speedup (370ms total), and the time spent in this function reduces by more than a factor of two.

It’s worth thinking about this function a little more. The time it requires will scale by both the number of time periods in the backtest, as well as (to a lesser extent) by the number of assets in the universe. For that reason, it made sense to try to squeeze out a little more performance and rewrite it in C++ via the Rcpp package:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector positions_from_no_trade_buffer_cpp(NumericVector current_positions, NumericVector current_prices, NumericVector current_theo_weights, double cap_equity, int num_assets, double trade_buffer) {
  NumericVector current_weights(num_assets);
  NumericVector target_positions(num_assets);
  std::copy( current_positions.begin(), current_positions.end(), target_positions.begin() ) ;

  int j = 0;
  for(j = 0; j < num_assets; j++)
  {
    current_weights[j] = current_positions[j]*current_prices[j]/cap_equity;
  }

  j = 0;
  for(j = 0; j < num_assets; j++)
  {
    //Rprintf(\"%i %f %f \\n\", j, current_theo_weights[j], current_weights[j]);
    if(R_IsNA(current_theo_weights[j]) | current_theo_weights[j] == 0)
      target_positions[j] = 0;
    else if(current_weights[j] < current_theo_weights[j] - trade_buffer)
      target_positions[j] = (current_theo_weights[j] - trade_buffer)*cap_equity/current_prices[j];
    else if(current_weights[j] > current_theo_weights[j] + trade_buffer)
      target_positions[j] = (current_theo_weights[j] + trade_buffer)*cap_equity/current_prices[j];
  }

  return target_positions;
}

Benchmarking the three versions of these functions is illustrative:

library(microbenchmark)

# make wide matrixes
wide_prices <- backtest_df_long %>%
  pivot_wider(date, names_from = 'ticker', values_from = 'price') %>% 
  data.matrix()
wide_theo_weights <- backtest_df_long %>%
  pivot_wider(date, names_from = 'ticker', values_from = 'theo_weight') %>% 
  data.matrix()

# pick a day's prices and weights
current_prices <- wide_prices[100, -1]
current_theo_weights <- wide_theo_weights[100, -1]
num_assets <- ncol(wide_prices) - 1  # -1 for date column
current_positions <- rep(0, num_assets)
cap_equity <- 10000
trade_buffer = 0.04


# check versions match 
all.equal(
  positions_from_no_trade_buffer_cpp(current_positions, current_prices, current_theo_weights, cap_equity, num_assets, trade_buffer),
  positions_from_no_trade_buffer_vec(current_positions, current_prices, current_theo_weights, cap_equity, num_assets, trade_buffer)
)
#> [1] TRUE
all.equal(
  positions_from_no_trade_buffer_cpp(current_positions, current_prices, current_theo_weights, cap_equity, num_assets, trade_buffer),
  positions_from_no_trade_buffer(current_positions, current_prices, current_theo_weights, cap_equity, num_assets, trade_buffer)
)
#> [1] TRUE

# make more concise expressions for microbenchmark
for_loop_version <- function() {
  positions_from_no_trade_buffer(current_positions, current_prices, current_theo_weights, cap_equity, num_assets, trade_buffer)
}

vectorised_version <- function() {
  positions_from_no_trade_buffer_vec(current_positions, current_prices, current_theo_weights, cap_equity, num_assets, trade_buffer)
}

cpp_version <- function() {
  positions_from_no_trade_buffer_cpp(current_positions, current_prices, current_theo_weights, cap_equity, num_assets, trade_buffer)
}

# benchmark
microbenchmark(
  for_loop_version(),
  vectorised_version(),
  cpp_version(),
  times = 1000
) %>% 
  summary()
#>                   expr     min       lq       mean   median      uq      max neval
#> 1   for_loop_version() 239.801 268.5515 364.200015 315.2005 394.101 8460.800  1000
#> 2 vectorised_version()  25.701  29.4010  51.636307  31.5510  43.751 9537.701  1000
#> 3        cpp_version()   2.501   3.3010   7.439904   3.9010   5.001 2211.001  1000

The C++ version is roughly 7x faster than the vectorised version and 10x faster than the original version if we use the median execution time. The units here are microseconds, and admittedly, when we’re in this realm, such a speedup makes little practical difference, but it conceivably could if we were backtesting a large enough universe over enough time periods.

Because we use the function so much, using the C++ version gets our profvis output down to about 200ms, from 370ms previously.

Another efficiency can be gained by pushing data transformations that only need to happen once outside of the function whose speed matters. In this case, we could, for example, make the wide price and weight matrixes upstream of the backtesting function, pushing that responsibility back to the user. This is, in fact, what we’ve done in the current implementation of rsims. Profiling the current version results in the following output:

library(rsims)

profvis({
  cash_backtest(
    prices,
    weights,
    trade_buffer = 0.,
    initial_cash = 1000,
    commission_pct = 0.001,
    capitalise_profits = FALSE
  )
}, interval = 0.01)

The total time taken was 160ms, a 47x speedup over the original version.

How does rsims scale?

Finally, let’s see how rsims performs as we increase the number of time steps and the size of the universe. We’ll benchmark the performance with universe sizes from 100 to 1,000, and time periods from 2,500 to 10,000 days (approximately 10 to 40 trading years):

library(rsims)

get_mean_time <- function(days, universe, times = 5) {
  dates <- seq(as.numeric(as.Date("1980-01-01")), as.numeric(as.Date("1980-01-01"))+(days))
  prices <- cbind(dates, gbm_sim(nsim = universe, t = days, mu = 0.1, sigma = 0.1))
  weights <- cbind(dates, rbind(rep(0, universe), matrix(rnorm(days*universe), nrow = days)))

  res <- microbenchmark(
    cash_backtest(
      prices,
      weights,
      trade_buffer = 0.,
      initial_cash = 1000,
      commission_pct = 0.001,
      capitalise_profits = FALSE
    ),
    times = times
  )
  mean(res$time)/1e9
}

num_assets <- seq(100, 1000, 100)
num_days <- c(10, 20, 30, 40)*252

means <- list()
for(universe in num_assets) {
  print(glue::glue("Doing universe size {universe}"))
  for(days in num_days) {
    print(glue::glue("Doing {days} days"))
    means <- c(means, get_mean_time(days, universe, times = 10))
  }
}

Plotting the results:

df <- as.data.frame(matrix(unlist(means), ncol = length(num_assets))) %>%
  mutate(days = num_days)

colnames(df) <- c(num_assets, "days")

df %>%
  pivot_longer(cols = -days, names_to = "universe_size", values_to = "mean_sim_time") %>%
  mutate(universe_size = as.numeric(universe_size)) %>%
  ggplot(aes(x = universe_size, y = mean_sim_time, colour = factor(days))) +
    geom_line() +
    geom_point() +
    labs(
      x = "Universe size",
      y = "Mean simulation time, seconds",
      title = "Mean simulation time from 10 iterations",
      colour = "Time Steps"
    ) +
    theme_bw()

We can see that rsims scales well in general. I suspect that there was a blow out for the universe sizes of 900 and 1,000 for the 40-year backtest due to memory constraints of my local setup (100 Chrome tabs anyone?).

Other ideas not implemented

There are some other tricks for speeding up R code that weren’t applicable here, but that are worth knowing about.

Parallel processing

Parallel processing is a well-trodden path for doing computations in parallel on more than one processor. In R, the parallel package is the original parallel processing toolkit and is included in base R. It parallelises some standard R functions out of the box, such as the apply functions. There’s also the foreach and doParallel packages.

In our application, parallelisation won’t work for the event loop because of path dependency – tomorrow’s trades depend on yesterday’s positions, so we can’t do yesterday’s and today’s trades in parallel.

We could potentially parallelise the position delta calculations for each asset within each loop iteration, as these aren’t dependent on one another. This operation is already fast – on the order of microseconds – so we have little to gain in absolute terms, and I think the overhead of setting up and managing parallel processes would probably negate any speed gains anyway.

Intelligent application of logical operators

A common inefficiency is using vectorised AND and OR operators (&, |) in comparisons involving scalars. The vectorised versions always evaluate both sides of the logical operator, whereas the non-vectorised versions (&&, ||) only execute the right-hand side (and subsequent comparisons) if necessary.

For example, the expression (1 > 4) & (3 < 5) evaluates both sides of the &, while (1 > 4) && (3 < 5) only evaluates the first, because the expression is falsified by the first comparison.

Granted, this is a very minor inefficiency but can make a difference if you’re doing a lot of such operations. Just be careful not to use scalar && and || on vectors, as they will only evaluate the first element!

Conclusion

By far the biggest efficiency gains came with converting data frames to matrixes. This is worth considering when speed is important, so long as the trade-offs around data consistency and convenience make sense for the application.

Smaller but useful efficiency gains came from:

  • Preallocating data containers rather than growing them on the fly
  • Pushing data transformations that only need to happen once outside the function whose speed matters (for example, make the wide price and weights matrixes once, then run many fast backtests with different parameters)
  • Vectorising where possible
  • Using C++ via Rcpp

You might also want to consider parallel processing and careful usage of logical operators.

(2) Comments

August 21, 2021 at 5:00 am

Have you considered using xts instead of a matrix? That would get you some free lunch. It’s still faster to index xts objects by a number, instead of a timestamp. But you can get the row of the timestamp once, then use the row number for all later subset operations. You also get xts’ ISO8601 range-based subsetting.

August 22, 2021 at 5:42 pm

I hate to admit that I didn’t think of using xts – what an oversight considering how much I’ve benefited from using it in the past! Thanks for bringing it up. I’m going to work on replacing matrixes with xts objects in the next release.

Leave a Comment