rsims is a new package for fast, quasi event-driven backtesting in R. You can find the source on GitHub, docs here, and an introductory blog post here.
Our use case for rsims was accurate but fast simulation of trading strategies. I’ve had a few questions about how I made the backtester as fast as it is – after all, it uses a giant for loop, and R is notoriously slow for such operations – so here’s a post about how I optimised rsims for speed.
Approach
I first wrote the backtester with a focus on ease of understanding. I wanted something that worked as expected, and that I could think about without too much effort. At this stage, I wasn’t thinking much about speed or efficiency.
Once I had that working, I focused on finding bottlenecks with profvis. It’s a good idea to resist the urge to optimise before you know where you’ll get the most bang for your buck (speaking from experience here).
Then I simply tackled those bottlenecks one at a time until I reached a point of diminishing returns (which is a little subjective and dependent on one’s use case).
There’s one thing I’d do differently if I were starting over: write the tests first, rather than after the fact. In hindsight, this would have saved a ton of time. To be honest, it’s something that I say after every little development effort, but this time I’ve really learned my lesson.
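For example, here’s a minimal sketch (using testthat) of the kind of test I’d write up front next time. It targets the cash_backtest_original function shown below, and the zero-weight case is just one obvious invariant to pin down:

library(testthat)
library(tidyverse)

test_that("zero target weights produce no trades and leave cash untouched", {
  # two days, one asset, target weight always zero
  df <- data.frame(
    date = as.Date(c("2020-01-01", "2020-01-02")),
    ticker = "ABC",
    price = c(100, 110),
    theo_weight = 0
  )

  results <- cash_backtest_original(df, trade_buffer = 0., initial_cash = 10000)

  expect_true(all(results$Trades == 0))
  expect_equal(results$Value[results$Ticker == "Cash"], rep(10000, 2))
})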
Profiling with profvis
The original code looked a lot different from the current version. It had lots of data frames and dplyr pipelines for operating on data:
positions_from_no_trade_buffer <- function(current_positions, current_prices, current_theo_weights, cap_equity, num_assets, trade_buffer) {
  current_weights <- current_positions*current_prices/cap_equity
  target_positions <- current_positions

  for(j in 1:num_assets) {
    if(is.na(current_theo_weights[j]) || current_theo_weights[j] == 0) {
      target_positions[j] <- 0
      next
    }
    # note: we haven't truncated to nearest whole coin, as coins are divisible (unlike shares)
    if(current_weights[j] < current_theo_weights[j] - trade_buffer) {
      target_positions[j] <- (current_theo_weights[j] - trade_buffer)*cap_equity/current_prices[j]
    } else if(current_weights[j] > current_theo_weights[j] + trade_buffer) {
      target_positions[j] <- (current_theo_weights[j] + trade_buffer)*cap_equity/current_prices[j]
    }
  }

  unlist(target_positions)
}

cash_backtest_original <- function(backtest_df_long, trade_buffer = 0., initial_cash = 10000, commission_pct = 0, capitalise_profits = FALSE) {
  # Create wide data frames
  wide_prices <- backtest_df_long %>%
    pivot_wider(date, names_from = 'ticker', values_from = 'price')
  wide_theo_weights <- backtest_df_long %>%
    pivot_wider(date, names_from = 'ticker', values_from = 'theo_weight')

  # get tickers for later
  tickers <- colnames(wide_prices)[-1]

  # initial state
  num_assets <- ncol(wide_prices) - 1  # -1 for date column
  current_positions <- rep(0, num_assets)
  previous_theo_weights <- rep(0, num_assets)
  row_list <- list()
  cash <- initial_cash

  # backtest loop
  for(i in 1:nrow(wide_prices)) {
    current_date <- wide_prices[i, 1] %>% pull() %>% as.Date()
    current_prices <- wide_prices[i, -1] %>% as.numeric()
    current_theo_weights <- wide_theo_weights[i, -1] %>% as.numeric()

    # update equity
    equity <- sum(current_positions * current_prices, na.rm = TRUE) + cash
    cap_equity <- ifelse(capitalise_profits, equity, min(initial_cash, equity))  # min reflects assumption that we don't top up strategy equity if in drawdown

    # update positions based on no-trade buffer
    target_positions <- positions_from_no_trade_buffer(current_positions, current_prices, current_theo_weights, cap_equity, num_assets, trade_buffer)

    # calculate position deltas, trade values and commissions
    trades <- target_positions - current_positions
    trade_value <- trades * current_prices
    commissions <- abs(trade_value) * commission_pct

    # adjust cash by value of trades
    cash <- cash - sum(trade_value, na.rm = TRUE) - sum(commissions, na.rm = TRUE)
    current_positions <- target_positions
    position_value <- current_positions * current_prices
    equity <- sum(position_value, na.rm = TRUE) + cash

    # Create data frame and add to list
    row_df <- data.frame(
      Ticker = c('Cash', tickers),
      Date = rep(current_date, num_assets + 1),
      Close = c(0, current_prices),
      Position = c(0, current_positions),
      Value = c(cash, position_value),
      Trades = c(-sum(trade_value), trades),
      TradeValue = c(-sum(trade_value), trade_value),
      Commission = c(0, commissions)
    )
    row_list[[i]] <- row_df

    previous_theo_weights <- current_theo_weights
  }

  # Combine list into dataframe
  bind_rows(row_list)
}
This version of cash_backtest takes a long data frame of prices and weights by date, which is a very convenient format for data analysis. We can make such a data frame of randomly generated prices and weights:
library(tidyverse)

# function for generating prices from GBM process
gbm_sim <- function(nsim = 100, t = 25, mu = 0, sigma = 0.1, S0 = 100, dt = 1./365) {
  # matrix of random draws - one for each day for each simulation
  epsilon <- matrix(rnorm(t*nsim), ncol = nsim, nrow = t)

  # get GBM paths
  gbm <- exp((mu - sigma * sigma / 2) * dt + sigma * epsilon * sqrt(dt)) - 1

  # convert to price paths
  gbm <- apply(rbind(rep(S0, nsim), gbm + 1), 2, cumprod)

  gbm
}

# generate prices and weights
years <- 20
universe <- 100

x <- 1
tickers <- vector()
repeat{
  tickers[[x]] <- paste0(sample(LETTERS, 5, replace = TRUE), collapse = "")
  x <- x + 1
  if(x == universe + 1) break
}
stopifnot(n_distinct(tickers) == universe)

date <- seq(as.numeric(as.Date("1980-01-01")), as.numeric(as.Date("1980-01-01"))+(years*365))

prices <- cbind(date, gbm_sim(nsim = universe, t = years*365, mu = 0.1, sigma = 0.1))
colnames(prices) <- c("date", tickers)

weights <- cbind(date, rbind(rep(0, universe), matrix(rnorm(years*365*universe), nrow = years*365)))
colnames(weights) <- c("date", tickers)

backtest_df_long <- prices %>%
  as.data.frame() %>%
  mutate(date = as.Date(date, origin = "1970-01-01")) %>%
  pivot_longer(-date, names_to = "ticker", values_to = "price") %>%
  left_join(
    weights %>%
      as.data.frame() %>%
      mutate(date = as.Date(date, origin = "1970-01-01")) %>%
      pivot_longer(-date, names_to = "ticker", values_to = "theo_weight"),
    by = c("date", "ticker")
  )

head(backtest_df_long)
#> # A tibble: 6 x 4
#>   date       ticker price theo_weight
#>   <date>     <chr>  <dbl>       <dbl>
#> 1 1980-01-01 TVEZI    100           0
#> 2 1980-01-01 XIERO    100           0
#> 3 1980-01-01 XGYMU    100           0
#> 4 1980-01-01 PMVPF    100           0
#> 5 1980-01-01 KNCIP    100           0
#> 6 1980-01-01 JEBOY    100           0
backtest_df_long has prices and weights for 100 tickers over 7301 days:
backtest_df_long %>%
  summarise(
    num_days = n_distinct(date),
    num_tickers = n_distinct(ticker)
  )
#> # A tibble: 1 x 2
#>   num_days num_tickers
#>      <int>       <int>
#> 1     7301         100
To get some insight into how quickly the backtest runs and where the bottlenecks are, profvis is your friend:
library(profvis)

profvis({cash_backtest_original(backtest_df_long)}, interval = 0.01)
(To get deeper insight, you can extract the function logic as a series of expressions and pass these directly to profvis – but this shortcut is fine for our purposes.)
profvis tells us that cash_backtest_original took about 7.5 seconds to run and that most of the time was spent messing around with data frames:
In R, data frames are very convenient for doing data analysis, but they can be slow, particularly when you repeatedly index or subset them. Matrixes tend to be much faster, which suggests that we can get a big win from switching our data frames to matrixes.
There are some considerations. For example, a matrix can only store data of the same type, so we need to convert our convenient, human-readable date column to a Unix timestamp. Later, we’ll also rely on upstream data alignment and we’ll index by row number rather than date, which requires more work by the user, including checking that data alignment is in fact correct. Everything is a trade-off, no free lunches and all that.
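As a quick illustration of the difference, here’s a toy microbenchmark of the kind of row indexing the backtest loop does on every iteration (the exact numbers vary by machine, but the matrix subset is typically much faster):

library(microbenchmark)

# the same data stored two ways
df  <- as.data.frame(matrix(rnorm(7301 * 100), ncol = 100))
mat <- as.matrix(df)

microbenchmark(
  data_frame_row = df[100, ],   # returns a 1-row data frame
  matrix_row     = mat[100, ],  # returns a plain numeric vector
  times = 1000
)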
Here’s how the code would look with the data frame to matrix conversions – the main changes are around storing each day’s trade information in a list of matrixes, then converting that back to a data frame at the end (this time I’ll pass the function contents directly to profvis as a series of expressions):
# simulation parameters (defaults from cash_backtest_original)
trade_buffer <- 0.
initial_cash <- 10000
commission_pct <- 0
capitalise_profits <- FALSE

profvis({
  # Create matrixes (data.matrix automatically coerces dates to numeric Unix timestamps)
  wide_prices <- backtest_df_long %>%
    pivot_wider(date, names_from = 'ticker', values_from = 'price') %>%
    data.matrix()
  wide_theo_weights <- backtest_df_long %>%
    pivot_wider(date, names_from = 'ticker', values_from = 'theo_weight') %>%
    data.matrix()

  # get tickers for later
  tickers <- colnames(wide_prices)[-1]

  # initial state
  num_assets <- ncol(wide_prices) - 1  # -1 for date column
  current_positions <- rep(0, num_assets)
  previous_theo_weights <- rep(0, num_assets)
  row_list <- list()
  cash <- initial_cash

  # backtest loop
  for(i in 1:nrow(wide_prices)) {
    current_date <- wide_prices[i, 1]
    current_prices <- wide_prices[i, -1]
    current_theo_weights <- wide_theo_weights[i, -1]

    # update equity
    equity <- sum(current_positions * current_prices, na.rm = TRUE) + cash
    cap_equity <- ifelse(capitalise_profits, equity, min(initial_cash, equity))  # min reflects assumption that we don't top up strategy equity if in drawdown

    # update positions based on no-trade buffer
    target_positions <- positions_from_no_trade_buffer(current_positions, current_prices, current_theo_weights, cap_equity, num_assets, trade_buffer)

    # calculate position deltas, trade values and commissions
    trades <- target_positions - current_positions
    trade_value <- trades * current_prices
    commissions <- abs(trade_value) * commission_pct

    # adjust cash by value of trades
    cash <- cash - sum(trade_value, na.rm = TRUE) - sum(commissions, na.rm = TRUE)
    current_positions <- target_positions
    position_value <- current_positions * current_prices
    equity <- sum(position_value, na.rm = TRUE) + cash

    # store date in matrix as numeric (then convert date back to date format later)
    row_mat <- matrix(
      data = c(
        rep(as.numeric(current_date), num_assets + 1),
        c(1, current_prices),
        c(cash, current_positions),
        c(cash, position_value),
        c(-sum(trade_value), trades),
        c(-sum(trade_value), trade_value),
        c(0, commissions)
      ),
      nrow = num_assets + 1,
      ncol = 7,
      byrow = FALSE,
      dimnames = list(
        # tickers are row names
        c("Cash", tickers),
        # column names
        c("Date", "Close", "Position", "Value", "Trades", "TradeValue", "Commission")
      )
    )
    row_list[[i]] <- row_mat

    previous_theo_weights <- current_theo_weights
  }

  # Combine list of matrixes into dataframe
  do.call(rbind, row_list) %>%
    tibble::as_tibble(rownames = "ticker") %>%
    dplyr::mutate(
      Date = as.Date(Date, origin = "1970-01-01")
    )
}, interval = 0.01)
On my machine, this took about 500 milliseconds, a 15x speedup.
According to profvis, there are a couple of minor bottlenecks. One is the storing of each day’s trade details in a list: row_list[[i]] <- row_mat.
A simple trick for speeding up this operation is to preallocate row_list by replacing row_list <- list() with row_list <- vector(mode = "list", length = nrow(wide_prices)).
This gave me a slight speedup overall (450 vs 500ms) and removed this as a bottleneck.
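If you want to see the effect of preallocation in isolation, here’s a small, self-contained comparison (illustrative only – in recent versions of R the gap for this kind of workload is modest, which is consistent with the small gain above):

library(microbenchmark)

n <- 10000

grow_list <- function() {
  row_list <- list()                             # starts empty, grows on every iteration
  for (i in 1:n) row_list[[i]] <- i
  row_list
}

preallocate_list <- function() {
  row_list <- vector(mode = "list", length = n)  # allocated once up front
  for (i in 1:n) row_list[[i]] <- i
  row_list
}

microbenchmark(grow_list(), preallocate_list(), times = 10)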
We all know that R plays really nicely with vectorisation. If we can make use of vectorised functions where possible, we can usually speed things up considerably. One place we can do this is the positions_from_no_trade_buffer function, which in its original form is a loop over a series of if-else statements.
A vectorised version, making use of R’s vectorised ifelse function, would look like this:
positions_from_no_trade_buffer_vec <- function(current_positions, current_prices, current_theo_weights, cap_equity, num_assets, trade_buffer) {
  current_weights <- current_positions*current_prices/cap_equity

  target_positions <- ifelse(
    is.na(current_theo_weights) | current_theo_weights == 0,
    0,
    ifelse(
      current_weights < current_theo_weights - trade_buffer,
      (current_theo_weights - trade_buffer)*cap_equity/current_prices,
      ifelse(
        current_weights > current_theo_weights + trade_buffer,
        (current_theo_weights + trade_buffer)*cap_equity/current_prices,
        current_positions
      )
    )
  )

  as.numeric(target_positions)  # convert to unnamed vector
}
When I use the vectorised version, I get another small speedup (370ms total), and the time spent in this function reduces by more than a factor of two.
It’s worth thinking about this function a little more. The time it requires scales with both the number of time periods in the backtest and (to a lesser extent) the number of assets in the universe. For that reason, it made sense to try to squeeze out a little more performance by rewriting it in C++ via the Rcpp package:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector positions_from_no_trade_buffer_cpp(NumericVector current_positions, NumericVector current_prices,
                                                 NumericVector current_theo_weights, double cap_equity,
                                                 int num_assets, double trade_buffer) {

  NumericVector current_weights(num_assets);
  NumericVector target_positions(num_assets);
  std::copy(current_positions.begin(), current_positions.end(), target_positions.begin());

  // current weight of each asset as a fraction of capped equity
  for(int j = 0; j < num_assets; j++) {
    current_weights[j] = current_positions[j]*current_prices[j]/cap_equity;
  }

  // adjust any position whose weight has drifted outside the no-trade buffer
  for(int j = 0; j < num_assets; j++) {
    //Rprintf("%i %f %f \n", j, current_theo_weights[j], current_weights[j]);
    if(R_IsNA(current_theo_weights[j]) || current_theo_weights[j] == 0)
      target_positions[j] = 0;
    else if(current_weights[j] < current_theo_weights[j] - trade_buffer)
      target_positions[j] = (current_theo_weights[j] - trade_buffer)*cap_equity/current_prices[j];
    else if(current_weights[j] > current_theo_weights[j] + trade_buffer)
      target_positions[j] = (current_theo_weights[j] + trade_buffer)*cap_equity/current_prices[j];
  }

  return target_positions;
}
Benchmarking the three versions of this function is illustrative:
library(microbenchmark)

# make wide matrixes
wide_prices <- backtest_df_long %>%
  pivot_wider(date, names_from = 'ticker', values_from = 'price') %>%
  data.matrix()
wide_theo_weights <- backtest_df_long %>%
  pivot_wider(date, names_from = 'ticker', values_from = 'theo_weight') %>%
  data.matrix()

# pick a day's prices and weights
current_prices <- wide_prices[100, -1]
current_theo_weights <- wide_theo_weights[100, -1]

num_assets <- ncol(wide_prices) - 1  # -1 for date column
current_positions <- rep(0, num_assets)
cap_equity <- 10000
trade_buffer <- 0.04

# check versions match
all.equal(
  positions_from_no_trade_buffer_cpp(current_positions, current_prices, current_theo_weights, cap_equity, num_assets, trade_buffer),
  positions_from_no_trade_buffer_vec(current_positions, current_prices, current_theo_weights, cap_equity, num_assets, trade_buffer)
)
#> [1] TRUE
all.equal(
  positions_from_no_trade_buffer_cpp(current_positions, current_prices, current_theo_weights, cap_equity, num_assets, trade_buffer),
  positions_from_no_trade_buffer(current_positions, current_prices, current_theo_weights, cap_equity, num_assets, trade_buffer)
)
#> [1] TRUE

# make more concise expressions for microbenchmark
for_loop_version <- function() {
  positions_from_no_trade_buffer(current_positions, current_prices, current_theo_weights, cap_equity, num_assets, trade_buffer)
}
vectorised_version <- function() {
  positions_from_no_trade_buffer_vec(current_positions, current_prices, current_theo_weights, cap_equity, num_assets, trade_buffer)
}
cpp_version <- function() {
  positions_from_no_trade_buffer_cpp(current_positions, current_prices, current_theo_weights, cap_equity, num_assets, trade_buffer)
}

# benchmark
microbenchmark(
  for_loop_version(),
  vectorised_version(),
  cpp_version(),
  times = 1000
) %>%
  summary()
#>                   expr     min       lq       mean   median      uq      max neval
#> 1   for_loop_version() 239.801 268.5515 364.200015 315.2005 394.101 8460.800  1000
#> 2 vectorised_version()  25.701  29.4010  51.636307  31.5510  43.751 9537.701  1000
#> 3        cpp_version()   2.501   3.3010   7.439904   3.9010   5.001 2211.001  1000
Going by median execution time, the C++ version is roughly 8x faster than the vectorised version and around 80x faster than the original for-loop version. The units here are microseconds, and admittedly, when we’re in this realm, such a speedup makes little practical difference, but it conceivably could if we were backtesting a large enough universe over enough time periods.
Because this function is called on every iteration of the loop, using the C++ version gets our profvis output down to about 200ms, from 370ms previously.
Another efficiency can be gained by pushing data transformations that only need to happen once outside of the function whose speed matters. In this case, we could, for example, make the wide price and weight matrixes upstream of the backtesting function, pushing that responsibility back to the user. This is, in fact, what we’ve done in the current implementation of rsims. Profiling the current version results in the following output:
library(rsims)

profvis({
  cash_backtest(
    prices,
    weights,
    trade_buffer = 0.,
    initial_cash = 1000,
    commission_pct = 0.001,
    capitalise_profits = FALSE
  )
}, interval = 0.01)
The total time taken was 160ms, a 47x speedup over the original version.
How does rsims scale?
Finally, let’s see how rsims performs as we increase the number of time steps and the size of the universe. We’ll benchmark performance with universe sizes from 100 to 1,000 and time periods from roughly 2,500 to 10,000 days (approximately 10 to 40 trading years):
library(rsims)

get_mean_time <- function(days, universe, times = 5) {
  dates <- seq(as.numeric(as.Date("1980-01-01")), as.numeric(as.Date("1980-01-01"))+(days))
  prices <- cbind(dates, gbm_sim(nsim = universe, t = days, mu = 0.1, sigma = 0.1))
  weights <- cbind(dates, rbind(rep(0, universe), matrix(rnorm(days*universe), nrow = days)))

  res <- microbenchmark(
    cash_backtest(
      prices,
      weights,
      trade_buffer = 0.,
      initial_cash = 1000,
      commission_pct = 0.001,
      capitalise_profits = FALSE
    ),
    times = times
  )

  mean(res$time)/1e9
}

num_assets <- seq(100, 1000, 100)
num_days <- c(10, 20, 30, 40)*252

means <- list()
for(universe in num_assets) {
  print(glue::glue("Doing universe size {universe}"))
  for(days in num_days) {
    print(glue::glue("Doing {days} days"))
    means <- c(means, get_mean_time(days, universe, times = 10))
  }
}
Plotting the results:
df <- as.data.frame(matrix(unlist(means), ncol = length(num_assets))) %>%
  mutate(days = num_days)
colnames(df) <- c(num_assets, "days")

df %>%
  pivot_longer(cols = -days, names_to = "universe_size", values_to = "mean_sim_time") %>%
  mutate(universe_size = as.numeric(universe_size)) %>%
  ggplot(aes(x = universe_size, y = mean_sim_time, colour = factor(days))) +
    geom_line() +
    geom_point() +
    labs(
      x = "Universe size",
      y = "Mean simulation time, seconds",
      title = "Mean simulation time from 10 iterations",
      colour = "Time Steps"
    ) +
    theme_bw()
We can see that rsims scales well in general. I suspect the blow-out for universe sizes of 900 and 1,000 in the 40-year backtest was due to memory constraints of my local setup (100 Chrome tabs, anyone?).
Other ideas not implemented
There are some other tricks for speeding up R code that weren’t applicable here, but that are worth knowing about.
Parallel processing
Parallel processing is a well-trodden path for spreading computations across more than one processor. In R, the parallel package is the original parallel processing toolkit and is included in base R. It parallelises some standard R functions out of the box, such as the apply functions. There are also the foreach and doParallel packages.
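For completeness, here’s roughly what an embarrassingly parallel job looks like with the parallel package. This is purely illustrative and isn’t something rsims does – run_one_sim is a made-up stand-in for any chunk of work whose iterations don’t depend on one another:

library(parallel)

# hypothetical stand-in for an expensive, independent unit of work
# (e.g. one backtest parameter combination)
run_one_sim <- function(trade_buffer) {
  Sys.sleep(0.1)
  trade_buffer
}

cl <- makeCluster(max(1, detectCores() - 1))
results <- parLapply(cl, seq(0, 0.1, by = 0.02), run_one_sim)
stopCluster(cl)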
In our application, parallelisation won’t work for the event loop because of path dependency – tomorrow’s trades depend on yesterday’s positions, so we can’t do yesterday’s and today’s trades in parallel.
We could potentially parallelise the position delta calculations for each asset within each loop iteration, as these aren’t dependent on one another. This operation is already fast – on the order of microseconds – so we have little to gain in absolute terms, and I think the overhead of setting up and managing parallel processes would probably negate any speed gains anyway.
Intelligent application of logical operators
A common inefficiency is using the vectorised AND and OR operators (&, |) in comparisons involving scalars. The vectorised versions always evaluate both sides of the logical operator, whereas the non-vectorised versions (&&, ||) only evaluate the right-hand side (and subsequent comparisons) if necessary.
For example, the expression (1 > 4) & (3 < 5) evaluates both sides of the &, while (1 > 4) && (3 < 5) only evaluates the first, because the expression is falsified by the first comparison.
Granted, this is a very minor inefficiency, but it can make a difference if you’re doing a lot of such operations. Just be careful not to use the scalar && and || on vectors: they aren’t vectorised, and recent versions of R will throw an error rather than quietly using only the first element.
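Here’s a minimal sketch of the difference in action – slow_check is a made-up stand-in for an expensive right-hand side:

slow_check <- function() {
  Sys.sleep(1)  # pretend this comparison is expensive
  TRUE
}

# vectorised &: both sides always evaluated, so this takes about a second
system.time(FALSE & slow_check())

# scalar &&: short-circuits on the FALSE left-hand side, so slow_check() never runs
system.time(FALSE && slow_check())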
Conclusion
By far the biggest efficiency gains came with converting data frames to matrixes. This is worth considering when speed is important, so long as the trade-offs around data consistency and convenience make sense for the application.
Smaller but useful efficiency gains came from:
- Preallocating data containers rather than growing them on the fly
- Pushing data transformations that only need to happen once outside the function whose speed matters (for example, make the wide price and weights matrixes once, then run many fast backtests with different parameters)
- Vectorising where possible
- Using C++ via Rcpp
You might also want to consider parallel processing and careful usage of logical operators.
Have you considered using xts instead of a matrix? That would get you some free lunch. It’s still faster to index xts objects by row number than by timestamp, but you can look up the row for a timestamp once, then use that row number for all later subset operations. You also get xts’ ISO8601 range-based subsetting.
I hate to admit that I didn’t think of using xts – what an oversight, considering how much I’ve benefited from using it in the past! Thanks for bringing it up. I’m going to work on replacing matrixes with xts objects in the next release.