How to Model Features as Expected Returns

Modeling features as expected returns can be a useful way to develop trading strategies, but it requires some care.

The main advantage is that it directly aligns with the objective of predicting and capitalising on future returns. This can make optimisation and implementation more intuitive. It also facilitates direct comparison between features and provides a common framework for incorporating new signals or reassessing existing ones.

Finally, expected return models can be integrated with common risk models such as covariance estimates, and enable a direct comparison with trading costs, opening up the possibility to use mathematical optimisation techniques to manage the trade-off between return, risk, and turnover in the face of real world constraints.

However, a little care is needed in the modeling process. It is all too easy to overfit models to past data, and the risk of mis-specifying a model (choosing a model that doesn’t reflect the underlying data) is real.

You can somewhat mitigate these risks by keeping things simple and having a good understanding of your features that informs your choice of model. I'll share some examples below.

This article continues our recent stat arb series. The previous articles are linked below:

In this article, we’ll take the carry, momentum, and breakout features used in the previous articles and model them as expected returns to use in a trading strategy. I’ll show you some simple tricks to avoid the most common pitfalls.

First, load libraries and set session options:

# session  options
options(repr.plot.width = 14, repr.plot.height=7, warn = -1)

library(tidyverse)
library(tidyfit)
library(tibbletime)
library(roll)
library(patchwork)
pacman::p_load_current_gh("Robot-Wealth/rsims", dependencies = TRUE) 

# chart options
theme_set(theme_bw())
theme_update(text = element_text(size = 20))
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
 dplyr     1.1.4      readr     2.1.4
 forcats   1.0.0      stringr   1.5.1
 ggplot2   3.4.4      tibble    3.2.1
 lubridate 1.9.3      tidyr     1.3.1
 purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
 dplyr::filter() masks stats::filter()
 dplyr::lag()    masks stats::lag()
 Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Attaching package: 'tibbletime'


The following object is masked from 'package:stats':

    filter


Next, load the data we’ve been using in this series. It consists of daily perpetual futures contract prices from Binance. You can get it here.

perps <- read_csv("https://github.com/Robot-Wealth/r-quant-recipes/raw/master/quantifying-combining-alphas/binance_perp_daily.csv")
head(perps)
Rows: 187251 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): ticker
dbl  (9): open, high, low, close, dollar_volume, num_trades, taker_buy_volum...
date (1): date

 Use `spec()` to retrieve the full column specification for this data.
 Specify the column types or set `show_col_types = FALSE` to quiet this message.
A tibble: 6 × 11
ticker  date       open     high     low      close    dollar_volume num_trades taker_buy_volume taker_buy_quote_volumne funding_rate
BTCUSDT 2019-09-11 10172.13 10293.11  9884.31  9991.84      85955369      10928         5169.153                52110075       -3e-04
BTCUSDT 2019-09-12  9992.18 10365.15  9934.11 10326.58     157223498      19384        11822.980               119810012       -3e-04
BTCUSDT 2019-09-13 10327.25 10450.13 10239.42 10296.57     189055129      25370         9198.551                94983470       -3e-04
BTCUSDT 2019-09-14 10294.81 10396.40 10153.51 10358.00     206031349      31494         9761.462               100482121       -3e-04
BTCUSDT 2019-09-15 10355.61 10419.97 10024.81 10306.37     211326874      27512         7418.716                76577710       -3e-04
BTCUSDT 2019-09-16 10306.79 10353.81 10115.00 10120.07     208211376      29030         7564.376                77673986       -3e-04

For the purposes of this example, we’ll create the same crypto universe that we used last time – the top 30 Binance perpetual futures contracts by trailing 30-day dollar-volume, with stables and wrapped tokens removed.

We’ll also calculate returns at this step for later use.

# get same universe as before - top 30 by rolling 30-day dollar volume, no stables

# remove stablecoins
# list of stablecoins from defi llama
url <- "https://stablecoins.llama.fi/stablecoins?includePrices=true"
response <- httr::GET(url)

stables <- response %>%
  httr::content(as = "text", encoding = "UTF-8") %>%
  jsonlite::fromJSON(flatten = TRUE) %>%
  pluck("peggedAssets") %>%
  pull(symbol)

# sort(stables)

perps <- perps %>% 
  filter(!ticker %in% glue::glue("{stables}USDT")) 
# get the top 30 by trailing 30-day volume
trading_universe_size <- 30

universe <- perps %>%
  group_by(ticker) %>% 
  # also calculate returns for later
  mutate(
    total_return_simple = funding_rate + (close - lag(close, 1))/lag(close, 1),
    total_return_log = log(1 + total_return_simple),
    total_fwd_return_simple = dplyr::lead(funding_rate, 1) + (dplyr::lead(close, 1) - close)/close,
    total_fwd_return_log = log(1 + total_fwd_return_simple) 
  ) %>% 
  mutate(trail_volume = roll_mean(dollar_volume, 30)) %>% 
  na.omit() %>% 
  group_by(date) %>%
  mutate(
    volume_rank = row_number(-trail_volume),
    is_universe = volume_rank <= trading_universe_size,
  ) 

universe %>%
  group_by(date, is_universe) %>%
  summarize(count = n(), .groups = "drop") %>%
  ggplot(aes(x=date, y=count, color = is_universe)) + 
  geom_line() + 
  labs(
    title = 'Universe size'
)

Next, calculate our features as before. We have:

  • Short-term (10-day) cross-sectional momentum (bucketed into deciles by date)
  • Short-term (1-day) cross-sectional carry (also bucketed into deciles by date)
  • A breakout feature defined as the number of days since the 20-day high which we use as a time-series return predictor.
# calculate features as before
rolling_days_since_high_20 <- purrr::possibly(
  tibbletime::rollify(
    function(x) {
      idx_of_high <- which.max(x)
      days_since_high <- length(x) - idx_of_high
      days_since_high
    }, 
    window = 20, na_value = NA), 
  otherwise = NA
)

features <- universe %>%
  group_by(ticker) %>%
  arrange(date) %>%
  mutate(
    breakout = 9.5 - rolling_days_since_high_20(close),  # puts this feature on a scale -9.5 to +9.5
    momo = (close - lag(close, 10)) / close,  # 10-day price change scaled by the current close
    carry = funding_rate
  ) %>%
  ungroup() %>% 
  na.omit()

# create a model df on our universe with momo and carry features scaled into deciles
model_df <- features %>%
  filter(is_universe) %>% 
  group_by(date) %>%
  mutate(
    carry_decile = ntile(carry, 10),
    momo_decile = ntile(momo, 10),
    # also calculate demeaned return for everything in our universe each day for later
    demeaned_return = total_return_simple - mean(total_return_simple, na.rm = TRUE),
    demeaned_fwd_return = total_fwd_return_simple - mean(total_fwd_return_simple, na.rm = TRUE)
  ) %>% 
  ungroup()

# start simulation from date we first have n tickers in the universe
start_date <- features %>%
  group_by(date, is_universe) %>%
  summarize(count = n(), .groups = "drop") %>%
  filter(count >= trading_universe_size) %>%
  head(1) %>%
  pull(date)

Now that we have our features, we can model expected returns.

In the article on Quantifying and Combining Alphas, we saw that the carry feature showed a very rough and noisily linear relationship with forward cross-sectional returns. It looks even nicer, although not perfectly linear, on our cut-down universe:

model_df %>%
  group_by(carry_decile) %>% 
  summarise(mean_return = mean(demeaned_fwd_return)) %>% 
  ggplot(aes(x = factor(carry_decile), y = mean_return)) +
  geom_bar(stat = "identity")

While not perfectly linear in expected returns, it would be entirely acceptable to model this feature with a simple linear model. In fact, it’s probably the optimal thing to do because it will limit your ability to overfit to the past. Having said that, to the extent that the outlying return to the tenth decile is real and stable through time, you might explicitly account for it in your model – I’d probably just keep things simple though.
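
To make "keep it simple" concrete, here's a quick full-sample fit of a linear model of demeaned forward return on the carry decile. This is in-sample only and purely illustrative of the model specification – the rolling, point-in-time version we'll actually use comes later:

# illustrative full-sample (in-sample) linear model of the carry feature
carry_fit <- lm(demeaned_fwd_return ~ carry_decile, data = model_df)
summary(carry_fit)$coefficients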

In contrast, the momentum feature doesn’t look quite so nice on our cut-down universe. In fact, it looks quite random:

model_df %>%
  group_by(momo_decile) %>% 
  summarise(mean_return = mean(demeaned_fwd_return)) %>% 
  ggplot(aes(x = factor(momo_decile), y = mean_return)) +
  geom_bar(stat = "identity")

This is interesting. Perhaps there’s some liquidity dependency here given that the factor looks rather non-predictive on our more-liquid universe (although it didn’t look so good on our larger universe either).

But at this point, it’s hard to make the case for a linear model doing this feature justice.

Let’s see how this feature’s relationship with forward returns has changed over time. We’ll plot a factor plot for each year in our data set:

model_df %>%
  mutate(Year = year(date)) %>% 
  group_by(momo_decile, Year) %>% 
  summarise(mean_return = mean(demeaned_fwd_return, na.rm = TRUE), .groups = "drop") %>% 
  ggplot(aes(x = factor(momo_decile), y = mean_return)) +
  geom_bar(stat = "identity") +
  facet_wrap(~Year) +
  labs(
    title = "Momentum factor plot by year"
  )

Interesting. You could argue that in 2020, 2021, and (maybe) 2022, the momentum feature showed a noisy inverse relationship, implying mean reversion. In 2023 and 2024, that looks to have flipped.

When we take all the years together as in the first plot, it looks like a big random mess. But digging a bit deeper reveals a (potentially) real but changing relationship instead. And you could make a case that a linear model was a reasonable choice for each year in the plot above… it’s just that a different model seems appropriate for each year.
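
Here's a quick in-sample way to quantify that instability: estimate the momentum slope separately for each year. This sketch assumes the broom package is installed (it isn't loaded above):

# estimate the momo_decile slope year by year (in-sample, for intuition only)
model_df %>%
  mutate(Year = year(date)) %>%
  group_by(Year) %>%
  group_modify(~ broom::tidy(lm(demeaned_fwd_return ~ momo_decile, data = .x))) %>%
  filter(term == "momo_decile") %>%
  select(Year, estimate, std.error)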

One significant benefit of the approach we’ll use here is that you can update your model of expected returns through time, and thus capture changing relationships such as this.

As an aside, this is the sort of analysis you’d do for all of your features – you really want to understand them in as much detail as possible. But this is already going to be a long article, so let’s press on.

Next let’s check what our breakout feature looks like:

model_df %>%
  group_by(breakout) %>% 
  summarise(mean_return = mean(total_fwd_return_simple)) %>% 
  ggplot(aes(x = factor(breakout), y = mean_return)) +
  geom_bar(stat = "identity")

This provides another good example of thinking about a model that adequately describes our feature.

You can see that for the values -9.5 through about 3.5, the relationship with forward returns looks entirely random.

But from about 4.5 onwards, it looks quite non-random. You could argue for a linear model on this portion of the feature’s range. This leads to a stepwise linear model of expected returns, where expected returns are zero for -9.5 through 3.5 and linear from 3.5 through 9.5.

options(repr.plot.width = 10, repr.plot.height=4)
data.frame(breakout = seq(from = -9.5, to = 9.5, by = 1)) %>% 
    mutate(expected_return = case_when(breakout <= 3.5 ~ 0, TRUE ~ breakout * 0.005)) %>% 
    mutate(groups = case_when(breakout <= 3.5 ~ 0, TRUE ~ 1)) %>% 
    ggplot(aes(x = breakout, y = expected_return, group = groups)) +
    geom_line() +
    geom_point() +
    labs(
        title = "Example stepwise linear model for breakout feature"
    )

It would also be quite acceptable to model this feature’s expected returns as zero from -9.5 through 3.5 and the mean of the expected returns to the remaining deciles above 3.5:

data.frame(breakout = seq(from = -9.5, to = 9.5, by = 1)) %>% 
    mutate(expected_return = case_when(breakout <= 3.5 ~ 0, TRUE ~ 0.0025)) %>% 
    mutate(groups = case_when(breakout <= 3.5 ~ 0, TRUE ~ 1)) %>% 
    ggplot(aes(x = breakout, y = expected_return, group = groups)) +
    geom_line() +
    geom_point() +
    labs(
        title = "Example stepwise uniform model for breakout feature"
    )
# session  options
options(repr.plot.width = 14, repr.plot.height=7)

This has the advantage of simplicity, but I do think that it makes sense that this feature would behave linearly at the top end (that is, expected returns are higher the closer we are to the 20-day high).

Now that we have a reasonable specification for our models, let’s go ahead and build them.

This is slightly complicated by the fact that we should do this on a rolling basis, only including information that was available at the time.

This leads to rolling estimates for our model coefficients.

These rolling regressions were once a bit of a pain to set up, but the tidyfit package now makes them quite simple and intuitive, and it integrates with the rest of the tidyverse.

This modeling ecosystem is incredibly rich and provides a single interface to many model specifications. The caret package was an early attempt at this kind of unified interface; tidyfit takes it further with tight integration with the rest of the tidyverse and a consistent interface regardless of the model type.

An example of this richness is that we can estimate a standard linear model using ordinary least squares with lm. But we can also estimate one with robust standard errors (for example, accounting for autocorrelation and heteroscedasticity) by simply passing the vcov. argument, which computes a robust covariance estimate via the sandwich package – all without changing the model interface. Very cool. This won't make much difference in this application, but I'll include it in the example.

We’ll fit our model on 90-day windows of data, and refit every 10 days. 90 days doesn’t sound like a lot of data, but in general I find that fitting to the recent past tends to work better.

# use a 90-day window and refit every 10 days
is_days <- 90
step_size <- trading_universe_size*10  # step is counted in rows: ~30 tickers per day x 10 days

# rolling model for cross-sectional features
roll_xs_coeffs_df <- model_df %>% 
  filter(date >= start_date) %>% 
  regress(
    demeaned_fwd_return ~ carry_decile + momo_decile,
    m("lm", vcov. = "HAC"),  
    .cv = "sliding_index", 
    .cv_args = list(lookback = days(is_days), step = step_size, index = "date"),  
    .force_cv = TRUE, 
    .return_slices = TRUE
  )

# rolling model for time series features 
breakout_cutoff <- 5.5  # below this level, we set our expected return to zero
roll_ts_coeffs_df <- model_df %>% 
  filter(date >= start_date) %>% 
  # setting regression weights to zero when breakout < breakout_cutoff will give these data points zero weight in estimating coefficients
  mutate(regression_weights = case_when(breakout < breakout_cutoff ~ 0, TRUE ~ 1)) %>% 
  regress(
    total_fwd_return_simple ~ breakout,
    m("lm", vcov. = "HAC"), 
    .weights = "regression_weights", 
    .cv = "sliding_index", 
    .cv_args = list(lookback = days(is_days), step = step_size, index = "date"),  
    .force_cv = TRUE, 
    .return_slices = TRUE
  )

This results in a nested dataframe that contains the model objects and various metadata:

roll_xs_coeffs_df %>% head
roll_ts_coeffs_df %>% select(-settings) %>% head
A tidyfit.models: 6 × 7
model estimator_fct size (MB) grid_id  model_object                      settings slice_id
lm    stats::lm      1.339600 #0010000 <environment: 0x000001ee56a9a2f0> HAC      2021-02-11
lm    stats::lm      1.345544 #0010000 <environment: 0x000001ee48593470> HAC      2021-02-21
lm    stats::lm      1.358320 #0010000 <environment: 0x000001ee551fa4a8> HAC      2021-03-03
lm    stats::lm      1.363056 #0010000 <environment: 0x000001ee554e3170> HAC      2021-03-13
lm    stats::lm      1.363000 #0010000 <environment: 0x000001ee48e2b898> HAC      2021-03-23
lm    stats::lm      1.363136 #0010000 <environment: 0x000001ee4769c6a0> HAC      2021-04-02
A tidyfit.models: 6 × 6
model estimator_fct size (MB) grid_id  model_object                      slice_id
lm    stats::lm      1.224192 #0010000 <environment: 0x000001ee47e4b2d0> 2021-02-11
lm    stats::lm      1.236392 #0010000 <environment: 0x000001ee548f9590> 2021-02-21
lm    stats::lm      1.233184 #0010000 <environment: 0x000001ee54a51478> 2021-03-03
lm    stats::lm      1.233880 #0010000 <environment: 0x000001ee54dc9dc0> 2021-03-13
lm    stats::lm      1.238816 #0010000 <environment: 0x000001ee5519bf28> 2021-03-23
lm    stats::lm      1.235976 #0010000 <environment: 0x000001ee5569ce28> 2021-04-02

slice_id is the date the model goes out of sample – so we’ll need to make sure that we align our model coefficients to avoid using them on the data they were fitted on.

This requires a little data wrangling:

# for this to work, you need to install.packages(c("sandwich", "lmtest"))
xs_coefs <- roll_xs_coeffs_df %>% 
  coef()

xs_coefs_df <- xs_coefs %>% 
  ungroup() %>% 
  select(term, estimate, slice_id) %>% 
  pivot_wider(id_cols = slice_id, names_from = term, values_from = estimate) %>% 
  mutate(slice_id = as_date(slice_id)) %>% 
  # shift forward to the next slice date so coefficients are never applied to the data they were fitted on
  # slice_id_oos is the date we start using the parameters
  mutate(slice_id_oos = lead(slice_id)) %>% 
  rename("xs_intercept" = `(Intercept)`)

ts_coefs <- roll_ts_coeffs_df %>% 
  coef()

ts_coefs_df <- ts_coefs %>% 
  ungroup() %>% 
  select(term, estimate, slice_id) %>% 
  pivot_wider(id_cols = slice_id, names_from = term, values_from = estimate) %>% 
  mutate(slice_id = as_date(slice_id)) %>% 
  # shift forward to the next slice date so coefficients are never applied to the data they were fitted on
  # slice_id_oos is the date we start using the parameters
  mutate(slice_id_oos = lead(slice_id)) %>% 
  rename("ts_intercept" = `(Intercept)`)

xs_coefs_df %>% head
# ts_coefs_df %>% head
A tibble: 6 × 5
slice_id   xs_intercept carry_decile momo_decile   slice_id_oos
2021-02-11 -0.004872208  0.001732322 -0.0008239738 2021-02-21
2021-02-21 -0.006129608  0.001902553 -0.0007640190 2021-03-03
2021-03-03 -0.009538180  0.002148154 -0.0003897220 2021-03-13
2021-03-13 -0.008374302  0.001953655 -0.0004141515 2021-03-23
2021-03-23 -0.008275046  0.002211137 -0.0006898811 2021-04-02
2021-04-02 -0.008613423  0.002413307 -0.0008298451 2021-04-12

Here’s a plot of our cross-sectional features’ regression coefficients through time:

# plot cross-sectional estimates
xs_coefs_df %>% 
  select(-slice_id) %>% 
  pivot_longer(cols = -slice_id_oos, names_to = "coefficient", values_to = "estimate") %>% 
  ggplot(aes(x = slice_id_oos, y = estimate)) +
  geom_line() +
  facet_wrap(~coefficient, ncol = 1, scales = "free_y") +
  labs(
    title = "Cross-sectional features model parameters",
    x = "Date",
    y = "Estimate"
  )

# plot time-series estimates
# ts_coefs_df %>% 
#   select(-slice_id) %>% 
#   pivot_longer(cols = -slice_id_oos, names_to = "coefficient", values_to = "estimate") %>% 
#   ggplot(aes(x = slice_id_oos, y = estimate)) +
#   geom_line() +
#   facet_wrap(~coefficient, ncol = 1, scales = "free_y") +
#   labs(
#     title = "Time-series features model parameters",
#     x = "Date",
#     y = "Estimate"
#   )

The estimates for the model coefficients for our carry and momentum features change over time to reflect the changing relationship with forward returns.

In particular, notice how the momentum coefficient flipped sign a few times, but especially from mid-2022, which is in line with our understanding of how the feature evolved.

Now we can plot a time series of returns to a frictionless trading strategy based on these expected return estimates. This isn’t a backtest – it makes no attempt to address real-world issues such as costs and turnover. It simply plots the returns to our predictions of expected returns over time.

I won’t actually use the linear model of the breakout feature – instead I’ll just set its expected return to 0.002 when it’s greater than 5 and 0 otherwise. I don’t like that the breakout coefficients go negative from time to time.

I’ll calculate target positions proportional to their cross-sectional return estimates. I’ll then let the breakout feature tilt the portfolio net long, but I’ll constrain the maximum delta that this feature can add to a position.

# join and fill using slice_id to designate when the model goes oos
exp_return_df <- model_df %>% 
  left_join(
    xs_coefs_df %>% left_join(ts_coefs_df, by = c("slice_id", "slice_id_oos")), 
    by = join_by(closest(date > slice_id_oos)), suffix = c("_factor", "_coef")
  ) %>% 
  na.omit() %>% 
  # forecast cross-sectional expected return as
  mutate(expected_xs_return = carry_decile_factor*carry_decile_coef + momo_decile_factor*momo_decile_coef + xs_intercept ) %>% 
  # mean expected xs return each day is zero
  # let total expected return be xs return + ts return - allows time series expected return to tilt weights
  mutate(expected_ts_return = case_when(breakout_factor >= 5.5 ~ 0.002, TRUE ~ 0)) %>% 
  ungroup() 

# long-short the xs expected return
# layer ts expected return on top
# position by expected return

# 0.5 in the numerator caps the breakout tilt at 50% extra net long exposure (1 would allow up to 100%)
max_ts_pos <- 0.5/trading_universe_size

strategy_df <- exp_return_df %>% 
  filter(date >= start_date) %>% 
  group_by(date) %>% 
  mutate(xs_position = expected_xs_return - mean(expected_xs_return, na.rm = TRUE)) %>% 
  # scale positions so that leverage is 1
  group_by(date) %>% 
  mutate(xs_position = if_else(xs_position == 0, 0, xs_position/sum(abs(xs_position)))) %>%
  # layer ts expected return prediction
  ungroup() %>% 
  mutate(ts_position = sign(expected_ts_return)) %>% 
  # constrain maximum delta added by time series prediction
  mutate(ts_position = if_else(ts_position >= 0, pmin(ts_position, max_ts_pos), pmax(ts_position, -max_ts_pos))) %>% 
  mutate(position = xs_position + ts_position) %>% 
  # strategy return
  mutate(strat_return = position*total_fwd_return_simple) %>% 
  # scale back to leverage 1
  group_by(date) %>% 
  mutate(position = if_else(position == 0, 0, position/sum(abs(position)))) 
  
returns_plot <- strategy_df %>%
  group_by(date) %>% 
  summarise(total_ret = sum(strat_return)) %>% 
  ggplot(aes(x = date, y = cumsum(log(1+total_ret)))) +
  geom_line() +
  labs(
    title = "Cumulative strategy return",
    y = "Cumulative return"
  )

weights_plot <- strategy_df %>%
  summarise(net_pos = sum(position)) %>% 
  ggplot(aes(x = date, y = net_pos)) +
  geom_line() +
  labs(
    x = "Date",
    y = "Net Weight"
  )

returns_plot / weights_plot + plot_layout(heights = c(2,1))

Notice that I still had to make some fairly crude manual adjustments to the positions resulting from the time-series predictions (for instance, using the sign of the prediction as the position, then scaling it back with a maximum delta constraint).

This is OK, but I’d rather use my predictions a little more directly. I want to model my features as best I can, and then at each decision point, I want to answer the question “Given these expected returns, what’s the best portfolio given my constraints?” That’s an optimisation problem, and one that we can actually solve without resorting to manual adjustments and heuristics. I’ll show you how to do that in the next article.
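
To make the idea concrete before then, here's a minimal sketch – with purely hypothetical inputs, and not the implementation we'll build in the next article – of how expected returns and a covariance estimate can be turned into weights with a quadratic programming solver. It assumes the quadprog package is installed.

# minimal mean-variance sketch with hypothetical inputs: maximise
# mu'w - (lambda/2) w'Sigma w subject to dollar-neutrality and position limits
mu <- c(0.002, -0.001, 0.0015)            # hypothetical expected returns
sigma <- diag(c(0.04, 0.03, 0.05)^2)      # hypothetical (diagonal) covariance
lambda <- 10                              # risk aversion
n <- length(mu)

# constraints (columns of Amat): sum(w) = 0, then w_i >= -0.5, then w_i <= 0.5
Amat <- cbind(rep(1, n), diag(n), -diag(n))
bvec <- c(0, rep(-0.5, n), rep(-0.5, n))

sol <- quadprog::solve.QP(
  Dmat = lambda * sigma,  # quadratic (risk) term
  dvec = mu,              # linear (expected return) term
  Amat = Amat,
  bvec = bvec,
  meq = 1                 # the first constraint is an equality
)
round(sol$solution, 3)

The real version would use our rolling expected returns, a proper covariance estimate, and terms for costs and turnover – that's what we'll build next time.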

But first, let’s do a more accurate simulation given our target weights and trading costs. We’ll use rsims, like in the previous example.

First, wrangle our data into matrices of target positions, prices, and funding rates:

# get weights as a wide matrix 
# note that date column will get converted to unix timestamp
backtest_weights <- strategy_df %>%
  pivot_wider(id_cols = date, names_from = ticker, values_from = c(close, position)) %>%  # pivot wider guarantees prices and theo_weight are date aligned
  select(date, starts_with("position_")) %>%
  data.matrix()

# NA weights should be zero
backtest_weights[is.na(backtest_weights)] <- 0

head(backtest_weights, c(5, 5))

# get prices as a wide matrix
# note that date column will get converted to unix timestamp
backtest_prices <- strategy_df %>%
  pivot_wider(id_cols = date, names_from = ticker, values_from = c(close, position)) %>%  # pivot wider guarantees prices and theo_weight are date aligned
  select(date, starts_with("close_")) %>%
  data.matrix()

head(backtest_prices, c(5, 5))

# get funding as a wide matrix
# note that date column will get converted to unix timestamp
backtest_funding <- strategy_df %>%
  pivot_wider(id_cols = date, names_from = ticker, values_from = c(close, funding_rate)) %>%  # pivot wider guarantees prices and funding_returns_simple are date aligned
  select(date, starts_with("funding_rate_")) %>%
  data.matrix()

head(backtest_funding, c(5, 5))
A matrix: 5 × 5 of type dbl
date  position_1INCHUSDT position_AAVEUSDT position_ADAUSDT position_ALPHAUSDT
18680        0.002694931        0.01054433     0.0762376871        0.014704422
18681        0.027559434       -0.07675613     0.0125204272       -0.010399337
18682        0.040915764       -0.05302819     0.0012411217        0.002097874
18683        0.014953494       -0.02667621     0.0062533668       -0.010292540
18684       -0.023077149        0.01078900     0.0002980306       -0.023706301
A matrix: 5 × 5 of type dbl
date  close_1INCHUSDT close_AAVEUSDT close_ADAUSDT close_ALPHAUSDT
18680          4.3783        399.582       0.98801         1.44859
18681          3.6600        347.445       0.97291         1.24510
18682          3.7108        384.990       1.02610         1.28602
18683          4.1634        379.815       1.12148         1.69307
18684          4.7594        340.026       1.16322         1.34361
A matrix: 5 × 5 of type dbl
date  funding_rate_1INCHUSDT funding_rate_AAVEUSDT funding_rate_ADAUSDT funding_rate_ALPHAUSDT
18680            -0.00136194           -0.00031975          -0.00000653            -0.00142453
18681            -0.00025597           -0.00077154          -0.00030000            -0.00045960
18682            -0.00021475           -0.00030000          -0.00030000            -0.00030000
18683            -0.00040013           -0.00045891          -0.00060473            -0.00055099
18684            -0.00030000           -0.00010783          -0.00030000            -0.00048984

We’ll start with a cost-free simulation that trades frictionlessly into our target positions. The result should look a lot like the returns plot above:

# cost-free, no trade buffer
# simulation parameters
initial_cash <- 10000
fee_tier <- 0
capitalise_profits <- FALSE  # remain fully invested?
trade_buffer <- 0.
commission_pct <- 0.
margin <- 0.05

# simulation
results_df <- fixed_commission_backtest_with_funding(
  prices = backtest_prices,
  target_weights = backtest_weights,
  funding_rates = backtest_funding,
  trade_buffer = trade_buffer,
  initial_cash = initial_cash,
  margin = margin,
  commission_pct = commission_pct,
  capitalise_profits = capitalise_profits
  ) %>%
  mutate(ticker = str_remove(ticker, "close_")) %>%
  # remove coins we don't trade from results
  drop_na(Value)
# make a nice plot with some summary statistics
# plot equity curve from output of simulation
plot_results <- function(backtest_results, trade_on = "close") {
  margin <- backtest_results %>%
    group_by(Date) %>%
    summarise(Margin = sum(Margin, na.rm = TRUE))

  cash_balance <- backtest_results %>%
    filter(ticker == "Cash") %>%
    select(Date, Value) %>%
    rename("Cash" = Value)

  equity <- cash_balance %>%
    left_join(margin, by = "Date") %>%
    mutate(Equity = Cash + Margin)

  fin_eq <- equity %>%
    tail(1) %>%
    pull(Equity)

  init_eq <- equity %>%
    head(1) %>%
    pull(Equity)

  total_return <- (fin_eq/init_eq - 1) * 100
  days <- nrow(equity)
  ann_return <- total_return * 365/days
  sharpe <- equity %>%
    mutate(returns = Equity/lag(Equity)- 1) %>%
    na.omit() %>%
    summarise(sharpe = sqrt(365)*mean(returns)/sd(returns)) %>%
    pull()

  equity %>%
    ggplot(aes(x = Date, y = Equity)) +
      geom_line() +
      labs(
        title = "Crypto Stat Arb Simulation",
        subtitle = glue::glue(
          "Costs {commission_pct*100}% of trade value, trade buffer = {trade_buffer}, trade on {trade_on}
          {round(total_return, 1)}% total return, {round(ann_return, 1)}% annualised, Sharpe {round(sharpe, 2)}"
        )
      )
}

# calculate sharpe ratio from output of simulation
calc_sharpe <- function(backtest_results) {
  margin <- backtest_results %>%
    group_by(Date) %>%
    summarise(Margin = sum(Margin, na.rm = TRUE))

  cash_balance <- backtest_results %>%
    filter(ticker == "Cash") %>%
    select(Date, Value) %>%
    rename("Cash" = Value)

  equity <- cash_balance %>%
    left_join(margin, by = "Date") %>%
    mutate(Equity = Cash + Margin)

    equity %>% 
        mutate(returns = Equity/lag(Equity)- 1) %>%
        na.omit() %>%
        summarise(sharpe = sqrt(365)*mean(returns)/sd(returns)) %>%
        pull()
}

plot_results(results_df)

Now we’ll add costs:

# explore costs-turnover tradeoffs
# with costs, no trade buffer
commission_pct <- 0.0015

# simulation
results_df <- fixed_commission_backtest_with_funding(
  prices = backtest_prices,
  target_weights = backtest_weights,
  funding_rates = backtest_funding,
  trade_buffer = trade_buffer,
  initial_cash = initial_cash,
  margin = margin,
  commission_pct = commission_pct,
  capitalise_profits = capitalise_profits
  ) %>% 
  mutate(ticker = str_remove(ticker, "close_")) %>%
  # remove coins we don't trade from results
  drop_na(Value)

results_df %>%
  plot_results()

Costs are quite a drag on performance. Let’s use the no-trade buffer heuristic from the last article to do the minimum amount of trading to harness the edge:
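
As a quick reminder of what the buffer does – this is an illustrative sketch of the heuristic, not rsims' internal implementation – we only trade a position when it drifts outside a band of ±trade_buffer around its target weight, and then only back to the edge of the band. The code below then sweeps buffer values to find the one that maximises the historical after-cost Sharpe.

# illustrative sketch of the no-trade buffer heuristic
apply_trade_buffer <- function(current_weight, target_weight, trade_buffer) {
  dplyr::case_when(
    current_weight < target_weight - trade_buffer ~ target_weight - trade_buffer,
    current_weight > target_weight + trade_buffer ~ target_weight + trade_buffer,
    TRUE ~ current_weight  # inside the band: don't trade
  )
}

# example: target weight 10%, current weight 2%, buffer 3% -> trade up to 7%
apply_trade_buffer(0.02, 0.10, 0.03)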

# find appropriate trade buffer by optimising historical sharpe
sharpes <- list()
trade_buffers <- seq(0, 0.1, by = 0.01)
for(trade_buffer in trade_buffers) {
  sharpes <- c(
    sharpes,
    fixed_commission_backtest_with_funding(
      prices = backtest_prices,
      target_weights = backtest_weights,
      funding_rates = backtest_funding,
      trade_buffer = trade_buffer,
      initial_cash = initial_cash,
      margin = margin,
      commission_pct = commission_pct,
      capitalise_profits = capitalise_profits
    ) %>%
      calc_sharpe()
  )
}

sharpes <- unlist(sharpes)
data.frame(
  trade_buffer = trade_buffers,
  sharpe = sharpes
) %>%
  ggplot(aes(x = trade_buffer, y = sharpe)) +
    geom_line() +
    geom_point(colour = "blue") +
    geom_vline(xintercept = trade_buffers[which.max(sharpes)], linetype = "dashed") +
    labs(
      x = "Trade Buffer Parameter",
      y = "Backtested Sharpe Ratio",
      title = glue::glue("Trade Buffer Parameter vs Backtested Sharpe, costs {commission_pct*100}% trade value"),
      subtitle = glue::glue("Max Sharpe {round(max(sharpes), 2)} at buffer param {trade_buffers[which.max(sharpes)]}")
    )

A value of 0.03 maximised our historical after-cost Sharpe. You might pick a value a little higher than 0.03 to mitigate the risk that your out-of-sample performance will be worse than your in-sample (almost always a good assumption). But for now, let’s just simulate 0.03:

# get back original with costs simulation results
trade_buffer <- 0.03

# simulation
results_df <- fixed_commission_backtest_with_funding(
    prices = backtest_prices,
    target_weights = backtest_weights,
    funding_rates = backtest_funding,
    trade_buffer = trade_buffer,
    initial_cash = initial_cash,
    margin = margin,
    commission_pct = commission_pct,
    capitalise_profits = capitalise_profits
  ) %>% 
  mutate(ticker = str_remove(ticker, "close_")) %>%
  # remove coins we don't trade from results
  drop_na(Value)

# simulation results
results_df %>%
  plot_results()

Performance is a little better with this approach. Note that this simulation starts later than the previous one because we have to exclude the first in-sample model estimation period; over the same period, this version does better, especially recently.

Turnover is a bit higher, but that’s more to do with the lower trade buffer parameter selected:

results_df %>%
  filter(ticker != "Cash") %>% 
  group_by(Date) %>% 
  summarise(Turnover = 100*sum(abs(TradeValue))/initial_cash) %>% 
  ggplot(aes(x = Date, y = Turnover)) +
    geom_line() +
    geom_point() +
    labs(
      title = "Turnover as % of trading capital",
      y = "Turnover, %"
    )

While this is a good result for such a simple modeling approach, the main benefit of this approach is that your features are all modelled on the same scale, making them directly comparable. It also means that incorporating new signals into your system is straightforward and follows the same modeling process. We also saw that we can potentially capture changing feature dynamics directly in the modeling process.

New biases

It’s important to be aware of the new biases that this approach introduces.

For example, there’s a degree of future peeking in specifying a model type for each feature. We chose a linear model for carry and momentum because we saw a (noisily) linear relationship between these features and expected returns. And while we estimate the coefficients of these models on a rolling basis that prevents future peeking, the model type itself was chosen based on knowledge of the entire dataset.

We also introduce two new meta-parameters: the length of the model estimation window and the frequency with which we refit the model. Inevitably, you'll try a bunch of values and introduce some data snooping that way as well.

I think these are fairly benign biases, but they’re real and it’s a good idea to assume that performance will deteriorate out of sample as a result.

Conclusion

Modeling features as expected returns provides a consistent framework for making trading decisions, including the option of structuring them as an optimisation problem.

However, it does require some care.

In particular, you need to have a good understanding of your features:

  • How they’re distributed
  • How their relationship with expected returns changes across their range
  • Whether there is a sensible basis for modeling those changes or whether they’re just noise
  • How their predictive power changes through time

You need to do the grunt work.

You also need to understand the biases you introduce by adopting this approach.

Soon, we’ll swap out our heuristic approach for managing turnover with an approach that solves an optimisation problem at each decision step. This allows you to directly compare expected returns with costs, and can also accommodate a risk model and real-world constraints. But before we get to that, in the next article I’ll help you develop some intuition for how portfolio optimisation works.

Revision history:

  • 21 Feb 2024: Removed non-USDT denominated contracts from tradeable universe.

8 thoughts on “How to Model Features as Expected Returns”

  1. Kris – really enjoying this series. so many features seem to be predictive to a point, but lose power in the extremes. for example, on a scale of -10 to +10, +1 to +6 predict a positive return, -1 to -6 predict a negative return, and there is little if any predicted return > |6|. what is the best way to model this kind of feature which is obviously non-linear?

    • That’s an interesting observation. Normally I like to keep things really simple and prefer a linear model unless I had a really compelling reason not to. If you thought your feature really was non-linear as you describe, then you could model it by flattening the expected return in the extremes, for example by applying a piecewise linear regression with breakpoints at appropriate places. A generalised additive model could also be used.

      But before going down that path, I'd look at your feature's relationship with returns over time. For example, do you see that non-linearity if you look at each year separately? If not, I'd suggest it's probably just a random artefact and not something you should include in your model.

  2. Hey Kris,
    Thanks for yet another amazing post!
    One thing I noticed is that the data set includes pairs that trade against BTC, such as ETHBTC. In my own testing, I found that adding that to the filter step changes the expected return graphs a little bit :).

    • Thank you! Yes you are absolutely right. I’ve removed those non-USDT contracts and updated the posts (as well as the other posts in the series). Thanks for letting me know!

  3. Hi Kris, when building models for expected returns, are you generally running a regression on the average return within a feature’s decile/bucket vs. the decile/bucket of the feature instead of running a regression on the return vs. each raw feature value because that relationship is too noisy?

    • It really depends on the feature. Transforming the raw feature by bucketing it into deciles is just one way to do it. I like that approach because it’s simple and intuitive. Z-score scaling is another one I like. Sometimes I’ll calculate the z-score’s standard deviation with a longer lookback than the mean – this can sometimes smooth things out as well. In my experience, no one approach tends to always outperform – you have to experiment a bit, figure out what works and what makes sense for each feature. This is where most of the actual work is.
