## Fitting time series models to the forex market: are ARIMA/GARCH predictions profitable?

Recently, I wrote about fitting mean-reversion time series models to financial data and using the models’ predictions as the basis of a trading strategy. Continuing my exploration of time series modelling, I decided to research the autoregressive and conditionally heteroskedastic family of time series models. In particular, I wanted to understand the autogressive integrated moving average (ARIMA) and generalized autoregressive conditional heteroskedasticity (GARCH) models, since they are referenced frequently in the quantitative finance literature, and its about time I got up to speed. What follows is a summary of what I learned about these models, a general fitting procedure and a simple trading strategy based on the forecasts of a fitted model.

Several definitions are necessary to set the scene. I don’t want to reproduce the theory I’ve been wading through; rather here is my very high level summary of what I’ve learned about time series modelling, in particular the ARIMA and GARCH models and how they are related to their component models:

At its most basic level, fitting ARIMA and GARCH models is an exercise in uncovering the way in which observations, noise and variance in a time series affect subsequent values of the time series. Such a model, properly fitted, would have some predictive utility, assuming of course that the model remained a good fit for the underlying process for some time in the future.

An ARMA model (note: no “I”) is a linear combination of an autoregressive (AR) model and moving average (MA) model. An AR model is one whose predictors are the previous values of the series. An MA model is structurally similar to an AR model, except the predictors are the noise terms. An autoregressive moving average model of order p,q – ARMA(p,q) – is a linear combination of the two and can be defined as:

(1)

where is white noise and and are coefficients of the model.

An ARIMA(p,d,q) model is simply an ARMA(p,q) model differenced ‘d’ times – or integrated (I)- to produce a stationary series.

Finally, a GARCH model attempts to also explain the heteroskedastic behavior of a time series (that is, the characteristic of volatility clustering) as well as the serial influences of the previous values of the series (explained by the AR component) and the noise terms (explained by the MA component). A GARCH model uses an autoregressive process for the variance itself, that is, it uses past values of the variance to account for changes to the variance over time.

With that context setting out of the way, I next fit an ARIMA/GARCH model to the EUR/USD exchange rate and use it as the basis of a trading system. The model’s parameters for each day are estimated using a fitting procedure, that model is then used to predict the next day’s return and a position is entered accordingly and held for one trading day. If the prediction is the same as for the previous day, the existing position is maintained.

A rolling window of log returns is used to fit an optimal ARIMA/GARCH model at the close of each trading day. The fitting procedure is based on a brute force search of the parameters that minimize the Aikake Information Criterion, but other methods can be used. For example, we could choose parameters that minimize the Bayesian Information Criterion, which may help to reduce overfitting by penalizing complex models (that is, models with a large number of parameters). This fitting procedure was inspired by Michael Halls-Moore’s post about an ARIMA+GARCH trading strategy for the S&P500, and I borrowed some of his code.

I chose to use a rolling window of 1000 days to fit the model, but this is a parameter for optimization. There is a case for using as much data as possible in the rolling window, but this may fail to capture the evolving model parameters quickly enough to adapt to a changing market. I won’t explore this too much here, but it would be interesting to investigate the strategy’s performance as a function of the lookback window. Here’s the code:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 |
### ARIMA/GARCH trading model library(quantmod) library(timeSeries) library(rugarch) # get data and initialize objects to hold forecasts EURUSD <- read.csv('EURUSD.csv', header = T) EURUSD[, 1] <- as.Date(as.character(EURUSD[, 1]), format="%d/%m/%Y") returns <- diff(log(EURUSD$C)) ## ttr::ROC can also be used: calculates log returns by default window.length <- 1000 forecasts.length <- length(returns) - window.length forecasts <- vector(mode="numeric", length=forecasts.length) directions <- vector(mode="numeric", length=forecasts.length) p.val <- vector(mode="numeric", length=forecasts.length) # loop through every trading day, estimate optimal model parameters from rolling window # and predict next day's return for (i in 0:forecasts.length) { roll.returns <- returns[(1+i):(window.length + i)] # create rolling window final.aic <- Inf final.order <- c(0,0,0) # estimate optimal ARIMA model order for (p in 0:5) for (q in 0:5) { # limit possible order to p,q <= 5 if (p == 0 && q == 0) next # p and q can't both be zero arimaFit <- tryCatch( arima(roll.returns, order = c(p,0,q)), error = function( err ) FALSE, warning = function( err ) FALSE ) if (!is.logical( arimaFit)) { current.aic <- AIC(arimaFit) if (current.aic < final.aic) { # retain order if AIC is reduced final.aic <- current.aic final.order <- c(p,0,q) final.arima <- arima(roll.returns, order = final.order) } } else next } # specify and fit the GARCH model spec = ugarchspec(variance.model <- list(garchOrder=c(1,1)), mean.model <- list( armaOrder <- c(final.order[1], final.order[3]), include.mean = T), distribution.model = "sged") fit = tryCatch(ugarchfit(spec, roll.returns, solver = 'hybrid'), error = function(e) e, warning = function(w) w) # calculate next day prediction from fitted mode # model does not always converge - assign value of 0 to prediction and p.val in this case if (is(fit, "warning")) { forecasts[i+1] <- 0 print(0) p.val[i+1] <- 0 } else { next.day.fore = ugarchforecast(fit, n.ahead = 1) x = next.day.fore@forecast$seriesFor directions[i+1] <- ifelse(x[1] > 0, 1, -1) # directional prediction only forecasts[i+1] <- x[1] # actual value of forecast print(forecasts[i]) # analysis of residuals resid <- as.numeric(residuals(fit, standardize = TRUE)) ljung.box <- Box.test(resid, lag = 20, type = "Ljung-Box", fitdf = 0) p.val[i+1] <- ljung.box$p.value } } dates <- EURUSD[, 1] forecasts.ts <- xts(forecasts, dates[(window.length):length(returns)]) # create lagged series of forecasts and sign of forecast ag.forecasts <- Lag(forecasts.ts, 1) ag.direction <- ifelse(ag.forecasts > 0, 1, ifelse(ag.forecasts < 0, -1, 0)) # Create the ARIMA/GARCH returns for the directional system ag.direction.returns <- ag.direction * returns[(window.length):length(returns)] ag.direction.returns[1] <- 0 # remove NA # Create the backtests for ARIMA/GARCH and Buy & Hold ag.curve <- log( cumprod( 1 + ag.direction.returns) ) buy.hold.ts <- xts(returns[(window.length):length(returns)], dates[(window.length):length(returns)]) buy.hold.curve <- log(cumprod(1 + buy.hold.ts)) both.curves <- cbind(ag.curve, buy.hold.curve) names(both.curves) <- c("Strategy returns", "Buy and hold returns") # plot both curves together myColors <- c( "darkorange", "blue") plot(x = both.curves[,"Strategy returns"], xlab = "Time", ylab = "Cumulative Return", main = "Cumulative Returns", ylim = c(-0.25, 0.4), major.ticks= "quarters", minor.ticks = FALSE, col = "darkorange") lines(x = both.curves[,"Buy and hold returns"], col = "blue") legend(x = 'bottomleft', legend = c("Strategy", "B&H"), lty = 1, col = myColors) |

First, the directional predictions only: buy when a positive return is forecast and sell when a negative return is forecast. The results of this approach are shown below (no allowance for transaction costs):

You might have noticed that in the model fitting procedure above, I retained the actual forecast return values as well as the direction of the forecast return. I want to investigate the predictive power of the magnitude of the forecast return value. Specifically, does filtering trades when the magnitude of the forecast return is below a certain threshold improve the performance of the strategy? The code below performs this analysis for a small return threshold. For simplicity, I converted the forecast log returns to simple returns to enable manipulation of the sign of the forecast and easy implementation.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
# Test entering a trade only when prediction exceeds a threshold magnitude simp.forecasts <- exp(ag.forecasts) - 1 threshold <- 0.000025 ag.threshold <- ifelse(simp.forecasts > threshold, 1, ifelse(simp.forecasts < -threshold, -1, 0)) ag.threshold.returns <- ag.threshold * returns[(window.length):length(returns)] ag.threshold.returns[1] <- 0 # remove NA ag.threshold.curve <- log(cumprod( 1 + ag.threshold.returns)) both.curves <- cbind(ag.threshold.curve, buy.hold.curve) names(both.curves) <- c("Strategy returns", "Buy and hold returns") # plot both curves together plot(x = both.curves[,"Strategy returns"], xlab = "Time", ylab = "Cumulative Return", main = "Cumulative Returns", major.ticks= "quarters", # minor.ticks = FALSE, ylim = c(-0.2, 0.45), col = "darkorange") lines(x = both.curves[,"Buy and hold returns"], col = "blue") legend(x = 'bottomleft', legend = c("Strategy", "B&H"), lty = 1, col = myColors) |

And the results overlaid with the raw strategy:

It occurred to me that the ARIMA/GARCH model we fit on certain days may be a better or worse representation of the underlying process than other days. Perhaps filtering trades when we have less confidence in our model would improve performance. This approach requires that the statistical significance of each day’s model fit be evaluated, and a trade only entered when this significance exceeds a certain threshold. There are a number of ways this could be accomplished. Firstly, we could visually examine the correlogram of the model residuals and make a judgement on the goodness of fit on that basis. Ideally, the correlogram of the residuals would resemble a white noise process, showing no serial correlation. The correlogram of the residuals can be constructed in R as follows:

acf(fit@fit$residuals, main = 'ACF of Model Residuals')

While this correlogram suggests a good model fit, it is obviously not a great approach as it relies on subjective judgement, not to mention the availability of a human to review each day’s model. A better approach would be to examine the Ljung-Box statistics for the model fit. The Ljung-Box is a hypothesis test for evaluating whether the autocorrelations of the residuals of a fitted model differ significantly from zero. In this test, the null hypothesis is that the autocorrelation of the residuals is zero; the alternate is that the series possesses serial correlation. Rejection of the null and confirmation of the alternate would imply that the model is not a good fit, as there is unexplained structure in the residuals. The Ljung-Box statistic is calculated in R as follows:

1 2 3 4 5 6 7 |
ljung.box <- Box.test(resid, lag = 20, type = "Ljung-Box", fitdf = 0) ljung.box Box-Ljung test data: resid X-squared = 23.099, df = 20, p-value = 0.284 |

The p-value in this case provides evidence that the residuals are independent and that this particular model is a good fit. By way of explanation, the Ljung-Box test statistic (X-squared in the code output above) grows larger for increasing autocorrelation of the residuals. The p-value is the probability of obtaining a value as large or larger than the test statistic under the null hypothesis. Therefore, a high p-value in this case is evidence for independence of the residuals. Note that it applies to all lags up to the one specified in the Box.test() function.

Applying the Ljung-Box test to each day’s model fit reveals very few days where the null hypothesis of independent residuals is rejected, so extending the strategy to also filter any trades triggered by a poor model fit is unlikely to add much value:

## Conclusions and future work

The performance of the ARIMA/GARCH strategy outperforms a buy and hold strategy on the EUR/USD for the backtest period, however the performance is nothing spectacular. It seems that it is possible to improve the performance of the strategy by filtering on characteristics such as the magnitude of the prediction and the goodness of fit of the model, although the latter does not add much value in this particular example. Another filtering option could be to calculate the 95% confidence interval for each day’s forecast and only enter a trade when the sign of each limit is the same, although this would greatly reduce the number of trades actually taken.

There are many other varieties of the GARCH model, for example exponential, integrated, quadratic, threshold, structural and switching to name a few. These may or may not provide a better representation of the underlying process than the simple GARCH (1,1) model used in this example. For an exposition of these and other flavors of GARCH, see Bollerslev *et. al.* (1994).

An area of research that I have found highly interesting recently is time series forecasting through the intelligent combination of disparate models, for example by taking the average of the individual predictions of several models or seeking consensus or a majority vote on the sign of the prediction. To borrow some machine learning nomenclature, this ‘ensembling’ of models can often produce more accurate forecasts than any of the constituent models. Perhaps a useful approach would be to ensemble the predictions of the ARIMA/GARCH model presented here with a suitably trained artificial neural network or other statistical learning method. We could perhaps expect the ARIMA/GARCH model to capture any linear characteristics of the time series, while the neural network may be a good fit for the non-linear characteristics. This is all pure speculation, potentially with some backing from this paper, but an interesting research avenue nevertheless.

If you have any ideas for improving the forecast accuracy of time series models, I’d love to hear about them in the comments.

Finally, credit where credit is due: although I worked my way through numerous sources of information on financial time series modelling, I found Michael Halls-Moore’s detailed posts on the subject extremely helpful. He starts from the beginning and works through various models of increasing complexity. As stated in the main post, I also borrowed from his ARIMA + GARCH trading strategy for the S&P500 in designing the EUR/USD strategy presented here, particularly the approach to determining model parameters through iterative minimization of the Aikake Information Criterion. The ideas around filtering trades on the basis of the results of the Ljung-Box test and the absolute magnitude of the forecast value were my own (although I’m sure I’m not the first to come up with them).

Other references I found particularly useful:

Bollerslev, T. (2001). *Financial Econometrics: Past Developments and Future Challenges*, in Journal of Econometrics, Vol. 100, 41-51

Bollerslev, T., Engle, R.F. and Nelson, D.B. (1994). *GARCH Models*, in: Engle, R.F., and McFadden, D.L. (eds.) Handbook of Econometrics, Vol. 4, Elsevier, Amsterdam, 2961-3038.

Engle, R. (2002). *New Frontiers for ARCH Models*, in Journal of Applied Econometrics, Vol. 17, 425-466

Qi, M. and Zhang, G.P. (2008). *Trend Time Series Modelling and Forecasting with Neural Networks*, in IEEE Transactions on Nerual Networks, Vol. 19, No. 5, 8-8-816.

Tsay, R. (2010). *Conditional Heteroscedastic Models*, in Tsay, R. Analysis of Financial Time Series, Third Edition, Wiley, 109-174.

Here you can download the code and data used in this analysis: arima_garch

## Matt haines

February 4, 2016I was literally struggling through a bunch of ARMA ARIMA GARCH box test reading and then took a break to read your blog post. “Yes!” I shouted (in my head) when I read about you pondering all those big words and acronyms I’ve been struggling with. And then I realized you were also basing your work off Michaels writings. Damn stuff hurts my head. But I’m slowly getting it. You’re about 4 parsecs ahead of me so I’m going to have to keep an eye on your work as well. 😃 Thanks.

## Robot Master

February 4, 2016Hey Matt, thanks for the comment! I hope my article was useful for you. Yes, I learned a lot from Michael’s posts on this subject. He’s heavy on the detail and presents it in a logical way that continuously builds on the previous information. I recently purchased the rough cut of his latest book and refer to it often. Very much looking forward to the final release. In my article, I was aiming to succinctly summarise the theory and focus on some trading ideas that seem a fairly natural extension. Hopefully it was helpful!

## Beliavsky

February 5, 2016Thanks for your post. Could you say what the Sharpe ratios of the tested strategies were?

## Robot Master

February 8, 2016I didn’t calculate Sharpe ratios when I ran these strategies. You could easily do this yourself by running the script (available via the download link, along with the data I used) and using the performanceAnalytics package in R.

Pingback: Quantocracy's Daily Wrap for 02/04/2016 | Quantocracy

Pingback: Build Better Strategies! Part 2: Model-Based Systems – The Financial Hacker

Pingback: The Best Links While I Was Away | Quantocracy

Pingback: URL

Pingback: Build Better Strategies! Part 2: Model-Based Systems | Trade Signal Machine Hub

## KAFEBR

July 8, 2016Hello, I am new to time series fitting and found your article very interesting. My question is: Is there not a random value involved in the prediction of the price per definition of a GARCH series ? If so, would it not make sense to calculate the probability for the forecast to be long or short by using the ARMA value as mean and the standard deviation and maybe apply a filter then by accepting only values above a certain threshold ?

## Robot Master

August 2, 2016Hey, thanks for reading my blog. I think you are referring to the noise term in the GARCH definition? You could certainly experiment with an ARMA model – I’d love to hear about the results – but I’m not sure how that relates to the noise term in the GARCH model?

## Infinity

August 30, 2016Thanks for the tutorial. What line(s) of code would we need to account for transaction costs?

## Robot Master

August 31, 2016There’s a few ways to do it, depending on how accurate or complex a transaction cost model you want. If a fixed transaction cost model would suffice, you could simply subtract this fixed transaction cost from each of your returns. For example, if a round turn on the EUR/USD costs you 1.5 pips, you simply do

`returns < - returns - 0.00015`

Of course, in reality you would get variable spread and variable slippage depending on factors such as real-time volatility and liquidity, so this may or may not be accurate enough for your purposes.