Time Series Analysis: Fitting ARIMA/GARCH predictions profitable for FX?

Recently, I wrote about fitting mean-reversion time series analysis models to financial data and using the models’ predictions as the basis of a trading strategy. Continuing our exploration of time series modelling, let’s research the autoregressive and conditionally heteroskedastic family of time series models. In particular, we want to understand the autoregressive integrated moving average (ARIMA) and generalized autoregressive conditional heteroskedasticity (GARCH) models. Why? Well, they are both referenced frequently in the quantitative finance literature, and it’s about time I got up to speed so why not join me!

What follows is a summary of what I learned about these models, a general fitting procedure and a simple trading strategy based on the forecasts of a fitted model.

Let’s get started!

What are these time series analysis models?

Several definitions are necessary to set the scene. I don’t want to reproduce the theory I’ve been wading through; rather here is my very high-level summary of what I’ve learned about time series modelling, in particular, the ARIMA and GARCH models and how they are related to their component models:

At its most basic level, fitting ARIMA and GARCH models is an exercise in uncovering the way in which observations, noise and variance in a time series affect subsequent values of the time series.  Such a model, properly fitted, would have some predictive utility, assuming of course that the model remained a good fit for the underlying process for some time in the future.


An ARMA model (note: no “I”) is a linear combination of an autoregressive (AR) model and moving average (MA) model. An AR model is one whose predictors are the previous values of the series. An MA model is structurally similar to an AR model, except the predictors are the noise terms.  An autoregressive moving average model of order p,q  – ARMA(p,q) –  is a linear combination of the two and can be defined as:

\begin{equation} \label{eq:poly}
X_{t} = a_{1}X_{t-1} + a_{2}X_{t-2} + … + a_{p}X_{t-p} + w_{t} + b_{1}w_{t-1} + b_{2}w_{t-2} + … + b_{q}w_{t-q}

where $w_{t}$ is white noise and $a_{i}$ and $ b_{i}$ are coefficients of the model.

An ARIMA(p,d,q) model is simply an ARMA(p,q) model differenced ‘d’ times – or integrated (I)-  to produce a stationary series.


Finally, a GARCH model attempts to also explain the heteroskedastic behaviour of a time series (that is, the characteristic of volatility clustering) as well as the serial influences of the previous values of the series (explained by the AR component) and the noise terms (explained by the MA component).  A GARCH model uses an autoregressive process for the variance itself, that is, it uses past values of the variance to account for changes to the variance over time.

So how do we apply these models?

With that context setting out of the way, I next fit an ARIMA/GARCH model to the EUR/USD exchange rate and use it as the basis of a trading system. The model’s parameters for each day are estimated using a fitting procedure, that model is then used to predict the next day’s return and a position is entered accordingly and held for one trading day. If the prediction is the same as for the previous day, the existing position is maintained.

A rolling window of log returns is used to fit an optimal ARIMA/GARCH model at the close of each trading day. The fitting procedure is based on a brute force search of the parameters that minimize the Aikake Information Criterion, but other methods can be used. For example, we could choose parameters that minimize the Bayesian Information Criterion, which may help to reduce overfitting by penalizing complex models (that is, models with a large number of parameters). This fitting procedure was inspired by Michael Halls-Moore’s post about an ARIMA+GARCH trading strategy for the S&P500, and I borrowed some of his code.

I chose to use a rolling window of 1000 days to fit the model, but this is a parameter for optimization. There is a case for using as much data as possible in the rolling window, but this may fail to capture the evolving model parameters quickly enough to adapt to a changing market. I won’t explore this too much here, but it would be interesting to investigate the strategy’s performance as a function of the lookback window.

Here’s the code:

### ARIMA/GARCH trading model
# get data and initialize objects to hold forecasts
EURUSD <- read.csv('EURUSD.csv', header = T)
EURUSD[, 1] <- as.Date(as.character(EURUSD[, 1]), format="%d/%m/%Y")
returns <- diff(log(EURUSD$C)) ## ttr::ROC can also be used: calculates log returns by default
window.length <- 1000
forecasts.length <- length(returns) - window.length
forecasts <- vector(mode="numeric", length=forecasts.length)
directions <- vector(mode="numeric", length=forecasts.length)
p.val <- vector(mode="numeric", length=forecasts.length)
# loop through every trading day, estimate optimal model parameters from rolling window
# and predict next day's return
for (i in 0:forecasts.length) {
  roll.returns <- returns[(1+i):(window.length + i)] # create rolling window
  final.aic <- Inf
  final.order <- c(0,0,0)
  # estimate optimal ARIMA model order
  for (p in 0:5) for (q in 0:5) { # limit possible order to p,q <= 5
    if (p == 0 && q == 0) next # p and q can't both be zero
    arimaFit <- tryCatch( arima(roll.returns, order = c(p,0,q)),
                          error = function( err ) FALSE,
                          warning = function( err ) FALSE )
    if (!is.logical( arimaFit)) {
      current.aic <- AIC(arimaFit)
      if (current.aic < final.aic) { # retain order if AIC is reduced
        final.aic <- current.aic
        final.order <- c(p,0,q)
        final.arima <- arima(roll.returns, order = final.order)
    else next
  # specify and fit the GARCH model
  spec = ugarchspec(variance.model <- list(garchOrder=c(1,1)),
                    mean.model <- list(
                      armaOrder <- c(final.order[1], final.order[3]), include.mean = T),
                    distribution.model = "sged")
  fit = tryCatch(ugarchfit(spec, roll.returns, solver = 'hybrid'), error = function(e) e, warning = function(w) w)
  # calculate next day prediction from fitted mode
  # model does not always converge - assign value of 0 to prediction and p.val in this case
  if (is(fit, "warning")) {
    forecasts[i+1] <- 0
    p.val[i+1] <- 0
  else {
    next.day.fore = ugarchforecast(fit, n.ahead = 1)
    x = next.day.fore@forecast$seriesFor
    directions[i+1] <- ifelse(x[1] > 0, 1, -1) # directional prediction only
    forecasts[i+1] <- x[1] # actual value of forecast
    # analysis of residuals
    resid <- as.numeric(residuals(fit, standardize = TRUE))
    ljung.box <- Box.test(resid, lag = 20, type = "Ljung-Box", fitdf = 0)
    p.val[i+1] <- ljung.box$p.value
dates <- EURUSD[, 1]
forecasts.ts <- xts(forecasts, dates[(window.length):length(returns)])
# create lagged series of forecasts and sign of forecast
ag.forecasts <- Lag(forecasts.ts, 1)
ag.direction <- ifelse(ag.forecasts > 0, 1, ifelse(ag.forecasts < 0, -1, 0))
# Create the ARIMA/GARCH returns for the directional system
ag.direction.returns <- ag.direction * returns[(window.length):length(returns)]
ag.direction.returns[1] <- 0 # remove NA
# Create the backtests for ARIMA/GARCH and Buy & Hold
ag.curve <- cumsum( ag.direction.returns)
buy.hold.ts <- xts(returns[(window.length):length(returns)], dates[(window.length):length(returns)])
buy.hold.curve <- cumsum(buy.hold.ts))
both.curves <- cbind(ag.curve, buy.hold.curve)
names(both.curves) <- c("Strategy returns", "Buy and hold returns")
# plot both curves together
myColors <- c( "darkorange", "blue")
plot(x = both.curves[,"Strategy returns"], xlab = "Time", ylab = "Cumulative Return",
     main = "Cumulative Returns", ylim = c(-0.25, 0.4), major.ticks= "quarters",
     minor.ticks = FALSE, col = "darkorange")
lines(x = both.curves[,"Buy and hold returns"], col = "blue")
legend(x = 'bottomleft', legend = c("Strategy", "B&H"),
       lty = 1, col = myColors)

First, the directional predictions only: buy when a positive return is forecast and sell when a negative return is forecast. The results of this approach are shown below (no allowance for transaction costs):
GARCH Returns EURUSD - time series analysis
You might have noticed that in the model fitting procedure above, I retained the actual forecast return values as well as the direction of the forecast return. I want to investigate the predictive power of the magnitude of the forecast return value. Specifically, does filtering trades when the magnitude of the forecast return is below a certain threshold improve the performance of the strategy? The code below performs this analysis for a small return threshold. For simplicity, I converted the forecast log returns to simple returns to enable manipulation of the sign of the forecast and easy implementation.

# Test entering a trade only when prediction exceeds a threshold magnitude
simp.forecasts <- exp(ag.forecasts) - 1
threshold <- 0.000025
ag.threshold <- ifelse(simp.forecasts > threshold, 1, ifelse(simp.forecasts < -threshold, -1, 0))
ag.threshold.returns <- ag.threshold * returns[(window.length):length(returns)]
ag.threshold.returns[1] <- 0 # remove NA
ag.threshold.curve <- cumsum(ag.threshold.returns))
both.curves <- cbind(ag.threshold.curve, buy.hold.curve)
names(both.curves) <- c("Strategy returns", "Buy and hold returns")
# plot both curves together
plot(x = both.curves[,"Strategy returns"], xlab = "Time", ylab = "Cumulative Return",
     main = "Cumulative Returns",  major.ticks= "quarters", #
     minor.ticks = FALSE, ylim = c(-0.2, 0.45), col = "darkorange")
lines(x = both.curves[,"Buy and hold returns"], col = "blue")
legend(x = 'bottomleft', legend = c("Strategy", "B&H"),
       lty = 1, col = myColors)

And the results overlaid with the raw strategy:
Filtered on magnitude of forecast
It occurred to me that the ARIMA/GARCH model we fit on certain days may be a better or worse representation of the underlying process than other days. Perhaps filtering trades when we have less confidence in our model would improve performance. This approach requires that the statistical significance of each day’s model fit be evaluated, and a trade only entered when this significance exceeds a certain threshold. There are a number of ways this could be accomplished. Firstly, we could visually examine the correlogram of the model residuals and make a judgement on the goodness of fit on that basis. Ideally, the correlogram of the residuals would resemble a white noise process, showing no serial correlation. The correlogram of the residuals can be constructed in R as follows:
acf(fit@fit$residuals, main = ‘ACF of Model Residuals’)
ACF of Model Residuals - time series analysis
While this correlogram suggests a good model fit, it is obviously not a great approach as it relies on subjective judgement, not to mention the availability of a human to review each day’s model. A better approach would be to examine the Ljung-Box statistics for the model fit. The Ljung-Box is a hypothesis test for evaluating whether the autocorrelations of the residuals of a fitted model differ significantly from zero. In this test, the null hypothesis is that the autocorrelation of the residuals is zero; the alternate is that our time series analysis possesses serial correlation. Rejection of the null and confirmation of the alternate would imply that the model is not a good fit, as there is unexplained structure in the residuals. The Ljung-Box statistic is calculated in R as follows:

ljung.box <- Box.test(resid, lag = 20, type = "Ljung-Box", fitdf = 0)
Box-Ljung test
data: resid
X-squared = 23.099, df = 20, p-value = 0.284

The p-value in this case provides evidence that the residuals are independent and that this particular model is a good fit. By way of explanation, the Ljung-Box test statistic (X-squared in the code output above) grows larger for increasing autocorrelation of the residuals. The p-value is the probability of obtaining a value as large or larger than the test statistic under the null hypothesis. Therefore, a high p-value, in this case, is evidence for independence of the residuals. Note that it applies to all lags up to the one specified in the Box.test()  function.

Applying the Ljung-Box test to each day’s model fit reveals very few days where the null hypothesis of independent residuals is rejected, so extending the strategy to also filter any trades triggered by a poor model fit is unlikely to add much value:
2 filters - time series analysis

Time series analysis conclusions and future work

The performance of the ARIMA/GARCH strategy outperforms a buy and hold strategy on the EUR/USD for the backtest period, however, the performance is nothing spectacular. It seems that it is possible to improve the performance of the strategy by filtering on characteristics such as the magnitude of the prediction and the goodness of fit of the model, although the latter does not add much value in this particular example. Another filtering option could be to calculate the 95% confidence interval for each day’s forecast and only enter a trade when the sign of each limit is the same, although this would greatly reduce the number of trades actually taken.

There are many other varieties of the GARCH model, for example exponential, integrated, quadratic, threshold, structural and switching to name a few. These may or may not provide a better representation of the underlying process than the simple GARCH (1,1) model used in this example. For an exposition of these and other flavors of GARCH, see Bollerslev et. al. (1994).

An area of research that I have found highly interesting recently is forecasting with time series analysis through the intelligent combination of disparate models. For example, by taking the average of the individual predictions of several models or seeking consensus or a majority vote on the sign of the prediction. To borrow some machine learning nomenclature, this ‘ensembling’ of models can often produce more accurate forecasts than any of the constituent models. Perhaps a useful approach would be to ensemble the predictions of the ARIMA/GARCH model presented here with a suitably trained artificial neural network or other statistical learning method. We could perhaps expect the ARIMA/GARCH model to capture any linear characteristics of the time series, while the neural network may be a good fit for the non-linear characteristics. This is all pure speculation, potentially with some backing from this paper, but an interesting research avenue nevertheless.

If you have any ideas for improving the forecast accuracy of time series analysis models, I’d love to hear about them in the comments.
Finally, credit where credit is due: although I worked my way through numerous sources of information on financial time series modelling, I found Michael Halls-Moore’s detailed posts on the subject extremely helpful. He starts from the beginning and works through various models of increasing complexity. As stated in the main post, I also borrowed from his ARIMA + GARCH trading strategy for the S&P500 in designing the EUR/USD strategy presented here, particularly the approach to determining model parameters through iterative minimization of the Aikake Information Criterion. The ideas around filtering trades on the basis of the results of the Ljung-Box test and the absolute magnitude of the forecast value were my own (although I’m sure I’m not the first to come up with them).

Found this post useful? Chances are you’ll love our exploration of the Hurst Exponent.

Other references I found particularly useful:

Bollerslev, T. (2001). Financial Econometrics: Past Developments and Future Challenges, in Journal of Econometrics, Vol. 100, 41-51
Bollerslev, T., Engle, R.F. and Nelson, D.B. (1994). GARCH Models, in: Engle, R.F., and McFadden, D.L. (eds.) Handbook of Econometrics, Vol. 4, Elsevier, Amsterdam, 2961-3038.
Engle, R. (2002). New Frontiers for ARCH Models, in Journal of Applied Econometrics, Vol. 17, 425-466
Qi, M. and Zhang, G.P. (2008). Trend Time Series Modelling and Forecasting with Neural Networks, in IEEE Transactions on Neural Networks, Vol. 19, No. 5, 8-8-816.
Tsay, R. (2010). Conditional Heteroscedastic Models, in Tsay, R. Analysis of Financial Time Series, Third Edition, Wiley, 109-174.
Here you can download the code and data used in this analysis: arima_garch

22 thoughts on “Time Series Analysis: Fitting ARIMA/GARCH predictions profitable for FX?”

  1. I was literally struggling through a bunch of ARMA ARIMA GARCH box test reading and then took a break to read your blog post. “Yes!” I shouted (in my head) when I read about you pondering all those big words and acronyms I’ve been struggling with. And then I realized you were also basing your work off Michaels writings. Damn stuff hurts my head. But I’m slowly getting it. You’re about 4 parsecs ahead of me so I’m going to have to keep an eye on your work as well. ? Thanks.

    • Hey Matt, thanks for the comment! I hope my article was useful for you. Yes, I learned a lot from Michael’s posts on this subject. He’s heavy on the detail and presents it in a logical way that continuously builds on the previous information. I recently purchased the rough cut of his latest book and refer to it often. Very much looking forward to the final release. In my article, I was aiming to succinctly summarise the theory and focus on some trading ideas that seem a fairly natural extension. Hopefully it was helpful!

    • I didn’t calculate Sharpe ratios when I ran these strategies. You could easily do this yourself by running the script (available via the download link, along with the data I used) and using the performanceAnalytics package in R.

  2. Pingback: URL
  3. Hello, I am new to time series fitting and found your article very interesting. My question is: Is there not a random value involved in the prediction of the price per definition of a GARCH series ? If so, would it not make sense to calculate the probability for the forecast to be long or short by using the ARMA value as mean and the standard deviation and maybe apply a filter then by accepting only values above a certain threshold ?

    • Hey, thanks for reading my blog. I think you are referring to the noise term in the GARCH definition? You could certainly experiment with an ARMA model – I’d love to hear about the results – but I’m not sure how that relates to the noise term in the GARCH model?

    • There’s a few ways to do it, depending on how accurate or complex a transaction cost model you want. If a fixed transaction cost model would suffice, you could simply subtract this fixed transaction cost from each of your returns. For example, if a round turn on the EUR/USD costs you 1.5 pips, you simply do returns <- returns - 0.00015
      Of course, in reality you would get variable spread and variable slippage depending on factors such as real-time volatility and liquidity, so this may or may not be accurate enough for your purposes.

  4. Hi Polar
    In this example, I first fit an ARMA model of order (p,q) where (p,q) ∈ {0,1,2,3,4,5} and (p,q) are chosen such that they minimzie the Aikake Information Criterion. Then we fit a model using GARCH (1,1) for the variance and ARMA(p,q) for the mean. A new model is constructed for each period in the simulation using the previous 1,000 periods. Each model is used once (to predict the next period’s return) and then discarded. So to answer your question, the approach used here doesn’t look for parameters that ‘work’ generally, rather we find the best parameters from our lookback window and assume they will hold for the next period.
    This assumption may or may not be valid. And if this assumption is valid, while time series modeling is super interesting from a theoretical perspective, it may or may not be of practical use in a trading strategy. At the very least, there are certainly other considerations beyond the optimal model parameters. For example, from my own experience building trading models for the forex markets, I can share that the choice of sampling time (that is, the time you choose as the open/close of your daily bars) is of critical significance in the success or otherwise of the model.
    Your comment is actually very timely! I recently built a GARCH backtesting framework for a client that enables efficient experimentation with these and other parameters, such as exit conditions and capital allocation. An efficient experimentation framework is really important for effective and practical research. If you are interested in something similar, ping me at kris[at]robotwealth.com
    Hope that helps

  5. Hello
    Regarding your code how did you handle negative values for the returns? Because you are using the log for the ag.curve and with negative values you will have NaN

    • I honestly don’t remember…I’d have to re-run the code. But yes, you are correct – it makes more sense to use log returns which are additive. Multiplying simple returns is problematic in creating an equity curve when you have neutral positions. I’ve updated the post.

  6. I have studied Time Series Econometrics as part of my PhD specialization and I have found that simply drawing trends and patterns on a chart is a far superior approach than the most advanced statistical techniques such as Markov Switching Multivariate GARCH or Multivariate Autoregressive State Space Models.  I feel kind of uneasy that all my efforts and sleepless nights were for nothing. But at least I know what the best approaches are.

    • Great to hear that you’ve found something that works for you. I agree, sometimes simpler can be better.


Leave a Comment