Time Series Analysis: Fitting ARIMA/GARCH predictions profitable for FX?

Recently, I wrote about using mean-reversion time series models to analyze financial data and build trading strategies based on their predictions. Continuing our exploration of time series analysis and modelling, let’s turn our attention to the autoregressive and conditionally heteroskedastic family of models. Specifically, we’ll look into the Autoregressive Integrated Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models.

Why these two? They’re frequently cited in quantitative finance literature, and it’s time I got better acquainted—so why not bring you along for the ride?

Below is a summary of what I’ve learned about these models, how to fit them, and how their forecasts can be used to construct a simple trading strategy.

Let’s dive in.

What Are These Time Series Models?

Before we go further, it helps to clarify a few foundational ideas. I won’t repeat all the dense theory I’ve encountered—rather, here’s a high-level overview of what I’ve learned about ARIMA and GARCH models, and how they relate to their component parts:

At a basic level, fitting ARIMA and GARCH models means identifying how past values, random noise, and variance in a time series influence future values. When fitted well, these models can offer predictive insights—assuming the underlying process remains relatively stable over time.

ARIMA

An ARMA model (note: not ARIMA just yet) is a combination of two simpler models:

  • The Autoregressive (AR) model, where future values are predicted using past values of the series.
  • The Moving Average (MA) model, where past errors (or “noise”) are used as predictors.

An ARMA(p, q) model combines both elements in a linear form:

\begin{equation} \label{eq:poly}
X_{t} = a_{1}X_{t-1} + a_{2}X_{t-2} + … + a_{p}X_{t-p} + w_{t} + b_{1}w_{t-1} + b_{2}w_{t-2} + … + b_{q}w_{t-q}
\end{equation}

where wtw_twt represents white noise, and aia_iai and bib_ibi are model coefficients.

To extend this to an ARIMA(p, d, q) model, we first difference the original time series ddd times to achieve stationarity. The resulting stationary series is then modelled using ARMA.

GARCH

While ARIMA models capture the behavior of the series’ mean, GARCH models go further by modelling its variance—specifically, its conditional heteroskedasticity, or the tendency for volatility to cluster over time.

GARCH models add another layer by making the variance itself dynamic: they use past variances and past squared errors to model how volatility evolves. In essence, GARCH captures the serial dependency not just in the values of the time series, but in its volatility too.

How Do We Apply These Models?

With that context out of the way, I next fit an ARIMA/GARCH model to the EUR/USD exchange rate and use it as the basis for a directional trading system. Here’s the high-level approach:

  • Each day, the model is fit to a rolling window of past returns.
  • The fitted model forecasts the next day’s return.
  • Based on the sign of the prediction, a long or short position is entered and held for one day.
  • If the prediction does not change from the previous day, the current position is maintained.

Model Estimation

A rolling window of 1000 daily log returns is used to fit the model. For each window:

  • A brute-force grid search finds the ARIMA(p,0,q) model that minimizes the Akaike Information Criterion (AIC).
  • A GARCH(1,1) model is then fit on the residuals of the selected ARIMA model.
  • Forecasts are made one step ahead using the combined ARIMA-GARCH model.
  • If the GARCH model fails to converge, a forecast of 0 is assigned.

This approach is based on Michael Halls-Moore’s ARIMA+GARCH strategy for the S&P 500, and I’ve adapted parts of his code.

While a 1000-day window works well in this case, this is a tunable parameter. Longer windows may provide more stable parameter estimates but may lag in adapting to regime shifts. It would be worth testing model performance across various lookback window lengths.

R Code: ARIMA/GARCH Forecasting Strategy

### ARIMA/GARCH trading model
library(quantmod)
library(timeSeries)
library(rugarch)
# get data and initialize objects to hold forecasts
EURUSD <- read.csv('EURUSD.csv', header = T)
EURUSD[, 1] <- as.Date(as.character(EURUSD[, 1]), format="%d/%m/%Y")
returns <- diff(log(EURUSD$C)) ## ttr::ROC can also be used: calculates log returns by default
window.length <- 1000
forecasts.length <- length(returns) - window.length
forecasts <- vector(mode="numeric", length=forecasts.length)
directions <- vector(mode="numeric", length=forecasts.length)
p.val <- vector(mode="numeric", length=forecasts.length)
# loop through every trading day, estimate optimal model parameters from rolling window
# and predict next day's return
for (i in 0:forecasts.length) {
  roll.returns <- returns[(1+i):(window.length + i)] # create rolling window
  final.aic <- Inf
  final.order <- c(0,0,0)
  # estimate optimal ARIMA model order
  for (p in 0:5) for (q in 0:5) { # limit possible order to p,q <= 5
    if (p == 0 && q == 0) next # p and q can't both be zero
    arimaFit <- tryCatch( arima(roll.returns, order = c(p,0,q)),
                          error = function( err ) FALSE,
                          warning = function( err ) FALSE )
    if (!is.logical( arimaFit)) {
      current.aic <- AIC(arimaFit)
      if (current.aic < final.aic) { # retain order if AIC is reduced
        final.aic <- current.aic
        final.order <- c(p,0,q)
        final.arima <- arima(roll.returns, order = final.order)
      }
    }
    else next
  }
  # specify and fit the GARCH model
  spec = ugarchspec(variance.model <- list(garchOrder=c(1,1)),
                    mean.model <- list(
                      armaOrder <- c(final.order[1], final.order[3]), include.mean = T),
                    distribution.model = "sged")
  fit = tryCatch(ugarchfit(spec, roll.returns, solver = 'hybrid'), error = function(e) e, warning = function(w) w)
  # calculate next day prediction from fitted mode
  # model does not always converge - assign value of 0 to prediction and p.val in this case
  if (is(fit, "warning")) {
    forecasts[i+1] <- 0
    print(0)
    p.val[i+1] <- 0
  }
  else {
    next.day.fore = ugarchforecast(fit, n.ahead = 1)
    x = next.day.fore@forecast$seriesFor
    directions[i+1] <- ifelse(x[1] > 0, 1, -1) # directional prediction only
    forecasts[i+1] <- x[1] # actual value of forecast
    print(forecasts[i])
    # analysis of residuals
    resid <- as.numeric(residuals(fit, standardize = TRUE))
    ljung.box <- Box.test(resid, lag = 20, type = "Ljung-Box", fitdf = 0)
    p.val[i+1] <- ljung.box$p.value
  }
}
dates <- EURUSD[, 1]
forecasts.ts <- xts(forecasts, dates[(window.length):length(returns)])
# create lagged series of forecasts and sign of forecast
ag.forecasts <- Lag(forecasts.ts, 1)
ag.direction <- ifelse(ag.forecasts > 0, 1, ifelse(ag.forecasts < 0, -1, 0))
# Create the ARIMA/GARCH returns for the directional system
ag.direction.returns <- ag.direction * returns[(window.length):length(returns)]
ag.direction.returns[1] <- 0 # remove NA
# Create the backtests for ARIMA/GARCH and Buy & Hold
ag.curve <- cumsum( ag.direction.returns)
buy.hold.ts <- xts(returns[(window.length):length(returns)], dates[(window.length):length(returns)])
buy.hold.curve <- cumsum(buy.hold.ts))
both.curves <- cbind(ag.curve, buy.hold.curve)
names(both.curves) <- c("Strategy returns", "Buy and hold returns")
# plot both curves together
myColors <- c( "darkorange", "blue")
plot(x = both.curves[,"Strategy returns"], xlab = "Time", ylab = "Cumulative Return",
     main = "Cumulative Returns", ylim = c(-0.25, 0.4), major.ticks= "quarters",
     minor.ticks = FALSE, col = "darkorange")
lines(x = both.curves[,"Buy and hold returns"], col = "blue")
legend(x = 'bottomleft', legend = c("Strategy", "B&H"),
       lty = 1, col = myColors)

Signal Construction and Backtesting

You might have noticed that in the model fitting procedure above, I retained the actual forecast return values as well as the direction of the forecast return. I want to investigate the predictive power of the magnitude of the forecast return value. Specifically, does filtering trades when the magnitude of the forecast return is below a certain threshold improve the performance of the strategy? The code below performs this analysis for a small return threshold. For simplicity, I converted the forecast log returns to simple returns to enable manipulation of the sign of the forecast and easy implementation.

# Test entering a trade only when prediction exceeds a threshold magnitude
simp.forecasts <- exp(ag.forecasts) - 1
threshold <- 0.000025
ag.threshold <- ifelse(simp.forecasts > threshold, 1, ifelse(simp.forecasts < -threshold, -1, 0))
ag.threshold.returns <- ag.threshold * returns[(window.length):length(returns)]
ag.threshold.returns[1] <- 0 # remove NA
ag.threshold.curve <- cumsum(ag.threshold.returns))
both.curves <- cbind(ag.threshold.curve, buy.hold.curve)
names(both.curves) <- c("Strategy returns", "Buy and hold returns")
# plot both curves together
plot(x = both.curves[,"Strategy returns"], xlab = "Time", ylab = "Cumulative Return",
     main = "Cumulative Returns",  major.ticks= "quarters", #
     minor.ticks = FALSE, ylim = c(-0.2, 0.45), col = "darkorange")
lines(x = both.curves[,"Buy and hold returns"], col = "blue")
legend(x = 'bottomleft', legend = c("Strategy", "B&H"),
       lty = 1, col = myColors)

Results and Interpretation

In this implementation, we take directional signals only: we go long if the model predicts a positive return and short if it predicts a negative return. We do not model the magnitude of returns, and no transaction costs are included. As the final plot shows, the ARIMA/GARCH strategy is able to adapt to some extent and differentiate itself from a simple buy-and-hold approach.

1. Magnitude of Forecast Return

You’re asking: Does filtering out trades when the forecast return (converted from log to simple return) is very small improve strategy performance?

  • You implement this by setting a threshold (0.000025) and only taking positions when the absolute forecast return exceeds it.
  • You calculate:
    • simp.forecasts: simple returns from log forecasts.
    • ag.threshold: directional trade signal (1, -1, or 0 based on threshold).
    • ag.threshold.returns: the filtered strategy return series.
    • ag.threshold.curve: cumulative return of this strategy.

Then, you compare it with a buy-and-hold strategy by plotting both.

Conclusion: Filtering on magnitude acts as a noise-reduction technique, helping avoid overtrading on weak signals. This often leads to better Sharpe ratios even if total return drops slightly—less noise, fewer false positives.

2. Model Confidence via Residual Analysis (Ljung-Box Test)

Here, you’re exploring whether model goodness-of-fit can help improve trade filtering.

  • You propose using the Ljung-Box test to check for autocorrelation in residuals.
  • If the p-value is low, it means residuals are autocorrelated → model might be poor.
  • If the p-value is high, residuals are not autocorrelated → model fit is good.

Example:

ljung.box <- Box.test(resid, lag = 20, type = "Ljung-Box", fitdf = 0)
ljung.box
Box-Ljung test
data: resid
X-squared = 23.099, df = 20, p-value = 0.284
  • In your example, p = 0.284 → high p-value → good model fit.

Key Insight: While this is a sound statistical idea, your daily fits rarely reject the null. So, the Ljung-Box filter would rarely eliminate any trades. Therefore, it adds little incremental benefit in practice.

Suggestions Going Forward

If you’re looking for other ways to evaluate confidence or filter trades:

  • Forecast variance: If you’re using GARCH, forecast variance can serve as a proxy for model certainty.
  • Prediction intervals: Narrower intervals = higher confidence.
  • Ensemble agreement: If using multiple models, only trade when they agree.

Time Series Analysis: Conclusions and Future Work

The ARIMA/GARCH-based trading strategy applied to EUR/USD outperformed the buy-and-hold approach during the backtest period. However, the excess performance was modest. The analysis suggests that filtering trades based on the magnitude of the forecast improves strategy outcomes, while filtering based on model fit diagnostics (e.g. the Ljung-Box test) adds limited value in this dataset.

Additional filtering ideas — such as acting only when both bounds of the 95% forecast confidence interval share the same sign — may improve forecast reliability but significantly reduce the number of trades taken, which can impact overall profitability.

There is considerable scope for model enhancement. Alternative GARCH variants — including exponential, threshold, integrated, structural, and regime-switching models — may better capture volatility dynamics than the standard GARCH(1,1) used here. For a broader view, see Bollerslev et al. (1994).

A promising direction involves model ensembling — combining the predictions of diverse models to improve forecast accuracy. For instance, aggregating signals from ARIMA/GARCH and machine learning models such as artificial neural networks could allow the ARIMA/GARCH component to capture linear structure while the neural net models non-linearities. Though speculative, this hybrid approach aligns with findings in the literature and merits further exploration.

If you have ideas for improving the forecast accuracy of time series models — particularly through hybrid approaches or regime-switching filters — I’d be glad to hear them.

Acknowledgements

This work was inspired by multiple sources on financial time series modeling. In particular, I found Michael Halls-Moore’s posts indispensable for their clarity and progression through increasingly complex models. The ARIMA+GARCH strategy design and the AIC-based parameter selection approach were adapted from his work. The concepts of filtering trades by Ljung-Box p-values and forecast magnitude were original contributions to this adaptation.

Found this post useful? Chances are you’ll love our exploration of the Hurst Exponent.

Further Reading and References:

  • Bollerslev, T. (2001). Financial Econometrics: Past Developments and Future Challenges, Journal of Econometrics, 100, 41–51.
  • Bollerslev, T., Engle, R.F., & Nelson, D.B. (1994). GARCH Models, in Engle & McFadden (Eds.), Handbook of Econometrics, Vol. 4, Elsevier.
  • Engle, R. (2002). New Frontiers for ARCH Models, Journal of Applied Econometrics, 17, 425–466.
  • Qi, M., & Zhang, G.P. (2008). Trend Time Series Modeling and Forecasting with Neural Networks, IEEE Transactions on Neural Networks, 19(5), 808–816.
  • Tsay, R. (2010). Conditional Heteroscedastic Models, in Analysis of Financial Time Series (3rd Ed.), Wiley, pp. 109–174.

Download the full code and dataset used in this analysis: arima_garch

22 thoughts on “Time Series Analysis: Fitting ARIMA/GARCH predictions profitable for FX?”

  1. I was literally struggling through a bunch of ARMA ARIMA GARCH box test reading and then took a break to read your blog post. “Yes!” I shouted (in my head) when I read about you pondering all those big words and acronyms I’ve been struggling with. And then I realized you were also basing your work off Michaels writings. Damn stuff hurts my head. But I’m slowly getting it. You’re about 4 parsecs ahead of me so I’m going to have to keep an eye on your work as well. ? Thanks.

    Reply
    • Hey Matt, thanks for the comment! I hope my article was useful for you. Yes, I learned a lot from Michael’s posts on this subject. He’s heavy on the detail and presents it in a logical way that continuously builds on the previous information. I recently purchased the rough cut of his latest book and refer to it often. Very much looking forward to the final release. In my article, I was aiming to succinctly summarise the theory and focus on some trading ideas that seem a fairly natural extension. Hopefully it was helpful!

      Reply
    • I didn’t calculate Sharpe ratios when I ran these strategies. You could easily do this yourself by running the script (available via the download link, along with the data I used) and using the performanceAnalytics package in R.

      Reply
  2. Pingback: URL
  3. Hello, I am new to time series fitting and found your article very interesting. My question is: Is there not a random value involved in the prediction of the price per definition of a GARCH series ? If so, would it not make sense to calculate the probability for the forecast to be long or short by using the ARMA value as mean and the standard deviation and maybe apply a filter then by accepting only values above a certain threshold ?

    Reply
    • Hey, thanks for reading my blog. I think you are referring to the noise term in the GARCH definition? You could certainly experiment with an ARMA model – I’d love to hear about the results – but I’m not sure how that relates to the noise term in the GARCH model?

      Reply
    • There’s a few ways to do it, depending on how accurate or complex a transaction cost model you want. If a fixed transaction cost model would suffice, you could simply subtract this fixed transaction cost from each of your returns. For example, if a round turn on the EUR/USD costs you 1.5 pips, you simply do returns <- returns - 0.00015
      Of course, in reality you would get variable spread and variable slippage depending on factors such as real-time volatility and liquidity, so this may or may not be accurate enough for your purposes.

      Reply
  4. Hi Polar
    In this example, I first fit an ARMA model of order (p,q) where (p,q) ∈ {0,1,2,3,4,5} and (p,q) are chosen such that they minimzie the Aikake Information Criterion. Then we fit a model using GARCH (1,1) for the variance and ARMA(p,q) for the mean. A new model is constructed for each period in the simulation using the previous 1,000 periods. Each model is used once (to predict the next period’s return) and then discarded. So to answer your question, the approach used here doesn’t look for parameters that ‘work’ generally, rather we find the best parameters from our lookback window and assume they will hold for the next period.
    This assumption may or may not be valid. And if this assumption is valid, while time series modeling is super interesting from a theoretical perspective, it may or may not be of practical use in a trading strategy. At the very least, there are certainly other considerations beyond the optimal model parameters. For example, from my own experience building trading models for the forex markets, I can share that the choice of sampling time (that is, the time you choose as the open/close of your daily bars) is of critical significance in the success or otherwise of the model.
    Your comment is actually very timely! I recently built a GARCH backtesting framework for a client that enables efficient experimentation with these and other parameters, such as exit conditions and capital allocation. An efficient experimentation framework is really important for effective and practical research. If you are interested in something similar, ping me at kris[at]robotwealth.com
    Hope that helps

    Reply
  5. Hello
    Regarding your code how did you handle negative values for the returns? Because you are using the log for the ag.curve and with negative values you will have NaN
    Thanks
    Regis

    Reply
    • I honestly don’t remember…I’d have to re-run the code. But yes, you are correct – it makes more sense to use log returns which are additive. Multiplying simple returns is problematic in creating an equity curve when you have neutral positions. I’ve updated the post.

      Reply
  6. I have studied Time Series Econometrics as part of my PhD specialization and I have found that simply drawing trends and patterns on a chart is a far superior approach than the most advanced statistical techniques such as Markov Switching Multivariate GARCH or Multivariate Autoregressive State Space Models.  I feel kind of uneasy that all my efforts and sleepless nights were for nothing. But at least I know what the best approaches are.

    Reply
    • Great to hear that you’ve found something that works for you. I agree, sometimes simpler can be better.

      Reply

Leave a Comment