Recently, I wrote about using mean-reversion time series models to analyze financial data and build trading strategies based on their predictions. Continuing our exploration of time series analysis and modelling, let’s turn our attention to the autoregressive and conditionally heteroskedastic family of models. Specifically, we’ll look into the Autoregressive Integrated Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models.
Why these two? They’re frequently cited in quantitative finance literature, and it’s time I got better acquainted—so why not bring you along for the ride?
Below is a summary of what I’ve learned about these models, how to fit them, and how their forecasts can be used to construct a simple trading strategy.
Let’s dive in.
What Are These Time Series Models?
Before we go further, it helps to clarify a few foundational ideas. I won’t repeat all the dense theory I’ve encountered—rather, here’s a high-level overview of what I’ve learned about ARIMA and GARCH models, and how they relate to their component parts:
At a basic level, fitting ARIMA and GARCH models means identifying how past values, random noise, and variance in a time series influence future values. When fitted well, these models can offer predictive insights—assuming the underlying process remains relatively stable over time.
ARIMA
An ARMA model (note: not ARIMA just yet) is a combination of two simpler models:
- The Autoregressive (AR) model, where future values are predicted using past values of the series.
- The Moving Average (MA) model, where past errors (or “noise”) are used as predictors.
An ARMA(p, q) model combines both elements in a linear form:
\begin{equation} \label{eq:poly}
X_{t} = a_{1}X_{t-1} + a_{2}X_{t-2} + \dots + a_{p}X_{t-p} + w_{t} + b_{1}w_{t-1} + b_{2}w_{t-2} + \dots + b_{q}w_{t-q}
\end{equation}
where \( w_{t} \) is a white-noise error term.
To extend this to an ARIMA(p, d, q) model, we first difference the original time series \( d \) times to achieve stationarity. The resulting stationary series is then modelled using ARMA.
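For example, with \( d = 1 \), each element of the differenced series is simply the change from the previous observation:
\begin{equation}
X'_{t} = X_{t} - X_{t-1}
\end{equation}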
GARCH
While ARIMA models capture the behavior of the series’ mean, GARCH models go further by modelling its variance—specifically, its conditional heteroskedasticity, or the tendency for volatility to cluster over time.
GARCH models add another layer by making the variance itself dynamic: they use past variances and past squared errors to model how volatility evolves. In essence, GARCH captures the serial dependency not just in the values of the time series, but in its volatility too.
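For reference, the standard GARCH(1,1) specification (the variant used later in this post) writes the error term as the product of a time-varying volatility and white noise:
\begin{equation}
\epsilon_{t} = \sigma_{t} z_{t}, \qquad \sigma_{t}^{2} = \omega + \alpha_{1} \epsilon_{t-1}^{2} + \beta_{1} \sigma_{t-1}^{2}
\end{equation}
where \( z_{t} \) is white noise and \( \omega > 0 \). The \( \alpha_{1} \epsilon_{t-1}^{2} \) term lets recent shocks raise tomorrow's variance, while \( \beta_{1} \sigma_{t-1}^{2} \) makes high-volatility periods persist; together they produce the volatility clustering described above.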
How Do We Apply These Models?
With that context out of the way, I next fit an ARIMA/GARCH model to the EUR/USD exchange rate and use it as the basis for a directional trading system. Here’s the high-level approach:
- Each day, the model is fit to a rolling window of past returns.
- The fitted model forecasts the next day’s return.
- Based on the sign of the prediction, a long or short position is entered and held for one day.
- If the prediction does not change from the previous day, the current position is maintained.
Model Estimation
A rolling window of 1000 daily log returns is used to fit the model. For each window:
- A brute-force grid search finds the ARIMA(p,0,q) model that minimizes the Akaike Information Criterion (AIC).
- A GARCH(1,1) model is then fit on the residuals of the selected ARIMA model.
- Forecasts are made one step ahead using the combined ARIMA-GARCH model.
- If the GARCH model fails to converge, a forecast of 0 is assigned.
This approach is based on Michael Halls-Moore’s ARIMA+GARCH strategy for the S&P 500, and I’ve adapted parts of his code.
While a 1000-day window works well in this case, this is a tunable parameter. Longer windows may provide more stable parameter estimates but may lag in adapting to regime shifts. It would be worth testing model performance across various lookback window lengths.
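As a minimal sketch of such a test, suppose the rolling fit-and-forecast loop below were wrapped in a hypothetical helper run_backtest(returns, window.length) returning the strategy's terminal cumulative return; different lookbacks could then be compared like so:
# hypothetical sensitivity check: run_backtest() is assumed to wrap the
# rolling ARIMA/GARCH loop below and return the terminal cumulative return
window.lengths <- c(500, 750, 1000, 1500)
results <- sapply(window.lengths, function(w) run_backtest(returns, w))
names(results) <- window.lengths
print(results) # compare performance across lookback lengths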
R Code: ARIMA/GARCH Forecasting Strategy
### ARIMA/GARCH trading model
library(quantmod)
library(timeSeries)
library(rugarch)
# get data and initialize objects to hold forecasts
EURUSD <- read.csv('EURUSD.csv', header = T)
EURUSD[, 1] <- as.Date(as.character(EURUSD[, 1]), format="%d/%m/%Y")
returns <- diff(log(EURUSD$C)) ## ttr::ROC can also be used: calculates log returns by default
window.length <- 1000
forecasts.length <- length(returns) - window.length
forecasts <- vector(mode="numeric", length=forecasts.length)
directions <- vector(mode="numeric", length=forecasts.length)
p.val <- vector(mode="numeric", length=forecasts.length)
# loop through every trading day, estimate optimal model parameters from rolling window
# and predict next day's return
for (i in 0:forecasts.length) {
  roll.returns <- returns[(1 + i):(window.length + i)] # create rolling window
  final.aic <- Inf
  final.order <- c(0, 0, 0)
  # estimate optimal ARIMA model order
  for (p in 0:5) for (q in 0:5) { # limit possible order to p,q <= 5
    if (p == 0 && q == 0) next # p and q can't both be zero
    arimaFit <- tryCatch(arima(roll.returns, order = c(p, 0, q)),
                         error = function(err) FALSE,
                         warning = function(err) FALSE)
    if (!is.logical(arimaFit)) {
      current.aic <- AIC(arimaFit)
      if (current.aic < final.aic) { # retain order if AIC is reduced
        final.aic <- current.aic
        final.order <- c(p, 0, q)
        final.arima <- arimaFit # keep the fitted model rather than refitting
      }
    }
  }
  # specify and fit the GARCH model
  spec = ugarchspec(variance.model = list(garchOrder = c(1, 1)),
                    mean.model = list(armaOrder = c(final.order[1], final.order[3]),
                                      include.mean = TRUE),
                    distribution.model = "sged")
  fit = tryCatch(ugarchfit(spec, roll.returns, solver = 'hybrid'),
                 error = function(e) e, warning = function(w) w)
  # calculate next day prediction from the fitted model
  # model does not always converge - assign value of 0 to prediction and p.val in this case
  if (is(fit, "warning") || is(fit, "error")) { # fit failed or did not converge
    forecasts[i + 1] <- 0
    print(0)
    p.val[i + 1] <- 0
  }
  else {
    next.day.fore = ugarchforecast(fit, n.ahead = 1)
    x = next.day.fore@forecast$seriesFor
    directions[i + 1] <- ifelse(x[1] > 0, 1, -1) # directional prediction only
    forecasts[i + 1] <- x[1] # actual value of forecast
    print(forecasts[i + 1])
    # analysis of residuals
    resid <- as.numeric(residuals(fit, standardize = TRUE))
    ljung.box <- Box.test(resid, lag = 20, type = "Ljung-Box", fitdf = 0)
    p.val[i + 1] <- ljung.box$p.value
  }
}
dates <- EURUSD[, 1]
forecasts.ts <- xts(forecasts, dates[(window.length):length(returns)])
# create lagged series of forecasts and sign of forecast
ag.forecasts <- Lag(forecasts.ts, 1)
ag.direction <- ifelse(ag.forecasts > 0, 1, ifelse(ag.forecasts < 0, -1, 0))
# Create the ARIMA/GARCH returns for the directional system
ag.direction.returns <- ag.direction * returns[(window.length):length(returns)]
ag.direction.returns[1] <- 0 # remove NA
# Create the backtests for ARIMA/GARCH and Buy & Hold
ag.curve <- cumsum( ag.direction.returns)
buy.hold.ts <- xts(returns[(window.length):length(returns)], dates[(window.length):length(returns)])
buy.hold.curve <- cumsum(buy.hold.ts)
both.curves <- cbind(ag.curve, buy.hold.curve)
names(both.curves) <- c("Strategy returns", "Buy and hold returns")
# plot both curves together
myColors <- c( "darkorange", "blue")
plot(x = both.curves[,"Strategy returns"], xlab = "Time", ylab = "Cumulative Return",
main = "Cumulative Returns", ylim = c(-0.25, 0.4), major.ticks= "quarters",
minor.ticks = FALSE, col = "darkorange")
lines(x = both.curves[,"Buy and hold returns"], col = "blue")
legend(x = 'bottomleft', legend = c("Strategy", "B&H"),
lty = 1, col = myColors)
Signal Construction and Backtesting
You might have noticed that in the model fitting procedure above, I retained the actual forecast return values as well as the direction of the forecast return. I want to investigate the predictive power of the magnitude of the forecast: specifically, does skipping trades when the magnitude of the forecast return falls below a certain threshold improve the performance of the strategy? The code below performs this analysis for a small return threshold. For simplicity, I converted the forecast log returns to simple returns so that the threshold can be applied directly to the forecast's sign and magnitude.
# Test entering a trade only when prediction exceeds a threshold magnitude
simp.forecasts <- exp(ag.forecasts) - 1
threshold <- 0.000025
ag.threshold <- ifelse(simp.forecasts > threshold, 1, ifelse(simp.forecasts < -threshold, -1, 0))
ag.threshold.returns <- ag.threshold * returns[(window.length):length(returns)]
ag.threshold.returns[1] <- 0 # remove NA
ag.threshold.curve <- cumsum(ag.threshold.returns)
both.curves <- cbind(ag.threshold.curve, buy.hold.curve)
names(both.curves) <- c("Strategy returns", "Buy and hold returns")
# plot both curves together
plot(x = both.curves[,"Strategy returns"], xlab = "Time", ylab = "Cumulative Return",
main = "Cumulative Returns", major.ticks = "quarters",
minor.ticks = FALSE, ylim = c(-0.2, 0.45), col = "darkorange")
lines(x = both.curves[,"Buy and hold returns"], col = "blue")
legend(x = 'bottomleft', legend = c("Strategy", "B&H"),
lty = 1, col = myColors)
Results and Interpretation
In this implementation, we take directional signals only: we go long if the model predicts a positive return and short if it predicts a negative return. We do not model the magnitude of returns, and no transaction costs are included. As the final plot shows, the ARIMA/GARCH strategy is able to adapt to some extent and differentiate itself from a simple buy-and-hold approach.
1. Magnitude of Forecast Return
The question: does filtering out trades when the forecast return (converted from log to simple return) is very small improve strategy performance? The code above implements this by setting a threshold (0.000025) and taking positions only when the absolute forecast return exceeds it. It calculates:
- simp.forecasts: simple returns derived from the log-return forecasts.
- ag.threshold: the directional trade signal (1, -1, or 0, based on the threshold).
- ag.threshold.returns: the filtered strategy's return series.
- ag.threshold.curve: the cumulative return of the filtered strategy.
The filtered strategy is then compared with buy-and-hold by plotting both equity curves.
Conclusion: filtering on magnitude acts as a noise-reduction technique, helping avoid overtrading on weak signals. Such a filter can improve risk-adjusted performance even if total return drops slightly: less noise, fewer false positives.
2. Model Confidence via Residual Analysis (Ljung-Box Test)
Here I explore whether model goodness-of-fit can improve trade filtering, using the Ljung-Box test to check for autocorrelation in the residuals of each fitted model:
- A low p-value means the residuals are autocorrelated, suggesting a poor model fit.
- A high p-value means the residuals show no evidence of autocorrelation, consistent with a good fit.
Example:
ljung.box <- Box.test(resid, lag = 20, type = "Ljung-Box", fitdf = 0)
ljung.box
Box-Ljung test
data: resid
X-squared = 23.099, df = 20, p-value = 0.284
Here p = 0.284: a high p-value, so there is no evidence of residual autocorrelation and the model fit looks adequate.
Key insight: while this is a sound statistical idea, the daily fits rarely reject the null, so a Ljung-Box filter would rarely eliminate any trades and adds little incremental benefit in practice.
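For completeness, here is a minimal sketch of how such a filter could be wired in, using the p.val vector recorded in the fitting loop above (the p-values must be lagged alongside the forecasts so that only information available at trade time is used):
# sketch: stand aside on days where the lagged Ljung-Box p-value suggests
# a poorly fitting model (p < 0.05); otherwise keep the directional signal
p.val.ts <- xts(p.val, dates[(window.length):length(returns)])
lagged.p.val <- Lag(p.val.ts, 1)
lb.direction <- ifelse(!is.na(lagged.p.val) & lagged.p.val < 0.05, 0, ag.direction)
lb.returns <- lb.direction * returns[(window.length):length(returns)]
lb.returns[1] <- 0 # remove the NA introduced by lagging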
Suggestions Going Forward
If you’re looking for other ways to evaluate confidence or filter trades:
- Forecast variance: since GARCH forecasts the conditional variance directly, it can serve as a proxy for model certainty (see the sketch after this list).
- Prediction intervals: narrower intervals imply higher confidence.
- Ensemble agreement: If using multiple models, only trade when they agree.
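As a rough sketch of the first two ideas (untested here), the fitting loop could record the one-step-ahead volatility forecast alongside the mean forecast, and trades could then be suppressed when predicted volatility is extreme:
# sketch: use the one-step-ahead volatility forecast as a confidence proxy
vol.forecasts <- vector(mode = "numeric", length = forecasts.length)
# ... inside the fitting loop, next to the seriesFor extraction:
# vol.forecasts[i + 1] <- next.day.fore@forecast$sigmaFor[1]
# (an approximate 95% prediction interval is then x[1] +/- 1.96 * sigmaFor)
# after the loop: stand aside in the top decile of predicted volatility
vol.ts <- Lag(xts(vol.forecasts, dates[(window.length):length(returns)]), 1)
vol.cutoff <- quantile(as.numeric(vol.ts), 0.9, na.rm = TRUE)
vol.direction <- ifelse(!is.na(vol.ts) & vol.ts > vol.cutoff, 0, ag.direction)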
Time Series Analysis: Conclusions and Future Work
The ARIMA/GARCH-based trading strategy applied to EUR/USD outperformed the buy-and-hold approach during the backtest period. However, the excess performance was modest. The analysis suggests that filtering trades based on the magnitude of the forecast improves strategy outcomes, while filtering based on model fit diagnostics (e.g. the Ljung-Box test) adds limited value in this dataset.
Additional filtering ideas — such as acting only when both bounds of the 95% forecast confidence interval share the same sign — may improve forecast reliability but significantly reduce the number of trades taken, which can impact overall profitability.
There is considerable scope for model enhancement. Alternative GARCH variants — including exponential, threshold, integrated, structural, and regime-switching models — may better capture volatility dynamics than the standard GARCH(1,1) used here. For a broader view, see Bollerslev et al. (1994).
A promising direction involves model ensembling — combining the predictions of diverse models to improve forecast accuracy. For instance, aggregating signals from ARIMA/GARCH and machine learning models such as artificial neural networks could allow the ARIMA/GARCH component to capture linear structure while the neural net models non-linearities. Though speculative, this hybrid approach aligns with findings in the literature and merits further exploration.
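As a toy illustration of the simplest form of ensembling (directional agreement), suppose a second model's signals were available in a hypothetical series nn.direction aligned with ag.direction; we would then trade only when the two models agree:
# sketch: trade only when both models agree on direction
# nn.direction is a hypothetical second signal aligned with ag.direction
ensemble.direction <- ifelse(ag.direction == nn.direction, ag.direction, 0)
ensemble.returns <- ensemble.direction * returns[(window.length):length(returns)]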
If you have ideas for improving the forecast accuracy of time series models — particularly through hybrid approaches or regime-switching filters — I’d be glad to hear them.
Acknowledgements
This work was inspired by multiple sources on financial time series modelling. In particular, I found Michael Halls-Moore's posts indispensable for their clarity and progression through increasingly complex models. The ARIMA+GARCH strategy design and the AIC-based parameter selection approach were adapted from his work. The ideas of filtering trades by forecast magnitude and by Ljung-Box p-values are my own additions to that framework.
Found this post useful? Chances are you’ll love our exploration of the Hurst Exponent.
Further Reading and References:
- Bollerslev, T. (2001). Financial Econometrics: Past Developments and Future Challenges, Journal of Econometrics, 100, 41–51.
- Bollerslev, T., Engle, R.F., & Nelson, D.B. (1994). GARCH Models, in Engle & McFadden (Eds.), Handbook of Econometrics, Vol. 4, Elsevier.
- Engle, R. (2002). New Frontiers for ARCH Models, Journal of Applied Econometrics, 17, 425–466.
- Qi, M., & Zhang, G.P. (2008). Trend Time Series Modeling and Forecasting with Neural Networks, IEEE Transactions on Neural Networks, 19(5), 808–816.
- Tsay, R. (2010). Conditional Heteroscedastic Models, in Analysis of Financial Time Series (3rd Ed.), Wiley, pp. 109–174.
Download the full code and dataset used in this analysis: arima_garch
I was literally struggling through a bunch of ARMA, ARIMA, GARCH and Box test reading and then took a break to read your blog post. "Yes!" I shouted (in my head) when I read about you pondering all those big words and acronyms I've been struggling with. And then I realized you were also basing your work off Michael's writings. Damn stuff hurts my head. But I'm slowly getting it. You're about 4 parsecs ahead of me so I'm going to have to keep an eye on your work as well. Thanks.
Hey Matt, thanks for the comment! I hope my article was useful for you. Yes, I learned a lot from Michael’s posts on this subject. He’s heavy on the detail and presents it in a logical way that continuously builds on the previous information. I recently purchased the rough cut of his latest book and refer to it often. Very much looking forward to the final release. In my article, I was aiming to succinctly summarise the theory and focus on some trading ideas that seem a fairly natural extension. Hopefully it was helpful!
Thanks for your post. Could you say what the Sharpe ratios of the tested strategies were?
I didn’t calculate Sharpe ratios when I ran these strategies. You could easily do this yourself by running the script (available via the download link, along with the data I used) and using the PerformanceAnalytics package in R.
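For example, a minimal sketch assuming the objects from the script are still in the workspace:
library(PerformanceAnalytics)
# annualized Sharpe ratio of the daily strategy and buy-and-hold return series
SharpeRatio.annualized(ag.direction.returns, scale = 252)
SharpeRatio.annualized(buy.hold.ts, scale = 252)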
Hello, I am new to time series fitting and found your article very interesting. My question is: is there not a random value involved in the prediction of the price, per definition of a GARCH series? If so, would it not make sense to calculate the probability for the forecast to be long or short by using the ARMA value as mean and the standard deviation, and maybe apply a filter then by accepting only values above a certain threshold?
Hey, thanks for reading my blog. I think you are referring to the noise term in the GARCH definition? You could certainly experiment with an ARMA model – I’d love to hear about the results – but I’m not sure how that relates to the noise term in the GARCH model?
Thanks for the tutorial. What line(s) of code would we need to account for transaction costs?
There’s a few ways to do it, depending on how accurate or complex a transaction cost model you want. If a fixed transaction cost model would suffice, you could simply subtract this fixed transaction cost from each of your returns. For example, if a round turn on the EUR/USD costs you 1.5 pips, you simply do
returns <- returns - 0.00015
Of course, in reality you would get variable spread and variable slippage depending on factors such as real-time volatility and liquidity, so this may or may not be accurate enough for your purposes.
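A slightly more realistic middle ground (a sketch, not part of the original script) is to charge the fixed cost only on the days the position actually changes:
# sketch: charge a fixed cost only when the position changes
cost <- 0.00015 # e.g. 1.5 pips per round turn on EUR/USD
position <- as.numeric(ag.direction)
position[is.na(position)] <- 0
changed <- c(position[1] != 0, diff(position) != 0) # TRUE on position changes
net.returns <- ag.direction.returns - cost * changed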
Hi Polar
In this example, I first fit an ARMA model of order (p,q), where (p,q) ∈ {0,1,2,3,4,5} are chosen to minimize the Akaike Information Criterion. Then we fit a model using GARCH(1,1) for the variance and ARMA(p,q) for the mean. A new model is constructed for each period in the simulation using the previous 1,000 periods. Each model is used once (to predict the next period’s return) and then discarded. So to answer your question, the approach used here doesn’t look for parameters that ‘work’ generally; rather, we find the best parameters from our lookback window and assume they will hold for the next period.
This assumption may or may not be valid. And if this assumption is valid, while time series modeling is super interesting from a theoretical perspective, it may or may not be of practical use in a trading strategy. At the very least, there are certainly other considerations beyond the optimal model parameters. For example, from my own experience building trading models for the forex markets, I can share that the choice of sampling time (that is, the time you choose as the open/close of your daily bars) is of critical significance in the success or otherwise of the model.
Your comment is actually very timely! I recently built a GARCH backtesting framework for a client that enables efficient experimentation with these and other parameters, such as exit conditions and capital allocation. An efficient experimentation framework is really important for effective and practical research. If you are interested in something similar, ping me at kris[at]robotwealth.com
Hope that helps
The price data EURUSD.csv… How is the data structure in the file? Can you display the headers/first few rows so I can parse my data accordingly.
Sorry. After posting my question I saw the download link for the file. Doooh!
Hello
Regarding your code how did you handle negative values for the returns? Because you are using the log for the ag.curve and with negative values you will have NaN
Thanks
Regis
I honestly don’t remember…I’d have to re-run the code. But yes, you are correct – it makes more sense to use log returns which are additive. Multiplying simple returns is problematic in creating an equity curve when you have neutral positions. I’ve updated the post.
you can use this : ag.curve <- cumsum( ag.direction.returns)
I have studied Time Series Econometrics as part of my PhD specialization and I have found that simply drawing trends and patterns on a chart is a far superior approach than the most advanced statistical techniques such as Markov Switching Multivariate GARCH or Multivariate Autoregressive State Space Models. I feel kind of uneasy that all my efforts and sleepless nights were for nothing. But at least I know what the best approaches are.
Great to hear that you’ve found something that works for you. I agree, sometimes simpler can be better.