Understanding Time Series Modelling and Forecasting, Part 2

Yashu Seth

11 months ago

As promised, this is the second post in my two-part blog series on time series modelling and forecasting. In my first blog post I discussed the basics of time series analysis and gave a theoretical overview. In case you missed it, you can find it here – Understanding Time Series Modelling and Forecasting, Part 1

Table of Contents

Identifying a Possible Model
Diagnosing a Selected Model
Forecasting With ARIMA Models
Seasonal ARIMA Models

Identifying a Possible Model

I have talked about this in detail in my previous blog post. Let's go over it briefly once again. There are three things that need to be considered to make a first guess.

The time series plot of the observed series – Look for seasonality and trend, and check whether the plot represents a stationary time series. If it does not, look for trends and use differencing to detrend your series. If there is a curved upward trend along with increasing variance, consider transforming the series with a logarithm or square root. Non-constant variance with no trend needs to be dealt with using ARCH models (I have not discussed ARCH models here).
The ACF plot of the series
The PACF plot of the series

I have discussed this in my previous blog post, but I would like to repeat a very important table that I often refer to.

	AR(p)	MA(q)	ARMA(p, q)
ACF	Tails Off	Cuts off after lag q	Tails Off
PACF	Cuts off after lag p	Tails Off	Tails Off
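
To make the table concrete, here is a minimal sketch in plain Python that computes the sample ACF and PACF of a series. The function names (`sample_acf`, `sample_pacf`, `simulate_ar1`) are my own illustrative helpers, not part of any library, and the PACF is obtained with the Durbin-Levinson recursion. For a simulated AR(1) series, the PACF should be large at lag 1 and close to zero afterwards, while the ACF tails off.

```python
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def sample_pacf(x, max_lag):
    """Partial autocorrelations for lags 1 .. max_lag (Durbin-Levinson)."""
    r = sample_acf(x, max_lag)
    pacf = []
    phi_prev = []  # coefficients of the order k-1 fit
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                   for j in range(k - 1)]
        phi_new.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi_new
    return pacf


def simulate_ar1(phi, n, seed=0):
    """Simulate x_t = phi * x_{t-1} + w_t with standard normal w_t."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0, 1))
    return x


series = simulate_ar1(0.7, 2000, seed=1)
acf_values = sample_acf(series, 5)    # index 0 is lag 0
pacf_values = sample_pacf(series, 5)  # index 0 is lag 1
```

In practice a statistical package would also draw the significance bounds (roughly ±2/√n) that you compare these values against.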

Diagnosing a selected model

Now that we have decided on a particular model, we need to estimate its coefficients. You usually do not have to worry about this, as R or any other statistical software will do it for you. Most software packages use maximum likelihood estimation to compute the estimates.

Once you get the coefficients you need to consider a few things –

Check the significance of the coefficients. You can calculate the t-statistics or the p-values of the coefficients in R.
Examine the ACF plot of the residuals. A good model will have all autocorrelations of the residuals non-significant. You need to reconsider the order of your ARIMA model if this is not the case.
Use the Box-Pierce or Ljung-Box test to check for residual autocorrelation across a range of lags. We will see how we can do this in the ARIMA modelling in R section.
If you are concerned about non-constant variance, examine the plot of residuals vs fits and the time series plot of the residuals.

If any of the checks bothers you, revise your guess of the model selected. You might have to change the order of the ARIMA model.
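
The residual autocorrelation check can be sketched in a few lines of plain Python. The helpers below (`residual_acf`, `box_pierce`, `ljung_box`) are illustrative names of my own, not a library API. They compute the two test statistics; to get a decision you would compare the result with a chi-square distribution whose degrees of freedom are the number of lags tested minus the number of estimated ARMA coefficients.

```python
def residual_acf(residuals, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of the residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [v - mean for v in residuals]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def box_pierce(residuals, max_lag):
    """Box-Pierce statistic: n * sum of squared autocorrelations."""
    n = len(residuals)
    return n * sum(r * r for r in residual_acf(residuals, max_lag))


def ljung_box(residuals, max_lag):
    """Ljung-Box statistic: n(n+2) * sum of r_k^2 / (n - k)."""
    n = len(residuals)
    r = residual_acf(residuals, max_lag)
    return n * (n + 2) * sum(r[k - 1] ** 2 / (n - k)
                             for k in range(1, max_lag + 1))


# Deliberately bad "residuals" that alternate in sign, so they are
# strongly autocorrelated and both statistics come out large.
alternating = [1.0 if t % 2 == 0 else -1.0 for t in range(20)]
q_bp = box_pierce(alternating, 5)
q_lb = ljung_box(alternating, 5)
```

A large statistic (small p-value) means the residuals still contain structure, which is a hint that the order of the model should be revised.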

What if more than one model looks okay?

This happens very often. Once you perform the above steps, you may find that more than one model seems to work. Here are a few steps you can take –

The model with fewer parameters should be preferred.
Examine the standard errors of the forecast values. Pick the model with the lowest standard errors.
Use statistics such as the MSE (mean square error), AIC, AICc or BIC. Lower values of these statistics are desirable.
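
As a rough illustration of the last point, the information criteria can be computed from a fitted model's log-likelihood. This is a sketch under assumptions: `information_criteria` is a hypothetical helper of my own, the log-likelihood values below are made-up numbers for illustration only, and software packages differ on whether the error variance is counted as a parameter.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return AIC, AICc and BIC for a fitted model.

    log_likelihood: maximised log-likelihood of the model
    n_params:       number of estimated parameters (k)
    n_obs:          number of observations used in the fit (n)
    """
    k, n = n_params, n_obs
    aic = 2 * k - 2 * log_likelihood
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    bic = k * math.log(n) - 2 * log_likelihood
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# Two hypothetical candidate models fitted to the same 50 observations.
# The log-likelihoods are invented for illustration, not real results.
small_model = information_criteria(log_likelihood=-100.0, n_params=3, n_obs=50)
large_model = information_criteria(log_likelihood=-99.5, n_params=5, n_obs=50)

# Lower is better: here the small gain in likelihood does not
# compensate for the two extra parameters.
preferred = "small" if small_model["AIC"] < large_model["AIC"] else "large"
```

The criteria are only comparable between models fitted to the same data with the same differencing.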

Forecasting with ARIMA models

When we forecast a value past the end of the series, the right side of the equation may need values of the series that have not yet been observed. Again, statistical software like R will handle this for you, but let us discuss the basic steps involved.

Let us consider the AR(2) model,

x_t = φ₁x_{t-1} + φ₂x_{t-2} + w_t

Suppose that we have observed n data values and wish to forecast x_{n+1} and x_{n+2}. This can be done using the above equation.

x_{n+1} = φ₁x_n + φ₂x_{n-1} + w_{n+1}

x_{n+2} = φ₁x_{n+1} + φ₂x_n + w_{n+2}

We replace w_{n+1} and w_{n+2} by their expected value of 0 (the assumed mean for the errors). We use the forecasted value of x_{n+1} to get the value of x_{n+2}.

In general, the forecasting procedure is as follows –

For any w_j with 1 ≤ j ≤ n, use the sample residual for time point j
For any w_j with j > n, use 0 as the value of w_j
For any x_j with 1 ≤ j ≤ n, use the observed value of x_j
For any x_j with j > n, use the forecasted value of x_j
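
The four rules above translate almost directly into code. The following is a minimal sketch, assuming a zero-mean ARMA model whose coefficients and in-sample residuals have already been estimated; `forecast_arma` and its arguments are hypothetical names of my own, not a library function.

```python
def forecast_arma(observed, residuals, phi, theta, steps):
    """Forecast `steps` values past the end of a zero-mean ARMA series.

    observed:  observed values x_1 .. x_n (needs at least len(phi) values)
    residuals: sample residuals w_1 .. w_n (needs at least len(theta) values)
    phi:       AR coefficients [phi_1, phi_2, ...]
    theta:     MA coefficients [theta_1, theta_2, ...]
    """
    x = list(observed)   # observed values for j <= n
    w = list(residuals)  # sample residuals for j <= n
    for _ in range(steps):
        ar_part = sum(c * x[-i] for i, c in enumerate(phi, start=1))
        ma_part = sum(c * w[-i] for i, c in enumerate(theta, start=1))
        x.append(ar_part + ma_part)  # forecast stands in for x_j, j > n
        w.append(0.0)                # future errors are replaced by 0
    return x[len(observed):]


# AR(2) example with illustrative coefficients phi_1 = 0.5, phi_2 = 0.25.
# x_{n+1} uses the two observed values; x_{n+2} reuses the forecast x_{n+1}.
ar2_forecasts = forecast_arma([1.0, 2.0], [], [0.5, 0.25], [], steps=2)
```

Because future errors are set to 0, forecasts from a stationary model drift towards the series mean as the horizon grows.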

Seasonal ARIMA Models

Seasonality in a time series is a regular pattern of changes that repeats over a fixed time period. Let's denote this fixed time period by S. For example, if the sales of a particular product increase every July, then S = 12 (months per year).

In a seasonal ARIMA model, seasonal AR and MA terms predict x_t using data values and errors at times with lags that are multiples of S (the span of the seasonality).

With daily data that follows a weekly pattern (S = 7), a stationary seasonal AR model of order 2 would depend on x_{t-7} and x_{t-14}. The equation to represent it would be –

x_t = φ₁x_{t-7} + φ₂x_{t-14} + w_t

Similarly, a seasonal MA model with order 1 and span 12 would be –

x_t = θ₁₂w_{t-12} + w_t

Seasonal Differencing

Seasonality usually causes the series to be non-stationary because the average values at some particular times within the seasonal span (months, for example) may be different from the average values at other times. For instance, sales of blankets tend to be higher in the winter months.

Seasonal differencing is defined as the difference between a value and the value at a lag that is a multiple of S. With S = 24, a seasonal difference is x_t – x_{t-24}.

Seasonal differencing can be combined with non-seasonal differencing too.
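
Here is a small sketch of both kinds of differencing in plain Python. The `difference` helper is my own illustrative function, and the toy series (a linear trend plus a pattern that repeats every 4 points, so S = 4) is constructed only to show the effect.

```python
def difference(x, lag=1):
    """Return the lag-`lag` difference x_t - x_{t-lag} of a sequence."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


# Toy series: linear trend 2*t plus a seasonal pattern of span S = 4.
S = 4
pattern = [5, -3, 0, 8]
series = [2 * t + pattern[t % S] for t in range(24)]

# A seasonal difference removes the repeating pattern. What remains
# is the trend contribution over one season (2 * S = 8 at every point).
seasonal_diff = difference(series, lag=S)

# A further non-seasonal first difference removes that remaining level.
both_diffs = difference(seasonal_diff, lag=1)
```

Note that each differencing pass shortens the series: a lag-S difference drops the first S points and a first difference drops one more.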

The above equations assumed that the non-seasonal orders are zero. A model with non-seasonal as well as seasonal orders is represented as –

ARIMA (p, d, q) x (P, D, Q)_S

where,

p is the non-seasonal AR order, d is the non-seasonal differencing order, q is the non-seasonal MA order, P is the seasonal AR order, D is the seasonal differencing order, Q is the seasonal MA order and S is the span of the seasonality.

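Therefore, ignoring cross terms, an ARIMA(1, 0, 1) x (1, 0, 2)₁₂ model would have the following equation –

x_t = φ₁x_{t-1} + φ₁₂x_{t-12} + θ₁w_{t-1} + θ₁₂w_{t-12} + θ₂₄w_{t-24} + w_t

Strictly speaking, the seasonal and non-seasonal parts of the model multiply, so the full equation also contains the cross terms − φ₁φ₁₂x_{t-13} + θ₁θ₁₂w_{t-13} + θ₁θ₂₄w_{t-25} on the right side.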
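
One way to check which lags a given ARIMA(p, d, q) x (P, D, Q)_S specification touches is to multiply out the non-seasonal and seasonal lag polynomials. The sketch below does this for the example above; `multiply_lag_polynomials` is a hypothetical helper of my own and the coefficient values are arbitrary illustrative numbers. It shows that the multiplicative form also involves lag 13 on the AR side and lags 13 and 25 on the MA side.

```python
def multiply_lag_polynomials(a, b):
    """Multiply two lag polynomials given as {lag: coefficient} dicts."""
    product = {}
    for lag_a, coef_a in a.items():
        for lag_b, coef_b in b.items():
            lag = lag_a + lag_b
            product[lag] = product.get(lag, 0.0) + coef_a * coef_b
    return product


# Illustrative coefficient values only.
phi_1, phi_12 = 0.5, 0.25
theta_1, theta_12, theta_24 = 0.5, 0.25, 0.125

# AR side: (1 - phi_1 B)(1 - phi_12 B^12)
ar_poly = multiply_lag_polynomials({0: 1.0, 1: -phi_1},
                                   {0: 1.0, 12: -phi_12})

# MA side: (1 + theta_1 B)(1 + theta_12 B^12 + theta_24 B^24)
ma_poly = multiply_lag_polynomials({0: 1.0, 1: theta_1},
                                   {0: 1.0, 12: theta_12, 24: theta_24})

ar_lags = sorted(ar_poly)  # lags of x that appear in the model
ma_lags = sorted(ma_poly)  # lags of w that appear in the model
```

The cross-term coefficients are products of the other coefficients, so they add no new parameters to estimate.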

Identifying a seasonal model

1. Examine the time series plot of the data for trend and seasonality. We usually know beforehand whether we have gathered seasonal data (months, weeks, years etc.) or not.
2. Do any necessary differencing –
- If there is seasonality and no trend, then take a seasonal difference at lag S. Seasonality in the ACF will appear as a slowly tapering pattern at multiples of S.
- If there is a linear trend but no seasonality, then apply a first difference. If there is a quadratic trend, then apply a second order difference.
- If there is both trend and seasonality, first apply a seasonal difference. If the trend remains, then apply the requisite non-seasonal difference (first order, second order etc.).
- If there is no trend and no seasonality, then no differencing is needed.
3. Examine the ACF and PACF plots of the differenced data (if differencing is necessary).
- Non-seasonal terms – Examine the early lags to guess the non-seasonal terms. Spikes in the ACF (at low lags) indicate possible non-seasonal MA terms. Spikes in the PACF (at low lags) indicate possible non-seasonal AR terms.
- Seasonal terms – Examine the patterns across lags that are multiples of S. For example, for daily data with a weekly pattern (S = 7), look at lags 7, 14, 21 and so on. The seasonal lags are judged in the same way as we judge the earlier lags.
4. Use statistical software like R to estimate the coefficients of the chosen model.
5. Examine the coefficients following the same diagnostic steps that we use for the non-seasonal models. If the diagnostic results are not good, we need to redo step 3 above.
