In this post I will give a brief introduction to time series analysis and its applications. We will be using the R package astsa, developed by Professor David Stoffer at the University of Pittsburgh. The textbook it accompanies, a good read for anyone interested in the topic, is available as a free eBook here: Time Series Analysis and Its Applications

A time series is a sequence of observations made over a time interval. Time series analysis is commonly used in economic forecasting, as well as in studying climate data over long periods. The main idea is to use some number of previous observations to predict future observations.

First we install and load the astsa package.

install.packages("astsa")  # only needs to be run once
library(astsa)             # load the package for this session

We will be examining the data set gtemp, which contains global temperature deviations from the 1951-1980 average, recorded annually from 1880 to 2009.
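Before plotting, it is worth confirming the structure of the series. gtemp is stored as a ts object, so the base R accessors apply (a quick sanity check; note that newer versions of astsa may ship this data under a different name):

start(gtemp)      # first observation: 1880
end(gtemp)        # last observation: 2009
frequency(gtemp)  # annual data, so 1 observation per year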

plot(gtemp, xlab="Year", ylab="Deviation", main="Global Temperature Deviations, 1880-2009")

[Plot: Global Temperature Deviations, 1880-2009]

Before we build a model, we want to check that the time series is stationary. To be stationary, a time series requires three things:

1. A constant mean over time
2. A constant variance over time
3. An autocovariance that depends only on the lag between observations, not on time itself

Real data usually doesn't meet these standards unless we are measuring something such as white noise. Judging by eye, the gtemp series appears non-stationary: the mean is not constant and there is a clear upward trend. The variance, however, looks fairly consistent.
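For contrast, here is what a stationary series looks like. This is a throwaway illustration using simulated Gaussian white noise, not part of the gtemp analysis:

set.seed(42)
wn <- rnorm(130)  # same length as gtemp, purely for comparison
plot.ts(wn, ylab="Value", main="Gaussian White Noise")
acf2(wn)          # correlations should stay within the significance bounds at every lag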

We can check gtemp more formally with the acf2() function, which plots the autocorrelation (ACF) and partial autocorrelation (PACF) of the series. If the series is stationary, these plots will show a quick drop-off in correlation after a small number of lags.

acf2(gtemp)

The dotted blue lines mark the significance threshold at each lag. This data is clearly non-stationary, since a large number of previous observations are significantly correlated with future values. Although regression techniques would allow one to fit a smooth curve to this data, time series analysis is interested in removing as much trend as possible in order to identify structure that a regression line wouldn't capture.
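To see the regression alternative for yourself, you could fit a straight-line trend and inspect what is left over; this is a side sketch, not a step in the analysis:

trend <- lm(gtemp ~ time(gtemp))  # time() extracts the years from the ts object
plot(resid(trend), type="l", ylab="Detrended deviation")

Differencing, by contrast, removes the trend without having to estimate it.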

We will now transform the series into a stationary one by subtracting Xt-1 from Xt for all values of t. This technique is called differencing and can be done with the diff() function.
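As a quick sanity check that diff() really computes the lag-1 differences, we can compare it against the subtraction done by hand; this comparison is illustrative only:

d <- diff(gtemp)
head(d)                                  # first few differenced values
head(gtemp[-1] - gtemp[-length(gtemp)])  # same values, computed manually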

plot(diff(gtemp))
acf2(diff(gtemp))

[Plot: the differenced gtemp series]

[Plot: ACF/PACF of the differenced series]

That seems to have done the trick, with all significant correlation removed after lag 1. It's time to fit a model using the sarima() function, which takes the series plus three order parameters (p, d, q): the auto-regressive order, the degree of differencing, and the moving-average order. If you are not familiar with those terms, I recommend a quick overview here: Autoregressive moving-average model

The ACF/PACF plots suggest which orders to use. If the ACF had a smooth, geometric decay and the PACF a cutoff at lag p, we would fit a pure AR(p) model. Here the pattern is the opposite: the ACF shows no smooth decay and instead drops to insignificance after lag 1, so we will fit an MA(1) model. We choose d = 1 as our degree of differencing, since a single application of diff() was enough to make the series stationary.
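To see these two signatures side by side, you can simulate an AR(1) and an MA(1) process with base R's arima.sim(); the coefficient 0.7 is an arbitrary choice for illustration:

set.seed(1)
ar1 <- arima.sim(model=list(ar=0.7), n=200)  # ACF decays geometrically, PACF cuts off at lag 1
ma1 <- arima.sim(model=list(ma=0.7), n=200)  # ACF cuts off at lag 1, PACF decays
acf2(ar1)
acf2(ma1)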

sarima(gtemp,0,1,1)
Coefficients:
          ma1  constant
      -0.6550    0.0063
s.e.   0.0795    0.0030
sigma^2 estimated as 0.009574:  log likelihood = 116.11,  aic = -226.22
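The ma1 coefficient is highly significant (roughly eight times its standard error), and the small positive constant captures the upward drift of the series. As a cross-check, here is a sketch of the equivalent fit in base R, assuming sarima's constant corresponds to the mean of the differenced series:

arima(diff(gtemp), order=c(0,0,1))  # ma1 and intercept should closely match sarima's ma1 and constant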

[Plot: diagnostic plots from sarima for the ARIMA(0,1,1) fit]

The diagnostics suggest a good fit. The standardized residuals fluctuate around 0 with no obvious pattern, and the Q-Q plot tracks the theoretical distribution well, apart from some heaviness at the extreme tails. We will now forecast the ARIMA model five years ahead, into 2014. The blue lines indicate the prediction standard errors of the forecast.

sarima.for(gtemp,5,0,1,1)

[Plot: five-year forecast from the ARIMA(0,1,1) model]
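sarima.for() also returns the forecasts and their standard errors as a list, so the numbers behind the plot can be used directly; a short sketch:

fc <- sarima.for(gtemp, 5, 0, 1, 1)
fc$pred             # point forecasts for 2010-2014
fc$pred + 2*fc$se   # rough upper 95% bound
fc$pred - 2*fc$se   # rough lower 95% bound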

The forecast extends the series' trend well, with the uncertainty bounds widening the further ahead we predict.

Given a stationary transformation that neutralizes trends such as the one in our original gtemp data, researchers can identify previously unnoticed components and strengthen their forecasts.