Notation
We shall measure time at periods denoted by 1, 2, ..., t, ..., T. For example, period 1 might be midnight on January 1, period 2 might be midnight on January 2, etc. If we had a year's worth of data, T would be 365. Let's denote the value of the time series at time t by yt; in the example above, the first observation (the value of the series on January 1) will be denoted by y1, the last by y365. We shall refer to the entire time series (one year's worth of daily data in the example above) as Yt; the reason for the subscript t will become clear in a moment. Because the rule generating the time series is probabilistic, meaning that we won't be able to forecast the next value of the series with certainty, we need to introduce the notion of a random "disturbance" at time t, denoted by et. Each disturbance is, by definition, drawn from the same probability distribution, and its value doesn't depend on the value of any prior disturbances. (This definition is summarized by saying that the values of et are independent and identically distributed, sometimes abbreviated iid.)
It is convenient to specify that the mean of et is 0. Thus, et could be +1 with probability 0.5 and -1 with probability 0.5; or it could be +2 with probability 0.3, 0 with probability 0.1, and -1 with probability 0.6. Usually, we will specify simply that et has a normal distribution with mean 0 and some fixed standard deviation S, say S = 2.5.
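As a quick check on the examples above, the following minimal sketch computes the mean of each discrete distribution and draws a year's worth of normally distributed disturbances. The names mean_of and draw_disturbances, and the fixed seed, are assumptions made for illustration; they do not come from the text.

```python
import random


def mean_of(distribution):
    """Expected value of a discrete distribution given as {value: probability}."""
    return sum(value * prob for value, prob in distribution.items())


# The two discrete examples from the text; both should have mean 0.
coin = {+1: 0.5, -1: 0.5}
skewed = {+2: 0.3, 0: 0.1, -1: 0.6}

print(mean_of(coin))
print(mean_of(skewed))


def draw_disturbances(T, S=2.5, seed=0):
    """Draw T independent disturbances from a normal distribution with mean 0
    and standard deviation S. The seed is an illustrative assumption that
    makes the draws reproducible."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, S) for _ in range(T)]


# One disturbance per day for a year of daily data (T = 365).
e = draw_disturbances(365)
print(len(e))
```

Each call to rng.gauss is made without reference to earlier draws, which is what "independent" means here; using the same mean and S for every draw is what "identically distributed" means.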
Autocorrelation
A concept indispensable to the analysis of time series is autocorrelation. We have already looked at the idea of correlation between two variables, say x and y: we can estimate the correlation between x and y by plotting a scatter diagram; we can measure it by computing the correlation coefficient.
In time-series analysis, there is only one variable, but we can easily create new variables consisting of the old variable lagged one or more periods. If Yt represents the original series, consisting of observations y1, y2, ..., yt, ..., yT, then Yt-1 will represent the same series lagged one period. If there are no observations prior to period 1, then the first value of Yt-1 will be missing, but the second will be y1, the third y2, and the last yT-1. If the original series Yt consisted of daily observations for a year, starting January 1 and ending December 31, then Yt-1 would have a missing first observation, its second observation would be the value of the series on January 1, and its last observation would be the value on December 30.
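The construction of a lagged variable can be sketched in a few lines. The function name lag, the use of None to mark a missing observation, and the sample values are all assumptions chosen for illustration.

```python
def lag(series, k=1):
    """Return the series lagged k periods, the same length as the original.

    The first k entries are None because no observations exist before
    period 1; the last k observations of the original drop off the end.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    k = min(k, len(series))
    return [None] * k + list(series[:len(series) - k])


# Hypothetical five-period series y1, ..., y5.
y = [10, 12, 11, 15, 14]

y_lag1 = lag(y, 1)  # missing, y1, y2, y3, y4
y_lag2 = lag(y, 2)  # missing, missing, y1, y2, y3

print(y_lag1)
print(y_lag2)
```

Keeping the lagged list the same length as the original means that position t in each list refers to the same period, so the two can be paired directly.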
We thus have two variables, Yt and Yt-1, derived from a single time series. We can plot a scatter diagram of the two variables (only T-1 points can be plotted), and we can compute the coefficient of correlation between Yt and Yt-1. This coefficient is called the (first-order) autocorrelation coefficient.
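A minimal sketch of this calculation follows, assuming the coefficient is the ordinary correlation coefficient computed over the T-1 pairs that have no missing value. The function name and the two sample series are illustrative assumptions. Many statistical packages estimate autocorrelation with a slightly different formula that uses the mean and variance of the whole series, so their results may differ a little from this one.

```python
import math


def first_order_autocorrelation(series):
    """Correlation coefficient between Yt and Yt-1 over the T-1 usable pairs.

    Raises ZeroDivisionError if either variable is constant, since the
    correlation coefficient is undefined in that case.
    """
    current = series[1:]   # y2, ..., yT
    lagged = series[:-1]   # y1, ..., yT-1
    n = len(current)
    mean_c = sum(current) / n
    mean_l = sum(lagged) / n
    cov = sum((c - mean_c) * (l - mean_l) for c, l in zip(current, lagged))
    var_c = sum((c - mean_c) ** 2 for c in current)
    var_l = sum((l - mean_l) ** 2 for l in lagged)
    return cov / math.sqrt(var_c * var_l)


# Hypothetical series: a steady trend and a series that flips sign each period.
trend = [1.0, 2.0, 3.0, 4.0, 5.0]
zigzag = [1.0, -1.0, 1.0, -1.0, 1.0]

r_trend = first_order_autocorrelation(trend)
r_zigzag = first_order_autocorrelation(zigzag)
print(r_trend, r_zigzag)
```

In the trend series each value sits exactly one unit above its predecessor, so the points of the scatter diagram fall on a rising straight line; in the zigzag series each value is the negative of its predecessor, so they fall on a falling one.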
There is no reason to restrict our analysis to one-period lags. A variable Yt-2 would start with two missing observations; the third observation would be the value of the series on January 1, and the last would be the value on December 29. A scatter diagram of Yt versus Yt-2 would reveal whether there was any substantial second-order autocorrelation, and the second-order autocorrelation coefficient would quantify it.
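The same idea extends to any lag. The sketch below, with the hypothetical names lagged_pairs and autocorrelation, lists the points that would appear in the scatter diagram for a lag of k periods and computes the correlation coefficient over them; it assumes the coefficient is the ordinary correlation between Yt and Yt-k over the T-k usable pairs.

```python
import math


def lagged_pairs(series, k):
    """Return the (yt, yt-k) pairs that can be plotted: T-k points in all."""
    return [(series[t], series[t - k]) for t in range(k, len(series))]


def autocorrelation(series, k):
    """k-th order autocorrelation: correlation of Yt with Yt-k.

    Undefined (ZeroDivisionError) if either variable is constant.
    """
    pairs = lagged_pairs(series, k)
    n = len(pairs)
    mean_c = sum(c for c, _ in pairs) / n
    mean_l = sum(l for _, l in pairs) / n
    cov = sum((c - mean_c) * (l - mean_l) for c, l in pairs)
    var_c = sum((c - mean_c) ** 2 for c, _ in pairs)
    var_l = sum((l - mean_l) ** 2 for _, l in pairs)
    return cov / math.sqrt(var_c * var_l)


# Hypothetical series that repeats every two periods.
periodic = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

for k in (1, 2, 3):
    print(k, len(lagged_pairs(periodic, k)), autocorrelation(periodic, k))
```

For a series that repeats every two periods, the first-order coefficient is strongly negative while the second-order coefficient is strongly positive, which is the kind of pattern a scatter diagram of Yt versus Yt-2 would make visible.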