The Cross Correlation Theorem is similar to the more widely known Convolution Theorem. The cross correlation of two discrete finite time series $\{f_0,\ f_1,\ f_2,\ldots,\ f_{N-1}\}$ and $\{g_0,\ g_1,\ g_2,\ldots,\ g_{N-1}\}$ is defined by,

$$ \psi_t = \sum_{n=0}^{N-1} f_{n} g_{n+t},\ \ \ \ \ (1) $$

where $t$ is called the time lag. Cross correlation provides a measure of the similarity of two time series when one is shifted by the time lag. A direct calculation of the cross correlation using the equation above requires $O(N^2)$ operations. The Cross Correlation Theorem provides a method for calculating the cross correlation in $O(N\log N)$ operations by use of the Fast Fourier Transform. Here the theoretical background required to understand cross correlation calculations using the Cross Correlation Theorem is discussed. Example calculations are performed and different implementations using the FFT libraries in numpy are compared. The important special case of the cross correlation called the autocorrelation is addressed in the final section.

Cross Correlation

Cross correlation can be understood by considering the Covariance of two random variables, $X$ and $Y$. Covariance is the Expectation of the product of the deviations of the random variables from their respective means,

$$ \begin{aligned} Cov(X,Y) &= E\left[(X-E[X])(Y-E[Y])\right] \\ &= E[XY] - E[X]E[Y]. \end{aligned}\ \ \ \ \ (2) $$

Note that $Cov(X,Y)=Cov(Y,X)$ and if $X$ and $Y$ are Independent Random Variables $E[XY]=E[X]E[Y]\ \implies\ Cov(X,Y)=0$. If $X=Y$ the Covariance reduces to the Variance,

$$ \begin{aligned} Var(X) &= E\left[(X-E[X])^2\right] \\ &= E[X^2] - \left(E[X]\right)^2. \end{aligned} $$

These two results are combined in the definition of the Correlation Coefficient,

$$ \rho_{XY} = \frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}}. $$

The correlation coefficient has a geometric interpretation leading to the conclusion that $-1\ \leq \rho_{XY}\ \leq 1$. At the extreme values $X$ and $Y$ are either the same or differ by a multiple of $-1$. At the midpoint value, $\rho_{XY}=0$, $X$ and $Y$ are uncorrelated, as is the case for independent random variables. It follows that $\rho_{XY}$ provides a measure of possible dependence or similarity of two random variables.
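As an illustrative sketch (the sample size, seed, and variable construction here are arbitrary choices, not from the original), the correlation coefficient can be estimated from samples using numpy.corrcoef, which returns the matrix of pairwise coefficients:

```python
import numpy

numpy.random.seed(0)
x = numpy.random.normal(0.0, 1.0, 10000)
y = 0.5 * x + numpy.random.normal(0.0, 1.0, 10000)  # constructed to depend on x
z = numpy.random.normal(0.0, 1.0, 10000)            # independent of x

rho_xy = numpy.corrcoef(x, y)[0, 1]  # positive, bounded away from zero
rho_xz = numpy.corrcoef(x, z)[0, 1]  # near zero since x and z are independent
rho_xx = numpy.corrcoef(x, x)[0, 1]  # exactly 1 since the variables are identical
```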

Consider the first term in equation $(2)$ when time series of samples of $X$ and $Y$ of equal length are available,

$$ \begin{aligned} E[XY]& \approx \frac{1}{N}\sum_{n=0}^{N-1}x_n y_n \\ &\propto\ \psi_{0}, \end{aligned} $$

if $N$ is sufficiently large. Generalizing this result to arbitrary time shifts leads to equation $(1)$.

An important special case of the cross correlation is the autocorrelation, which is defined as the cross correlation of a time series with itself,

$$ r_t = \sum_{n=0}^{N-1} x_{n} x_{n+t}.\ \ \ \ \ (3) $$

Building on the interpretation of cross correlation, the autocorrelation is viewed as a measure of dependence or similarity of the past and future of a time series. For a time lag of $t=0$,

$$ r_0\ \propto\ E\left[X^2\right]. $$

Discrete Fourier Transform

This section will discuss properties of the Discrete Fourier Transform that are used in following sections. The Discrete Fourier Transform for a discrete periodic time series of length $N$, $\{f_0,\ f_1,\ f_2,\ldots,\ f_{N-1}\}$, is defined by,

$$ \begin{gathered} f_{n} = \frac{1}{N}\sum_{k=0}^{N-1}F_{k}e^{2\pi i (k/N)n} \\ F_{k} = \sum_{n=0}^{N-1}f_{n}e^{-2\pi i (n/N)k}, \end{gathered}\ \ \ \ \ (4) $$

where the expression for $f_{n}$ is referred to as the inverse transform and the one for $F_{k}$ as the forward transform.
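The definitions in equation $(4)$ can be checked numerically. A minimal sketch of the forward transform evaluated directly from the sum (the function name dft is ours, chosen for illustration), compared against numpy.fft.fft:

```python
import numpy

def dft(f):
    # Direct O(N^2) evaluation of the forward transform in equation (4)
    N = len(f)
    n = numpy.arange(N)
    return numpy.array([numpy.sum(f * numpy.exp(-2.0j * numpy.pi * (n / N) * k))
                        for k in range(N)])

f = numpy.array([8.0, 4.0, 8.0, 0.0])
F_direct = dft(f)
F_fft = numpy.fft.fft(f)  # agrees with F_direct to floating point precision
```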

Linearity

Both the forward and inverse transforms are Linear Operators. An operator, $\mathfrak{F}$, is linear if the operation on a sum is equal to the sum of the operations,

$$ \mathfrak{F}(a+b) = \mathfrak{F}(a)+\mathfrak{F}(b)\ \ \ \ \ (5) $$

To show this for the forward transform consider $h_{n}=f_{n}+g_{n}$; then,

$$ \begin{aligned} H_{k} &= \sum_{n=0}^{N-1}h_{n}e^{-2\pi i (n/N)k} \\ &= \sum_{n=0}^{N-1}\left(f_{n}+g_{n}\right)e^{-2\pi i (n/N)k} \\ &= \sum_{n=0}^{N-1}f_{n} e^{-2\pi i (n/N)k} + \sum_{n=0}^{N-1} g_{n}e^{-2\pi i (n/N)k} \\ &= F_{k} + G_{k}. \end{aligned} $$

Similarly it can be shown that the inverse transform is linear.
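This property is easy to confirm numerically; a quick sketch with arbitrary random inputs (the length and seed are our choices):

```python
import numpy

numpy.random.seed(1)
f = numpy.random.normal(size=16)
g = numpy.random.normal(size=16)

# Linearity, equation (5): the transform of a sum equals the sum of the transforms
lhs = numpy.fft.fft(f + g)
rhs = numpy.fft.fft(f) + numpy.fft.fft(g)
```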

Periodicity

Periodicity of the forward transform implies that, $F_{k+mN}=F_{k}\ \ \ \ \ (6),$ where $m\in\{\ldots,-2,\ -1,\ 0,\ 1,\ 2,\ldots\}$. To show that this is true first consider the case $m=1$,

$$ \begin{aligned} F_{k+N} &= \sum_{n=0}^{N-1} f_{n}e^{-2\pi i (n/N)(k+N)} \\ &= \sum_{n=0}^{N-1} f_{n} e^{-2\pi i(n/N)k}e^{-2\pi i n} \\ &= \sum_{n=0}^{N-1} f_{n} e^{-2\pi i(n/N)k} \\ &= F_{k}, \end{aligned} $$

where the second step follows from, $e^{-2\pi i n} = 1,\ \forall\ n$.

For an arbitrary value of $m$, $$ \begin{aligned} F_{k+mN} &= \sum_{n=0}^{N-1} f_{n}e^{-2\pi i (n/N)(k+mN)} \\ &= \sum_{n=0}^{N-1} f_{n} e^{-2\pi i(n/N)k}e^{-2\pi i nm} \\ &= \sum_{n=0}^{N-1} f_{n} e^{-2\pi i(n/N)k} \\ &= F_{k}, \end{aligned} $$

since, $e^{-2\pi i mn} = 1,\ \forall\ m,\ n$.
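Since numpy.fft.fft only returns the coefficients for $k = 0,\ldots,N-1$, periodicity can be checked by evaluating the forward sum directly for shifted indices; a sketch (dft_coefficient is a helper defined here for illustration):

```python
import numpy

def dft_coefficient(f, k):
    # Direct evaluation of F_k from equation (4) for any integer k
    N = len(f)
    n = numpy.arange(N)
    return numpy.sum(f * numpy.exp(-2.0j * numpy.pi * (n / N) * k))

f = numpy.array([8.0, 4.0, 8.0, 0.0])
N = len(f)
F_1 = dft_coefficient(f, 1)
F_1_plus_N = dft_coefficient(f, 1 + N)        # m = 1 in equation (6)
F_1_minus_2N = dft_coefficient(f, 1 - 2 * N)  # m = -2 in equation (6)
```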

Consequence of Real $f_{n}$

If $f_{n}$ is real then $f_{n}=f_{n}^{\ast}$, where $\ast$ denotes the Complex Conjugate. It follows that,

$$ \begin{aligned} f_{n} &= f_{n}^{\ast} \\ &= \left\{ \frac{1}{N}\sum_{k=0}^{N-1}F_{k}e^{2\pi i (k/N)n} \right\}^{\ast} \\ &= \frac{1}{N}\sum_{k=0}^{N-1}F_{k}^{\ast}e^{-2\pi i (k/N)n}. \end{aligned}\ \ \ \ \ (7) $$

Another interesting and related result is,

$$ F_{-k} = F_{k}^{\ast},\ \ \ \ \ (8) $$

which follows from,

$$ \begin{aligned} F_{-k} &= \sum_{n=0}^{N-1} f_{n}e^{2\pi i (n/N)k} \\ &= \sum_{n=0}^{N-1} f_{n}^{\ast}e^{2\pi i (n/N)k} \\ &= \left\{ \sum_{n=0}^{N-1} f_{n}e^{-2\pi i (n/N)k}\right\}^{\ast} \\ &=F_{k}^{\ast}. \end{aligned} $$
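Equation $(8)$ can be checked with numpy by noting that, by the periodicity of equation $(6)$, $F_{-k}$ corresponds to the coefficient at index $N-k$; a sketch:

```python
import numpy

f = numpy.array([8.0, 4.0, 8.0, 0.0])  # a real-valued time series
F = numpy.fft.fft(f)
N = len(f)

# F_{-k} = F_{N-k} by equation (6), and equals conj(F_k) by equation (8)
symmetric = all(numpy.isclose(F[(N - k) % N], numpy.conj(F[k])) for k in range(N))
```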

Orthogonality of Fourier Basis

The Discrete Fourier Basis is the collection of functions,

$$ \left\{e^{2\pi i(k/N)n}\right\}, $$

where $n = 0,\ 1,\ 2,\ldots,N-1$. It forms an Orthogonal Basis since,

$$ \frac{1}{N} \sum_{n=0}^{N-1} e^{2\pi i \left[ (m-k)/N\right] n}\ =\ \delta_{mk} = \begin{cases} 1 & \text{if}\ m\ =\ k \\ 0 & \text{if}\ m\ \ne\ k, \end{cases}\ \ \ \ \ (9) $$

where $\delta_{mk}$ is the Kronecker Delta. This result can be proven for $m \ne k$ by noting that the sum in equation $(9)$ is a Geometric Series,

$$ \frac{1}{N} \sum_{n=0}^{N-1} e^{2\pi i \left[(m-k)/N \right] n} = \frac{1}{N}\, \frac{1-e^{2\pi i (m-k)}}{1-e^{2\pi i (m-k)/N}}. $$

Since $m-k$ is an integer, $2\pi (m-k)$ is always a multiple of $2\pi$, and it follows that the numerator is zero,

$$ 1-e^{2\pi i (m-k)} = 1-1 = 0. $$

The denominator is zero only if $m-k=lN$ where $l$ is a nonzero integer (recall $m \ne k$). This cannot happen since $-(N-1)\ \leq m-k\ \leq N-1$, so,

$$ \sum_{n=0}^{N-1} e^{2\pi i \left[ (m-k)/N\right] n}\ =\ 0. $$

If $m=k$ then,

$$ \sum_{n=0}^{N-1} e^{2\pi i \left[(m-k)/N \right] n} = \sum_{n=0}^{N-1} 1 = N, $$

which proves equation $(9)$.
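A numerical sketch of equation $(9)$ (the helper name basis_inner_product and the choice $N=8$ are ours):

```python
import numpy

N = 8
n = numpy.arange(N)

def basis_inner_product(m, k):
    # (1/N) sum over n of e^{2 pi i [(m-k)/N] n}, the left side of equation (9)
    return numpy.sum(numpy.exp(2.0j * numpy.pi * ((m - k) / N) * n)) / N

same = basis_inner_product(3, 3)       # equals 1 when m = k
different = basis_inner_product(5, 2)  # equals 0 when m != k
```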

The Cross Correlation Theorem

The Cross Correlation Theorem is a relationship between the Fourier Transform of the cross correlation, $\psi_{t}$, defined by equation $(1)$, and the Fourier Transforms of the two time series used in the cross correlation calculation,

$$ \Psi_{k} = F_{k}^{\ast}G_{k},\ \ \ \ \ (10) $$

where,

$$ \begin{aligned} \Psi_{k} &= \sum_{n=0}^{N-1}\psi_{n}e^{-2\pi i (n/N)k} \\ F_{k}^{\ast} &= \sum_{n=0}^{N-1}f_{n}^{\ast}e^{2\pi i (n/N)k} \\ G_{k} &= \sum_{n=0}^{N-1}g_{n}e^{-2\pi i (n/N)k}. \end{aligned} $$

To derive equation $(10)$ consider the inverse Fourier Transform of the time series $f_{n}$, written in the conjugated form of equation $(7)$ valid for real $f_{n}$, and of $g_{n+t}$,

$$ \begin{gathered} f_{n} = \frac{1}{N}\sum_{k=0}^{N-1}F_{k}^{\ast}e^{-2\pi i (k/N)n} \\ g_{n+t} = \frac{1}{N}\sum_{k=0}^{N-1}G_{k}e^{2\pi i (k/N)(n+t)}. \end{gathered} $$

Substituting these expressions into equation $(1)$ gives,

$$ \begin{aligned} \psi_t &= \sum_{n=0}^{N-1} f_{n} g_{n+t} \\ &= \sum_{n=0}^{N-1} \left\{ \frac{1}{N}\sum_{k=0}^{N-1}F_{k}^{\ast}e^{-2\pi i (k/N)n} \right\} \left\{ \frac{1}{N}\sum_{m=0}^{N-1}G_{m}e^{2\pi i (m/N)(n+t)} \right\} \\ &= \frac{1}{N}\sum_{k=0}^{N-1}\sum_{m=0}^{N-1} F_{k}^{\ast}G_{m} e^{2\pi i (t/N)m} \frac{1}{N} \sum_{n=0}^{N-1} e^{2\pi i \left[(m-k)/N \right] n} \\ &= \frac{1}{N}\sum_{k=0}^{N-1}\sum_{m=0}^{N-1} F_{k}^{\ast}G_{m} e^{2\pi i (t/N)m} \delta_{mk} \\ &= \frac{1}{N}\sum_{k=0}^{N-1} F_{k}^{\ast}G_{k} e^{2\pi i (t/N)k}, \end{aligned} $$

where the orthogonality relation, equation $(9)$, was used to eliminate one of the sums. Equation $(10)$ follows by taking the Fourier Transform of the previous result,

$$ \begin{aligned} \Psi_{k} &= \sum_{t=0}^{N-1}\psi_{t}e^{-2\pi i (t/N)k} \\ &= \sum_{t=0}^{N-1} \left\{ \frac{1}{N}\sum_{m=0}^{N-1} F_{m}^{\ast}G_{m} e^{2\pi i (t/N)m} \right\}e^{-2\pi i (t/N)k} \\ &= \sum_{m=0}^{N-1} F_{m}^{\ast}G_{m} \frac{1}{N} \sum_{t=0}^{N-1} e^{2\pi i \left[(m-k)/N\right]t} \\ &= \sum_{m=0}^{N-1} F_{m}^{\ast}G_{m} \delta_{mk} \\ &= F_{k}^{\ast}G_{k}, \end{aligned} $$

proving the Cross Correlation Theorem defined by equation $(10)$.
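The derivation can be verified numerically. In this sketch (lengths and seed are our choices) the circular cross correlation is computed both directly, using numpy.roll for the periodic shift, and via equation $(10)$:

```python
import numpy

numpy.random.seed(2)
f = numpy.random.normal(size=8)
g = numpy.random.normal(size=8)
N = len(f)

# Direct evaluation of equation (1) with periodic (circular) indexing:
# numpy.roll(g, -t)[n] equals g[(n + t) % N]
direct = numpy.array([numpy.sum(f * numpy.roll(g, -t)) for t in range(N)])

# Cross Correlation Theorem, equation (10): invert conj(F_k) G_k
theorem = numpy.fft.ifft(numpy.conj(numpy.fft.fft(f)) * numpy.fft.fft(g)).real
```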

Discrete Fourier Transform Example

This section will work through an example calculation of a discrete Fourier Transform that can be worked out by hand. The manual calculations will be compared with calculations performed using the FFT library from numpy.

The Discrete Fourier Transform of a time series, represented by the column vector $f$, into a column vector of Fourier coefficients, $\overline{f}$, can be represented by the linear equation,

$$ \overline{f} = Tf, $$

where $T$ is the transform matrix computed from the Fourier basis functions. Each element of the matrix is the value of the basis function used in the calculation. It is assumed that the time series contains only $4$ points so that $T$ will be a $4\times 4$ matrix. The transform matrix only depends on the number of elements in the time series vector and is given by,

$$ T= \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & e^{-i\pi/2} & e^{-i\pi} & e^{-i3\pi/2} \\ 1 & e^{-i\pi} & e^{-i2\pi} & e^{-i3\pi} \\ 1 & e^{-i3\pi/2} & e^{-i3\pi} & e^{-i9\pi/2} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -i & -1 & i \\ 1 & -1 & 1 & -1 \\ 1 & i & -1 & -i \end{pmatrix} \ \ \ \ \ (11) $$

Assume an example time series of,

$$ f = \begin{pmatrix} 8 \\ 4 \\ 8 \\ 0 \end{pmatrix}. $$

It follows that,

$$ \begin{aligned} \begin{pmatrix} \overline{f}_1 \\ \overline{f}_2 \\ \overline{f}_3 \\ \overline{f}_4 \end{pmatrix} &= \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -i & -1 & i \\ 1 & -1 & 1 & -1 \\ 1 & i & -1 & -i \end{pmatrix} \begin{pmatrix} 8 \\ 4 \\ 8 \\ 0 \end{pmatrix} \\ &= \begin{pmatrix} 20 \\ -4i \\ 12 \\ 4i \end{pmatrix} \end{aligned}\ \ \ \ \ (12) $$
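The matrix calculation in equation $(12)$ can also be reproduced directly with numpy by building $T$ from its definition; a sketch:

```python
import numpy

N = 4
n = numpy.arange(N)
# T[k, n] = e^{-2 pi i (n/N) k}, the forward transform matrix of equation (11)
T = numpy.exp(-2.0j * numpy.pi * numpy.outer(n, n) / N)

f = numpy.array([8.0, 4.0, 8.0, 0.0])
f_bar = T @ f  # same coefficients as numpy.fft.fft(f)
```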

The Python code listing below uses the FFT implementation from numpy to confirm the calculation of equation $(12)$. It first defines the time series example data $f$. The Fourier Transform is then used to compute $\overline{f}$.

In [1]: import numpy
In [2]: f = numpy.array([8, 4, 8, 0])
In [3]: numpy.fft.fft(f)
Out[3]: array([20.+0.j,  0.-4.j, 12.+0.j,  0.+4.j])

Cross Correlation Theorem Example

This section will work through an example calculation of cross correlation using the Cross Correlation Theorem with the goal of verifying an implementation of the algorithm in Python. Here use will be made of the time series vector $f$ and the transform matrix $T$ discussed in the previous section. An additional time series vector also needs to be considered; let,

$$ g = \begin{pmatrix} 6 \\ 3 \\ 9 \\ 3 \end{pmatrix}. $$

First, consider a direct calculation of the cross correlation, $\psi_t = \sum_{n=0}^{N-1} f_{n} g_{n+t}$. The following Python function, cross_correlate_sum(x, y), implements the required summation.

In [1]: import numpy
In [2]: def cross_correlate_sum(x, y):
   ...:     n = len(x)
   ...:     correlation = numpy.zeros(len(x))
   ...:     for t in range(n):
   ...:         for k in range(0, n - t):
   ...:             correlation[t] += x[k] * y[k + t]
   ...:     return correlation
   ...:

In [3]: f = numpy.array([8, 4, 8, 0])
In [4]: g = numpy.array([6, 3, 9, 3])
In [5]: cross_correlate_sum(f, g)
Out[5]: array([132.,  84.,  84.,  24.])

cross_correlate_sum(x, y) takes two vectors, x and y, as arguments. It assumes that x and y are of equal length and, after allocating storage for the result, performs the double summation required to compute the cross correlation for all possible time lags, returning the result. It is also seen that $O(N^2)$ operations are required to perform the calculation, where $N$ is the length of the input vectors.

Verification of the following results requires a manual calculation. A method of organizing the calculation to facilitate this is shown in the table below. The table rows are constructed from the elements of $f_{n}$ and the time lagged elements of $g_{n+t}$ for each value of $t$. The columns are indexed by the element number $n$. The time lag shift performed on the vector $g_{n+t}$ results in a translation of the components to the left that increases for each row as the time lag increases. Since the number of elements in $f$ and $g$ is finite the time lag shift will lead to some elements not participating in $\psi_{t}$ for some time lag values. If there is no element in the table at a position the value of $f$ or $g$ at that position is assumed to be $0$.

n         -3  -2  -1   0   1   2   3
f_n                    8   4   8   0
g_n                    6   3   9   3
g_{n+1}            6   3   9   3
g_{n+2}        6   3   9   3
g_{n+3}    6   3   9   3

The cross correlation, $\psi_t$, for a value of the time lag, $t$, is computed for each $n$ by multiplication of $f_{n}$ and $g_{n+t}$ and summing the results. The outcome of this calculation is shown as the column vector $\psi$ below, where each row corresponds to a different time lag value.

$$ \psi = \begin{pmatrix} 8\cdot 6 + 4\cdot 3 + 8\cdot 9 + 0\cdot 3 \\ 8\cdot 3 + 4\cdot 9 + 8\cdot 3 \\ 8\cdot 9 + 4\cdot 3 \\ 8\cdot 3 \end{pmatrix} = \begin{pmatrix} 132 \\ 84 \\ 84 \\ 24 \end{pmatrix}\ \ \ \ \ \ (13) $$

The result is the same as determined by cross_correlate_sum(x, y).

Now that an expectation of the result is established it can be compared with a calculation using the Cross Correlation Theorem from equation $(10)$, where $f$ and $g$ are represented by Discrete Fourier Series. It was previously shown that the Fourier representation is periodic, see equation $(6)$. It follows that the time lag shifts of $g_{n+t}$ will be cyclic, as shown in the calculation table below.

n         0  1  2  3
f_n       8  4  8  0
g_n       6  3  9  3
g_{n+1}   3  9  3  6
g_{n+2}   9  3  6  3
g_{n+3}   3  6  3  9

Performing the calculation following the steps previously described has the following outcome,

$$ \psi = \begin{pmatrix} 8\cdot 6 + 4\cdot 3 + 8\cdot 9 + 0\cdot 3 \\ 8\cdot 3 + 4\cdot 9 + 8\cdot 3 + 0\cdot 6 \\ 8\cdot 9 + 4\cdot 3 + 8\cdot 6 + 0\cdot 3 \\ 8\cdot 3 + 4\cdot 6 + 8\cdot 3 + 0\cdot 9 \end{pmatrix} = \begin{pmatrix} 132 \\ 84 \\ 132 \\ 72 \end{pmatrix}. $$

This is different from what was obtained from a direct evaluation of the cross correlation sum. Even though the result is different the calculation could be correct, since periodicity of $f$ and $g$ was not assumed when the sum was evaluated directly. Below a calculation using the Cross Correlation Theorem implemented in Python is shown.

In [1]: import numpy
In [2]: f = numpy.array([8, 4, 8, 0])
In [3]: g = numpy.array([6, 3, 9, 3])
In [4]: f_bar = numpy.fft.fft(f)
In [5]: g_bar = numpy.fft.fft(g)
In [6]: numpy.fft.ifft(numpy.conj(f_bar) * g_bar)
Out[6]: array([132.+0.j,  84.+0.j, 132.+0.j,  72.+0.j])

In the calculation $f$ and $g$ are defined and their Fourier Transforms are computed. The Cross Correlation Theorem, equation $(10)$, is then used to compute the Fourier Transform of the cross correlation, which is then inverted. The result obtained is the same as that obtained in the manual calculation, verifying the result. Since the calculations seem to be correct the problem must be that the periodicity of the Fourier representations of $f$ and $g$ was not handled properly. This analysis artifact is called aliasing. The following example attempts to correct for this problem.

The cross correlation calculation table for periodic $f$ and $g$ can be made to resemble the table for the nonperiodic case by padding the end of the vectors with $N-1$ zeros, where $N$ is the vector length, creating a new periodic vector of length $2N-1$. This new construction is shown below.

n         -3  -2  -1   0   1   2   3   4   5   6
f_n                    8   4   8   0
g_n                    6   3   9   3   0   0   0
g_{n+1}            6   3   9   3   0   0   0   6
g_{n+2}        6   3   9   3   0   0   0   6   3
g_{n+3}    6   3   9   3   0   0   0   6   3   9
g_{n+4}    3   9   3   0   0   0   6   3   9   3
g_{n+5}    9   3   0   0   0   6   3   9   3   0
g_{n+6}    3   0   0   0   6   3   9   3   0   0

It follows that,

$$ \psi = \begin{pmatrix} 8\cdot 6 + 4\cdot 3 + 8\cdot 9 + 0\cdot 3 \\ 8\cdot 3 + 4\cdot 9 + 8\cdot 3 + 0\cdot 0 \\ 8\cdot 9 + 4\cdot 3 + 8\cdot 0 + 0\cdot 0 \\ 8\cdot 3 + 4\cdot 0 + 8\cdot 0 + 0\cdot 0 \\ 8\cdot 0 + 4\cdot 0 + 8\cdot 0 + 0\cdot 6 \\ 8\cdot 0 + 4\cdot 0 + 8\cdot 6 + 0\cdot 3 \\ 8\cdot 0 + 4\cdot 6 + 8\cdot 3 + 0\cdot 9 \end{pmatrix} = \begin{pmatrix} 132 \\ 84 \\ 84 \\ 24 \\ 0 \\ 48 \\ 48 \end{pmatrix} $$

The first $N$ elements of this result are the same as obtained by calculating the sum directly. The same result is achieved by discarding the last $N-1$ elements. Verification is shown below, where the previous example is extended by padding the tail of both $f$ and $g$ with three zeros.

In [1]: import numpy
In [2]: f = numpy.array([8, 4, 8, 0])
In [3]: g = numpy.array([6, 3, 9, 3])
In [4]: f_bar = numpy.fft.fft(numpy.concatenate((f, numpy.zeros(len(f)-1))))
In [5]: g_bar = numpy.fft.fft(numpy.concatenate((g, numpy.zeros(len(g)-1))))
In [6]: numpy.fft.ifft(numpy.conj(f_bar) * g_bar)
Out[6]:
array([1.3200000e+02+0.j, 8.4000000e+01+0.j, 8.4000000e+01+0.j,
       2.4000000e+01+0.j, 4.0602442e-15+0.j, 4.8000000e+01+0.j,
       4.8000000e+01+0.j])

The following function, cross_correlate(x, y), generalizes the calculation to vectors of arbitrary but equal lengths.

import numpy

def cross_correlate(x, y):
    # Zero pad both inputs to length 2N-1 to remove aliasing
    n = len(x)
    x_padded = numpy.concatenate((x, numpy.zeros(n-1)))
    y_padded = numpy.concatenate((y, numpy.zeros(n-1)))
    x_fft = numpy.fft.fft(x_padded)
    y_fft = numpy.fft.fft(y_padded)
    # Cross Correlation Theorem, equation (10)
    h_fft = numpy.conj(x_fft) * y_fft
    cc = numpy.fft.ifft(h_fft)
    # Discard the last N-1 elements
    return cc[0:n]
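As a quick consistency check, the FFT-based function can be compared against the direct summation on the worked example; both functions are repeated here so the sketch is self-contained:

```python
import numpy

def cross_correlate_sum(x, y):
    # Direct O(N^2) evaluation of equation (1)
    n = len(x)
    correlation = numpy.zeros(n)
    for t in range(n):
        for k in range(0, n - t):
            correlation[t] += x[k] * y[k + t]
    return correlation

def cross_correlate(x, y):
    # FFT-based evaluation with zero padding to remove aliasing
    n = len(x)
    x_padded = numpy.concatenate((x, numpy.zeros(n - 1)))
    y_padded = numpy.concatenate((y, numpy.zeros(n - 1)))
    h_fft = numpy.conj(numpy.fft.fft(x_padded)) * numpy.fft.fft(y_padded)
    return numpy.fft.ifft(h_fft)[0:n]

f = numpy.array([8.0, 4.0, 8.0, 0.0])
g = numpy.array([6.0, 3.0, 9.0, 3.0])
# Both methods give [132, 84, 84, 24]
```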

Autocorrelation

The autocorrelation, defined by equation $(3)$, is the special case of the cross correlation of a time series with itself. It provides a measure of the dependence or similarity of its past and future. The version of the Cross Correlation Theorem for autocorrelation is given by,

$$ R_{k} = F^{\ast}_{k}F_{k} = {\mid F_{k} \mid}^{2}.\ \ \ \ \ (14) $$

RkR_{k} is the weight of each of the coefficients in the Fourier Series representation of the time series and is known as the Power Spectrum.

When discussing the autocorrelation of a time series, $f$, the autocorrelation coefficient is useful,

$$ \gamma_{\tau} = \frac{1}{N\sigma_{f}^2}\sum_{n=0}^{N-1} \left(f_{n} - \mu_f \right) \left(f_{n+\tau} - \mu_f\right),\ \ \ \ \ (15) $$

where $\mu_{f}$ and $\sigma_{f}$ are the time series mean and standard deviation respectively. Below a Python implementation calculating the autocorrelation coefficient is given.

import numpy

def autocorrelate(x):
    # Shift the series by its mean
    n = len(x)
    x_shifted = x - x.mean()
    # Zero pad to length 2N-1 to remove aliasing
    x_padded = numpy.concatenate((x_shifted, numpy.zeros(n-1)))
    x_fft = numpy.fft.fft(x_padded)
    # Equation (14): the power spectrum
    r_fft = numpy.conj(x_fft) * x_fft
    ac = numpy.fft.ifft(r_fft)
    # Normalize by the zero lag autocorrelation
    return ac[0:n]/ac[0]

autocorrelate(x) takes a single argument, x, that is the time series used in the calculation. The function first shifts x by its mean, then adds padding to remove aliasing and computes its FFT. Equation $(14)$ is next used to compute the Discrete Fourier Transform of the autocorrelation, which is then inverted. The final result is normalized by the zero lag autocorrelation, which equals $N\sigma_f^2$.
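A short usage sketch (the function is repeated here so the example is self-contained; the white noise input, seed, and length are our choices) applying autocorrelate(x) to uncorrelated samples, where coefficients at nonzero lags should be near zero:

```python
import numpy

def autocorrelate(x):
    # FFT-based autocorrelation coefficient using equation (14) with padding
    n = len(x)
    x_shifted = x - x.mean()
    x_padded = numpy.concatenate((x_shifted, numpy.zeros(n - 1)))
    x_fft = numpy.fft.fft(x_padded)
    ac = numpy.fft.ifft(numpy.conj(x_fft) * x_fft)
    return ac[0:n] / ac[0]

numpy.random.seed(3)
x = numpy.random.normal(size=1000)
gamma = autocorrelate(x).real
# gamma[0] is exactly 1 by construction; gamma[1] is near zero for white noise
```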

AR(1) Equilibrium Autocorrelation

The equilibrium properties of the AR(1) random process are discussed in some detail in Continuous State Markov Chain Equilibrium. AR(1) is defined by the difference equation,

$$ X_{t} = \alpha X_{t-1} + \varepsilon_{t},\ \ \ \ \ (16) $$

where $t=0,\ 1,\ 2,\ldots$ and the $\varepsilon_{t}$ are independent, identically distributed $\textbf{Normal}(0,\ \sigma^2)$ random variables.

The equilibrium mean and variance, $\mu_E$ and $\sigma_E^2$, are given by,

$$ \begin{gathered} \mu_{E} = 0 \\ \sigma_{E}^2 = \frac{\sigma^2}{1 - \alpha^2}. \end{gathered}\ \ \ \ \ (17) $$

The equilibrium autocorrelation with time lag $\tau$ is defined by,

$$ r_{\tau}^{E} = \lim_{t\to\infty} E\left[X_t X_{t+\tau} \right]. $$

If equation $(16)$ is used to evaluate a few steps beyond an arbitrary time $t$ it is seen that,

$$ \begin{aligned} X_{t+1} &= \alpha X_t + \varepsilon_{t+1} \\ X_{t+2} &= \alpha X_{t+1} + \varepsilon_{t+2} \\ X_{t+3} &= \alpha X_{t+2} + \varepsilon_{t+3}. \end{aligned} $$

Substituting the equation for $t+1$ into the equation for $t+2$, and that result into the equation for $t+3$, gives,

$$ X_{t+3} = \alpha^{3} X_t + \sum_{n=1}^{3}\alpha^{3-n} \varepsilon_{t+n}. $$

If this procedure is continued for $\tau$ steps the following is obtained,

$$ X_{t+\tau} = \alpha^{\tau} X_t + \sum_{n=1}^{\tau}\alpha^{\tau-n} \varepsilon_{t+n}. $$

It follows that the autocorrelation is given by,

$$ \begin{aligned} r_{\tau} &= E\left[X_t X_{t+\tau} \right] \\ &=E\left[ X_{t}\left( \alpha^{\tau} X_t + \sum_{n=1}^{\tau}\alpha^{\tau-n} \varepsilon_{t+n} \right) \right] \\ &= E\left[ \alpha^{\tau} X_t^2 + \sum_{n=1}^{\tau}\alpha^{\tau-n} X_t \varepsilon_{t+n}\right] \\ &= \alpha^{\tau} E\left[X_t^2\right] + \sum_{n=1}^{\tau}\alpha^{\tau-n} E\left[ X_t \varepsilon_{t+n}\right]. \end{aligned} $$

To go further the summation in the last step needs to be evaluated. In a previous post it was shown that,

$$ X_t = \alpha^t X_{0} + \sum_{i=1}^t \alpha^{t-i} \varepsilon_{i}. $$

Using this result the summation term becomes,

$$ \begin{aligned} E\left[ X_t \varepsilon_{t+n}\right] &= E\left[ \alpha^t X_{0}\varepsilon_{t+n} + \sum_{i=1}^t \alpha^{t-i} \varepsilon_{i}\varepsilon_{t+n} \right] \\ &= \alpha^{t} X_0 E\left[ \varepsilon_{t+n} \right] + \sum_{i=1}^{t} \alpha^{t-i} E\left[ \varepsilon_{i} \varepsilon_{t+n} \right] \\ &= 0, \end{aligned} $$

since the $\varepsilon_{t}$ are independent, identically distributed random variables with zero mean. It follows that,

$$ \begin{aligned} r_{\tau} &= E\left[X_t X_{t+\tau} \right] \\ &= \alpha^{\tau} E\left[X_t^2\right]. \end{aligned} $$

Evaluation of the equilibrium limit gives,

$$ \begin{aligned} r_{\tau}^{E} &= \lim_{t\to\infty} E\left[X_t X_{t+\tau} \right] \\ &= \lim_{t\to\infty} \alpha^{\tau} E\left[X_t^2\right] \\ &= \alpha^{\tau} \sigma_{E}^{2}. \end{aligned} $$

The last step follows from $(17)$ by assuming that $\mid\alpha\mid\ < 1$ so that $\mu_{E}$ and $\sigma_{E}$ are finite. A simple form of the autocorrelation coefficient in the equilibrium limit follows from equation $(15)$,

$$ \begin{aligned} \gamma_{\tau}^{E} &= \frac{r_{\tau}^{E}}{\sigma^2_E} \\ &= \alpha^{\tau}. \end{aligned}\ \ \ \ \ (18) $$

$\gamma_{\tau}^{E}$ remains finite for increasing $\tau$ only for $\mid\alpha\mid\ \leq\ 1$.

AR(1) Simulations

In this section the results obtained for AR(1) equilibrium autocorrelation in the previous section will be compared with simulations. Below, an implementation in Python of an AR(1) simulator based on the difference equation representation from equation $(16)$ is listed.

import numpy

def ar_1_series(α, σ, x0=0.0, nsamples=100):
    # Simulate equation (16) starting from x0
    samples = numpy.zeros(nsamples)
    ε = numpy.random.normal(0.0, σ, nsamples)
    samples[0] = x0
    for i in range(1, nsamples):
        samples[i] = α * samples[i-1] + ε[i]
    return samples

The function ar_1_series(α, σ, x0, nsamples) takes four arguments: α and σ from equation $(16)$, the initial value of $x$, x0, and the number of desired samples, nsamples. It begins by allocating storage for the sample output, followed by generation of nsamples values of $\varepsilon \sim \textbf{Normal}(0,\ \sigma^2)$ with the requested standard deviation, σ. The samples are then created using the AR(1) difference equation, equation $(16)$.

The plots below show examples of simulated time series with $\sigma=1$ and values of $\alpha$ satisfying $\mid\alpha\mid\ <\ 1$. It is seen that for smaller $\alpha$ values the series more frequently changes direction and has smaller variance. This is expected from equation $(17)$.

The next series of plots compare the autocorrelation coefficient from equation $(18)$, obtained in the equilibrium limit, with an autocorrelation coefficient calculation using the previously discussed autocorrelate(x) function on the generated sample data. Recall that autocorrelate(x) performs a calculation using the Cross Correlation Theorem. Equation $(18)$ does a good job of capturing the time scale over which the series becomes uncorrelated.
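The comparison can be sketched in code by combining ar_1_series and autocorrelate (both repeated here so the block is self-contained) and checking the sample coefficients against $\alpha^{\tau}$ from equation $(18)$; the seed, α value, and sample count are arbitrary choices:

```python
import numpy

def ar_1_series(alpha, sigma, x0=0.0, nsamples=100):
    # AR(1) simulation from the difference equation (16)
    samples = numpy.zeros(nsamples)
    eps = numpy.random.normal(0.0, sigma, nsamples)
    samples[0] = x0
    for i in range(1, nsamples):
        samples[i] = alpha * samples[i - 1] + eps[i]
    return samples

def autocorrelate(x):
    # FFT-based autocorrelation coefficient with padding to remove aliasing
    n = len(x)
    x_shifted = x - x.mean()
    x_padded = numpy.concatenate((x_shifted, numpy.zeros(n - 1)))
    x_fft = numpy.fft.fft(x_padded)
    ac = numpy.fft.ifft(numpy.conj(x_fft) * x_fft)
    return ac[0:n] / ac[0]

numpy.random.seed(4)
alpha = 0.8
x = ar_1_series(alpha, 1.0, nsamples=100000)
gamma = autocorrelate(x).real
predicted = alpha ** numpy.arange(10)  # equilibrium prediction, equation (18)
# gamma[:10] tracks predicted closely at small lags
```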

Conclusions

The Discrete Cross Correlation Theorem provides a more efficient method of calculating time series correlations than directly evaluating the sums. For a time series of length $N$ it decreases the cost of the calculation from $O(N^2)$ to $O(N\log N)$ by use of the Fast Fourier Transform. An interpretation of the cross correlation as the time lagged covariance of two random variables was presented, followed by a discussion of the properties of Discrete Fourier Transforms needed to prove the Cross Correlation Theorem. After building sufficient tools the theorem was derived by application of the Discrete Fourier Transform to equation $(1)$, which defines cross correlation. An example manual calculation of a Discrete Fourier Transform was performed and compared with a calculation using the FFT library from numpy. Next, manual calculations of cross correlation using a tabular method to represent the summations were presented, followed by a calculation using the Discrete Cross Correlation Theorem which illustrated the problem of aliasing. The next example calculation eliminated aliasing and recovered a result equal to the direct calculation of the summations. A dealiased implementation using the numpy FFT libraries was then presented. Finally, the Discrete Cross Correlation Theorem for the special case of the autocorrelation was discussed; a numpy FFT implementation was provided, followed by an example calculation using the AR(1) random process. In conclusion, the autocorrelation coefficient in the equilibrium limit for AR(1) was evaluated and shown to be finite only for values of the AR(1) parameter that satisfy $\mid\alpha\mid\ <\ 1$. This result was compared to direct simulations and shown to provide a good estimate of the decorrelation time of the process.