A Markov Chain is a sequence of states where transitions between states occur ordered in time, with the probability of a transition depending only on the previous state. Here the states will be assumed a continuous unbounded set and time a discrete unbounded set. If the set of states is given by $\{X_t\}$, the probability that the process will be in state $x$ at time $t$, denoted by $\pi_t(x)$, is referred to as the distribution. Markov Chain equilibrium is defined by $\pi_E(x) = \lim_{t\to\infty} \pi_t(x)$, that is, as time advances $\pi_t(x)$ becomes independent of time. Here a solution for this limit is discussed and illustrated with examples.
The Markov Chain is constructed from the set of states $\{x_0, x_1, \ldots, x_t, \ldots\}$, ordered in time, where $t = 0, 1, 2, \ldots$. The process starts at time $t = 0$ with state $x_0$. At the next step, $t = 1$, the process will assume a state $x_1$ with probability $p(x_0, x_1)$, since it will depend only on the state at $t = 0$ by the definition of a Markov Process. At the next time step the process state will be $x_2$ with probability $p(x_1, x_2)$, since by definition the probability of a state transition depends only upon the state at the previous time step. For an arbitrary time $t$ the transition from a state $x$ to a state $y$ will occur with probability $p(x, y)$ that is independent of $t$. The Transition Kernel, defined by,
$$P(x, A) = \int_A p(x, y)\,dy,$$
plays the same role as the transition matrix plays in the theory of Discrete State Markov Chains. In general $P(x, A)$ cannot always be represented by a probability density function, but here it is assumed that it has a density function representation. This leads to the interpretation of $p(x, y)$, for a known value of $x$, as a conditional probability density for a transition to state $y$ given that the process is in state $x$, namely,
$$p(x, y) = p(y \mid x).$$
Consequently, $p(x, y)$ is a family of conditional probability distributions, each representing conditioning on a possible state $x$ of the chain at the previous time step. Since $p(x, y)$ is a conditional probability density for each value of $x$, it follows that,
$$\int_{-\infty}^{\infty} p(x, y)\,dy = 1.$$
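This normalization can be checked numerically. The sketch below integrates an illustrative Gaussian kernel over $y$ with the trapezoidal rule; the kernel, grid and fixed $x$ are assumptions made for the example:

```python
import numpy

def gaussian_kernel(x, y, σ=1.0):
    # Conditional density of y given x: Normal with mean x and variance σ²
    return numpy.exp(-(y - x)**2 / (2.0 * σ**2)) / numpy.sqrt(2.0 * numpy.pi * σ**2)

# Integrate p(x, y) over y for a fixed x using the trapezoidal rule
x = 0.5
y = numpy.linspace(-10.0, 10.0, 2001)
p = gaussian_kernel(x, y)
total = numpy.sum((p[1:] + p[:-1]) / 2.0) * (y[1] - y[0])
# total is very close to 1, as required of a conditional density
```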
The transition kernel for a single step in the Markov Process is $p(x, y)$. The transition kernel across two time steps is computed as follows. Begin by denoting the process state at time $t$ by $x$, the state at $t + 1$ by $z$ and the state at $t + 2$ by $y$. Then the probability of transitioning to a state $y$ from $x$ in two steps, $p^2(x, y)$, is given by,
$$p^2(x, y) = \int_{-\infty}^{\infty} p(z, y)\,p(x, z)\,dz,$$
where use was made of the Law of Total Probability in the first step, and the second step used the Markov Chain property that the transition to $y$ depends only on the intermediate state $z$. The result obtained for the two step transition kernel is the continuous version of matrix multiplication of the single step transition probabilities. An operator inspired by matrix multiplication will be helpful. Make the definition,
$$(p \circ q)(x, y) = \int_{-\infty}^{\infty} p(x, z)\,q(z, y)\,dz.$$
Using this operator, with $p^2 = p \circ p$, the three step transition kernel is given by,
$$p^3(x, y) = (p \circ p^2)(x, y) = \int_{-\infty}^{\infty} p(x, z)\,p^2(z, y)\,dz = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p(x, z)\,p(z, w)\,p(w, y)\,dw\,dz,$$
where the second step substitutes the two step transition kernel and the remaining steps perform the decomposition into the single step transition kernel $p(x, y)$. Mathematical Induction can be used to show that the transition kernel between two states in an arbitrary number of steps, $t$, is given by,
$$p^t(x, y) = (p \circ p^{t-1})(x, y) = \int_{-\infty}^{\infty} p(x, z)\,p^{t-1}(z, y)\,dz.$$
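On a discrete grid the $\circ$ operator reduces to a matrix product scaled by the grid spacing, which makes multi-step kernels easy to explore numerically. In the sketch below the Gaussian random-walk kernel and the grid are illustrative assumptions; for that kernel the two step result is known to be $\text{Normal}(x, 2\sigma^2)$, which serves as a check:

```python
import numpy

def compose(p, q, dz):
    # (p ∘ q)(x, y) = ∫ p(x, z) q(z, y) dz, approximated on a uniform grid
    return p @ q * dz

σ = 1.0
grid = numpy.linspace(-10.0, 10.0, 401)   # grid spacing 0.05
dz = grid[1] - grid[0]
X, Y = numpy.meshgrid(grid, grid, indexing="ij")

# Single-step kernel: transition from x to y with density Normal(x, σ²)
p1 = numpy.exp(-(Y - X)**2 / (2 * σ**2)) / numpy.sqrt(2 * numpy.pi * σ**2)

# Two-step kernel p² = p ∘ p; for this kernel it equals Normal(x, 2σ²)
p2 = compose(p1, p1, dz)
exact = 1.0 / numpy.sqrt(4 * numpy.pi * σ**2)   # Normal(0, 2σ²) density at y = 0
```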
The distribution of Markov Chain states at time $t$ is defined by $\pi_t(y)$, the probability density of finding the process in state $y$ at that time.
To determine the equilibrium distribution, $\pi_E(y)$, the time variability of $\pi_t(y)$ must be determined. Begin by considering an arbitrary distribution at $t = 0$, $\pi_0(x)$. The distribution after the first step will be,
$$\pi_1(y) = \int_{-\infty}^{\infty} \pi_0(x)\,p(x, y)\,dx.$$
The distribution after two steps is,
$$\pi_2(y) = \int_{-\infty}^{\infty} \pi_1(x)\,p(x, y)\,dx = \int_{-\infty}^{\infty} \pi_0(x)\,p^2(x, y)\,dx.$$
The pattern becomes more apparent after the third step,
$$\pi_3(y) = \int_{-\infty}^{\infty} \pi_2(x)\,p(x, y)\,dx = \int_{-\infty}^{\infty} \pi_0(x)\,p^3(x, y)\,dx.$$
This looks similar to the previous result obtained for the time dependence of the transition kernel. Making use of the $\circ$ operator gives,
$$\pi_3 = \pi_0 \circ p^3,$$
where $(\pi \circ p)(y) = \int_{-\infty}^{\infty} \pi(x)\,p(x, y)\,dx$.
Mathematical Induction can be used to prove that the distribution at an arbitrary time $t$ is given by,
$$\pi_t(y) = (\pi_0 \circ p^t)(y) = \int_{-\infty}^{\infty} \pi_0(x)\,p^t(x, y)\,dx.$$
The equilibrium distribution is defined as the invariant solution for arbitrary $t$, namely,
$$\pi_E(y) = \pi_{t+1}(y) = \pi_t(y);$$
substitution into $\pi_{t+1} = \pi_t \circ p$ gives,
$$\pi_E(y) = \int_{-\infty}^{\infty} \pi_E(x)\,p(x, y)\,dx.$$
This equation defines the time invariant equilibrium solution.
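The invariance property can be illustrated numerically by iterating $\pi_{t+1} = \pi_t \circ p$ on a grid until the distribution stops changing. The kernel below, a Gaussian with mean $0.5x$ and unit variance, is an illustrative assumption:

```python
import numpy

grid = numpy.linspace(-10.0, 10.0, 401)
dx = grid[1] - grid[0]
X, Y = numpy.meshgrid(grid, grid, indexing="ij")

# Illustrative kernel: p(x, y) is Normal with mean 0.5·x and unit variance
p = numpy.exp(-(Y - 0.5 * X)**2 / 2.0) / numpy.sqrt(2.0 * numpy.pi)

# Start from an arbitrary distribution (uniform on the grid) and iterate
π = numpy.full(len(grid), 1.0 / 20.0)
for _ in range(50):
    π = π @ p * dx

# At equilibrium, one more application of the kernel leaves π unchanged
residual = numpy.max(numpy.abs(π @ p * dx - π))
```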
To go further a particular form for the transition kernel must be specified. Unlike the discrete state model, a general solution cannot be obtained, since convergence of the limit will depend on the assumed transition kernel. The following section will describe a solution to the equilibrium equation arising from a simple stochastic process.
To evaluate the equilibrium distribution a form for the transition kernel must be assumed. Here the AR(1) stochastic process is considered. AR(1) is a simple first order autoregressive model that provides an example of a continuous state Markov Chain. In the following sections its equilibrium distribution is determined and the results of simulations are discussed.
AR(1) is defined by the difference equation,
$$X_t = \alpha X_{t-1} + \varepsilon_t,$$
where $\alpha$ is a constant and the $\varepsilon_t$ are identically distributed, independent, Normal random variables with zero mean and variance $\sigma^2$. The subscript on $\varepsilon_t$ indicates that it is generated at time step $t$. The transition kernel for AR(1) can be derived from the difference equation by using $\varepsilon_t \sim \text{Normal}(0, \sigma^2)$ and letting $y = X_t$ and $x = X_{t-1}$, so that $\varepsilon_t = y - \alpha x$. The result is,
$$p(x, y) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-(y - \alpha x)^2 / 2\sigma^2}.$$
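A quick numerical check of this kernel: single-step transitions simulated from the difference equation, all started at the same fixed $x$, should show the $\text{Normal}(\alpha x, \sigma^2)$ statistics the kernel asserts. The parameter values and seed here are arbitrary choices:

```python
import numpy

α, σ, x = 0.7, 0.5, 2.0
rng = numpy.random.default_rng(7)

# Many single-step transitions from the same starting state x
y = α * x + rng.normal(0.0, σ, 100000)

# The kernel says y ~ Normal(α·x, σ²), i.e. mean 1.4 and standard deviation 0.5
sample_mean = y.mean()
sample_std = y.std()
```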
Now, consider a few steps,
$$X_1 = \alpha X_0 + \varepsilon_1,$$
$$X_2 = \alpha X_1 + \varepsilon_2,$$
$$X_3 = \alpha X_2 + \varepsilon_3;$$
substituting the first equation into the second equation and that result into the third equation gives,
$$X_3 = \alpha^3 X_0 + \alpha^2\varepsilon_1 + \alpha\varepsilon_2 + \varepsilon_3.$$
If this process is continued for $t$ steps the following result is obtained,
$$X_t = \alpha^t X_0 + \sum_{k=1}^{t} \alpha^{t-k}\,\varepsilon_k.$$
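The closed form can be verified directly against iteration of the difference equation for a fixed noise sequence (the parameter values and seed are arbitrary):

```python
import numpy

α, x0, t = 0.8, 1.5, 25
rng = numpy.random.default_rng(0)
ε = rng.normal(0.0, 1.0, t + 1)   # ε[1..t] are used; ε[0] is ignored

# Direct iteration of X_t = α X_{t-1} + ε_t
x = x0
for k in range(1, t + 1):
    x = α * x + ε[k]

# Closed form X_t = α^t X_0 + Σ_{k=1}^{t} α^{t-k} ε_k
x_closed = α**t * x0 + sum(α**(t - k) * ε[k] for k in range(1, t + 1))
# x and x_closed agree to floating point precision
```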
In this section this solution for $X_t$ is used to evaluate the mean and variance of the AR(1) process in the equilibrium limit $t \to \infty$. The mean and variance obtained are then used to construct $\pi_E(y)$, which is shown to be a solution to the equilibrium equation.
The equilibrium mean is given by,
$$\mu_E = \lim_{t\to\infty} \mathrm{E}[X_t].$$
From the solution for $X_t$ it is seen that,
$$\mathrm{E}[X_t] = \alpha^t\,\mathrm{E}[X_0] + \sum_{k=1}^{t} \alpha^{t-k}\,\mathrm{E}[\varepsilon_k] = \alpha^t\,\mathrm{E}[X_0],$$
where the last step follows from $\mathrm{E}[\varepsilon_k] = 0$. Now,
$$\mu_E = \lim_{t\to\infty} \alpha^t\,\mathrm{E}[X_0];$$
this limit exists for $|\alpha| < 1$,
$$\mu_E = 0.$$
The equilibrium variance is given by,
$$\sigma_E^2 = \lim_{t\to\infty} \mathrm{Var}[X_t] = \lim_{t\to\infty}\left(\mathrm{E}[X_t^2] - \mathrm{E}[X_t]^2\right).$$
To evaluate this expression an equation for $\mathrm{E}[X_t^2]$ is needed. From the solution for $X_t$ it follows that,
$$X_t^2 = \alpha^{2t} X_0^2 + 2\alpha^t X_0 \sum_{k=1}^{t} \alpha^{t-k}\,\varepsilon_k + \sum_{j=1}^{t}\sum_{k=1}^{t} \alpha^{2t-j-k}\,\varepsilon_j\varepsilon_k.$$
It follows that,
$$\mathrm{E}[X_t^2] = \alpha^{2t}\,\mathrm{E}[X_0^2] + \sum_{j=1}^{t}\sum_{k=1}^{t} \alpha^{2t-j-k}\,\mathrm{E}[\varepsilon_j\varepsilon_k],$$
where the middle term vanishes since $\mathrm{E}[\varepsilon_k] = 0$. Since the $\varepsilon_k$ are independent,
$$\mathrm{E}[\varepsilon_j\varepsilon_k] = \sigma^2\,\delta_{jk},$$
where $\delta_{jk}$ is the Kronecker Delta,
$$\delta_{jk} = \begin{cases} 1 & j = k \\ 0 & j \ne k. \end{cases}$$
Using these results gives,
$$\mathrm{E}[X_t^2] = \alpha^{2t}\,\mathrm{E}[X_0^2] + \sigma^2\sum_{k=1}^{t} \alpha^{2(t-k)} = \alpha^{2t}\,\mathrm{E}[X_0^2] + \sigma^2\,\frac{1 - \alpha^{2t}}{1 - \alpha^2}.$$
The last step follows from summation of a geometric series,
$$\sum_{k=1}^{t} \alpha^{2(t-k)} = \sum_{m=0}^{t-1} \alpha^{2m} = \frac{1 - \alpha^{2t}}{1 - \alpha^2}.$$
In the limit $t \to \infty$, $\mathrm{E}[X_t^2]$ only converges for $|\alpha| < 1$. If $\alpha = 1$ the denominator of the second term is undefined; evaluating the sum directly for this case gives $\mathrm{E}[X_t^2] = \mathrm{E}[X_0^2] + \sigma^2 t$, which diverges. For $|\alpha| < 1$, since $\mu_E = 0$,
$$\sigma_E^2 = \lim_{t\to\infty} \mathrm{E}[X_t^2] = \frac{\sigma^2}{1 - \alpha^2}.$$
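These equilibrium moments can be checked by simulating many independent AR(1) paths and computing the sample mean and variance at a late time. The parameters, path count and seed below are arbitrary choices for the sketch:

```python
import numpy

α, σ, t, npaths = 0.6, 1.0, 100, 200000
rng = numpy.random.default_rng(1)

# Simulate npaths independent chains for t steps, all started at x0 = 2.0
x = numpy.full(npaths, 2.0)
for _ in range(t):
    x = α * x + rng.normal(0.0, σ, npaths)

sample_mean = x.mean()                  # approaches μ_E = 0
sample_var = x.var()                    # approaches σ²/(1-α²)
equilibrium_var = σ**2 / (1 - α**2)     # = 1.5625 for these parameters
```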
The equilibrium distribution, $\pi_E(y)$, is found by substituting the results obtained for $\mu_E$ and $\sigma_E^2$ into a Normal distribution to obtain,
$$\pi_E(y) = \frac{1}{\sqrt{2\pi\sigma_E^2}}\,e^{-y^2/2\sigma_E^2} = \sqrt{\frac{1 - \alpha^2}{2\pi\sigma^2}}\,e^{-(1-\alpha^2)y^2/2\sigma^2}.$$
It can be shown that this is indeed the equilibrium distribution by verifying that it is a solution to the invariance equation $\pi_E(y) = \int_{-\infty}^{\infty} \pi_E(x)\,p(x, y)\,dx$. Inserting the expressions for $\pi_E(x)$ and $p(x, y)$ and evaluating the resulting Gaussian integral yields,
$$\int_{-\infty}^{\infty} \pi_E(x)\,p(x, y)\,dx = \sqrt{\frac{1 - \alpha^2}{2\pi\sigma^2}}\,e^{-(1-\alpha^2)y^2/2\sigma^2} = \pi_E(y).$$
An AR(1) simulator can be implemented using either the difference equation definition or the transition kernel. The results produced by either are statistically identical. An example implementation in Python using the difference equation is shown below.
import numpy

def ar_1_difference_series(α, σ, x0, nsample=100):
    # allocate storage for the samples and generate the noise series ε
    samples = numpy.zeros(nsample)
    ε = numpy.random.normal(0.0, σ, nsample)
    # set the initial state
    samples[0] = x0
    # iterate the AR(1) difference equation
    for i in range(1, nsample):
        samples[i] = α * samples[i-1] + ε[i]
    return samples
ar_1_difference_series(α, σ, x0, nsample) takes four arguments: α and σ from the difference equation, the initial value x0 and the number of desired samples nsample. It begins by allocating storage for the sample output, followed by generation of nsample values of ε with the requested standard deviation σ. The samples are then created using the AR(1) difference equation.
A second implementation using the transition kernel is shown below.
def ar1_kernel_series(α, σ, x0, nsample=100):
    # allocate storage for the samples and set the initial state
    samples = numpy.zeros(nsample)
    samples[0] = x0
    # each step samples the transition kernel Normal(α·samples[i-1], σ²)
    for i in range(1, nsample):
        samples[i] = numpy.random.normal(α * samples[i-1], σ)
    return samples
ar1_kernel_series(α, σ, x0, nsample) also takes four arguments: α and σ from the transition kernel, the initial value of the series, x0, and the number of desired samples, nsample. It begins by allocating storage for the sample output and then generates the samples using the transition kernel, drawing each step from the distribution $\text{Normal}(\alpha x, \sigma^2)$.
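The statistical equivalence of the two simulators can be checked by comparing long-run sample statistics. The sketch below restates both generators with an explicit random generator argument, a small deviation from the listings above made so the example is self-contained and reproducible; the parameters and seeds are arbitrary:

```python
import numpy

def ar_1_difference_series(α, σ, x0, nsample, rng):
    samples = numpy.zeros(nsample)
    ε = rng.normal(0.0, σ, nsample)
    samples[0] = x0
    for i in range(1, nsample):
        samples[i] = α * samples[i-1] + ε[i]
    return samples

def ar1_kernel_series(α, σ, x0, nsample, rng):
    samples = numpy.zeros(nsample)
    samples[0] = x0
    for i in range(1, nsample):
        samples[i] = rng.normal(α * samples[i-1], σ)
    return samples

α, σ, x0, n = 0.5, 1.0, 0.0, 100000
a = ar_1_difference_series(α, σ, x0, n, numpy.random.default_rng(2))
b = ar1_kernel_series(α, σ, x0, n, numpy.random.default_rng(3))

# Both series share the same long-run statistics:
# mean ≈ μ_E = 0 and variance ≈ σ²/(1-α²) ≈ 1.333
```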
The plots below show examples of time series generated by ar_1_difference_series with fixed $\sigma$ and values of $\alpha$ satisfying $|\alpha| < 1$. It is seen that for smaller values of $\alpha$ the series more frequently changes direction and has smaller variance. This is expected from the equilibrium variance, $\sigma_E^2 = \sigma^2/(1 - \alpha^2)$.
If $\alpha$ is just slightly larger than $1$ the time series can increase rapidly, as illustrated in the plot below. For $|\alpha| < 1$, $\sigma_E^2$ is bounded and the generated samples are constrained about the equilibrium mean value, $\mu_E = 0$, but for $|\alpha| > 1$ the variance is unbounded and the samples very quickly develop very large deviations from $\mu_E$.
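The contrast between $|\alpha| < 1$ and $\alpha > 1$ is easy to reproduce numerically (the parameter values and seed below are arbitrary):

```python
import numpy

def final_abs_value(α, σ, x0, nsample, seed):
    # iterate the AR(1) difference equation and return |X_nsample|
    rng = numpy.random.default_rng(seed)
    x = x0
    for _ in range(nsample):
        x = α * x + rng.normal(0.0, σ)
    return abs(x)

stable = final_abs_value(0.99, 1.0, 1.0, 1000, 4)     # stays O(σ_E) ≈ 7
unstable = final_abs_value(1.05, 1.0, 1.0, 1000, 4)   # grows roughly like 1.05^1000 ≈ 10^21
```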
Convergence to Equilibrium
For sufficiently large times samples generated by the AR(1) process will approach the equilibrium values for $|\alpha| < 1$. In the following plots the cumulative values of both the mean and standard deviation computed from simulations are compared with the equilibrium values $\mu_E$ and $\sigma_E$ from the previous section. The first plot illustrates the convergence to $\mu_E$ for six different simulations with varying initial states, $x_0$, and values of $\alpha$. The rate of convergence is seen to decrease as $\alpha$ increases. For smaller $\alpha$ the simulation quickly becomes close to $\mu_E$ and is soon indistinguishable from it. For larger values of $\alpha$ the convergence is much slower; after many more samples there are still noticeable oscillations of the sampled mean about $\mu_E$.
Since $\sigma_E$ varies with $\alpha$, for clarity only simulations with varying $x_0$ are shown in the second plot. The rate of convergence to $\sigma_E$ is slightly slower than the rate seen for $\mu_E$. For smaller $\alpha$ the simulation computations become indistinguishable from $\sigma_E$ relatively quickly, while for larger $\alpha$ deviations about $\sigma_E$ remain visible much longer.
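The cumulative statistics used in these plots can be computed with running sums, as in the following sketch (the parameters and seed are arbitrary):

```python
import numpy

α, σ, x0, n = 0.5, 1.0, 5.0, 50000
rng = numpy.random.default_rng(5)

# Simulate one AR(1) series
x = numpy.zeros(n)
x[0] = x0
ε = rng.normal(0.0, σ, n)
for i in range(1, n):
    x[i] = α * x[i-1] + ε[i]

# Cumulative mean and standard deviation after each sample
steps = numpy.arange(1, n + 1)
cumulative_mean = numpy.cumsum(x) / steps
cumulative_std = numpy.sqrt(numpy.cumsum(x**2) / steps - cumulative_mean**2)

μ_E = 0.0
σ_E = σ / numpy.sqrt(1 - α**2)   # ≈ 1.155
# cumulative_mean[-1] ≈ μ_E and cumulative_std[-1] ≈ σ_E
```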
The plot below favorably compares the histogram produced from the results of a simulation with the equilibrium distribution, $\pi_E(y)$.
A more efficient method of estimating $\pi_E(y)$ is obtained by noting that for a sufficiently large number of samples the equilibrium integral can be approximated by the expectation of the transition kernel over the simulated states, namely,
$$\pi_E(y) = \int_{-\infty}^{\infty} \pi_E(x)\,p(x, y)\,dx \approx \frac{1}{T}\sum_{t=1}^{T} p(x_t, y),$$
where the $x_t$ are samples of the process.
A Python implementation of this average transition kernel estimate is listed below, where two functions are defined.
def ar_1_kernel(α, σ, x, y):
    # evaluate the AR(1) transition kernel p(x, y) over the array of y values
    p = numpy.zeros(len(y))
    for i in range(0, len(y)):
        ε = ((y[i] - α * x)**2) / (2.0 * σ**2)
        p[i] = numpy.exp(-ε) / numpy.sqrt(2 * numpy.pi * σ**2)
    return p

def ar_1_equilibrium_distributions(α, σ, x0, y, nsample=100):
    # evaluate the kernel at each simulated state
    py = [ar_1_kernel(α, σ, x, y) for x in ar_1_difference_series(α, σ, x0, nsample)]
    # cumulative average of the kernel evaluations
    pavg = [py[0]]
    for i in range(1, len(py)):
        pavg.append((py[i] + i * pavg[i-1]) / (i + 1))
    return pavg
The first function, ar_1_kernel(α, σ, x, y), computes the transition kernel for a range of values and takes four arguments as input: α and σ from the kernel, the value of x, and an array of y values where the transition kernel is evaluated. The second function, ar_1_equilibrium_distributions(α, σ, x0, y, nsample), has five arguments as input: α, σ, the initial value of the series, x0, the array of y values used to evaluate the transition kernel, and the number of desired samples, nsample. ar_1_equilibrium_distributions begins by calling ar_1_difference_series to generate nsample samples of x. These values and the other needed inputs are then passed to nsample evaluations of the transition kernel. The cumulative average of the transition kernel is then evaluated and returned.
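The accuracy of the averaged-kernel estimate can be verified against the analytic $\pi_E(y)$. The sketch below is self-contained, re-simulating the series and vectorizing the kernel average; the parameters, grid and seed are arbitrary choices:

```python
import numpy

α, σ, x0, T = 0.5, 1.0, 0.0, 20000
rng = numpy.random.default_rng(6)

# Simulate the AR(1) states
x = numpy.zeros(T)
x[0] = x0
ε = rng.normal(0.0, σ, T)
for i in range(1, T):
    x[i] = α * x[i-1] + ε[i]

# Average the transition kernel p(x_t, y) over the simulated states
y = numpy.linspace(-4.0, 4.0, 201)
p = numpy.exp(-(y[None, :] - α * x[:, None])**2 / (2 * σ**2)) / numpy.sqrt(2 * numpy.pi * σ**2)
π_estimate = p.mean(axis=0)

# Analytic equilibrium density Normal(0, σ²/(1-α²)) for comparison
σ_E2 = σ**2 / (1 - α**2)
π_E = numpy.exp(-y**2 / (2 * σ_E2)) / numpy.sqrt(2 * numpy.pi * σ_E2)
```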
In practice this method gives reasonable results for a comparatively small number of samples. This is illustrated in the following plot, where the transition kernel average computed by ar_1_equilibrium_distributions is compared with the analytic equilibrium distribution $\pi_E(y)$.
The following plot shows intermediate values of the calculation. This illustrates the changes in the estimated equilibrium distribution as the calculation progresses. After sufficiently many samples a distribution near the equilibrium distribution is obtained.
Markov Chain equilibrium for continuous state processes provides a general theory of the time evolution of stochastic kernels and distributions. Unlike the case for the discrete state model, general solutions cannot be obtained, since evaluation of the resulting equations depends on the form of the stochastic kernel, and kernels exist which do not have equilibrium solutions. A continuous state process that has an equilibrium distribution that can be analytically evaluated is AR(1). The stochastic kernel for AR(1) is derived from its difference equation representation and the first and second moments are evaluated in the equilibrium limit, $t \to \infty$. It is shown that finite values exist only for values of the AR(1) parameter that satisfy $|\alpha| < 1$. A distribution is then constructed using these moments and shown to be the equilibrium distribution. Simulations are performed using the difference equation representation of the process and compared with the equilibrium calculations. The rate of convergence of simulations to the equilibrium is shown to depend on $\alpha$. For values of $\alpha$ not near $1$, convergence of the mean occurs within a modest number of time steps, with convergence of the standard deviation somewhat slower. For values of $\alpha$ closer to $1$, convergence has still not occurred after many more time steps.