Chapter 112. Markov Chains

A Markov chain is a model for random movement among a set of states.

At each step, the process occupies one state. It then moves to another state according to fixed transition probabilities. The defining condition is that the next state depends only on the current state, not on the full history.

This is called the Markov property.

If

$$X_0, X_1, X_2, \ldots$$

is a sequence of random states, then the Markov property says

$$\Pr(X_{k+1}=j \mid X_k=i, X_{k-1},\ldots,X_0) = \Pr(X_{k+1}=j \mid X_k=i).$$

For finite state spaces, a Markov chain is naturally described by a matrix. This makes Markov chains one of the clearest applications of linear algebra to probability. The transition matrix stores one-step probabilities, its powers describe multi-step transitions, and its eigenvectors describe long-term behavior.

112.1 States

The possible values of a Markov chain are called states.

For a finite Markov chain, the state space is a finite set

$$S = \{1,2,\ldots,n\}.$$

At each time $k$, the random variable $X_k$ takes one value in $S$.

For example, a simple weather model may have three states:

| State | Meaning |
| ----- | ------- |
| 1 | Sunny |
| 2 | Cloudy |
| 3 | Rainy |

If $X_5 = 2$, then the model says that the weather on day $5$ is cloudy.

The purpose of the Markov chain is to describe how the state changes over time.

112.2 Transition Probabilities

The probability of moving from state $i$ to state $j$ in one step is written

$$p_{ij} = \Pr(X_{k+1}=j \mid X_k=i).$$

For a time-homogeneous Markov chain, this probability does not depend on $k$. The same transition rule is used at every step.

For each fixed starting state $i$, the probabilities satisfy

$$p_{i1}+p_{i2}+\cdots+p_{in}=1.$$

Each probability is nonnegative:

$$p_{ij} \geq 0.$$

These two conditions express a basic fact: from state $i$, the chain must move to some state in the state space.

112.3 Transition Matrices

The transition probabilities form an $n \times n$ matrix

$$P = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & \cdots & p_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nn} \end{bmatrix}.$$

The entry $p_{ij}$ is the probability of moving from state $i$ to state $j$.

Since every row sums to $1$, $P$ is a row-stochastic matrix. A finite-state Markov chain can be represented by such a transition matrix.

For example,

$$P = \begin{bmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \end{bmatrix}$$

describes a chain with two states. From state $1$, the chain stays in state $1$ with probability $0.8$ and moves to state $2$ with probability $0.2$. From state $2$, it moves to state $1$ with probability $0.3$ and stays in state $2$ with probability $0.7$.
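As a sketch of how such a chain can be simulated, the following snippet (illustrative code, not from the text) draws a short trajectory from this two-state matrix using Python's `random.choices`:

```python
import random

# Transition matrix from the example: row i holds the
# probabilities of moving from state i to each state.
P = [[0.8, 0.2],
     [0.3, 0.7]]

def simulate(P, start, steps, seed=0):
    """Draw a trajectory of the chain, one row lookup per step."""
    rng = random.Random(seed)
    state = start
    path = [state]
    for _ in range(steps):
        # Choose the next state according to the current row of P.
        state = rng.choices(range(len(P)), weights=P[state])[0]
        path.append(state)
    return path

print(simulate(P, start=0, steps=10))
```

Note that the code only ever consults the current row `P[state]`: that locality is exactly the Markov property.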

112.4 Probability Distributions as Vectors

A probability distribution on $n$ states can be written as a row vector

$$\mu = \begin{bmatrix} \mu_1 & \mu_2 & \cdots & \mu_n \end{bmatrix}.$$

The entry $\mu_i$ is the probability that the chain is in state $i$.

The entries satisfy

$$\mu_i \geq 0$$

and

$$\mu_1+\mu_2+\cdots+\mu_n = 1.$$

If the initial distribution is $\mu_0$, then the distribution after one step is

$$\mu_1 = \mu_0 P.$$

After two steps,

$$\mu_2 = \mu_1 P = \mu_0 P^2.$$

After $k$ steps,

$$\mu_k = \mu_0 P^k.$$

Thus powers of the transition matrix describe the evolution of the Markov chain.
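Each update is a row-vector-times-matrix product, so the evolution can be sketched in a few lines of plain Python (the helper names are illustrative):

```python
def step(mu, P):
    """One step of the chain: the row-vector product mu P."""
    n = len(P)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]

def evolve(mu0, P, k):
    """Compute mu_k = mu_0 P^k by applying k single steps."""
    mu = list(mu0)
    for _ in range(k):
        mu = step(mu, P)
    return mu

P = [[0.8, 0.2],
     [0.3, 0.7]]
print(evolve([1.0, 0.0], P, 3))  # distribution after three steps
```

Applying `step` $k$ times is equivalent to multiplying by $P^k$ once, but avoids forming the matrix power explicitly.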

112.5 Multi-Step Transition Probabilities

The entry $(P^k)_{ij}$ is the probability that the chain moves from state $i$ to state $j$ in $k$ steps.

For $k=2$,

$$(P^2)_{ij} = \sum_{\ell=1}^n p_{i\ell}\,p_{\ell j}.$$

This formula has a direct probabilistic interpretation. To go from $i$ to $j$ in two steps, the chain may pass through any intermediate state $\ell$. The term $p_{i\ell}\,p_{\ell j}$ is the probability of the path

$$i \to \ell \to j.$$

Summing over all $\ell$ gives the total probability.

The same reasoning extends to $P^k$. Matrix multiplication is exactly the algebra of composing transition probabilities.
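The two-step formula can be checked directly. The sketch below compares the entry $(P^2)_{12}$ computed by summing over intermediate states with the corresponding entry of the matrix product:

```python
P = [[0.8, 0.2],
     [0.3, 0.7]]

def matmul(A, B):
    """Plain matrix product; entry (i, j) sums over paths i -> l -> j."""
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

# (P^2)_{12}: sum over the intermediate state l of p_{1l} p_{l2}.
i, j = 0, 1  # zero-based indices for states 1 and 2
by_paths = sum(P[i][l] * P[l][j] for l in range(2))
P2 = matmul(P, P)
print(by_paths, P2[i][j])  # the two values agree
```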

112.6 Example: Two-State Chain

Consider the transition matrix

$$P = \begin{bmatrix} 0.9 & 0.1 \\ 0.4 & 0.6 \end{bmatrix}.$$

Suppose the chain starts in state $1$ with probability $1$. Then

$$\mu_0 = \begin{bmatrix} 1 & 0 \end{bmatrix}.$$

After one step,

$$\mu_1 = \mu_0 P = \begin{bmatrix} 0.9 & 0.1 \end{bmatrix}.$$

After two steps,

$$\mu_2 = \mu_1 P = \begin{bmatrix} 0.9 & 0.1 \end{bmatrix} \begin{bmatrix} 0.9 & 0.1 \\ 0.4 & 0.6 \end{bmatrix}.$$

Therefore

$$\mu_2 = \begin{bmatrix} 0.85 & 0.15 \end{bmatrix}.$$

The probability of state $2$ has increased from $0$ to $0.1$, then to $0.15$.

Further powers of $P$ continue the evolution.
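A quick numeric check of this example (sketch code; the helper is not from the text):

```python
P = [[0.9, 0.1],
     [0.4, 0.6]]

def step(mu, P):
    """The row-vector product mu P."""
    n = len(P)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]

mu0 = [1.0, 0.0]
mu1 = step(mu0, P)   # [0.9, 0.1]
mu2 = step(mu1, P)   # [0.85, 0.15]
print(mu1, mu2)
```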

112.7 Stationary Distributions

A stationary distribution is a probability vector $\pi$ that remains unchanged after applying the transition matrix:

$$\pi P = \pi.$$

This equation says that if the chain starts with distribution $\pi$, then it has the same distribution after one step.

In linear algebra terms, $\pi$ is a left eigenvector of $P$ with eigenvalue $1$, normalized so that its entries sum to $1$.

The stationary distribution describes equilibrium behavior.

It does not say that the state itself becomes fixed. It says that the probability distribution over states becomes fixed.

112.8 Computing a Stationary Distribution

To find a stationary distribution, solve

$$\pi P = \pi.$$

Equivalently,

$$\pi(P-I)=0.$$

Using column vectors, this may be written as

$$(P^T-I)\pi^T=0.$$

The solution must also satisfy

$$\pi_1+\pi_2+\cdots+\pi_n = 1.$$

For the matrix

$$P = \begin{bmatrix} 0.9 & 0.1 \\ 0.4 & 0.6 \end{bmatrix},$$

let

$$\pi = \begin{bmatrix} a & b \end{bmatrix}.$$

The equation $\pi P=\pi$ gives

$$0.9a + 0.4b = a, \qquad 0.1a + 0.6b = b.$$

These equations are equivalent. From the second,

$$0.1a = 0.4b,$$

so

$$a = 4b.$$

Since $a+b=1$,

$$4b+b=1,$$

so

$$b=\frac{1}{5}, \qquad a=\frac{4}{5}.$$

Thus

$$\pi = \begin{bmatrix} \frac{4}{5} & \frac{1}{5} \end{bmatrix}.$$

This distribution is unchanged by $P$.
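For small chains the linear system can be solved by hand as above. Numerically, one common sketch is to iterate $\mu P$ until the vector stops changing; this converges here because the chain is irreducible and aperiodic (the function name is illustrative):

```python
P = [[0.9, 0.1],
     [0.4, 0.6]]

def stationary(P, tol=1e-12, max_iter=100000):
    """Approximate pi with pi P = pi by repeated multiplication."""
    n = len(P)
    mu = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(nxt[j] - mu[j]) for j in range(n)) < tol:
            return nxt
        mu = nxt
    return mu

pi = stationary(P)
print(pi)  # close to [0.8, 0.2]
```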

112.9 Convergence to Equilibrium

A major question is whether

$$\mu_0 P^k$$

converges to a stationary distribution as $k\to\infty$.

For many finite Markov chains, the answer is yes. In particular, if the chain is irreducible and aperiodic, then it has a unique stationary distribution and the distribution converges to it.

Irreducibility means that every state can eventually reach every other state.

Aperiodicity means that the chain does not move in a fixed repeating cycle.

Under these conditions, long-term behavior becomes independent of the initial distribution.
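A sketch of this independence from the starting point: evolve two different initial distributions under the same matrix and watch them approach the same limit.

```python
P = [[0.9, 0.1],
     [0.4, 0.6]]

def evolve(mu, P, k):
    """mu P^k by repeated row-vector multiplication."""
    n = len(P)
    for _ in range(k):
        mu = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
    return mu

a = evolve([1.0, 0.0], P, 50)  # start surely in state 1
b = evolve([0.0, 1.0], P, 50)  # start surely in state 2
print(a, b)  # both close to the stationary [0.8, 0.2]
```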

112.10 Eigenvalues and Long-Term Behavior

The long-term behavior of a Markov chain is controlled by the eigenvalues of its transition matrix.

Since $P$ is stochastic, $1$ is an eigenvalue. The corresponding eigenvector is related to the stationary distribution.

Other eigenvalues determine the rate of convergence.

If the eigenvalues satisfy

$$1 = \lambda_1, \qquad |\lambda_2| \geq |\lambda_3| \geq \cdots,$$

then the magnitude of $\lambda_2$ often controls the dominant rate of convergence toward equilibrium.

When $|\lambda_2|$ is close to $1$, convergence is slow.

When $|\lambda_2|$ is much smaller than $1$, convergence is fast.

This is why spectral methods are important in the study of Markov chains.
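For the running two-state example this can be made concrete. The short derivation below (added here for illustration) computes the spectrum from the trace and determinant:

```latex
% Characteristic polynomial of P = [0.9, 0.1; 0.4, 0.6]:
%   \lambda^2 - (\operatorname{tr} P)\,\lambda + \det P
\det(P - \lambda I)
  = \lambda^2 - 1.5\,\lambda + (0.9 \cdot 0.6 - 0.1 \cdot 0.4)
  = \lambda^2 - 1.5\,\lambda + 0.5
  = (\lambda - 1)(\lambda - 0.5).
```

So $\lambda_1 = 1$ and $\lambda_2 = 0.5$: the distance to the stationary distribution shrinks by roughly a factor of $2$ per step, which matches the fast convergence seen in the example.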

112.11 Absorbing States

A state $i$ is absorbing if, once the chain enters it, the chain remains there forever.

This means

$$p_{ii}=1.$$

Equivalently, row $i$ of $P$ has a $1$ in column $i$ and zeros elsewhere.

For example,

$$P = \begin{bmatrix} 1 & 0 & 0 \\ 0.2 & 0.5 & 0.3 \\ 0.1 & 0.2 & 0.7 \end{bmatrix}$$

has state $1$ absorbing.

Absorbing chains are used to model completion, failure, extinction, ruin, termination, and hitting events.

Linear algebra can compute absorption probabilities and expected times to absorption.

112.12 Canonical Form for Absorbing Chains

For an absorbing chain, reorder the states so that transient states come first and absorbing states come last.

Then the transition matrix can be written in block form:

$$P = \begin{bmatrix} Q & R \\ 0 & I \end{bmatrix}.$$

Here:

| Block | Meaning |
| ----- | ------- |
| $Q$ | Transitions among transient states |
| $R$ | Transitions from transient to absorbing states |
| $I$ | Absorbing states remain fixed |

The matrix

$$N = (I-Q)^{-1}$$

is called the fundamental matrix.

The entry $N_{ij}$ gives the expected number of visits to transient state $j$, starting from transient state $i$, before absorption.

The matrix

$$B = NR$$

gives absorption probabilities.

gives absorption probabilities.

Thus inverse matrices and block matrices provide exact formulas for absorbing Markov chains.
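For the three-state example above, states $2$ and $3$ are transient, so $Q$ and $R$ are read off from their rows. A small sketch (using the explicit $2 \times 2$ inverse formula) computes $N$ and $B$:

```python
# Transient block Q and absorbing block R from the example matrix
# (states 2 and 3 are transient, state 1 is absorbing).
Q = [[0.5, 0.3],
     [0.2, 0.7]]
R = [[0.2],
     [0.1]]

# Invert I - Q via the 2x2 adjugate formula.
a, b = 1 - Q[0][0], -Q[0][1]
c, d = -Q[1][0], 1 - Q[1][1]
det = a * d - b * c
N = [[d / det, -b / det],
     [-c / det, a / det]]  # fundamental matrix (I - Q)^{-1}

# B = N R gives absorption probabilities for each transient state.
B = [[N[i][0] * R[0][0] + N[i][1] * R[1][0]] for i in range(2)]
print(N, B)
```

Since state $1$ is the only absorbing state, each entry of $B$ works out to $1$: absorption is certain, and $N$ records how long, on average, the chain lingers among the transient states first.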

112.13 Random Walks on Graphs

A random walk on a graph is a Markov chain whose states are vertices.

At each step, the process moves from the current vertex to a neighboring vertex.

For an unweighted graph with adjacency matrix $A$ and degree matrix $D$, the transition matrix is

$$P = D^{-1}A,$$

assuming every vertex has positive degree.

The entry $p_{ij}$ is

$$p_{ij} = \begin{cases} \dfrac{1}{d_i}, & \text{if } i \text{ is adjacent to } j, \\ 0, & \text{otherwise.} \end{cases}$$

This connects Markov chains with graph theory.

Random walks are used in ranking, sampling, diffusion, clustering, and network analysis.
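As a sketch, the walk matrix $P = D^{-1}A$ for a small graph can be built directly from the adjacency matrix (the three-vertex path graph here is an illustrative choice):

```python
# Path graph on three vertices: 0 -- 1 -- 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

n = len(A)
deg = [sum(row) for row in A]  # diagonal of the degree matrix D

# P = D^{-1} A: divide each row of A by the vertex degree.
P = [[A[i][j] / deg[i] for j in range(n)] for i in range(n)]
print(P)
```

For such walks the stationary distribution is proportional to vertex degree, $\pi_i = d_i / \sum_j d_j$, which is one reason degree-based heuristics appear in network analysis.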

112.14 Reversibility

A Markov chain with stationary distribution $\pi$ is reversible if

$$\pi_i p_{ij} = \pi_j p_{ji}$$

for all states $i$ and $j$.

This condition is called detailed balance.

It says that, at equilibrium, the probability flow from $i$ to $j$ equals the probability flow from $j$ to $i$.

Reversible Markov chains have strong linear algebraic structure. In a weighted inner product determined by $\pi$, the transition operator is self-adjoint. This makes spectral analysis especially useful.
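Detailed balance is easy to check numerically. For the earlier two-state example with $\pi = [0.8, 0.2]$, the two flows match (sketch code):

```python
P = [[0.9, 0.1],
     [0.4, 0.6]]
pi = [0.8, 0.2]  # the stationary distribution computed earlier

# Detailed balance: flow 1 -> 2 should equal flow 2 -> 1.
flow_12 = pi[0] * P[0][1]  # 0.8 * 0.1 = 0.08
flow_21 = pi[1] * P[1][0]  # 0.2 * 0.4 = 0.08
print(flow_12, flow_21)
```

In fact every two-state chain is reversible with respect to its stationary distribution; detailed balance only becomes a genuine restriction for three or more states.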

112.15 Doubly Stochastic Matrices

A transition matrix is doubly stochastic if every row and every column sums to $1$.

That is,

$$\sum_j p_{ij}=1$$

for every row $i$, and

$$\sum_i p_{ij}=1$$

for every column $j$.

For an $n \times n$ doubly stochastic matrix, the uniform distribution

$$\pi = \begin{bmatrix} \frac{1}{n} & \frac{1}{n} & \cdots & \frac{1}{n} \end{bmatrix}$$

is stationary.

This follows because column sums equal $1$. Thus

$$\pi P = \pi.$$

Doubly stochastic chains arise in shuffling, averaging, consensus algorithms, and symmetric random processes.
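A sketch verifying this claim: for a doubly stochastic $3 \times 3$ matrix (this particular matrix is an illustrative example), the uniform vector is unchanged by one step.

```python
# A doubly stochastic matrix: every row and every column sums to 1.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]

n = len(P)
pi = [1.0 / n] * n  # the uniform distribution

# One step of pi P: each entry is a column sum scaled by 1/n,
# so it returns to the uniform distribution (up to rounding).
after = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
print(after)
```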

112.16 Periodicity

A state has period $d$ if returns to that state can occur only at times divisible by $d$, and $d$ is the greatest such divisor.

A chain is aperiodic if all relevant states have period $1$.

Periodicity can prevent convergence even when a stationary distribution exists.

For example,

$$P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

alternates between two states.

It has stationary distribution

$$\pi = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \end{bmatrix},$$

but if the chain starts in state $1$, the distribution alternates between

$$\begin{bmatrix} 1 & 0 \end{bmatrix}$$

and

$$\begin{bmatrix} 0 & 1 \end{bmatrix}.$$

Thus $\mu_0 P^k$ does not converge.

112.17 Reducibility

A chain is reducible if its state space can be decomposed into classes that do not all communicate with each other.

In matrix terms, after a suitable permutation of states, the transition matrix can often be written in block triangular form.

For example,

$$P = \begin{bmatrix} P_{11} & P_{12} \\ 0 & P_{22} \end{bmatrix}$$

means that states in the second block cannot move back to states in the first block.

Reducibility affects stationary distributions and long-term behavior.

Some reducible chains have many stationary distributions. For example, the identity transition matrix leaves every probability distribution unchanged.

112.18 Markov Chains and Linear Operators

A transition matrix acts as a linear operator on probability distributions.

For row vectors,

$$\mu \mapsto \mu P.$$

For column vectors, one may instead use

$$\mu \mapsto P^T\mu.$$

The operator preserves total probability and nonnegativity.

This means it maps the probability simplex

$$\Delta^{n-1} = \left\{ \mu \in \mathbb{R}^n : \mu_i \geq 0,\; \sum_i \mu_i = 1 \right\}$$

to itself.

The fixed points of this operator are stationary distributions.

Thus Markov chains combine geometry, probability, and linear algebra.

112.19 Markov Chains in Applications

Markov chains appear whenever a system changes state probabilistically over time.

| Field | Example |
| ----- | ------- |
| Web search | Random surfer models and PageRank |
| Statistics | Markov chain Monte Carlo |
| Physics | Random walks and diffusion |
| Biology | Population genetics |
| Finance | Credit rating transitions |
| Queueing theory | System state evolution |
| Computer science | Randomized algorithms |
| Machine learning | Sampling and sequence models |
| Chemistry | Molecular state transitions |
| Operations research | Inventory and reliability models |

The matrix viewpoint makes these applications computable.

Large Markov chains are often sparse. Efficient computation then depends on sparse matrix multiplication, iterative eigenvalue methods, and numerical stability.

112.20 Summary

A finite Markov chain is a probability model whose state changes according to a transition matrix.

The matrix $P$ stores one-step transition probabilities. Its powers $P^k$ store $k$-step transition probabilities. A probability distribution evolves by

$$\mu_k = \mu_0 P^k.$$

A stationary distribution satisfies

$$\pi P = \pi.$$

This is an eigenvector equation with eigenvalue $1$.

The long-term behavior of the chain depends on irreducibility, aperiodicity, absorbing structure, and the spectrum of the transition matrix.

Markov chains show how linear algebra describes random processes. A stochastic system becomes a sequence of matrix multiplications, and its equilibrium becomes an eigenvector problem.