# Chapter 112. Markov Chains

A Markov chain is a model for random movement among a set of states.

At each step, the process occupies one state. It then moves to another state according to fixed transition probabilities. The defining condition is that the next state depends only on the current state, not on the full history.

This is called the Markov property.

If

$$
X_0, X_1, X_2, \ldots
$$

is a sequence of random states, then the Markov property says

$$
\Pr(X_{k+1}=j \mid X_k=i, X_{k-1},\ldots,X_0) =
\Pr(X_{k+1}=j \mid X_k=i).
$$

For finite state spaces, a Markov chain is naturally described by a matrix. This makes Markov chains one of the clearest applications of linear algebra to probability. The transition matrix stores one-step probabilities, its powers describe multi-step transitions, and its eigenvectors describe long-term behavior.

## 112.1 States

The possible values of a Markov chain are called states.

For a finite Markov chain, the state space is a finite set

$$
S = \{1,2,\ldots,n\}.
$$

At each time \(k\), the random variable \(X_k\) takes one value in \(S\).

For example, a simple weather model may have three states:

| State | Meaning |
|---|---|
| 1 | Sunny |
| 2 | Cloudy |
| 3 | Rainy |

If \(X_5 = 2\), then the model says that the weather on day \(5\) is cloudy.

The purpose of the Markov chain is to describe how the state changes over time.

## 112.2 Transition Probabilities

The probability of moving from state \(i\) to state \(j\) in one step is written

$$
p_{ij} =
\Pr(X_{k+1}=j \mid X_k=i).
$$

For a time-homogeneous Markov chain, this probability does not depend on \(k\). The same transition rule is used at every step.

For each fixed starting state \(i\), the probabilities satisfy

$$
p_{i1}+p_{i2}+\cdots+p_{in}=1.
$$

Each probability is nonnegative:

$$
p_{ij} \geq 0.
$$

These two conditions express a basic fact: from state \(i\), the chain must move to some state in the state space.

## 112.3 Transition Matrices

The transition probabilities form an \(n \times n\) matrix

$$
P =
\begin{bmatrix}
p_{11} & p_{12} & \cdots & p_{1n} \\
p_{21} & p_{22} & \cdots & p_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
p_{n1} & p_{n2} & \cdots & p_{nn}
\end{bmatrix}.
$$

The entry \(p_{ij}\) is the probability of moving from state \(i\) to state \(j\).

Since every row sums to \(1\), \(P\) is a row-stochastic matrix. A finite-state Markov chain can be represented by such a transition matrix.

For example,

$$
P =
\begin{bmatrix}
0.8 & 0.2 \\
0.3 & 0.7
\end{bmatrix}
$$

describes a chain with two states. From state \(1\), the chain stays in state \(1\) with probability \(0.8\) and moves to state \(2\) with probability \(0.2\). From state \(2\), it moves to state \(1\) with probability \(0.3\) and stays in state \(2\) with probability \(0.7\).

## 112.4 Probability Distributions as Vectors

A probability distribution on \(n\) states can be written as a row vector

$$
\mu =
\begin{bmatrix}
\mu_1 & \mu_2 & \cdots & \mu_n
\end{bmatrix}.
$$

The entry \(\mu_i\) is the probability that the chain is in state \(i\).

The entries satisfy

$$
\mu_i \geq 0
$$

and

$$
\mu_1+\mu_2+\cdots+\mu_n = 1.
$$

If the initial distribution is \(\mu_0\), then the distribution after one step is

$$
\mu_1 = \mu_0P.
$$

After two steps,

$$
\mu_2 = \mu_1P = \mu_0P^2.
$$

After \(k\) steps,

$$
\mu_k = \mu_0P^k.
$$

Thus powers of the transition matrix describe the evolution of the Markov chain.
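The evolution \(\mu_k = \mu_0P^k\) is easy to check numerically. The sketch below uses NumPy (an assumed dependency, not part of the text) with the two-state matrix from Section 112.3.

```python
import numpy as np

# Transition matrix from Section 112.3.
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])

# Initial distribution: start in state 1 with probability 1.
mu0 = np.array([1.0, 0.0])

# mu_k = mu_0 P^k, here with k = 3.
mu3 = mu0 @ np.linalg.matrix_power(P, 3)
print(mu3)  # approximately [0.65, 0.35]
```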

## 112.5 Multi-Step Transition Probabilities

The entry \((P^k)_{ij}\) is the probability that the chain moves from state \(i\) to state \(j\) in \(k\) steps.

For \(k=2\),

$$
(P^2)_{ij} =
\sum_{\ell=1}^n p_{i\ell}p_{\ell j}.
$$

This formula has a direct probabilistic interpretation. To go from \(i\) to \(j\) in two steps, the chain may pass through any intermediate state \(\ell\). The term \(p_{i\ell}p_{\ell j}\) is the probability of the path

$$
i \to \ell \to j.
$$

Summing over all \(\ell\) gives the total probability.

The same reasoning extends to \(P^k\). Matrix multiplication is exactly the algebra of composing transition probabilities.
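The path-sum formula for \((P^2)_{ij}\) can be verified against direct matrix multiplication. A minimal NumPy check (NumPy assumed), reusing the matrix from Section 112.3:

```python
import numpy as np

P = np.array([[0.8, 0.2],
              [0.3, 0.7]])

# (P^2)_{ij} via matrix multiplication ...
P2 = P @ P

# ... and via the explicit sum over intermediate states l.
i, j = 0, 1  # states 1 and 2 in zero-based indexing
path_sum = sum(P[i, l] * P[l, j] for l in range(2))
print(P2[i, j], path_sum)  # the two values agree
```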

## 112.6 Example: Two-State Chain

Consider the transition matrix

$$
P =
\begin{bmatrix}
0.9 & 0.1 \\
0.4 & 0.6
\end{bmatrix}.
$$

Suppose the chain starts in state \(1\) with probability \(1\). Then

$$
\mu_0 =
\begin{bmatrix}
1 & 0
\end{bmatrix}.
$$

After one step,

$$
\mu_1 =
\mu_0P =
\begin{bmatrix}
0.9 & 0.1
\end{bmatrix}.
$$

After two steps,

$$
\mu_2 =
\mu_1P =
\begin{bmatrix}
0.9 & 0.1
\end{bmatrix}
\begin{bmatrix}
0.9 & 0.1 \\
0.4 & 0.6
\end{bmatrix}.
$$

Therefore

$$
\mu_2 =
\begin{bmatrix}
0.85 & 0.15
\end{bmatrix}.
$$

The probability of state \(2\) has increased from \(0\) to \(0.1\), then to \(0.15\).

Further powers of \(P\) continue the evolution.
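The hand computations above are straightforward to confirm with NumPy (assumed available):

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Start in state 1 with probability 1.
mu0 = np.array([1.0, 0.0])

mu1 = mu0 @ P   # [0.9, 0.1]
mu2 = mu1 @ P   # [0.85, 0.15]
print(mu1, mu2)
```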

## 112.7 Stationary Distributions

A stationary distribution is a probability vector \(\pi\) that remains unchanged after applying the transition matrix:

$$
\pi P = \pi.
$$

This equation says that if the chain starts with distribution \(\pi\), then it has the same distribution after one step.

In linear algebra terms, \(\pi\) is a left eigenvector of \(P\) with eigenvalue \(1\), normalized so that its entries sum to \(1\).

The stationary distribution describes equilibrium behavior.

It does not say that the state itself becomes fixed. It says that the probability distribution over states becomes fixed.

## 112.8 Computing a Stationary Distribution

To find a stationary distribution, solve

$$
\pi P = \pi.
$$

Equivalently,

$$
\pi(P-I)=0.
$$

Using column vectors, this may be written as

$$
(P^T-I)\pi^T=0.
$$

The solution must also satisfy

$$
\pi_1+\pi_2+\cdots+\pi_n = 1.
$$

For the matrix

$$
P =
\begin{bmatrix}
0.9 & 0.1 \\
0.4 & 0.6
\end{bmatrix},
$$

let

$$
\pi =
\begin{bmatrix}
a & b
\end{bmatrix}.
$$

The equation \(\pi P=\pi\) gives

$$
0.9a + 0.4b = a,
$$

$$
0.1a + 0.6b = b.
$$

These equations are equivalent. From the second,

$$
0.1a = 0.4b,
$$

so

$$
a = 4b.
$$

Since \(a+b=1\),

$$
4b+b=1,
$$

so

$$
b=\frac{1}{5},
\qquad
a=\frac{4}{5}.
$$

Thus

$$
\pi =
\begin{bmatrix}
\frac{4}{5} & \frac{1}{5}
\end{bmatrix}.
$$

This distribution is unchanged by \(P\).
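The same stationary distribution can be found numerically as a left eigenvector of \(P\) for eigenvalue \(1\), i.e. a right eigenvector of \(P^T\). A sketch using NumPy (assumed available):

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Left eigenvectors of P are right eigenvectors of P.T.
vals, vecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(vals - 1.0))   # locate the eigenvalue 1
pi = np.real(vecs[:, k])
pi = pi / pi.sum()                  # normalize so the entries sum to 1
print(pi)  # approximately [0.8, 0.2]
```

Dividing by the sum also fixes the sign, since eigenvectors are only determined up to a scalar.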

## 112.9 Convergence to Equilibrium

A major question is whether

$$
\mu_0P^k
$$

converges to a stationary distribution as \(k\to\infty\).

For many finite Markov chains, the answer is yes. In particular, if the chain is irreducible and aperiodic, then it has a unique stationary distribution and the distribution converges to it.

Irreducibility means that every state can eventually reach every other state.

Aperiodicity means that the chain does not move in a fixed repeating cycle.

Under these conditions, long-term behavior becomes independent of the initial distribution.

## 112.10 Eigenvalues and Long-Term Behavior

The long-term behavior of a Markov chain is controlled by the eigenvalues of its transition matrix.

Since \(P\) is stochastic, \(1\) is an eigenvalue: the all-ones vector is a right eigenvector, and a stationary distribution is a left eigenvector.

Other eigenvalues determine the rate of convergence.

If the eigenvalues satisfy

$$
1 = \lambda_1,
\qquad
1 \geq |\lambda_2| \geq |\lambda_3| \geq \cdots,
$$

then the magnitude of \(\lambda_2\) often controls the dominant rate of convergence toward equilibrium.

When \(|\lambda_2|\) is close to \(1\), convergence is slow.

When \(|\lambda_2|\) is much smaller than \(1\), convergence is fast.

This is why spectral methods are important in the study of Markov chains.
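For the two-state matrix of Section 112.6 this can be observed directly. The sketch below (NumPy assumed) computes the eigenvalues, which are \(1\) and \(0.5\) for this matrix, and shows the distance to equilibrium shrinking by roughly \(|\lambda_2|\) per step:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

vals = np.linalg.eigvals(P)
vals = vals[np.argsort(-np.abs(vals))]  # sort by magnitude
lam2 = np.abs(vals[1])
print(lam2)  # 0.5 for this matrix

pi = np.array([0.8, 0.2])               # stationary distribution (Section 112.8)
mu = np.array([1.0, 0.0])
for k in range(1, 5):
    mu = mu @ P
    print(k, np.abs(mu - pi).sum())     # shrinks by a factor of ~lam2 each step
```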

## 112.11 Absorbing States

A state \(i\) is absorbing if, once the chain enters it, the chain remains there forever.

This means

$$
p_{ii}=1.
$$

Equivalently, row \(i\) of \(P\) has a \(1\) in column \(i\) and zeros elsewhere.

For example,

$$
P =
\begin{bmatrix}
1 & 0 & 0 \\
0.2 & 0.5 & 0.3 \\
0.1 & 0.2 & 0.7
\end{bmatrix}
$$

has state \(1\) absorbing.

Absorbing chains are used to model completion, failure, extinction, ruin, termination, and hitting events.

Linear algebra can compute absorption probabilities and expected times to absorption.

## 112.12 Canonical Form for Absorbing Chains

For an absorbing chain, reorder the states so that transient states come first and absorbing states come last.

Then the transition matrix can be written in block form:

$$
P =
\begin{bmatrix}
Q & R \\
0 & I
\end{bmatrix}.
$$

Here:

| Block | Meaning |
|---|---|
| \(Q\) | Transitions among transient states |
| \(R\) | Transitions from transient to absorbing states |
| \(I\) | Absorbing states remain fixed |

The matrix

$$
N = (I-Q)^{-1}
$$

is called the fundamental matrix.

The entry \(N_{ij}\) gives the expected number of visits to transient state \(j\), starting from transient state \(i\), before absorption.

The matrix

$$
B = NR
$$

gives absorption probabilities.

Thus inverse matrices and block matrices provide exact formulas for absorbing Markov chains.
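As a concrete check, the absorbing chain from Section 112.11 can be put in canonical form and its fundamental matrix computed. A NumPy sketch (NumPy assumed):

```python
import numpy as np

# Absorbing chain from Section 112.11: state 1 is absorbing.
P = np.array([[1.0, 0.0, 0.0],
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.7]])

# Transient states are 2 and 3 (indices 1, 2); the absorbing state has index 0.
Q = P[1:, 1:]    # transitions among transient states
R = P[1:, :1]    # transitions from transient to absorbing states

N = np.linalg.inv(np.eye(2) - Q)   # fundamental matrix N = (I - Q)^{-1}
B = N @ R                          # absorption probabilities
print(N)
print(B)  # each entry is 1: with a single absorbing state, absorption is certain
```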

## 112.13 Random Walks on Graphs

A random walk on a graph is a Markov chain whose states are vertices.

At each step, the process moves from the current vertex to a neighboring vertex.

For an unweighted graph with adjacency matrix \(A\) and degree matrix \(D\), the transition matrix is

$$
P = D^{-1}A,
$$

assuming every vertex has positive degree.

The entry \(p_{ij}\) is

$$
p_{ij} =
\begin{cases}
\frac{1}{d_i}, & \text{if } i \text{ is adjacent to } j, \\
0, & \text{otherwise.}
\end{cases}
$$

This connects Markov chains with graph theory.

Random walks are used in ranking, sampling, diffusion, clustering, and network analysis.

## 112.14 Reversibility

A Markov chain with stationary distribution \(\pi\) is reversible if

$$
\pi_i p_{ij} = \pi_j p_{ji}
$$

for all states \(i\) and \(j\).

This condition is called detailed balance.

It says that, at equilibrium, the probability flow from \(i\) to \(j\) equals the probability flow from \(j\) to \(i\).

Reversible Markov chains have strong linear algebraic structure. In a weighted inner product determined by \(\pi\), the transition operator is self-adjoint. This makes spectral analysis especially useful.
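Random walks on graphs are a standard example of reversible chains: the distribution proportional to the vertex degrees is stationary, and detailed balance holds. A NumPy check (NumPy assumed), using the path graph from Section 112.13:

```python
import numpy as np

# Random walk on the path graph 1 - 2 - 3.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
d = A.sum(axis=1)
P = A / d[:, None]

# For a random walk on a graph, pi_i proportional to the degree d_i is stationary.
pi = d / d.sum()

# Detailed balance: the matrix of flows pi_i * p_ij is symmetric.
flow = pi[:, None] * P
print(np.allclose(flow, flow.T))  # True: the walk is reversible
```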

## 112.15 Doubly Stochastic Matrices

A transition matrix is doubly stochastic if every row and every column sums to \(1\).

That is,

$$
\sum_j p_{ij}=1
$$

for every row \(i\), and

$$
\sum_i p_{ij}=1
$$

for every column \(j\).

For an \(n \times n\) doubly stochastic matrix, the uniform distribution

$$
\pi =
\begin{bmatrix}
\frac{1}{n} & \frac{1}{n} & \cdots & \frac{1}{n}
\end{bmatrix}
$$

is stationary.

This follows because column sums equal \(1\). Thus

$$
\pi P = \pi.
$$

Doubly stochastic chains arise in shuffling, averaging, consensus algorithms, and symmetric random processes.
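The uniform distribution's stationarity is easy to verify numerically. The sketch below (NumPy assumed) uses an illustrative \(3 \times 3\) circulant matrix whose rows and columns all sum to \(1\):

```python
import numpy as np

# A doubly stochastic matrix: every row and every column sums to 1.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])

pi = np.full(3, 1/3)   # uniform distribution on 3 states
print(pi @ P)          # equals pi: the uniform distribution is stationary
```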

## 112.16 Periodicity

A state has period \(d\) if \(d\) is the greatest common divisor of the possible return times to that state.

A chain is aperiodic if all relevant states have period \(1\).

Periodicity can prevent convergence even when a stationary distribution exists.

For example,

$$
P =
\begin{bmatrix}
0 & 1 \\
1 & 0
\end{bmatrix}
$$

alternates between two states.

It has stationary distribution

$$
\pi =
\begin{bmatrix}
\frac{1}{2} & \frac{1}{2}
\end{bmatrix},
$$

but if the chain starts in state \(1\), the distribution alternates between

$$
\begin{bmatrix}
1 & 0
\end{bmatrix}
$$

and

$$
\begin{bmatrix}
0 & 1
\end{bmatrix}.
$$

Thus \(\mu_0P^k\) does not converge.
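The failure to converge shows up immediately in simulation. A NumPy sketch (NumPy assumed) iterating the period-2 chain from this section:

```python
import numpy as np

# The period-2 chain from Section 112.16.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

mu = np.array([1.0, 0.0])   # start in state 1
for k in range(4):
    mu = mu @ P
    print(k + 1, mu)        # alternates between [0, 1] and [1, 0]
```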

## 112.17 Reducibility

A chain is reducible if its state space can be decomposed into classes that do not all communicate with each other.

In matrix terms, after a suitable permutation of states, the transition matrix of a reducible chain can be written in block triangular form.

For example,

$$
P =
\begin{bmatrix}
P_{11} & P_{12} \\
0 & P_{22}
\end{bmatrix}
$$

means that states in the second block cannot move back to states in the first block.

Reducibility affects stationary distributions and long-term behavior.

Some reducible chains have many stationary distributions. For example, the identity transition matrix leaves every probability distribution unchanged.

## 112.18 Markov Chains and Linear Operators

A transition matrix acts as a linear operator on probability distributions.

For row vectors,

$$
\mu \mapsto \mu P.
$$

For column vectors, one may instead use

$$
\mu \mapsto P^T\mu.
$$

The operator preserves total probability and nonnegativity.

This means it maps the probability simplex

$$
\Delta^{n-1} =
\left\{
\mu \in \mathbb{R}^n :
\mu_i \geq 0,\;
\sum_i \mu_i = 1
\right\}
$$

to itself.

The fixed points of this operator are stationary distributions.

Thus Markov chains combine geometry, probability, and linear algebra.

## 112.19 Markov Chains in Applications

Markov chains appear whenever a system changes state probabilistically over time.

| Field | Example |
|---|---|
| Web search | Random surfer models and PageRank |
| Statistics | Markov chain Monte Carlo |
| Physics | Random walks and diffusion |
| Biology | Population genetics |
| Finance | Credit rating transitions |
| Queueing theory | System state evolution |
| Computer science | Randomized algorithms |
| Machine learning | Sampling and sequence models |
| Chemistry | Molecular state transitions |
| Operations research | Inventory and reliability models |

The matrix viewpoint makes these applications computable.

Large Markov chains are often sparse. Efficient computation then depends on sparse matrix multiplication, iterative eigenvalue methods, and numerical stability.

## 112.20 Summary

A finite Markov chain is a probability model whose state changes according to a transition matrix.

The matrix \(P\) stores one-step transition probabilities. Its powers \(P^k\) store \(k\)-step transition probabilities. A probability distribution evolves by

$$
\mu_k = \mu_0P^k.
$$

A stationary distribution satisfies

$$
\pi P = \pi.
$$

This is an eigenvector equation with eigenvalue \(1\).

The long-term behavior of the chain depends on irreducibility, aperiodicity, absorbing structure, and the spectrum of the transition matrix.

Markov chains show how linear algebra describes random processes. A stochastic system becomes a sequence of matrix multiplications, and its equilibrium becomes an eigenvector problem.
