# Chapter 93. Power Iteration

Power iteration is an iterative method for approximating a dominant eigenvalue and a corresponding eigenvector of a square matrix. It is one of the simplest eigenvalue algorithms in numerical linear algebra.

The method applies the matrix repeatedly to a vector:

$$
v,\quad Av,\quad A^2v,\quad A^3v,\ldots.
$$

If one eigenvalue has strictly larger magnitude than all the others, repeated multiplication by \(A\) amplifies the component in that dominant eigendirection. After normalization, the vector sequence tends toward a dominant eigenvector. The eigenvalue can then be estimated by a scalar quotient such as the Rayleigh quotient. Power iteration is simple and may converge slowly, but it is effective for large sparse or matrix-free problems because each step mainly requires one matrix-vector product.

## 93.1 Dominant Eigenvalue

Let

$$
A\in \mathbb{R}^{n\times n}.
$$

An eigenpair of \(A\) is a scalar \(\lambda\) and a nonzero vector \(v\) satisfying

$$
Av=\lambda v.
$$

An eigenvalue \(\lambda_1\) is called dominant if

$$
|\lambda_1|>|\lambda_j|
$$

for every other eigenvalue \(\lambda_j\).

The power method is designed to approximate this dominant eigenvalue and an associated eigenvector.

## 93.2 Basic Idea

Suppose \(A\) is diagonalizable, with linearly independent eigenvectors

$$
v_1,v_2,\ldots,v_n
$$

with eigenvalues

$$
\lambda_1,\lambda_2,\ldots,\lambda_n,
$$

ordered so that

$$
|\lambda_1|>|\lambda_2|\ge \cdots \ge |\lambda_n|.
$$

Suppose the initial vector \(x^{(0)}\) has a nonzero component in the direction of \(v_1\). Then

$$
x^{(0)} =
c_1v_1+c_2v_2+\cdots+c_nv_n,
\qquad c_1\ne 0.
$$

Applying \(A^k\) gives

$$
A^kx^{(0)} =
c_1\lambda_1^kv_1
+
c_2\lambda_2^kv_2
+
\cdots
+
c_n\lambda_n^kv_n.
$$

Factor out the dominant term:

$$
A^kx^{(0)} =
\lambda_1^k
\left(
c_1v_1
+
c_2\left(\frac{\lambda_2}{\lambda_1}\right)^kv_2
+
\cdots
+
c_n\left(\frac{\lambda_n}{\lambda_1}\right)^kv_n
\right).
$$

Since

$$
\left|\frac{\lambda_j}{\lambda_1}\right|<1
$$

for \(j>1\), the non-dominant terms decay relative to the dominant term.

This is the mechanism behind power iteration.

## 93.3 Normalization

Directly computing

$$
A^kx^{(0)}
$$

is numerically unsafe. The vector may grow without bound or underflow toward zero.

Power iteration normalizes after each multiplication.

A standard form is:

$$
y^{(k+1)}=Ax^{(k)},
$$

$$
x^{(k+1)} =
\frac{y^{(k+1)}}{\|y^{(k+1)}\|}.
$$

The normalization keeps the vector at manageable scale.

It does not change the direction of the vector.

Since eigenvectors are defined only up to nonzero scalar multiples, preserving direction is sufficient.

## 93.4 Basic Algorithm

Choose an initial nonzero vector

$$
x^{(0)}.
$$

For

$$
k=0,1,2,\ldots
$$

compute

$$
y^{(k+1)}=Ax^{(k)}.
$$

Normalize:

$$
x^{(k+1)} =
\frac{y^{(k+1)}}{\|y^{(k+1)}\|_2}.
$$

Estimate the eigenvalue using the Rayleigh quotient:

$$
\mu_{k+1} =
\frac{(x^{(k+1)})^TAx^{(k+1)}}{(x^{(k+1)})^Tx^{(k+1)}}.
$$

If \(x^{(k+1)}\) is normalized, this reduces to

$$
\mu_{k+1} =
(x^{(k+1)})^TAx^{(k+1)}.
$$

The Rayleigh quotient converges to the dominant eigenvalue when the vector converges to the corresponding eigenvector.

## 93.5 Pseudocode

```text
power_iteration(A, x, tol, max_iter):
    # normalize the starting vector
    x = x / norm(x)
    lambda = 0

    for k = 0 to max_iter - 1:
        y = A * x

        if norm(y) == 0:
            error "zero vector encountered"

        # normalized iterate
        x = y / norm(y)

        # Rayleigh quotient; x has unit norm, so no denominator is needed
        z = A * x
        lambda = dot(x, z)

        # residual test for the current eigenpair estimate
        r = z - lambda * x
        if norm(r) <= tol:
            return lambda, x

    return lambda, x
```

The residual

$$
r=Ax-\mu x
$$

measures how close \((\mu,x)\) is to an eigenpair.
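
For concreteness, here is a minimal NumPy sketch of the same loop. The name `power_iteration` and its signature are illustrative rather than taken from any particular library.

```python
import numpy as np

def power_iteration(A, x0, tol=1e-10, max_iter=1000):
    """Approximate a dominant eigenpair of A by power iteration (sketch)."""
    x = np.asarray(x0, dtype=float)
    x = x / np.linalg.norm(x)
    mu = 0.0

    for _ in range(max_iter):
        y = A @ x
        ny = np.linalg.norm(y)
        if ny == 0.0:
            raise ValueError("zero vector encountered; choose another start")
        x = y / ny                              # normalized iterate
        z = A @ x
        mu = x @ z                              # Rayleigh quotient (x has unit norm)
        if np.linalg.norm(z - mu * x) <= tol:   # residual test
            break
    return mu, x
```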

## 93.6 Example

Consider

$$
A=
\begin{bmatrix}
2 & 0\\
0 & 1
\end{bmatrix}.
$$

The eigenvalues are

$$
2
\qquad \text{and} \qquad 1.
$$

The dominant eigenvalue is

$$
\lambda_1=2.
$$

Let

$$
x^{(0)} =
\begin{bmatrix}
1\\
1
\end{bmatrix}.
$$

Then

$$
Ax^{(0)} =
\begin{bmatrix}
2\\
1
\end{bmatrix}.
$$

Next,

$$
A^2x^{(0)} =
\begin{bmatrix}
4\\
1
\end{bmatrix}.
$$

Then

$$
A^3x^{(0)} =
\begin{bmatrix}
8\\
1
\end{bmatrix}.
$$

After normalization, the direction approaches

$$
\begin{bmatrix}
1\\
0
\end{bmatrix},
$$

which is the eigenvector associated with \(\lambda_1=2\).

The second component becomes negligible relative to the first because

$$
\left(\frac{1}{2}\right)^k
$$

decays to zero.
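
These iterates can be checked directly with a few lines of NumPy; the snippet is purely illustrative.

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
x = np.array([1.0, 1.0])

for k in range(1, 4):
    x = A @ x
    print(k, x)                    # [2, 1], [4, 1], [8, 1]

print(x / np.linalg.norm(x))       # direction approaching [1, 0]
```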

## 93.7 Convergence Rate

The convergence rate is controlled by the ratio

$$
\left|\frac{\lambda_2}{\lambda_1}\right|.
$$

If this ratio is small, convergence is fast.

If this ratio is close to \(1\), convergence is slow.

For example:

| Ratio | Behavior |
|---:|---|
| \(0.1\) | fast convergence |
| \(0.5\) | moderate convergence |
| \(0.95\) | slow convergence |
| \(1\) | no spectral separation |

Power iteration converges geometrically under the usual assumptions, with the error in the eigenvector direction shrinking roughly by the factor \(|\lambda_2/\lambda_1|\) per iteration.
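
As a rough illustration, reducing the non-dominant components by a factor of \(10^{-6}\) requires approximately

$$
\left|\frac{\lambda_2}{\lambda_1}\right|^{k}\approx 10^{-6},
\qquad
k\approx \frac{6}{-\log_{10}\left|\lambda_2/\lambda_1\right|},
$$

which gives about \(6\) iterations for ratio \(0.1\), about \(20\) for ratio \(0.5\), and roughly \(270\) for ratio \(0.95\).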

## 93.8 Spectral Gap

The spectral gap is the separation between the dominant eigenvalue and the remaining eigenvalues.

For power iteration, the important quantity is often not the difference

$$
|\lambda_1|-|\lambda_2|,
$$

but the ratio

$$
\frac{|\lambda_2|}{|\lambda_1|}.
$$

A large gap means

$$
\frac{|\lambda_2|}{|\lambda_1|}
$$

is small.

A small gap means the ratio is close to \(1\).

The method becomes slow when the dominant eigenvalue is only slightly larger in magnitude than the next eigenvalue.

## 93.9 Sign and Phase Behavior

If the dominant eigenvalue is positive, the normalized iterates often converge directly to a dominant eigenvector.

If the dominant eigenvalue is negative, signs may alternate:

$$
x^{(k)}\approx (-1)^k v_1.
$$

The direction still converges, but the vector may alternate between \(v_1\) and \(-v_1\).

For complex eigenvalues, a phase factor may appear. The direction may converge in a projective sense even when the raw vector sequence does not converge in the ordinary sense.
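
A small sketch of the sign alternation, using a diagonal matrix with a negative dominant eigenvalue; the matrix is chosen only for illustration.

```python
import numpy as np

A = np.diag([-2.0, 1.0])           # dominant eigenvalue is -2
x = np.array([1.0, 1.0])
x = x / np.linalg.norm(x)

for k in range(4):
    y = A @ x
    x = y / np.linalg.norm(y)
    print(k, x)    # first component flips sign each step; the line it spans is fixed
```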

## 93.10 Rayleigh Quotient

The Rayleigh quotient of a nonzero vector \(x\) is

$$
\rho(x)=\frac{x^TAx}{x^Tx}.
$$

If \(x\) is an eigenvector, then

$$
Ax=\lambda x,
$$

so

$$
\rho(x) =
\frac{x^T\lambda x}{x^Tx} =
\lambda.
$$

Thus the Rayleigh quotient gives an eigenvalue estimate from an approximate eigenvector.

For symmetric matrices, the Rayleigh quotient has especially strong approximation properties.
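
In NumPy the Rayleigh quotient is a one-line expression; the helper name below is only illustrative.

```python
import numpy as np

def rayleigh_quotient(A, x):
    # rho(x) = (x^T A x) / (x^T x) for a nonzero real vector x
    return (x @ (A @ x)) / (x @ x)
```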

## 93.11 Residual Test

Given an approximate eigenpair \((\mu,x)\), the residual is

$$
r=Ax-\mu x.
$$

If

$$
r=0,
$$

then \((\mu,x)\) is an exact eigenpair.

A practical stopping rule is

$$
\|Ax-\mu x\|_2 \le \tau.
$$

A relative version is

$$
\frac{\|Ax-\mu x\|_2}{\|A\|_2\|x\|_2}
\le \tau.
$$

Residual-based stopping is more meaningful than stopping only when successive iterates change little.

## 93.12 Choice of Initial Vector

The initial vector must have a nonzero component in the dominant eigendirection.

If

$$
x^{(0)}
$$

has no component in the dominant eigendirection (that is, \(c_1=0\) in the earlier expansion), then in exact arithmetic power iteration cannot recover that component.

In practice, a random initial vector is often used. For ordinary finite-dimensional problems, a random vector has nonzero component in any fixed direction with probability one.

## 93.13 Failure Cases

Power iteration may fail or behave poorly in several cases.

| Case | Behavior |
|---|---|
| No unique dominant eigenvalue | may not converge to one direction |
| \(|\lambda_1|\approx|\lambda_2|\) | slow convergence |
| Initial vector lacks dominant component | wrong invariant subspace |
| Defective matrix | convergence may degrade or lose its geometric rate |
| Dominant complex pair | real iteration may oscillate or rotate |
| Non-normal matrix | eigenvalue behavior may be misleading |

The method is reliable only when its assumptions match the matrix.

## 93.14 Defective and Non-Diagonalizable Matrices

The cleanest convergence analysis assumes a diagonalizable matrix.

If \(A\) is defective, the Jordan form contains nontrivial Jordan blocks. The power \(A^k\) may contain polynomial factors in \(k\), not only powers of eigenvalues.

When a unique dominant eigenvalue exists, power iteration may still converge under suitable conditions, but the analysis is more delicate.

The diagonalizable case gives the main intuition.

## 93.15 Symmetric Matrices

For real symmetric matrices, power iteration has a simpler theory.

A symmetric matrix has an orthonormal eigenbasis. Therefore any initial vector can be expanded as

$$
x^{(0)} =
c_1q_1+c_2q_2+\cdots+c_nq_n.
$$

Then

$$
A^kx^{(0)} =
c_1\lambda_1^kq_1+
c_2\lambda_2^kq_2+
\cdots+
c_n\lambda_n^kq_n.
$$

Orthogonality removes many complications present in non-normal matrices.

Symmetric power iteration is therefore easier to analyze and often more numerically predictable.

## 93.16 Cost per Iteration

Each power iteration step mainly requires one matrix-vector product.

For a dense matrix,

$$
Ax
$$

costs

$$
O(n^2).
$$

For a sparse matrix, the cost is approximately

$$
O(\operatorname{nnz}(A)).
$$

The storage requirement is small:

| Object | Storage |
|---|---:|
| Matrix \(A\) | problem-dependent |
| Current vector \(x\) | \(n\) |
| Product vector \(y=Ax\) | \(n\) |
| Optional residual | \(n\) |

This low memory usage is one of the main strengths of the method.

## 93.17 Matrix-Free Power Iteration

Power iteration does not require direct access to all entries of \(A\).

It only requires a routine that computes

$$
x\mapsto Ax.
$$

This is useful when \(A\) is an operator rather than an explicitly stored matrix.

Examples include:

| Setting | Matrix-vector operation |
|---|---|
| graph algorithms | multiply by adjacency or transition matrix |
| PDE solvers | apply discretized differential operator |
| statistics | apply covariance matrix implicitly |
| machine learning | apply Hessian-vector product |
| web ranking | multiply by link transition matrix |

This is why power iteration remains relevant despite its simplicity.
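
Because only the product \(x\mapsto Ax\) is needed, the operator can be passed as a function. A minimal sketch follows; the names `power_iteration_matfree` and `apply_A` are illustrative, and the tridiagonal-style operator merely stands in for any expensive implicit operator.

```python
import numpy as np

def power_iteration_matfree(apply_A, x0, tol=1e-10, max_iter=1000):
    """Power iteration where apply_A(x) returns A @ x without storing A."""
    x = x0 / np.linalg.norm(x0)
    mu = 0.0
    for _ in range(max_iter):
        y = apply_A(x)
        x = y / np.linalg.norm(y)
        z = apply_A(x)
        mu = x @ z                              # Rayleigh quotient
        if np.linalg.norm(z - mu * x) <= tol:
            break
    return mu, x

# Example operator: apply a symmetric tridiagonal action without forming the matrix.
def apply_A(x):
    y = 2.0 * x
    y[1:] += x[:-1]
    y[:-1] += x[1:]
    return y

mu, v = power_iteration_matfree(apply_A,
                                np.random.default_rng(0).standard_normal(100))
```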

## 93.18 PageRank Connection

PageRank can be interpreted as an eigenvector problem for a large transition matrix.

The target vector is a stationary distribution satisfying an equation of the form

$$
p = Gp,
$$

where \(G\) is a stochastic matrix modified by damping.

Power iteration is natural here because the matrix is enormous and sparse, and the main operation is repeated multiplication by the transition operator. Power iteration is historically associated with large sparse web matrices and PageRank-style computations.
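
A toy sketch of a damped PageRank-style iteration; the link matrix, damping factor, and graph are illustrative and not taken from any real dataset.

```python
import numpy as np

# Column-stochastic link matrix of a 3-page toy web graph.
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
d = 0.85                            # damping factor
n = P.shape[0]
G = d * P + (1 - d) / n * np.ones((n, n))

p = np.ones(n) / n                  # start from the uniform distribution
for _ in range(100):
    p = G @ p
    p = p / p.sum()                 # keep p a probability vector
print(p)                            # approximate stationary distribution, p = G p
```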

## 93.19 Inverse Power Iteration

Power iteration finds the eigenvalue of largest magnitude.

Inverse power iteration applies power iteration to

$$
A^{-1}.
$$

The eigenvalues of \(A^{-1}\) are

$$
\frac{1}{\lambda_i}.
$$

Thus the eigenvalue of \(A^{-1}\) of largest magnitude corresponds to the eigenvalue of \(A\) of smallest magnitude (assuming \(A\) is nonsingular).

Each step requires solving

$$
Ay^{(k+1)}=x^{(k)}
$$

rather than multiplying by \(A\).

This is more expensive, but it targets small eigenvalues.
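
A minimal sketch of inverse iteration, factoring \(A\) once with `scipy.linalg.lu_factor` and reusing the factorization for every solve; the function name is illustrative, and \(A\) is assumed nonsingular.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_iteration(A, x0, tol=1e-10, max_iter=1000):
    """Approximate the eigenvalue of A of smallest magnitude (sketch)."""
    lu = lu_factor(A)                      # factor once, reuse every iteration
    x = x0 / np.linalg.norm(x0)
    mu = 0.0
    for _ in range(max_iter):
        y = lu_solve(lu, x)                # solve A y = x instead of forming A^{-1}
        x = y / np.linalg.norm(y)
        mu = x @ (A @ x)                   # Rayleigh quotient estimates an eigenvalue of A
        if np.linalg.norm(A @ x - mu * x) <= tol:
            break
    return mu, x
```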

## 93.20 Shifted Inverse Iteration

To find an eigenvalue near a chosen shift \(\sigma\), apply inverse iteration to

$$
A-\sigma I.
$$

The eigenvalues of

$$
(A-\sigma I)^{-1}
$$

are

$$
\frac{1}{\lambda_i-\sigma}.
$$

The eigenvalue of \((A-\sigma I)^{-1}\) of largest magnitude corresponds to the eigenvalue \(\lambda_i\) of \(A\) closest to \(\sigma\).

Thus shifted inverse iteration can target interior eigenvalues, not only extremal ones. This shift idea is a standard extension of the power method.
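
A minimal sketch of the shifted variant; it has the same structure as inverse iteration but factors \(A-\sigma I\), while the Rayleigh quotient is taken with \(A\) itself so the returned value estimates an eigenvalue of \(A\). The function name is illustrative.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def shifted_inverse_iteration(A, sigma, x0, tol=1e-10, max_iter=1000):
    """Approximate the eigenvalue of A closest to the shift sigma (sketch)."""
    n = A.shape[0]
    lu = lu_factor(A - sigma * np.eye(n))   # factor the shifted matrix once
    x = x0 / np.linalg.norm(x0)
    lam = sigma
    for _ in range(max_iter):
        y = lu_solve(lu, x)                 # solve (A - sigma I) y = x
        x = y / np.linalg.norm(y)
        lam = x @ (A @ x)                   # Rayleigh quotient with A itself
        if np.linalg.norm(A @ x - lam * x) <= tol:
            break
    return lam, x
```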

## 93.21 Rayleigh Quotient Iteration

Rayleigh quotient iteration updates the shift dynamically using the Rayleigh quotient.

At step \(k\), set

$$
\sigma_k=\rho(x^{(k)}).
$$

Then solve

$$
(A-\sigma_k I)y^{(k+1)}=x^{(k)}.
$$

Normalize:

$$
x^{(k+1)}=\frac{y^{(k+1)}}{\|y^{(k+1)}\|}.
$$

For symmetric matrices, Rayleigh quotient iteration converges locally at a cubic rate, so very few iterations are usually needed once the iterate is sufficiently close to an eigenvector.

It is more expensive per iteration because it requires solving shifted linear systems.
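
A minimal sketch for a real symmetric matrix follows; the function name is illustrative. Each step solves a freshly shifted system, and near convergence that system becomes nearly singular, which is expected for this method.

```python
import numpy as np

def rayleigh_quotient_iteration(A, x0, tol=1e-12, max_iter=50):
    """Rayleigh quotient iteration sketch for a real symmetric matrix A."""
    n = A.shape[0]
    x = x0 / np.linalg.norm(x0)
    for _ in range(max_iter):
        sigma = x @ (A @ x)                     # current Rayleigh quotient as shift
        if np.linalg.norm(A @ x - sigma * x) <= tol:
            break
        # Shifted solve; nearly singular close to convergence, but the
        # direction of the solution is what matters.
        y = np.linalg.solve(A - sigma * np.eye(n), x)
        x = y / np.linalg.norm(y)
    return x @ (A @ x), x
```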

## 93.22 Deflation

Power iteration gives one eigenpair.

To compute additional eigenpairs, one may use deflation.

For a symmetric matrix with normalized dominant eigenvector \(q_1\), define

$$
B=A-\lambda_1 q_1q_1^T.
$$

Then \(B\) has the same eigenvectors as \(A\), but the eigenvalue \(\lambda_1\) is replaced by \(0\), so the contribution of the first eigenpair is removed.

Power iteration can then be applied to \(B\) to find the next eigenpair.

Deflation must be used carefully because numerical errors in the first eigenvector can affect later eigenpairs.
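
A minimal sketch of deflation for a small symmetric matrix; the helper `dominant_eigpair` is just a bare power iteration loop, and the test matrix is illustrative.

```python
import numpy as np

def dominant_eigpair(A, iters=500):
    """Plain power iteration helper (illustrative)."""
    x = np.random.default_rng(0).standard_normal(A.shape[0])
    for _ in range(iters):
        x = A @ x
        x = x / np.linalg.norm(x)
    return x @ (A @ x), x

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])          # symmetric test matrix

lam1, q1 = dominant_eigpair(A)
B = A - lam1 * np.outer(q1, q1)          # deflate the first eigenpair
lam2, q2 = dominant_eigpair(B)
print(lam1, lam2)                        # compare against np.linalg.eigvalsh(A)
```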

## 93.23 Relation to Krylov Subspaces

Power iteration generates the sequence

$$
x^{(0)},\quad Ax^{(0)},\quad A^2x^{(0)},\ldots.
$$

These vectors span a Krylov subspace:

$$
\mathcal{K}_k(A,x^{(0)}) =
\operatorname{span}
\{x^{(0)},Ax^{(0)},\ldots,A^{k-1}x^{(0)}\}.
$$

Power iteration uses only the newest vector. More advanced Krylov methods retain and orthogonalize several vectors from this sequence.

Thus Arnoldi and Lanczos may be viewed as more sophisticated descendants of power iteration.

## 93.24 Block Power Iteration

Block power iteration uses several starting vectors at once.

Let

$$
X^{(0)}\in\mathbb{R}^{n\times m}.
$$

Compute

$$
Y^{(k+1)}=AX^{(k)}.
$$

Then orthonormalize the columns:

$$
X^{(k+1)}=\operatorname{orth}(Y^{(k+1)}).
$$

Block power iteration approximates an invariant subspace associated with the largest \(m\) eigenvalues in magnitude.

It is useful when several leading eigenvectors are needed.
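
A minimal sketch using a reduced QR factorization for the orthonormalization step; the function name is illustrative, and the eigenvalues of the small Rayleigh quotient matrix estimate the \(m\) dominant eigenvalues.

```python
import numpy as np

def block_power_iteration(A, m, iters=200, seed=0):
    """Approximate an orthonormal basis for the dominant m-dimensional
    invariant subspace of A, plus eigenvalue estimates (sketch)."""
    n = A.shape[0]
    X = np.random.default_rng(seed).standard_normal((n, m))
    for _ in range(iters):
        Y = A @ X
        X, _ = np.linalg.qr(Y)           # orthonormalize the block of columns
    H = X.T @ A @ X                      # small Rayleigh quotient matrix
    return X, np.linalg.eigvals(H)
```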

## 93.25 Practical Stopping Criteria

Common stopping rules include:

| Criterion | Form |
|---|---|
| Eigenpair residual | \(\|Ax-\mu x\|\le \tau\) |
| Relative residual | \(\|Ax-\mu x\|/(\|A\|\|x\|)\le \tau\) |
| Eigenvalue change | \(|\mu_{k+1}-\mu_k|\le \tau\) |
| Direction change | \(\|x^{(k+1)}\pm x^{(k)}\|\le \tau\) |
| Maximum iterations | \(k\le k_{\max}\) |

The sign ambiguity in eigenvectors explains the use of

$$
x^{(k+1)}\pm x^{(k)}.
$$

For negative dominant eigenvalues, consecutive normalized vectors may alternate signs.

## 93.26 Numerical Considerations

A practical implementation should account for:

| Issue | Handling |
|---|---|
| Overflow or underflow | normalize every step |
| Zero product \(Ax=0\) | stop or change initial vector |
| Sign ambiguity | compare directions, not raw vectors |
| Slow convergence | use shift, inverse iteration, or Krylov methods |
| Sparse matrix | use sparse matrix-vector product |
| Non-normality | check residual, not only eigenvalue change |
| Multiple dominant eigenvalues | use block methods or Arnoldi |

The simplicity of power iteration makes it attractive, but its diagnostics must be chosen carefully.

## 93.27 Summary

Power iteration approximates the dominant eigenpair of a matrix by repeated multiplication and normalization:

$$
y^{(k+1)}=Ax^{(k)},
\qquad
x^{(k+1)} =
\frac{y^{(k+1)}}{\|y^{(k+1)}\|}.
$$

The eigenvalue is commonly estimated by the Rayleigh quotient:

$$
\mu_k =
\frac{(x^{(k)})^TAx^{(k)}}{(x^{(k)})^Tx^{(k)}}.
$$

The method converges when there is a unique dominant eigenvalue and the starting vector has a nonzero component in its eigendirection. Its convergence rate is controlled by

$$
\left|\frac{\lambda_2}{\lambda_1}\right|.
$$

Power iteration is simple, memory-light, and suitable for sparse or matrix-free problems. Its main weakness is slow convergence when the dominant eigenvalue is poorly separated from the rest of the spectrum.
