# Chapter 70. Cayley-Hamilton Theorem

The Cayley-Hamilton theorem states that every square matrix satisfies its own characteristic equation.

If \(A\) is an \(n\times n\) matrix and its characteristic polynomial is

$$
p_A(t)=\det(tI-A),
$$

then the theorem says

$$
p_A(A)=0.
$$

This means that if

$$
p_A(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_1t+c_0,
$$

then

$$
A^n+c_{n-1}A^{n-1}+\cdots+c_1A+c_0I=0.
$$

The theorem is a basic bridge between determinants, eigenvalues, matrix powers, and minimal polynomials. Stated compactly: the characteristic polynomial of an \(n\times n\) matrix annihilates that matrix.

## 70.1 Matrix Polynomials

Let

$$
p(t)=a_0+a_1t+a_2t^2+\cdots+a_kt^k
$$

be a polynomial.

For a square matrix \(A\), define

$$
p(A)=a_0I+a_1A+a_2A^2+\cdots+a_kA^k.
$$

The identity matrix \(I\) appears in the constant term so that every term is an \(n\times n\) matrix.

For example, if

$$
p(t)=t^2-5t-2,
$$

then

$$
p(A)=A^2-5A-2I.
$$

A polynomial \(p\) annihilates \(A\) if

$$
p(A)=0.
$$

The Cayley-Hamilton theorem says that the characteristic polynomial always annihilates the matrix.

## 70.2 Statement of the Theorem

Let \(A\in F^{n\times n}\), where \(F\) is a field. Let

$$
p_A(t)=\det(tI-A)
$$

be the characteristic polynomial of \(A\).

If

$$
p_A(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_1t+c_0,
$$

then

$$
p_A(A)=A^n+c_{n-1}A^{n-1}+\cdots+c_1A+c_0I=0.
$$

This is the Cayley-Hamilton theorem.

It says that the matrix \(A\) is a root of its own characteristic polynomial, in the sense of matrix polynomial evaluation.
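The statement can be checked numerically. The following is a minimal NumPy sketch; it uses `np.poly`, which returns the coefficients of \(\det(tI-A)\) (highest degree first) computed from the eigenvalues of \(A\):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Coefficients of det(tI - A), highest degree first, leading coefficient 1.
coeffs = np.poly(A)

# Evaluate p_A at the matrix A by Horner's rule:
# P <- P A + c I for each coefficient c, highest first.
n = A.shape[0]
P = np.zeros_like(A)
for c in coeffs:
    P = P @ A + c * np.eye(n)

print(np.allclose(P, 0))  # True: p_A(A) is the zero matrix
```

Horner's rule avoids forming each power \(A^k\) separately; only one matrix multiplication per coefficient is needed.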

## 70.3 A Two by Two Form

For a \(2\times2\) matrix

$$
A=
\begin{bmatrix}
a & b \\
c & d
\end{bmatrix},
$$

the characteristic polynomial is

$$
p_A(t)=t^2-\operatorname{tr}(A)t+\det(A).
$$

Therefore the Cayley-Hamilton theorem gives

$$
A^2-\operatorname{tr}(A)A+\det(A)I=0.
$$

This compact identity holds for every \(2\times2\) matrix.

For example, if

$$
A=
\begin{bmatrix}
1 & 2 \\
3 & 4
\end{bmatrix},
$$

then

$$
\operatorname{tr}(A)=5
$$

and

$$
\det(A)=-2.
$$

So the theorem predicts

$$
A^2-5A-2I=0.
$$

This example is a common concrete illustration of the theorem.

## 70.4 Direct Verification in a Two by Two Example

Let

$$
A=
\begin{bmatrix}
1 & 2 \\
3 & 4
\end{bmatrix}.
$$

Compute

$$
A^2=
\begin{bmatrix}
1 & 2 \\
3 & 4
\end{bmatrix}
\begin{bmatrix}
1 & 2 \\
3 & 4
\end{bmatrix} =
\begin{bmatrix}
7 & 10 \\
15 & 22
\end{bmatrix}.
$$

Now compute

$$
5A=
\begin{bmatrix}
5 & 10 \\
15 & 20
\end{bmatrix}.
$$

Then

$$
A^2-5A-2I =
\begin{bmatrix}
7 & 10 \\
15 & 22
\end{bmatrix} -
\begin{bmatrix}
5 & 10 \\
15 & 20
\end{bmatrix} -
\begin{bmatrix}
2 & 0 \\
0 & 2
\end{bmatrix}.
$$

Thus

$$
A^2-5A-2I =
\begin{bmatrix}
0 & 0 \\
0 & 0
\end{bmatrix}.
$$

So \(A\) satisfies its characteristic equation.
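The arithmetic above can be confirmed mechanically; a small NumPy check:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

A2 = A @ A                             # [[7, 10], [15, 22]]
residual = A2 - 5 * A - 2 * np.eye(2)  # should be the zero matrix

print(np.allclose(residual, 0))  # True
```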

## 70.5 Meaning of the Theorem

The theorem says that high powers of a matrix are not independent forever.

For an \(n\times n\) matrix, the characteristic polynomial gives a relation among

$$
I,A,A^2,\ldots,A^n.
$$

If

$$
p_A(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_1t+c_0,
$$

then

$$
A^n=-c_{n-1}A^{n-1}-\cdots-c_1A-c_0I.
$$

Thus every power \(A^k\) with \(k\geq n\) can be reduced to a linear combination of

$$
I,A,\ldots,A^{n-1}.
$$

This makes the theorem useful for matrix powers, recurrence relations, matrix functions, and inverse formulas.

## 70.6 Relation to the Minimal Polynomial

The minimal polynomial \(m_A(t)\) is the monic polynomial of least degree satisfying

$$
m_A(A)=0.
$$

The Cayley-Hamilton theorem says

$$
p_A(A)=0.
$$

Therefore the characteristic polynomial is an annihilating polynomial.

Since the minimal polynomial divides every annihilating polynomial, we get

$$
m_A(t)\mid p_A(t).
$$

This is one of the main consequences of Cayley-Hamilton: the minimal polynomial always divides the characteristic polynomial.

## 70.7 Eigenvalue Interpretation

Suppose

$$
Av=\lambda v
$$

with

$$
v\neq 0.
$$

For any polynomial \(q\),

$$
q(A)v=q(\lambda)v.
$$

If \(q=p_A\), then

$$
p_A(\lambda)=0
$$

because \(\lambda\) is an eigenvalue. Hence

$$
p_A(A)v=0
$$

for every eigenvector \(v\).

This observation suggests the theorem, but it does not prove it in general. Eigenvectors may not form a basis. Some matrices are defective. The Cayley-Hamilton theorem is stronger: it says \(p_A(A)=0\) as a matrix, even when there are not enough eigenvectors.

## 70.8 Proof Idea Using Diagonalization

If \(A\) is diagonalizable, the theorem is easy.

Suppose

$$
A=PDP^{-1},
$$

where

$$
D=\operatorname{diag}(\lambda_1,\ldots,\lambda_n).
$$

For any polynomial \(q\),

$$
q(A)=Pq(D)P^{-1}.
$$

The characteristic polynomial is

$$
p_A(t)=\prod_{i=1}^n(t-\lambda_i),
$$

counting multiplicities.

Then

$$
p_A(D)=
\operatorname{diag}
\left(
p_A(\lambda_1),
\ldots,
p_A(\lambda_n)
\right).
$$

Each diagonal entry is zero, so

$$
p_A(D)=0.
$$

Therefore

$$
p_A(A)=P0P^{-1}=0.
$$

This proves Cayley-Hamilton for diagonalizable matrices. The full theorem requires an argument that also covers defective matrices.

## 70.9 Proof Idea Using Jordan Form

Over \(\mathbb{C}\), every square matrix has a Jordan form:

$$
A=PJP^{-1}.
$$

The characteristic polynomial of \(A\) is the same as that of \(J\). Each Jordan block for eigenvalue \(\lambda\) has the form

$$
J_k(\lambda)=\lambda I+N,
$$

where

$$
N^k=0.
$$

The characteristic polynomial contains the factor

$$
(t-\lambda)^k
$$

for that block. When the characteristic polynomial is evaluated on the block, this factor contributes \((J_k(\lambda)-\lambda I)^k=N^k=0\), so the nilpotent part is killed.

Since every block is killed, the whole Jordan matrix is killed:

$$
p_A(J)=0.
$$

Then

$$
p_A(A)=Pp_A(J)P^{-1}=0.
$$

This proof shows the structural reason behind the theorem. The characteristic polynomial contains enough powers of each \((t-\lambda)\) factor to kill every Jordan block.

## 70.10 Proof Idea Using the Adjugate

There is also a determinant-based proof.

For a square matrix \(M\),

$$
\operatorname{adj}(M)M=\det(M)I.
$$

Apply this identity to

$$
M=tI-A.
$$

Then

$$
\operatorname{adj}(tI-A)(tI-A)=\det(tI-A)I.
$$

The right side is

$$
p_A(t)I.
$$

The adjugate matrix has polynomial entries in \(t\). Expanding both sides and comparing coefficients gives a matrix polynomial identity. Substituting \(t=A\) correctly then yields

$$
p_A(A)=0.
$$

This adjugate proof is a standard route to the theorem, with care needed because one cannot naively substitute a matrix into every occurrence of the scalar variable before establishing the polynomial identity.

## 70.11 Why Naive Substitution Can Mislead

The characteristic polynomial is

$$
p_A(t)=\det(tI-A).
$$

One might try to prove the theorem by writing

$$
p_A(A)=\det(AI-A).
$$

This is not a valid argument, even though \(\det(AI-A)=\det(0)=0\) makes it look like an instant proof.

The expression

$$
p_A(A)
$$

means evaluating a scalar polynomial at the matrix \(A\):

$$
p_A(A)=A^n+c_{n-1}A^{n-1}+\cdots+c_0I.
$$

It does not mean taking the determinant of a block-like expression obtained by replacing \(t\) inside \(\det(tI-A)\) with a matrix.

The determinant expression first produces a scalar polynomial. Only after that polynomial is formed may it be evaluated at \(A\). This distinction matters in rigorous proofs.

## 70.12 Reducing Powers

Suppose

$$
p_A(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_1t+c_0.
$$

By Cayley-Hamilton,

$$
A^n+c_{n-1}A^{n-1}+\cdots+c_1A+c_0I=0.
$$

So

$$
A^n=-c_{n-1}A^{n-1}-\cdots-c_1A-c_0I.
$$

Multiplying both sides by \(A\) gives

$$
A^{n+1} =
-c_{n-1}A^n-\cdots-c_1A^2-c_0A.
$$

Then reduce \(A^n\) again using the same relation.

Repeating this process expresses every high power of \(A\) as a linear combination of lower powers.

## 70.13 Example: Reducing Powers

Let

$$
A=
\begin{bmatrix}
1 & 2 \\
3 & 4
\end{bmatrix}.
$$

We found

$$
A^2-5A-2I=0.
$$

Thus

$$
A^2=5A+2I.
$$

To compute \(A^3\),

$$
A^3=A A^2.
$$

Substitute:

$$
A^3=A(5A+2I)=5A^2+2A.
$$

Reduce \(A^2\):

$$
A^3=5(5A+2I)+2A.
$$

Thus

$$
A^3=27A+10I.
$$

So without multiplying three matrices directly, we obtain

$$
A^3=
27
\begin{bmatrix}
1 & 2 \\
3 & 4
\end{bmatrix}
+
10
\begin{bmatrix}
1 & 0 \\
0 & 1
\end{bmatrix}.
$$

Therefore

$$
A^3=
\begin{bmatrix}
37 & 54 \\
81 & 118
\end{bmatrix}.
$$
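The reduction can be confirmed in code; here is a NumPy comparison of the reduced form \(27A+10I\) against direct computation:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

direct = np.linalg.matrix_power(A, 3)          # explicit repeated multiplication
reduced = 27 * A + 10 * np.eye(2, dtype=int)   # via the Cayley-Hamilton reduction

print(np.array_equal(direct, reduced))  # True
```

Integer arithmetic keeps the comparison exact here; no floating-point tolerance is needed.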

## 70.14 Formula for the Inverse

If \(A\) is invertible, then Cayley-Hamilton gives a formula for \(A^{-1}\) as a polynomial in \(A\).

Let

$$
p_A(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_1t+c_0.
$$

Since

$$
c_0=p_A(0)=\det(-A)=(-1)^n\det(A),
$$

invertibility implies

$$
c_0\neq 0.
$$

By Cayley-Hamilton,

$$
A^n+c_{n-1}A^{n-1}+\cdots+c_1A+c_0I=0.
$$

Factor out \(A\) from all terms except the constant term:

$$
A(A^{n-1}+c_{n-1}A^{n-2}+\cdots+c_1I)+c_0I=0.
$$

Hence

$$
A(A^{n-1}+c_{n-1}A^{n-2}+\cdots+c_1I)=-c_0I.
$$

Therefore

$$
A^{-1} =
-\frac{1}{c_0}
\left(
A^{n-1}+c_{n-1}A^{n-2}+\cdots+c_1I
\right).
$$

This gives an exact inverse formula, although it is usually not the best numerical method for computing inverses.
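The formula translates directly into code. The sketch below assumes NumPy's `np.poly` for the characteristic coefficients; as noted above, this is an exact algebraic identity but not a recommended numerical method:

```python
import numpy as np

def inverse_via_cayley_hamilton(A):
    """Inverse as a polynomial in A: -(1/c0)(A^{n-1} + c_{n-1}A^{n-2} + ... + c_1 I)."""
    n = A.shape[0]
    coeffs = np.poly(A)        # [1, c_{n-1}, ..., c_1, c_0]
    c0 = coeffs[-1]
    Q = np.zeros_like(A, dtype=float)
    for c in coeffs[:-1]:      # Horner over all coefficients except c_0
        Q = Q @ A + c * np.eye(n)
    return -Q / c0

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(np.allclose(inverse_via_cayley_hamilton(A), np.linalg.inv(A)))  # True
```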

## 70.15 Example: Inverse from Cayley-Hamilton

For a \(2\times2\) matrix,

$$
A^2-\operatorname{tr}(A)A+\det(A)I=0.
$$

If \(A\) is invertible, then

$$
\det(A)\neq 0.
$$

Rewrite:

$$
A^2-\operatorname{tr}(A)A=-\det(A)I.
$$

Factor:

$$
A(A-\operatorname{tr}(A)I)=-\det(A)I.
$$

Thus

$$
A^{-1} =
\frac{1}{\det(A)}
(\operatorname{tr}(A)I-A).
$$

For

$$
A=
\begin{bmatrix}
a & b \\
c & d
\end{bmatrix},
$$

this becomes

$$
A^{-1} =
\frac{1}{ad-bc}
\left(
\begin{bmatrix}
a+d & 0 \\
0 & a+d
\end{bmatrix} -
\begin{bmatrix}
a & b \\
c & d
\end{bmatrix}
\right).
$$

Hence

$$
A^{-1} =
\frac{1}{ad-bc}
\begin{bmatrix}
d & -b \\
-c & a
\end{bmatrix}.
$$

This recovers the standard inverse formula for \(2\times2\) matrices.
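The \(2\times2\) special case \(A^{-1}=\frac{1}{\det A}(\operatorname{tr}(A)I-A)\) is a one-liner; a minimal NumPy sketch:

```python
import numpy as np

def inv2(A):
    # A^{-1} = (tr(A) I - A) / det(A), valid for invertible 2x2 matrices
    return (np.trace(A) * np.eye(2) - A) / np.linalg.det(A)

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(np.allclose(inv2(A), np.linalg.inv(A)))  # True
```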

## 70.16 Recurrence Relations

Cayley-Hamilton gives recurrence relations for matrix sequences.

If

$$
p_A(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_0,
$$

then

$$
A^n=-c_{n-1}A^{n-1}-\cdots-c_0I.
$$

Multiplying by \(A^k\) gives

$$
A^{n+k} =
-c_{n-1}A^{n-1+k}
-\cdots
-c_1A^{1+k}
-c_0A^k.
$$

Thus the sequence

$$
I,A,A^2,A^3,\ldots
$$

satisfies a linear recurrence whose coefficients come from the characteristic polynomial.

This is useful in discrete dynamical systems and linear recurrences.
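For the running \(2\times2\) example, the recurrence reads \(A^{k+2}=5A^{k+1}+2A^k\), and powers can be generated without any matrix multiplication after the first; a NumPy sketch:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

# Generate powers of A via the linear recurrence A^{k+2} = 5 A^{k+1} + 2 A^k,
# whose coefficients come from p_A(t) = t^2 - 5t - 2.
powers = [np.eye(2, dtype=int), A.copy()]
for _ in range(4):
    powers.append(5 * powers[-1] + 2 * powers[-2])

print(np.array_equal(powers[5], np.linalg.matrix_power(A, 5)))  # True
```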

## 70.17 Matrix Functions

Cayley-Hamilton also simplifies matrix functions.

If \(f\) is a polynomial, one can divide \(f\) by \(p_A\):

$$
f(t)=q(t)p_A(t)+r(t),
$$

where

$$
\deg r<n.
$$

Evaluate at \(A\):

$$
f(A)=q(A)p_A(A)+r(A).
$$

Since

$$
p_A(A)=0,
$$

we get

$$
f(A)=r(A).
$$

Thus every polynomial in \(A\) is equivalent to a polynomial of degree less than \(n\).

This idea extends, with additional hypotheses, to analytic functions of matrices.
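The division-with-remainder step can be sketched with NumPy's `np.polydiv`, using the illustrative choice \(f(t)=t^5\) (an assumption for this example, not fixed by the text):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

pA = np.poly(A)                      # characteristic polynomial, highest first
f = np.array([1.0, 0, 0, 0, 0, 0])   # f(t) = t^5
q, r = np.polydiv(f, pA)             # f = q * pA + r with deg r < 2

# Since p_A(A) = 0, f(A) equals r(A); evaluate r at A by Horner's rule.
R = np.zeros_like(A)
for c in r:
    R = R @ A + c * np.eye(2)

print(np.allclose(R, np.linalg.matrix_power(A, 5)))  # True
```

Only a degree-one remainder is ever evaluated at \(A\), no matter how large the exponent in \(f\).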

## 70.18 Determinant and Trace in the Two by Two Case

For a \(2\times2\) matrix, Cayley-Hamilton says

$$
A^2-\operatorname{tr}(A)A+\det(A)I=0.
$$

This identity shows how trace and determinant control the second power of \(A\).

The trace controls the coefficient of \(A\). The determinant controls the constant term.

For higher-dimensional matrices, the coefficients of the characteristic polynomial play analogous roles. They are built from determinant-like invariants and determine how \(A^n\) reduces to lower powers.

## 70.19 Projection Example

Let \(P\) be a projection:

$$
P^2=P.
$$

Then

$$
P^2-P=0.
$$

So \(P\) is annihilated by

$$
t^2-t=t(t-1).
$$

The Cayley-Hamilton theorem guarantees that the characteristic polynomial also annihilates \(P\).

If \(P\) projects onto an \(r\)-dimensional subspace of an \(n\)-dimensional space, then its eigenvalues are \(1\) with multiplicity \(r\), and \(0\) with multiplicity \(n-r\). Hence

$$
p_P(t)=t^{n-r}(t-1)^r.
$$

Cayley-Hamilton gives

$$
P^{n-r}(P-I)^r=0.
$$

This is true, but the smaller polynomial

$$
t(t-1)
$$

already annihilates \(P\). This illustrates the difference between the characteristic polynomial and the minimal polynomial.
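Both annihilation claims can be checked on a concrete projection; the sketch below builds an orthogonal rank-one projection in \(\mathbb{R}^3\) (so \(r=1\), \(n=3\)):

```python
import numpy as np

# Orthogonal projection onto the span of v in R^3: rank r = 1, dimension n = 3.
v = np.array([[1.0], [2.0], [2.0]])
P = v @ v.T / (v.T @ v)

I3 = np.eye(3)

# The small polynomial t(t - 1) already annihilates P ...
print(np.allclose(P @ (P - I3), 0))  # True

# ... and so does the characteristic polynomial t^{n-r}(t - 1)^r = t^2 (t - 1).
print(np.allclose(np.linalg.matrix_power(P, 2) @ (P - I3), 0))  # True
```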

## 70.20 Nilpotent Example

Let \(N\) be nilpotent with

$$
N^k=0.
$$

If \(N\) is \(n\times n\), all eigenvalues are zero, so the characteristic polynomial is

$$
p_N(t)=t^n.
$$

Cayley-Hamilton gives

$$
N^n=0.
$$

Thus every nilpotent \(n\times n\) matrix has nilpotency index at most \(n\).

The minimal polynomial may be

$$
t^k
$$

for some

$$
k\leq n.
$$

Again, the minimal polynomial gives the sharp exponent, while the characteristic polynomial gives a universal exponent.
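A concrete contrast, using a \(3\times3\) nilpotent matrix whose minimal polynomial is \(t^2\) while its characteristic polynomial is \(t^3\):

```python
import numpy as np

# A 3x3 nilpotent matrix: minimal polynomial t^2 (sharp index k = 2),
# characteristic polynomial t^3 (universal Cayley-Hamilton bound n = 3).
N = np.array([[0, 0, 1],
              [0, 0, 0],
              [0, 0, 0]])

print(np.any(N != 0))                                                  # True: N is nonzero
print(np.array_equal(np.linalg.matrix_power(N, 2), np.zeros((3, 3))))  # True: N^2 = 0 already
```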

## 70.21 Consequences

The Cayley-Hamilton theorem has several important consequences.

| Consequence | Meaning |
|---|---|
| \(p_A(A)=0\) | The characteristic polynomial annihilates \(A\) |
| \(m_A\mid p_A\) | The minimal polynomial divides the characteristic polynomial |
| High powers reduce | \(A^k\) for \(k\geq n\) reduces to lower powers |
| Inverses become polynomials | If \(A\) is invertible, \(A^{-1}\) is a polynomial in \(A\) |
| Matrix recurrences | Powers of \(A\) satisfy a linear recurrence |
| Nilpotent bound | If \(N\) is nilpotent \(n\times n\), then \(N^n=0\) |

These consequences make the theorem both structural and computational.

## 70.22 Common Errors

The first common error is to confuse

$$
p_A(A)
$$

with

$$
\det(AI-A).
$$

The former is matrix polynomial evaluation. The latter collapses to \(\det(0)=0\), a trivially true scalar statement that says nothing about the matrix \(p_A(A)\).

The second common error is to forget the identity matrix in the constant term. If

$$
p(t)=t^2+3t+5,
$$

then

$$
p(A)=A^2+3A+5I,
$$

not

$$
A^2+3A+5.
$$

The third common error is to assume the characteristic polynomial is the smallest annihilating polynomial. It may not be. The smallest one is the minimal polynomial.

The fourth common error is to use Cayley-Hamilton as a numerical method for large matrix inverses. The theorem is exact and algebraic, but direct inverse computation through powers is usually unstable and inefficient.

## 70.23 Summary

The Cayley-Hamilton theorem states that every square matrix satisfies its own characteristic equation.

If

$$
p_A(t)=\det(tI-A),
$$

then

$$
p_A(A)=0.
$$

In expanded form, if

$$
p_A(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_0,
$$

then

$$
A^n+c_{n-1}A^{n-1}+\cdots+c_0I=0.
$$

The theorem proves that the characteristic polynomial is an annihilating polynomial. It implies that the minimal polynomial divides the characteristic polynomial, that high powers of a matrix reduce to lower powers, and that invertible matrices have inverses expressible as polynomials in themselves.
