
Chapter 70. Cayley-Hamilton Theorem

The Cayley-Hamilton theorem states that every square matrix satisfies its own characteristic equation.

If $A$ is an $n\times n$ matrix and its characteristic polynomial is

$$p_A(t)=\det(tI-A),$$

then the theorem says

$$p_A(A)=0.$$

This means that if

$$p_A(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_1t+c_0,$$

then

$$A^n+c_{n-1}A^{n-1}+\cdots+c_1A+c_0I=0.$$

The theorem is a basic bridge between determinants, eigenvalues, matrix powers, and minimal polynomials. Standard statements of the theorem say that the characteristic polynomial of an $n\times n$ matrix annihilates that matrix.

70.1 Matrix Polynomials

Let

$$p(t)=a_0+a_1t+a_2t^2+\cdots+a_kt^k$$

be a polynomial.

For a square matrix $A$, define

$$p(A)=a_0I+a_1A+a_2A^2+\cdots+a_kA^k.$$

The identity matrix $I$ appears in the constant term so that every term is an $n\times n$ matrix.

For example, if

$$p(t)=t^2-5t-2,$$

then

$$p(A)=A^2-5A-2I.$$

A polynomial $p$ annihilates $A$ if

$$p(A)=0.$$

The Cayley-Hamilton theorem says that the characteristic polynomial always annihilates the matrix.

70.2 Statement of the Theorem

Let $A\in F^{n\times n}$, where $F$ is a field. Let

$$p_A(t)=\det(tI-A)$$

be the characteristic polynomial of $A$.

If

$$p_A(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_1t+c_0,$$

then

$$p_A(A)=A^n+c_{n-1}A^{n-1}+\cdots+c_1A+c_0I=0.$$

This is the Cayley-Hamilton theorem.

It says that the matrix $A$ is a root of its own characteristic polynomial, in the sense of matrix polynomial evaluation.

70.3 A Two by Two Form

For a $2\times 2$ matrix

$$A=\begin{bmatrix} a & b \\ c & d \end{bmatrix},$$

the characteristic polynomial is

$$p_A(t)=t^2-\operatorname{tr}(A)t+\det(A).$$

Therefore the Cayley-Hamilton theorem gives

$$A^2-\operatorname{tr}(A)A+\det(A)I=0.$$

This compact identity holds for every $2\times 2$ matrix.

For example, if

$$A=\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix},$$

then

$$\operatorname{tr}(A)=5$$

and

$$\det(A)=-2.$$

So the theorem predicts

$$A^2-5A-2I=0.$$

This example is a common concrete illustration of the theorem.

70.4 Direct Verification in a Two by Two Example

Let

$$A=\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}.$$

Compute

$$A^2=\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}=\begin{bmatrix} 7 & 10 \\ 15 & 22 \end{bmatrix}.$$

Now compute

$$5A=\begin{bmatrix} 5 & 10 \\ 15 & 20 \end{bmatrix}.$$

Then

$$A^2-5A-2I=\begin{bmatrix} 7 & 10 \\ 15 & 22 \end{bmatrix}-\begin{bmatrix} 5 & 10 \\ 15 & 20 \end{bmatrix}-\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}.$$

Thus

$$A^2-5A-2I=\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$

So $A$ satisfies its characteristic equation.
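The arithmetic above can also be checked numerically; a minimal sketch in NumPy, using the same matrix (the use of `np.allclose` is just a floating-point tolerance choice):

```python
import numpy as np

# Numerically verify A^2 - tr(A) A + det(A) I = 0 for the matrix worked out above.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

residual = A @ A - np.trace(A) * A + np.linalg.det(A) * np.eye(2)
print(np.allclose(residual, 0))  # True
```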

70.5 Meaning of the Theorem

The theorem says that high powers of a matrix are not independent forever.

For an $n\times n$ matrix, the characteristic polynomial gives a relation among

$$I,A,A^2,\ldots,A^n.$$

If

$$p_A(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_1t+c_0,$$

then

$$A^n=-c_{n-1}A^{n-1}-\cdots-c_1A-c_0I.$$

Thus every power $A^k$ with $k\geq n$ can be reduced to a linear combination of

$$I,A,\ldots,A^{n-1}.$$

This makes the theorem useful for matrix powers, recurrence relations, matrix functions, and inverse formulas.

70.6 Relation to the Minimal Polynomial

The minimal polynomial $m_A(t)$ is the monic polynomial of least degree satisfying

$$m_A(A)=0.$$

The Cayley-Hamilton theorem says

$$p_A(A)=0.$$

Therefore the characteristic polynomial is an annihilating polynomial.

Since the minimal polynomial divides every annihilating polynomial, we get

$$m_A(t)\mid p_A(t).$$

This is one of the main consequences of Cayley-Hamilton: the minimal polynomial always divides the characteristic polynomial.
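The divisibility can be seen concretely by polynomial long division. A small sketch (the example $A=I$ and the use of `np.polydiv` are illustrative choices, not part of the text above):

```python
import numpy as np

# Sketch: for A = I (2x2), p_A(t) = (t-1)^2 while m_A(t) = t-1.
# Polynomial long division leaves zero remainder, so m_A divides p_A.
p_char = np.array([1.0, -2.0, 1.0])   # (t-1)^2 = t^2 - 2t + 1
p_min = np.array([1.0, -1.0])         # t - 1

q, r = np.polydiv(p_char, p_min)
print(np.allclose(r, 0))              # True: remainder is zero
```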

70.7 Eigenvalue Interpretation

Suppose

$$Av=\lambda v$$

with

$$v\neq 0.$$

For any polynomial $q$,

$$q(A)v=q(\lambda)v.$$

If $q=p_A$, then

$$p_A(\lambda)=0$$

because $\lambda$ is an eigenvalue. Hence

$$p_A(A)v=0$$

for every eigenvector $v$.

This observation suggests the theorem, but it does not prove it in general. Eigenvectors may not form a basis. Some matrices are defective. The Cayley-Hamilton theorem is stronger: it says $p_A(A)=0$ as a matrix, even when there are not enough eigenvectors.

70.8 Proof Idea Using Diagonalization

If $A$ is diagonalizable, the theorem is easy.

Suppose

$$A=PDP^{-1},$$

where

$$D=\operatorname{diag}(\lambda_1,\ldots,\lambda_n).$$

For any polynomial $q$,

$$q(A)=Pq(D)P^{-1}.$$

The characteristic polynomial is

$$p_A(t)=\prod_{i=1}^n(t-\lambda_i),$$

counting multiplicities.

Then

$$p_A(D)=\operatorname{diag}\left(p_A(\lambda_1),\ldots,p_A(\lambda_n)\right).$$

Each diagonal entry is zero, so

$$p_A(D)=0.$$

Therefore

$$p_A(A)=P\,0\,P^{-1}=0.$$

This proves Cayley-Hamilton for diagonalizable matrices. The full theorem requires an argument that also covers defective matrices.
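A quick numerical illustration of this argument, assuming a symmetric (hence diagonalizable) example matrix:

```python
import numpy as np

# Sketch: every eigenvalue is a root of the characteristic polynomial,
# so each diagonal entry of p_A(D) vanishes.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

coeffs = np.poly(A)              # characteristic polynomial coefficients
lams = np.linalg.eigvalsh(A)     # eigenvalues of the symmetric matrix

print(np.allclose(np.polyval(coeffs, lams), 0))  # True
```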

70.9 Proof Idea Using Jordan Form

Over $\mathbb{C}$, every square matrix has a Jordan form:

$$A=PJP^{-1}.$$

The characteristic polynomial of $A$ is the same as that of $J$. Each Jordan block for eigenvalue $\lambda$ has the form

$$J_k(\lambda)=\lambda I+N,$$

where

$$N^k=0.$$

The characteristic polynomial contains the factor

$$(t-\lambda)^k$$

for that block. Therefore, when the characteristic polynomial is evaluated on the block, the nilpotent part is killed.

Since every block is killed, the whole Jordan matrix is killed:

$$p_A(J)=0.$$

Then

$$p_A(A)=Pp_A(J)P^{-1}=0.$$

This proof shows the structural reason behind the theorem. The characteristic polynomial contains enough powers of each $(t-\lambda)$ factor to kill every Jordan block.

70.10 Proof Idea Using the Adjugate

There is also a determinant-based proof.

For a square matrix $M$,

$$\operatorname{adj}(M)M=\det(M)I.$$

Apply this identity to

$$M=tI-A.$$

Then

$$\operatorname{adj}(tI-A)(tI-A)=\det(tI-A)I.$$

The right side is

$$p_A(t)I.$$

The adjugate matrix has polynomial entries in $t$. Expanding both sides and comparing coefficients gives a matrix polynomial identity. Substituting $t=A$ correctly then yields

$$p_A(A)=0.$$

This adjugate proof is a standard route to the theorem, with care needed because one cannot naively substitute a matrix into every occurrence of the scalar variable before establishing the polynomial identity.

70.11 Why Naive Substitution Can Mislead

The characteristic polynomial is

$$p_A(t)=\det(tI-A).$$

One might try to prove the theorem by writing

$$p_A(A)=\det(AI-A).$$

This is not a valid argument.

The expression

$$p_A(A)$$

means evaluating a scalar polynomial at the matrix $A$:

$$p_A(A)=A^n+c_{n-1}A^{n-1}+\cdots+c_0I.$$

It does not mean taking the determinant of a block-like expression obtained by replacing $t$ inside $\det(tI-A)$ with a matrix.

The determinant expression first produces a scalar polynomial. Only after that polynomial is formed may it be evaluated at AA. This distinction matters in rigorous proofs.

70.12 Reducing Powers

Suppose

$$p_A(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_1t+c_0.$$

By Cayley-Hamilton,

$$A^n+c_{n-1}A^{n-1}+\cdots+c_1A+c_0I=0.$$

So

$$A^n=-c_{n-1}A^{n-1}-\cdots-c_1A-c_0I.$$

Multiplying both sides by $A$ gives

$$A^{n+1}=-c_{n-1}A^n-\cdots-c_1A^2-c_0A.$$

Then reduce $A^n$ again using the same relation.

Repeating this process expresses every high power of $A$ as a linear combination of lower powers.

70.13 Example: Reducing Powers

Let

$$A=\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}.$$

We found

$$A^2-5A-2I=0.$$

Thus

$$A^2=5A+2I.$$

To compute $A^3$,

$$A^3=AA^2.$$

Substitute:

$$A^3=A(5A+2I)=5A^2+2A.$$

Reduce $A^2$:

$$A^3=5(5A+2I)+2A.$$

Thus

$$A^3=27A+10I.$$

So without multiplying three matrices directly, we obtain

$$A^3=27\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}+10\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

Therefore

$$A^3=\begin{bmatrix} 37 & 54 \\ 81 & 118 \end{bmatrix}.$$
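The reduction can be confirmed against a direct matrix power; a minimal NumPy check:

```python
import numpy as np

# Check the reduction A^3 = 27A + 10I against np.linalg.matrix_power.
A = np.array([[1, 2],
              [3, 4]])

direct = np.linalg.matrix_power(A, 3)
reduced = 27 * A + 10 * np.eye(2, dtype=int)

print(np.array_equal(direct, reduced))  # True
```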

70.14 Formula for the Inverse

If $A$ is invertible, then Cayley-Hamilton gives a formula for $A^{-1}$ as a polynomial in $A$.

Let

$$p_A(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_1t+c_0.$$

Since

$$c_0=p_A(0)=\det(-A)=(-1)^n\det(A),$$

invertibility implies

$$c_0\neq 0.$$

By Cayley-Hamilton,

$$A^n+c_{n-1}A^{n-1}+\cdots+c_1A+c_0I=0.$$

Factor out $A$ from all terms except the constant term:

$$A(A^{n-1}+c_{n-1}A^{n-2}+\cdots+c_1I)+c_0I=0.$$

Hence

$$A(A^{n-1}+c_{n-1}A^{n-2}+\cdots+c_1I)=-c_0I.$$

Therefore

$$A^{-1}=-\frac{1}{c_0}\left(A^{n-1}+c_{n-1}A^{n-2}+\cdots+c_1I\right).$$

This gives an exact inverse formula, although it is usually not the best numerical method for computing inverses.
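As a sketch, the formula can be implemented in NumPy, using `np.poly` to obtain the characteristic coefficients numerically; the $3\times 3$ matrix below is an illustrative choice:

```python
import numpy as np

# Sketch: invert A via A^{-1} = -(1/c0)(A^{n-1} + c_{n-1}A^{n-2} + ... + c_1 I).
# np.poly returns the coefficients [1, c_{n-1}, ..., c_1, c_0].
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
n = A.shape[0]
c = np.poly(A)

# Horner accumulation: after the loop, B = A^{n-1} + c_{n-1}A^{n-2} + ... + c_1 I.
B = np.eye(n)
for k in range(1, n):
    B = A @ B + c[k] * np.eye(n)

A_inv = -B / c[-1]
print(np.allclose(A_inv @ A, np.eye(n)))  # True
```

As the text notes, this is exact algebra but not a recommended numerical method for large matrices.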

70.15 Example: Inverse from Cayley-Hamilton

For a $2\times 2$ matrix,

$$A^2-\operatorname{tr}(A)A+\det(A)I=0.$$

If $A$ is invertible, then

$$\det(A)\neq 0.$$

Rewrite:

$$A^2-\operatorname{tr}(A)A=-\det(A)I.$$

Factor:

$$A(A-\operatorname{tr}(A)I)=-\det(A)I.$$

Thus

$$A^{-1}=\frac{1}{\det(A)}(\operatorname{tr}(A)I-A).$$

For

$$A=\begin{bmatrix} a & b \\ c & d \end{bmatrix},$$

this becomes

$$A^{-1}=\frac{1}{ad-bc}\left(\begin{bmatrix} a+d & 0 \\ 0 & a+d \end{bmatrix}-\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right).$$

Hence

$$A^{-1}=\frac{1}{ad-bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$

This recovers the standard inverse formula for $2\times 2$ matrices.

70.16 Recurrence Relations

Cayley-Hamilton gives recurrence relations for matrix sequences.

If

$$p_A(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_0,$$

then

$$A^n=-c_{n-1}A^{n-1}-\cdots-c_0I.$$

Multiplying by $A^k$ gives

$$A^{n+k}=-c_{n-1}A^{n-1+k}-\cdots-c_1A^{1+k}-c_0A^k.$$

Thus the sequence

$$I,A,A^2,A^3,\ldots$$

satisfies a linear recurrence whose coefficients come from the characteristic polynomial.

This is useful in discrete dynamical systems and linear recurrences.
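For the $2\times 2$ example with $A^2=5A+2I$, the recurrence lets every power be tracked by two scalar coefficients; a sketch:

```python
import numpy as np

# Sketch: with A^2 = 5A + 2I, each power A^k = x_k A + y_k I, where
# x_{k+1} = 5 x_k + y_k and y_{k+1} = 2 x_k.
A = np.array([[1, 2],
              [3, 4]])

x, y = 1, 0          # A^1 = 1*A + 0*I
for _ in range(9):   # advance from A^1 to A^10
    x, y = 5 * x + y, 2 * x

A10 = x * A + y * np.eye(2, dtype=int)
print(np.array_equal(A10, np.linalg.matrix_power(A, 10)))  # True
```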

70.17 Matrix Functions

Cayley-Hamilton also simplifies matrix functions.

If $f$ is a polynomial, one can divide $f$ by $p_A$:

$$f(t)=q(t)p_A(t)+r(t),$$

where

$$\deg r<n.$$

Evaluate at $A$:

$$f(A)=q(A)p_A(A)+r(A).$$

Since

$$p_A(A)=0,$$

we get

$$f(A)=r(A).$$

Thus every polynomial in $A$ equals a polynomial in $A$ of degree less than $n$.

This idea extends, with additional hypotheses, to analytic functions of matrices.
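As a sketch, this division-with-remainder evaluation of $f(A)$ can be done with `np.polydiv`; the choice $f(t)=t^5$ and the $2\times 2$ example matrix are illustrative:

```python
import numpy as np

# Sketch: compute f(A) for f(t) = t^5 by reducing f modulo p_A, so only a
# degree-1 remainder r(t) = r0*t + r1 needs to be evaluated at A.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

f = np.array([1.0, 0, 0, 0, 0, 0])   # coefficients of t^5, highest first
pA = np.poly(A)                      # ~ [1, -5, -2]

_, r = np.polydiv(f, pA)             # f = q*pA + r, deg r < 2 (deg r = 1 here)
fA = r[0] * A + r[1] * np.eye(2)

print(np.allclose(fA, np.linalg.matrix_power(A, 5)))  # True
```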

70.18 Determinant and Trace in the Two by Two Case

For a $2\times 2$ matrix, Cayley-Hamilton says

$$A^2-\operatorname{tr}(A)A+\det(A)I=0.$$

This identity shows how trace and determinant control the second power of $A$.

The trace controls the coefficient of $A$. The determinant controls the constant term.

For higher-dimensional matrices, the coefficients of the characteristic polynomial play analogous roles. They are built from determinant-like invariants and determine how $A^n$ reduces to lower powers.

70.19 Projection Example

Let $P$ be a projection:

$$P^2=P.$$

Then

$$P^2-P=0.$$

So $P$ is annihilated by

$$t^2-t=t(t-1).$$

The Cayley-Hamilton theorem guarantees that the characteristic polynomial also annihilates $P$.

If $P$ projects onto an $r$-dimensional subspace of an $n$-dimensional space, then its eigenvalues are $1$ with multiplicity $r$, and $0$ with multiplicity $n-r$. Hence

$$p_P(t)=t^{n-r}(t-1)^r.$$

Cayley-Hamilton gives

$$P^{n-r}(P-I)^r=0.$$

This is true, but the smaller polynomial

$$t(t-1)$$

already annihilates $P$. This illustrates the difference between the characteristic polynomial and the minimal polynomial.
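A small numerical illustration, assuming the rank-one orthogonal projection $P=vv^{\mathsf T}/(v^{\mathsf T}v)$ in $\mathbb{R}^3$ as an example:

```python
import numpy as np

# Sketch: P = v v^T / (v^T v) projects onto span(v) in R^3, so r = 1 and
# the characteristic polynomial is t^2 (t-1), but t(t-1) already kills P.
v = np.array([[1.0], [2.0], [2.0]])
P = (v @ v.T) / (v.T @ v)

print(np.allclose(P @ P, P))                # P is idempotent
print(np.allclose(P @ (P - np.eye(3)), 0))  # t(t-1) annihilates P
```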

70.20 Nilpotent Example

Let $N$ be nilpotent with

$$N^k=0.$$

If $N$ is $n\times n$, all eigenvalues are zero, so the characteristic polynomial is

$$p_N(t)=t^n.$$

Cayley-Hamilton gives

$$N^n=0.$$

Thus every nilpotent $n\times n$ matrix has nilpotency index at most $n$.

The minimal polynomial may be

$$t^k$$

for some

$$k\leq n.$$

Again, the minimal polynomial gives the sharp exponent, while the characteristic polynomial gives a universal exponent.
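A concrete check with the $4\times 4$ shift matrix, for which the bound happens to be sharp:

```python
import numpy as np

# Sketch: the 4x4 shift matrix N (ones on the superdiagonal) is nilpotent
# with N^4 = 0 but N^3 != 0, so here the minimal polynomial is t^4.
N = np.diag(np.ones(3), k=1)

print(np.allclose(np.linalg.matrix_power(N, 4), 0))  # True
print(np.allclose(np.linalg.matrix_power(N, 3), 0))  # False
```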

70.21 Consequences

The Cayley-Hamilton theorem has several important consequences.

$p_A(A)=0$: the characteristic polynomial annihilates $A$.
$m_A\mid p_A$: the minimal polynomial divides the characteristic polynomial.
High powers reduce: $A^k$ for $k\geq n$ reduces to lower powers.
Inverses become polynomials: if $A$ is invertible, $A^{-1}$ is a polynomial in $A$.
Matrix recurrences: powers of $A$ satisfy a linear recurrence.
Nilpotent bound: if $N$ is a nilpotent $n\times n$ matrix, then $N^n=0$.

These consequences make the theorem both structural and computational.

70.22 Common Errors

The first common error is to confuse

$$p_A(A)$$

with

$$\det(AI-A).$$

The former is matrix polynomial evaluation. The latter substitutes a matrix for the scalar variable inside the determinant, which is not what the theorem asserts.

The second common error is to forget the identity matrix in the constant term. If

$$p(t)=t^2+3t+5,$$

then

$$p(A)=A^2+3A+5I,$$

not

$$A^2+3A+5.$$

The third common error is to assume the characteristic polynomial is the smallest annihilating polynomial. It may not be. The smallest one is the minimal polynomial.

The fourth common error is to use Cayley-Hamilton as a numerical method for large matrix inverses. The theorem is exact and algebraic, but direct inverse computation through powers is usually unstable and inefficient.

70.23 Summary

The Cayley-Hamilton theorem states that every square matrix satisfies its own characteristic equation.

If

$$p_A(t)=\det(tI-A),$$

then

$$p_A(A)=0.$$

In expanded form, if

$$p_A(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_0,$$

then

$$A^n+c_{n-1}A^{n-1}+\cdots+c_0I=0.$$

The theorem proves that the characteristic polynomial is an annihilating polynomial. It implies that the minimal polynomial divides the characteristic polynomial, that high powers of a matrix reduce to lower powers, and that invertible matrices have inverses expressible as polynomials in themselves.