
Chapter 70. Cayley-Hamilton Theorem

The Cayley-Hamilton theorem states that every square matrix satisfies its own characteristic equation.

If $A$ is an $n\times n$ matrix and its characteristic polynomial is

$$p_A(t)=\det(tI-A),$$

then the theorem says

$$p_A(A)=0.$$

This means that if

$$p_A(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_1t+c_0,$$

then

$$A^n+c_{n-1}A^{n-1}+\cdots+c_1A+c_0I=0.$$

The theorem is a basic bridge between determinants, eigenvalues, matrix powers, and minimal polynomials. Standard statements of the theorem say that the characteristic polynomial of an $n\times n$ matrix annihilates that matrix.

70.1 Matrix Polynomials

Let

$$p(t)=a_0+a_1t+a_2t^2+\cdots+a_kt^k$$

be a polynomial.

For a square matrix $A$, define

$$p(A)=a_0I+a_1A+a_2A^2+\cdots+a_kA^k.$$

The identity matrix $I$ appears in the constant term so that every term is an $n\times n$ matrix.

For example, if

$$p(t)=t^2-5t-2,$$

then

$$p(A)=A^2-5A-2I.$$

A polynomial $p$ annihilates $A$ if

$$p(A)=0.$$

The Cayley-Hamilton theorem says that the characteristic polynomial always annihilates the matrix.

70.2 Statement of the Theorem

Let $A\in F^{n\times n}$, where $F$ is a field. Let

$$p_A(t)=\det(tI-A)$$

be the characteristic polynomial of $A$.

If

$$p_A(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_1t+c_0,$$

then

$$p_A(A)=A^n+c_{n-1}A^{n-1}+\cdots+c_1A+c_0I=0.$$

This is the Cayley-Hamilton theorem.

It says that the matrix $A$ is a root of its own characteristic polynomial, in the sense of matrix polynomial evaluation.

70.3 A Two by Two Form

For a $2\times 2$ matrix

$$A=\begin{bmatrix} a & b \\ c & d \end{bmatrix},$$

the characteristic polynomial is

$$p_A(t)=t^2-\operatorname{tr}(A)t+\det(A).$$

Therefore the Cayley-Hamilton theorem gives

$$A^2-\operatorname{tr}(A)A+\det(A)I=0.$$

This compact identity holds for every $2\times 2$ matrix.

For example, if

$$A=\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix},$$

then

$$\operatorname{tr}(A)=5$$

and

$$\det(A)=-2.$$

So the theorem predicts

$$A^2-5A-2I=0.$$

This example is a common concrete illustration of the theorem.

70.4 Direct Verification in a Two by Two Example

Let

$$A=\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}.$$

Compute

$$A^2=\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}=\begin{bmatrix} 7 & 10 \\ 15 & 22 \end{bmatrix}.$$

Now compute

$$5A=\begin{bmatrix} 5 & 10 \\ 15 & 20 \end{bmatrix}.$$

Then

$$A^2-5A-2I=\begin{bmatrix} 7 & 10 \\ 15 & 22 \end{bmatrix}-\begin{bmatrix} 5 & 10 \\ 15 & 20 \end{bmatrix}-\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}.$$

Thus

$$A^2-5A-2I=\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$

So $A$ satisfies its characteristic equation.
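The arithmetic above can also be checked numerically; a minimal sketch in NumPy, using the same matrix (the use of `np.allclose` is just a floating-point tolerance choice):

```python
import numpy as np

# Numerically verify A^2 - tr(A) A + det(A) I = 0 for the matrix worked out above.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

residual = A @ A - np.trace(A) * A + np.linalg.det(A) * np.eye(2)
print(np.allclose(residual, 0))  # True
```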

70.5 Meaning of the Theorem

The theorem says that high powers of a matrix are not independent forever.

For an $n\times n$ matrix, the characteristic polynomial gives a relation among

$$I,A,A^2,\ldots,A^n.$$

If

$$p_A(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_1t+c_0,$$

then

$$A^n=-c_{n-1}A^{n-1}-\cdots-c_1A-c_0I.$$

Thus every power $A^k$ with $k\geq n$ can be reduced to a linear combination of

$$I,A,\ldots,A^{n-1}.$$

This makes the theorem useful for matrix powers, recurrence relations, matrix functions, and inverse formulas.

70.6 Relation to the Minimal Polynomial

The minimal polynomial $m_A(t)$ is the monic polynomial of least degree satisfying

$$m_A(A)=0.$$

The Cayley-Hamilton theorem says

$$p_A(A)=0.$$

Therefore the characteristic polynomial is an annihilating polynomial.

Since the minimal polynomial divides every annihilating polynomial, we get

$$m_A(t)\mid p_A(t).$$

This is one of the main consequences of Cayley-Hamilton: the minimal polynomial always divides the characteristic polynomial.
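The divisibility can be seen concretely by polynomial long division. A small sketch (the example $A=I$ and the use of `np.polydiv` are illustrative choices, not part of the text above):

```python
import numpy as np

# Sketch: for A = I (2x2), p_A(t) = (t-1)^2 while m_A(t) = t-1.
# Polynomial long division leaves zero remainder, so m_A divides p_A.
p_char = np.array([1.0, -2.0, 1.0])   # (t-1)^2 = t^2 - 2t + 1
p_min = np.array([1.0, -1.0])         # t - 1

q, r = np.polydiv(p_char, p_min)
print(np.allclose(r, 0))              # True: remainder is zero
```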

70.7 Eigenvalue Interpretation

Suppose

$$Av=\lambda v$$

with

$$v\neq 0.$$

For any polynomial $q$,

$$q(A)v=q(\lambda)v.$$

If $q=p_A$, then

$$p_A(\lambda)=0$$

because $\lambda$ is an eigenvalue. Hence

$$p_A(A)v=0$$

for every eigenvector $v$.

This observation suggests the theorem, but it does not prove it in general. Eigenvectors may not form a basis. Some matrices are defective. The Cayley-Hamilton theorem is stronger: it says $p_A(A)=0$ as a matrix, even when there are not enough eigenvectors.

70.8 Proof Idea Using Diagonalization

If $A$ is diagonalizable, the theorem is easy.

Suppose

$$A=PDP^{-1},$$

where

$$D=\operatorname{diag}(\lambda_1,\ldots,\lambda_n).$$

For any polynomial $q$,

$$q(A)=Pq(D)P^{-1}.$$

The characteristic polynomial is

$$p_A(t)=\prod_{i=1}^n(t-\lambda_i),$$

counting multiplicities.

Then

$$p_A(D)=\operatorname{diag}\left(p_A(\lambda_1),\ldots,p_A(\lambda_n)\right).$$

Each diagonal entry is zero, so

$$p_A(D)=0.$$

Therefore

$$p_A(A)=P\,0\,P^{-1}=0.$$

This proves Cayley-Hamilton for diagonalizable matrices. The full theorem requires an argument that also covers defective matrices.
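A quick numerical illustration of this argument, assuming a symmetric (hence diagonalizable) example matrix:

```python
import numpy as np

# Sketch: every eigenvalue is a root of the characteristic polynomial,
# so each diagonal entry of p_A(D) vanishes.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

coeffs = np.poly(A)              # characteristic polynomial coefficients
lams = np.linalg.eigvalsh(A)     # eigenvalues of the symmetric matrix

print(np.allclose(np.polyval(coeffs, lams), 0))  # True
```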

70.9 Proof Idea Using Jordan Form

Over $\mathbb{C}$, every square matrix has a Jordan form:

$$A=PJP^{-1}.$$

The characteristic polynomial of $A$ is the same as that of $J$. Each Jordan block for eigenvalue $\lambda$ has the form

$$J_k(\lambda)=\lambda I+N,$$

where

$$N^k=0.$$

The characteristic polynomial contains the factor

$$(t-\lambda)^k$$

for that block. Therefore, when the characteristic polynomial is evaluated on the block, the nilpotent part is killed.

Since every block is killed, the whole Jordan matrix is killed:

$$p_A(J)=0.$$

Then

$$p_A(A)=Pp_A(J)P^{-1}=0.$$

This proof shows the structural reason behind the theorem. The characteristic polynomial contains enough powers of each $(t-\lambda)$ factor to kill every Jordan block.

70.10 Proof Idea Using the Adjugate

There is also a determinant-based proof.

For a square matrix $M$,

$$\operatorname{adj}(M)M=\det(M)I.$$

Apply this identity to

$$M=tI-A.$$

Then

$$\operatorname{adj}(tI-A)(tI-A)=\det(tI-A)I.$$

The right side is

$$p_A(t)I.$$

The adjugate matrix has polynomial entries in $t$. Expanding both sides and comparing coefficients gives a matrix polynomial identity. Substituting $t=A$ correctly then yields

$$p_A(A)=0.$$

This adjugate proof is a standard route to the theorem, with care needed because one cannot naively substitute a matrix into every occurrence of the scalar variable before establishing the polynomial identity.

70.11 Why Naive Substitution Can Mislead

The characteristic polynomial is

$$p_A(t)=\det(tI-A).$$

One might try to prove the theorem by writing

$$p_A(A)=\det(AI-A).$$

This is not a valid argument.

The expression

$$p_A(A)$$

means evaluating a scalar polynomial at the matrix $A$:

$$p_A(A)=A^n+c_{n-1}A^{n-1}+\cdots+c_0I.$$

It does not mean taking the determinant of a block-like expression obtained by replacing $t$ inside $\det(tI-A)$ with a matrix.

The determinant expression first produces a scalar polynomial. Only after that polynomial is formed may it be evaluated at AA. This distinction matters in rigorous proofs.

70.12 Reducing Powers

Suppose

$$p_A(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_1t+c_0.$$

By Cayley-Hamilton,

$$A^n+c_{n-1}A^{n-1}+\cdots+c_1A+c_0I=0.$$

So

$$A^n=-c_{n-1}A^{n-1}-\cdots-c_1A-c_0I.$$

Multiplying both sides by $A$ gives

$$A^{n+1}=-c_{n-1}A^n-\cdots-c_1A^2-c_0A.$$

Then reduce $A^n$ again using the same relation.

Repeating this process expresses every high power of $A$ as a linear combination of lower powers.

70.13 Example: Reducing Powers

Let

$$A=\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}.$$

We found

$$A^2-5A-2I=0.$$

Thus

$$A^2=5A+2I.$$

To compute $A^3$,

$$A^3=AA^2.$$

Substitute:

$$A^3=A(5A+2I)=5A^2+2A.$$

Reduce $A^2$:

$$A^3=5(5A+2I)+2A.$$

Thus

$$A^3=27A+10I.$$

So without multiplying three matrices directly, we obtain

$$A^3=27\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}+10\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

Therefore

$$A^3=\begin{bmatrix} 37 & 54 \\ 81 & 118 \end{bmatrix}.$$
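The reduction can be confirmed against a direct matrix power; a minimal NumPy check:

```python
import numpy as np

# Check the reduction A^3 = 27A + 10I against np.linalg.matrix_power.
A = np.array([[1, 2],
              [3, 4]])

direct = np.linalg.matrix_power(A, 3)
reduced = 27 * A + 10 * np.eye(2, dtype=int)

print(np.array_equal(direct, reduced))  # True
```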

70.14 Formula for the Inverse

If $A$ is invertible, then Cayley-Hamilton gives a formula for $A^{-1}$ as a polynomial in $A$.

Let

$$p_A(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_1t+c_0.$$

Since

$$c_0=p_A(0)=\det(-A)=(-1)^n\det(A),$$

invertibility implies

$$c_0\neq 0.$$

By Cayley-Hamilton,

$$A^n+c_{n-1}A^{n-1}+\cdots+c_1A+c_0I=0.$$

Factor out $A$ from all terms except the constant term:

$$A(A^{n-1}+c_{n-1}A^{n-2}+\cdots+c_1I)+c_0I=0.$$

Hence

$$A(A^{n-1}+c_{n-1}A^{n-2}+\cdots+c_1I)=-c_0I.$$

Therefore

$$A^{-1}=-\frac{1}{c_0}\left(A^{n-1}+c_{n-1}A^{n-2}+\cdots+c_1I\right).$$

This gives an exact inverse formula, although it is usually not the best numerical method for computing inverses.
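As a sketch, the formula can be implemented in NumPy, using `np.poly` to obtain the characteristic coefficients numerically; the $3\times 3$ matrix below is an illustrative choice:

```python
import numpy as np

# Sketch: invert A via A^{-1} = -(1/c0)(A^{n-1} + c_{n-1}A^{n-2} + ... + c_1 I).
# np.poly returns the coefficients [1, c_{n-1}, ..., c_1, c_0].
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
n = A.shape[0]
c = np.poly(A)

# Horner accumulation: after the loop, B = A^{n-1} + c_{n-1}A^{n-2} + ... + c_1 I.
B = np.eye(n)
for k in range(1, n):
    B = A @ B + c[k] * np.eye(n)

A_inv = -B / c[-1]
print(np.allclose(A_inv @ A, np.eye(n)))  # True
```

As the text notes, this is exact algebra but not a recommended numerical method for large matrices.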

70.15 Example: Inverse from Cayley-Hamilton

For a $2\times 2$ matrix,

$$A^2-\operatorname{tr}(A)A+\det(A)I=0.$$

If $A$ is invertible, then

$$\det(A)\neq 0.$$

Rewrite:

$$A^2-\operatorname{tr}(A)A=-\det(A)I.$$

Factor:

$$A(A-\operatorname{tr}(A)I)=-\det(A)I.$$

Thus

$$A^{-1}=\frac{1}{\det(A)}(\operatorname{tr}(A)I-A).$$

For

$$A=\begin{bmatrix} a & b \\ c & d \end{bmatrix},$$

this becomes

$$A^{-1}=\frac{1}{ad-bc}\left(\begin{bmatrix} a+d & 0 \\ 0 & a+d \end{bmatrix}-\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right).$$

Hence

$$A^{-1}=\frac{1}{ad-bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$

This recovers the standard inverse formula for $2\times 2$ matrices.

70.16 Recurrence Relations

Cayley-Hamilton gives recurrence relations for matrix sequences.

If

$$p_A(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_0,$$

then

$$A^n=-c_{n-1}A^{n-1}-\cdots-c_0I.$$

Multiplying by $A^k$ gives

$$A^{n+k}=-c_{n-1}A^{n-1+k}-\cdots-c_1A^{1+k}-c_0A^k.$$

Thus the sequence

$$I,A,A^2,A^3,\ldots$$

satisfies a linear recurrence whose coefficients come from the characteristic polynomial.

This is useful in discrete dynamical systems and linear recurrences.
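For the $2\times 2$ example with $A^2=5A+2I$, the recurrence lets every power be tracked by two scalar coefficients; a sketch:

```python
import numpy as np

# Sketch: with A^2 = 5A + 2I, each power A^k = x_k A + y_k I, where
# x_{k+1} = 5 x_k + y_k and y_{k+1} = 2 x_k.
A = np.array([[1, 2],
              [3, 4]])

x, y = 1, 0          # A^1 = 1*A + 0*I
for _ in range(9):   # advance from A^1 to A^10
    x, y = 5 * x + y, 2 * x

A10 = x * A + y * np.eye(2, dtype=int)
print(np.array_equal(A10, np.linalg.matrix_power(A, 10)))  # True
```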

70.17 Matrix Functions

Cayley-Hamilton also simplifies matrix functions.

If $f$ is a polynomial, one can divide $f$ by $p_A$:

$$f(t)=q(t)p_A(t)+r(t),$$

where

$$\deg r<n.$$

Evaluate at $A$:

$$f(A)=q(A)p_A(A)+r(A).$$

Since

$$p_A(A)=0,$$

we get

$$f(A)=r(A).$$

Thus every polynomial in $A$ equals a polynomial in $A$ of degree less than $n$.

This idea extends, with additional hypotheses, to analytic functions of matrices.
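As a sketch, this division-with-remainder evaluation of $f(A)$ can be done with `np.polydiv`; the choice $f(t)=t^5$ and the $2\times 2$ example matrix are illustrative:

```python
import numpy as np

# Sketch: compute f(A) for f(t) = t^5 by reducing f modulo p_A, so only a
# degree-1 remainder r(t) = r0*t + r1 needs to be evaluated at A.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

f = np.array([1.0, 0, 0, 0, 0, 0])   # coefficients of t^5, highest first
pA = np.poly(A)                      # ~ [1, -5, -2]

_, r = np.polydiv(f, pA)             # f = q*pA + r, deg r < 2 (deg r = 1 here)
fA = r[0] * A + r[1] * np.eye(2)

print(np.allclose(fA, np.linalg.matrix_power(A, 5)))  # True
```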

70.18 Determinant and Trace in the Two by Two Case

For a $2\times 2$ matrix, Cayley-Hamilton says

$$A^2-\operatorname{tr}(A)A+\det(A)I=0.$$

This identity shows how trace and determinant control the second power of $A$.

The trace controls the coefficient of $A$. The determinant controls the constant term.

For higher-dimensional matrices, the coefficients of the characteristic polynomial play analogous roles. They are built from determinant-like invariants and determine how $A^n$ reduces to lower powers.

70.19 Projection Example

Let $P$ be a projection:

$$P^2=P.$$

Then

$$P^2-P=0.$$

So $P$ is annihilated by

$$t^2-t=t(t-1).$$

The Cayley-Hamilton theorem guarantees that the characteristic polynomial also annihilates $P$.

If $P$ projects onto an $r$-dimensional subspace of an $n$-dimensional space, then its eigenvalues are $1$ with multiplicity $r$, and $0$ with multiplicity $n-r$. Hence

$$p_P(t)=t^{n-r}(t-1)^r.$$

Cayley-Hamilton gives

$$P^{n-r}(P-I)^r=0.$$

This is true, but the smaller polynomial

$$t(t-1)$$

already annihilates $P$. This illustrates the difference between the characteristic polynomial and the minimal polynomial.
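A small numerical illustration, assuming the rank-one orthogonal projection $P=vv^{\mathsf T}/(v^{\mathsf T}v)$ in $\mathbb{R}^3$ as an example:

```python
import numpy as np

# Sketch: P = v v^T / (v^T v) projects onto span(v) in R^3, so r = 1 and
# the characteristic polynomial is t^2 (t-1), but t(t-1) already kills P.
v = np.array([[1.0], [2.0], [2.0]])
P = (v @ v.T) / (v.T @ v)

print(np.allclose(P @ P, P))                # P is idempotent
print(np.allclose(P @ (P - np.eye(3)), 0))  # t(t-1) annihilates P
```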

70.20 Nilpotent Example

Let $N$ be nilpotent with

$$N^k=0.$$

If $N$ is $n\times n$, all eigenvalues are zero, so the characteristic polynomial is

$$p_N(t)=t^n.$$

Cayley-Hamilton gives

$$N^n=0.$$

Thus every nilpotent $n\times n$ matrix has nilpotency index at most $n$.

The minimal polynomial may be

$$t^k$$

for some

$$k\leq n.$$

Again, the minimal polynomial gives the sharp exponent, while the characteristic polynomial gives a universal exponent.
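A concrete check with the $4\times 4$ shift matrix, for which the bound happens to be sharp:

```python
import numpy as np

# Sketch: the 4x4 shift matrix N (ones on the superdiagonal) is nilpotent
# with N^4 = 0 but N^3 != 0, so here the minimal polynomial is t^4.
N = np.diag(np.ones(3), k=1)

print(np.allclose(np.linalg.matrix_power(N, 4), 0))  # True
print(np.allclose(np.linalg.matrix_power(N, 3), 0))  # False
```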

70.21 Consequences

The Cayley-Hamilton theorem has several important consequences.

$p_A(A)=0$: the characteristic polynomial annihilates $A$.
$m_A\mid p_A$: the minimal polynomial divides the characteristic polynomial.
High powers reduce: $A^k$ for $k\geq n$ reduces to lower powers.
Inverses become polynomials: if $A$ is invertible, $A^{-1}$ is a polynomial in $A$.
Matrix recurrences: powers of $A$ satisfy a linear recurrence.
Nilpotent bound: if $N$ is a nilpotent $n\times n$ matrix, then $N^n=0$.

These consequences make the theorem both structural and computational.

70.22 Common Errors

The first common error is to confuse

$$p_A(A)$$

with

$$\det(AI-A).$$

The former is matrix polynomial evaluation. The latter substitutes a matrix for the scalar variable inside the determinant, which is not what the theorem asserts.

The second common error is to forget the identity matrix in the constant term. If

$$p(t)=t^2+3t+5,$$

then

$$p(A)=A^2+3A+5I,$$

not

$$A^2+3A+5.$$

The third common error is to assume the characteristic polynomial is the smallest annihilating polynomial. It may not be. The smallest one is the minimal polynomial.

The fourth common error is to use Cayley-Hamilton as a numerical method for large matrix inverses. The theorem is exact and algebraic, but direct inverse computation through powers is usually unstable and inefficient.

70.23 Summary

The Cayley-Hamilton theorem states that every square matrix satisfies its own characteristic equation.

If

$$p_A(t)=\det(tI-A),$$

then

$$p_A(A)=0.$$

In expanded form, if

$$p_A(t)=t^n+c_{n-1}t^{n-1}+\cdots+c_0,$$

then

$$A^n+c_{n-1}A^{n-1}+\cdots+c_0I=0.$$

The theorem proves that the characteristic polynomial is an annihilating polynomial. It implies that the minimal polynomial divides the characteristic polynomial, that high powers of a matrix reduce to lower powers, and that invertible matrices have inverses expressible as polynomials in themselves.