Chapter 63. Diagonalization

Diagonalization is the process of replacing a matrix by a diagonal matrix through a change of basis.

A diagonal matrix is simple because it acts independently on each coordinate. If a matrix can be diagonalized, then its action becomes easy to describe, its powers become easy to compute, and its long-term behavior becomes easier to analyze.

The central idea is this: a matrix is diagonalizable when the space has a basis made of eigenvectors. In that basis, the matrix only rescales each coordinate. An $n \times n$ matrix is diagonalizable exactly when it has $n$ linearly independent eigenvectors.

63.1 Diagonal Matrices

A diagonal matrix has zero entries outside the main diagonal:

$$D = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix}.$$

For a vector

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},$$

we have

$$Dx = \begin{bmatrix} d_1 x_1 \\ d_2 x_2 \\ \vdots \\ d_n x_n \end{bmatrix}.$$

Thus each coordinate is scaled independently.

The first coordinate is multiplied by $d_1$. The second coordinate is multiplied by $d_2$. In general, the $i$-th coordinate is multiplied by $d_i$.

Diagonal matrices are the simplest square matrices after scalar multiples of the identity.
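
The coordinate-wise action is easy to verify numerically. A minimal NumPy sketch (the particular entries are arbitrary choices for illustration):

```python
import numpy as np

# Diagonal entries d_1, d_2, d_3 (arbitrary illustrative values).
d = np.array([2.0, -1.0, 0.5])
D = np.diag(d)

x = np.array([1.0, 3.0, 4.0])

# Dx multiplies the i-th coordinate of x by d_i.
Dx = D @ x

print(Dx)  # identical to the elementwise product d * x
```

The matrix-vector product `D @ x` agrees entry by entry with the elementwise product `d * x`, which is exactly the statement that each coordinate is scaled independently.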

63.2 Similarity

Two square matrices $A$ and $B$ are similar if there exists an invertible matrix $P$ such that

$$B = P^{-1} A P.$$

Similarity means that $A$ and $B$ represent the same linear transformation written in different bases.

The matrix $P$ changes coordinates from one basis to another. The matrix $P^{-1}$ changes them back.

A diagonalization of $A$ is a similarity relation in which $B$ is diagonal.

Thus $A$ is diagonalizable if there exists an invertible matrix $P$ and a diagonal matrix $D$ such that

$$P^{-1} A P = D.$$

Equivalently,

$$A = P D P^{-1}.$$

A square matrix is called diagonalizable when it is similar to a diagonal matrix.

63.3 Definition

Let $A$ be an $n \times n$ matrix over a field $F$.

The matrix $A$ is diagonalizable over $F$ if there exist an invertible $n \times n$ matrix $P$ and a diagonal $n \times n$ matrix $D$, both with entries in $F$, such that

$$A = P D P^{-1}.$$

Equivalently,

$$P^{-1} A P = D.$$

The diagonal entries of $D$ are eigenvalues of $A$. The columns of $P$ are corresponding eigenvectors of $A$.

The field matters. A real matrix may fail to diagonalize over $\mathbb{R}$ yet diagonalize over $\mathbb{C}$.

63.4 Why Eigenvectors Produce Diagonalization

Suppose $A$ has $n$ linearly independent eigenvectors

$$v_1, v_2, \ldots, v_n.$$

Suppose their eigenvalues are

$$\lambda_1, \lambda_2, \ldots, \lambda_n,$$

so that

$$A v_i = \lambda_i v_i$$

for each $i$.

Form the matrix

$$P = \begin{bmatrix} | & | & & | \\ v_1 & v_2 & \cdots & v_n \\ | & | & & | \end{bmatrix}.$$

Since the vectors $v_1, \ldots, v_n$ are linearly independent, $P$ is invertible.

Now compute $AP$. Multiplying $A$ by $P$ applies $A$ to each column of $P$:

$$AP = \begin{bmatrix} | & | & & | \\ Av_1 & Av_2 & \cdots & Av_n \\ | & | & & | \end{bmatrix}.$$

Using $Av_i = \lambda_i v_i$,

$$AP = \begin{bmatrix} | & | & & | \\ \lambda_1 v_1 & \lambda_2 v_2 & \cdots & \lambda_n v_n \\ | & | & & | \end{bmatrix}.$$

This can be written as

$$AP = PD,$$

where

$$D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}.$$

Since $P$ is invertible,

$$A = P D P^{-1}.$$

This is the diagonalization of $A$.

63.5 The Diagonalization Theorem

An $n \times n$ matrix $A$ is diagonalizable if and only if $A$ has $n$ linearly independent eigenvectors.

If

$$v_1, v_2, \ldots, v_n$$

are linearly independent eigenvectors with corresponding eigenvalues

$$\lambda_1, \lambda_2, \ldots, \lambda_n,$$

then

$$P = \begin{bmatrix} | & | & & | \\ v_1 & v_2 & \cdots & v_n \\ | & | & & | \end{bmatrix}$$

and

$$D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}$$

satisfy

$$A = P D P^{-1}.$$

The eigenvectors must appear in $P$ in the same order as their eigenvalues appear in $D$. This is the standard diagonalization theorem.

63.6 How to Diagonalize a Matrix

To diagonalize an $n \times n$ matrix $A$, use the following procedure.

| Step | Operation |
| --- | --- |
| 1 | Find the eigenvalues of $A$. |
| 2 | Find a basis for each eigenspace. |
| 3 | Count the total number of independent eigenvectors. |
| 4 | If the total is $n$, place these eigenvectors as columns of $P$. |
| 5 | Place the matching eigenvalues on the diagonal of $D$. |
| 6 | Write $A = P D P^{-1}$. |

If fewer than $n$ independent eigenvectors are available, then $A$ cannot be diagonalized.
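
The procedure above can be sketched numerically with NumPy. This is an illustrative sketch, not a robust implementation: `np.linalg.eig` works over $\mathbb{C}$, and the independence count in step 3 uses a rank test with a tolerance, so borderline cases need care.

```python
import numpy as np

def diagonalize(A, tol=1e-10):
    """Return (P, D) with A = P D P^{-1}, or None if A appears defective."""
    eigenvalues, P = np.linalg.eig(A)          # steps 1-2: eigenvalues and eigenvectors
    n = A.shape[0]
    # Step 3: count independent eigenvectors via the rank of P.
    if np.linalg.matrix_rank(P, tol=tol) < n:
        return None                            # fewer than n: not diagonalizable
    D = np.diag(eigenvalues)                   # step 5: eigenvalues in matching order
    return P, D                                # step 6: A = P D P^{-1}

A = np.array([[2.0, 1.0], [1.0, 2.0]])
P, D = diagonalize(A)
print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True
```

For a defective matrix such as $\begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}$, the computed eigenvector matrix is numerically rank-deficient and the sketch returns `None`.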

63.7 Example: A Diagonalizable Matrix

Let

$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.$$

The characteristic polynomial is

$$\det(A - \lambda I) = \det \begin{bmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{bmatrix}.$$

Thus

$$\det(A - \lambda I) = (2-\lambda)^2 - 1.$$

Expand:

$$(2-\lambda)^2 - 1 = \lambda^2 - 4\lambda + 3.$$

Factor:

$$\lambda^2 - 4\lambda + 3 = (\lambda - 1)(\lambda - 3).$$

The eigenvalues are

$$\lambda = 3 \qquad \text{and} \qquad \lambda = 1.$$

For $\lambda = 3$,

$$A - 3I = \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix}.$$

Solving

$$(A - 3I)v = 0$$

gives

$$v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$

For $\lambda = 1$,

$$A - I = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}.$$

Solving

$$(A - I)v = 0$$

gives

$$v_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$

The two eigenvectors are linearly independent, so $A$ is diagonalizable.

Set

$$P = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

and

$$D = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}.$$

Then

$$A = P D P^{-1}.$$

63.8 Checking the Diagonalization

We can check the relation by verifying

$$AP = PD.$$

Compute

$$AP = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.$$

This gives

$$AP = \begin{bmatrix} 3 & 1 \\ 3 & -1 \end{bmatrix}.$$

Now compute

$$PD = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}.$$

This gives

$$PD = \begin{bmatrix} 3 & 1 \\ 3 & -1 \end{bmatrix}.$$

Thus

$$AP = PD.$$

Since $P$ is invertible,

$$A = P D P^{-1}.$$
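
The same check can be run numerically. A short NumPy verification of the worked example:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
P = np.array([[1.0, 1.0], [1.0, -1.0]])
D = np.diag([3.0, 1.0])

# Both sides of the defining relation agree entry by entry.
print(A @ P)   # [[3, 1], [3, -1]]
print(P @ D)   # [[3, 1], [3, -1]]

# Since P is invertible, A = P D P^{-1} follows.
print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True
```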

63.9 Distinct Eigenvalues

If an $n \times n$ matrix has $n$ distinct eigenvalues, then it is diagonalizable.

This follows because eigenvectors corresponding to distinct eigenvalues are linearly independent.

Thus distinct eigenvalues give a simple sufficient condition for diagonalizability.

The converse is false. A matrix may be diagonalizable even when some eigenvalues are repeated.

For example,

$$A = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$$

has only one distinct eigenvalue, namely $2$. Yet every nonzero vector is an eigenvector, so the matrix is diagonalizable.

63.10 Repeated Eigenvalues

Repeated eigenvalues require more care.

Suppose $\lambda$ is an eigenvalue with algebraic multiplicity $m$. The eigenspace $E_\lambda$ may have dimension less than or equal to $m$, but never greater than $m$.

For diagonalization, the sum of all eigenspace dimensions must equal $n$:

$$\sum_{\lambda} \dim E_\lambda = n.$$

If this equality holds, then the matrix is diagonalizable.

If it fails, then the matrix is defective.

This criterion is one of the most useful tests for diagonalizability: an $n \times n$ matrix is diagonalizable exactly when the dimensions of its eigenspaces add to $n$.

63.11 Example: A Repeated Eigenvalue That Diagonalizes

Let

$$A = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 7 \end{bmatrix}.$$

The eigenvalues are

$$4, \quad 4, \quad 7.$$

The eigenspace for $\lambda = 4$ is

$$E_4 = \operatorname{span} \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\}.$$

The eigenspace for $\lambda = 7$ is

$$E_7 = \operatorname{span} \left\{ \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}.$$

The dimensions add to

$$\dim E_4 + \dim E_7 = 2 + 1 = 3.$$

Therefore the matrix is diagonalizable.

In fact, it is already diagonal.

63.12 Example: A Repeated Eigenvalue That Does Not Diagonalize

Let

$$B = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}.$$

The characteristic polynomial is

$$(2-\lambda)^2.$$

Thus $\lambda = 2$ has algebraic multiplicity $2$.

Now compute the eigenspace:

$$B - 2I = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.$$

Solving

$$(B - 2I)v = 0$$

for $v = \begin{bmatrix} x \\ y \end{bmatrix}$ gives

$$y = 0.$$

Therefore

$$E_2 = \operatorname{span} \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right\}.$$

The eigenspace has dimension $1$, but the matrix is $2 \times 2$. There are not enough independent eigenvectors to form a basis.

Therefore $B$ is not diagonalizable.
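
The rank-nullity theorem gives a quick numerical check of the geometric multiplicity: $\dim E_\lambda = n - \operatorname{rank}(B - \lambda I)$. A NumPy sketch for this example:

```python
import numpy as np

B = np.array([[2.0, 1.0], [0.0, 2.0]])
lam = 2.0
n = B.shape[0]

# dim E_lambda = n - rank(B - lambda I), by rank-nullity.
geometric_multiplicity = n - np.linalg.matrix_rank(B - lam * np.eye(n))

print(geometric_multiplicity)  # 1, which is less than n = 2
```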

63.13 Diagonalization as a Change of Coordinates

Diagonalization is best understood as a change of coordinates.

In the standard basis, the matrix $A$ may mix coordinates. In an eigenbasis, it does not mix them.

Suppose

$$A = P D P^{-1}.$$

To compute $Ax$, one may view the operation in three stages:

| Stage | Operation | Meaning |
| --- | --- | --- |
| 1 | $P^{-1}x$ | Express $x$ in the eigenvector basis |
| 2 | $D(P^{-1}x)$ | Scale each eigen-coordinate |
| 3 | $P[D(P^{-1}x)]$ | Return to the original basis |

Thus

$$Ax = P D P^{-1} x.$$

The matrix $P^{-1}$ changes into eigen-coordinates. The diagonal matrix $D$ performs independent scaling. The matrix $P$ changes back.

63.14 Powers of a Diagonalizable Matrix

One major use of diagonalization is computing powers.

If

$$A = P D P^{-1},$$

then

$$A^2 = (P D P^{-1})(P D P^{-1}).$$

Since

$$P^{-1} P = I,$$

we get

$$A^2 = P D^2 P^{-1}.$$

By induction,

$$A^k = P D^k P^{-1}.$$

For a diagonal matrix,

$$D^k = \begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix}.$$

Thus powers of $A$ reduce to powers of its eigenvalues. This is a standard application of diagonalization.

63.15 Example: Computing Powers

Let

$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.$$

We found

$$A = P D P^{-1},$$

where

$$P = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad D = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}.$$

The inverse of $P$ is

$$P^{-1} = \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.$$

Therefore

$$A^k = P \begin{bmatrix} 3^k & 0 \\ 0 & 1 \end{bmatrix} P^{-1}.$$

Compute:

$$A^k = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 3^k & 0 \\ 0 & 1 \end{bmatrix} \cdot \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.$$

First multiply the first two matrices:

$$\begin{bmatrix} 3^k & 1 \\ 3^k & -1 \end{bmatrix}.$$

Then multiply by $P^{-1}$:

$$A^k = \frac{1}{2} \begin{bmatrix} 3^k + 1 & 3^k - 1 \\ 3^k - 1 & 3^k + 1 \end{bmatrix}.$$

This gives a closed formula for every positive integer $k$.
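
The closed formula can be checked against direct repeated multiplication. A NumPy sketch:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])

def A_power_closed(k):
    """Closed formula for A^k obtained from the diagonalization."""
    return 0.5 * np.array([[3.0**k + 1, 3.0**k - 1],
                           [3.0**k - 1, 3.0**k + 1]])

# Compare with direct matrix powers for several exponents.
ok = all(np.allclose(A_power_closed(k), np.linalg.matrix_power(A, k))
         for k in range(1, 10))
print(ok)  # True
```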

63.16 Matrix Functions

Diagonalization also simplifies functions of matrices.

If a function $f$ can be applied to the eigenvalues, then for a diagonalizable matrix

$$A = P D P^{-1},$$

we define

$$f(A) = P f(D) P^{-1},$$

where

$$f(D) = \begin{bmatrix} f(\lambda_1) & 0 & \cdots & 0 \\ 0 & f(\lambda_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & f(\lambda_n) \end{bmatrix}.$$

Important examples include:

| Matrix function | Use |
| --- | --- |
| $A^k$ | Discrete dynamical systems |
| $A^{-1}$ | Solving linear systems |
| $e^A$ | Differential equations |
| $\sqrt{A}$ | Matrix analysis |
| $\log A$ | Lie theory and numerical analysis |

For example, if

$$A = P D P^{-1},$$

then

$$e^A = P e^D P^{-1}.$$

Since $e^D$ is diagonal with entries $e^{\lambda_i}$, the computation becomes much simpler.

63.17 Diagonalization and Difference Equations

Consider a discrete dynamical system

$$x_{k+1} = A x_k.$$

Then

$$x_k = A^k x_0.$$

If $A$ is diagonalizable, then

$$x_k = P D^k P^{-1} x_0.$$

The behavior of $x_k$ is controlled by the powers of the eigenvalues.

If $|\lambda_i| < 1$, the corresponding component decays.

If $|\lambda_i| > 1$, the corresponding component grows.

If $|\lambda_i| = 1$, the corresponding component persists or oscillates.

Diagonalization therefore separates the system into independent modes.
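
A small simulation illustrates the modes. The matrix below (chosen for illustration) has eigenvalues $1$ and $0.5$ with eigenvectors $(1,1)$ and $(1,-1)$, so the $(1,-1)$ component of any initial state decays while the $(1,1)$ component persists:

```python
import numpy as np

A = np.array([[0.75, 0.25],
              [0.25, 0.75]])   # eigenvalues 1 and 0.5

x = np.array([3.0, 1.0])       # equals 2*(1,1) + 1*(1,-1)
for _ in range(60):
    x = A @ x                  # iterate x_{k+1} = A x_k

# After many steps only the persistent mode remains: x is close to 2*(1,1).
print(x)
```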

63.18 Orthogonal Diagonalization

Some matrices diagonalize in a stronger way.

A real symmetric matrix can be diagonalized using an orthogonal matrix:

$$A = Q D Q^T,$$

where

$$Q^T Q = I.$$

The columns of $Q$ are orthonormal eigenvectors.

This is called orthogonal diagonalization. It is stronger than ordinary diagonalization because

$$Q^{-1} = Q^T.$$

Real symmetric matrices are always orthogonally diagonalizable. More generally, normal complex matrices are unitarily diagonalizable.
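
NumPy's `eigh` routine handles exactly this case: given a real symmetric matrix, it returns real eigenvalues and an orthonormal eigenvector matrix. A sketch using the example matrix from earlier in the chapter:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])  # real symmetric

# eigh returns eigenvalues in ascending order and orthonormal eigenvectors.
eigenvalues, Q = np.linalg.eigh(A)

print(np.allclose(Q.T @ Q, np.eye(2)))                 # True: Q^T Q = I
print(np.allclose(A, Q @ np.diag(eigenvalues) @ Q.T))  # True: A = Q D Q^T
```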

63.19 Diagonalization over Different Fields

Diagonalizability depends on the field.

Consider the real matrix

$$R = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.$$

This matrix rotates the plane by $90^\circ$. Its characteristic polynomial is

$$\lambda^2 + 1.$$

Over $\mathbb{R}$, this polynomial has no roots. Therefore $R$ has no real eigenvectors and cannot be diagonalized over $\mathbb{R}$.

Over $\mathbb{C}$, the roots are

$$i \qquad \text{and} \qquad -i.$$

The matrix has two complex eigenvectors and can be diagonalized over $\mathbb{C}$.

When discussing diagonalization, one must specify the scalar field.
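
Numerical libraries typically work over $\mathbb{C}$, so they find the complex eigenvalues of the rotation matrix even though it has no real ones. A NumPy sketch:

```python
import numpy as np

R = np.array([[0.0, -1.0], [1.0, 0.0]])  # rotation by 90 degrees

# np.linalg.eig works over the complex numbers.
eigenvalues, P = np.linalg.eig(R)

print(np.sort_complex(eigenvalues))  # the eigenvalues are -i and +i
```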

63.20 Diagonalization of Linear Transformations

Let

$$T : V \to V$$

be a linear transformation on a finite-dimensional vector space.

The transformation $T$ is diagonalizable if there exists a basis of $V$ consisting of eigenvectors of $T$.

If such a basis exists, then the matrix of $T$ in that basis is diagonal.

Thus diagonalization is fundamentally a statement about bases, not about a single array of numbers.

A matrix diagonalizes when we can choose coordinates in which the transformation acts by independent scalar multiplication.

63.21 Common Errors

The first common error is to assume that every matrix is diagonalizable. Some matrices do not have enough independent eigenvectors.

The second common error is to place eigenvalues in $D$ in an order that does not match the eigenvectors in $P$. The order must agree column by column.

The third common error is to confuse algebraic multiplicity with geometric multiplicity. Repeated roots of the characteristic polynomial do not automatically provide enough eigenvectors.

The fourth common error is to ignore the field. A matrix may diagonalize over $\mathbb{C}$ but not over $\mathbb{R}$.

The fifth common error is to write $A = P^{-1} D P$ instead of $A = P D P^{-1}$. If $P$ has eigenvectors as columns, the correct formula is

$$A = P D P^{-1}.$$

63.22 Summary

Diagonalization expresses a matrix in the form

$$A = P D P^{-1},$$

where $D$ is diagonal and $P$ is invertible.

The columns of $P$ are eigenvectors of $A$. The diagonal entries of $D$ are the corresponding eigenvalues.

An $n \times n$ matrix is diagonalizable if and only if it has $n$ linearly independent eigenvectors. Equivalently, the dimensions of its eigenspaces add to $n$.

Diagonalization changes coordinates into an eigenvector basis. In that basis, the linear transformation acts by independent scaling. This makes powers, matrix functions, dynamical systems, and spectral analysis much simpler.