
Chapter 72. Matrix Functions

A matrix function is a rule that applies a scalar function to a square matrix.

For a scalar $x$, expressions such as

$$x^2,\qquad e^x,\qquad \sqrt{x},\qquad \log x$$

are ordinary functions of one variable. For a square matrix $A$, one may define analogous expressions:

$$A^2,\qquad e^A,\qquad A^{1/2},\qquad \log A.$$

Matrix functions are important because they let us transfer scalar functions into linear algebra. They appear in differential equations, Markov processes, control theory, numerical analysis, quantum mechanics, optimization, statistics, and graph theory. The matrix exponential, for example, is defined by a power series and is used to solve linear systems of differential equations.

72.1 Polynomial Functions of Matrices

The simplest matrix functions are polynomial functions.

Let

$$p(t)=a_0+a_1t+a_2t^2+\cdots+a_kt^k.$$

For a square matrix $A$, define

$$p(A)=a_0I+a_1A+a_2A^2+\cdots+a_kA^k.$$

The identity matrix $I$ appears in the constant term.

For example, if

$$p(t)=t^3-2t+5,$$

then

$$p(A)=A^3-2A+5I.$$

This definition is purely algebraic. It uses only matrix addition, scalar multiplication, and matrix multiplication.
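
As a concrete check, this polynomial can be evaluated on a matrix directly with NumPy; the matrix below is a minimal sketch with an arbitrary example matrix.

```python
import numpy as np

# Arbitrary example matrix for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# p(A) = A^3 - 2A + 5I: matrix powers, scalar multiples,
# and the identity matrix in place of the constant term.
I = np.eye(2)
pA = np.linalg.matrix_power(A, 3) - 2 * A + 5 * I
print(pA)
```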

72.2 Powers as Matrix Functions

The function

$$f(t)=t^k$$

gives the matrix function

$$f(A)=A^k.$$

Powers of matrices are basic in discrete dynamical systems. If

$$x_{m+1}=Ax_m,$$

then

$$x_m=A^m x_0.$$

Thus the behavior of the system is controlled by the powers of $A$.

When $A$ is diagonalizable, powers are especially simple. If

$$A=PDP^{-1},$$

then

$$A^k=PD^kP^{-1}.$$

Since $D$ is diagonal, $D^k$ is obtained by raising each diagonal entry to the $k$-th power.
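
A minimal sketch of this computation in NumPy, comparing the eigendecomposition route against direct repeated multiplication (the matrix is an arbitrary diagonalizable example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # arbitrary diagonalizable example
k = 5

# Eigendecomposition A = P D P^{-1}.
eigvals, P = np.linalg.eig(A)
Dk = np.diag(eigvals ** k)              # D^k: entrywise powers on the diagonal
Ak_eig = P @ Dk @ np.linalg.inv(P)

# Direct computation for comparison.
Ak_direct = np.linalg.matrix_power(A, k)
print(np.allclose(Ak_eig, Ak_direct))   # True
```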

72.3 Functions of Diagonal Matrices

Let

$$D=\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}.$$

If $f$ is a scalar function defined at each $\lambda_i$, define

$$f(D)=\begin{bmatrix} f(\lambda_1) & 0 & \cdots & 0 \\ 0 & f(\lambda_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & f(\lambda_n) \end{bmatrix}.$$

Thus a function of a diagonal matrix is obtained by applying the function entry by entry to the diagonal.

For example,

$$e^D=\begin{bmatrix} e^{\lambda_1} & 0 & \cdots & 0 \\ 0 & e^{\lambda_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & e^{\lambda_n} \end{bmatrix}.$$

This diagonal rule is the model for the general theory.

72.4 Functions of Diagonalizable Matrices

Suppose $A$ is diagonalizable:

$$A=PDP^{-1}.$$

Then define

$$f(A)=Pf(D)P^{-1}.$$

This definition says: change to an eigenvector basis, apply $f$ to each eigenvalue, then change back.

If

$$D=\operatorname{diag}(\lambda_1,\ldots,\lambda_n),$$

then

$$f(D)=\operatorname{diag}(f(\lambda_1),\ldots,f(\lambda_n)).$$

Therefore

$$f(A)=P\,\operatorname{diag}(f(\lambda_1),\ldots,f(\lambda_n))\,P^{-1}.$$

For polynomial functions, this agrees with direct polynomial evaluation. If $A=PDP^{-1}$, then $p(A)=Pp(D)P^{-1}$, and $p(D)$ is obtained by applying $p$ to the diagonal entries.
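
The definition translates directly into a small helper. This is a sketch only: the name `fun_of_matrix` is illustrative, and it assumes $A$ is diagonalizable and $f$ is defined on its spectrum.

```python
import numpy as np

def fun_of_matrix(A, f):
    """f(A) = P f(D) P^{-1} for a diagonalizable matrix A (sketch)."""
    eigvals, P = np.linalg.eig(A)
    fD = np.diag(f(eigvals))           # apply f to each eigenvalue
    return P @ fD @ np.linalg.inv(P)

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
print(fun_of_matrix(A, np.exp))        # e^A via the spectral definition
```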

72.5 Example: A Matrix Square

Let

$$A=\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.$$

This matrix has eigenvalues

$$3 \qquad \text{and} \qquad 1.$$

One diagonalization is

$$A=PDP^{-1},$$

where

$$P=\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad D=\begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}.$$

Then

$$A^k=PD^kP^{-1}.$$

Since

$$D^k=\begin{bmatrix} 3^k & 0 \\ 0 & 1 \end{bmatrix},$$

we get

$$A^k=P\begin{bmatrix} 3^k & 0 \\ 0 & 1 \end{bmatrix}P^{-1}.$$

Thus powers of $A$ reduce to powers of its eigenvalues.
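
A quick numerical confirmation of this worked example (NumPy sketch):

```python
import numpy as np

P = np.array([[1.0, 1.0],
              [1.0, -1.0]])
D = np.diag([3.0, 1.0])
A = P @ D @ np.linalg.inv(P)           # reconstructs [[2, 1], [1, 2]]

k = 4
Ak = P @ np.diag([3.0 ** k, 1.0]) @ np.linalg.inv(P)
print(np.allclose(Ak, np.linalg.matrix_power(A, k)))   # True
```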

72.6 Matrix Exponential

The matrix exponential is one of the most important matrix functions.

For a square matrix $A$, define

$$e^A = \sum_{k=0}^{\infty}\frac{A^k}{k!}.$$

That is,

$$e^A = I+A+\frac{A^2}{2!}+\frac{A^3}{3!}+\cdots.$$

This series converges for every real or complex square matrix, so the matrix exponential is always well-defined.

If $A$ is diagonalizable and

$$A=PDP^{-1},$$

then

$$e^A=Pe^DP^{-1}.$$

If

$$D=\operatorname{diag}(\lambda_1,\ldots,\lambda_n),$$

then

$$e^D=\operatorname{diag}(e^{\lambda_1},\ldots,e^{\lambda_n}).$$
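
In practice one does not sum the series naively; SciPy's `expm` implements a scaling-and-squaring algorithm. A sketch comparing it with a truncated Taylor series (the truncation is for illustration only, not a robust algorithm):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])            # arbitrary example

# Truncated power series: I + A + A^2/2! + ...
series = np.zeros_like(A)
term = np.eye(2)
for k in range(1, 20):
    series += term                     # add A^{k-1}/(k-1)!
    term = term @ A / k                # next term A^k/k!

print(np.allclose(series, expm(A)))    # True to working precision
```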

72.7 Matrix Exponential and Differential Equations

The matrix exponential solves constant-coefficient linear systems.

Consider

$$x'(t)=Ax(t),$$

with initial condition

$$x(0)=x_0.$$

The solution is

$$x(t)=e^{tA}x_0.$$

This is the matrix analogue of the scalar equation

$$x'(t)=ax(t),$$

whose solution is

$$x(t)=e^{at}x(0).$$

The matrix exponential is therefore the natural evolution operator, or solution operator, for constant-coefficient linear systems.
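
A sketch of the exponential as evolution operator: propagate an arbitrary initial state and verify by finite differences that the trajectory satisfies $x'(t)=Ax(t)$. The matrix, state, and tolerances here are example values.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])           # arbitrary stable example
x0 = np.array([1.0, 0.0])

t, h = 0.5, 1e-6
x = lambda s: expm(s * A) @ x0         # x(s) = e^{sA} x0

# Central-difference approximation of x'(t) should match A x(t).
deriv = (x(t + h) - x(t - h)) / (2 * h)
print(np.allclose(deriv, A @ x(t), atol=1e-6))   # True
```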

72.8 Exponential of a Diagonalizable Matrix

Let

$$A=PDP^{-1},$$

where

$$D=\operatorname{diag}(\lambda_1,\ldots,\lambda_n).$$

Then

$$e^{tA}=Pe^{tD}P^{-1}.$$

Since

$$e^{tD} = \operatorname{diag}(e^{t\lambda_1},\ldots,e^{t\lambda_n}),$$

we have

$$e^{tA} = P\,\operatorname{diag}(e^{t\lambda_1},\ldots,e^{t\lambda_n})\,P^{-1}.$$

This formula separates the solution into independent modes. Each eigenvalue contributes a scalar exponential factor.

If $\operatorname{Re}(\lambda_i)<0$, that mode decays.

If $\operatorname{Re}(\lambda_i)>0$, that mode grows.

If $\operatorname{Re}(\lambda_i)=0$, that mode persists or oscillates.

72.9 Exponential of a Nilpotent Matrix

A matrix $N$ is nilpotent if

$$N^m=0$$

for some positive integer $m$.

For such a matrix, the exponential series terminates:

$$e^N = I+N+\frac{N^2}{2!}+\cdots+\frac{N^{m-1}}{(m-1)!}.$$

All higher powers vanish.

For example, let

$$N=\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.$$

Then

$$N^2=0.$$

Therefore

$$e^N=I+N=\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}.$$

Nilpotent matrices explain the polynomial factors that occur in exponentials of Jordan blocks.
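
A direct check of the terminating series (NumPy/SciPy sketch):

```python
import numpy as np
from scipy.linalg import expm

N = np.array([[0.0, 1.0],
              [0.0, 0.0]])

print(np.allclose(N @ N, 0))                   # True: N^2 = 0
print(np.allclose(expm(N), np.eye(2) + N))     # True: e^N = I + N
```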

72.10 Functions of Jordan Blocks

Let

$$J=\lambda I+N$$

be a Jordan block, where $N$ is nilpotent.

For a polynomial or analytic function $f$, the function of $J$ is obtained from the finite Taylor expansion

$$f(J) = f(\lambda)I + f'(\lambda)N + \frac{f''(\lambda)}{2!}N^2 + \cdots + \frac{f^{(k-1)}(\lambda)}{(k-1)!}N^{k-1}.$$

Here $k$ is the size of the Jordan block.

For the exponential function,

$$e^J=e^{\lambda I+N}.$$

Since $\lambda I$ commutes with $N$,

$$e^J=e^\lambda e^N.$$

This is the standard way to compute matrix functions through the Jordan form: for a Jordan block, $f(J)$ involves derivatives of the scalar function at the eigenvalue.
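
A sketch evaluating the Taylor formula for a $3\times 3$ Jordan block with $f=\exp$, where every derivative of $f$ at $\lambda$ is again $e^\lambda$, and comparing against `expm` (the eigenvalue and block size are example values):

```python
import numpy as np
from math import factorial
from scipy.linalg import expm

lam, k = 2.0, 3                                     # example eigenvalue and size
J = lam * np.eye(k) + np.diag(np.ones(k - 1), 1)    # J = lambda*I + N
N = J - lam * np.eye(k)

# f(J) = sum_{j<k} f^{(j)}(lam)/j! * N^j, with f = exp so f^{(j)}(lam) = e^lam.
fJ = sum(np.exp(lam) / factorial(j) * np.linalg.matrix_power(N, j)
         for j in range(k))
print(np.allclose(fJ, expm(J)))                     # True
```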

72.11 Matrix Functions and the Minimal Polynomial

The minimal polynomial controls matrix functions algebraically.

Suppose the minimal polynomial $m_A(t)$ has degree $r$. Every polynomial in $A$ can be reduced modulo $m_A(t)$ to a polynomial of degree less than $r$.

If

$$p(t)=q(t)m_A(t)+s(t),$$

where

$$\deg s<\deg m_A,$$

then

$$p(A)=s(A),$$

because

$$m_A(A)=0.$$

Thus polynomial functions of $A$ live in the finite-dimensional algebra

$$F[A]=\operatorname{span}\{I,A,A^2,\ldots,A^{r-1}\}.$$

The Cayley-Hamilton theorem, via the characteristic polynomial, gives the weaker but always available reduction to degree less than $n$.
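
For instance, the matrix from 72.5 has eigenvalues 3 and 1, so its minimal polynomial is $(t-3)(t-1)=t^2-4t+3$ and every power of $A$ collapses to a linear combination of $I$ and $A$. A quick check:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # eigenvalues 3 and 1

# Minimal polynomial m_A(t) = t^2 - 4t + 3, so A^2 = 4A - 3I.
print(np.allclose(A @ A, 4 * A - 3 * np.eye(2)))   # True
```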

72.12 Interpolation Definition

For many functions, $f(A)$ can be described by polynomial interpolation.

If $A$ is diagonalizable with distinct eigenvalues

$$\lambda_1,\ldots,\lambda_k,$$

choose a polynomial $p(t)$ such that

$$p(\lambda_i)=f(\lambda_i)$$

for every $i$.

Then define

$$f(A)=p(A).$$

This is well-defined: any two such polynomials differ by a polynomial that vanishes at all eigenvalues, and hence annihilates $A$ when $A$ is diagonalizable with those eigenvalues.

For non-diagonalizable matrices, interpolation must also match derivatives up to the sizes of the Jordan blocks. This is Hermite interpolation.
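
A sketch of the interpolation definition for distinct eigenvalues, building the Lagrange form $p(A)=\sum_i f(\lambda_i)\prod_{j\ne i}(A-\lambda_j I)/(\lambda_i-\lambda_j)$ with $f=\exp$ and comparing against `expm`:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lams = np.linalg.eigvals(A)            # assumed distinct (one per dimension here)
n = len(lams)

# Lagrange interpolation of f = exp at the eigenvalues, evaluated at A.
fA = np.zeros_like(A)
for i in range(n):
    term = np.exp(lams[i]) * np.eye(n)
    for j in range(n):
        if j != i:
            term = term @ (A - lams[j] * np.eye(n)) / (lams[i] - lams[j])
    fA += term

print(np.allclose(fA, expm(A)))        # True
```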

72.13 Matrix Square Roots

A matrix $B$ is a square root of $A$ if

$$B^2=A.$$

We write

$$B=A^{1/2}$$

when a particular square root is chosen.

If $A$ is diagonalizable and

$$A=PDP^{-1},$$

then a square root may be constructed by taking square roots of the eigenvalues:

$$A^{1/2} = P D^{1/2} P^{-1},$$

where

$$D^{1/2} = \operatorname{diag}(\sqrt{\lambda_1},\ldots,\sqrt{\lambda_n}).$$

This requires choosing a square root of each eigenvalue; different choices give different square roots of $A$.

For symmetric positive semidefinite matrices, there is a unique symmetric positive semidefinite square root. This case is especially important in statistics, covariance matrices, optimization, and numerical analysis.
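
For a symmetric positive semidefinite matrix, the unique symmetric PSD square root can be computed with an orthogonal eigendecomposition; `scipy.linalg.sqrtm` handles the general case. A sketch with an arbitrary positive definite example:

```python
import numpy as np
from scipy.linalg import sqrtm

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # symmetric positive definite example

# Symmetric PSD square root via the spectral theorem.
w, Q = np.linalg.eigh(A)                # A = Q diag(w) Q^T, Q orthogonal
root = Q @ np.diag(np.sqrt(w)) @ Q.T

print(np.allclose(root @ root, A))      # True
print(np.allclose(root, sqrtm(A)))      # matches the general-purpose sqrtm
```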

72.14 Matrix Logarithm

A matrix logarithm of $A$ is a matrix $B$ such that

$$e^B=A.$$

We write

$$B=\log A$$

when a particular branch is chosen.

The matrix logarithm is more delicate than the exponential. It may not be unique, and existence depends on spectral conditions and the field; over $\mathbb{C}$, a logarithm exists precisely when $A$ is invertible.

If $A$ is diagonalizable and

$$A=PDP^{-1},$$

then one may define

$$\log A=P(\log D)P^{-1},$$

where

$$\log D= \operatorname{diag}(\log \lambda_1,\ldots,\log \lambda_n).$$

This requires choosing branches of the scalar logarithm for the eigenvalues.

The logarithm is important in Lie theory, differential equations, geometric integration, and matrix means.
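
A sketch using SciPy's `logm`, which computes a principal logarithm, together with the round trip $e^{\log A}=A$:

```python
import numpy as np
from scipy.linalg import expm, logm

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])              # positive definite, so a real log exists

L = logm(A)                             # principal matrix logarithm
print(np.allclose(expm(L), A))          # True: e^{log A} = A
```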

72.15 Trigonometric Matrix Functions

Trigonometric functions can also be defined by power series.

For a square matrix $A$,

$$\cos A = I-\frac{A^2}{2!}+\frac{A^4}{4!}-\cdots,$$

and

$$\sin A = A-\frac{A^3}{3!}+\frac{A^5}{5!}-\cdots.$$

These definitions parallel the scalar power series.

Matrix trigonometric functions appear in oscillatory systems, wave equations, rotations, and second-order differential equations.

A general theory of matrix functions covers the exponential, logarithm, square root, and trigonometric functions in a single framework, through power series or spectral definitions.
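
SciPy provides `cosm` and `sinm`; a sketch verifying the identity $\cos^2 A+\sin^2 A=I$, which survives in matrix form because $A$ commutes with itself:

```python
import numpy as np
from scipy.linalg import cosm, sinm

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])             # arbitrary example

C, S = cosm(A), sinm(A)
print(np.allclose(C @ C + S @ S, np.eye(2)))   # True
```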

72.16 Commuting Matrices

For scalars,

$$e^{x+y}=e^xe^y.$$

For matrices, this identity requires commutation.

If

$$AB=BA,$$

then

$$e^{A+B}=e^Ae^B.$$

If $A$ and $B$ do not commute, this equality may fail. This is one of the main differences between scalar functions and matrix functions: the identity $e^{A+B}=e^Ae^B$ holds for commuting matrices, but not in general.

Similarly, many scalar identities become conditional in matrix algebra because multiplication is not commutative.
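
A sketch of a noncommuting counterexample:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0, 0.0],
              [1.0, 0.0]])

print(np.allclose(A @ B, B @ A))                    # False: A, B do not commute
print(np.allclose(expm(A + B), expm(A) @ expm(B)))  # False: identity fails
```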

72.17 Functions Preserve Similarity

Matrix functions are compatible with similarity.

If

$$B=P^{-1}AP,$$

then for any polynomial $p$,

$$p(B)=P^{-1}p(A)P.$$

The same relation holds for matrix functions defined by power series or spectral calculus:

$$f(B)=P^{-1}f(A)P.$$

Thus similar matrices have similar matrix functions.

This is essential because a matrix function should describe the underlying linear transformation, not the accidental choice of basis.

72.18 Spectral Mapping

For many standard matrix functions, eigenvalues transform according to the scalar function.

If

$$Av=\lambda v,$$

then for a polynomial $p$,

$$p(A)v=p(\lambda)v.$$

Thus $v$ is also an eigenvector of $p(A)$, with eigenvalue $p(\lambda)$.

More generally, when $f(A)$ is defined through an appropriate functional calculus, the eigenvalues of $f(A)$ are

$$f(\lambda),$$

where $\lambda$ ranges over the eigenvalues of $A$, with multiplicities handled according to the chosen setting.

This principle is called spectral mapping.
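
A sketch of the spectral mapping check for $f=\exp$: the eigenvalues of $e^A$ should be the exponentials of the eigenvalues of $A$ (the matrix is an arbitrary example):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])              # example with eigenvalues 2 and 3

eigs_fA = np.sort(np.linalg.eigvals(expm(A)))
f_eigs = np.sort(np.exp(np.linalg.eigvals(A)))
print(np.allclose(eigs_fA, f_eigs))     # True
```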

72.19 Matrix Functions of Normal Matrices

If $A$ is normal over $\mathbb{C}$, then

$$A=UDU^*$$

with $U$ unitary and $D$ diagonal.

For such matrices,

$$f(A)=Uf(D)U^*.$$

This is the cleanest setting for matrix functions. The unitary matrix $U$ preserves norms and inner products, so the computation is stable and geometrically transparent.

Hermitian matrices, unitary matrices, and real symmetric matrices are important special cases.

For a Hermitian matrix, if $f$ is real-valued on the spectrum, then $f(A)$ is Hermitian.

For a positive definite Hermitian matrix, functions such as

$$A^{1/2}, \qquad \log A, \qquad A^\alpha$$

are especially well behaved.
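
For Hermitian (or real symmetric) matrices, `eigh` gives the unitary diagonalization directly, and the functional calculus is numerically stable. A sketch with an illustrative helper name, applying a real-valued $f$ and confirming the result is symmetric:

```python
import numpy as np

def fun_of_symmetric(A, f):
    """f(A) = U f(D) U^T for a real symmetric matrix A (sketch)."""
    w, U = np.linalg.eigh(A)            # orthogonal U, real eigenvalues w
    return U @ np.diag(f(w)) @ U.T

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
fA = fun_of_symmetric(A, np.exp)
print(np.allclose(fA, fA.T))            # True: f(A) is symmetric
```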

72.20 Numerical Computation

Computing matrix functions numerically is a separate problem from defining them.

For small diagonalizable matrices, an eigenvalue decomposition may be convenient. For symmetric or Hermitian matrices, unitary diagonalization is often stable.

For general matrices, eigenvalue methods may be unstable, especially near repeated eigenvalues or defective matrices. Practical algorithms often use Schur decomposition, scaling and squaring, Padé approximants, Krylov methods, or specialized iterative methods.

For the matrix exponential, common numerical approaches include scaling and squaring with Padé approximation, Taylor methods, and methods based on Schur decomposition. General-purpose methods vary in stability depending on the matrix class.
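
SciPy's `funm` implements a Schur-based general-purpose evaluation; a sketch comparing it with the specialized `expm` on an arbitrary example:

```python
import numpy as np
from scipy.linalg import expm, funm

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])              # arbitrary example

# funm evaluates f(A) via a Schur decomposition; expm uses a specialized
# scaling-and-squaring algorithm. For well-behaved matrices they agree.
print(np.allclose(funm(A, np.exp), expm(A)))   # True
```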

72.21 Applications

Matrix functions appear whenever a scalar transformation must act on a linear operator.

| Function | Typical use |
| --- | --- |
| $A^k$ | Discrete dynamics, Markov chains |
| $e^{tA}$ | Linear differential equations |
| $A^{1/2}$ | Covariance matrices, positive definite geometry |
| $\log A$ | Lie groups, matrix means, geometric integration |
| $\sin A,\ \cos A$ | Oscillatory systems |
| $(A+\alpha I)^{-1}$ | Regularization, resolvents |
| $f(L)$ for graph Laplacian $L$ | Graph filters, diffusion, spectral methods |

The common idea is that the matrix represents a linear transformation, and the function changes its spectral behavior.

72.22 Common Errors

The first common error is to apply a scalar function entrywise to a general matrix. For example, $e^A$ does not mean exponentiating each entry of $A$; entrywise exponentiation is a different operation.
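
A sketch making the distinction concrete: `np.exp` is entrywise, while `scipy.linalg.expm` is the matrix exponential, and the two disagree already for a simple nilpotent matrix:

```python
import numpy as np
from scipy.linalg import expm

N = np.array([[0.0, 1.0],
              [0.0, 0.0]])

print(np.exp(N))   # entrywise: [[1, e], [1, 1]] -- not the matrix exponential
print(expm(N))     # matrix exponential: I + N = [[1, 1], [0, 1]]
```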

The second common error is to assume scalar identities always hold. Matrix multiplication is noncommutative, so identities such as

$$e^{A+B}=e^Ae^B$$

require that the matrices commute.

The third common error is to ignore the spectrum. Functions such as $\log A$ and $A^{1/2}$ depend on eigenvalues and branch choices.

The fourth common error is to diagonalize numerically without checking conditioning. A matrix may be diagonalizable in theory but poorly conditioned in practice.

The fifth common error is to confuse the characteristic polynomial with the minimal polynomial. The minimal polynomial gives the sharper algebraic reduction for functions of $A$.

72.23 Summary

A matrix function applies a scalar function to a square matrix.

For polynomials,

$$p(A)=a_0I+a_1A+\cdots+a_kA^k.$$

For diagonalizable matrices,

$$A=PDP^{-1}$$

gives

$$f(A)=Pf(D)P^{-1}.$$

For Jordan blocks, derivatives of $f$ enter through the nilpotent part.

The matrix exponential is defined by

$$e^A=\sum_{k=0}^{\infty}\frac{A^k}{k!},$$

and it solves linear systems of differential equations.

Matrix functions generalize powers, exponentials, square roots, logarithms, and trigonometric functions to linear transformations. They are controlled by eigenvalues, minimal polynomials, Jordan structure, and similarity.