Appendix E. Calculus Review

Linear algebra can be developed without much calculus, but many applications use both subjects together. Differential equations, optimization, least squares, matrix exponentials, Fourier analysis, numerical methods, and machine learning all rely on calculus ideas.

This appendix reviews the calculus needed later in the book. The purpose is not to replace a calculus course. It is to collect the definitions and formulas that occur most often in linear algebra applications.

E.1 Functions of One Variable

A function of one real variable assigns a real number f(x) to each input x in its domain.

f : D \to \mathbb{R}, \qquad x \mapsto f(x).

For example,

f(x)=x^2+3x-1

defines a function on all real numbers.

The graph of f is the set of points

\{(x,f(x)) : x \in D\}.

A function may be studied locally, near a single point, or globally, over its whole domain. Calculus studies how functions change, how they accumulate area, and how they can be approximated by simpler functions.

E.2 Limits

The limit

\lim_{x\to a} f(x)=L

means that f(x) becomes arbitrarily close to L when x is sufficiently close to a, with x \neq a.

Limits describe local behavior. The limit at a does not depend on the value f(a) itself; a function may have a limit at a even if f(a) is undefined.

For example,

f(x)=\frac{x^2-1}{x-1}

is undefined at x=1, but for x \neq 1,

\frac{x^2-1}{x-1} = \frac{(x-1)(x+1)}{x-1} = x+1.

Thus

\lim_{x\to 1}\frac{x^2-1}{x-1}=2.
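
A quick numerical check (a sketch in Python, assuming NumPy is available; not part of the proof) illustrates this: evaluating the quotient at points closer and closer to 1 gives values approaching 2.

```python
import numpy as np

# Evaluate (x^2 - 1)/(x - 1) at points approaching 1 (but never equal to 1).
for h in [0.1, 0.01, 0.001, 1e-6]:
    x = 1 + h
    print(x, (x**2 - 1) / (x - 1))   # values approach 2
```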

Limits are used to define derivatives, continuity, integrals, and infinite series.

E.3 Continuity

A function f is continuous at a if

\lim_{x\to a} f(x)=f(a).

This condition includes three requirements:

| Requirement | Meaning |
| --- | --- |
| f(a) exists | The function is defined at a |
| \lim_{x\to a}f(x) exists | Nearby values approach one number |
| The two are equal | The function has no jump or hole at a |

A function is continuous on an interval if it is continuous at every point of that interval.

Polynomials are continuous everywhere. Rational functions are continuous wherever their denominators are nonzero.

Continuity matters in linear algebra because many matrix-valued expressions depend continuously on their entries. Determinants, matrix products, eigenvalue approximations, and norms are all studied using continuity.

E.4 Derivatives

The derivative of f at a is

f'(a) = \lim_{h\to 0} \frac{f(a+h)-f(a)}{h},

when this limit exists.

The derivative measures the instantaneous rate of change of f at a. Geometrically, it is the slope of the tangent line to the graph of f at a.

For example, if

f(x)=x^2,

then

f'(a) = \lim_{h\to 0} \frac{(a+h)^2-a^2}{h}.

Expanding,

(a+h)^2-a^2 = 2ah+h^2.

Thus

\frac{(a+h)^2-a^2}{h} = 2a+h.

Taking the limit gives

f'(a)=2a.

Therefore,

\frac{d}{dx}x^2=2x.
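
As an illustration (a minimal Python sketch, not part of the derivation), a difference quotient with a small h approximates this derivative numerically:

```python
# Approximate f'(a) for f(x) = x^2 by a difference quotient with small h.
def diff_quotient(f, a, h=1e-6):
    return (f(a + h) - f(a)) / h

f = lambda x: x**2
a = 3.0
print(diff_quotient(f, a))  # close to 2*a = 6
```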

E.5 Basic Differentiation Rules

The following rules are used throughout applied linear algebra.

| Rule | Formula |
| --- | --- |
| Constant rule | \frac{d}{dx}c=0 |
| Power rule | \frac{d}{dx}x^n=nx^{n-1} |
| Constant multiple rule | \frac{d}{dx}(cf)=cf' |
| Sum rule | \frac{d}{dx}(f+g)=f'+g' |
| Product rule | \frac{d}{dx}(fg)=f'g+fg' |
| Quotient rule | \frac{d}{dx}\left(\frac{f}{g}\right)=\frac{f'g-fg'}{g^2} |
| Chain rule | \frac{d}{dx}f(g(x))=f'(g(x))g'(x) |

Example

Let

f(x)=(x^2+1)^5.

By the chain rule,

f'(x) = 5(x^2+1)^4(2x) = 10x(x^2+1)^4.
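
A small numerical sketch (plain Python, illustrative only) confirms the chain-rule result by comparing the formula with a central difference quotient at a sample point:

```python
# Compare the chain-rule formula with a numerical derivative for f(x) = (x^2 + 1)^5.
def f(x):
    return (x**2 + 1)**5

def fprime(x):
    return 10 * x * (x**2 + 1)**4    # chain-rule result

x, h = 1.5, 1e-6
numeric = (f(x + h) - f(x - h)) / (2 * h)   # central difference
print(fprime(x), numeric)                   # the two values agree closely
```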

The chain rule is especially important in optimization and machine learning, where functions are often built as compositions of simpler maps.

E.6 Higher Derivatives

The second derivative is the derivative of the derivative:

f''(x)=\frac{d}{dx}f'(x).

More generally, the k-th derivative is denoted by

f^{(k)}(x).

If

f(x)=x^4,

then

f'(x)=4x^3, \qquad f''(x)=12x^2, \qquad f^{(3)}(x)=24x,

and

f^{(4)}(x)=24.

The first derivative measures slope. The second derivative measures curvature. Higher derivatives appear in Taylor polynomials and error estimates.

E.7 Critical Points and Optimization

A critical point of a differentiable function f is a point a where

f'(a)=0.

At such a point, the tangent line is horizontal. Local maxima and local minima often occur at critical points.

If f is twice differentiable, then the second derivative test gives a useful classification:

| Condition | Conclusion |
| --- | --- |
| f'(a)=0, \ f''(a)>0 | Local minimum |
| f'(a)=0, \ f''(a)<0 | Local maximum |
| f'(a)=0, \ f''(a)=0 | Test inconclusive |

Example

Let

f(x)=x^2-4x+7.

Then

f'(x)=2x-4.

Set

2x-4=0.

Thus

x=2.

Since

f''(x)=2>0,

the function has a local minimum at x=2. The minimum value is

f(2)=4-8+7=3.
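
A brief check (a sketch using NumPy, not required by the argument) evaluates f on a grid and locates the smallest value, which occurs near x = 2 with value 3:

```python
import numpy as np

# Evaluate f(x) = x^2 - 4x + 7 on a grid and locate the smallest value.
x = np.linspace(-1, 5, 601)
y = x**2 - 4*x + 7
i = np.argmin(y)
print(x[i], y[i])   # approximately x = 2, f(2) = 3
```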

Optimization in higher dimensions generalizes this idea using gradients and Hessian matrices.

E.8 Integrals

The definite integral

\int_a^b f(x)\,dx

measures signed accumulation of f over the interval [a,b]. Geometrically, it measures signed area under the graph.

An antiderivative of f is a function F such that

F'(x)=f(x).

The indefinite integral is written

\int f(x)\,dx = F(x)+C,

where C is an arbitrary constant.

Example

Since

\frac{d}{dx}x^3=3x^2,

we have

\int 3x^2\,dx=x^3+C.

Integrals appear in continuous least squares, Fourier coefficients, inner products of functions, probability, and differential equations.

E.9 Fundamental Theorem of Calculus

The fundamental theorem of calculus connects derivatives and integrals.

If f is continuous on [a,b] and F is an antiderivative of f, then

\int_a^b f(x)\,dx = F(b)-F(a).

For example,

\int_0^1 2x\,dx = \left[x^2\right]_0^1 = 1^2-0^2 = 1.
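
As an illustration (a sketch assuming NumPy), a simple midpoint-rule quadrature over [0, 1] reproduces the value given by the fundamental theorem:

```python
import numpy as np

# Midpoint-rule approximation of the integral of 2x over [0, 1].
n = 100_000
x_mid = (np.arange(n) + 0.5) / n        # midpoints of n equal subintervals
approx = np.sum(2 * x_mid) * (1.0 / n)  # sum of f(midpoint) * width
print(approx)                           # very close to F(1) - F(0) = 1
```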

This theorem turns many integral problems into antiderivative problems. In linear algebra applications, it justifies many formulas involving continuous inner products and energy norms.

E.10 Basic Integration Rules

| Rule | Formula |
| --- | --- |
| Constant multiple | \int cf(x)\,dx=c\int f(x)\,dx |
| Sum rule | \int(f+g)\,dx=\int f\,dx+\int g\,dx |
| Power rule | \int x^n\,dx=\frac{x^{n+1}}{n+1}+C,\ n\neq -1 |
| Reciprocal rule | \int \frac{1}{x}\,dx=\ln\lvert x\rvert+C |
| Exponential rule | \int e^x\,dx=e^x+C |
| Sine rule | \int \sin x\,dx=-\cos x+C |
| Cosine rule | \int \cos x\,dx=\sin x+C |

Example

\int (3x^2-4x+1)\,dx = x^3-2x^2+x+C.

Integration rules are used less often than differentiation rules in basic linear algebra, but they become essential when vector spaces of functions are studied.

E.11 Integration by Parts

Integration by parts follows from the product rule. If u and v are differentiable functions, then

\int u\,dv = uv-\int v\,du.

Equivalently,

\int_a^b u(x)v'(x)\,dx = \left[u(x)v(x)\right]_a^b - \int_a^b u'(x)v(x)\,dx.

This identity is important in differential equations, Fourier analysis, and weak formulations of linear systems.

Example

Compute

\int x e^x\,dx.

Let

u=x, \qquad dv=e^x\,dx.

Then

du=dx, \qquad v=e^x.

Therefore,

\int x e^x\,dx = xe^x-\int e^x\,dx = xe^x-e^x+C.
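
As a hedged check (a sketch assuming SciPy is available), a numerical quadrature of x e^x over [0, 1] matches the antiderivative evaluated at the endpoints:

```python
import math
from scipy.integrate import quad

# Antiderivative from integration by parts: F(x) = x e^x - e^x.
F = lambda x: x * math.exp(x) - math.exp(x)

numeric, _ = quad(lambda x: x * math.exp(x), 0.0, 1.0)
exact = F(1.0) - F(0.0)      # equals 1
print(numeric, exact)
```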

E.12 Functions of Several Variables

A function of several variables has the form

f:\mathbb{R}^n\to\mathbb{R}.

For example,

f(x,y)=x^2+xy+y^2

is a function from \mathbb{R}^2 to \mathbb{R}.

In vector notation, we often write

f(x)

where

x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}.

Such functions occur constantly in optimization. A common example is the least squares objective

f(x)=\|Ax-b\|^2.

This is a scalar-valued function of a vector variable.
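
To make this concrete, here is a minimal NumPy sketch (the matrix A and vector b below are made-up illustrative data) that evaluates the least squares objective for a given x:

```python
import numpy as np

# Illustrative data (not from the text): a 3x2 matrix A and a vector b.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

def objective(x):
    """Least squares objective f(x) = ||Ax - b||^2."""
    r = A @ x - b
    return r @ r

print(objective(np.array([1.0, 0.5])))
```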

E.13 Partial Derivatives

The partial derivative of f(x_1,\ldots,x_n) with respect to x_i measures how f changes when x_i varies and all other variables are held fixed.

It is denoted by

\frac{\partial f}{\partial x_i}.

Example

Let

f(x,y)=x^2y+3y^2.

Then

\frac{\partial f}{\partial x} = 2xy,

because y is treated as constant.

Also,

\frac{\partial f}{\partial y} = x^2+6y.
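
A short sketch (plain Python, illustrative only) compares these formulas with finite-difference approximations at a sample point:

```python
# Check the partial derivatives of f(x, y) = x^2*y + 3*y^2 at a sample point.
def f(x, y):
    return x**2 * y + 3 * y**2

x, y, h = 2.0, -1.0, 1e-6
fx_numeric = (f(x + h, y) - f(x - h, y)) / (2 * h)   # approximates 2xy
fy_numeric = (f(x, y + h) - f(x, y - h)) / (2 * h)   # approximates x^2 + 6y
print(fx_numeric, 2 * x * y)
print(fy_numeric, x**2 + 6 * y)
```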

Partial derivatives are the building blocks of gradients, Jacobians, and Hessians.

E.14 Gradient

For a differentiable function

f:\mathbb{R}^n\to\mathbb{R},

the gradient is the vector of partial derivatives:

\nabla f(x) = \begin{bmatrix} \frac{\partial f}{\partial x_1}(x) \\ \frac{\partial f}{\partial x_2}(x) \\ \vdots \\ \frac{\partial f}{\partial x_n}(x) \end{bmatrix}.

The gradient points in the direction of steepest increase of the function.

Example

Let

f(x,y)=x^2+xy+y^2.

Then

\nabla f(x,y) = \begin{bmatrix} 2x+y \\ x+2y \end{bmatrix}.

For optimization, critical points satisfy

\nabla f(x)=0.

In least squares, setting a gradient equal to zero leads to the normal equations.
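
For the least squares objective f(x) = \|Ax-b\|^2 from Section E.12, setting the gradient to zero yields the standard normal equations A^T A x = A^T b. The following NumPy sketch (with made-up illustrative data) solves them and compares the result with a library least squares routine:

```python
import numpy as np

# Illustrative data; any full-column-rank A works for this sketch.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

# Normal equations A^T A x = A^T b, obtained by setting grad ||Ax - b||^2 to zero.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Reference solution from a standard least squares routine.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal, x_lstsq)   # the two solutions agree
```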

E.15 Hessian Matrix

The Hessian matrix of a twice differentiable function

f:\mathbb{R}^n\to\mathbb{R}

is the matrix of second partial derivatives:

H_f(x) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1\partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1\partial x_n} \\ \frac{\partial^2 f}{\partial x_2\partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n\partial x_1} & \frac{\partial^2 f}{\partial x_n\partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}.

The Hessian describes local curvature.

Example

For

f(x,y)=x^2+xy+y^2,

we have

H_f(x,y) = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.

This matrix is constant because f is a quadratic function.
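
A short NumPy sketch (illustrative, not part of the text's development) checks that this Hessian is positive definite by computing its eigenvalues:

```python
import numpy as np

# Hessian of f(x, y) = x^2 + x*y + y^2.
H = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues = np.linalg.eigvalsh(H)   # symmetric matrix, real eigenvalues
print(eigenvalues)                    # 1 and 3, both positive, so the curvature is positive
```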

Quadratic functions are central in linear algebra because their gradients are linear functions and their Hessians are constant matrices.

E.16 Directional Derivatives

Let

f:\mathbb{R}^n\to\mathbb{R}

be differentiable, and let u \in \mathbb{R}^n. The directional derivative of f at x in the direction u is

D_u f(x) = \lim_{t\to 0} \frac{f(x+tu)-f(x)}{t}.

If u is a unit vector, then D_u f(x) measures the rate of change of f per unit distance in the direction u.

For differentiable functions,

D_u f(x)=\nabla f(x)\cdot u.

Thus the gradient contains all directional derivative information.
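
This identity can be checked numerically. The sketch below (NumPy, illustrative only, reusing the example function from Section E.14) compares a difference quotient along a unit direction u with the dot product of the gradient and u:

```python
import numpy as np

# f(x, y) = x^2 + x*y + y^2 and its gradient from Section E.14.
f = lambda v: v[0]**2 + v[0]*v[1] + v[1]**2
grad = lambda v: np.array([2*v[0] + v[1], v[0] + 2*v[1]])

x = np.array([1.0, 2.0])
u = np.array([3.0, 4.0]) / 5.0        # unit vector
t = 1e-6
numeric = (f(x + t*u) - f(x)) / t     # difference quotient for D_u f(x)
print(numeric, grad(x) @ u)           # the two values agree closely
```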

E.17 Jacobian Matrix

For a differentiable vector-valued function

F:\mathbb{R}^n\to\mathbb{R}^m,

where

F(x)= \begin{bmatrix} F_1(x) \\ \vdots \\ F_m(x) \end{bmatrix},

the Jacobian matrix is

J_F(x) = \begin{bmatrix} \frac{\partial F_1}{\partial x_1} & \cdots & \frac{\partial F_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial F_m}{\partial x_1} & \cdots & \frac{\partial F_m}{\partial x_n} \end{bmatrix}.

The Jacobian is the best linear approximation to F near x.

For a linear map

F(x)=Ax,

the Jacobian is simply

J_F(x)=A.

Thus matrices appear naturally as derivatives of vector-valued functions.
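
A small NumPy sketch (with an arbitrary illustrative matrix A) recovers the Jacobian of F(x) = Ax by finite differences and confirms that it equals A:

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 1.0]])      # arbitrary 2x3 example
F = lambda x: A @ x                  # linear map F(x) = Ax

def jacobian(F, x, h=1e-6):
    """Finite-difference Jacobian: column j approximates dF/dx_j."""
    n = len(x)
    cols = [(F(x + h*e) - F(x)) / h for e in np.eye(n)]
    return np.column_stack(cols)

x0 = np.array([1.0, -1.0, 2.0])
print(jacobian(F, x0))               # numerically equal to A
```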

E.18 Taylor Polynomials

Taylor polynomials approximate a differentiable function near a point by a polynomial. For a function f with sufficiently many derivatives, the Taylor polynomial of degree n about a is

T_n(x) = f(a)+f'(a)(x-a)+\frac{f''(a)}{2!}(x-a)^2+\cdots+\frac{f^{(n)}(a)}{n!}(x-a)^n.

The corresponding Taylor series is

\sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}(x-a)^k.

When a=0, it is called a Maclaurin series.

Taylor expansions are used to approximate nonlinear functions by linear or quadratic functions. This is the bridge from nonlinear problems back to linear algebra.

For small h,

f(a+h) \approx f(a)+f'(a)h.

This is the first-order, or linear, approximation.

For a function of several variables,

f(x+h) \approx f(x)+\nabla f(x)^T h.

The right-hand side is affine in h. Its linear part is determined by the gradient.

E.19 Common Taylor Series

The following Taylor series are frequently used:

| Function | Series near 0 |
| --- | --- |
| e^x | \sum_{k=0}^{\infty}\frac{x^k}{k!} |
| \sin x | \sum_{k=0}^{\infty}(-1)^k\frac{x^{2k+1}}{(2k+1)!} |
| \cos x | \sum_{k=0}^{\infty}(-1)^k\frac{x^{2k}}{(2k)!} |
| \frac{1}{1-x} | \sum_{k=0}^{\infty}x^k, \quad \lvert x\rvert<1 |
| \ln(1+x) | \sum_{k=1}^{\infty}(-1)^{k+1}\frac{x^k}{k}, \quad -1<x\le 1 |

These expansions are used in matrix functions. For example, the matrix exponential is defined by replacing x with a square matrix A:

e^A = I+A+\frac{A^2}{2!}+\frac{A^3}{3!}+\cdots.
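
A hedged sketch (assuming SciPy is installed) compares a truncated version of this series with scipy.linalg.expm for a small example matrix:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])          # small illustrative matrix
S = np.eye(2)                        # running partial sum, starting with I
term = np.eye(2)
for k in range(1, 20):               # add A^k / k! term by term
    term = term @ A / k
    S = S + term

print(S)
print(expm(A))                       # the truncated series matches expm closely
```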

E.20 Differential Equations

A differential equation is an equation involving an unknown function and its derivatives.

A first-order linear differential equation may have the form

x'(t)=ax(t).

Its solution is

x(t)=Ce^{at}.

For systems, one obtains

x'(t)=Ax(t),

where A is a matrix and x(t) is a vector-valued function.

The solution is expressed using the matrix exponential:

x(t)=e^{tA}x(0).
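
As an illustration (a sketch assuming SciPy, with a made-up system matrix and initial condition), the solution at time t is obtained by applying the matrix exponential to x(0):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])         # illustrative system matrix
x0 = np.array([1.0, 0.0])            # initial condition x(0)

t = 0.5
x_t = expm(t * A) @ x0               # x(t) = e^{tA} x(0)
print(x_t)
```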

Thus linear algebra gives the natural language for systems of differential equations.

E.21 Inner Products of Functions

Calculus allows vector space ideas to be applied to functions.

For continuous functions on [a,b], define

\langle f,g\rangle = \int_a^b f(x)g(x)\,dx.

This is an inner product on a suitable function space.

The corresponding norm is

\|f\| = \sqrt{\int_a^b f(x)^2\,dx}.

Orthogonality means

\int_a^b f(x)g(x)\,dx=0.

This idea leads to Fourier series, orthogonal polynomials, projection methods, and continuous least squares.
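
As a numerical sketch (assuming SciPy), the inner product of sin x and cos x over [-\pi,\pi] is essentially zero, which is exactly the kind of orthogonality used in Fourier series:

```python
import numpy as np
from scipy.integrate import quad

# Inner product <f, g> = integral of f(x) g(x) over [-pi, pi].
inner, _ = quad(lambda x: np.sin(x) * np.cos(x), -np.pi, np.pi)
norm_sin = np.sqrt(quad(lambda x: np.sin(x)**2, -np.pi, np.pi)[0])

print(inner)      # approximately 0: sin and cos are orthogonal on [-pi, pi]
print(norm_sin)   # equals sqrt(pi)
```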

E.22 Summary

Calculus studies change, accumulation, approximation, and motion. Linear algebra studies vectors, matrices, spaces, and linear transformations. The two subjects meet whenever a problem is approximated, optimized, discretized, or written as a system.

Key ideas from this appendix include:

| Concept | Role in linear algebra |
| --- | --- |
| Derivative | Local rate of change |
| Gradient | Vector of first derivatives |
| Hessian | Matrix of second derivatives |
| Jacobian | Matrix of first derivatives of a vector-valued map |
| Integral | Continuous accumulation |
| Taylor polynomial | Linear and quadratic approximation |
| Differential equation | Dynamics expressed by matrices |
| Function inner product | Geometry of function spaces |

The most important connection is this: the derivative of a sufficiently smooth function gives its best linear approximation, and linear maps are described by matrices. That is why matrices occur throughout calculus-based applications.