
Chapter 116. Computer Graphics

Computer graphics studies how to construct, transform, project, shade, and display geometric objects using computation.

Linear algebra is the basic language of computer graphics. Points are represented by vectors. Transformations are represented by matrices. Cameras are represented by coordinate changes and projection matrices. Surfaces are described by vertices, normals, tangent vectors, and meshes. Images are arrays of color values.

The most common graphics pipeline repeatedly applies linear algebra:

\text{object coordinates} \to \text{world coordinates} \to \text{camera coordinates} \to \text{clip coordinates} \to \text{screen coordinates}.

Each step is a change of representation. Matrix multiplication makes these changes precise and efficient.

116.1 Points and Vectors

A point represents a position. A vector represents a direction and magnitude.

In ordinary coordinates, both may be written as triples:

p = \begin{bmatrix} x \\ y \\ z \end{bmatrix}, \qquad v = \begin{bmatrix} v_x \\ v_y \\ v_z \end{bmatrix}.

Their meanings are different. A point says where something is. A vector says how to move or in which direction something points.

For example, if p is a point and v is a vector, then

p + v

is another point.

But adding two points has no direct geometric meaning unless an origin has been chosen.

Computer graphics must distinguish these two ideas. Homogeneous coordinates provide a convenient way to do this.

116.2 Homogeneous Coordinates

In three-dimensional graphics, a point is often represented as

\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix},

while a direction vector is represented as

\begin{bmatrix} v_x \\ v_y \\ v_z \\ 0 \end{bmatrix}.

The final coordinate is called the homogeneous coordinate.

The value 1 marks a point. The value 0 marks a direction.

This distinction matters because translations affect points but do not affect direction vectors. A 4 × 4 homogeneous transformation matrix can translate points while leaving direction vectors unchanged. Graphics systems use homogeneous coordinates precisely so that affine transformations, including translation, can be expressed as matrix multiplication.

116.3 Affine Transformations

An affine transformation has the form

p' = Ap + b,

where A is a matrix and b is a translation vector.

The matrix A may rotate, scale, shear, or reflect. The vector b shifts the point.

Using homogeneous coordinates, the affine transformation becomes one matrix multiplication:

\begin{bmatrix} p' \\ 1 \end{bmatrix} = \begin{bmatrix} A & b \\ 0 & 1 \end{bmatrix} \begin{bmatrix} p \\ 1 \end{bmatrix}.

For a three-dimensional point, this is a 4 × 4 matrix. This is why graphics systems usually store object transformations as 4 × 4 matrices.
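The block form above can be checked numerically. The following sketch (using NumPy, with an arbitrary linear part A and translation b chosen purely for illustration) assembles the 4 × 4 homogeneous matrix and confirms that a single multiplication reproduces Ap + b:

```python
import numpy as np

# Linear part A and translation b (arbitrary illustrative values).
A = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 2.0]])
b = np.array([5.0, -3.0, 1.0])

# Assemble the homogeneous 4x4 matrix [[A, b], [0, 1]].
M = np.eye(4)
M[:3, :3] = A
M[:3, 3] = b

p = np.array([1.0, 2.0, 3.0])
p_h = np.append(p, 1.0)  # homogeneous point [x, y, z, 1]

# One matrix multiplication equals A p + b.
assert np.allclose((M @ p_h)[:3], A @ p + b)
```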

116.4 Translation

Translation moves every point by the same displacement.

If

t = \begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix},

then translation sends

p \mapsto p + t.

In homogeneous coordinates, the translation matrix is

T = \begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix}.

Then

T \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} x+t_x \\ y+t_y \\ z+t_z \\ 1 \end{bmatrix}.

If the input is a direction vector,

T \begin{bmatrix} v_x \\ v_y \\ v_z \\ 0 \end{bmatrix} = \begin{bmatrix} v_x \\ v_y \\ v_z \\ 0 \end{bmatrix}.

The translation has no effect because the last coordinate is 0.
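A short NumPy sketch (with an illustrative displacement) shows that the same matrix T moves a point but leaves a direction vector untouched:

```python
import numpy as np

tx, ty, tz = 2.0, -1.0, 4.0  # illustrative displacement
T = np.array([[1, 0, 0, tx],
              [0, 1, 0, ty],
              [0, 0, 1, tz],
              [0, 0, 0, 1]], dtype=float)

point = np.array([1.0, 1.0, 1.0, 1.0])      # w = 1: a position
direction = np.array([0.0, 0.0, 1.0, 0.0])  # w = 0: a direction

moved = T @ point          # the point is translated
unchanged = T @ direction  # the direction is unaffected
```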

116.5 Scaling

Scaling changes size.

A nonuniform scaling matrix is

S = \begin{bmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.

It sends

\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} \mapsto \begin{bmatrix} s_x x \\ s_y y \\ s_z z \\ 1 \end{bmatrix}.

If

s_x = s_y = s_z,

the scaling is uniform.

If the scale factors differ, the object is stretched by different amounts along different axes.

Scaling changes distances and areas. Nonuniform scaling also changes angles.

116.6 Rotation

Rotation preserves distances and angles.

In two dimensions, rotation by angle θ is represented by

R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.

In homogeneous coordinates,

R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}.

In three dimensions, rotations are usually expressed around coordinate axes.

Rotation about the z-axis is

R_z(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.

Rotation matrices are orthogonal:

R^T R = I.

Thus

R^{-1} = R^T.

This makes rotations numerically and geometrically special.
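These identities are easy to verify numerically. A small NumPy check (the angle is arbitrary):

```python
import numpy as np

theta = 0.7  # radians, arbitrary
c, s = np.cos(theta), np.sin(theta)
Rz = np.array([[c, -s, 0],
               [s,  c, 0],
               [0,  0, 1]])

# Orthogonality: R^T R = I, so the transpose is the inverse.
assert np.allclose(Rz.T @ Rz, np.eye(3))
assert np.allclose(np.linalg.inv(Rz), Rz.T)

# Rotations preserve lengths.
v = np.array([1.0, 2.0, 3.0])
assert np.isclose(np.linalg.norm(Rz @ v), np.linalg.norm(v))
```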

116.7 Composition of Transformations

Graphics transformations are composed by matrix multiplication.

If M_1 is applied first and M_2 is applied second, then the combined transformation is

M_2 M_1.

For a point p,

p' = M_2(M_1 p) = (M_2 M_1)p.

Matrix multiplication is associative, so many transformations can be precomputed into one matrix:

M = T R S.

With column vectors, this applies scaling first, then rotation, then translation.

The order matters. In general,

TR \neq RT.

Rotating then translating usually produces a different result from translating then rotating.
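A minimal NumPy example makes the noncommutativity concrete (the helper constructors and the particular values are illustrative):

```python
import numpy as np

def translation(tx, ty, tz):
    """4x4 homogeneous translation matrix."""
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

def rotation_z(theta):
    """4x4 homogeneous rotation about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.eye(4)
    R[:2, :2] = [[c, -s], [s, c]]
    return R

T = translation(1.0, 0.0, 0.0)
R = rotation_z(np.pi / 2)
p = np.array([1.0, 0.0, 0.0, 1.0])

rotate_then_translate = T @ R @ p  # -> (1, 1, 0)
translate_then_rotate = R @ T @ p  # -> (0, 2, 0)
```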

116.8 Object Coordinates and World Coordinates

A model is usually defined in its own local coordinate system.

These are object coordinates.

For example, a cube may be defined with vertices centered at the origin:

(-1,-1,-1), \ldots, (1,1,1).

To place the cube in a scene, a model matrix M maps object coordinates to world coordinates:

p_{\text{world}} = M p_{\text{object}}.

The model matrix encodes the object’s scale, orientation, and position.

Each object in a scene may have its own model matrix.

116.9 Camera Coordinates

The camera defines another coordinate system.

World coordinates describe where objects are in the scene. Camera coordinates describe where objects are relative to the viewer.

A view matrix V maps world coordinates to camera coordinates:

p_{\text{camera}} = V p_{\text{world}}.

This is usually the inverse of the camera’s world transform.

If the camera is moved to the right, the scene appears to move left. If the camera rotates, the world appears to rotate oppositely.

Thus the view matrix is a change of basis and origin.

116.10 Projection

Projection maps three-dimensional points to a lower-dimensional viewing space.

Two important projections are orthographic projection and perspective projection.

Orthographic projection keeps parallel lines parallel. It removes depth from the visible coordinates without making distant objects smaller.

A simple orthographic projection onto the plane z = 0 sends

(x,y,z) \mapsto (x,y,0).

It can be represented by the matrix

P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}.

Orthographic projection is a linear transformation, and it has a corresponding homogeneous-coordinate matrix form.

116.11 Perspective Projection

Perspective projection makes distant objects appear smaller.

This matches ordinary visual experience. Parallel lines may appear to meet at a vanishing point.

In homogeneous coordinates, perspective projection is represented by a matrix followed by division by the homogeneous coordinate.

After projection, a point has the form

\begin{bmatrix} x_c \\ y_c \\ z_c \\ w_c \end{bmatrix}.

The visible normalized coordinates are obtained by perspective division:

x_n = \frac{x_c}{w_c}, \qquad y_n = \frac{y_c}{w_c}, \qquad z_n = \frac{z_c}{w_c}.

The division by w_c is what produces perspective foreshortening.

This is why projective geometry and homogeneous coordinates are natural in computer graphics.
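A minimal sketch of the perspective division step. The matrix below is a deliberately simplified stand-in that copies z into w (not a full frustum projection), so after division the visible coordinates are x/z and y/z:

```python
import numpy as np

# Simplified "perspective" matrix for illustration: it copies z into w,
# so perspective division yields x/z and y/z.
P = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 1, 0]], dtype=float)

def project(p):
    clip = P @ np.append(p, 1.0)  # (x_c, y_c, z_c, w_c)
    return clip[:3] / clip[3]     # perspective division

near = project(np.array([1.0, 1.0, 2.0]))   # closer point
far = project(np.array([1.0, 1.0, 10.0]))   # farther point

# The farther point lands closer to the center: foreshortening.
assert abs(far[0]) < abs(near[0])
```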

116.12 The Model-View-Projection Matrix

The main transformation chain is often written as

p_{\text{clip}} = PVM p_{\text{object}}.

Here:

| Matrix | Meaning |
| --- | --- |
| M | Model matrix |
| V | View matrix |
| P | Projection matrix |

Their product

MVP=PVM MVP = PVM

is often called the model-view-projection transformation, with the convention depending on whether column vectors or row vectors are used.

For column vectors, the rightmost matrix acts first:

p_{\text{clip}} = P V M p_{\text{object}}.

This chain is a direct application of composition of linear and affine maps.
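The chain can be sketched in NumPy. The matrices below are illustrative placeholders (P is even left as the identity); the point is that precomposing P V M gives the same result as applying the matrices one at a time:

```python
import numpy as np

def translation(t):
    """4x4 homogeneous translation matrix."""
    M = np.eye(4)
    M[:3, 3] = t
    return M

# Illustrative model, view, and projection matrices.
M = translation([1.0, 0.0, 0.0])   # place the object in the world
V = translation([0.0, 0.0, -5.0])  # move the world into camera space
P = np.eye(4)                      # identity "projection" placeholder

p_object = np.array([0.0, 0.0, 0.0, 1.0])

step_by_step = P @ (V @ (M @ p_object))
precomputed = (P @ V @ M) @ p_object  # one matrix, same result

assert np.allclose(step_by_step, precomputed)
```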

116.13 Coordinate Frames

A coordinate frame consists of an origin and basis vectors.

In three dimensions, a frame may be written as

(o;\ e_1, e_2, e_3).

A point can be described relative to this frame by coordinates

p = o + x e_1 + y e_2 + z e_3.

Changing coordinate frames is a change of basis plus a translation.

This is why camera movement, object placement, skeletal animation, and scene graphs all depend on matrix representations of coordinate frames.

116.14 Normals

A surface normal is a vector perpendicular to a surface.

Normals are used for lighting, shading, reflection, and visibility.

If a surface has tangent vectors u and v, then a normal direction is given by the cross product:

n = u \times v.

For a triangle with vertices p_1, p_2, p_3, two edge vectors are

u = p_2 - p_1, \qquad v = p_3 - p_1.

Then

n = u \times v

is perpendicular to the triangle.

Normals are usually normalized:

\hat{n} = \frac{n}{\|n\|}.

Unit normals are needed because many lighting formulas use dot products.

116.15 Transforming Normals

Normals do not always transform like ordinary direction vectors.

Suppose positions are transformed by a matrix A. A normal n should remain perpendicular to transformed tangent vectors.

If tangent vectors transform by

u' = Au,

then the transformed normal must satisfy

(n')^T (Au) = 0

for every tangent vector u with

n^T u = 0.

This is achieved by

n' = A^{-T} n.

Thus normals are transformed by the inverse transpose of the linear part of the model matrix.

For pure rotations, this equals the original rotation matrix. For nonuniform scaling or shear, the distinction matters.
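A NumPy sketch with a nonuniform scale (values chosen for illustration) shows why the inverse transpose is needed:

```python
import numpy as np

# Nonuniform scaling: x stretched by 2, y and z unchanged.
A = np.diag([2.0, 1.0, 1.0])

# A tangent and a normal that are perpendicular before transforming.
u = np.array([1.0, -1.0, 0.0])  # tangent: n . u = 0
n = np.array([1.0, 1.0, 0.0])

u_t = A @ u                     # tangents transform by A
wrong = A @ n                   # treating n like a tangent breaks perpendicularity
right = np.linalg.inv(A).T @ n  # inverse transpose preserves it

assert not np.isclose(wrong @ u_t, 0.0)
assert np.isclose(right @ u_t, 0.0)
```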

116.16 Lighting and Dot Products

A simple diffuse lighting model depends on the angle between a surface normal and a light direction.

Let n be a unit normal and l be a unit vector pointing toward the light.

The diffuse intensity is proportional to

\max(0, n^T l).

The dot product measures alignment.

If n^T l = 1, the surface faces the light directly.

If n^T l = 0, the light is tangent to the surface.

If n^T l < 0, the surface faces away from the light, so the diffuse contribution is usually set to zero.

This is one of the simplest places where inner products appear in rendering.
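The clamped dot product can be sketched directly (the vectors are chosen to illustrate the three cases above):

```python
import numpy as np

def diffuse(n, l):
    """Lambertian diffuse factor for unit normal n and unit light direction l."""
    return max(0.0, float(n @ l))

n = np.array([0.0, 0.0, 1.0])  # surface facing +z

head_on = diffuse(n, np.array([0.0, 0.0, 1.0]))   # light straight above -> 1
grazing = diffuse(n, np.array([1.0, 0.0, 0.0]))   # light tangent to surface -> 0
behind = diffuse(n, np.array([0.0, 0.0, -1.0]))   # light behind -> clamped to 0
angled = diffuse(n, np.array([0.0, 1.0, 1.0]) / np.sqrt(2))  # 45 degrees
```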

116.17 Reflection

Reflection of a direction vector v across a plane with unit normal n is

r = v - 2(n^T v)\, n.

This formula subtracts twice the component of v in the normal direction.

Reflection is used in mirror effects, specular highlights, ray tracing, and collision response.

The formula follows from orthogonal decomposition:

v = v_{\parallel} + v_{\perp},

where

v_{\perp} = (n^T v)\, n.

Reflection preserves the parallel part and reverses the perpendicular part.
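The reflection formula translates directly into code (the floor-plane setup is illustrative):

```python
import numpy as np

def reflect(v, n):
    """Reflect direction v across the plane with unit normal n."""
    return v - 2.0 * (n @ v) * n

n = np.array([0.0, 1.0, 0.0])   # floor plane normal
v = np.array([1.0, -1.0, 0.0])  # incoming direction, heading down

r = reflect(v, n)               # bounces up

# Reflection preserves length; only the normal component flips sign.
assert np.isclose(np.linalg.norm(r), np.linalg.norm(v))
```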

116.18 Triangles and Meshes

Most real-time graphics represents surfaces using triangle meshes.

A mesh contains vertices, edges, and faces.

Each triangle is specified by three vertex positions:

p_1, p_2, p_3.

A point inside the triangle can be represented using barycentric coordinates:

p = \alpha p_1 + \beta p_2 + \gamma p_3,

where

\alpha + \beta + \gamma = 1.

If

\alpha, \beta, \gamma \geq 0,

then the point lies inside or on the boundary of the triangle.

Barycentric coordinates are used for interpolation of colors, normals, texture coordinates, and depth values.
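Because the defining equations are linear in α, β, γ, barycentric coordinates can be computed by solving a small linear system. A NumPy sketch with an illustrative 2D triangle:

```python
import numpy as np

# Triangle vertices (2D for simplicity).
p1 = np.array([0.0, 0.0])
p2 = np.array([1.0, 0.0])
p3 = np.array([0.0, 1.0])

def barycentric(p):
    """Solve p = a*p1 + b*p2 + g*p3 subject to a + b + g = 1."""
    A = np.array([[p1[0], p2[0], p3[0]],
                  [p1[1], p2[1], p3[1]],
                  [1.0,   1.0,   1.0]])
    return np.linalg.solve(A, np.array([p[0], p[1], 1.0]))

centroid = (p1 + p2 + p3) / 3.0
coords = barycentric(centroid)      # -> (1/3, 1/3, 1/3)

inside = bool(np.all(coords >= 0))  # nonnegative: inside or on the boundary
```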

116.19 Rasterization

Rasterization converts geometric primitives into pixels.

For each triangle, the renderer determines which pixels it covers. Then it computes interpolated quantities at those pixels.

If a vertex has color values c_1, c_2, c_3, then the color at a point with barycentric coordinates (α, β, γ) is

c = \alpha c_1 + \beta c_2 + \gamma c_3.

The same linear interpolation applies to texture coordinates, normals, and other vertex attributes.

Thus rasterization uses affine combinations across triangles.

116.20 Depth and Visibility

When several objects project to the same pixel, the visible one is usually the closest to the camera.

The depth buffer stores a depth value for each pixel.

For a new fragment, the renderer compares its depth with the stored depth. If the new fragment is closer, it replaces the old value.

Depth values are produced by the projection transformation and perspective division.

Although visibility is a geometric problem, its implementation depends on the transformed coordinates produced by matrix operations.

116.21 Texture Mapping

Texture mapping attaches image data to a surface.

Each vertex may have texture coordinates

(u, v).

For a point inside a triangle, texture coordinates are interpolated from the vertex values.

The texture image is then sampled at the interpolated coordinate.

Perspective-correct interpolation is required because ordinary linear interpolation after projection does not preserve the correct projective relationship.

Homogeneous coordinates again appear: the renderer interpolates quantities divided by depth and then corrects by the interpolated reciprocal depth.

116.22 Quaternions and Rotations

Rotations in three dimensions can be represented by matrices, Euler angles, or quaternions.

Rotation matrices are convenient for transforming vectors.

Euler angles are easy to interpret but can suffer from singularities such as gimbal lock.

Quaternions provide a compact and stable representation of orientation.

A unit quaternion can represent a rotation. It is often used for animation and interpolation.

Spherical linear interpolation, or slerp, smoothly interpolates between two orientations.

Even when quaternions are used internally, they are commonly converted to matrices before being applied to vertices.

116.23 Skeletal Animation

Skeletal animation represents a character as a mesh controlled by bones.

Each bone has a transformation matrix. Vertices are influenced by one or more bones.

A skinned vertex position is often computed as

p' = \sum_{i=1}^{k} w_i M_i p,

where M_i is a bone matrix and w_i is a weight.

The weights satisfy

w_1 + \cdots + w_k = 1.

This is a weighted linear combination of transformed positions.

Skeletal animation therefore combines affine transformations, matrix products, and convex combinations.
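Linear blend skinning is a few lines of NumPy. The two bone matrices below are illustrative translations, and the weights are arbitrary values that sum to 1:

```python
import numpy as np

def translation(t):
    """4x4 homogeneous translation matrix."""
    M = np.eye(4)
    M[:3, 3] = t
    return M

# Two illustrative bone matrices and their weights for one vertex.
M1 = translation([1.0, 0.0, 0.0])
M2 = translation([0.0, 2.0, 0.0])
w1, w2 = 0.75, 0.25  # weights sum to 1

p = np.array([0.0, 0.0, 0.0, 1.0])

# Linear blend skinning: a weighted sum of transformed positions.
p_skinned = w1 * (M1 @ p) + w2 * (M2 @ p)
```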

116.24 Ray Tracing

Ray tracing follows rays through a scene.

A ray has the form

r(t) = o + td, \qquad t \geq 0.

Here o is the ray origin and d is the ray direction.

Intersecting rays with objects requires solving equations.

For a plane with normal n and point p_0, the intersection satisfies

n^T(o + td - p_0) = 0.

Solving for t,

t = \frac{n^T(p_0 - o)}{n^T d}.

This formula uses dot products and linear equations.

Ray tracing also uses reflection vectors, refraction, coordinate transforms, acceleration structures, and many matrix operations.
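The ray-plane derivation above maps directly to code. A NumPy sketch (the scene values are illustrative), including a guard for the near-parallel case where n^T d ≈ 0:

```python
import numpy as np

def ray_plane(o, d, n, p0, eps=1e-9):
    """Intersect the ray o + t d with the plane through p0 with normal n.
    Returns t >= 0, or None if there is no forward intersection."""
    denom = n @ d
    if abs(denom) < eps:          # ray (nearly) parallel to the plane
        return None
    t = (n @ (p0 - o)) / denom    # t = n^T(p0 - o) / n^T d
    return t if t >= 0.0 else None

o = np.array([0.0, 0.0, 5.0])   # ray starting above the floor
d = np.array([0.0, 0.0, -1.0])  # pointing straight down
n = np.array([0.0, 0.0, 1.0])   # floor plane z = 0
p0 = np.zeros(3)

t = ray_plane(o, d, n, p0)
hit = o + t * d                 # intersection point
```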

116.25 Cameras and Projection Matrices

A camera maps three-dimensional world points to two-dimensional image points.

In projective form, a camera can be represented by a matrix. In computer vision, a pinhole camera is commonly written as a 3 × 4 projection matrix that maps homogeneous 3D points to homogeneous image points up to scale.

If X is a homogeneous world point and C is a camera matrix, then

x \sim CX.

The symbol \sim means equality up to nonzero scalar multiple.

This expresses the projective nature of image formation.

Computer graphics and computer vision use closely related projective geometry, although often in opposite directions: graphics projects known 3D scenes to images, while vision often reconstructs 3D structure from images.

116.26 Linear Algebra in the Graphics Pipeline

The standard graphics pipeline is a chain of linear algebra operations.

| Graphics concept | Linear algebra object |
| --- | --- |
| Point | Vector with homogeneous coordinate 1 |
| Direction | Vector with homogeneous coordinate 0 |
| Translation | Homogeneous matrix |
| Rotation | Orthogonal matrix |
| Scaling | Diagonal matrix |
| Object placement | Model matrix |
| Camera transform | View matrix |
| Projection | Projection matrix |
| Surface normal | Orthogonal vector |
| Lighting | Dot product |
| Triangle interpolation | Barycentric coordinates |
| Animation | Matrix products and weighted sums |
| Image | Matrix of pixels |

This table shows why graphics is one of the most direct applications of finite-dimensional linear algebra.

116.27 Numerical Issues

Computer graphics uses floating point arithmetic.

This creates numerical concerns.

Repeated transformations can accumulate error. Rotation matrices may slowly lose orthogonality. Very small or very large depth ranges can reduce depth-buffer precision. Nearly parallel rays and planes may produce unstable intersection calculations.

Common remedies include:

| Issue | Remedy |
| --- | --- |
| Rotation drift | Re-normalize or use quaternions |
| Ill-conditioned transforms | Avoid extreme scaling |
| Depth precision loss | Choose near and far planes carefully |
| Normal distortion | Use inverse-transpose normal matrix |
| Ray intersection instability | Use tolerances and robust tests |

Graphics systems therefore depend not only on exact linear algebra but also on numerical linear algebra.

116.28 Summary

Computer graphics represents geometry with vectors and transformations with matrices.

Homogeneous coordinates allow translation, rotation, scaling, shear, and projection to be handled in a unified matrix framework. Model, view, and projection matrices form the main coordinate transformation chain. Dot products support lighting. Cross products produce normals. Barycentric coordinates support interpolation across triangles. Matrix products support animation, camera movement, and object placement.

The central principle is that images are produced by transforming geometry through coordinate systems. Linear algebra supplies the precise operations that make this possible.