
Chapter 116. Computer Graphics

Computer graphics studies how to construct, transform, project, shade, and display geometric objects using computation.

Linear algebra is the basic language of computer graphics. Points are represented by vectors. Transformations are represented by matrices. Cameras are represented by coordinate changes and projection matrices. Surfaces are described by vertices, normals, tangent vectors, and meshes. Images are arrays of color values.

The most common graphics pipeline repeatedly applies linear algebra:

\text{object coordinates} \to \text{world coordinates} \to \text{camera coordinates} \to \text{clip coordinates} \to \text{screen coordinates}.

Each step is a change of representation. Matrix multiplication makes these changes precise and efficient.

116.1 Points and Vectors

A point represents a position. A vector represents a direction and magnitude.

In ordinary coordinates, both may be written as triples:

p = \begin{bmatrix} x \\ y \\ z \end{bmatrix}, \qquad v = \begin{bmatrix} v_x \\ v_y \\ v_z \end{bmatrix}.

Their meanings are different. A point says where something is. A vector says how to move or in which direction something points.

For example, if p is a point and v is a vector, then

p + v

is another point.

But adding two points has no direct geometric meaning unless an origin has been chosen.

Computer graphics must distinguish these two ideas. Homogeneous coordinates provide a convenient way to do this.

116.2 Homogeneous Coordinates

In three-dimensional graphics, a point is often represented as

\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix},

while a direction vector is represented as

\begin{bmatrix} v_x \\ v_y \\ v_z \\ 0 \end{bmatrix}.

The final coordinate is called the homogeneous coordinate.

The value 1 marks a point. The value 0 marks a direction.

This distinction matters because translations affect points but do not affect direction vectors. A 4 × 4 homogeneous transformation matrix can translate points while leaving direction vectors unchanged. Graphics systems use homogeneous coordinates precisely so that affine transformations, including translation, can be expressed as matrix multiplication.

116.3 Affine Transformations

An affine transformation has the form

p' = Ap + b,

where A is a matrix and b is a translation vector.

The matrix A may rotate, scale, shear, or reflect. The vector b shifts the point.

Using homogeneous coordinates, the affine transformation becomes one matrix multiplication:

\begin{bmatrix} p' \\ 1 \end{bmatrix} = \begin{bmatrix} A & b \\ 0 & 1 \end{bmatrix} \begin{bmatrix} p \\ 1 \end{bmatrix}.

For a three-dimensional point, this is a 4 × 4 matrix. This is why graphics systems usually store object transformations as 4 × 4 matrices.
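The block form above can be checked numerically. The following sketch (using NumPy, with an arbitrary linear part A and translation b chosen purely for illustration) assembles the 4 × 4 homogeneous matrix and confirms that a single multiplication reproduces Ap + b:

```python
import numpy as np

# Linear part A and translation b (arbitrary illustrative values).
A = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 2.0]])
b = np.array([5.0, -3.0, 1.0])

# Assemble the homogeneous 4x4 matrix [[A, b], [0, 1]].
M = np.eye(4)
M[:3, :3] = A
M[:3, 3] = b

p = np.array([1.0, 2.0, 3.0])
p_h = np.append(p, 1.0)  # homogeneous point [x, y, z, 1]

# One matrix multiplication equals A p + b.
assert np.allclose((M @ p_h)[:3], A @ p + b)
```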

116.4 Translation

Translation moves every point by the same displacement.

If

t = \begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix},

then translation sends

p \mapsto p + t.

In homogeneous coordinates, the translation matrix is

T = \begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix}.

Then

T \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} x+t_x \\ y+t_y \\ z+t_z \\ 1 \end{bmatrix}.

If the input is a direction vector,

T \begin{bmatrix} v_x \\ v_y \\ v_z \\ 0 \end{bmatrix} = \begin{bmatrix} v_x \\ v_y \\ v_z \\ 0 \end{bmatrix}.

The translation has no effect because the last coordinate is 0.
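A short NumPy sketch (with an illustrative displacement) shows that the same matrix T moves a point but leaves a direction vector untouched:

```python
import numpy as np

tx, ty, tz = 2.0, -1.0, 4.0  # illustrative displacement
T = np.array([[1, 0, 0, tx],
              [0, 1, 0, ty],
              [0, 0, 1, tz],
              [0, 0, 0, 1]], dtype=float)

point = np.array([1.0, 1.0, 1.0, 1.0])      # w = 1: a position
direction = np.array([0.0, 0.0, 1.0, 0.0])  # w = 0: a direction

moved = T @ point          # the point is translated
unchanged = T @ direction  # the direction is unaffected
```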

116.5 Scaling

Scaling changes size.

A nonuniform scaling matrix is

S = \begin{bmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.

It sends

\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} \mapsto \begin{bmatrix} s_x x \\ s_y y \\ s_z z \\ 1 \end{bmatrix}.

If

s_x = s_y = s_z,

the scaling is uniform.

If the scale factors differ, the object is stretched by different amounts along different axes.

Scaling changes distances and areas. Nonuniform scaling also changes angles.

116.6 Rotation

Rotation preserves distances and angles.

In two dimensions, rotation by angle θ is represented by

R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.

In homogeneous coordinates,

R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}.

In three dimensions, rotations are usually expressed around coordinate axes.

Rotation about the z-axis is

R_z(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.

Rotation matrices are orthogonal:

R^T R = I.

Thus

R^{-1} = R^T.

This makes rotations numerically and geometrically special.
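These identities are easy to verify numerically. A small NumPy check (the angle is arbitrary):

```python
import numpy as np

theta = 0.7  # radians, arbitrary
c, s = np.cos(theta), np.sin(theta)
Rz = np.array([[c, -s, 0],
               [s,  c, 0],
               [0,  0, 1]])

# Orthogonality: R^T R = I, so the transpose is the inverse.
assert np.allclose(Rz.T @ Rz, np.eye(3))
assert np.allclose(np.linalg.inv(Rz), Rz.T)

# Rotations preserve lengths.
v = np.array([1.0, 2.0, 3.0])
assert np.isclose(np.linalg.norm(Rz @ v), np.linalg.norm(v))
```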

116.7 Composition of Transformations

Graphics transformations are composed by matrix multiplication.

If M_1 is applied first and M_2 is applied second, then the combined transformation is

M_2 M_1.

For a point p,

p' = M_2(M_1 p) = (M_2 M_1)p.

Matrix multiplication is associative, so many transformations can be precomputed into one matrix:

M = T R S.

With column vectors, this applies scaling first, then rotation, then translation.

The order matters. In general,

TR \neq RT.

Rotating then translating usually produces a different result from translating then rotating.
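A minimal NumPy example makes the noncommutativity concrete (the helper constructors and the particular values are illustrative):

```python
import numpy as np

def translation(tx, ty, tz):
    """4x4 homogeneous translation matrix."""
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

def rotation_z(theta):
    """4x4 homogeneous rotation about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.eye(4)
    R[:2, :2] = [[c, -s], [s, c]]
    return R

T = translation(1.0, 0.0, 0.0)
R = rotation_z(np.pi / 2)
p = np.array([1.0, 0.0, 0.0, 1.0])

rotate_then_translate = T @ R @ p  # -> (1, 1, 0)
translate_then_rotate = R @ T @ p  # -> (0, 2, 0)
```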

116.8 Object Coordinates and World Coordinates

A model is usually defined in its own local coordinate system.

These are object coordinates.

For example, a cube may be defined with vertices centered at the origin:

(-1,-1,-1), \ldots, (1,1,1).

To place the cube in a scene, a model matrix M maps object coordinates to world coordinates:

p_{\text{world}} = M p_{\text{object}}.

The model matrix encodes the object’s scale, orientation, and position.

Each object in a scene may have its own model matrix.

116.9 Camera Coordinates

The camera defines another coordinate system.

World coordinates describe where objects are in the scene. Camera coordinates describe where objects are relative to the viewer.

A view matrix V maps world coordinates to camera coordinates:

p_{\text{camera}} = V p_{\text{world}}.

This is usually the inverse of the camera’s world transform.

If the camera is moved to the right, the scene appears to move left. If the camera rotates, the world appears to rotate oppositely.

Thus the view matrix is a change of basis and origin.

116.10 Projection

Projection maps three-dimensional points to a lower-dimensional viewing space.

Two important projections are orthographic projection and perspective projection.

Orthographic projection keeps parallel lines parallel. It removes depth from the visible coordinates without making distant objects smaller.

A simple orthographic projection onto the plane z = 0 sends

(x,y,z) \mapsto (x,y,0).

It can be represented by the matrix

P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}.

Orthographic projection is a linear transformation, and it has a corresponding homogeneous-coordinate matrix form.

116.11 Perspective Projection

Perspective projection makes distant objects appear smaller.

This matches ordinary visual experience. Parallel lines may appear to meet at a vanishing point.

In homogeneous coordinates, perspective projection is represented by a matrix followed by division by the homogeneous coordinate.

After projection, a point has the form

\begin{bmatrix} x_c \\ y_c \\ z_c \\ w_c \end{bmatrix}.

The visible normalized coordinates are obtained by perspective division:

x_n = \frac{x_c}{w_c}, \qquad y_n = \frac{y_c}{w_c}, \qquad z_n = \frac{z_c}{w_c}.

The division by w_c is what produces perspective foreshortening.

This is why projective geometry and homogeneous coordinates are natural in computer graphics.
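A minimal sketch of the perspective division step. The matrix below is a deliberately simplified stand-in that copies z into w (not a full frustum projection), so after division the visible coordinates are x/z and y/z:

```python
import numpy as np

# Simplified "perspective" matrix for illustration: it copies z into w,
# so perspective division yields x/z and y/z.
P = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 1, 0]], dtype=float)

def project(p):
    clip = P @ np.append(p, 1.0)  # (x_c, y_c, z_c, w_c)
    return clip[:3] / clip[3]     # perspective division

near = project(np.array([1.0, 1.0, 2.0]))   # closer point
far = project(np.array([1.0, 1.0, 10.0]))   # farther point

# The farther point lands closer to the center: foreshortening.
assert abs(far[0]) < abs(near[0])
```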

116.12 The Model-View-Projection Matrix

The main transformation chain is often written as

p_{\text{clip}} = PVM p_{\text{object}}.

Here:

| Matrix | Meaning |
| --- | --- |
| M | Model matrix |
| V | View matrix |
| P | Projection matrix |

Their product

MVP=PVM MVP = PVM

is often called the model-view-projection transformation, with the convention depending on whether column vectors or row vectors are used.

For column vectors, the rightmost matrix acts first:

p_{\text{clip}} = P V M p_{\text{object}}.

This chain is a direct application of composition of linear and affine maps.
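The chain can be sketched in NumPy. The matrices below are illustrative placeholders (P is even left as the identity); the point is that precomposing P V M gives the same result as applying the matrices one at a time:

```python
import numpy as np

def translation(t):
    """4x4 homogeneous translation matrix."""
    M = np.eye(4)
    M[:3, 3] = t
    return M

# Illustrative model, view, and projection matrices.
M = translation([1.0, 0.0, 0.0])   # place the object in the world
V = translation([0.0, 0.0, -5.0])  # move the world into camera space
P = np.eye(4)                      # identity "projection" placeholder

p_object = np.array([0.0, 0.0, 0.0, 1.0])

step_by_step = P @ (V @ (M @ p_object))
precomputed = (P @ V @ M) @ p_object  # one matrix, same result

assert np.allclose(step_by_step, precomputed)
```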

116.13 Coordinate Frames

A coordinate frame consists of an origin and basis vectors.

In three dimensions, a frame may be written as

(o;\ e_1, e_2, e_3).

A point can be described relative to this frame by coordinates

p = o + x e_1 + y e_2 + z e_3.

Changing coordinate frames is a change of basis plus a translation.

This is why camera movement, object placement, skeletal animation, and scene graphs all depend on matrix representations of coordinate frames.

116.14 Normals

A surface normal is a vector perpendicular to a surface.

Normals are used for lighting, shading, reflection, and visibility.

If a surface has tangent vectors u and v, then a normal direction is given by the cross product:

n = u \times v.

For a triangle with vertices p_1, p_2, p_3, two edge vectors are

u = p_2 - p_1, \qquad v = p_3 - p_1.

Then

n = u \times v

is perpendicular to the triangle.

Normals are usually normalized:

\hat{n} = \frac{n}{\|n\|}.

Unit normals are needed because many lighting formulas use dot products.

116.15 Transforming Normals

Normals do not always transform like ordinary direction vectors.

Suppose positions are transformed by a matrix A. A normal n should remain perpendicular to transformed tangent vectors.

If tangent vectors transform by

u' = Au,

then the transformed normal must satisfy

(n')^T (Au) = 0

for every tangent vector u with

n^T u = 0.

This is achieved by

n' = A^{-T} n.

Thus normals are transformed by the inverse transpose of the linear part of the model matrix.

For pure rotations, this equals the original rotation matrix. For nonuniform scaling or shear, the distinction matters.
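A NumPy sketch with a nonuniform scale (values chosen for illustration) shows why the inverse transpose is needed:

```python
import numpy as np

# Nonuniform scaling: x stretched by 2, y and z unchanged.
A = np.diag([2.0, 1.0, 1.0])

# A tangent and a normal that are perpendicular before transforming.
u = np.array([1.0, -1.0, 0.0])  # tangent: n . u = 0
n = np.array([1.0, 1.0, 0.0])

u_t = A @ u                     # tangents transform by A
wrong = A @ n                   # treating n like a tangent breaks perpendicularity
right = np.linalg.inv(A).T @ n  # inverse transpose preserves it

assert not np.isclose(wrong @ u_t, 0.0)
assert np.isclose(right @ u_t, 0.0)
```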

116.16 Lighting and Dot Products

A simple diffuse lighting model depends on the angle between a surface normal and a light direction.

Let n be a unit normal and l be a unit vector pointing toward the light.

The diffuse intensity is proportional to

\max(0, n^T l).

The dot product measures alignment.

If n^T l = 1, the surface faces the light directly.

If n^T l = 0, the light is tangent to the surface.

If n^T l < 0, the surface faces away from the light, so the diffuse contribution is usually set to zero.

This is one of the simplest places where inner products appear in rendering.
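The clamped dot product can be sketched directly (the vectors are chosen to illustrate the three cases above):

```python
import numpy as np

def diffuse(n, l):
    """Lambertian diffuse factor for unit normal n and unit light direction l."""
    return max(0.0, float(n @ l))

n = np.array([0.0, 0.0, 1.0])  # surface facing +z

head_on = diffuse(n, np.array([0.0, 0.0, 1.0]))   # light straight above -> 1
grazing = diffuse(n, np.array([1.0, 0.0, 0.0]))   # light tangent to surface -> 0
behind = diffuse(n, np.array([0.0, 0.0, -1.0]))   # light behind -> clamped to 0
angled = diffuse(n, np.array([0.0, 1.0, 1.0]) / np.sqrt(2))  # 45 degrees
```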

116.17 Reflection

Reflection of a direction vector v across a plane with unit normal n is

r = v - 2(n^T v)\, n.

This formula subtracts twice the component of v in the normal direction.

Reflection is used in mirror effects, specular highlights, ray tracing, and collision response.

The formula follows from orthogonal decomposition:

v = v_{\parallel} + v_{\perp},

where

v_{\perp} = (n^T v)\, n.

Reflection preserves the parallel part and reverses the perpendicular part.
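The reflection formula translates directly into code (the floor-plane setup is illustrative):

```python
import numpy as np

def reflect(v, n):
    """Reflect direction v across the plane with unit normal n."""
    return v - 2.0 * (n @ v) * n

n = np.array([0.0, 1.0, 0.0])   # floor plane normal
v = np.array([1.0, -1.0, 0.0])  # incoming direction, heading down

r = reflect(v, n)               # bounces up

# Reflection preserves length; only the normal component flips sign.
assert np.isclose(np.linalg.norm(r), np.linalg.norm(v))
```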

116.18 Triangles and Meshes

Most real-time graphics represents surfaces using triangle meshes.

A mesh contains vertices, edges, and faces.

Each triangle is specified by three vertex positions:

p_1, p_2, p_3.

A point inside the triangle can be represented using barycentric coordinates:

p = \alpha p_1 + \beta p_2 + \gamma p_3,

where

\alpha + \beta + \gamma = 1.

If

\alpha, \beta, \gamma \geq 0,

then the point lies inside or on the boundary of the triangle.

Barycentric coordinates are used for interpolation of colors, normals, texture coordinates, and depth values.
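Because the defining equations are linear in α, β, γ, barycentric coordinates can be computed by solving a small linear system. A NumPy sketch with an illustrative 2D triangle:

```python
import numpy as np

# Triangle vertices (2D for simplicity).
p1 = np.array([0.0, 0.0])
p2 = np.array([1.0, 0.0])
p3 = np.array([0.0, 1.0])

def barycentric(p):
    """Solve p = a*p1 + b*p2 + g*p3 subject to a + b + g = 1."""
    A = np.array([[p1[0], p2[0], p3[0]],
                  [p1[1], p2[1], p3[1]],
                  [1.0,   1.0,   1.0]])
    return np.linalg.solve(A, np.array([p[0], p[1], 1.0]))

centroid = (p1 + p2 + p3) / 3.0
coords = barycentric(centroid)      # -> (1/3, 1/3, 1/3)

inside = bool(np.all(coords >= 0))  # nonnegative: inside or on the boundary
```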

116.19 Rasterization

Rasterization converts geometric primitives into pixels.

For each triangle, the renderer determines which pixels it covers. Then it computes interpolated quantities at those pixels.

If a vertex has color values c_1, c_2, c_3, then the color at a point with barycentric coordinates (α, β, γ) is

c = \alpha c_1 + \beta c_2 + \gamma c_3.

The same linear interpolation applies to texture coordinates, normals, and other vertex attributes.

Thus rasterization uses affine combinations across triangles.

116.20 Depth and Visibility

When several objects project to the same pixel, the visible one is usually the closest to the camera.

The depth buffer stores a depth value for each pixel.

For a new fragment, the renderer compares its depth with the stored depth. If the new fragment is closer, it replaces the old value.

Depth values are produced by the projection transformation and perspective division.

Although visibility is a geometric problem, its implementation depends on the transformed coordinates produced by matrix operations.

116.21 Texture Mapping

Texture mapping attaches image data to a surface.

Each vertex may have texture coordinates

(u, v).

For a point inside a triangle, texture coordinates are interpolated from the vertex values.

The texture image is then sampled at the interpolated coordinate.

Perspective-correct interpolation is required because ordinary linear interpolation after projection does not preserve the correct projective relationship.

Homogeneous coordinates again appear: the renderer interpolates quantities divided by depth and then corrects by the interpolated reciprocal depth.

116.22 Quaternions and Rotations

Rotations in three dimensions can be represented by matrices, Euler angles, or quaternions.

Rotation matrices are convenient for transforming vectors.

Euler angles are easy to interpret but can suffer from singularities such as gimbal lock.

Quaternions provide a compact and stable representation of orientation.

A unit quaternion can represent a rotation. It is often used for animation and interpolation.

Spherical linear interpolation, or slerp, smoothly interpolates between two orientations.

Even when quaternions are used internally, they are commonly converted to matrices before being applied to vertices.

116.23 Skeletal Animation

Skeletal animation represents a character as a mesh controlled by bones.

Each bone has a transformation matrix. Vertices are influenced by one or more bones.

A skinned vertex position is often computed as

p' = \sum_{i=1}^{k} w_i M_i p,

where M_i is a bone matrix and w_i is a weight.

The weights satisfy

w_1 + \cdots + w_k = 1.

This is a weighted linear combination of transformed positions.

Skeletal animation therefore combines affine transformations, matrix products, and convex combinations.
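Linear blend skinning is a few lines of NumPy. The two bone matrices below are illustrative translations, and the weights are arbitrary values that sum to 1:

```python
import numpy as np

def translation(t):
    """4x4 homogeneous translation matrix."""
    M = np.eye(4)
    M[:3, 3] = t
    return M

# Two illustrative bone matrices and their weights for one vertex.
M1 = translation([1.0, 0.0, 0.0])
M2 = translation([0.0, 2.0, 0.0])
w1, w2 = 0.75, 0.25  # weights sum to 1

p = np.array([0.0, 0.0, 0.0, 1.0])

# Linear blend skinning: a weighted sum of transformed positions.
p_skinned = w1 * (M1 @ p) + w2 * (M2 @ p)
```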

116.24 Ray Tracing

Ray tracing follows rays through a scene.

A ray has the form

r(t) = o + td, \qquad t \geq 0.

Here o is the ray origin and d is the ray direction.

Intersecting rays with objects requires solving equations.

For a plane with normal n and point p_0, the intersection satisfies

n^T(o + td - p_0) = 0.

Solving for t,

t = \frac{n^T(p_0 - o)}{n^T d}.

This formula uses dot products and linear equations.

Ray tracing also uses reflection vectors, refraction, coordinate transforms, acceleration structures, and many matrix operations.
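The ray-plane derivation above maps directly to code. A NumPy sketch (the scene values are illustrative), including a guard for the near-parallel case where n^T d ≈ 0:

```python
import numpy as np

def ray_plane(o, d, n, p0, eps=1e-9):
    """Intersect the ray o + t d with the plane through p0 with normal n.
    Returns t >= 0, or None if there is no forward intersection."""
    denom = n @ d
    if abs(denom) < eps:          # ray (nearly) parallel to the plane
        return None
    t = (n @ (p0 - o)) / denom    # t = n^T(p0 - o) / n^T d
    return t if t >= 0.0 else None

o = np.array([0.0, 0.0, 5.0])   # ray starting above the floor
d = np.array([0.0, 0.0, -1.0])  # pointing straight down
n = np.array([0.0, 0.0, 1.0])   # floor plane z = 0
p0 = np.zeros(3)

t = ray_plane(o, d, n, p0)
hit = o + t * d                 # intersection point
```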

116.25 Cameras and Projection Matrices

A camera maps three-dimensional world points to two-dimensional image points.

In projective form, a camera can be represented by a matrix. In computer vision, a pinhole camera is commonly written as a 3 × 4 projection matrix that maps homogeneous 3D points to homogeneous image points up to scale.

If X is a homogeneous world point and C is a camera matrix, then

x \sim CX.

The symbol \sim means equality up to nonzero scalar multiple.

This expresses the projective nature of image formation.

Computer graphics and computer vision use closely related projective geometry, although often in opposite directions: graphics projects known 3D scenes to images, while vision often reconstructs 3D structure from images.

116.26 Linear Algebra in the Graphics Pipeline

The standard graphics pipeline is a chain of linear algebra operations.

| Graphics concept | Linear algebra object |
| --- | --- |
| Point | Vector with homogeneous coordinate 1 |
| Direction | Vector with homogeneous coordinate 0 |
| Translation | Homogeneous matrix |
| Rotation | Orthogonal matrix |
| Scaling | Diagonal matrix |
| Object placement | Model matrix |
| Camera transform | View matrix |
| Projection | Projection matrix |
| Surface normal | Orthogonal vector |
| Lighting | Dot product |
| Triangle interpolation | Barycentric coordinates |
| Animation | Matrix products and weighted sums |
| Image | Matrix of pixels |

This table shows why graphics is one of the most direct applications of finite-dimensional linear algebra.

116.27 Numerical Issues

Computer graphics uses floating point arithmetic.

This creates numerical concerns.

Repeated transformations can accumulate error. Rotation matrices may slowly lose orthogonality. Very small or very large depth ranges can reduce depth-buffer precision. Nearly parallel rays and planes may produce unstable intersection calculations.

Common remedies include:

| Issue | Remedy |
| --- | --- |
| Rotation drift | Re-normalize or use quaternions |
| Ill-conditioned transforms | Avoid extreme scaling |
| Depth precision loss | Choose near and far planes carefully |
| Normal distortion | Use inverse-transpose normal matrix |
| Ray intersection instability | Use tolerances and robust tests |

Graphics systems therefore depend not only on exact linear algebra but also on numerical linear algebra.

116.28 Summary

Computer graphics represents geometry with vectors and transformations with matrices.

Homogeneous coordinates allow translation, rotation, scaling, shear, and projection to be handled in a unified matrix framework. Model, view, and projection matrices form the main coordinate transformation chain. Dot products support lighting. Cross products produce normals. Barycentric coordinates support interpolation across triangles. Matrix products support animation, camera movement, and object placement.

The central principle is that images are produced by transforming geometry through coordinate systems. Linear algebra supplies the precise operations that make this possible.