# Chapter 35. Composition of Transformations

Composition is the operation of applying one transformation after another. If

$$
T:U\to V
$$

and

$$
S:V\to W
$$

are transformations, then their composition is the transformation

$$
S\circ T:U\to W
$$

defined by

$$
(S\circ T)(u)=S(T(u)).
$$

The order matters. In \(S\circ T\), the transformation \(T\) is applied first, and \(S\) is applied second. Composition of linear transformations corresponds to multiplication of their representing matrices. If \(T(x)=Ax\) and \(S(y)=By\), then

$$
(S\circ T)(x)=B(Ax)=(BA)x.
$$

Thus the matrix of \(S\circ T\) is \(BA\). This is the fundamental reason matrix multiplication is defined the way it is: it encodes composition of linear maps.

## 35.1 Definition of Composition

Let \(U,V,W\) be vector spaces over the same field \(F\). Suppose

$$
T:U\to V
$$

and

$$
S:V\to W
$$

are transformations. Their composition is

$$
S\circ T:U\to W,
$$

where

$$
(S\circ T)(u)=S(T(u)).
$$

The expression is read as “\(S\) after \(T\).” The vector \(u\) first enters \(T\). The output \(T(u)\) lies in \(V\), so it can be used as an input for \(S\). The final output \(S(T(u))\) lies in \(W\).

Composition is defined only when the codomain of the first transformation matches the domain of the second transformation, or at least lies inside it.

## 35.2 Composition of Linear Transformations

If both transformations are linear, then their composition is linear.

Let

$$
T:U\to V
$$

and

$$
S:V\to W
$$

be linear. We prove that

$$
S\circ T:U\to W
$$

is linear.

For \(u_1,u_2\in U\),

$$
(S\circ T)(u_1+u_2) =
S(T(u_1+u_2)).
$$

Since \(T\) is linear,

$$
T(u_1+u_2)=T(u_1)+T(u_2).
$$

Therefore

$$
S(T(u_1+u_2)) =
S(T(u_1)+T(u_2)).
$$

Since \(S\) is linear,

$$
S(T(u_1)+T(u_2)) =
S(T(u_1))+S(T(u_2)).
$$

Hence

$$
(S\circ T)(u_1+u_2) =
(S\circ T)(u_1)+(S\circ T)(u_2).
$$

For scalar multiplication, let \(c\in F\). Then

$$
(S\circ T)(cu) =
S(T(cu)).
$$

Since \(T\) is linear,

$$
T(cu)=cT(u).
$$

Since \(S\) is linear,

$$
S(cT(u))=cS(T(u)).
$$

Therefore

$$
(S\circ T)(cu)=c(S\circ T)(u).
$$

So \(S\circ T\) is linear.

## 35.3 Order of Application

The notation

$$
S\circ T
$$

means first apply \(T\), then apply \(S\). This order can be confusing because the symbol \(S\) appears on the left but acts second.

The reason is that functions act on what appears to their right:

$$
(S\circ T)(u)=S(T(u)).
$$

The innermost operation is done first. This is the same convention used in ordinary function composition.

For example, let

$$
T(x)=2x
$$

and

$$
S(x)=x+3.
$$

Then

$$
(S\circ T)(x)=S(2x)=2x+3,
$$

while

$$
(T\circ S)(x)=T(x+3)=2x+6.
$$

These are different functions. Composition usually depends on order.
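The two scalar functions above can be checked directly in code; this minimal Python sketch composes them in both orders:

```python
T = lambda x: 2 * x        # T(x) = 2x
S = lambda x: x + 3        # S(x) = x + 3

S_after_T = lambda x: S(T(x))   # (S∘T)(x) = 2x + 3
T_after_S = lambda x: T(S(x))   # (T∘S)(x) = 2x + 6

assert S_after_T(5) == 13   # 2*5 + 3
assert T_after_S(5) == 16   # 2*(5 + 3)
```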

## 35.4 Matrix Form

Let

$$
T:F^n\to F^m
$$

have matrix \(A\), and let

$$
S:F^m\to F^p
$$

have matrix \(B\). Thus

$$
T(x)=Ax
$$

and

$$
S(y)=By.
$$

Then

$$
(S\circ T)(x)=S(T(x))=S(Ax)=B(Ax).
$$

By associativity of matrix multiplication,

$$
B(Ax)=(BA)x.
$$

Therefore the matrix of \(S\circ T\) is

$$
BA.
$$

The order is important. The matrix for the first transformation appears on the right. The matrix for the second transformation appears on the left.

If

$$
A \in F^{m\times n}
$$

and

$$
B \in F^{p\times m},
$$

then

$$
BA \in F^{p\times n}.
$$

This matches the map

$$
S\circ T:F^n\to F^p.
$$

## 35.5 Example: Two Transformations in the Plane

Let

$$
T:\mathbb{R}^2\to\mathbb{R}^2
$$

be scaling by \(2\) in the \(x\)-direction and by \(3\) in the \(y\)-direction:

$$
T
\begin{bmatrix}
x\\
y
\end{bmatrix} =
\begin{bmatrix}
2x\\
3y
\end{bmatrix}.
$$

Its matrix is

$$
A=
\begin{bmatrix}
2 & 0\\
0 & 3
\end{bmatrix}.
$$

Let

$$
S:\mathbb{R}^2\to\mathbb{R}^2
$$

be the shear

$$
S
\begin{bmatrix}
x\\
y
\end{bmatrix} =
\begin{bmatrix}
x+y\\
y
\end{bmatrix}.
$$

Its matrix is

$$
B=
\begin{bmatrix}
1 & 1\\
0 & 1
\end{bmatrix}.
$$

The composition \(S\circ T\) means scale first, then shear. Its matrix is

$$
BA =
\begin{bmatrix}
1 & 1\\
0 & 1
\end{bmatrix}
\begin{bmatrix}
2 & 0\\
0 & 3
\end{bmatrix} =
\begin{bmatrix}
2 & 3\\
0 & 3
\end{bmatrix}.
$$

Thus

$$
(S\circ T)
\begin{bmatrix}
x\\
y
\end{bmatrix} =
\begin{bmatrix}
2x+3y\\
3y
\end{bmatrix}.
$$

Now reverse the order. The composition \(T\circ S\) means shear first, then scale. Its matrix is

$$
AB =
\begin{bmatrix}
2 & 0\\
0 & 3
\end{bmatrix}
\begin{bmatrix}
1 & 1\\
0 & 1
\end{bmatrix} =
\begin{bmatrix}
2 & 2\\
0 & 3
\end{bmatrix}.
$$

Thus

$$
(T\circ S)
\begin{bmatrix}
x\\
y
\end{bmatrix} =
\begin{bmatrix}
2x+2y\\
3y
\end{bmatrix}.
$$

The two results are different. Therefore

$$
S\circ T \neq T\circ S
$$

in this example.
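The two products can be checked with the matrices from this example:

```python
import numpy as np

A = np.array([[2, 0], [0, 3]])   # scaling by 2 in x, 3 in y
B = np.array([[1, 1], [0, 1]])   # shear

BA = B @ A   # scale first, then shear
AB = A @ B   # shear first, then scale

assert np.array_equal(BA, [[2, 3], [0, 3]])
assert np.array_equal(AB, [[2, 2], [0, 3]])
assert not np.array_equal(BA, AB)   # the two orders disagree
```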

## 35.6 Noncommutativity

Composition of transformations is generally not commutative.

This means

$$
S\circ T
$$

and

$$
T\circ S
$$

may both be defined but may produce different transformations.

The same fact appears in matrix algebra as

$$
BA \neq AB
$$

in general.

There are special cases where two transformations commute. For example, two scalings along the same coordinate axes commute. Two rotations about the origin in the plane commute. A transformation always commutes with the identity transformation. But commutativity is a special property, not the default.

Because of this, the order of transformations must be recorded carefully. In geometry, “rotate then project” may differ from “project then rotate.” In computation, “normalize then transform” may differ from “transform then normalize.”

## 35.7 Associativity of Composition

Composition is associative.

If

$$
R:W\to X,
\qquad
S:V\to W,
\qquad
T:U\to V,
$$

then

$$
R\circ(S\circ T)=(R\circ S)\circ T.
$$

Both sides send \(u\in U\) to

$$
R(S(T(u))).
$$

Thus parentheses do not affect the final transformation, provided the order of the transformations remains the same.

In matrix form, if \(T,S,R\) have matrices \(A,B,C\), respectively, then the matrix of

$$
R\circ S\circ T
$$

is

$$
CBA.
$$

Associativity of composition corresponds to associativity of matrix multiplication:

$$
C(BA)=(CB)A.
$$

This allows long chains of transformations to be grouped in convenient ways.
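Associativity can be confirmed on arbitrary compatible matrices; the sizes below are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2))  # matrix of T: F^2 -> F^3
B = rng.standard_normal((4, 3))  # matrix of S: F^3 -> F^4
C = rng.standard_normal((5, 4))  # matrix of R: F^4 -> F^5

# C(BA) = (CB)A, up to floating-point rounding.
assert np.allclose(C @ (B @ A), (C @ B) @ A)
```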

## 35.8 Identity Transformation

For every vector space \(V\), the identity transformation is the map

$$
I_V:V\to V
$$

defined by

$$
I_V(v)=v.
$$

It is linear because

$$
I_V(u+v)=u+v=I_V(u)+I_V(v)
$$

and

$$
I_V(cv)=cv=cI_V(v).
$$

If

$$
T:U\to V
$$

is linear, then

$$
I_V\circ T=T
$$

and

$$
T\circ I_U=T.
$$

Thus identity transformations behave like multiplicative identities under composition.

In coordinates, the identity transformation on \(F^n\) has matrix

$$
I_n=
\begin{bmatrix}
1 & 0 & \cdots & 0\\
0 & 1 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 1
\end{bmatrix}.
$$

For any compatible matrix \(A\),

$$
I_mA=A
$$

and

$$
AI_n=A.
$$

## 35.9 Inverses and Composition

A transformation \(T:V\to W\) is invertible if there exists a transformation

$$
T^{-1}:W\to V
$$

such that

$$
T^{-1}\circ T=I_V
$$

and

$$
T\circ T^{-1}=I_W.
$$

If a linear transformation is invertible, then its inverse is automatically linear.

If \(T(x)=Ax\), then \(T\) is invertible exactly when \(A\) is invertible. In that case,

$$
T^{-1}(x)=A^{-1}x.
$$

The equations

$$
T^{-1}\circ T=I
$$

and

$$
T\circ T^{-1}=I
$$

become

$$
A^{-1}A=I
$$

and

$$
AA^{-1}=I.
$$

The inverse reverses the action of the original transformation.
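A quick numerical check of both defining equations, with an assumed small invertible matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 1.0]])   # invertible (det = 1)
A_inv = np.linalg.inv(A)

assert np.allclose(A_inv @ A, np.eye(2))   # A^{-1} A = I
assert np.allclose(A @ A_inv, np.eye(2))   # A A^{-1} = I
```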

## 35.10 Composition and Bases

Let

$$
T:U\to V
$$

and

$$
S:V\to W
$$

be linear. Choose ordered bases \(A\) for \(U\), \(B\) for \(V\), and \(C\) for \(W\).

Then

$$
[T]_{B\leftarrow A}
$$

is the matrix of \(T\), and

$$
[S]_{C\leftarrow B}
$$

is the matrix of \(S\).

The composition

$$
S\circ T:U\to W
$$

has matrix

$$
[S\circ T]_{C\leftarrow A} =
[S]_{C\leftarrow B}
[T]_{B\leftarrow A}.
$$

This formula is the coordinate version of composition.

The middle basis \(B\) must match. The output coordinates of \(T\) are expressed in basis \(B\), and these must be valid input coordinates for \(S\). If the bases do not match, a change-of-coordinates matrix is needed.

## 35.11 Example with Abstract Bases

Let \(P_2\) be the vector space of polynomials of degree at most \(2\), and let \(P_1\) be the vector space of polynomials of degree at most \(1\).

Define

$$
D:P_2\to P_1
$$

by differentiation:

$$
D(p)=p'.
$$

Define

$$
E:P_1\to \mathbb{R}
$$

by evaluation at \(1\):

$$
E(q)=q(1).
$$

Both transformations are linear.

Use the bases

$$
B=(1,x,x^2)
$$

for \(P_2\),

$$
C=(1,x)
$$

for \(P_1\),

and

$$
D_0=(1)
$$

for \(\mathbb{R}\).

The matrix of differentiation is

$$
[D]_{C\leftarrow B} =
\begin{bmatrix}
0 & 1 & 0\\
0 & 0 & 2
\end{bmatrix}.
$$

The matrix of evaluation at \(1\) is

$$
[E]_{D_0\leftarrow C} =
\begin{bmatrix}
1 & 1
\end{bmatrix},
$$

since

$$
E(a+bx)=a+b.
$$

The composition

$$
E\circ D:P_2\to\mathbb{R}
$$

sends a polynomial to the value of its derivative at \(1\):

$$
(E\circ D)(p)=p'(1).
$$

Its matrix is

$$
[E\circ D]_{D_0\leftarrow B} =
[E]_{D_0\leftarrow C}
[D]_{C\leftarrow B}.
$$

Compute:

$$
\begin{bmatrix}
1 & 1
\end{bmatrix}
\begin{bmatrix}
0 & 1 & 0\\
0 & 0 & 2
\end{bmatrix} =
\begin{bmatrix}
0 & 1 & 2
\end{bmatrix}.
$$

Thus, for

$$
p(x)=a+bx+cx^2,
$$

we have

$$
(E\circ D)(p)=b+2c.
$$

This agrees with direct computation:

$$
p'(x)=b+2cx,
$$

so

$$
p'(1)=b+2c.
$$
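The whole example can be replayed numerically. The coefficient triple \((a,b,c)=(5,7,4)\) below is an assumed test case:

```python
import numpy as np

# Matrices from the text, in the bases B = (1, x, x^2), C = (1, x), D0 = (1).
D_mat = np.array([[0, 1, 0], [0, 0, 2]])   # differentiation P2 -> P1
E_mat = np.array([[1, 1]])                 # evaluation at 1, P1 -> R

ED = E_mat @ D_mat                         # matrix of E∘D
assert np.array_equal(ED, [[0, 1, 2]])

# Check against p'(1) = b + 2c for p(x) = a + bx + cx^2.
a, b, c = 5, 7, 4
assert (ED @ np.array([a, b, c]))[0] == b + 2 * c
```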

## 35.12 Powers of a Linear Operator

When a linear transformation maps a vector space to itself,

$$
T:V\to V,
$$

we may compose it with itself.

Define

$$
T^2=T\circ T,
$$

$$
T^3=T\circ T\circ T,
$$

and in general,

$$
T^k=\underbrace{T\circ T\circ \cdots \circ T}_{k\text{ times}}.
$$

Also define

$$
T^0=I_V.
$$

If \(T\) has matrix \(A\), then \(T^k\) has matrix

$$
A^k.
$$

Powers of transformations occur in discrete dynamics, Markov chains, recurrence relations, graph walks, and iterative numerical methods.

For example, if

$$
x_{k+1}=Ax_k,
$$

then

$$
x_k=A^kx_0.
$$

The behavior of the system depends on the powers of \(A\).
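Iterating the map step by step agrees with applying the matrix power once; the matrix and initial vector below are assumed examples:

```python
import numpy as np

A = np.array([[0.5, 0.1], [0.2, 0.7]])
x0 = np.array([1.0, 1.0])

# Apply x_{k+1} = A x_k three times.
x = x0
for _ in range(3):
    x = A @ x

# The result agrees with x_3 = A^3 x_0.
assert np.allclose(x, np.linalg.matrix_power(A, 3) @ x0)
```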

## 35.13 Nilpotent Transformations

A linear operator \(T:V\to V\) is nilpotent if some positive power of \(T\) is the zero transformation:

$$
T^k=0
$$

for some \(k\geq 1\).

For example, define

$$
T:\mathbb{R}^3\to\mathbb{R}^3
$$

by

$$
T
\begin{bmatrix}
x\\
y\\
z
\end{bmatrix} =
\begin{bmatrix}
y\\
z\\
0
\end{bmatrix}.
$$

Then

$$
T^2
\begin{bmatrix}
x\\
y\\
z
\end{bmatrix} =
T
\begin{bmatrix}
y\\
z\\
0
\end{bmatrix} =
\begin{bmatrix}
z\\
0\\
0
\end{bmatrix},
$$

and

$$
T^3
\begin{bmatrix}
x\\
y\\
z
\end{bmatrix} =
T
\begin{bmatrix}
z\\
0\\
0
\end{bmatrix} =
\begin{bmatrix}
0\\
0\\
0
\end{bmatrix}.
$$

So

$$
T^3=0.
$$

Nilpotent transformations are important in canonical forms, especially Jordan form.
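The shift map from this example has an easily checked matrix:

```python
import numpy as np

# Matrix of T(x, y, z) = (y, z, 0) in the standard basis.
N = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]])

# N^2 is not yet zero, but N^3 is.
assert not np.array_equal(np.linalg.matrix_power(N, 2), np.zeros((3, 3)))
assert np.array_equal(np.linalg.matrix_power(N, 3), np.zeros((3, 3)))
```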

## 35.14 Idempotent Transformations

A linear operator \(P:V\to V\) is idempotent if

$$
P^2=P.
$$

This means applying \(P\) twice gives the same result as applying it once.

Projections are the main examples. Let

$$
P:\mathbb{R}^3\to\mathbb{R}^3
$$

be projection onto the \(xy\)-plane:

$$
P
\begin{bmatrix}
x\\
y\\
z
\end{bmatrix} =
\begin{bmatrix}
x\\
y\\
0
\end{bmatrix}.
$$

Then

$$
P^2
\begin{bmatrix}
x\\
y\\
z
\end{bmatrix} =
P
\begin{bmatrix}
x\\
y\\
0
\end{bmatrix} =
\begin{bmatrix}
x\\
y\\
0
\end{bmatrix}.
$$

Thus

$$
P^2=P.
$$

Once a vector has been projected onto the plane, projecting it again changes nothing.
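In matrix form the idempotence of this projection is a one-line check:

```python
import numpy as np

# Matrix of projection onto the xy-plane in the standard basis.
P = np.diag([1, 1, 0])

assert np.array_equal(P @ P, P)   # P^2 = P
```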

## 35.15 Commuting Transformations

Two linear operators

$$
S,T:V\to V
$$

commute if

$$
S\circ T=T\circ S.
$$

In matrix form, if \(S\) has matrix \(B\) and \(T\) has matrix \(A\), this condition becomes

$$
BA=AB.
$$

Commuting transformations can often be studied together. For example, two diagonal matrices of the same size commute, and more generally, two transformations that are diagonal with respect to the same basis commute.

However, most pairs of matrices do not commute. The failure of commutativity is a structural feature of matrix algebra.

## 35.16 Composition and Kernel

Composition affects kernels in a predictable way.

Let

$$
T:U\to V
$$

and

$$
S:V\to W
$$

be linear. If \(u\in\ker(T)\), then

$$
T(u)=0.
$$

Therefore

$$
(S\circ T)(u)=S(T(u))=S(0)=0.
$$

So

$$
\ker(T)\subseteq \ker(S\circ T).
$$

The composition may have a larger kernel than \(T\). This happens when \(T(u)\) is nonzero but lies in \(\ker(S)\).

More precisely,

$$
\ker(S\circ T)=\{u\in U:T(u)\in\ker(S)\}.
$$

Thus the kernel of a composition consists of all vectors that \(T\) sends into the kernel of \(S\).
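The strict growth of the kernel can be seen numerically. In this assumed example, \(T\) is the identity (trivial kernel) and \(S\) kills the second coordinate:

```python
import numpy as np

A = np.array([[1, 0], [0, 1]])   # T invertible: ker(T) = {0}
B = np.array([[1, 0], [0, 0]])   # S has a one-dimensional kernel

# nullity = (number of columns) - rank
nullity_T  = 2 - np.linalg.matrix_rank(A)
nullity_ST = 2 - np.linalg.matrix_rank(B @ A)

assert nullity_T == 0
assert nullity_ST == 1   # ker(S∘T) is strictly larger than ker(T)
```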

## 35.17 Composition and Image

The image of a composition is contained in the image of the second transformation.

For

$$
S\circ T:U\to W,
$$

every output has the form

$$
S(T(u)).
$$

Since \(T(u)\in V\), every such output lies in

$$
\operatorname{im}(S).
$$

Therefore

$$
\operatorname{im}(S\circ T)\subseteq \operatorname{im}(S).
$$

Also,

$$
\operatorname{im}(S\circ T)=S(\operatorname{im}(T)).
$$

This means that the image of the composition is obtained by first taking the image of \(T\), then applying \(S\) to that subspace.

Consequently,

$$
\operatorname{rank}(S\circ T)\leq \operatorname{rank}(S)
$$

and

$$
\operatorname{rank}(S\circ T)\leq \operatorname{rank}(T).
$$

Composition cannot have rank larger than either factor.
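The rank inequalities can be spot-checked on random matrices of assumed sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 5))   # matrix of T
B = rng.standard_normal((3, 4))   # matrix of S

r = np.linalg.matrix_rank
assert r(B @ A) <= r(A)   # rank(S∘T) <= rank(T)
assert r(B @ A) <= r(B)   # rank(S∘T) <= rank(S)
```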

## 35.18 Invertible Factors

If \(T:U\to V\) and \(S:V\to W\) are both isomorphisms, then

$$
S\circ T:U\to W
$$

is also an isomorphism.

Its inverse is

$$
(S\circ T)^{-1}=T^{-1}\circ S^{-1}.
$$

The order reverses. To undo \(S\circ T\), one first undoes \(S\), then undoes \(T\).

In matrix form, if \(A\) and \(B\) are invertible, then

$$
(BA)^{-1}=A^{-1}B^{-1}.
$$

Again, the order reverses.
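The reversal of order can be verified with two assumed invertible matrices:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 5.0]])   # det = -1, invertible
B = np.array([[2.0, 1.0], [1.0, 1.0]])   # det = 1, invertible

lhs = np.linalg.inv(B @ A)
rhs = np.linalg.inv(A) @ np.linalg.inv(B)   # note the reversed order

assert np.allclose(lhs, rhs)   # (BA)^{-1} = A^{-1} B^{-1}
```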

## 35.19 Composition in Geometry

Many geometric operations are naturally described as compositions.

In the plane, a rigid motion fixing the origin may be built from rotations and reflections. A general linear deformation may be built from shears, scalings, rotations, and projections. In three dimensions, transformations used in graphics and robotics are often long products of elementary transformations.

For example, suppose a transformation first scales, then rotates. If the scaling matrix is \(A\) and the rotation matrix is \(R\), then the total matrix is

$$
RA.
$$

The rightmost matrix acts first.

This convention is essential in applications. Reversing the order usually gives a different geometric result.
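A small sketch makes the order concrete; the 90-degree angle and the scaling factors are assumed example values:

```python
import numpy as np

theta = np.pi / 2                     # assumed rotation angle: 90 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
A = np.diag([2.0, 1.0])               # scale x by 2

v = np.array([1.0, 0.0])

# Scale first, then rotate: total matrix RA (rightmost acts first).
assert np.allclose(R @ A @ v, [0.0, 2.0])
# The other order gives a geometrically different result.
assert np.allclose(A @ R @ v, [0.0, 1.0])
```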

## 35.20 Composition in Computation

In numerical computation, composition appears whenever data passes through a sequence of linear stages.

A signal may be filtered, transformed, compressed, and reconstructed. A vector may be projected into a subspace, multiplied by a smaller matrix, and mapped back. An iterative method may repeatedly apply the same operator.

If each stage is linear, then the entire pipeline is linear. Its total matrix is the product of the matrices for the stages, in the reverse order of application.

This gives a compact representation of a chain of linear operations. It also allows analysis of stability, rank, null spaces, eigenvalues, and error propagation.

## 35.21 Summary

Composition applies transformations in sequence. If

$$
T:U\to V
$$

and

$$
S:V\to W,
$$

then

$$
(S\circ T)(u)=S(T(u)).
$$

If \(S\) and \(T\) are linear, then \(S\circ T\) is linear.

For matrix transformations,

$$
T(x)=Ax,
\qquad
S(y)=By,
$$

the composition has matrix

$$
BA.
$$

Thus

$$
(S\circ T)(x)=(BA)x.
$$

Composition is associative, but it is generally not commutative. The identity transformation acts as the neutral element, and invertible transformations have inverses under composition.

Kernels and images behave naturally under composition:

$$
\ker(T)\subseteq \ker(S\circ T),
$$

and

$$
\operatorname{im}(S\circ T)\subseteq \operatorname{im}(S).
$$

Composition is therefore not a secondary operation. It explains why matrix multiplication has its form and why linear transformations form an algebraic system under chaining.
