Skip to content

8.2 Equivalent Formulations

Equivalent forms of the axiom of choice, including Zorn's lemma, the well ordering theorem, maximal principles, and right inverses of surjections.

The axiom of choice has many equivalent forms. Some of these forms speak about selecting elements from sets, some speak about ordering arbitrary sets, and some speak about finding maximal objects in partially ordered sets. Although these statements look different, they express the same underlying principle over the usual axioms of ZF set theory.

The purpose of this section is to explain the most important equivalent formulations in detail. The main examples are the product form of choice, the statement that every surjection has a right inverse, the well ordering theorem, Zorn’s lemma, and the Hausdorff maximal principle.

The Product Form of Choice

The axiom of choice says that every family of nonempty sets has a choice function. This can be restated in terms of products.

Let (Ai)iI(A_i)_{i \in I} be an indexed family of sets. The product of this family is:

iIAi={f:IiIAi:f(i)Ai for every iI} \prod_{i \in I} A_i = \{f : I \to \bigcup_{i \in I} A_i : f(i) \in A_i \text{ for every } i \in I\}

An element of this product is exactly a choice function. It chooses one element f(i)f(i) from each set AiA_i.

Proposition 8.21 (Product Form)

The axiom of choice is equivalent to the statement that whenever (Ai)iI(A_i)_{i \in I} is a family of nonempty sets, the product:

iIAi \prod_{i \in I} A_i

is nonempty.

Proof

Assume the axiom of choice. Let (Ai)iI(A_i)_{i \in I} be a family of nonempty sets. By the axiom of choice, there exists a function ff such that:

f(i)Ai f(i) \in A_i

for every iIi \in I. By the definition of the product, this function ff is an element of:

iIAi \prod_{i \in I} A_i

Therefore the product is nonempty.

Conversely, assume that every product of nonempty sets is nonempty. Let (Ai)iI(A_i)_{i \in I} be any family of nonempty sets. By assumption:

iIAi \prod_{i \in I} A_i \ne \varnothing

so there exists:

fiIAi f \in \prod_{i \in I} A_i

By the definition of product, this means:

f(i)Ai f(i) \in A_i

for every iIi \in I. Thus ff is a choice function for the family, and the axiom of choice follows.

This formulation is often the most compact way to state choice. It says that an arbitrary product of nonempty sets is nonempty.

Right Inverses of Surjections

Another useful form of choice is stated in terms of surjective functions. A surjective function sends at least one element of its domain to every element of its codomain. A right inverse chooses one preimage for each point in the codomain.

Definition 8.22 (Right Inverse)

Let:

f:XY f : X \to Y

be a function. A right inverse of ff is a function:

g:YX g : Y \to X

such that:

f(g(y))=y f(g(y)) = y

for every yYy \in Y.

Equivalently:

fg=idY f \circ g = \mathrm{id}_Y

The function gg chooses, for each yYy \in Y, one element xXx \in X such that f(x)=yf(x)=y.

Proposition 8.23

The axiom of choice is equivalent to the statement that every surjective function has a right inverse.

Proof

Assume the axiom of choice. Let:

f:XY f : X \to Y

be a surjective function.

For each yYy \in Y, define the fiber over yy:

Xy={xX:f(x)=y} X_y = \{x \in X : f(x)=y\}

Since ff is surjective, each fiber is nonempty:

Xy X_y \ne \varnothing

for every yYy \in Y.

The family:

(Xy)yY (X_y)_{y \in Y}

is therefore a family of nonempty sets. By the axiom of choice, there exists a choice function:

g:YyYXy g : Y \to \bigcup_{y \in Y} X_y

such that:

g(y)Xy g(y) \in X_y

for every yYy \in Y.

Since g(y)Xyg(y) \in X_y, the definition of XyX_y gives:

f(g(y))=y f(g(y))=y

for every yYy \in Y. Hence:

fg=idY f \circ g = \mathrm{id}_Y

and gg is a right inverse of ff.

Conversely, assume every surjective function has a right inverse. Let:

(Ai)iI (A_i)_{i \in I}

be a family of nonempty sets. We must construct a choice function.

Define:

X={(i,a):iI and aAi} X = \{(i,a) : i \in I \text{ and } a \in A_i\}

Define:

π:XI \pi : X \to I

by:

π(i,a)=i \pi(i,a)=i

We show that π\pi is surjective. Let iIi \in I. Since AiA_i is nonempty, choose an element:

aAi a \in A_i

Then:

(i,a)X (i,a) \in X

and:

π(i,a)=i \pi(i,a)=i

Therefore every iIi \in I lies in the image of π\pi, so π\pi is surjective.

By assumption, π\pi has a right inverse:

s:IX s : I \to X

such that:

π(s(i))=i \pi(s(i))=i

for every iIi \in I.

Since s(i)Xs(i) \in X, there exist jIj \in I and aAja \in A_j such that:

s(i)=(j,a) s(i)=(j,a)

But:

π(s(i))=i \pi(s(i))=i

and:

π(j,a)=j \pi(j,a)=j

Thus:

j=i j=i

Therefore, for each iIi \in I, the element s(i)s(i) has the form:

s(i)=(i,ai) s(i)=(i,a_i)

for some:

aiAi a_i \in A_i

Define:

c(i)=ai c(i)=a_i

Then:

c(i)Ai c(i) \in A_i

for every iIi \in I, so cc is a choice function. This proves the axiom of choice.

Well Ordered Sets

The well ordering theorem is another equivalent form of choice. It says that every set can be ordered so strongly that every nonempty subset has a first element.

Definition 8.24 (Total Order)

A total order on a set XX is a relation \le such that, for all x,y,zXx,y,z \in X:

  1. xxx \le x.

  2. If xyx \le y and yxy \le x, then x=yx=y.

  3. If xyx \le y and yzy \le z, then xzx \le z.

  4. Either xyx \le y or yxy \le x.

The first three conditions say that \le is a partial order. The fourth condition says that any two elements are comparable.

Definition 8.25 (Well Order)

A well order on a set XX is a total order \le such that every nonempty subset of XX has a least element.

That means that if:

YX Y \subseteq X

and:

Y Y \ne \varnothing

then there exists y0Yy_0 \in Y such that:

y0y y_0 \le y

for every yYy \in Y.

The usual order on N\mathbb{N} is a well order. Every nonempty subset of N\mathbb{N} has a least element.

The usual order on Z\mathbb{Z} is not a well order, because Z\mathbb{Z} itself has no least element.

The usual order on R\mathbb{R} is not a well order, because the interval:

(0,1) (0,1)

has no least element.

Theorem 8.26 (Well Ordering Theorem)

Every set can be well ordered.

This means that for every set XX, there exists some relation \le on XX such that (X,)(X,\le) is a well ordered set.

The theorem does not say that the well order is natural. It only says that some well order exists.

Proposition 8.27

The axiom of choice implies the well ordering theorem.

Proof

Assume the axiom of choice. Let XX be a set. If:

X= X=\varnothing

then the empty relation is a well order on XX, so suppose that XX is nonempty.

By the axiom of choice, there exists a choice function on the family of all nonempty subsets of XX. Thus there is a function:

c:P(X){}X c : \mathcal{P}(X) \setminus \{\varnothing\} \to X

such that:

c(A)A c(A) \in A

for every nonempty subset AXA \subseteq X.

We now build an enumeration of the elements of XX by transfinite recursion. At each stage, if there are elements not yet chosen, we choose one of them.

Suppose elements xβx_\beta have already been chosen for all β<α\beta < \alpha. Let:

Rα=X{xβ:β<α} R_\alpha = X \setminus \{x_\beta : \beta < \alpha\}

If:

Rα R_\alpha \ne \varnothing

define:

xα=c(Rα) x_\alpha = c(R_\alpha)

Thus xαx_\alpha is chosen from the remaining elements of XX.

This construction cannot continue through all ordinals while producing distinct elements of XX, because XX is a set and the class of all ordinals is not a set. Hence there is some ordinal γ\gamma such that:

Rγ= R_\gamma = \varnothing

Then:

X={xα:α<γ} X = \{x_\alpha : \alpha < \gamma\}

Define an order on XX by:

xα<xβ x_\alpha < x_\beta

if and only if:

α<β \alpha < \beta

This order is total because the ordinals below γ\gamma are totally ordered.

It is a well order because every nonempty subset YXY \subseteq X corresponds to a nonempty set of ordinals:

SY={α<γ:xαY} S_Y = \{\alpha < \gamma : x_\alpha \in Y\}

Every nonempty set of ordinals has a least element. Let:

α0=minSY \alpha_0 = \min S_Y

Then xα0x_{\alpha_0} is the least element of YY under the order just defined.

Therefore XX can be well ordered.

Proposition 8.28

The well ordering theorem implies the axiom of choice.

Proof

Assume every set can be well ordered. Let:

(Ai)iI (A_i)_{i \in I}

be a family of nonempty sets. We must construct a choice function.

Let:

U=iIAi U = \bigcup_{i \in I} A_i

By the well ordering theorem, there exists a well order \le on UU.

For each iIi \in I, the set AiA_i is a nonempty subset of UU. Since \le is a well order, AiA_i has a least element.

Define:

f(i)=the least element of Ai with respect to  f(i)=\text{the least element of } A_i \text{ with respect to } \le

Then:

f(i)Ai f(i) \in A_i

for every iIi \in I.

Thus ff is a choice function for the family, so the axiom of choice holds.

Corollary 8.29

The axiom of choice is equivalent to the well ordering theorem.

Proof

Proposition 8.27 proves that the axiom of choice implies the well ordering theorem. Proposition 8.28 proves that the well ordering theorem implies the axiom of choice. Therefore the two statements are equivalent.

Partially Ordered Sets

Zorn’s lemma is stated using partially ordered sets. Before stating it, we recall the relevant definitions.

Definition 8.30 (Partial Order)

A partial order on a set PP is a relation \le satisfying reflexivity, antisymmetry, and transitivity.

Thus, for all x,y,zPx,y,z \in P:

  1. xxx \le x.

  2. If xyx \le y and yxy \le x, then x=yx=y.

  3. If xyx \le y and yzy \le z, then xzx \le z.

A set equipped with a partial order is called a partially ordered set, or poset.

Unlike a total order, a partial order does not require every two elements to be comparable.

Definition 8.31 (Chain)

Let (P,)(P,\le) be a partially ordered set. A chain in PP is a subset CPC \subseteq P such that for all x,yCx,y \in C, either:

xy x \le y

or:

yx y \le x

Thus a chain is a subset on which the partial order becomes a total order.

Definition 8.32 (Upper Bound)

Let (P,)(P,\le) be a partially ordered set, and let CPC \subseteq P. An upper bound for CC is an element uPu \in P such that:

xu x \le u

for every xCx \in C.

The upper bound does not need to belong to CC. It only needs to lie above every element of CC.

Definition 8.33 (Maximal Element)

Let (P,)(P,\le) be a partially ordered set. An element mPm \in P is maximal if:

mx m \le x

implies:

x=m x=m

for every xPx \in P.

Equivalently, there is no element xPx \in P such that:

m<x m < x

A maximal element is not necessarily above every element of PP. It simply cannot be extended upward inside the order.

Lemma 8.34 (Greatest Elements and Maximal Elements)

Every greatest element is maximal, but a maximal element need not be greatest.

Proof

Suppose gg is a greatest element of PP. Then:

xg x \le g

for every xPx \in P.

If:

gy g \le y

then since gg is greatest, we also have:

yg y \le g

By antisymmetry:

y=g y=g

Therefore gg is maximal.

For the converse, consider the set:

P={a,b} P=\{a,b\}

with the partial order given only by:

aa a \le a

and:

bb b \le b

Then both aa and bb are maximal, because neither has a strictly larger element above it. However, there is no greatest element, since aa and bb are incomparable.

Zorn’s Lemma

Zorn’s lemma is one of the most useful equivalent forms of the axiom of choice. It is used to prove the existence of maximal objects.

Theorem 8.35 (Zorn’s Lemma)

Let (P,)(P,\le) be a nonempty partially ordered set. Suppose that every chain in PP has an upper bound in PP. Then PP has a maximal element.

The hypothesis says that every totally ordered subcollection can be extended upward to some element of the poset. The conclusion says that there exists an object that cannot be enlarged further.

Proposition 8.36

The well ordering theorem implies Zorn’s lemma.

Proof

Assume the well ordering theorem. Let (P,)(P,\le) be a nonempty partially ordered set in which every chain has an upper bound.

By the well ordering theorem, choose a well order \preceq of the underlying set PP. This well order is not required to agree with the partial order \le. It is used only to make canonical choices.

We construct a chain CPC \subseteq P by adding elements whenever possible.

Given a chain CC, say that an element xPx \in P is an extension of CC if:

C{x} C \cup \{x\}

is still a chain and xx is not already in CC.

If extensions exist, choose the \preceq least extension and add it. If no extensions exist, stop.

More formally, one performs this construction by transfinite recursion, using the well order \preceq to choose the least possible next element whenever one exists.

The construction must eventually stop, because it cannot add distinct elements of the set PP through all ordinals. Let CC be the chain obtained when no further extension is possible.

We claim that CC has an upper bound uPu \in P. This follows from the hypothesis, since every chain in PP has an upper bound.

We now show that uu is maximal. Suppose:

u<v u < v

for some vPv \in P.

Since uu is an upper bound for CC, every xCx \in C satisfies:

xu x \le u

Together with:

u<v u < v

this gives:

xv x \le v

for every xCx \in C.

Thus:

C{v} C \cup \{v\}

is a chain, and vv is an extension of CC unless vCv \in C.

But if vCv \in C, then because uu is an upper bound for CC, we have:

vu v \le u

which contradicts:

u<v u < v

Therefore vv is a genuine extension of CC, contradicting the fact that the construction stopped.

Hence no such vv exists, and uu is maximal.

Proposition 8.37

Zorn’s lemma implies the axiom of choice.

Proof

Assume Zorn’s lemma. Let:

(Ai)iI (A_i)_{i \in I}

be a family of nonempty sets. We must construct a choice function.

Let PP be the set of all partial choice functions. An element of PP is a function ff such that:

  1. dom(f)I\operatorname{dom}(f) \subseteq I.

  2. f(i)Aif(i) \in A_i for every idom(f)i \in \operatorname{dom}(f).

Order PP by extension:

fg f \le g

if and only if:

dom(f)dom(g) \operatorname{dom}(f) \subseteq \operatorname{dom}(g)

and:

g(i)=f(i) g(i)=f(i)

for every idom(f)i \in \operatorname{dom}(f).

This means that gg extends ff.

The set PP is nonempty, since the empty function belongs to PP.

Let CC be a chain in PP. Define:

h=C h = \bigcup C

We show that hh is a partial choice function. Since CC is linearly ordered by extension, any two functions in CC agree on their common domain. Therefore the union hh is a function.

If idom(h)i \in \operatorname{dom}(h), then idom(f)i \in \operatorname{dom}(f) for some fCf \in C. Since ff is a partial choice function:

h(i)=f(i)Ai h(i)=f(i) \in A_i

Thus hPh \in P.

Also, every fCf \in C is extended by hh, so hh is an upper bound for CC in PP.

By Zorn’s lemma, PP has a maximal element mm.

We claim that:

dom(m)=I \operatorname{dom}(m)=I

Suppose not. Then there exists:

i0Idom(m) i_0 \in I \setminus \operatorname{dom}(m)

Since Ai0A_{i_0} is nonempty, choose:

a0Ai0 a_0 \in A_{i_0}

Define:

m=m{(i0,a0)} m' = m \cup \{(i_0,a_0)\}

Then mm' is a partial choice function extending mm, and it is strictly larger than mm. This contradicts the maximality of mm.

Therefore:

dom(m)=I \operatorname{dom}(m)=I

Hence mm is a choice function for the whole family. This proves the axiom of choice.

Corollary 8.38

The axiom of choice, the well ordering theorem, and Zorn’s lemma are equivalent.

Proof

The axiom of choice implies the well ordering theorem by Proposition 8.27. The well ordering theorem implies Zorn’s lemma by Proposition 8.36. Zorn’s lemma implies the axiom of choice by Proposition 8.37. Therefore all three statements are equivalent.

Hausdorff Maximal Principle

The Hausdorff maximal principle is another maximal principle equivalent to the axiom of choice. It says that every partially ordered set contains a maximal chain.

Theorem 8.39 (Hausdorff Maximal Principle)

Every partially ordered set has a maximal chain.

A maximal chain is a chain that cannot be enlarged by adding more elements while remaining a chain.

Proposition 8.40

Zorn’s lemma implies the Hausdorff maximal principle.

Proof

Let (P,)(P,\le) be a partially ordered set. Let C\mathcal{C} be the set of all chains in PP, ordered by inclusion.

Thus:

CD C \le D

if and only if:

CD C \subseteq D

We apply Zorn’s lemma to C\mathcal{C}.

Let D\mathcal{D} be a chain in C\mathcal{C}. This means that D\mathcal{D} is a collection of chains in PP, and any two members of D\mathcal{D} are comparable by inclusion.

Define:

U=CDC U = \bigcup_{C \in \mathcal{D}} C

We show that UU is a chain in PP.

Let x,yUx,y \in U. Then there exist chains C,DDC,D \in \mathcal{D} such that:

xC x \in C

and:

yD y \in D

Since D\mathcal{D} is a chain under inclusion, either:

CD C \subseteq D

or:

DC D \subseteq C

If CDC \subseteq D, then both xx and yy are in DD, and since DD is a chain, xx and yy are comparable.

If DCD \subseteq C, the same argument applies using CC.

Thus UU is a chain in PP.

Also:

CU C \subseteq U

for every CDC \in \mathcal{D}, so UU is an upper bound for D\mathcal{D} in C\mathcal{C}.

By Zorn’s lemma, C\mathcal{C} has a maximal element. Such an element is exactly a maximal chain in PP.

Proposition 8.41

The Hausdorff maximal principle implies Zorn’s lemma.

Proof

Assume the Hausdorff maximal principle. Let (P,)(P,\le) be a nonempty partially ordered set in which every chain has an upper bound.

By the Hausdorff maximal principle, there exists a maximal chain:

CP C \subseteq P

By hypothesis, this chain has an upper bound:

uP u \in P

Thus:

xu x \le u

for every xCx \in C.

We claim that uu is maximal in PP.

Suppose not. Then there exists vPv \in P such that:

u<v u < v

For every xCx \in C, we have:

xu<v x \le u < v

Hence:

xv x \le v

for every xCx \in C.

Therefore:

C{v} C \cup \{v\}

is a chain.

Moreover, vCv \notin C. If vCv \in C, then since uu is an upper bound of CC, we would have:

vu v \le u

which contradicts:

u<v u < v

Thus C{v}C \cup \{v\} is a strictly larger chain than CC, contradicting the maximality of CC.

Therefore no such vv exists, and uu is maximal. Hence Zorn’s lemma holds.

Corollary 8.42

Zorn’s lemma and the Hausdorff maximal principle are equivalent.

Proof

Proposition 8.40 proves that Zorn’s lemma implies the Hausdorff maximal principle. Proposition 8.41 proves that the Hausdorff maximal principle implies Zorn’s lemma. Therefore the two principles are equivalent.

Maximal Objects in Algebra

Zorn’s lemma is widely used because many mathematical objects are naturally ordered by inclusion.

For example, let VV be a vector space, and consider the set of all linearly independent subsets of VV, ordered by inclusion. A maximal element in this poset is a linearly independent set that cannot be enlarged while preserving linear independence. Such a set is a basis.

The use of Zorn’s lemma follows a common pattern:

  1. Define a poset of partial objects.

  2. Show that every chain has an upper bound.

  3. Apply Zorn’s lemma to obtain a maximal object.

  4. Prove that the maximal object has the desired stronger property.

This pattern appears throughout algebra, topology, and analysis.

Proposition 8.43 (Every Vector Space Has a Basis)

Assume Zorn’s lemma. Every vector space has a basis.

Proof

Let VV be a vector space over a field FF. Let PP be the set of all linearly independent subsets of VV, ordered by inclusion.

The set PP is nonempty because:

P \varnothing \in P

Let CC be a chain in PP. Define:

U=SCS U = \bigcup_{S \in C} S

We show that UU is linearly independent.

Let:

u1,,unU u_1,\dots,u_n \in U

and suppose:

a1u1++anun=0 a_1u_1+\cdots+a_nu_n=0

For each kk, since ukUu_k \in U, there exists SkCS_k \in C such that:

ukSk u_k \in S_k

Because CC is a chain under inclusion, among the finitely many sets:

S1,,Sn S_1,\dots,S_n

there is one that contains all the others. Call it SmS_m.

Then:

u1,,unSm u_1,\dots,u_n \in S_m

Since SmS_m is linearly independent, we have:

a1==an=0 a_1=\cdots=a_n=0

Therefore UU is linearly independent.

Thus UPU \in P, and UU is an upper bound for the chain CC.

By Zorn’s lemma, PP has a maximal element BB. Then BB is a maximal linearly independent subset of VV.

We show that BB spans VV. Suppose not. Then there exists:

vV v \in V

such that:

vspan(B) v \notin \operatorname{span}(B)

Then:

B{v} B \cup \{v\}

is linearly independent. Indeed, if:

av+b1b1++bnbn=0 a v + b_1b_1' + \cdots + b_nb_n' = 0

is a finite linear relation with b1,,bnBb_1',\dots,b_n' \in B, then if a0a \ne 0, we could solve for vv as a linear combination of elements of BB, contradicting vspan(B)v \notin \operatorname{span}(B). Hence a=0a=0, and then all remaining coefficients are zero because BB is linearly independent.

Thus B{v}B \cup \{v\} is a larger linearly independent set, contradicting the maximality of BB.

Therefore:

span(B)=V \operatorname{span}(B)=V

So BB is both linearly independent and spanning, hence BB is a basis of VV.

Summary of Equivalent Forms

Over ZF, the following principles are equivalent:

  1. The axiom of choice.

  2. Every product of nonempty sets is nonempty.

  3. Every surjective function has a right inverse.

  4. Every set can be well ordered.

  5. Zorn’s lemma.

  6. The Hausdorff maximal principle.

Each form is useful in a different setting. The product form is closest to the original statement of choice. The right inverse form is useful in category theoretic and function theoretic arguments. The well ordering theorem is useful for transfinite recursion and comparison of sizes. Zorn’s lemma and the Hausdorff maximal principle are useful for proving existence of maximal objects.