Sets are more fundamental than integers. We like to think that mathematics evolved from arithmetics and the need to count how many animals are in a group, or something. But first you need to identify a collection, and know that you want that specific collection to be counted. If anything, numbers evolved from the need to assign cardinality to sets.

Asaf Karagila

Ask a beginning philosophy of mathematics student why we believe the theorems of mathematics and you are likely to hear, “because we have proofs!” The more sophisticated might add that those proofs are based on true axioms, and that our rules of inference preserve truth. The next question, naturally, is why we believe the axioms, and here the response will usually be that they are “obvious”, or “self-evident”, that to deny them is “to contradict oneself” or “to commit a crime against the intellect”. Again, the more sophisticated might prefer to say that the axioms are “laws of logic” or “implicit definitions” or “conceptual truths” or some such thing.

Unfortunately, heartwarming answers along these lines are no longer tenable (if they ever were). On the one hand, assumptions once thought to be self-evident have turned out to be debatable, like the law of the excluded middle, or outright false, like the idea that every property determines a set. Conversely, the axiomatization of set theory has led to the consideration of axiom candidates that no one finds obvious, not even their staunchest supporters. In such cases, we find the methodology has more in common with the natural scientist’s hypothesis formation and testing than the caricature of the mathematician writing down a few obvious truths and proceeding to draw logical consequences.

Penelope Maddy, *Believing the Axioms*

This time we’re going to tackle something very exciting: ZFC set theory. There is no consensus on what exactly serves as the best foundations for mathematics, and there are a bunch of debates between set theorists, type theorists, intuitionists, and category theorists about this. But one thing that nobody doubts is that the concept of a set is *extremely* fundamental, and naturally pops up everywhere in math. And whether or not ZFC set theory is the *best* foundations for mathematics, it certainly serves as a solid foundation for a massive amount of it.

We start with the following question: What is a set? Intuitively, we all have a concept of a set which is something like “a set is a well-defined collection of objects.” It turns out that precisely formalizing this, successfully designing an axiom set that captures our intuitive concept of a set without giving rise to paradoxes, is no small feat. Many decades of many of our smartest people working on this problem have culminated in today’s standard formulation: Zermelo-Fraenkel set theory. We’ll go through this formulation axiom by axiom, motivating each one as we go. But before we get to the axioms, a few notes are in order.

First of all, we need to specify our language. We’re going to try to formulate our theory of sets in first-order logic, so to specify our language we’ll need to fill in the constants, predicates, and functions in the language. Amazingly, it’ll turn out that all we need to manually insert into the language is a single binary predicate: ∈. ∈(x,y) will commonly be written as x ∈ y, and should be read as “x is an element of y”. All the other set relationships that we care about will be definable in terms of ∈ and the standard symbols of first-order logic.

Second important note: our theory of sets will contain *only* sets and nothing else. This is a little weird; we ordinarily think of sets as being like cardboard boxes that can contain other types of objects besides other sets. But sets for us will be, in the words of Scott Aaronson, “like a bunch of cardboard boxes that you open only to find *more* cardboard boxes, and so on all the way down.” We could make a two-sorted first-order theory, where some of the objects are sets and others are things-that-are-contained-in-sets (sometimes called *urelements*), but this would be more complicated than what we need.

So! These two clarifications aside, we are ready to dive into our theory of sets. We will be designing a first-order theory with a single binary predicate ∈, and where all the objects in any model of the theory will be sets. Our first axiom, which you’ll see in every axiomatization of set theory, will be the following:

**Axiom of Extensionality**

∀𝑥∀𝑦 (∀𝑧(𝑧 ∈ 𝑥 ↔ 𝑧 ∈ 𝑦) → 𝑥 = 𝑦)

Here’s what that says in plain English: Take any two sets 𝑥 and 𝑦. If it’s the case that any element of 𝑥 is also an element of 𝑦, and vice versa, then it must be the case that 𝑥 = 𝑦. In plainer English, two sets are the same if they have all the same elements. (The converse of this is guaranteed by the substitution rule of equality: if two sets are the same, then they must have the same elements).

This tells us that a set is *defined* by its elements. For any set 𝑥, if you were to list all of its members, you have fully specified everything there is to specify about the set. If there was something else you could say about the set that could further distinguish it, then that would mean that two sets with the same members could be distinguished somehow, which they can’t.

Among other things, this tells us that the set {𝑥, 𝑥} is the same thing as the set {𝑥}, because there’s no difference between a set containing 𝑥 “twice” and a set containing 𝑥 “once”. All you can ask of a set 𝑦 is the following: “Is it the case that 𝑥 ∈ 𝑦?”, which doesn’t distinguish between 𝑦 = {𝑥} and 𝑦 = {𝑥, 𝑥}. Notice that we’re already slightly violating the intuitive concept of a set as just a bag of things. If a set is defined by what it contains, then it seems reasonable that a set could contain two sets that have the same contents. But “two sets with the same contents” are in fact just one set, by the axiom of extensionality. So in this axiomatization, sets *cannot* contain duplicates!

Another consequence of the axiom of extensionality is that the set {𝑥, 𝑦} is the same thing as the set {𝑦, 𝑥}. A set is just an unordered “bag” of things. Both those sets contain 𝑥 as well as 𝑦, and there’s no way to ask if 𝑥 comes “first” in the bag.
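Python’s built-in `frozenset` happens to compare purely by membership, so it makes a convenient toy stand-in for the sets above. This is only an informal illustration of the two consequences just described, not part of the formal theory, and the variable names are arbitrary.

```python
# Toy illustration: frozenset equality is decided purely by membership,
# which mirrors the Axiom of Extensionality.
empty = frozenset()            # stands in for ∅
x = frozenset([empty])         # stands in for {∅}
y = frozenset([x])             # stands in for {{∅}}

# Listing an element "twice" changes nothing: {x, x} = {x}.
assert frozenset([x, x]) == frozenset([x])

# Order of listing is irrelevant: {x, y} = {y, x}.
assert frozenset([x, y]) == frozenset([y, x])

# The only question we can ask of a set is "is this a member?".
print(x in frozenset([x, x]))  # True
```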

On to the next axiom! At this point, we’ve got the idea that a set is defined by its members. But we aren’t yet guaranteed the existence of any particular sets. The axiom we’ll introduce now is a good starting point for narrowing down our models to those that contain familiar sets. It usually doesn’t appear in standard axiomatizations of set theory, but that’s only because it can be derived from axioms that we’ll introduce later.

**Axiom of Empty Set**

∃𝑥∀𝑦 (𝑦 ∉ 𝑥)

This axiom declares the existence of a set 𝑥 that contains no sets. In other words, the empty set exists! This might seem extremely obvious, so much so that we shouldn’t even have to say it (how could there *not* be a set that contains nothing??). But if all we had was the Axiom of Extensionality, then there would be perfectly good models of our theory in which every set contained other sets. For example, we could have a model with exactly one set 𝑥 that contained itself and nothing else (i.e. a model that satisfied the sentence ∀𝑥∀𝑦 (𝑥 = 𝑦 ∧ 𝑥 ∈ 𝑦)).
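To make that one-set model concrete, here is a small sketch that represents a finite structure as a domain plus a membership relation and checks the two axioms by brute force. The names `domain`, `member`, `elements`, `satisfies_extensionality`, and `satisfies_empty_set` are made up for this illustration.

```python
# A finite "model" is a domain of objects plus a membership relation,
# given as a set of (element, container) pairs.
domain = {"a"}
member = {("a", "a")}          # the single object a contains itself

def elements(x, domain, member):
    """All z in the domain with z ∈ x."""
    return {z for z in domain if (z, x) in member}

def satisfies_extensionality(domain, member):
    # Any two objects with the same elements must be the same object.
    return all(
        x == y
        for x in domain
        for y in domain
        if elements(x, domain, member) == elements(y, domain, member)
    )

def satisfies_empty_set(domain, member):
    # Some object has no elements at all.
    return any(not elements(x, domain, member) for x in domain)

print(satisfies_extensionality(domain, member))  # True
print(satisfies_empty_set(domain, member))       # False
```

So extensionality alone really does leave room for a universe with no empty set in it.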

So the axiom of the empty set *is* necessary to constrain our set of models.

Now we are guaranteed a set with no members, but not guaranteed anything else. In particular, one model we have at this point contains just the empty set and nothing else.

We can amend this by making the following conceptual observation: from any set 𝑥, we can form a new set 𝑦 which contains 𝑥 as an element and nothing else. In other words, you can take any set, package it up, and form a new set out of it! Let’s write this formally.

**Axiom of Superset**

∀𝑥∃𝑦∀𝑧 (𝑧 ∈ 𝑦 ↔ 𝑧 = 𝑥)

For any set 𝑥, there exists another set 𝑦, such that a set 𝑧 is an element of 𝑦 if and only if 𝑧 is equal to 𝑥. Or in other words, every set has another set that contains it and only it. This is a fine axiom, but you’ll never see it in any presentation of the axioms of ZFC. This is again an issue of redundancy, which will be made clear in about 30 seconds, so just hang on.

So far we have the existence of the empty set, commonly written ∅. By our new axiom, we also have {∅}, {{∅}}, {{{∅}}}, and so on. But are we guaranteed that these are different objects in every model, or could they be different names for the same objects in some model? For instance, is it possible that ∅ = {∅}? Well, let’s check: Suppose that they are equal. Then, by the substitution property of equality, it would have to be the case that every set inside ∅ is also inside {∅}. But here’s an element of {∅} that isn’t inside ∅: ∅ itself! The empty set contains nothing, whereas the set of the empty set contains something, namely, the empty set! So ∅ and {∅} must be distinct. Similar arguments can be extended to demonstrate that this entire infinite chain of constructions ∅, {∅}, {{∅}}, {{{∅}}}, and so on has no duplicates. So we’ve just ruled out all finite models!

On to the next axiom. Take a look at our collection of “guaranteed” sets and think about what’s missing.

∅, {∅}, {{∅}}, {{{∅}}}, {{{{∅}}}}, …

One thing that’s obviously missing is sets consisting of *pairs* of any two of these elements. For example, we don’t ever get {∅, {∅}} or {∅, {{∅}}}.

And of course, conceptually, we should be able to take any *two* sets and form a new set composed of just those two. This will be our next axiom.

**Axiom of Pairing**

∀𝑥∀𝑦∃𝑧∀𝑤 (𝑤 ∈ 𝑧 ↔ (𝑤 = 𝑥 ∨ 𝑤 = 𝑦))

For any two sets x and y, there exists another set z, such that any w is an element of z if and only if it’s either x or y. In other words, z contains just x and y and nothing else.

Take a look at what happens when we pair a set 𝑥 with itself. We just get back {𝑥}! Now we see why the Axiom of Superset was unnecessary: it was just a special case of the Axiom of Pairing.

Alright, now with this new axiom our collection of sets that must exist in every model is becoming pretty complicated. We have those same ones as before:

∅, {∅}, {{∅}}, {{{∅}}}, {{{{∅}}}}, …

But also, now:

{∅, {∅}}, {∅, {{∅}}}, {{∅}, {{∅}}}, …

Taking pairs and pairs of pairs and so on, we can get arbitrarily complicated sets. So *now* do we have everything that we want to guarantee exist as sets? I want you to think about if there are any sets we can form out of ∅ that we don’t have yet.

(…)

How about {∅, {∅}, {{∅}}}? Interestingly, we can’t yet get this. We could try pairing {∅, {∅}} with {{∅}}, but that would get us {{∅, {∅}}, {{∅}}}, not {∅, {∅}, {{∅}}}. Essentially, the issue is that any application of the pairing axiom always results in a set with *at most* two members (even though each of those members might contain more sets). You can never get a set with three or more elements!

The immediate thought might be to add a *tripling* axiom to complement our pairing axiom. But then we’re just kicking the can down the road, moving the problem one level higher. We’d need a quadrupling axiom as well, as well as a quintupling, and so on forever. That’s not really elegant… is there a way that we can get all of these axioms in one step? Well, the obvious way to do this (quantifying over the n in “n-tupling”) isn’t really accessible to us without second-order logic. So we have to be cleverer.

I encourage you to think about this for a minute. Can you think of a way to guarantee the existence of sets with any number of members like {∅, {∅}, {{∅}}} and {∅, {∅}, {{∅}}, {{{∅}}}}?

(…)

It turns out that there is! We just have to take a little bit of an indirect approach. Here’s the axiom that will do the trick:

**Axiom of Union**

∀𝑥∀𝑦∃𝑧∀𝑤 (𝑤 ∈ 𝑧 ↔ (𝑤 ∈ 𝑥 ∨ 𝑤 ∈ 𝑦))

In English: Take any two sets 𝑥 and 𝑦. There exists a new set 𝑧 consisting of everything that’s in either 𝑥 or 𝑦. In other words, the *union* of 𝑥 and 𝑦 exists! We write this as 𝑥 ∪ 𝑦. (Strictly speaking, the axiom of union is typically formalized as saying that the union of any set of sets exists, not just any set of two sets, but this simpler formulation will do for the moment. This other version will only become relevant when we start talking about infinity.)

Say we want to form the set {𝑥, 𝑦, 𝑧}. We first pair 𝑥 with 𝑦 to get the set {𝑥, 𝑦}. Then we pair 𝑧 with itself to get {𝑧}. Now we just union {𝑥, 𝑦} with {𝑧}, and we’re there! You can easily see how this can be extended to arbitrarily complicated combinations of sets.
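The pair-then-union recipe can be sketched with Python frozensets. The helper names `pair` and `union` are just labels for this illustration, and the sketch uses the simple binary form of union given above.

```python
def pair(x, y):
    """Axiom of Pairing: the set {x, y}."""
    return frozenset([x, y])

def union(x, y):
    """Binary Axiom of Union: everything that is in x or in y."""
    return x | y

e0 = frozenset()     # ∅
e1 = pair(e0, e0)    # {∅}: pairing a set with itself gives a singleton
e2 = pair(e1, e1)    # {{∅}}

# Pairing alone never produces more than two members.
nested = pair(pair(e0, e1), e2)               # {{∅, {∅}}, {{∅}}}
assert len(nested) == 2

# Pair x with y, pair z with itself, then take the union.
triple = union(pair(e0, e1), pair(e2, e2))    # {∅, {∅}, {{∅}}}
assert triple == frozenset([e0, e1, e2])
assert len(triple) == 3
```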

If you take a few minutes to try to think about everything we can get from just these two axioms, you might convince yourself that we’ve gotten everything. No matter how complicated it looks, you can take any finite set built up from ∅, and see how you can bring together its components using just pairing and union. Each of these components is simpler than the set that contained them all, so we can do the same to them. And eventually we can retrace the whole tree of pairings and unions back to ∅.

But there’s an important qualification there that’s easy to miss: we’ve gotten all the *finite* sets. Did we get the infinite ones? What about the following set?

{∅, {∅}, {{∅}}, {{{∅}}}, …}

Sure, we can construct any arbitrarily long but finite subset of this infinite set. But are we also guaranteed the whole thing?

It turns out that the answer is no. There is a perfectly consistent model of our axioms that contains no infinite sets, even though it contains arbitrarily large finite sets. There’s even a name for this collection of sets: the *hereditarily finite sets!* We can recursively define this collection by saying that ∅ is hereditarily finite, and that if a_{1}, …, a_{k} are hereditarily finite, then so is {a_{1},…,a_{k}}.

This is essentially the smallest possible model compatible with the axioms we’ve laid down so far. This model is denoted as V_{ω}, and amazingly, there exists a one-to-one correspondence between V_{ω} and the natural numbers! That’s a topic for a later time though.
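The details can wait, but for the curious, one standard correspondence of this kind is the Ackermann coding, which sends a hereditarily finite set to the number whose binary digits record the codes of its elements. The post doesn’t say which correspondence it has in mind, so take this as one possible illustration; the function names `encode` and `decode` are my own.

```python
def encode(s):
    """Ackermann coding: hereditarily finite set -> natural number.

    The code of s is the sum of 2**encode(y) over the elements y of s.
    """
    return sum(2 ** encode(y) for y in s)

def decode(n):
    """Inverse map: bit i of n is set exactly when decode(i) is a member."""
    return frozenset(
        decode(i) for i in range(n.bit_length()) if (n >> i) & 1
    )

e0 = frozenset()          # ∅
e1 = frozenset([e0])      # {∅}

assert encode(e0) == 0
assert encode(e1) == 1
assert encode(frozenset([e0, e1])) == 3   # 2**0 + 2**1

# Decoding and re-encoding gives back the same number.
assert all(encode(decode(n)) == n for n in range(100))
```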

For now, just note that since our current axioms allow a model in which no infinite sets exist, we cannot prove the existence of any infinite sets from them (if we could do so, then we could rule out V_{ω} as a model). So if we *want* to be able to talk about infinity, we have to manually add it in as an axiom! (It’s at this point that the finitists get off board. It’s actually pretty interesting to me that to be a finitist, you don’t necessarily have to believe that there’s a largest set. You could think that sets get unboundedly large, and just refuse to make the additional assumption that sets can be infinitely large.)

ZFC has an axiom that assures the existence of an infinite set. But it isn’t the infinite set that I referenced earlier ({∅, {∅}, {{∅}}, {{{∅}}}, …}). Instead, it’s the following:

∅

{∅}

{∅, {∅}}

{∅, {∅}, {∅, {∅}}}

{∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}}

(…)

Each set is just composed of one copy of every previous set in the sequence. And since each set contains all the previous sets, we can easily define the next set in the sequence: the set after 𝑥 is 𝑥 ∪ {𝑥}. We’ll call this the *successor* of 𝑥. This sequence is known as the *Von Neumann ordinals*, and it’s the standard set-theoretic construction of the natural numbers:

0 = ∅

1 = {0} = {∅}

2 = {0, 1} = {∅, {∅}}

3 = {0, 1, 2} = {∅, {∅}, {∅, {∅}}}

4 = {0, 1, 2, 3} = {∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}}

(…)

*n* + 1 = {0, 1, 2, …, *n*} = *n* ∪ {*n*}
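Here is a minimal sketch of the construction with Python frozensets, using hypothetical helper names `successor` and `ordinal`. It only builds finitely many ordinals, of course.

```python
def successor(x):
    """The successor of x: x ∪ {x}."""
    return x | frozenset([x])

def ordinal(n):
    """The Von Neumann ordinal standing for the natural number n."""
    x = frozenset()              # 0 = ∅
    for _ in range(n):
        x = successor(x)
    return x

three = ordinal(3)

# n has exactly n elements, namely 0, 1, ..., n - 1.
assert len(three) == 3
assert three == frozenset(ordinal(k) for k in range(3))

# "m < n" shows up as plain membership.
assert ordinal(2) in ordinal(4)
assert ordinal(4) not in ordinal(2)
```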

When we make this association between Von Neumann ordinals and numbers, and define a “successor set” this way, we can show that all the axioms of Peano arithmetic (translated to their set-theoretic version) hold for the set of all Von Neumann ordinals. This means that if we could guarantee the existence of the set {0, 1, 2, …} = {∅, {∅}, {∅, {∅}}, …}, then we could *prove the consistency of PA within set theory!*

Okay, so let’s write down the axiom that we need to give us an infinite set.

**Axiom of Infinity**

∃𝑥 (∅ ∈ 𝑥 ∧ ∀𝑦 (𝑦 ∈ 𝑥 → 𝑦 ∪ {𝑦} ∈ 𝑥))

(I’ve cheated a little bit here by using symbols like ∅, ∪, and {}, which aren’t technically in the language we’re provided. But for convenience, we’ll accept them as an abbreviation of the full formula, since we know how to translate back to ∈ in principle.)

This says that there exists a set that contains ∅, as well as 𝑦 ∪ {𝑦} for every 𝑦 inside it. So we’re guaranteed all the Von Neumann ordinals. But notice that this *doesn’t* tell us that *only* the Von Neumann ordinals are in this set! We could have all the Von Neumann ordinals, plus a bunch of other objects, as long as each of these objects has its “successor” in the set. In a sense, this is exactly like the problem we had before with PA; try as we might, we inevitably ended up with nonstandard numbers.

The amazing thing about doing this all in terms of sets is that now we *actually can* remove all the nonstandards! The crucial difference is that when doing PA, we couldn’t quantify over sets of numbers, as that would require second-order logic. But in the context of set theory, *everything* is a set, including numbers. So talking about all sets of numbers is totally do-able in first order!

We want to pare down our set somehow, by ruling out all the other possible sets in our infinite set. But how can we do this? So far, all of our axioms have been about creating new sets, or generating larger sets from existing ones. We don’t have anything that allows us to start with a set that’s too large and then remove some elements to get a smaller set.

Let’s design a new axiom to accomplish this goal. For any set 𝑥 that we have, we should be able to form a new set 𝑦 consisting of only members of 𝑥 that satisfy some property 𝜑 of our choice. Which property? Any of them! If we can define a predicate 𝜑 using some first-order expression, then for any set that we have, we should be able to take the subset consisting of exactly those elements that satisfy this predicate. Here’s that written formally:

**Axiom Schema of Comprehension/Specification**

∀𝑥∃𝑦∀𝑧 (𝑧 ∈ 𝑦 ↔ (𝑧 ∈ 𝑥 ∧ 𝜑(𝑧)))

This is an axiom *schema*, because 𝜑 is just a stand-in for any formula that we choose. We could have 𝜑(𝑥) be the formula 𝑥 = 𝑥, in which case our subset will be the same as the starting set. Or we could have 𝜑(𝑥) be the formula 𝑥 ≠ 𝑥, in which case our subset would be the empty set. (And now you see why the axiom of empty set is redundant; it’s entailed by the existence of any set and the axiom of comprehension!) Basically, for any formula you can write, there’s a special case of the axiom of comprehension. We also can now use the useful set-builder notation: 𝑦 = {𝑧 ∈ 𝑥: 𝜑(𝑧)}.

We presented this axiom as a way to narrow down our infinite set to just the Von Neumann ordinals, so let’s see how exactly that works. We start with the set that we are granted by our axiom of infinity. Then we consider the subset of this set that satisfies the following property: 𝑥 is the empty set or a successor set, and each of its elements is either zero or a successor of another of its elements. In math, we write the property 𝜑(𝑥) as:

[𝑥 = ∅ ∨ ∃𝑦 (𝑥 = 𝑦 ∪ {𝑦})] ∧ ∀𝑦 (𝑦 ∈ 𝑥 → (𝑦 = ∅ ∨ ∃𝑧 (𝑧 ∈ 𝑥 ∧ 𝑦 = 𝑧 ∪ {𝑧})))

Now the set of Von Neumann ordinals is just the set obtained from our infinite set by specifying 𝜑(𝑥)! Okay, so we can get the set of Von Neumann ordinals using our axiom of comprehension. But it should also be clear that comprehension can do a whole lot more than just this. We already saw that it gives us the axiom of empty set, but here’s a new example of something that we can do now with comprehension:

Earlier we introduced the axiom of union, and some of you that have familiarity with set theory might have also been expecting an axiom for *intersection*. The intersection of two sets is just the set of elements that are in both. But this will actually come out as a special case of comprehension! Namely, define 𝜑(𝑤,𝑦) to be the formula 𝑤 ∈ 𝑦. Then the set 𝑧 = {𝑤 ∈ 𝑥: 𝜑(𝑤,𝑦)} corresponds to the subset of 𝑥 that is also in 𝑦, i.e. the intersection of 𝑥 and 𝑦!
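As a rough illustration, restricted comprehension on finite sets is just filtering. In the sketch below, `separate` and `intersection` are names I made up, and a Python function plays the role of the formula 𝜑.

```python
def separate(x, phi):
    """Restricted comprehension: {z ∈ x : phi(z)}."""
    return frozenset(z for z in x if phi(z))

def intersection(x, y):
    """x ∩ y as an instance of comprehension, with phi(w) being w ∈ y."""
    return separate(x, lambda w: w in y)

e0 = frozenset()          # ∅
e1 = frozenset([e0])      # {∅}
e2 = frozenset([e1])      # {{∅}}
x = frozenset([e0, e1])
y = frozenset([e1, e2])

assert separate(x, lambda z: z == z) == x             # phi is z = z
assert separate(x, lambda z: z != z) == frozenset()   # phi is z ≠ z
assert intersection(x, y) == frozenset([e1])
```

Notice that `separate` can only ever return a subset of the set it starts from.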

(Interestingly, you can’t use the axiom of comprehension to get unions. Do you see why?)

Historically, most of the mathematicians working on set theory in the early days of its axiomatization used a similar-sounding but much stronger axiom, known as *unrestricted comprehension*.

**Axiom Schema of Unrestricted Comprehension**

∃𝑥∀𝑦 (𝑦 ∈ 𝑥 ↔ 𝜑(𝑦))

This allows us to form a set out of the collection of ALL sets satisfying some property 𝜑 (not just those elements that you can find in some other existing sets). This is a much prettier and simpler axiom, and seems quite intuitive. Why *wouldn’t* we be able to just collect all the sets in any given model that satisfy a well-defined formula, and call that thing a set?

Amazingly, this very innocent-sounding axiom ends up *destroying the consistency of set theory!* This was the astonishing discovery of Bertrand Russell, known as Russell’s paradox. So why does this axiom lead us to a contradiction? Well, consider what you get when 𝜑(𝑦) is the first-order formula 𝑦 ∉ 𝑦. You get this set: 𝑥 = {𝑦: 𝑦 ∉ 𝑦}, which is the set of sets that don’t contain themselves. Now, if 𝑥 is a set, then either it’s in 𝑥 or it’s not in 𝑥. But hold on: if 𝑥 is inside 𝑥, then by the definition of 𝑥, it can’t contain itself! So it must not be inside 𝑥. But then it satisfies the condition for the sets that are in 𝑥! So since 𝜑(𝑥) is true, 𝑥 must be inside 𝑥 after all! This is a contradiction, and we are necessarily led to the conclusion that 𝑥 can’t have been a set in the first place. But 𝑥 being a set was guaranteed by unrestricted comprehension! And all we used was this single instance of the schema. So any theory of sets with unrestricted comprehension is inconsistent!
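If you’d like to see the paradox bite in a setting small enough to check exhaustively, the sketch below runs through every possible membership relation on a three-object universe and confirms that none of them contains a “Russell set”. This is only a sanity check on tiny structures (the argument above is what covers every structure), and the name `has_russell_set` is hypothetical.

```python
from itertools import product

domain = range(3)
# A pair (y, x) in the relation is read as "y ∈ x".
pairs = [(y, x) for y in domain for x in domain]

def has_russell_set(member):
    """Is there an x such that, for every y, y ∈ x iff y ∉ y?"""
    return any(
        all(((y, x) in member) == ((y, y) not in member) for y in domain)
        for x in domain
    )

count = 0
for bits in product([False, True], repeat=len(pairs)):
    member = {p for p, keep in zip(pairs, bits) if keep}
    # Asking the question of x itself (y = x) always breaks the condition.
    assert not has_russell_set(member)
    count += 1

print(count)  # 512
```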

Needless to say, this is a big violation of our intuitive concept of sets. It turns out that you can’t just collect together all objects satisfying some well-defined property and form a set out of them! If you allow this, then you walk straight into a paradox. I think it’s fair to say that this proves that our intuitive concept of sets is actually incoherent. A mathematically consistent theory of sets must have more limitations than the way we imagine sets. In particular, all we will be allowed is *restricted* comprehension, the axiom from earlier, and we will *not* in general be allowed to form a new set by combing through the whole universe of sets for those that satisfy a certain property.

Notice, by the way, that restricted comprehension *only* gets us out of hot water if we don’t have a set that contains all sets. Otherwise, we could just do the exact same trick as before, by forming our paradoxical set as a subset of the set of all sets! So interestingly, the moment that we added restricted comprehension as an axiom, we ruled out any models that included a set that contained everything. Again, this is pretty unintuitive! Our theory of sets allows us to construct a bunch of well-defined sets, but if we collect those sets together, we get something that *cannot* be called a set, on pain of contradiction!

So, if we look at what we have now, we have all the hereditarily finite sets, as well as the set of all Von Neumann ordinals and many other infinite sets given by the axiom of comprehension (all the definable subsets of the Von Neumann ordinals). However, we’re still not done. Consider the set of all subsets of the Von Neumann ordinals. Can we construct this set somehow from our existing axioms? Cantor proved that we cannot with an ingenious argument that I would totally go through here if this post wasn’t already way too long. He showed that this set is strictly *larger* in cardinality than the set of Von Neumann ordinals (in particular, that it’s uncountably large), which entailed that we couldn’t get there by any application of the axioms we have so far.

So we need another axiom, to match our feeling that the power set of any set (the set of all its subsets) should itself be a set:

**Axiom of Power Set**

∀𝑥∃𝑦∀𝑧 (𝑧 ∈ 𝑦 ↔ 𝑧 ⊆ 𝑥)

I’ve cheated again by using the subset symbol ⊆. But just like before, this is only for prettiness of presentation; we can always translate back from 𝑧 ⊆ 𝑥 to ∀𝑤 (𝑤 ∈ 𝑧 → 𝑤 ∈ 𝑥) if necessary. The combination of the power set axiom and the axiom of infinity gives us what’s called the *cumulative hierarchy* (strictly speaking, we’re actually missing a few pieces of the hierarchy, but we’ll patch that in a few minutes). The power set of a set is strictly larger than it in cardinality, so by saying that every power set is a set, we get an infinite chain of increasingly large infinite sets. We have sets the size of the real numbers, as well as the size of all functions from the real numbers to themselves, and so on forever. (You might now start to get a sense of why this particular first-order theory is thought to be powerful enough to serve as a playing field for almost all of mathematics!)
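For finite sets we can compute power sets directly and watch how quickly they grow. The sketch below iterates the power set starting from ∅, which produces the first few finite stages of the cumulative hierarchy (usually written V₀, V₁, V₂, …, a notation this post hasn’t introduced). The helper name `power_set` is my own.

```python
from itertools import combinations

def power_set(x):
    """The set of all subsets of a finite set x."""
    items = list(x)
    return frozenset(
        frozenset(combo)
        for r in range(len(items) + 1)
        for combo in combinations(items, r)
    )

# Start from ∅ and take the power set four times.
levels = [frozenset()]
for _ in range(4):
    levels.append(power_set(levels[-1]))

sizes = [len(level) for level in levels]
print(sizes)  # [0, 1, 2, 4, 16]

# For a finite set with n elements, the power set has 2**n elements.
assert all(len(power_set(v)) == 2 ** len(v) for v in levels[:4])
```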

However, despite having guaranteed the existence of this whole enormous universe of sets, we still are not done. We’ve been slightly careless in how we’ve been talking so far; in particular, we have kind of been assuming that the only sets that exist are the ones that we are guaranteeing exist with our axioms. But we haven’t taken care to examine the possibility of models that include perverse objects that we don’t want to call sets *in addition* to our whole cumulative hierarchy of nice sets.

What do I mean by “perverse”? That’s a good question, and I don’t have a strict definition. The intuitive idea is that we might be able to construct weird nonstandard sets (like we did with numbers) that satisfy our axioms but have “un-set-like properties”, such as containing themselves or having infinite descending membership chains.

For a simple example, we can imagine a model of our axioms so far that includes the entire cumulative hierarchy (the hereditarily finite sets, the set V_{ω}, and the chain of power sets), plus another set ζ that contains only itself and nothing else. (You can tell that this is a perverse set because of the ugly symbol I used for it.) In other words, ζ = {ζ}. For this to be consistent with our axioms, we’d have to also include all the products of pairing and unioning ζ with our nice sets, like {ζ, ∅} and {ζ, ∅, {∅}}. But once we’ve done that, it’s not clear what we can say to rule out this model using our current axioms.

This turns out to be a big problem when we try using our axiom of restricted comprehension to get the set of Von Neumann ordinals (which is *very* important for getting our set theory to be able to talk about the natural numbers). I lied a little bit earlier (sorry!) when I said that we could get this set by talking about the sets in our infinite set that were either zero or a successor, and whose every element was either zero or a successor of another of their elements. If you think about this for a little bit, it makes sense why the Von Neumann ordinals fit this description. But that’s not the *only* thing that fits this description! We also get the set of all Von Neumann ordinals, *plus* an infinite descending chain of sets ζ = {ζ} = {{ζ}} = {{{…}}} and all their supersets! So to *actually* pin down the Von Neumann ordinals using comprehension, we need a new axiom that bans these perverse sets.

Here’s another weird thing you can get without banning perverse sets: suppose you have two sets ζ_{1} and ζ_{2}, such that ζ_{1} ≠ ζ_{2}, ζ_{1} = {ζ_{1}}, and ζ_{2} = {ζ_{2}}. Substituting in ζ_{1} for itself and ζ_{2} for itself, we have that ζ_{1} = {{{…}}} and ζ_{2} = {{{…}}}.

These two sets appear to have identical structure in terms of what elements they contain, but since ζ_{1} ≠ ζ_{2}, they contain different elements, so they aren’t in violation of the Axiom of Extensionality! It feels like sets like these are slipping by on a technicality, while still violating the *spirit* of the Axiom of Extensionality. We can think about the following axiom as the extension of the principle that sets’ identities are solely determined by their elements to these perverse cases:

**Axiom of Regularity/Foundation**

∀𝑥 (𝑥 ≠ ∅ → ∃𝑦 (𝑦 ∈ 𝑥 ∧ 𝑥 ⋂ 𝑦 = ∅))

This says that every non-empty set must contain a set that shares no elements with it. At first glance it might not be obvious why this would help, so let’s take a closer look.

This axiom is automatically true of all of the sets in our cumulative hierarchy, so that’s good. Many of them contain the empty set ∅, which shares no elements with anything, and the rest (like {{∅}}, whose only element {∅} is disjoint from it) still contain some element that shares no elements with them. (If it was false of any of our guaranteed sets, then we’d have an inconsistency on our hands.) And it also rules out the set ζ that contains just itself. Why? Well, ζ = {ζ}, so the only element of ζ is itself. But ζ obviously shares elements with itself, because it is non-empty. So ζ doesn’t contain any sets that it shares no elements with! Therefore ζ is not a set.

What about something like ζ = {∅, ζ}? Now that ζ contains the empty set, it’s not in *obvious* violation of the axiom of regularity. But it is in fact in violation. Here’s the argument: Start with ζ. Pair it with itself to form {ζ}. Now ask, does {ζ} contain any sets that it shares no elements with? It’s easier to see if we expand this using the definition of ζ: {ζ} = {{∅, ζ}}. The problem is that {ζ} contains just one set, ζ, and ζ and {ζ} share an element: namely, ζ itself! So in fact, even adding the empty set won’t help us. The axiom of regularity rules out *any* sets that contain themselves.
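Python’s frozensets can’t contain themselves, so to play with these perverse sets we need a different representation. The sketch below uses a hypothetical “membership graph” (a dict from each set’s name to the names of its elements) and a made-up helper `is_regular` that checks the regularity condition for a single set.

```python
# Each name maps to the names of that set's elements.
graph = {
    "empty": set(),
    "one": {"empty"},              # {∅}
    "two": {"empty", "one"},       # {∅, {∅}}
    "loop": {"loop"},              # ζ = {ζ}
    "zeta": {"empty", "zeta"},     # ζ = {∅, ζ}
    "box": {"zeta"},               # {ζ}, for the second ζ
}

def is_regular(x, graph):
    """True if x is empty, or some y ∈ x shares no elements with x."""
    if not graph[x]:
        return True
    return any(not (graph[x] & graph[y]) for y in graph[x])

assert is_regular("one", graph)
assert is_regular("two", graph)
assert not is_regular("loop", graph)   # ζ = {ζ} fails directly
assert is_regular("zeta", graph)       # ζ = {∅, ζ} passes on its own...
assert not is_regular("box", graph)    # ...but {ζ} exposes the violation
```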

In fact, the axiom of regularity rules out *all* sets besides those in the cumulative hierarchy. Why is this? Well, take any set. Either it doesn’t contain elements, in which case it’s the empty set, or it does contain elements. If it does contain elements, then those elements either don’t contain elements (i.e. are the empty set), or do contain elements. If the second, then we look at those elements, and so on. By foundation, we don’t have infinite descending chains of sets, so this process will eventually terminate. And the only way for it to terminate is that each branch ends in the empty set!

So we’ve basically pinned down our universe of sets! It’s just ∅ and all the things we can make out of it. Wait… what? All sets are ultimately just formed from the empty set? That seems a little weird, right?

Yes, there is something a little strange about this formalization of sets. Ultimately, this comes down to our starting desire to have our theory be about sets and *only* sets. From the start we made the decision to not talk about “urelements”, the term for non-set elements of sets. And by making a strong requirement that sets are defined by their elements, we ended up concluding that all sets are ultimately built out of ∅. The nice thing is that our universe of sets is so large that we can find within it substructures that are isomorphic to models of most of the mathematical structures that we are interested in understanding, even though it all just comes from ∅.

Okay, let’s take a step back and summarize everything that we have so far. Here are our axioms (with some redundancy):

**Axiom of Extensionality**

∀𝑥∀𝑦 (∀𝑧(𝑧 ∈ 𝑥 ↔ 𝑧 ∈ 𝑦) → 𝑥 = 𝑦)

**Axiom of Empty Set**

∃𝑥∀𝑦 (𝑦 ∉ 𝑥)

**Axiom of Pairing**

∀𝑥∀𝑦∃𝑧∀𝑤 (𝑤 ∈ 𝑧 ↔ (𝑤 = 𝑥 ∨ 𝑤 = 𝑦))

**Axiom of Union**

∀𝑥∀𝑦∃𝑧∀𝑤 (𝑤 ∈ 𝑧 ↔ (𝑤 ∈ 𝑥 ∨ 𝑤 ∈ 𝑦))

**Axiom of Infinity**

∃𝑥 (∅ ∈ 𝑥 ∧ ∀𝑦 (𝑦 ∈ 𝑥 → 𝑦 ∪ {𝑦} ∈ 𝑥))

**Axiom Schema of Comprehension**

∀𝑥∃𝑦∀𝑧 (𝑧 ∈ 𝑦 ↔ (𝑧 ∈ 𝑥 ∧ 𝜑(𝑧)))

**Axiom of Power Set**

∀𝑥∃𝑦∀𝑧 (𝑧 ∈ 𝑦 ↔ 𝑧 ⊆ 𝑥)

**Axiom of Regularity**

∀𝑥 (𝑥 ≠ ∅ → ∃𝑦 (𝑦 ∈ 𝑥 ∧ 𝑥 ⋂ 𝑦 = ∅))

At this point we’ve reached the ‘Z’ in ‘ZFC’. The other two letters stand for one axiom each. The ‘F’ is for Abraham Fraenkel, who realized that these axioms still don’t allow us to prove the existence of all the infinite sets we want. In particular, he proved that we aren’t yet guaranteed the existence of the set {ℕ, 𝒫(ℕ), 𝒫(𝒫(ℕ)), …}, where ℕ is the set of Von Neumann ordinals, and 𝒫(𝑥) is the power set of 𝑥. To fill in this set and the rest of the cumulative hierarchy, he proposed adding an immensely powerful axiom schema:

**Axiom Schema of Replacement**

∀𝑥∃𝑦∀𝑧 (𝑧 ∈ 𝑦 ↔ ∃𝑤(𝑤 ∈ 𝑥 ∧ 𝑧 = *f*(𝑤)))

This is another axiom schema, because there is one instantiation for every definable function *f*. In plain English, it says that the image of any set 𝑥 under any definable function *f* is a set. With this axiom, we have what’s called ZF set theory.
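On finite sets, replacement is just "take the image of a set under a function", so it is easy to play with. The sketch below is only a finite analogy under that assumption: `power_set`, `replacement`, and `pair` are hypothetical helper names, and a Python callable stands in for a definable function *f*. As one illustration of how much replacement can do, it builds the pair {𝑎, 𝑏} by mapping the two-element set {∅, {∅}} onto 𝑎 and 𝑏, with no appeal to a pairing axiom.

```python
from itertools import combinations

EMPTY = frozenset()

def power_set(x):
    """All subsets of the finite set x, each as a frozenset."""
    items = list(x)
    return frozenset(
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    )

def replacement(x, f):
    """Image of the set x under f, that is {f(w) : w in x}."""
    return frozenset(f(w) for w in x)

def pair(a, b):
    """Build {a, b} from power set and replacement alone."""
    two = power_set(power_set(EMPTY))  # {∅, {∅}}, a set with two elements
    return replacement(two, lambda w: a if w == EMPTY else b)

a = frozenset([EMPTY])   # {∅}
b = power_set(a)         # {∅, {∅}}
print(pair(a, b) == frozenset([a, b]))  # True
```

The real axiom schema is about arbitrary, possibly infinite sets, which is where its strength comes from; the finite version only shows the shape of the construction.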

Replacement is *really* strong, and its addition obviates a few of our existing axioms (pairing and comprehension, in particular). Here’s ZF set theory, with only the independent axioms:

**Axiom of Extensionality**

∀𝑥∀𝑦 (∀𝑧(𝑧 ∈ 𝑥 ↔ 𝑧 ∈ 𝑦) → 𝑥 = 𝑦)

**Axiom of Union**

∀𝑥∀𝑦∃𝑧∀𝑤 (𝑤 ∈ 𝑧 ↔ (𝑤 ∈ 𝑥 ∨ 𝑤 ∈ 𝑦))

**Axiom of Infinity**

∃𝑥 (∅ ∈ 𝑥 ∧ ∀𝑦 (𝑦 ∈ 𝑥 → 𝑦 ∪ {𝑦} ∈ 𝑥))

**Axiom of Power Set**

∀𝑥∃𝑦∀𝑧 (𝑧 ∈ 𝑦 ↔ 𝑧 ⊆ 𝑥)

**Axiom of Regularity**

∀𝑥 (𝑥 ≠ ∅ → ∃𝑦 (𝑦 ∈ 𝑥 ∧ 𝑥 ⋂ 𝑦 = ∅))

**Axiom Schema of Replacement**

∀𝑥∃𝑦∀𝑧 (𝑧 ∈ 𝑦 ↔ ∃𝑤(𝑤 ∈ 𝑥 ∧ 𝑧 = *f*(𝑤)))

These six axioms give us ALMOST everything. There’s just one final axiom to fill in, which is historically the most controversial of all: the axiom of choice.

Here’s what the axiom of choice says: There’s always a way to pick exactly one element from each of any collection of non-empty disjoint sets, and form a new set consisting of these elements. Visually, something like this is always possible:

Here’s the common analogy given when explaining the axiom of choice, due to Bertrand Russell:

A millionaire possesses an infinite number of pairs of shoes, and an infinite number of pairs of socks. One day, in a fit of eccentricity, the millionaire summons his valet and asks him to select one shoe from each pair. When the valet, accustomed to receiving precise instructions, asks for details as to how to perform the selection, the millionaire suggests that the left shoe be chosen from each pair. Next day the millionaire proposes to the valet that he select one *sock* from each pair. When asked as to how *this* operation is to be carried out, the millionaire is at a loss for a reply, since, unlike shoes, there is no intrinsic way of distinguishing one sock of a pair from the other. In other words, the selection of the socks must be truly arbitrary.

Of course, with socks in the real world we could always just say something like “from each pair, choose the sock that is closer to the floor”. In other words, we can describe the resulting set in terms of some arbitrary relational property of the sock to the external world. But in the abstract world of set theory, all that we *have* are the intrinsic features of sets. If a set’s elements are intrinsically indistinguishable to us, then how can we choose one out of them?

This is a *very* imperfect analogy. For one thing, it suggests that we shouldn’t be able to choose one sock from each of a finite collection of pairs of socks. But this is actually doable in *every* first-order theory! It’s a basic rule of first-order logic that from ∃𝑥 𝜑(𝑥) we can derive 𝜑(𝑦), so long as 𝑦 is a new variable that has not been used before. So if we are told that a set has two members, then we are guaranteed by the underlying logic the ability to choose one of them, even if we are told nothing at all that distinguishes these two members from each other. The problem really only arises when we have an *infinite* set of pairs of socks, and thus have to make an infinity of arbitrary choices. Proofs are finite, so any statement we can derive will only use a finite number of applications of existential instantiation. So even though ZF alone lets us choose one element from each member of any finite collection of non-empty sets, however large, we need the axiom of choice to do the same for infinite collections.

**Axiom of Choice**

∀𝑥 ([∀𝑧 (𝑧 ∈ 𝑥 → (𝑧 ≠ ∅ ∧ ∀𝑤 ((𝑤 ∈ 𝑥 ∧ 𝑧 ≠ 𝑤) → (𝑧 ∩ 𝑤) = ∅)))] → ∃𝑦 ∀𝑧 (𝑧 ∈ 𝑥 → |𝑧 ∩ 𝑦| = 1))

That looks complicated as heck, but all it’s saying is that for any set 𝑥 whose elements are all nonempty and pairwise disjoint sets, there exists a set 𝑦 such that for all 𝑧 in 𝑥, 𝑧 and 𝑦 share exactly 1 element. Here |𝑤| = 1 is shorthand for ∃𝑢 (𝑢 ∈ 𝑤 ∧ ∀𝑣 (𝑣 ∈ 𝑤 → 𝑢 = 𝑣)): 𝑤 contains at least one element 𝑢, and any other elements of 𝑤 are equal to 𝑢.
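For a finite family of sets, both halves of that statement can be checked mechanically. Here is a small sketch, with made-up helper names and a made-up sock drawer, that tests the hypothesis (nonempty, pairwise disjoint members) and then the conclusion (|𝑧 ∩ 𝑦| = 1 for every member 𝑧). Picking "whatever element the iterator yields first" is only possible because the family is finite and sits in a computer's memory; it is not a stand-in for the axiom itself.

```python
def satisfies_hypothesis(x):
    """True if every member of x is nonempty and distinct members are disjoint."""
    members = list(x)
    return all(
        len(z) > 0 and all(z == w or not (z & w) for w in members)
        for z in members
    )

def is_choice_set(x, y):
    """True if y meets every member of x in exactly one element."""
    return all(len(z & y) == 1 for z in x)

def finite_choice_set(x):
    """Pick one element from each member of the finite family x."""
    return frozenset(next(iter(z)) for z in x)

socks = frozenset([
    frozenset(["a1", "a2"]),
    frozenset(["b1", "b2"]),
    frozenset(["c1", "c2"]),
])
y = finite_choice_set(socks)
print(satisfies_hypothesis(socks))  # True
print(is_choice_set(socks, y))      # True
```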

Here are some statements that are equivalent to the axiom of choice over ZF:

- Any first-order theory with at least one model of infinite cardinality κ also has at least one model of any infinite cardinality smaller than κ.
- The Cartesian product of non-empty sets is non-empty.
- The cardinalities of any two sets are comparable.
- For any set, there exists a strict total order on it such that all of its non-empty subsets have a least element.
- The cardinality of any infinite set is equal to the cardinality of its square.
- Every vector space has a basis.
- Every commutative ring with identity has a maximal ideal.
- For any sets 𝑥 and 𝑦 and any onto function *f*: 𝑥 → 𝑦, *f* has a right inverse.
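The last item on that list is the easiest one to play with on a computer. The sketch below builds, for an onto function between finite sets, a function `g` with *f*(`g`(𝑏)) = 𝑏 for every 𝑏 in the codomain. The name `right_inverse` and the example function are hypothetical. The only real decision is which preimage to keep for each value, and here the ordering of the integers makes it for us; the axiom of choice is what covers the case where no such rule is available.

```python
def right_inverse(f, domain, codomain):
    """Return a dict g with f(g[b]) == b for every b in codomain.

    Assumes f maps the finite, sortable collection `domain` onto `codomain`.
    """
    g = {}
    for a in sorted(domain):
        g.setdefault(f(a), a)  # keep the smallest preimage of each value
    missing = set(codomain) - set(g)
    if missing:
        raise ValueError(f"f is not onto; no preimage for {missing}")
    return g

def f(n):
    """Example onto function from {0, ..., 5} to {0, 1, 2}."""
    return n // 2

domain = range(6)
codomain = {0, 1, 2}
g = right_inverse(f, domain, codomain)
print(g)                                    # {0: 0, 1: 2, 2: 4}
print(all(f(g[b]) == b for b in codomain))  # True
```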

And here are some statements that are not equivalent to AC, but implied by it:

- Compactness Theorem for first-order logic
- Completeness Theorem for first-order logic
- The law of the excluded middle
- Every infinite set has a countable subset.
- Every field has an algebraic closure.
- There is a non-measurable set of real numbers.
- A solid sphere can be split into finitely many pieces which can be reassembled to form two solid spheres of the same size.
- This craziness

That’s right, to prove the Compactness Theorem for first-order logic in full generality, you actually need to use some form of the axiom of choice! ZF on its own can’t prove it: there are models of ZF plus the negation of the axiom of choice in which compactness doesn’t hold! (A construction of just such a model can be found here).

And indeed, you can also construct models of ZF¬C in which the Completeness Theorem doesn’t hold! It’s no exaggeration to say that the Compactness Theorem and Completeness Theorem are two of the most important results about first-order logic, so their reliance on the axiom of choice gives us pretty good reason to accept it. (Although technically we only need to accept a weaker principle, the Boolean prime ideal theorem, which is enough for both of these theorems.)

And there you have it, ZFC set theory! Here’s a list of all the axioms we discussed:

**Axiom of Extensionality**

∀𝑥∀𝑦 (∀𝑧(𝑧 ∈ 𝑥 ↔ 𝑧 ∈ 𝑦) → 𝑥 = 𝑦)

**Axiom of Empty Set**

∃𝑥∀𝑦 (𝑦 ∉ 𝑥)

**Axiom of Pairing**

∀𝑥∀𝑦∃𝑧∀𝑤 (𝑤 ∈ 𝑧 ↔ (𝑤 = 𝑥 ∨ 𝑤 = 𝑦))

**Axiom of Union**

∀𝑥∀𝑦∃𝑧∀𝑤 (𝑤 ∈ 𝑧 ↔ (𝑤 ∈ 𝑥 ∨ 𝑤 ∈ 𝑦))

**Axiom of Infinity**

∃𝑥 (∅ ∈ 𝑥 ∧ ∀𝑦 (𝑦 ∈ 𝑥 → 𝑦 ∪ {𝑦} ∈ 𝑥))

**Axiom Schema of Comprehension**

∀𝑥∃𝑦∀𝑧 (𝑧 ∈ 𝑦 ↔ (𝑧 ∈ 𝑥 ∧ 𝜑(𝑧)))

**Axiom of Power Set**

∀𝑥∃𝑦∀𝑧 (𝑧 ∈ 𝑦 ↔ 𝑧 ⊆ 𝑥)

**Axiom of Regularity**

∀𝑥 (𝑥 ≠ ∅ → ∃𝑦 (𝑦 ∈ 𝑥 ∧ 𝑥 ⋂ 𝑦 = ∅))

**Axiom Schema of Replacement**

∀𝑥∃𝑦∀𝑧 (𝑧 ∈ 𝑦 ↔ ∃𝑤(𝑤 ∈ 𝑥 ∧ 𝑧 = *f*(𝑤)))

**Axiom of Choice**

∀𝑥 ([∀𝑧 (𝑧 ∈ 𝑥 → (𝑧 ≠ ∅ ∧ ∀𝑤 ((𝑤 ∈ 𝑥 ∧ 𝑧 ≠ 𝑤) → (𝑧 ∩ 𝑤) = ∅)))] → ∃𝑦 ∀𝑧 (𝑧 ∈ 𝑥 → |𝑧 ∩ 𝑦| = 1))

And here’s a list with redundant axioms removed:

**Axiom of Extensionality**

∀𝑥∀𝑦 (∀𝑧(𝑧 ∈ 𝑥 ↔ 𝑧 ∈ 𝑦) → 𝑥 = 𝑦)

**Axiom of Union**

∀𝑥∀𝑦∃𝑧∀𝑤 (𝑤 ∈ 𝑧 ↔ (𝑤 ∈ 𝑥 ∨ 𝑤 ∈ 𝑦))

**Axiom of Infinity**

∃𝑥 (∅ ∈ 𝑥 ∧ ∀𝑦 (𝑦 ∈ 𝑥 → 𝑦 ∪ {𝑦} ∈ 𝑥))

**Axiom of Power Set**

∀𝑥∃𝑦∀𝑧 (𝑧 ∈ 𝑦 ↔ 𝑧 ⊆ 𝑥)

**Axiom of Regularity**

∀𝑥 (𝑥 ≠ ∅ → ∃𝑦 (𝑦 ∈ 𝑥 ∧ 𝑥 ⋂ 𝑦 = ∅))

**Axiom Schema of Replacement**

∀𝑥∃𝑦∀𝑧 (𝑧 ∈ 𝑦 ↔ ∃𝑤(𝑤 ∈ 𝑥 ∧ 𝑧 = *f*(𝑤)))

**Axiom of Choice**

∀𝑥 ([∀𝑧 (𝑧 ∈ 𝑥 → (𝑧 ≠ ∅ ∧ ∀𝑤 ((𝑤 ∈ 𝑥 ∧ 𝑧 ≠ 𝑤) → (𝑧 ∩ 𝑤) = ∅)))] → ∃𝑦 ∀𝑧 (𝑧 ∈ 𝑥 → |𝑧 ∩ 𝑦| = 1))

I was planning on going into some of the fun facts about and implications of ZFC set theory in this post, but since I’ve gone on for way too long just defining the theory, I will save this for the next post! For now, then, I leave you with a pretty picture: