The Hypergame Paradox

Credit to Joel David Hamkins, who I heard discussing this paradox on an episode of the podcast My Favorite Theorem.

Define a finite game to be any two-player turn-based game such that every possible playthrough ends after finitely many turns. For example, tic-tac-toe is a finite game because every game ends in at most nine turns.. So is chess with the 50-move-rule enforced (if 50 moves are taken without any pawn advances or captures, then the game ends in a draw). In both these examples, there’s an upper bound to how long the game can last, but this is not required. A game of transfinite Nim would count as finite; every game lasts for only finitely many turns, even though there is no upper bound on the number of turns it takes.

Now, consider the game Hypergame. To play Hypergame, Player 1 begins by choosing any finite game G. Then Player 2 plays the first move of G, Player 1 plays the second move of G, and so on until the game is completed. (Since Player 1 chose a finite game, this will always happen after some finite amount of time.)

Is Hypergame a finite game? Yes, we can easily see that it must be. Whatever game Player 1 chooses will be over after n steps, for some finite n. So that playthrough of Hypergame will have taken n+1 steps.

But if Hypergame is a finite game, then it is a valid choice for the first move of Hypergame! So we can now imagine the following playthrough of Hypergame:

A Troubling Playthrough of Hypergame
Player 1: For the finite game that we shall play, I pick Hypergame.
Player 2: Hm, okay. So now I’m playing the first move of Hypergame. So I must now choose any finite game. I’ll choose Hypergame!
Player 1: Alright so I’m again playing the first move of this new game of Hypergame. I’ll choose Hypergame again.
Player 2: And I choose Hypergame again as well.
So on forever…

At no point does either player violate the rules of Hypergame. And yet, we ended up with an infinite playthrough of Hypergame, which we proved was impossible! So we have a contradiction. What is the resolution?

✵✵✵

Here’s one possible resolution, analogous to the resolutions of similar set-theoretic paradoxes.

We can think about a game as a directed rooted tree. The vertices of the tree correspond to game states, and the edges correspond to the allowed moves. The root of the tree corresponds to the starting game state, and Player 1 gets to choose which edge to travel along first. From the new vertex, Player 2 decides the next edge to travel along. And so on. The tree’s leaves correspond to ending states of the game, and each leaf is labelled according to which player won in that ending state.

In this framework, what is a finite game? As I defined it above, a finite game is just any directed rooted tree such that every path starting at the root ends at a leaf after passing through finitely many edges. This corresponds perfectly to the idea that every possible playthrough of the game takes only finitely many turns. Notice that a finite game is not necessarily a finite tree! The game tree of a finite game is only finite if it’s also finitely branching. In other words, for a game to have a finite tree requires not just that every playthrough is finitely long, but that each player always has only finitely many choices on their turn.

For instance, the game tree of Hypergame is not a finite tree, because Player 1 has infinitely many possible finite games to choose from on his first turn. How big exactly is the game tree of Hypergame? We know that we have to have a vertex corresponding to the start of any finite game, so it must be at least as large as the set of all finite games. But how large is this set?

This is where we run into problems. The game tree of a finite game can be arbitrarily large. Consider the game which starts by Player 1 choosing any real number and then immediately losing. The height of the game tree is 1, but its width is the cardinality of the continuum. Similarly for any set X we can find a game whose tree has cardinality |X|. This means that there are finite games of arbitrary cardinalities. But then as a corollary to the nonexistence of a largest cardinality, we know that there is no set of all finite games! And this implies that Hypergame has no game tree! More precisely, there is no set corresponding to the game tree of Hypergame as we defined it.

Couldn’t we instead think about Hypergame as a proper class? Sure! But then when we choose a finite game in our first move, we couldn’t be picking Hypergame, as the property “is a finite game” would only apply to sets and not proper classes. This means that we can’t actually select Hypergame as our first move! And so we avoid the paradoxical conclusion that we can keep picking Hypergame ad infinitum.

I find it quite fascinating that the seemingly innocent notion of a finite game can lead us into paradoxes involving proper classes!

ZFC as One of Humankind’s Great Inventions

Recently I told a friend that I thought ZFC was one of humankind’s greatest inventions. He pointed out that it was pretty bold to claim this about something that most of mankind has never heard of, which I thought was a fair objection. After thinking for a bit, I reflected that the sense of greatness I meant wasn’t really consequentialist, and thus it was independent of how many people know what ZFC is, or even how many people’s lives are affected in any way by it. Instead I intended greatness in a sort of aesthetic and intellectual sense.

The closest analogy to ZFC outside of math is the idea of a “theory of everything” for physics. If we found a theory of everything for physics, it’d likely have a bunch of important practical consequences, and that’d be part of what makes it a great invention. But it would also be a great invention in an intellectual sense, as a discovery of something fundamental and unifying of many seemingly disparate phenomena we observe. This is what ZFC is like: a mathematical theory of everything. One reason this analogy is imperfect is that due to the incompleteness theorems, we know that there can be no “theory of everything” for mathematics. (Any theory of everything will have at least one thing it can’t prove, namely its own consistency.) So ZFC’s greatness can’t come from being a perfect theory of everything, because we know that it is not. Nonetheless, ZFC serves as a foundation for virtually all known mathematics, and this is what I think is so incredible about it.

What does it mean for something to “serve as a foundation” for math? ZFC is a foundation in (at least) three ways: (1) in terms of its ability to define virtually all mathematical concepts, (2) in terms of its structures being rich enough to contain objects that come from virtually all fields of math, and (3) in terms of being an axiom system that suffices to prove virtually every result in known mathematics.

Syntax

Virtually every mathematical concept you can think of has a definition in the language of ZFC. For example, we have definitions for numbers like “π” and “√2”, sets like ℕ and ℝ, algebraic objects like the group S5 and the ring ℚ[x], geometric objects like Platonic solids and differential manifolds, computational objects like Turing machines and cellular automata, and even logical entities like models of first order theories and proofs within formal systems. What makes this especially impressive is the simplicity of the language: it uses nothing besides the basic symbols of first order logic and one binary relation symbol: ∈. So one thing that ZFC teaches us is that virtually every concept in mathematics can be defined just in terms of the set membership relation, and all mathematics can be understood as exploring the properties of this relation.

Semantics

Models of ZFC are insanely richly structured. You can navigate within them to find sets corresponding to every object that mathematicians study. π has a representative set within any model of ZFC, as does the Monster group or the torus. These representative sets are not always perfect: there are models of ZFC where ℝ is countable, for instance. But within the model, they nonetheless share enough similarities with the original objects that virtually everything you can prove about the original object, remains true of the ZFC-representative.

Proof

Finally, ZFC is a computable set of sentences, and we may inquire about what can be proven from it. Keeping up the ambition of the previous two sections, we might want to claim that all mathematical truths can be proven from ZFC. But due to the limitations of first order logic discovered over the last century, we now know that this goal is not achievable. The set of all first order truths of arithmetic is not computable, and so there must be some such truths that aren’t logical consequences of ZFC. Nonetheless, it is commonly claimed that virtually all mathematical truths can be derived from ZFC using the usual proof system for first order logic.

This is especially remarkable given the simplicity of ZFC. I believe that the intuitive content of each axiom could be explained to a smart middle schooler. Additionally, these axioms are extremely intuitively appealing. the most controversial of them has been choice, which is equivalent to the statement that the Cartesian product of non-empty sets is also non-empty. Second most controversial is probably the axiom of infinity, which just says that there’s an infinite set. The rest are even less hard to accept than these.

Now, the fact that you can prove virtually everything from ZFC doesn’t mean that you should. So don’t interpret me as saying that ZFC is of practical use to the daily work of mathematicians trying to prove things outside of set theory and logic. Again, an analogy to physics: we might discover a theory of everything that we know reproduces all the known phenomena of GR and QM, but find that it’s so hard to prove things that we are practically never better off using this theory to calculate things. Nonetheless, ZFC as a theory of everything teaches us that most of math can be understood as conceptually quite simple: the logical consequences of a fairly simple and computable set of sentences about sets. People make a big deal out of Euclid’s axiomatization of geometry, but this is a small feat relative to the axiomatization of all of mathematics.

Metamath

And not only can ZFC prove virtually everything in ordinary mathematics, but ZFC can prove much of what we know in metamathematics and logic itself. When logicians are studying model theory, or even when set theorists are studying ZFC, they are almost always working with ZFC as their meta-theory, meaning that they are making sure that all of their proofs could ultimately be expanded out as ZFC proofs. So the big results of logic, like the completeness theorem, the compactness theorem, the incompleteness theorems, the Löwenheim-Skolem theorems, are all theorems of ZFC.

The fact that ZFC can even talk about these model theoretic notions means that models of ZFC are able to talk about models of ZFC, which is where things get very meta. One can prove that every model of ZFC – every one of these crazily richly-structured universes containing virtually all of mathematics – contains another such model of ZFC. This follows from the reflection theorem, which again can be proven in ZFC!

Hopefully I have now roused enough interest in you to get you to take a look at some of the actual mathematics. You might be curious to know what exactly this theory is. And you’re in luck, it’s simple enough that I can write the whole theory in just nine lines!

Note that with the exception of the final axiom, Choice, the only symbols I’ve used are logical symbols and ∈. I used shorthand for Choice for the sake of readability, but this could be expanded out just like the others. I’m also using a convention where any free variables are considered to be universally quantified over, which shortens things further.

I’ll close with a one-sentence description for each axiom.

Extensionality: No two distinct sets have all the same elements.
Pairing: For any two sets, there’s a set containing just those two.
Union: The union of any set of sets exists.
Powerset: There is a set of all subsets of any set.
Specification: For any property Φ and any set x, you can form a set out of just those elements of x with that property.
Replacement: For any definable function and any set, the image of that set under the function exists.
Infinity: There’s an infinite set.
Regularity: Every non-empty set has a member that it shares nothing with.
Choice: For any set of nonempty sets, there is a function that picks out one element from each.

The fact that you can prove everything from the infinitude of primes to Fermat’s Last Theorem from just these basic principles, is really quite mind-blowing.

Forcing and the Independence of CH (Part 2)

Part 1 here

Part 2: How big is 𝒫(ω)?

Now the pieces are all in place to start applying forcing to prove some big results. Everything that follows assumes the existence of a countable transitive model M of ZFC.

First, a few notes on terminology.

The language of ZFC is very minimalistic. All it has on top of the first-order-logic connectives is a single binary relation symbol ∈. Nonetheless, people will usually talk about theorems of ZFC as if it has a much more elaborate syntax, involving symbols like ∅ and ω and 1 and terms like “bijection” and “ordinal”. For instance, the Continuum Hypothesis can be written as “There’s a bijection between 𝒫(ω) and 1“, which is obviously using much more than just ∈. But these terms are all simply shorthand for complicated phrases in the primitive language of ZFC: for instance a sentence “φ(∅)” involving ∅ could be translated as “∃x (∀y ¬(y ∈ x) ∧ φ(x))”, or in other words “there’s some set x that is empty and φ(x) is true”.

Something interesting goes on when considering symbols like ∅ and ω and 1 that act like proper names. Each name picks out a unique set, but for some of these names, the set that gets picked out differs in different models of ZFC. For example, the interpretation of the name “1” is model-relative; it’s meant to be the first uncountable cardinal, but there are countable models of ZFC in which it’s actually only countably large! If this makes your head spin, it did the same for Skolem: you can find more reading here.

On the other hand, the name “∅” always picks out the same set. In every model of ZFC, ∅ is the unique set that contains nothing at all. When a name’s interpretation isn’t model-relative, it’s called absolute. Examples include “∅” and “1,729”. When a name isn’t absolute, then we need to take care to distinguish between the name itself as a syntactic object, and the set which it refers to in a particular model of ZFC. So we’ll write “1” when referring to the syntactic description of the first uncountable cardinal, and 1M when referring to the actual set that is picked out by this description in the model M.

With that said, let’s talk about a few names that we’ll be using throughout this post:

In M, ω is the conventional name for the set that matches the description “the intersection of all inductive sets”, where inductive sets are those that contain the ordinal 0 and are closed under successor. In any transitive model of ZFC like M, ω is exactly ℕ, the set of natural numbers. Since we’re restricting ourselves to only considering transitive models, we can treat ω as if it’s unambiguous; its interpretation won’t actually vary across the models we’re interested in.

ω1 corresponds to the description “the smallest uncountable ordinal”. This description happens to perfectly coincide with the description “the first uncountable cardinal” in ZFC, which has the name 1. But unlike with ω, different transitive models pick out different sets for ω1 and 1. So in a model M, we’ll write ω1M for the set that M believes to be the smallest uncountable ordinal and 1M for the first uncountable cardinal.

ω2 is the name for the first ordinal for which there’s no injection into ω1 (i.e. the first ordinal that’s larger than ω1). Again, this is not absolute: ω2M depends on M (although again, ω2M is the same as 2M no matter what M is). And in a countable model like those we’ll be working with, ω2M is countable. This will turn out to be very important!

𝒫(ω) is the name for the the set that matches the description “the set of all subsets of ω”. Perhaps predictably at this point, this is not absolute either. So we’ll have to write 𝒫(ω)M when referring to M’s version of 𝒫(ω).

Finally, ZFC can prove that there’s a bijection from 𝒫(ω) to , meaning that this bijection exists in every model. Thus anything we prove about the size of 𝒫(ω) can be carried over to a statement about the size of . Model-theoretically, we can say that in every model M, |𝒫(ω)M| = |M|. It will turn out to be more natural to prove things about 𝒫(ω) than .

Okay, we’re ready to make the continuum hypothesis false! We’ll do this by choosing a partially ordered set (P, ≤) whose extension as a Boolean algebra (B, ≤) has the property that no matter what M-generic filter G in B you choose, M[G] will interpret its existence as proof that |𝒫(ω)| ≥ ℵ2.

Making |𝒫(ω)| ≥ 2

Let P = { f ∈ M | f is a finite partial function from ω2M × ω to {0,1} }. Let’s have some examples of elements of P. The simplest is the empty function ∅. More complicated is the function {((ω+1, 13), 1)}. This function’s domain has just one element, the ordered pair (ω+1, 13), and this element is mapped to 1. The function {((14, 2), 0), ((ω1M•2, 2), 1)} is defined on two elements: (14, 2) and (ω1M•2, 2). And so on.

We’ll order P by reverse inclusion: f ≤ g iff f ⊇ g. Intuitively, f ≤ g if f is a function extension of g: f is defined everywhere that g is and they agree in those places. For instance, say f = {((ω, 12), 0)}, g = {((ω, 12), 0), ((13, 5), 1)}, h = {((ω, 12), 1)}, and p = {((ω, 11), 1)}. Check your understanding by verifying that (1) g ≤ f, (2) f and h are incomparable and have no common lower bound, and (3) f and p are incomparable but do have a common lower bound (what is it?).

The empty function ∅ is a subset of every function in P, meaning that ∅ is bigger than all functions. Thus ∅ is the top element of this partial order. And every f in P is finite, allowing a nice visualization what the partial order (P, ≤) looks like:

Now, G will be an M-generic ultrafilter in the Boolean extension (B, ≤) of (P, ≤). G is the set that we’re ultimately adding onto M, so we want to know some of its properties. In particular, how will we use G to show that |𝒫(ω)| = ℵ2?

What we’re going to do is construct a new set out of G as follows: first take the intersection of P with G. (remember that G is an ultrafilter in B, so it doesn’t only contain elements of P). P⋂G will be some set of finite partial functions from ω2M × ω to {0,1}. We’ll take the union of all these functions, and call the resulting set F. What we’ll prove of F is the following:

(1) F := ⋃(P⋂G) is a total function from ω2M × ω to {0,1}
(2) For every α, β ∈ ω2M, F(α, •) is a distinct function from F(β, •).

A word on the second clause: F takes as input two things: an element of ω2M and an element of ω. If we only give F its first input, then it becomes a function from ω to {0,1}. For α ∈ ω2M, we’ll give the name Fα to the function we get by feeding α to F. Our second clause says that all of these functions are pairwise distinct.

Now, the crucial insight is that each Fα corresponds to some subset of ω, namely {n ω | Fα(n) = 1}. So F defines |ω2M|-many distinct subsets of ω. So in M[G], it comes out as true that |𝒫(ω)M| ≥ |ω2M|. It’s also true in M[G] that |ω2M| = 2M, and so we get that |𝒫(ω)M| ≠ 1M. This is ¬CH!

Well… almost. There’s one final subtlety: |𝒫(ω)M| ≠ 1M is what ¬CH looks like in the model M. For ¬CH to be true in M[G], it must be that that |𝒫(ω)M[G]| ≠ ℵ1M[G]. So what if |𝒫(ω)M| ≠ |𝒫(ω)M[G]|, or 1M ≠ 1M[G]? This would throw a wrench into our proof: it would mean that M[G] believes that M’s version of 𝒫(ω) is bigger than M’s version of 1, but M[G] might not believe that its own version of 𝒫(ω) is bigger than its version of 1. This is the subject of cardinal collapse, which I will not be going into. However, M[G] does in fact believe that 𝒫(ω)M = 𝒫(ω)M[G] and that 1M1M[G].

Alright, so now all we need to do to show that M[G] believes ¬CH is to prove (1) that F is a total function from ω2M × ω to {0,1}, and (2) that for every α, β ∈ ω2M, Fα is distinct from Fβ. We do this in three steps:

  1. F is a function.
  2. F is total.
  3. For every α, β ∈ ω2M, Fα is distinct from Fβ.

Alright so how do we know that F is a function? Remember that F = ⋃(P⋂G). What if some of the partial functions in P⋂G are incompatible with each other? In this case, their union cannot be a function. So to prove step 1 we need to prove that all the functions in P⋂G are compatible. This follows easily from two facts: that G is an ultrafilter in B and P is dense in B. The argument: G is an ultrafilter, so for any f, g ∈ P there’s some element f∧g ∈ B that’s below both f and g. All we know about f∧g is that it’s an element of B, but we have no guarantee that it’s also a member of P, our set of partial functions. In other words, we can’t say for sure that f∧g is actually a partial function from ω2M × ω to {0,1}. But now we use the fact that P is dense in B! By the definition of density, since f∧g is an element of B, P must contain some h ≤ f∧g. Now by transitivity, h ≤ f and h ≤ g. So f and g have a common function extension, meaning they must be compatible functions! Pretty magical right?

So F is a function. But how do we know that it’s total? We prove this by looking closely at the dense subsets of P. In particular, for any α ∈ ω2M and any n ∈ ω, define Dα,n := {f ∈ P | (α, n) ∈ dom(f)}. This is a dense subset of P. Why? Well, regardless of what α and n are, any function f in P is either already defined on (α, n), in which case we’re done, or it’s not, in which case f has a function extension with (α, n) in its domain. So from any element of P, you can follow the order downwards until you find a function with (α, n) in its domain. Since Dα,n is dense in P and G is M-generic, G must have some element in common with Dα,n. Thus G contains an element f of P that has (α, n) in its domain. This f is one of the functions that we union over to get F, so F must have (α, n) in its domain as well! And since α and n were totally arbitrary, F must be total.

Finally, why are the “Fα”s pairwise distinct? Again we construct dense subsets for our purposes: for any α, β ∈ ω2M, define Dα,β := {f ∈ P | fα ≠ fβ}. This is clearly dense (you can always extend any f in P to make fα and fβ disagree somewhere). So G contains an element f of P for which fα ≠ fβ. And thus it must also be true of F that Fα ≠ Fβ!

And we’re done! We’ve shown that once we add G to M, we can construct a new set F = ⋃(P⋂G) such that F encodes |ω2M|-many distinct subsets of ω, and thus that M[G] ⊨ |𝒫(ω)| ≥ ℵ2.

Making |𝒫(ω)| ≥ ℵ420

What’s great is that this argument barely relied on the “2” in ω2M. We could just as easily have started with P = {f ∈ M | f is a finite partial function from ω420M × ω to {0,1}}, constructed an M-generic filter G in the Boolean extension of P, then defined F to be ⋃(P⋂G).

G is still an ultrafilter in B and P is still dense in B, so F is still a function.

Dα,n := {f ∈ P | (α, n) ∈ dom(f)} is still a dense subset of P for any α ∈ ω420M and any n ∈ ω, so F is still total.

And Dα,β := {f ∈ P | fα ≠ fβ} is still a dense subset of P for any α, β ∈ ω420M, so all the “Fα”s are pairwise distinct.

And now F encodes |ω420M|-many distinct subsets of ω! The final step, which I’m going to skip over again, is showing that M[G] believes that |ω420M| = |ω420M[G]|, i.e. that cardinal collapse doesn’t occur.

And there we have it: this choice of P gives us a new model M[G] of ZFC that believes that |𝒫(ω)| ≥ ℵ420! Note how easy and quick this argument is now that we’ve gone through the argument for how to make M[G] believe |𝒫(ω)| ≥ ℵ2. This is a great thing about forcing: once you really understand one application, other applications become immensely simpler and easier to understand.

Making |𝒫(ω)| = ℵ1

Okay, we’ve made CH false. Now let’s make it true! This time our choice for (P, ≤) will be the following:

P = {f ∈ M | f is an M-countable partial function from ω1M to 𝒫(ω)M}. Once more we order by reverse inclusion: f ≤ g iff f ⊇ g.

A crucial thing to notice is that I’ve merely required the functions in P to be M-countable, rather than countable. For a set to be M-countable is for there to exist an injection in M from set to ω. Any M-countable set is countable, but some countable sets may not be M-countable (like M itself!). In particular, we know that ω1M is actually a countable set, meaning that if we required true countability instead of just M-countability, then P would include some total functions! But M has no injection from ω1M to ω, so no function in P can be total. This will be important in a moment!

We get our M-generic ultrafilter G and define F := ⋂(P⋂G). Now we want to show that F is a total and surjective function from ω1M to 𝒫(ω). As before, we proceed in three steps:

  1. F is a function
  2. F is total
  3. F is surjective

Step 1: G is an ultrafilter and P is dense in G, so the same argument works to show that all elements of P⋂G are compatible: (1) ultrafilter implies that any f, g in P⋂G have a least upper bound f∧g in B, (2) density of P in G implies that f∧g has a lower bound h in P, so (3) f and g have a common function extension . Thus F is a function.

Step 2: Dα := {f ∈ P | α ∈ dom(f)} is dense in P for any α ∈ ω1M, so F is total.

Step 3: DA := {f ∈ P | A ∈ image(f)} is dense in P for any A 𝒫(ω)M. Why? No f in P is total, so any f in P can be extended by adding one more point (α, A) where α ∉ dom(f). Thus A ∈ image(F) for any A 𝒫(ω), so F is surjective.

So F is a total surjective function from ω1M to 𝒫(ω)M, meaning that in M[G] it’s true that |𝒫(ω)M| ≤ ℵ1M. And since ZFC proves that |𝒫(ω)| > ℵ0, it follows that M[G] ⊨ |𝒫(ω)| = ℵ1.

You might have noticed that each of these arguments could just as easily have been made if we had started out with defining P as the finite partial functions in M from ω1M to 𝒫(ω)M. This is true! If we had done this, then we still would have ended up proving that F was a total surjective function from ω1M to 𝒫(ω)M, and therefore that |𝒫(ω)M| = ℵ1M. However, this is where cardinal collapse rears its ugly head: the final step is to prove that |𝒫(ω)M| = |𝒫(ω)M[G]| and that 1M = 1M[G]. This requires that P be the countable partial functions rather than just the finite ones.

And with that final caveat, we see that M[G] believes that |𝒫(ω)| = ℵ1.

Making |𝒫(ω)| ≤ ℵ314

Let P = {f ∈ M | f is an M-countable partial function from ω314M to 𝒫(ω)} and order by reverse inclusion: f ≤ g iff f ⊇ g.

⋂(P⋂G) is again a total and surjective function from ω314M to 𝒫(ω), following the exact same arguments as in the previous section. Cardinal collapse doesn’t occur here either, so M[G] believes that |𝒫(ω)| ≤ ℵ314.

✯✯✯

We’ve proven the independence of the Continuum Hypothesis from ZFC! We’ve done more: we’ve also shown how to construct models of ZFC in which we have lots of control over the size of 𝒫(ω). Want a model of ZFC in which |𝒫(ω)| is exactly 1729? Just force with the right P and you’re good to go! There’s something a bit unsatisfactory about all this though, which is that in each application of forcing we’ve had to skim over some details about cardinal collapse that were absolutely essential to the proof going through. I hope to go into these details in a future post to close these last remaining gaps.

Forcing and the Independence of CH (Part 1)

Part 2 here.

Part 1: What is Forcing?

Forcing is a set-theoretic technique developed by Paul Cohen in the 1960s to prove the independence of the Continuum Hypothesis from ZFC. He won a Fields Medal as a result, and to this day it’s the only Fields Medal to be awarded for a work in logic. It’s safe to say that forcing is one of the most powerful techniques that set theorists have at their disposal as they explore the set theoretic multiverse.

Forcing has an intimidating reputation, which is somewhat deserved. The technique involves many moving parts and it takes some time to see how they all fit together. I found these three resources to be excellent for learning the basics of the technique:

Each of these sources takes a slightly different approach, and I think that they complement each other quite well. I’ve produced a high-level outline of the argument for why forcing works, as well as how to use forcing to prove the independence of the CH. I don’t feel quite competent to present all of forcing from the ground up, so this outline is not meant to be self-contained. In particular, I’m gonna assume you’re familiar with the basics of first order logic and ZFC and are comfortable with Boolean algebras. I’m also going to skim over the details of cardinal collapse, which are still fairly opaque to me.

Forcing in 5 lines!

  1. Fix a countable transitive model M ⊨ ZFC and a Boolean algebra (B, ≤) ∈ M.
  2. Define MB := { f ∈ M | f: MB ⇀ B } ⊆ M.
  3. Choose an M-generic ultrafilter G ⊆ B.
  4. For every n in MB, define valG(n) := { valG(m) | (m, b) ∈ n for some b in G }.
  5. Define M[G] := { valG(n) | n ∈ MB }. This is a countable transitive model of ZFC containing all of M as well as G!

We start with a countable transitive model M of ZFC, and then construct a larger countable transitive model M[G] of ZFC that contains a new set G. Let me say a few words on each step.

Step 1
1. Fix a countable transitive model M ⊨ ZFC and a Boolean algebra (B, ≤) ∈ M.

There’s not much to say about Step 1. I suppose it’s worth noting that we’re assuming the existence of a countable transitive model of ZFC, meaning that the results of forcing are all conditioned not just on the consistency of ZFC, but on the existence of transitive models of ZFC, which is significantly stronger. (The existence of countable transitive models is no stronger than the existence of transitive models, by the downwards Lowenheim-Skolem theorem.)

Moving on!

Step 2
2. Define MB := { f ∈ M | f: MB ⇀ B }

MB is sometimes called a Boolean-valued model of ZFC. The elements of MB are called “B-names” or “B-valued sets” or just “B-sets” (as I’ll call them). These B-sets are partial functions in M that go from MB to B. “Huh? Hold on,” I hear you saying, “Isn’t that a circular definition?”

It might look circular, but it’s not; it’s inductive. First of all, regardless of what MB is, the empty set counts as a partial function from MB to B: all its inputs are in MB, all its outputs are in B, and no input gets sent to multiple outputs! ∅ is also in M, so ∅ is an element of MB. But now that we know that ∅ is in MB, we can also construct the function {(∅, b)} for each b in B. This satisfies the criterion of “being a partial function from MB to B”, and it is in M because B is in M. So for any b in B, {(∅, b)} is in MB. And now we can construct more partial functions with these new elements of MB, like {(∅, b), ({(∅, b’)}, b’’)} and so on.

If you’re still not convinced that the definition of MB is sensible, we can define it without any apparent circularity, making the induction explicit:

  • V0 =
  • Vα+1 = { f ∈ M | f is a partial function from Vα to B } for any ordinal α in M.
  • Vλ = ⋃{Vα | α ∈ λ} for any limit ordinal λ in M.
  • Finally, MB = ⋃{Vα | α ∈ M}.

Okay, so now we know the definition of MB. But what is it? How do we intuitively think about the elements of MB? One way is to think of them as “fuzzy sets”, sets that contain each other to varying degrees. In this way of seeing things, the Boolean algebra B is our extension of the trivial Boolean algebra {True, False} to a more complicated Boolean algebra that allows intermediate truth values as well. So for instance, of the B-set {(∅, b)} we can say “it contains ∅ to degree b, and contains nothing else”. Of the B-set {(∅, b), ({(∅, b’)}, b’’)} we may say “it contains ∅ to degree b, and contains [the set that contains ∅ to degree b’] to degree b’’, and nothing else.” And so on.

Every Boolean algebra has a top element ⊤ corresponding to “definite truth” and a bottom element ⊥ corresponding to “definite falsity”, so our B-sets don’t have to be entirely fuzzy. For instance the B-set {(∅, ⊤)} definitely contains ∅ and definitely doesn’t contain anything else. It can be thought of as intuitively similar to the set {∅} in M. And the B-set {(∅, ⊥)} definitely doesn’t contain ∅, or anything else for that matter. This brings up an interesting question: how is the B-set {(∅, ⊥)} any different from the B-set ∅? They have the same intuitive properties: both contain nothing (or said differently, contain each thing to degree ⊥). Well that’s right! You can think of {(∅, ⊥)} and ∅ as two different names for the same idea. They are technically distinct as names, but in Steps 3 and 4 when we collapse MB back down to an ordinary model of ZFC, the distinction between them will vanish. (The details of how exactly we assign B-values to sentences about MB are interesting in their own right, and might deserve their own post.)

So intuitively what we’ve done is take our starting model M and produce a new structure that looks a lot like M in many ways except that it’s highly ambiguous: in addition to all the old sharp sets we have all these new fuzzy sets that are unsure about their members. In the next two steps we collapse all this ambiguity back down, erasing all of the fuzziness. It turns out that there are many ways to collapse the ambiguity (one for each ultrafilter in B in fact!), but we’ll want to choose our ultrafilter very carefully so that we (1) don’t end up back where we started, (2) end up with a model of ZFC, and (3) end up with an interesting model of ZFC.

Step 3
3. Choose an M-generic ultrafilter G ⊆ B.

There’s some terminology here that you might be unfamiliar with.

Firstly, what’s an ultrafilter? It’s a nonempty proper subset of B that’s closed upwards, closed under intersection, and contains one of b or ¬b for each b in B. I talked about ultrafilters here in the context of power sets, but it generalizes easily to Boolean algebras (and there are some pretty pictures to develop your intuitions a bit).

Okay, so that’s what ultrafilters are. What is it to be a generic ultrafilter? A generic ultrafilter is one that intersects every dense subset of B. What’s a dense subset of B? A dense subset of a Boolean algebra (B, ≤) is any subset of B\{⊥} that lower-bounds all of B\{⊥}. In other words, D is a dense subset of B if for every element of B\{⊥} there’s a lower element of D. Intuitively, D is like a foundation that the rest of B\{⊥} rests upon (where “upwards” corresponds to “greater than”). No matter where you are in B\{⊥}, you can follow the order downwards and eventually find yourself at an element of D. And a generic ultrafilter is an ultrafilter that shares something in common with each of these foundational subsets.Why have we excluded the bottom element of B, ⊥? If we hadn’t done so then the dense subsets would just be any and all sets containing ⊥! Then any generic ultrafilter would have to contain ⊥, but this is inconsistent with the definition of an ultrafilter!

Finally, what’s an M-generic ultrafilter? Recalling that a generic ultrafilter has to intersect every dense subset of B, an M-generic ultrafilter only needs to intersect every dense subset of B that’s in M.

I hear a possible objection! “But you already told us that B is in M! And since ZFC has the power set axiom, don’t we know that M must also contain all of B’s subsets? If so then M-genericity is no different from genericity!” The subtlety here is that the power set axiom guarantees us that M contains a set 𝒫(B) that contains all the subsets of B that are in M. Nowhere in ZFC is there a guarantee that every subset of B exists, and in fact such a guarantee is not possible in any first-order theory whatsoever! (Can you see how this follows from the downward Lowenheim-Skolem theorem?) The power set axiom doesn’t create any subsets of B, it merely collects together all the subsets of B that already exist in M.

Now, why do we care about G being M-generic? A few steps down the line we’ll see why M-genericity is such an important property, but in all honesty my intuition is murky here as to how to motivate it a priori. For now, take it on faith that M-genericity ends up being exactly the right property to require of G for our purposes. We’ll end up making extensive use of the fact that G intersects every dense subset of B in M.

Very keen readers might be wondering: how do we know that an M-generic ultrafilter even exists? This is where the countability of M saves us: since M is countable, it only contains countably many dense subsets of B: (D0, D1, D2, …). We construct G as follows: first we choose any b0 ∈ D0. Then for any n ∈ ℕ, since Dn+1 is dense in B, we can find a dn+1 ∈ Dn+1 such that dn+1 ≤ dn. Now we have an infinite descending chain of elements from dense subsets. Define G to be the upwards closure of {dn | n ∈ ℕ}, i.e. G = {b ∈ B | b ≥ dn for some n ∈ ℕ}. This is M-generic by construction, and it’s an ultrafilter: for any b ∈ B, either {b’ ∈ B | b’ ≤ b} or {b’ ∈ B | b’ ≤ ¬b} is dense, implying that G contains either b or ¬b. (Convince yourself that it’s also closed upwards and under ∧.)

Really important note: this proof of G’s existence relies on the countability of M. But from M’s perspective, it isn’t countable; i.e. M doesn’t believe that there’s a bijection that puts each set in correspondence with an element of omega. So M can’t prove the existence of G, and in fact G is a set that doesn’t exist within M at all.

Step 4
4. For every n in MB, define valG(n) := {valG(m) | (m, b) ∈ n for some b in G}.

Ok, next we use G to define a “valuation function” valG(•). This function takes the B-sets created in Step 2 and “collapses” them into ordinary sets. It’s defined inductively: the G-valuation of a B-set n is defined in terms of the G-valuations of the elements of n. (As a side note, the fact that this definition works relies on the transitivity of our starting model! Can you see why?)

Let’s work out some simple examples. Start with n = ∅. Then valG(∅) = {valG(m) | (m, b) ∈ ∅ for some b in G} = ∅, because there is no (m, b) in ∅. Thus G evaluates ∅ as ∅. So far so good! Now for a slightly trickier one: n = {(∅, b)}, where b is some element of B.

valG( {(∅,b)} ) = {valG(m) | (m, b) ∈ {(∅, b)} for some b in G} = {valG(m) | m = ∅ & b ∈ G}

There are two cases: either b is in G or b is not in G. In the first case, {(∅, b)} is evaluated to be {∅}. In the second, {(∅,b)} is evaluated to be {}. What’s the intuition here? G is a subset of B, and we can think of it as a criterion for determining which Boolean values will be collapsed to True. And since G is an ultrafilter, every Boolean value will either be collapsed to True (if b ∈ G) or to False (if ¬b ∈ G). Recall that we thought of {(∅, b)} as a B-set that “contained ∅ to degree b”. Our G-evaluation of this set removed all fuzziness: if b was included in G then we evaluate {(∅, b)} as actually containing ∅. And if not, then we said that it didn’t. Either way, all fuzziness has been removed and we’ve ended up with an ordinary set!

The same happens for every B-set. When we feed it into the function valG(•), we collapse all its fuzziness and end up with an ordinary set, whose members are determined by our choice of B.

Step 5
5. Define M[G] := {valG(n) | n ∈ MB}. M[G] is a countable transitive model of ZFC such that M ⊆ M[G] and G ∈ M[G].

Now we simply evaluate all the B-sets and collect them together, and give the result a name: M[G]. This part is easy: the hard work has already been done in defining MB and valG. What remains is to show that this new set is a countable transitive model of ZFC containing all of M as well as G. Let’s do this now.

First of all, why is M[G] countable? Well, M is countable and MB is a subset of M, so MB is countable. And valG is a surjective function from MB into M[G], so |M[G]| ≤ |MB|.

Second, why is M[G] transitive? This follows immediately from its definition: the elements of M[G] are G-valuations, and elements of G-valuations are themselves G-valuations. So elements of elements of M[G] are themselves elements of M[G]. That’s transitivity!

Third, why is M[G] a model of ZFC? Er, we’ll come back to this one.

Fourth, why is M ⊆ M[G]? To show this, we’ll take any arbitrary set x in M and show that it exists in M[G]. To do this, we first define a canonical name for x: a B-set Nx that acts as the surrogate of x in MB. Define Nx to be {(Ny, ⊤) | y ∈ x}. (Make sure that Nx is actually in MB!) What’s the G-valuation of Nx? It’s just x! Here’s a proof by ∈-induction:

First, N = {(Ny,⊤) | y ∈ ∅} = ∅.
So valG(N) = valG(∅) = {valG(m) | (m, b) ∈ ∅ for some b in G} = ∅.

Now, assume that valG(Ny) = y for every y ∈ x.
Then valG(Nx) = {valG(Ny) | (Ny, b) ∈ Nx for some b in G} = {y | (Ny, ⊤) ∈ Nx} = {y | y ∈ x} = x.

Thus by ∈-induction, for all x, valG(Nx) = x.

Finally, why is G ∈ M[G]? We prove this by finding a B-set that G-valuates to G itself. Define A := {(Nb, b) | b ∈ B}. Then valG(A) = { valG(m) | (m, b) ∈ A for some b in G } = {valG(Nb) | b ∈ G} = {b | b ∈ G} = G.

And there it is! We’re done!

Ha ha, tried to pull a fast one on you. We still have one crucial step remaining: proving that M[G] is actually a model of ZFC! As far as I know, there’s no concise way to do this… you have to actually go through each axiom of ZFC and verify that M[G] satisfies it. I’m not going to do that here, but I will provide proofs of four axioms to give you a flavor.

Infinity: M contains a set I satisfying the axiom of infinity and M ⊆ M[G]. So I ∈ M[G], and it still satisfies infinity (still contains ∅ and is closed under successor).

Pairing: Let x and y be any two sets in M[G]. Then for some n1 and n2 in MB, x = valG(n1) and y = valG(n2). Now define n := {(n1, ⊤), (n2, ⊤)} ∈ MB. Then valG(n) = {x, y}.

Union: Let x be any element of M[G]. Since M[G] is transitive, every element y of an element of x is in M[G]. So for each such y there’s a B-set ny such that valG(ny) = y. Define n := {(ny,⊤) | y ∈ z for some z ∈ x}. Then valG(n) = {valG(ny) | y ∈ z for some z ∈ x} = {y | y ∈ z for some z ∈ x} = ⋃x.

Comprehension: Let φ(y) be any first-order formula in the language of ZFC, and x ∈ M[G]. For every element y of x, let ny be any B-set that G-valuates to y. Define n := {(ny,⊤) | y ∈ x & φ(y)}. Then valG(n) = {y ∈ x | φ(y)}.

And I leave you to fill in the rest of the axioms of ZFC for yourself. Much of it looks very similar to the proofs of pairing, union, and comprehension: make the natural choice for a B-set which G-valuates to the particular set whose existence you’re trying to prove. And THEN you’re done!

✯✯✯

Okay! If you’ve made it this far, take a breather and congratulate yourself. You now understand how to adjoin a new set to an existing model of ZFC so long as this new set is an M-generic ultrafilter in a Boolean algebra B ∈ M). And this process works equally well no matter what Boolean algebra you pick!

In fact, the choice of the Boolean algebra in step 1 is the key degree of freedom we have in this whole process. B isn’t the set that we end up adjoining to M, and in fact B is always chosen to be an element of our starting model. But the fact that G is always chosen to be an M-generic filter in B means that the structure of B greatly influences what properties G has.

You might notice that we also technically have another degree of freedom in this process: namely step 3 where I said “choose an M-generic filter G in B”. While this is technically true, in practice much of forcing is about finding really clever choices of B so that any M-generic filter in B has whatever interesting property we’re looking for.

There’s another subtlety about the choice of B, namely that in applications of forcing one generally starts with a partially ordered set (P, ≤) and then extends it to a Boolean algebra B. A really cool result with an even cooler topological proof is that this extension is always possible. For any choice of P one can construct a complete Boolean algebra B such that P is order-isomorphic to a dense subset of B. (In a sentence, we turn P into a topological space whose open sets are the downwards-closed subsets and then let B be the set of regular open sets ordered by inclusion. More on this below.)

What’s great about this is that since P is dense in B, all dense subsets of P are also dense subsets of B. This means that we can analyze properties of our M-generic ultrafilter G in B solely by looking at the dense subsets of P! This means that for many purposes you can simply ignore the Boolean algebra B and let it do its work in the background, while focusing on the partially ordered set (P, ≤) you picked out. The advantage of this should be obvious: there are many more partially ordered sets than there are Boolean algebras, so we have much more freedom to creatively choose P.

Let’s summarize everything by going over the outline we started with and filling in a few details:

Forcing in slightly more than 5 lines

  1. Fix a countable transitive model M ⊨ ZFC and a partial order (P, ≤) ∈ M.
  2. Extend P to a Boolean algebra (B, ≤) ∈ M.
    1. Define T := {A ⊆ P | A is closed downwards}.
    2. Define B := {S ∈ T | Int(Cl(S)) = S}.
    3. (B, ⊆) is a Boolean algebra: ¬A = Int(Ac), A∧B = A⋂B, A∨B = Int(Cl(A∪B)).
    4. P is order-isomorphic to a dense subset of B
      1. For p ∈ P, define S(p) := {q ∈ P | q ≤ p} ∈ B.
      2. S is an order-isomorphism from P to B and {S(p) | p ∈ P} is dense in B.
  3. Define MB := { f ∈ M | f: MB ⇀ B }.
  4. Choose an M-generic ultrafilter G ⊆ B.
    1. Since M is countable, we can enumerate the dense subsets of B in M: (D0, D1, D2, …)
    2. Choose any b0 ∈ D0.
    3. For any n ∈ ℕ, since Dn+1 is dense in B, we can find a dn+1 ∈ Dn+1 s.t. dn+1 ≤ dn.
    4. G := { b ∈ B | b ≥ dn for some n ∈ ℕ } is an M-generic ultrafilter in B.
  5. For every n in MB, define valG(n) := { valG(m) | (m, b) ∈ n for some b in G }.
  6. Define M[G] := { valG(n) | n ∈ MB }.
    1. M[G] is a countable transitive model of ZFC.
    2. M ⊆ M[G]
      1. For any x ∈ M, define Nx := {(Ny,⊤) | y ∈ x} ∈ MB.
      2. Then valG(Nx) = x, so x ∈ M[G].
    3. G ∈ M[G]
      1. Define A := {(Nb, b) | b ∈ B} ∈ MB.
      2. valG(A) = G, so G ∈ M[G].

We are now ready to apply forcing to prove the independence of CH from ZFC! In Part 2 you will learn exactly what choice of P makes the cardinality of the continuum ≥ ℵ2 (thus making CH false) and what choice of P makes the cardinality of the continuum exactly ℵ1 (thus making CH true). In fact, I’ll do you one better: in Part 2 you’ll learn what to make P in order to make the cardinality of the continuum larger than ℵ69, ℵ420, or virtually any other cardinality you like!

End of Part 1. Part 2 here.

ZFC, and getting the right answer by accident

There’s something I’m confused about with regards to ZFC. We already know that no first order theory allows you to perfectly capture the semantics of concepts like the natural numbers or finitely many. The best we can do if restricted to first-order logic (which is where most of modern mathematics tries to stay) is to have theories with surrogates for those concepts.

For instance, there’s this object called ω that stands in for the natural numbers in ZFC, even though there are all these models of ZFC in which ω contains many more objects besides the natural numbers (in fact, uncountably many in some models). And “X is finite” is replaced by “there’s a bijection from X to an element of ω.”

In formalizing these surrogate concepts, we make sure to not include any false statements about the concepts, which gives us a type of soundness of our surrogate concept. I.e., ZFC isn’t going to prove things about ω that are false of the set of natural numbers, because in one model ω is the set of natural numbers.

But it doesn’t give us completeness. We aren’t going to be able to prove ALL the true first-order sentences about the natural numbers, or about the concept of finiteness. (This is of course just a product of the first-order nature of the theories in which we’re defining these surrogates.)

So in each of these cases there will be true statements involving the concept that our theory will be agnostic about. We can look for “really good” surrogates, surrogates which allow us to prove most of the important and interesting true statements about the concept, and only fail in very abstract and unusual settings. The degree to which we can make good surrogates is the degree to which a first-order thinker can make sense of and usefully apply non-first-order concepts. (A first-order thinker being one that has no built-in understanding of non-first-order concepts.)

So one general question is: how good is a given surrogate? And another is, how do we know based on the axioms how good of a surrogate we’ll be getting? This is the thing I’m confused about.

In ZFC, there’s this weird phenomenon of the theory “getting the right answers accidentally.” It’s a little tough to put into words, but here’s an example:

ZFC can prove that |P(X)| > |X| for all X. So for instance ZFC can prove that |P(ω)| > |ω|. Meaning that ZFC can prove that the power set of the naturals is uncountable. But ZFC has countable models! (As does every first-order theory.) In those models, the power set of the naturals is NOT uncountable.

First order logic is sound, so what’s going on ISN’T that ZFC is proving a sentence that’s false in some of its models. It’s that the sentence is false in that model, if interpreted to be about the actual concept, and true in that model if interpreted to be about the surrogate concept. The surrogate for “P(ω) is uncountable” in ZFC is “there’s no bijection from P(ω) to ω”. And in the countable models of ZFC, the bijection that would prove the equinumerosity of ω and P(ω) is missing! So in those models, it’s actually TRUE that “there’s no bijection from P(ω) to ω”, even though P(ω) and ω really do have the same number of elements.

This is the sort of subtle nonsense you get used to in mathematical logic. Two accidents cancel out: first ZFC makes the mistake of having models where P(ω) is countable, and second it makes the mistake of losing track of the bijection from P(ω) and ω. And as a result, ZFC is able to prove the correct thing.

This seems really weird to me, and it’s an unsettling way for a supposed foundations of mathematics to be getting the right answers. This type of thing happens a lot, and it feels like we keep getting “lucky” in that our unintended interpretations of a sentence interfere with each other and cancel out their problematic features. It makes me wonder why we should be confident that ZFC will continue giving us the right answers (as opposed to being agnostic on important basic questions). And in fact we do have some examples of important and basic questions that ZFC is agnostic on, most dramatically that if |X| < |Y|, then |P(X)| < |P(Y)|!

It’s not that I doubt that ZFC’s surrogates for non-first order concepts end up allowing us to prove an enormous amount of the true facts about these concepts, because they clearly do. But is there some principled reason we can give for WHY the axioms of ZFC lead us to such a robust framework for mathematics in which we can prove many true statements?

One thing that suggests this is the power of the axiom of replacement. It’s just an extremely strong and useful axiom that allows us to prove a whole lot. But that doesn’t seem to help explain the “right-by-accident” phenomenon. So what does?

A Self-Interpreting Book

A concept: a book that starts by assuming the understanding of the reader and using concepts freely, and as you go on it introduces a simple formal procedure for defining words. As you proceed, more and more words are defined in terms of the basic formal procedure, so that halfway through, half of the words being used are formally defined, and by the end the entire thing is formally defined. Once you’re read through the whole book, you can start it over and read from the beginning with no problem.

I just finished a set theory textbook that felt kind of like that. It started with the extremely sparse language of ZFC: first-order logic with a single non-logical symbol, ∈. So the alphabet of the formal language consisted of the following symbols: ∈ ( ) ∧ ∨ ¬ → ↔ ∀ ∃ x ‘. It could have even started with a sparser formal language if it was optimizing for alphabet economy: ∈ ( ∧ ¬ ∀ x ‘ would suffice. As time passed and you got through more of the book, more and more things were defined in terms of the alphabet of ZFC: subsets, ordered pairs, functions from one set to another, transitivity, partial orders, finiteness, natural numbers, order types, induction, recursion, countability, real numbers, and limits. By the last chapter it was breathtaking to read a sentence filled with complex concepts and realize that every single one of these concepts was ultimately grounded in this super simple formal language we started with, with a finitistic sound and complete system of rules for how to use each one.

But could it be possible to really fully define ALL the terms used by the end of the book? And even if it were, could the book be written in such a way as to allow an alien that begins understanding nothing of your language to read it and, by the end, understand everything in the book? Even worse, what if the alien not only understands nothing of your language, but starts understanding nothing of the concepts involved? This might be a nonsensical notion; an alien that can read a book and do any level of sophisticated reasoning but doesn’t understand concepts like “and” and “or“.

One way that language is learned is by “pointing”: somebody asks me what a tree is, so I point to some examples of trees and some examples of non-trees, clarifying which is and which is not. It would be helpful if in this book we could point to simple concepts by means of interactive programs. So, for instance, an e-book where an alien reading the book encounters some exceedingly simple programs that they can experiment with, putting in inputs and seeing what results. So for instance, we might have a program that takes as input either 00, 01, 10, or 11, and outputs the ∧ operation applied to the two input digits. Nothing else would be allowed as inputs, so after playing with the program for a little bit you learn everything that it can do.

One feature of such a book would be that it would probably use nothing above first-order logical concepts. The reason is that the semantics of second-order logic cannot be captured by any sound and complete proof system, meaning that there’s no finitistic set of rules one could explain to an alien so that they know how to use the concepts involved correctly. Worse, the set of second-order tautologies is not even recursively enumerable (worse than the set of first-order tautologies, which is merely undecidable), so no amount of pointing-to-programs would suffice. First-order ZFC can define a lot, but can it define enough to write a book on what it can define?

Transfinite Nim: uncomputable games and games whose winner depends on the Continuum Hypothesis

In the game of Nim, you start with piles of various (whole number) heights. Each step, a player chooses one pile and shrinks it by some non-zero amount. Once a pile’s height has been shrunk to zero, it can no longer be selected by a player for shrinking. The winner of the game is the one that takes the last pile to zero.

Here’s a sample game of Nim:

Starting state
3, 2
After Frank’s move
2, 2
After Marie’s move
2, 1
After Frank’s move
0, 1
After Marie’s move
0, 0

Marie takes the last pile to zero, so she is the winner. Frank’s second-to last move was a big mistake; by reducing the first pile from 2 to 0, he left the only remaining pile free to be taken by Marie. In a game of Nim, you should never leave only one pile remaining at the end of your turn. If Frank had instead shrunk the first pile from 2 to 1, then the state of the piles would be (1, 1). Marie would be forced to shrink one of the two piles to zero, leaving Frank to take the final pile and win.

The strategy of Nim with two piles is extremely simple: in your turn you should always even out the two piles if possible. This is only possible if the heights are different at the start of your turn. See if you can figure out why this strategy guarantees a win!

Transfinite Nim is a version of Nim where the piles are allowed to take infinite ordinal values. So for instance, a game might have the following starting position:

Starting state
ω2 + ω, ω1 + ε0

If Marie is moving first, then can she guarantee a win? What move should she make?

It turns out that the strategy for two-pile Transfinite Nim is exactly the same as for two-pile Finite Nim. Marie has a guaranteed win, because the two piles are different values. Each move she’ll just even the piles out. So for her first move, she should do the following:

Starting state
ω2 + ω, ω1 + ε0
After Marie’s move
ω2 + ω, ω2 + ω

No matter what Frank does next, Marie can just “copy” that move on the other pile, guaranteeing that Marie always has a move as long as Frank does. This proves that Marie must have the last move, and therefore win.

One important feature of Transfinite Nim is that even though we’re dealing with infinitely large piles, every game can only last finitely long. In other words, Frank has no strategy for delaying his loss infinitely long, and thus forcing a sort of “stalemate by exhaustion.” This is because the ordinals are well-ordered, and any decreasing sequence of well-ordered items must terminate. (Why? Just consider the definition of a well-ordered set: every subset has a least element. If the game were to continue infinitely long, each step decreasing the state but never terminating, then the sequence of states would be a subset of the ordinals which has no least element!)

Although the strategy of Transfinite Nim is in one sense no more interesting than Finite Nim, the game does have some interesting features that it inherits from the ordinals. For instance, there are sets of ordinal numbers such that the ordering between them is uncomputable. For such sets, the ability to compute the winning strategy is called into question.

For instance, the set of all countable ordinals is uncomputable. The quick proof is that there are uncountably many countable ordinals – otherwise in ZFC the set of countable ordinals would itself be a countable ordinal and would thus contain itself – and any Turing machine can only compare countably many things. However, there are also uncomputable ordinals that are countable! If α is a countable ordinal, then we can find some bijection (not necessarily order-preserving) between α and ω, meaning that we can meaningfully ask if a Turing machine can compare any two of α’s elements (each represented by some natural number). And for an uncomputable countable ordinal, we know that no Turing machine can successfully compute its order type.

The smallest uncomputable ordinal (which, in ZFC, is exactly the set of all computable ordinals) is called the Church Kleene ordinal and written ω1CK. Imagine the starting state of the game is two different ordinals that are both larger than ω1CK. If you’re moving first, then you have to determine which of the two ordinals is larger, in order to even them out. But this is not in general possible! So even if you go first and the two piles are different sizes, you might not be able to guarantee a win.

Suppose Marie is allowed uncomputable strategies, and Frank is only allowed computable strategies. Suppose further that the starting state involves two countable ordinals A and B, both larger than the Church-Kleene, and that the ordinals are expressed in some standard notation (so that you can’t write the same ordinal two different ways). There are a few cases.

Case 1: A = B, Marie goes first.
Marie decreases one of the two ordinals. Despite not being able to compute the order on the ordinals, Frank can just mimic her move. This will continue until Frank wins.

Case 2: A = B, Frank goes first.
Frank decreases one of the two ordinals, and Marie mimics. Marie eventually wins.

Case 3: A ≠ B, Marie goes first.
Marie can tell which of the ordinals is larger, and decreases that one to even out the two piles. Marie wins.

Case 4: A ≠ B, Frank goes first.
Frank can’t tell which of the ordinals is larger and can’t try to even them out, as doing so might result in an invalid move (trying to increase the smaller pile to the height of the larger one). So Frank does some random move, after which Marie is able to even out the two piles. Marie wins.

There’s a subtlety in Case 4, which is that Frank could gamble by guessing that B is the bigger ordinal and then decreasing it to A. If he has no other information, then half the time he’ll end up successfully evening them out, in which case he continues to win the game. But the other half of the time he’ll have made an invalid move. If we assume that players cannot run strategies that have some chance of choosing invalid moves (for instance, if each player has to be able to prove that their move is valid in advance), then Frank’s gamble would not be allowed and he would go on to lose.

Finally, here’s a starting state for a game of Transfinite Nim:

ω1, ℶ1

ω1 is the first uncountable ordinal, and ℶ1 is the first ordinal with continuum cardinality. Frank goes first. Does he have a winning strategy?

The answer to this question depends on whether ω1 = ℶ1, or in other words the Continuum Hypothesis! If the two are equal, then Frank can’t win, because he’s starting with two even piles. And if ω1 < ℶ1, then Marie can’t win, because Frank can decrease the ℶ1 pile to ω1.

If we suppose that the players must be able to prove a move’s validity in ZFC before playing that move, then the first player couldn’t decrease the ℶ1 pile to ω1. The first player still has to do something, and whatever he does will change the state to two ordinals that are comparable by ZFC. What about larger starting ordinals whose size comparison is independent of ZFC, like ω15 and ℶ15? If the new state after the first player’s move move also involves two ordinals whose size comparison is independent of ZFC, then the second player will also be unable to even them out. This continues until one of them eventually decreases a pile to an ordinal whose size is comparable by ZFC to the other pile. So the winner will depend on who knows more pairs of ordinals less than the starting values with values that ZFC can’t compare. In fact, each player wants to force the other player to make the values ZFC-comparable, so they’ll be able to even the piles out on their turn.

What if our players are allowed to use different proof systems from each other? Then adjudication of whether a move is valid requires that we fix some meta-theoretic proof system as our judge. For instance, suppose our meta-theory is ZFC + V=L (in which case ω1 does equal ℶ1). In this case, if a player is using a theory from which they can prove that ω1 < ℶ1, they might end up making a move that we judge as invalid, even though in their view it’s perfectly valid. Presumably then each player has to reason within their own theory about what is valid according to the judge’s meta-theory. But perhaps these judgements will be fallible! If so, then the victor may end up depending on who has a better theory of the judge’s meta-theory!