Hilbert-type Infinitary Logics

November 23, 2021 · November 24, 2021

I want to describe a hierarchy of infinitary logics, and show some properties of one of these logics in particular.

First, a speedy review of first-order logic. In the language of first-order logic we have access to parentheses {(, )}, the propositional connectives {∧, ∨, ¬, →}, the equals sign {=}, quantifiers {∀, ∃}, and a countably infinite store of variables to quantify over {x₁, x₂, x₃, …}. These are the logical symbols common to any first-order language, but to complete the language we additionally specify a set of constant symbols {c₁, c₂, c₃, …}, function symbols {f₁, f₂, f₃, …}, and relation symbols {R₁, R₂, R₃, …}. These sets can be any cardinality whatsoever. We can then define the set of grammatical sentences (“well-formed formulas”) of this language. We also interpret these symbols in a fairly straightforward way: a first-order structure has a universe of objects, and constants are assigned referents within that universe, function symbols are assigned n-ary functions on the universe, and relation symbols are assigned n-ary relations on the universe.

One consequence of the construction is that all of our sentences are finite, which puts an important cap on expressive power. Consider a language of arithmetic with constants for every natural number: {0, 1, 2, 3, …}. We might naturally want to say that the set of constants exhausts all the objects in our universe. But this takes an infinitely long sentence: ∀x (x=0 ∨ x=1 ∨ x=2 ∨ x=3 ∨ …). You might think you could be clever and find a finite expression of this idea, but it turns out that you can’t. (This is a consequence of Gödel’s first incompleteness theorem.)

So a natural extension of first-order logic is to allow infinitely long sentences. For any two cardinal numbers α and β, define L_α,β to be first-order logic, but with conjunctions of any length < α allowed and blocks of quantifiers of any length < β. (Note that allowing infinite conjunctions gives us infinite disjunctions as well, via negation.) For example, L_ω,ω is ordinary first-order logic: conjunctions and quantifier blocks of any finite length. L_ω1,ω is first-order logic plus countably infinite conjunctions, but only finite quantifier blocks. L_ω,ω1 has finite conjunctions but countably infinite quantifier blocks. Question: what logic is this equivalent to?

Notice that countably infinitely many quantifiers use countably infinitely many variables, but if you only have finite conjunctions you can only use finitely many of them. So this ends up being equivalent to L_ω,ω. For this same reason, if β > α then L_α,β is no different from L_α,α.

L_ω1,ω is especially interesting to logicians, because it’s significantly stronger than first-order logic, but not so much stronger as to lose all nice proof-theoretic properties. In particular, there’s a sound and complete proof system for L_ω1,ω! It’s also quite simple: just the FOL proof system plus one new axiom and one infinitary deduction rule:

New axiom
For any m ∈ ω and any set of sentences {φ_n | n ∈ ω},
∧_n∈ω φ_n → φ_m

New inference rule
φ₀, φ₁, φ₂, … ⊢ ∧_n∈ω φ_n

If we allow deductions of any countably infinite (successor) ordinal type, then we get a sound and complete proof system. This means that for any countable set of L_ω1,ω-sentences Σ and any L_ω1,ω-sentence φ, you can deduce φ from Σ just in case Σ actually (ω₁,ω)-logically implies φ.

Infinite conjunctions give you a massive boost in expressive power going from FOL to L_ω1,ω. You can categorically define the natural numbers: Take Peano arithmetic and add to it the axiom: ∀x ∨_n∈ω (x = n). See why this works? We’ll refer back to this theory as PA*.

So L_ω1,ω is powerful enough to be able to categorically define the natural numbers. We might wonder if we can also categorically define all other countable structures. It turns out that while we can’t do that, we can do something slightly weaker. For any countable structure, there’s a single L_ω1,ω-sentence that defines it up to isomorphism among countable structures. The complexity of this sentence is a way of measuring the complexity of the structure, and has close connections to other measures of structure complexity. (The key word to learn more about this is Scott rank.)

What else can you do with L_ω1,ω? Here’s something really cool: Tarski’s theorem on the undefinability of truth tells us that in first-order logic you cannot define a truth predicate. But in L_ω1,ω, you can! Take a countable first-order language L. Add on the language of arithmetic (0, S, +, ×, ≤) to L. L is still countable, so enumerate all its sentences {φ₀, φ₁, φ₂, …}. Now, the formula Tr(x) := ∨_n∈ω (x=n ∧ φ_n) is a truth predicate: for any n, if φ_n is true then Tr(n) is true and if φ_n is false then Tr(n) is false.

You can express the idea that “infinitely many things satisfy φ(x)” with the following sentence:

∧_n∈ω ∀x₀∀x₁…∀x_n ∃y (φ(y) ∧ ∧_k≤n (y ≠ x_k)).

Expanding this out:

∀x₀ ∃y (φ(y) ∧ y≠x₀)
∧
∀x₀∀x₁ ∃y (φ(y) ∧ y≠x₀ ∧ y≠x₁)
∧
∀x₀∀x₁∀x₂ ∃y (φ(y) ∧ y≠x₀ ∧ y≠x₁ ∧ y≠x₂)
∧
…

The first line says “there’s at least two things satisfying φ”, the second says “there’s at least three things satisfying φ”, and so on forever.
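The counting claim can be checked by brute force on small finite structures. The snippet below is only a sketch: the function name `conjunct_holds` and the encoding of φ as a Python predicate are my own choices for illustration, and it evaluates a single conjunct at a time (the full infinite conjunction fails in every finite structure, which is exactly the point).

```python
from itertools import product

def conjunct_holds(n, universe, phi):
    """Evaluate ∀x0…∀xn ∃y (φ(y) ∧ y≠x0 ∧ … ∧ y≠xn) in a finite structure.

    `universe` is a finite collection of objects and `phi` is a predicate on it.
    """
    return all(
        # for this choice of x0..xn, look for a φ-witness y different from all of them
        any(phi(y) and all(y != x for x in xs) for y in universe)
        for xs in product(universe, repeat=n + 1)
    )

universe = range(5)
is_small = lambda y: y < 3                      # exactly three things satisfy φ
print(conjunct_holds(0, universe, is_small))    # "at least two φ-things"
print(conjunct_holds(1, universe, is_small))    # "at least three φ-things"
print(conjunct_holds(2, universe, is_small))    # "at least four φ-things"
```

With exactly three φ-things, the first two conjuncts hold and the third fails, matching the reading of the n-th conjunct as “at least n+2 things satisfy φ”.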

Finally, unlike FOL, L_ω1,ω is not compact. This means that there are sets of sentences without models, where every finite subset has a model. But it’s actually even worse than that! L_ω1,ω is strongly non-compact. You can have an unsatisfiable set of sentences, every countable subset of which has a model! Once again, this pretty remarkable fact has a simple proof:

Take the language of Peano arithmetic and add on ℵ₁-many constant symbols: {c_α | α < ω₁}. Now add to PA* the set of sentences {c_α ≠ c_β | α ≠ β, both in ω₁}. Let’s call this theory Σ. Every countable subset of Σ has a model, but Σ itself doesn’t. A countable subset of Σ mentions only countably many constant symbols, and you can always assign countably many constants distinct referents in ℕ. However, PA* forces the universe to be (a copy of) ℕ, and you can’t assign uncountably many constants distinct referents in ℕ, by a simple cardinality argument.

There’s an interesting twist here: Consider the set of sentences {c_α ≠ c_β | α ≠ β, both < ω₁^CK} ⊂ Σ, where ω₁^CK is the Church-Kleene ordinal, the smallest uncomputable ordinal. ω₁^CK is countable, so this is a countable set of sentences. This set of sentences has a model, but to obtain it you must choose an injection from the set of constants {c_α | α < ω₁^CK} to the natural numbers. By the definition of ω₁^CK, this injection is uncomputable! There are uncountably many countable ordinals above ω₁^CK (there are countably many computable ordinals, and uncountably many countable ordinals), so while every countable subset of Σ has a model, uncountably many of these models will be uncomputable!

Forcing and the Independence of CH (Part 2)

October 2, 2021 · April 16, 2026

Part 1 here

Part 2: How big is 𝒫(ω)?

Now the pieces are all in place to start applying forcing to prove some big results. Everything that follows assumes the existence of a countable transitive model M of ZFC.

First, a few notes on terminology.

The language of ZFC is very minimalistic. All it has on top of the connectives of first-order logic is a single binary relation symbol ∈. Nonetheless, people will usually talk about theorems of ZFC as if it has a much more elaborate syntax, involving symbols like ∅ and ω and ℵ₁ and terms like “bijection” and “ordinal”. For instance, the Continuum Hypothesis can be written as “There’s a bijection between 𝒫(ω) and ℵ₁”, which is obviously using much more than just ∈. But these terms are all simply shorthand for complicated phrases in the primitive language of ZFC: for instance a sentence “φ(∅)” involving ∅ could be translated as “∃x (∀y ¬(y ∈ x) ∧ φ(x))”, or in other words “there’s some set x that is empty and φ(x) is true”.

Something interesting goes on when considering symbols like ∅ and ω and ℵ₁ that act like proper names. Each name picks out a unique set, but for some of these names, the set that gets picked out differs in different models of ZFC. For example, the interpretation of the name “ℵ₁” is model-relative; it’s meant to be the first uncountable cardinal, but there are countable models of ZFC in which it’s actually only countably large! If this makes your head spin, it did the same for Skolem: you can find more reading here.

On the other hand, the name “∅” always picks out the same set. In every model of ZFC, ∅ is the unique set that contains nothing at all. When a name’s interpretation isn’t model-relative, it’s called absolute. Examples include “∅” and “1,729”. When a name isn’t absolute, then we need to take care to distinguish between the name itself as a syntactic object, and the set which it refers to in a particular model of ZFC. So we’ll write “ℵ₁” when referring to the syntactic description of the first uncountable cardinal, and ℵ₁^M when referring to the actual set that is picked out by this description in the model M.

With that said, let’s talk about a few names that we’ll be using throughout this post:

In M, ω is the conventional name for the set that matches the description “the intersection of all inductive sets”, where inductive sets are those that contain the ordinal 0 and are closed under successor. In any transitive model of ZFC like M, ω is exactly ℕ, the set of natural numbers. Since we’re restricting ourselves to only considering transitive models, we can treat ω as if it’s unambiguous; its interpretation won’t actually vary across the models we’re interested in.

ω₁ corresponds to the description “the smallest uncountable ordinal”. This description happens to perfectly coincide with the description “the first uncountable cardinal” in ZFC, which has the name ℵ₁. But unlike with ω, different transitive models pick out different sets for ω₁ and ℵ₁. So in a model M, we’ll write ω₁^M for the set that M believes to be the smallest uncountable ordinal and ℵ₁^M for the first uncountable cardinal.

ω₂ is the name for the first ordinal for which there’s no injection into ω₁ (i.e. the first ordinal whose cardinality is larger than that of ω₁). Again, this is not absolute: ω₂^M depends on M (although again, ω₂^M is the same as ℵ₂^M no matter what M is). And in a countable model like those we’ll be working with, ω₂^M is countable. This will turn out to be very important!

𝒫(ω) is the name for the set that matches the description “the set of all subsets of ω”. Perhaps predictably at this point, this is not absolute either. So we’ll have to write 𝒫(ω)^M when referring to M’s version of 𝒫(ω).

Finally, ZFC can prove that there’s a bijection from 𝒫(ω) to ℝ, meaning that this bijection exists in every model. Thus anything we prove about the size of 𝒫(ω) can be carried over to a statement about the size of ℝ. Model-theoretically, we can say that in every model M, |𝒫(ω)^M| = |ℝ^M|. It will turn out to be more natural to prove things about 𝒫(ω) than ℝ.

Okay, we’re ready to make the continuum hypothesis false! We’ll do this by choosing a partially ordered set (P, ≤) whose extension as a Boolean algebra (B, ≤) has the property that no matter what M-generic filter G in B you choose, M[G] will interpret its existence as proof that |𝒫(ω)| ≥ ℵ₂.

Making |𝒫(ω)| ≥ ℵ₂

Let P = { f ∈ M | f is a finite partial function from ω₂^M × ω to {0,1} }. Let’s have some examples of elements of P. The simplest is the empty function ∅. More complicated is the function {((ω+1, 13), 1)}. This function’s domain has just one element, the ordered pair (ω+1, 13), and this element is mapped to 1. The function {((14, 2), 0), ((ω₁^M•2, 2), 1)} is defined on two elements: (14, 2) and (ω₁^M•2, 2). And so on.

We’ll order P by reverse inclusion: f ≤ g iff f ⊇ g. Intuitively, f ≤ g if f is a function extension of g: f is defined everywhere that g is and they agree in those places. For instance, say f = {((ω, 12), 0)}, g = {((ω, 12), 0), ((13, 5), 1)}, h = {((ω, 12), 1)}, and p = {((ω, 11), 1)}. Check your understanding by verifying that (1) g ≤ f, (2) f and h are incomparable and have no common lower bound, and (3) f and p are incomparable but do have a common lower bound (what is it?).
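If it helps to experiment with this order, here is a small sketch that models conditions as Python dicts. This is an illustration only: the ordinal labels are plain strings and ints standing in for elements of ω₂^M, and the helper names `leq`, `compatible` and `meet` are made up for this example. (It also gives away the answer to the exercise, so try that first!)

```python
# A condition is a finite partial function, modeled as a dict
# from (ordinal_label, n) pairs to 0 or 1.

def leq(f, g):
    """f <= g in the forcing order: f extends g (f ⊇ g as sets of pairs)."""
    return all(k in f and f[k] == v for k, v in g.items())

def compatible(f, g):
    """f and g have a common lower bound iff they agree wherever both are defined."""
    return all(g[k] == v for k, v in f.items() if k in g)

def meet(f, g):
    """The greatest common lower bound of two compatible conditions: their union."""
    assert compatible(f, g)
    return {**f, **g}

f = {("ω", 12): 0}
g = {("ω", 12): 0, (13, 5): 1}
h = {("ω", 12): 1}
p = {("ω", 11): 1}

print(leq(g, f))                       # (1) g extends f
print(compatible(f, h))                # (2) f and h clash at (ω, 12)
print(leq(f, p), leq(p, f))            # (3) f and p are incomparable...
print(meet(f, p))                      # ...but their union is a common lower bound
```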

The empty function ∅ is a subset of every function in P, meaning that ∅ is bigger than all functions. Thus ∅ is the top element of this partial order. And every f in P is finite, allowing a nice visualization of what the partial order (P, ≤) looks like: ∅ sits at the top, and below each function sit its finite extensions, branching downward without end.

Now, G will be an M-generic ultrafilter in the Boolean extension (B, ≤) of (P, ≤). G is the set that we’re ultimately adding onto M, so we want to know some of its properties. In particular, how will we use G to show that |𝒫(ω)| ≥ ℵ₂?

What we’re going to do is construct a new set out of G as follows: first take the intersection of P with G (remember that G is an ultrafilter in B, so it doesn’t only contain elements of P). P⋂G will be some set of finite partial functions from ω₂^M × ω to {0,1}. We’ll take the union of all these functions, and call the resulting set F. What we’ll prove of F is the following:

(1) F := ⋃(P⋂G) is a total function from ω₂^M × ω to {0,1}
(2) For every pair of distinct α, β ∈ ω₂^M, F(α, •) is a distinct function from F(β, •).

A word on the second clause: F takes as input two things: an element of ω₂^M and an element of ω. If we only give F its first input, then it becomes a function from ω to {0,1}. For α ∈ ω₂^M, we’ll give the name F_α to the function we get by feeding α to F. Our second clause says that all of these functions are pairwise distinct.

Now, the crucial insight is that each F_α corresponds to some subset of ω, namely {n ∈ ω | F_α(n) = 1}. So F defines |ω₂^M|-many distinct subsets of ω. F is built from G, so F and all of these subsets live in M[G] (they are new sets, not elements of M). So in M[G], it comes out as true that |𝒫(ω)| ≥ |ω₂^M|. If ω₂^M is still the second uncountable cardinal in M[G], then we get that |𝒫(ω)| ≥ ℵ₂ > ℵ₁ in M[G]. This is ¬CH!

Well… almost. There’s one final subtlety: for ¬CH to be true in M[G], it must be that |𝒫(ω)^M[G]| ≠ ℵ₁^M[G], and what we’ve actually shown is that |𝒫(ω)^M[G]| ≥ |ω₂^M|. So what if ℵ₁^M ≠ ℵ₁^M[G], or ℵ₂^M ≠ ℵ₂^M[G]? This would throw a wrench into our proof: adding G might also add a bijection between ω₂^M and ω₁^M (or even ω), in which case M[G] would believe that 𝒫(ω) is at least as big as M’s version of ℵ₂, but might not believe that it’s bigger than its own version of ℵ₁. This is the subject of cardinal collapse, which I will not be going into. However, for this choice of P, M[G] does in fact believe that ℵ₁^M = ℵ₁^M[G] and that ℵ₂^M = ℵ₂^M[G].

Alright, so now all we need to do to show that M[G] believes ¬CH is to prove (1) that F is a total function from ω₂^M × ω to {0,1}, and (2) that for every pair of distinct α, β ∈ ω₂^M, F_α is distinct from F_β. We do this in three steps:

F is a function.
F is total.
For every pair of distinct α, β ∈ ω₂^M, F_α is distinct from F_β.

Alright so how do we know that F is a function? Remember that F = ⋃(P⋂G). What if some of the partial functions in P⋂G are incompatible with each other? In this case, their union cannot be a function. So to prove step 1 we need to prove that all the functions in P⋂G are compatible. This follows easily from two facts: that G is an ultrafilter in B and P is dense in B. The argument: G is a filter, so for any f, g ∈ P⋂G the meet f∧g is also in G; it’s below both f and g, and it isn’t the bottom element 0, since no proper filter contains 0. All we know about f∧g is that it’s a nonzero element of B, but we have no guarantee that it’s also a member of P, our set of partial functions. In other words, we can’t say for sure that f∧g is actually a partial function from ω₂^M × ω to {0,1}. But now we use the fact that P is dense in B! By the definition of density, since f∧g is a nonzero element of B, P must contain some h ≤ f∧g. Now by transitivity, h ≤ f and h ≤ g. So f and g have a common function extension, meaning they must be compatible functions! Pretty magical right?

So F is a function. But how do we know that it’s total? We prove this by looking closely at the dense subsets of P. In particular, for any α ∈ ω₂^M and any n ∈ ω, define D_α,n := {f ∈ P | (α, n) ∈ dom(f)}. This is a dense subset of P. Why? Well, regardless of what α and n are, any function f in P is either already defined on (α, n), in which case we’re done, or it’s not, in which case f has a function extension with (α, n) in its domain. So from any element of P, you can follow the order downwards until you find a function with (α, n) in its domain. Since D_α,n is dense in P and G is M-generic, G must have some element in common with D_α,n. Thus G contains an element f of P that has (α, n) in its domain. This f is one of the functions that we union over to get F, so F must have (α, n) in its domain as well! And since α and n were totally arbitrary, F must be total.

Finally, why are the “F_α”s pairwise distinct? Again we construct dense subsets for our purposes: for any distinct α, β ∈ ω₂^M, define D_α,β := {f ∈ P | f_α ≠ f_β}, where f_α ≠ f_β means that there’s some n with both (α, n) and (β, n) in dom(f) and f(α, n) ≠ f(β, n). This is clearly dense (f is finite, so you can always extend any f in P at some fresh n to make f_α and f_β disagree there). So G contains an element f of P for which f_α ≠ f_β. And since F extends f, it must also be true of F that F_α ≠ F_β!
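Both density arguments are really just recipes for extending a finite function, and they can be sketched concretely. As an illustration only, conditions are modeled below as Python dicts from (ordinal_label, n) pairs to bits, with strings standing in for ordinals; the function names are hypothetical.

```python
def extend_into_D_point(f, alpha, n):
    """Return some g <= f (i.e. g ⊇ f) with (alpha, n) in dom(g)."""
    g = dict(f)
    g.setdefault((alpha, n), 0)   # keep f's value if it is already defined there
    return g

def extend_into_D_rows(f, alpha, beta):
    """Return some g <= f whose rows alpha and beta disagree at some column n."""
    assert alpha != beta
    used_columns = [n for (_, n) in f]        # f is finite, so only finitely many columns are used
    n = max(used_columns, default=-1) + 1     # a fresh column that f never mentions
    g = dict(f)
    g[(alpha, n)] = 0
    g[(beta, n)] = 1
    return g

f = {("a", 0): 1, ("b", 0): 1}
print(extend_into_D_point(f, "c", 7))
print(extend_into_D_rows(f, "a", "b"))
```

The second function always picks a column beyond everything f mentions, so the new values can never clash with f; this is the only place where finiteness of the conditions is used.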

And we’re done! We’ve shown that once we add G to M, we can construct a new set F = ⋃(P⋂G) such that F encodes |ω₂^M|-many distinct subsets of ω, and thus that M[G] ⊨ |𝒫(ω)| ≥ ℵ₂.

Making |𝒫(ω)| ≥ ℵ₄₂₀

What’s great is that this argument barely relied on the “2” in ω₂^M. We could just as easily have started with P = {f ∈ M | f is a finite partial function from ω₄₂₀^M × ω to {0,1}}, constructed an M-generic filter G in the Boolean extension of P, then defined F to be ⋃(P⋂G).

G is still an ultrafilter in B and P is still dense in B, so F is still a function.

D_α,n := {f ∈ P | (α, n) ∈ dom(f)} is still a dense subset of P for any α ∈ ω₄₂₀^M and any n ∈ ω, so F is still total.

And D_α,β := {f ∈ P | f_α ≠ f_β} is still a dense subset of P for any distinct α, β ∈ ω₄₂₀^M, so all the “F_α”s are pairwise distinct.

And now F encodes |ω₄₂₀^M|-many distinct subsets of ω! The final step, which I’m going to skip over again, is showing that M[G] believes that |ω₄₂₀^M| = |ω₄₂₀^M[G]|, i.e. that cardinal collapse doesn’t occur.

And there we have it: this choice of P gives us a new model M[G] of ZFC that believes that |𝒫(ω)| ≥ ℵ₄₂₀! Note how easy and quick this argument is now that we’ve gone through the argument for how to make M[G] believe |𝒫(ω)| ≥ ℵ₂. This is a great thing about forcing: once you really understand one application, other applications become immensely simpler and easier to understand.

Making |𝒫(ω)| = ℵ₁

Okay, we’ve made CH false. Now let’s make it true! This time our choice for (P, ≤) will be the following:

P = {f ∈ M | f is an M-countable partial function from ω₁^M to 𝒫(ω)^M}. Once more we order by reverse inclusion: f ≤ g iff f ⊇ g.

A crucial thing to notice is that I’ve merely required the functions in P to be M-countable, rather than countable. For a set to be M-countable is for there to exist an injection in M from that set to ω. Any M-countable set is countable, but some countable sets may not be M-countable (like M itself!). In particular, we know that ω₁^M is actually a countable set, meaning that if we required true countability instead of just M-countability, then P would include some total functions! But M has no injection from ω₁^M to ω, so no function in P can be total. This will be important in a moment!

We get our M-generic ultrafilter G and define F := ⋃(P⋂G). Now we want to show that F is a total and surjective function from ω₁^M to 𝒫(ω)^M. As before, we proceed in three steps:

F is a function
F is total
F is surjective

Step 1: G is an ultrafilter and P is dense in B, so the same argument works to show that all elements of P⋂G are compatible: (1) ultrafilter implies that any f, g in P⋂G have a nonzero greatest lower bound f∧g in G, (2) density of P in B implies that f∧g has a lower bound h in P, so (3) f and g have a common function extension. Thus F is a function.

Step 2: D_α := {f ∈ P | α ∈ dom(f)} is dense in P for any α ∈ ω₁^M, so F is total.

Step 3: D_A := {f ∈ P | A ∈ image(f)} is dense in P for any A ∈ 𝒫(ω)^M. Why? No f in P is total, so any f in P can be extended by adding one more point (α, A) where α ∉ dom(f). Thus A ∈ image(F) for any A ∈ 𝒫(ω)^M, so F is surjective.

So F is a total surjective function from ω₁^M to 𝒫(ω)^M, meaning that in M[G] it’s true that |𝒫(ω)^M| ≤ ℵ₁^M. And since ZFC proves that |𝒫(ω)| > ℵ₀, it follows that M[G] ⊨ |𝒫(ω)| = ℵ₁.

You might have noticed that each of these arguments could just as easily have been made if we had started out with defining P as the finite partial functions in M from ω₁^M to 𝒫(ω)^M. This is true! If we had done this, then we still would have ended up proving that F was a total surjective function from ω₁^M to 𝒫(ω)^M, and therefore that |𝒫(ω)^M| = ℵ₁^M. However, this is where cardinal collapse rears its ugly head: the final step is to prove that |𝒫(ω)^M| = |𝒫(ω)^M[G]| and that ℵ₁^M = ℵ₁^M[G]. This requires that P be the countable partial functions rather than just the finite ones.

And with that final caveat, we see that M[G] believes that |𝒫(ω)| = ℵ₁.

Making |𝒫(ω)| ≤ ℵ₃₁₄

Let P = {f ∈ M | f is an M-countable partial function from ω₃₁₄^M to 𝒫(ω)} and order by reverse inclusion: f ≤ g iff f ⊇ g.

⋃(P⋂G) is again a total and surjective function from ω₃₁₄^M to 𝒫(ω), following the exact same arguments as in the previous section. Cardinal collapse doesn’t occur here either, so M[G] believes that |𝒫(ω)| ≤ ℵ₃₁₄.

✯✯✯

We’ve proven the independence of the Continuum Hypothesis from ZFC! We’ve done more: we’ve also shown how to construct models of ZFC in which we have lots of control over the size of 𝒫(ω). Want a model of ZFC in which |𝒫(ω)| is exactly ℵ₁₇₂₉? Just force with the right P and you’re good to go! There’s something a bit unsatisfactory about all this though, which is that in each application of forcing we’ve had to skim over some details about cardinal collapse that were absolutely essential to the proof going through. I hope to go into these details in a future post to close these last remaining gaps.

Transfinite Nim: uncomputable games and games whose winner depends on the Continuum Hypothesis

August 15, 2020 · December 15, 2021

In the game of Nim, you start with piles of various (whole number) heights. Each step, a player chooses one pile and shrinks it by some non-zero amount. Once a pile’s height has been shrunk to zero, it can no longer be selected by a player for shrinking. The winner of the game is the one that takes the last pile to zero.

Here’s a sample game of Nim:

Starting state
3, 2
After Frank’s move
2, 2
After Marie’s move
2, 1
After Frank’s move
0, 1
After Marie’s move
0, 0

Marie takes the last pile to zero, so she is the winner. Frank’s second-to-last move was a big mistake; by reducing the first pile from 2 to 0, he left the only remaining pile free to be taken by Marie. In a game of Nim, you should never leave only one pile remaining at the end of your turn. If Frank had instead shrunk the first pile from 2 to 1, then the state of the piles would be (1, 1). Marie would be forced to shrink one of the two piles to zero, leaving Frank to take the final pile and win.

The strategy of Nim with two piles is extremely simple: in your turn you should always even out the two piles if possible. This is only possible if the heights are different at the start of your turn. See if you can figure out why this strategy guarantees a win!
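If you’d rather check the claim than prove it, a brute-force search over small positions confirms it. This is just a sketch: `mover_wins` is a name made up for this example, and it only explores the finite game with two piles.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def mover_wins(a, b):
    """True if the player about to move from piles (a, b) can force taking the last pile to zero."""
    # Every legal move shrinks exactly one pile by a non-zero amount.
    moves = [(x, b) for x in range(a)] + [(a, y) for y in range(b)]
    # At (0, 0) there are no moves: the previous player made the last move and won.
    # Otherwise the mover wins iff some move leaves the opponent in a losing position.
    return any(not mover_wins(*m) for m in moves)

# The player to move wins exactly when the two piles differ.
print(all(mover_wins(a, b) == (a != b) for a in range(8) for b in range(8)))
```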

Transfinite Nim is a version of Nim where the piles are allowed to take infinite ordinal values. So for instance, a game might have the following starting position:

Starting state
ω² + ω, ω₁ + ε₀

If Marie is moving first, then can she guarantee a win? What move should she make?

It turns out that the strategy for two-pile Transfinite Nim is exactly the same as for two-pile Finite Nim. Marie has a guaranteed win, because the two piles are different values. Each move she’ll just even the piles out. So for her first move, she should do the following:

Starting state
ω² + ω, ω₁ + ε₀
After Marie’s move
ω² + ω, ω² + ω

No matter what Frank does next, Marie can just “copy” that move on the other pile, guaranteeing that Marie always has a move as long as Frank does. This proves that Marie must have the last move, and therefore win.

One important feature of Transfinite Nim is that even though we’re dealing with infinitely large piles, every game can only last finitely long. In other words, Frank has no strategy for delaying his loss infinitely long, and thus forcing a sort of “stalemate by exhaustion.” This is because the ordinals are well-ordered, and any decreasing sequence of well-ordered items must terminate. (Why? Just consider the definition of a well-ordered set: every non-empty subset has a least element. If the game were to continue infinitely long, then at least one of the piles would be decreased infinitely many times, and the sequence of heights of that pile would be a set of ordinals with no least element!)
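Here is a small simulation of the mirroring strategy and of the finite-length claim, restricted to ordinals below ω². It is a sketch under some assumptions of mine: the ordinal ω·a + b is encoded as the tuple (a, b), whose built-in lexicographic comparison in Python matches the ordinal order; Frank plays randomly; and since there are infinitely many ordinals below ω·a, Frank’s choices are sampled from a finite menu controlled by the hypothetical parameter `cap`.

```python
import random

def random_smaller(x, rng, cap=5):
    """Pick some ordinal strictly below x = (a, b), i.e. below ω·a + b."""
    a, b = x
    options = [(a, b2) for b2 in range(b)]                         # same ω-part, smaller finite part
    options += [(a2, b2) for a2 in range(a) for b2 in range(cap)]  # smaller ω-part, capped finite part
    return rng.choice(options)

def play(start, rng):
    """Marie moves first and always evens the piles; Frank moves at random.

    Assumes the two starting piles differ. Returns (winner, number_of_moves).
    """
    piles = list(start)
    moves = 0
    marie_to_move = True
    while piles != [(0, 0), (0, 0)]:
        if marie_to_move:
            low = min(piles)                 # shrink the larger pile down to the smaller one
            piles = [low, low]
        else:
            i = rng.choice([i for i in (0, 1) if piles[i] != (0, 0)])
            piles[i] = random_smaller(piles[i], rng)
        moves += 1
        marie_to_move = not marie_to_move
    # Whoever just moved took the last pile to zero.
    winner = "Frank" if marie_to_move else "Marie"
    return winner, moves

# Start from ω·3 + 1 and ω·2 + 4.
print(play(((3, 1), (2, 4)), random.Random(0)))
```

Every run ends after finitely many moves with Marie as the winner, even though a pile like ω·3 + 1 has infinitely many ordinals below it.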

Although the strategy of Transfinite Nim is in one sense no more interesting than Finite Nim, the game does have some interesting features that it inherits from the ordinals. For instance, there are sets of ordinal numbers such that the ordering between them is uncomputable. For such sets, the ability to compute the winning strategy is called into question.

For instance, the set of all countable ordinals is uncomputable. The quick proof is that there are uncountably many countable ordinals – otherwise in ZFC the set of countable ordinals would itself be a countable ordinal and would thus contain itself – and any Turing machine can only compare countably many things. However, there are also uncomputable ordinals that are countable! If α is a countable ordinal, then we can find some bijection (not necessarily order-preserving) between α and ω, meaning that we can meaningfully ask if a Turing machine can compare any two of α’s elements (each represented by some natural number). And for an uncomputable countable ordinal, we know that no Turing machine can successfully compute its order type.

The smallest uncomputable ordinal (which, in ZFC, is exactly the set of all computable ordinals) is called the Church-Kleene ordinal and written ω₁^CK. Imagine the starting state of the game is two different ordinals that are both larger than ω₁^CK. If you’re moving first, then you have to determine which of the two ordinals is larger, in order to even them out. But this is not in general possible! So even if you go first and the two piles are different sizes, you might not be able to guarantee a win.

Suppose Marie is allowed uncomputable strategies, and Frank is only allowed computable strategies. Suppose further that the starting state involves two countable ordinals A and B, both larger than the Church-Kleene ordinal, and that the ordinals are expressed in some standard notation (so that you can’t write the same ordinal two different ways). There are a few cases.

Case 1: A = B, Marie goes first.
Marie decreases one of the two ordinals. Despite not being able to compute the order on the ordinals, Frank can just mimic her move. This will continue until Frank wins.

Case 2: A = B, Frank goes first.
Frank decreases one of the two ordinals, and Marie mimics. Marie eventually wins.

Case 3: A ≠ B, Marie goes first.
Marie can tell which of the ordinals is larger, and decreases that one to even out the two piles. Marie wins.

Case 4: A ≠ B, Frank goes first.
Frank can’t tell which of the ordinals is larger and can’t try to even them out, as doing so might result in an invalid move (trying to increase the smaller pile to the height of the larger one). So Frank does some random move, after which Marie is able to even out the two piles. Marie wins.

There’s a subtlety in Case 4, which is that Frank could gamble by guessing that B is the bigger ordinal and then decreasing it to A. If he has no other information, then half the time he’ll end up successfully evening them out, in which case he goes on to win the game. But the other half of the time he’ll have made an invalid move. If we assume that players cannot run strategies that have some chance of choosing invalid moves (for instance, if each player has to be able to prove that their move is valid in advance), then Frank’s gamble would not be allowed and he would go on to lose.

Finally, here’s a starting state for a game of Transfinite Nim:

ω₁, ℶ₁

ω₁ is the first uncountable ordinal, and ℶ₁ is the first ordinal with continuum cardinality. Frank goes first. Does he have a winning strategy?

The answer to this question depends on whether ω₁ = ℶ₁, or in other words the Continuum Hypothesis! If the two are equal, then Frank can’t win, because he’s starting with two even piles. And if ω₁ < ℶ₁, then Marie can’t win, because Frank can decrease the ℶ₁ pile to ω₁.

If we suppose that the players must be able to prove a move’s validity in ZFC before playing that move, then the first player couldn’t decrease the ℶ₁ pile to ω₁. The first player still has to do something, and whatever he does will change the state to two ordinals that are comparable by ZFC. What about larger starting ordinals whose size comparison is independent of ZFC, like ω₁₅ and ℶ₁₅? If the new state after the first player’s move also involves two ordinals whose size comparison is independent of ZFC, then the second player will also be unable to even them out. This continues until one of them eventually decreases a pile to an ordinal whose size is comparable by ZFC to the other pile. So the winner will depend on who knows more pairs of ordinals less than the starting values with values that ZFC can’t compare. In fact, each player wants to force the other player to make the values ZFC-comparable, so they’ll be able to even the piles out on their turn.

What if our players are allowed to use different proof systems from each other? Then adjudication of whether a move is valid requires that we fix some meta-theoretic proof system as our judge. For instance, suppose our meta-theory is ZFC + V=L (in which case ω₁ does equal ℶ₁). In this case, if a player is using a theory from which they can prove that ω₁ < ℶ₁, they might end up making a move that we judge as invalid, even though in their view it’s perfectly valid. Presumably then each player has to reason within their own theory about what is valid according to the judge’s meta-theory. But perhaps these judgements will be fallible! If so, then the victor may end up depending on who has a better theory of the judge’s meta-theory!

What ordinals can be embedded in ℚ and ℝ?

May 24, 2020 · April 16, 2026

Last time we talked a little bit about some properties of the order type of ℚ. I want to go into more detail about these properties, and actually prove them to you. The proofs are nice and succinct, and ultimately rest heavily on the density of ℚ.

Every Countable Ordinal Can Be Embedded Into ℚ

Take any countable well-ordered set (X, ≺). Its order type corresponds to some countable ordinal. Since X is countable, we can enumerate all of its elements (the order in which we enumerate the elements might not line up with the well-order ≺). Let’s give this enumeration a name: (x₁, x₂, x₃, …).

Now we’ll inductively define an order-preserving injection from X into ℚ. We’ll call this function f. First, let f(x₁) be any rational number. Now, assume that we’ve already defined f(x₁) through f(x_n-1) in such a way as to preserve the original order ≺. All we need to do to complete the proof is to assign to f(x_n) a rational number such that ≺ is still preserved.

Here’s how to do that. Split up the elements of X that we’ve already constructed maps for as follows: A = {x_i | i < n and x_i ≺ x_n} and B = {x_i | i < n and x_n ≺ x_i}. In other words, A is the subset of {x₁, x₂, …, x_n-1} consisting of elements less than x_n and B is the subset consisting of elements greater than x_n. Every element of B is strictly larger than every element of A, and since f preserves order so far, every rational in f(B) is strictly larger than every rational in f(A). Both sets are finite, so we can use the density of the rationals to find some rational number q in between f(A) and f(B)! (If A or B is empty, just pick q above everything in f(A) or below everything in f(B); ℚ has no largest or smallest element.) We define f(x_n) to be this rational q. This way of defining f(x_n) preserves the original order, because by construction, f(x_n) < f(x_i) for any i less than n exactly in the case that x_n ≺ x_i.

By induction, then, we’ve guaranteed that f maps X to ℚ in such a way as to preserve the original order! And all we assumed about X was that it was countable and well-ordered. This means that any countable and well-ordered set can be found within ℚ!
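The inductive construction can be run on any finite prefix of an enumeration. The sketch below is illustrative only: `embed_into_Q` and its arguments are names I’ve made up, midpoints are just one convenient way of picking “some rational in between”, and the example order is ω·2 encoded as pairs (copy, n) compared lexicographically.

```python
from fractions import Fraction

def embed_into_Q(enumeration, precedes):
    """Assign rationals to the enumerated elements one at a time, preserving `precedes`."""
    f = {}
    for x in enumeration:
        below = [f[y] for y in f if precedes(y, x)]   # images of A: already-placed elements before x
        above = [f[y] for y in f if precedes(x, y)]   # images of B: already-placed elements after x
        if below and above:
            q = (max(below) + min(above)) / 2         # density: a rational strictly in between
        elif below:
            q = max(below) + 1                        # Q has no greatest element
        elif above:
            q = min(above) - 1                        # Q has no least element
        else:
            q = Fraction(0)                           # first element: any rational works
        f[x] = q
    return f

# The first few elements of ω·2, enumerated by alternating between the two copies of ω.
enumeration = [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
f = embed_into_Q(enumeration, lambda a, b: a < b)
for x in sorted(f):
    print(x, f[x])
```

Note that the enumeration order deliberately disagrees with the well-order, so the later elements of the first copy of ω have to be squeezed in between rationals that were already assigned.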

No Uncountable Ordinals Can Be Embedded Into ℝ

In a well-ordered set X, every non-maximal element of X has an immediate successor (i.e. a least element that’s greater than it). Proof: Take any non-maximal x ∈ X. Consider the subset of X consisting of all elements greater than x: {y ∈ X | x < y}. This set is not empty because x is not maximal. Any non-empty subset of a well-ordered set has a least element, so this subset has a least element. I.e., there’s a least element greater than x. Call this element S(x), for “the successor of x”.

Now, take any well-ordered subset X ⊆ ℝ (with the usual order). Since it’s well-ordered, every non-maximal element has an immediate successor (by the previous paragraph). We will construct an injection from X into ℚ, using the fact that ℚ is dense in ℝ (i.e. that there’s a rational between any two reals). Call this function f. For each non-maximal element x ∈ X, f(x) will be any rational such that x < f(x) < S(x). This maps every non-maximal element of X to a rational number. To complete this, if X has a maximal element, just map it to any rational larger than it. Distinct elements get distinct rationals: if x < y then S(x) ≤ y, so f(x) < S(x) ≤ y < f(y). There we go, we’ve constructed an injection from X into ℚ!

The implication of this is that every well-ordered subset of the reals is only countably large. In other words, even though ℝ is uncountably large, we can’t embed uncountable ordinals inside it! The set of ordinals we can embed within ℝ is exactly the set of ordinals we can embed within ℚ! (This set of ordinals is exactly ω₁: the set of all countable ordinals).

Final Note

Notice that the previous proof relied on the fact that between any two reals you can find a rational. So this same proof would NOT go through for the hyperreals! There’s no rational number (or real number, for that matter!) in between 1 and 1+ϵ when ϵ is a positive infinitesimal. And in fact, you CAN embed ω₁ into the hyperreals! This is especially interesting because the hyperreals have the same cardinality as the reals! So the embeddability of ω₁ here is really a consequence of the order type of the hyperreals being much larger than the reals. And if we want to take a step towards even crazier extensions of ℝ, EVERY SINGLE ordinal can be embedded within the surreal numbers!