The Mindblowing Goodstein Sequences

Goodstein sequences are amazingly strange and unintuitive. I want to tell you all their unusual properties up-front, but I also don’t want to spoil the surprise. So let me start by defining the Goodstein sequence of a number.

We start by defining the hereditary base-n representation of a number m. We first write m as a sum of powers of n (e.g. for m = 266 and n = 2, we write 266 = 2^8 + 2^3 + 2^1). Now we write each exponent as a sum of powers of n. (So now we have 266 = 2^(2^3) + 2^(2+1) + 2^1.) And we continue this until all numbers in the representation are ≤ n. For 266, our representation stabilizes at 266 = 2^(2^(2+1)) + 2^(2+1) + 2^1.

Now we define G_n(m) as follows: if m = 0, then G_n(m) = 0. Otherwise, G_n(m) is the number produced by replacing every n in the hereditary base-n representation of m by n+1 and subtracting 1. So G₂(266) = G₂(2^(2^(2+1)) + 2^(2+1) + 2^1) = 3^(3^(3+1)) + 3^(3+1) + 3^1 − 1 ≈ 4.4 × 10^38.

Finally, we define the Goodstein sequence for a number n as the following:

a₀ = n
a₁ = G₂(a₀)
a₂ = G₃(a₁)
a₃ = G₄(a₂)
And so on.

Let’s take a small starting number and look at its Goodstein sequence. Suppose we start with 4.

a₀ = 4 = 2^2
a₁ = 3^3 − 1 = 2⋅3^2 + 2⋅3 + 2 = 26
a₂ = 2⋅4^2 + 2⋅4 + 1 = 41
a₃ = 2⋅5^2 + 2⋅5 = 60
a₄ = 2⋅6^2 + 2⋅6 − 1 = 2⋅6^2 + 6 + 5 = 83
a₅ = 2⋅7^2 + 7 + 4 = 109
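The step function G_n can be sketched in a few lines of Python (a minimal sketch of the definition above; the names `hereditary`, `evaluate`, and `goodstein_step` are my own, not from any library):

```python
def hereditary(m, b):
    """Hereditary base-b representation of m: a list of (exponent, coefficient)
    pairs, where each exponent is itself a hereditary representation."""
    digits, e = [], 0
    while m > 0:
        m, r = divmod(m, b)
        if r:
            digits.append((hereditary(e, b), r))
        e += 1
    return digits

def evaluate(rep, b):
    """Evaluate a hereditary representation using base b."""
    return sum(c * b ** evaluate(exp, b) for exp, c in rep)

def goodstein_step(m, b):
    """G_b(m): rewrite m's hereditary base-b form in base b+1, then subtract 1."""
    if m == 0:
        return 0
    return evaluate(hereditary(m, b), b + 1) - 1

# G_2(266) = 3^(3^(3+1)) + 3^(3+1) + 3 - 1
print(goodstein_step(266, 2) == 3**81 + 3**4 + 3 - 1)   # True

# First six terms of the Goodstein sequence of 4
seq = [4]
for i in range(5):
    seq.append(goodstein_step(seq[-1], i + 2))
print(seq)   # [4, 26, 41, 60, 83, 109]
```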

Now, notice that the sequence always seems to increase. And this makes sense! After all, at each step we increase the base throughout the representation by 1 (a big increase in the value) and only subtract 1. Here’s a plot of the first 100 values of the Goodstein sequence of 4:

And for further verification, here are the first million values of the Goodstein sequence of 4:

Looks like exponential growth to me! Which, again, is exactly what you’d expect. In addition, the higher the starting value, the more impressive the exponential growth we see. Here we have the first 100 values of the Goodstein sequence of 6:

Notice how after only 100 steps, we’re already at a value of 5 × 10^10! This type of growth only gets more preposterous for larger starting values. Let’s look at the Goodstein sequence starting with 19.

a₀ = 19
a₁ ≈ 7 × 10^12
a₂ ≈ 1.3 × 10^154
a₃ ≈ 1.8 × 10^2184
And so on.

We can’t even plot a sequence like this, as it’ll just look like two perpendicular lines:

Alright, so here’s a conjecture that hopefully you’re convinced of: Goodstein sequences are always growing. In fact, let’s conjecture something even weaker: Goodstein sequences never terminate (i.e. never go to zero).

We’re finally ready for our first big reveal! Both of these conjectures are wrong! Not only do some Goodstein sequences terminate, it turns out that EVERY Goodstein sequence eventually terminates!

This is pretty mind-blowing. What about all those graphs I showed? Well, the sequences do increase for a very very long time, but they eventually turn around. How long is eventually? The Goodstein sequence of 4 takes 3 × 2^402,653,210 − 1 steps to turn around!! This number is too large for me to be able to actually show you a plot of the sequence turning around. But we do know that once the Goodstein sequence stops increasing, it actually stays fixed for 3 × 2^402,653,209 steps before finally beginning its descent to zero. Heuristically, it would look something like this:

(Why does it stay constant, incidentally? The reason is that when we get to a number whose hereditary base-B notation just looks like B + n for some n < B, we increase B by 1 and decrease n by 1 each step. This doesn’t change the sequence’s value! So it stays constant until we get to n = 0. At that point, we decrease by 1 until we descend all the way to zero.)
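Though we can’t watch the sequence of 4 terminate, the Goodstein sequence of 3 is small enough to run to completion, and it even exhibits a (very short) constant phase. Here’s a sketch in Python (`bump` and `goodstein` are my own hypothetical names; don’t call `goodstein(n)` for n ≥ 4, since as discussed above it would run for an astronomically long time):

```python
def bump(m, b):
    """Reinterpret m's hereditary base-b representation in base b+1."""
    if m < b:
        return m
    e = 0
    while b ** (e + 1) <= m:   # largest exponent e with b^e <= m
        e += 1
    c = m // b ** e            # leading coefficient (c < b)
    # the exponent e is itself written hereditarily, so bump it recursively too
    return c * (b + 1) ** bump(e, b) + bump(m - c * b ** e, b)

def goodstein(n):
    """The full Goodstein sequence of n, run until it hits 0."""
    seq, b = [n], 2
    while seq[-1] > 0:
        seq.append(bump(seq[-1], b) - 1)
        b += 1
    return seq

print(goodstein(3))   # [3, 3, 3, 2, 1, 0]
```

Note the constant stretch 3, 3, 3: at bases 2 and 3 the value has the form B + n described above, so bumping the base and subtracting 1 cancel out.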

So, how do we prove this? We do it using infinite ordinals. (Now you see why this topic caught my eye recently!) The plan is to associate with each Goodstein sequence a “matching sequence” of ordinals. The way to do this is pretty simple: for a number written in hereditary base-n notation, just replace every n in that representation with ω. Now you have an infinite ordinal!

So, for example, here are the first few numbers in the Goodstein sequence of 4, along with their translations to ordinals:

4 = 2^2 becomes ω^ω
26 = 3^3 − 1 = 2⋅3^2 + 2⋅3 + 2 becomes ω^2⋅2 + ω⋅2 + 2
41 = 2⋅4^2 + 2⋅4 + 1 becomes ω^2⋅2 + ω⋅2 + 1
60 = 2⋅5^2 + 2⋅5 becomes ω^2⋅2 + ω⋅2
83 = 2⋅6^2 + 2⋅6 − 1 = 2⋅6^2 + 6 + 5 becomes ω^2⋅2 + ω + 5
109 = 2⋅7^2 + 7 + 4 becomes ω^2⋅2 + ω + 4
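This translation is mechanical enough to automate. Here’s a sketch (my own code, with the ordinals rendered as strings; `hereditary` and `omega` are hypothetical names):

```python
def hereditary(m, b):
    """Hereditary base-b digits of m as (exponent-rep, coefficient) pairs,
    listed from smallest exponent to largest."""
    digits, e = [], 0
    while m > 0:
        m, r = divmod(m, b)
        if r:
            digits.append((hereditary(e, b), r))
        e += 1
    return digits

def omega(rep):
    """Replace the base by ω everywhere in a hereditary representation,
    returning the resulting ordinal as a string."""
    if not rep:
        return "0"
    terms = []
    for exp, c in reversed(rep):              # biggest exponent first
        e = omega(exp)
        if e == "0":
            terms.append(str(c))              # the ω^0 term is just c
        else:
            base = "ω" if e == "1" else f"ω^({e})"
            terms.append(base if c == 1 else f"{base}·{c}")
    return " + ".join(terms)

print(omega(hereditary(4, 2)))    # ω^(ω)
print(omega(hereditary(26, 3)))   # ω^(2)·2 + ω·2 + 2
print(omega(hereditary(83, 6)))   # ω^(2)·2 + ω + 5
```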

Examine this sequence of ordinals for a couple of minutes. You might notice something interesting: at each step, the ordinal is decreasing! This turns out to always hold true. Let’s see why.

Notice that there are two cases: either the ordinal is a successor ordinal (like ω^2 + 4), or it’s a limit ordinal (like ω^2 + ω). If it’s a successor ordinal, then the next ordinal in the sequence is just its predecessor. (So ω^2 + 4 becomes ω^2 + 3.) This is obviously a smaller ordinal.

What if it’s a limit ordinal? Then the smallest limit ordinal in its expanded representation drops to a smaller ordinal and some finite number is added. So for instance, in the ordinal ω^2 + ω⋅2, the trailing ω⋅2 will drop to ω and some finite number will be added on, depending on which step in the sequence you’re at. So we’ll end up with ω^2 + ω + n, for some finite n. This ordinal is smaller than the original one, because we’ve made an infinitely large jump downwards (from one limit ordinal to a lower one) and only increased by a finite amount.

So in either case, we go to a smaller ordinal. And now the key realization is that every decreasing sequence of ordinals MUST terminate! (This is one of my favorite facts about ordinals, by the way. Even though we’re looking at arbitrarily large infinite ordinals, it’s still the case that there is no decreasing sequence that goes on forever. Whenever we hit a limit ordinal, we have to make an infinitely large jump downwards to get to the next element in the sequence, as the ordinal has no predecessor.)

So we’ve paired every Goodstein sequence with a decreasing sequence of ordinals. And every decreasing sequence of ordinals eventually terminates. So the Goodstein sequence must terminate as well! (One easy way to see this is to notice that every value in a Goodstein sequence is ≤ the ordinal assigned to it. Another way is to notice that the only number that could possibly be assigned to the ordinal 0 is 0 itself. So when the decreasing sequence of ordinals hits 0, as it eventually must, so must the Goodstein sequence.) And that’s our proof!

That Goodstein sequences always terminate is the big flashy result of this post. But there’s much more that’s cool about them. For instance, the proof we just used did not rely on the fact that we increased the base by exactly 1 at each step. In fact, even if we increase the base by an arbitrarily large amount at each stage, the sequences must still terminate!

Even cooler, the proof that all Goodstein sequences terminate cannot be formalized in Peano arithmetic! And in fact, no proof of this theorem can be found using PA. Laurie Kirby and Jeff Paris proved in 1982 that the statement that Goodstein sequences always terminate could be reduced to a theorem of Gentzen from which the consistency of PA can be deduced. So if PA could prove that Goodstein sequences always terminate, then it could prove its own consistency! Gödel’s second incompleteness theorem tells us that this cannot be, and thus, assuming that PA really is consistent, it cannot prove that these sequences all terminate.

This was historically the third result to be found independent of PA, and it was the first that was purely number-theoretic in character (as opposed to meta-mathematical, like the Gödel sentences). It gives us a very salient way of expressing the limitations of Peano arithmetic: simply write a program that computes the Goodstein sequence of any given input, terminating only when it reaches 0 and outputting the length of the sequence. (Code Golf Stack Exchange shows that this can be done using just 77 characters of Haskell! And here’s a nice way to do it in Python). Now, it happens to be true that this program will always terminate for any natural number input (if run on sufficiently powerful hardware). But Peano arithmetic cannot prove this!

As a final fun surprise, let’s define G(n) to be the length of the Goodstein sequence starting at n. As we’ve already seen, Goodstein sequences get really big really quickly, so it should be no surprise that they also run for a very long time before terminating. But exactly how quickly does G(n) grow? Turns out that it grows much, much faster than other commonly known fast-growing computable functions. For instance, G(n) grows much faster than Ackermann’s function, and G(12) is already greater than Graham’s number!

Wacky Non-Standard Models of PA

Peano arithmetic is unable to pin down the natural numbers, despite its countable infinity of axioms. In fact, assuming its consistency, Peano arithmetic has models of every infinite cardinality, meaning that as far as PA is aware, there might be uncountably many natural numbers. (If PA is not consistent, then it has no models.) I want to take a look at these non-standard models of PA, especially the countable ones. A natural question is: how many countable non-standard models are there alongside the standard model?

Assuming the consistency of PA (which I will leave out from now on, as it’s assumed in all that follows), there are continuum many non-isomorphic countable models. That’s a lot! That means that there’s a distinct countable non-standard model of arithmetic for every real number. This is our first result: the number of non-standard models of PA of cardinality ℵ₀ is 2^ℵ₀. Interestingly, this result generalizes! For every infinite cardinal κ, there are 2^κ non-isomorphic non-standard models of cardinality κ. That’s a lot of non-standard models! In fact, since any model of cardinality κ involves a specification of some number of constants, binary relations over κ, and functions from κ to κ, we know that the maximum number of models of cardinality κ is 2^κ. So in this sense, Peano arithmetic has as many non-standard models of each cardinality as it can have!

Let’s take a closer look at the countable non-standard models. It turns out that we can say a lot about their order type. Namely, all countable non-standard models of PA have order type ω + (ω* + ω)·η. What does this notation mean? Let me break down each of the order types involved in that formula:

  • ω: order type of the naturals
  • ω*: order type of the negative integers
  • η: order type of the rationals

So ω*+ω is the order type of the integer line, and (ω*+ω)·η is the order type of a structure that resembles an integer line for each rational number. Thus, every countable non-standard model of PA looks like a copy of natural numbers followed by as many copies of integers as there are rational numbers. The order on this structure is lexicographic: two nonstandards on the same integer line are judged according to their position on the integer line, and two nonstandards on different integer lines are judged according to the position of these integer lines on the rational line. It’s not the easiest thing to visualize, but here’s my attempt:

Visualization of the order type of countable nonstandard models of Peano arithmetic

So if all countable non-standard models have the same order type, then where do they differ? It turns out that they differ in the details of how addition and multiplication work on the non-standards. (After all, a model of PA is determined by its universe and its interpretations of ≤, +, and ×. Same order type means same ≤, so what’s left to vary is + and ×.) In each of the models, + and × work exactly like normal on the naturals. And + and × must operate on the non-standards in such a way as to maintain the truth of all the axioms of PA. So, for instance, since PA can prove that 2x = x + x, the same must be true for non-standard x. And so on. But even given these restrictions, there are still uncountably many ways to define + and × on the non-standards.

What’s more, we have a theorem known as the Tennenbaum Theorem, which tells us that it’s impossible to give recursive definitions to + and × in non-standard models. Said more simply, addition and multiplication are uncomputable in every non-standard model of arithmetic!

One thing I remain unsure of is how the nonstandard models of Peano arithmetic compare to the structure of ω in nonstandard models of ZFC. We know that there must be models of ZFC where ω is nonstandard, by the compactness theorem (define ZFC* to be ZFC with an extra constant c, with the extra axioms “c ∈ ω”, “c ≠ 0”, “c ≠ 1”, “c ≠ 2”, and so on. ZFC* has a model by compactness, and this model is also a model of ZFC by monotonicity.) But is ZFC “better” at ruling out nonstandard models of the naturals than PA is? Or is there a nonstandard ω for every nonstandard model of Peano arithmetic?

The Continuum Hypothesis is False

Before I read this paper, I thought that the continuum hypothesis probably did not have a truth value, and that ZFC + CH and ZFC + ¬CH are systems that equally well capture what it means to be a set. Now I think that the continuum hypothesis is actually false, and that ZFC + CH would not be an adequate representation of our intuitive notion of sets.

The argument in the paper is not a formal proof of ¬CH. Instead, it’s a refutation of the following form: “The continuum hypothesis is equivalent in ZFC to this statement which is obviously false. So the continuum hypothesis must be false.”

So what is this obviously false statement? It is the claim that an event that happens with 100% probability is impossible. More exactly: for any function that assigns to each real number a countable set of real numbers, the claim is that it’s impossible for there to exist two real numbers x and y such that x is not in the countable set assigned to y and y is not in the countable set assigned to x.

Why is this obviously false? Well, imagine throwing darts at the real line, so as to randomly select a real each time. If we throw one dart, there is a probability of zero that we would hit a natural number. Why? Because the cardinality of the naturals is less than the cardinality of the reals. By the same token, there is a probability of zero that our dart hits a rational number (as the rationals are also only countably large).

Now, if we throw a second dart, the probability that we hit a real number that’s some rational distance away from the first real number is also zero. Again, this is just because we’re considering a countable set of candidates from an uncountable set of possibilities. So by the same token, if we assign to the first dart ANY countable set of real numbers, then there is probability zero that the second dart lands within that countable set. And if we also assign a countable set of real numbers to the second dart, there is also probability zero that the first dart happened to fall within the countable set assigned to the second.

So, to summarize in more general terms: for any function f that maps each real number to a countable set of real numbers, a random choice of x and y will give us that x ∉ f(y) and y ∉ f(x) with 100% probability. Now, the statement that is equivalent to the falsity of the continuum hypothesis (i.e. to the existence of cardinalities between |ω| and |𝒫(ω)|) is that it’s possible to find two real numbers x and y that are not in each other’s countable sets.

Notice how weak this is! By randomly choosing real numbers, we will end up with two that are not in each other’s countable sets with probability 100%. But we’re not even saying that the probability-100% event will always happen! We’re just saying that it’s POSSIBLE that it happens! I.e. imagine that by some miraculous coincidence the first two real numbers we choose ARE within each other’s countable sets (technically, only one of them needs to be in the other’s countable set). After all, probability-zero events do happen. Not a problem! Just pick two new real numbers! And if this fails, pick again!

Pretty cool, right? Well, it gets even cooler. We just saw that from the statement “for any function f that maps reals to countable sets of reals, there exist x and y such that x ∉ f(y) and y ∉ f(x)”, you can prove that |𝒫(ω)| > |ω₁| (where the cardinality of ω₁ is by definition the next cardinality after that of ω). We can now consider THREE reals x, y, and z. Consider mappings from pairs of real numbers to countable sets of reals. We can now state that for any such mapping, there exist three reals such that none of them is in the countable set assigned to the other two. And this entails that we can prove that |𝒫(ω)| > |ω₂|! In other words, we can prove that there are at least TWO cardinalities in between the reals and the naturals!

And for ANY n, if we take the statement that there are n reals such that none is inside the countable set mapped to by the others, then we can use this to prove that |𝒫(ω)| > |ω_(n−1)|. Notice: whatever n we choose, there’s still probability 100% that the n reals don’t “overlap” in this way! So for any n, we’re again only claiming that it’s possible for an event with 100% probability to happen. And if we accept this, then this entails that there’s not just a single cardinality between |ω| and |𝒫(ω)|. There’s definitely at least two. And at least three. And so on for ANY finite value of n. Which means that there are infinitely many cardinalities between the naturals and the reals!

Ok, this is a good point to stop reading if you don’t want to know the technical details of how EXACTLY to prove the result and are satisfied to just know what the result says. But if not, then read on! (The proof is actually really short and simple, though it relies on some background knowledge about ordinals.)

I’ll prove the result for n = 2. Suppose that |𝒫(ω)| = ℵ₁ = |ω₁|. Then by the axiom of choice, there’s some well-ordering ≼ of the reals of order type ω₁. We define f(x) to be {y | y ≼ x}. Notice that since ω₁ is the FIRST uncountable ordinal, every real has only countably many ≼-predecessors, so f(x) is always a countable set, no matter what x is! Then, since for any two reals, x ≼ y or y ≼ x, this entails that for any two reals, x ∈ f(y) or y ∈ f(x).

And we’re done! We’ve shown that if the continuum hypothesis is true, then there exists a function f such that for any two reals, x ∈ f(y) or y ∈ f(x). So if we think that there is no such function, then we must deny the continuum hypothesis.

Kolmogorov Complexity, Undecidability, and Incompleteness

Fix some programming language. For any string x, we define K(x) (the Kolmogorov complexity of x) to be the bit length of the shortest program that outputs x and halts, when fed the empty string as input. I’m going to prove some neat results about Kolmogorov complexity.

K(x) ≤ |x|

Our first result: for every x, K(x) ≤ |x|. This is because in the worst case, you can always just have your program literally write out the string. (Technically, there might be a constant overhead, like writing “print()” in Python, but we will subtract out this constant overhead from our definition of K(x) so that the above inequality holds.)

Most binary strings have high Kolmogorov complexity

In particular, at most 1/2^k of strings have K(x) < |x| − k:

  • At most half of all binary strings have K(x) < |x| − 1
  • At most a quarter have K(x) < |x| − 2
  • At most one eighth have K(x) < |x| − 3

Here’s the proof. Define A to be the set of all binary strings of length n with K(x) < n − k. Define B to be the set of all binary strings of length n. So |A|/|B| is the proportion of all length-n strings with K(x) < n − k.

Now, |B| = 2^n. And there’s an injection from A to programs of binary length < n − k (map each string to a shortest program producing it). So |A| ≤ the number of programs of binary length < n − k. But this is at most 2^(n−k).

So |A|/|B| < 2^(n−k)/2^n = 1/2^k.

This means that if we’re looking at strings of length 100, the proportion of them that have K(x) < 90 is at most 1/2^10 = 1/1024 ≈ 0.1%. Said another way, more than 99.9% of length-100 strings have Kolmogorov complexity 90 or up (i.e. can be compressed by at most 10 bits).
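We can see this incompressibility in practice. The sketch below uses zlib’s compressed size as a stand-in for description length — only a loose proxy for K, not the real thing, and the experiment is my own illustration:

```python
import os
import zlib

# Draw random 128-byte strings; by the counting argument, almost none of them
# should have a description shorter than themselves.  zlib's output size is an
# upper bound on one particular (machine-dependent) description length.
samples = [os.urandom(128) for _ in range(200)]
shrunk = sum(1 for s in samples if len(zlib.compress(s, 9)) < len(s))
print(shrunk)   # 0: random data essentially never compresses
```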

K(x) is uncomputable

There is no program that takes in a pair (x, k) and halts, outputting yes if K(x) = k and no if K(x) ≠ k. Let’s prove it.

Suppose that P is such a program. We use P to create a program Q that takes as input a number k and generates the first binary string y such that K(y) = k. Here’s how Q works: enumerate all binary strings in lexicographic order (0, 1, 00, 01, 10, 11, 000, and so on). For each string x, apply P to check if K(x) = k. Output the first one for which the answer is yes.

Let’s give the name y to the output of Q on input k. Now, K(y) = k, because P says so. So the shortest program that outputs y has bit-length k. But notice that Q, with the input k written into it, is also a program that outputs y! So this program’s bit-length has to be at least k. But what is its bit-length? Well, it contains P as a subroutine, and it requires writing out k. So |Q| = |P| + |k| + c (where c is some constant overhead). But |k| is just the number of bits required to write out the number k, which is about log₂(k).

So, K(y) = k ≤ |Q| ≤ log₂(k) + |P| + c. And this inequality must hold no matter WHAT k we choose. So it must be that for all k, k ≤ log₂(k) + some constant. But this can’t be true! After all, k − log₂(k) goes to infinity as k goes to infinity! Contradiction.

The uncomputability of Kolmogorov complexity allows us to easily prove some of the most important and fun results in theoretical computer science. First of all:

The halting problem is undecidable

In other words, there is no program that takes as input a program P and outputs whether P will halt when run on the empty string.

Proof: Suppose that H is such a program. We can design a program using H to compute Kolmogorov complexity. Here’s how we do it: On input (x, k), enumerate all programs in lexicographic order. On each program Q, apply H to check if Q halts on empty string as input. If not, move on to the next program. If so, run Q to see if it outputs x.

This finds the shortest program y that outputs x. Now just look at the bit-length of y! If y has length k, then output yes, and otherwise output no. So if we can solve the halting problem, then we can also compute Kolmogorov complexity. But we just showed that you can’t do this! Contradiction.

The Busy Beaver numbers are uncomputable

Define BB(N) to be the maximum number of steps run by a program with length N before halting (only looking at those length-N programs that do halt, of course).

We prove that there is no program that takes as input a number N and outputs BB(N). Suppose Y is such a program. We can use Y to design a program H that solves the halting problem. Here’s how:

H takes as input a program P, uses Y to evaluate BB(|P|), and then runs P for BB(|P|) steps. If P hasn’t halted by the end of this, then P will never halt. So H outputs yes if P halts within BB(|P|) steps, and otherwise outputs no. This solves the halting problem, but we just showed that you can’t do this! Contradiction.

The Busy Beaver numbers grow faster than every computable function.

This is actually a very similar proof to the last one. Suppose f(n) is a computable function that grows faster than BB(n), i.e. f(n) ≥ BB(n) for all sufficiently large n. Then there’s some constant c such that g(n) = f(n) + c satisfies g(n) > BB(n) for ALL n. g(n) is computable, so let Y be a program that computes g(n).

We can design a program H that solves the halting problem using Y. H takes as input a program P, then uses Y to evaluate g(|P|), then runs P for g(|P|) steps. If P hasn’t halted by the end of this, then P will never halt. So H outputs yes if P halts within g(|P|) steps, and otherwise outputs no.

This solves the halting problem, but we just proved you can’t do this! Contradiction.

Gödel’s first incompleteness theorem

Yes that’s right, we’re going to prove this using Kolmogorov complexity! The theorem says that there is no recursively enumerable proof system which is sound and complete for the natural numbers.

Suppose F is such a proof system. We can use F to design a program P that takes as input x and k and computes whether K(x) = k. First, translate “K(x) = k” into a statement about natural numbers. Then run through every string s, checking at each step whether s is a valid proof in F of K(x) = k, or of K(x) ≠ k. If either one is found, terminate and return which one. Since F is sound and complete, this algorithm eventually terminates and tells us which is true.

So if F existed, then we could compute Kolmogorov complexity. But we can’t compute Kolmogorov complexity! Contradiction. So there can not be any recursively enumerable proof system which is sound and complete for the natural numbers.

There’s an upper bound on what Kolmogorov complexities we can prove a string to have.

The proof of this closely resembles G. G. Berry’s paradox of ‘the first natural number which cannot be named in less than a billion words’ [P. R.: But this sentence names the very number in 14 words!]… The version of Berry’s paradox that will do the trick is ‘that object having the shortest proof that its algorithmic information content is greater than a billion bits.’

Chaitin (1982)

For any sound proof system F, there’s a number L such that F cannot prove that the Kolmogorov complexity of any specific string of bits is more than L.

Suppose not. Then there’s some proof system F that is sound for the natural numbers such that for any L, F can prove that K(x) > L for some specific string x.

First we design a program P that takes as input L and outputs a string x such that K(x) > L. P searches through proofs in F until it finds a proof that K(x) > L for some x, and then outputs x. By assumption, F has such a proof for any L, and since F is sound, the proven statement is actually true.

Okay, so for each L, P(L) gives an x such that K(x) > L. So the shortest program that outputs x must have length greater than L. But P(L) is a program that outputs x! So it must be that |P(L)| > L. But |P(L)| = |P| + |L| + c ≤ |P| + log₂(L) + c (for the same reasons as before). So it must be that for all L, L < |P| + log₂(L) + c. But as L goes to ∞, (L − log₂(L)) goes to ∞. So we can find an L such that L > |P| + log₂(L) + c. Contradiction!

We just showed that there’s some L for which F cannot prove that any string has Kolmogorov complexity greater than L. But notice that if we take this L and add one, L + 1 will still be larger than |P| + log₂(L + 1) + c. And same for L + 2, L + 3, and so on forever. So really we’ve shown that there’s an L such that F can never prove any string’s Kolmogorov complexity to be L or greater.

John Baez calls this the “complexity barrier” and argues here that the L is probably not actually that big. He guesses that L is a few thousand digits long at most!

Constructing the Church-Kleene Ordinal

I want to share a way to construct the order type of the Church-Kleene ordinal ω₁^CK. As far as I’m aware, this construction is original to me, and I think it’s much simpler and more elegant than others that I’ve found on the internet. Please let me know if you see any issues with it!

So, we start with the notion of a computable ordinal. A computable ordinal is a computable well-order on the natural numbers. In other words, it’s a well-order < on ℕ such that there exists a Turing machine that, given any two natural numbers n and m, outputs whether n < m according to that order.

There’s an injection from computable well-orders on ℕ to Turing machines, so there are countably many computable well-orders (at most one for each Turing machine). This means that there’s a list of all such computable well-orders. Let’s label the items on this list in order: {<₀, <₁, <₂, <₃, …}. Each <_n stands for a Turing machine which takes in two natural number inputs and returns true or false based on the nth computable well-ordering.

Now, we want to construct a set with order type ω₁^CK. We’re going to have it be an order on the natural numbers, so what we have to work with from the start is the set {0, 1, 2, 3, 4, 5, 6, …}.

We start by splitting up this set into an infinite collection of sets. First, we take every other number, starting with 0: {0, 2, 4, 6, 8, …}. That’ll be our first set. We’ve left behind {1, 3, 5, 7, 9, …}, so next we take every other number from this set starting with 1: {1, 5, 9, 13, …}. Now we’ve left behind {3, 7, 11, 15, …}. Our next set, predictably, will be every other number from this set starting with 3. This is {3, 11, 19, 27, …}. And we keep on like this forever, until we’ve exhausted every natural number.

This gives us the following set of sets:

{{0, 2, 4, 6, …}, {1, 5, 9, 13, …}, {3, 11, 19, 27, …}, {7, 23, 39, 55, …}, …}
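In closed form, the nth set is {2^n − 1 + k⋅2^(n+1) : k ∈ ℕ}. A quick sketch to check that these sets really partition ℕ (the function name `block` is my own):

```python
def block(n, count=5):
    """First `count` elements of the n-th set: start at 2^n - 1, step 2^(n+1)."""
    start, step = 2 ** n - 1, 2 ** (n + 1)
    return [start + k * step for k in range(count)]

print(block(0))   # [0, 2, 4, 6, 8]
print(block(2))   # [3, 11, 19, 27, 35]

# The sets partition the naturals: below 64, each number shows up exactly once.
members = sorted(x for n in range(7) for x in block(n, 64) if x < 64)
print(members == list(range(64)))   # True
```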

Now, we define our order to make everything in the first set less than everything in the second set, and everything in the second set less than everything in the third set, and so on. So far so good, but this only gives us order type ω^2. Next we define the order WITHIN each set.

To do this, we’ll use our enumeration of computable orders. In particular, within the nth set, the ith number will be less than the jth number exactly in the case that i <_n j. You can think about this as just treating each set “as if” it’s really {0, 1, 2, 3, …}, then ordering it according to <_n, and then relabeling the numbers back to what they started as.

This fully defines our order. Now, what’s the order type of this set? Well, it can’t be any particular computable order type, because every computable order type can be found as a strict sub-order of it! So it must be some uncomputable order type. And by construction, we know it must be exactly the order type that contains all and only the computable order types as sub-orders. But this is just the order type of the Church-Kleene ordinal!

So there we have it! We’ve constructed ω₁^CK. Now, remember that the Church-Kleene ordinal is uncomputable. But I just described an apparently well-defined process for creating an ordered set with its order type! This means that some part of the process I described must be uncomputable. Challenge question for you: which part is uncomputable, and can you think of a proof of its uncomputability that is independent of the uncomputability of ω₁^CK?

What ordinals can be embedded in ℚ and ℝ?

Last time we talked a little bit about some properties of the order type of ℚ. I want to go into more detail about these properties, and actually prove them to you. The proofs are nice and succinct, and ultimately rest heavily on the density of ℚ.

Every Countable Ordinal Can Be Embedded Into ℚ

Take any countable well-ordered set (X, ≺). Its order type corresponds to some countable ordinal. Since X is countable, we can enumerate all of its elements (the order in which we enumerate the elements might not line up with the well-order ≺). Let’s give this enumeration a name: (x₁, x₂, x₃, …).

Now we’ll inductively define an order-preserving embedding of X into ℚ. We’ll call this function f. First, let f(x₁) be any rational number. Now, assume that we’ve already defined f(x₁) through f(x_(n−1)) in such a way as to preserve the original order ≺. All we need to do to complete the proof is to assign to x_n a rational value f(x_n) such that ≺ is still preserved.

Here’s how to do that. Split up the elements of X that we’ve already constructed maps for as follows: A = {x_i | x_i ≺ x_n} and B = {x_i | x_n ≺ x_i}, for i < n. In other words, A is the subset of {x₁, x₂, …, x_(n−1)} consisting of elements ≺-below x_n, and B is the subset consisting of elements ≺-above x_n. By the induction hypothesis, every f-image of an element of B is strictly larger than every f-image of an element of A. So we can use the density of the rationals to find some rational number q in between f’s values on A and its values on B! We define f(x_n) to be this rational q. This choice preserves the order, because by construction, f(x_n) < f(x_i) for any i less than n exactly in the case that x_n ≺ x_i.

By induction, then, we’ve guaranteed that f maps X to ℚ in such a way as to preserve the original order! And all we assumed about X was that it was countable and well-ordered. This means that any countable and well-ordered set can be found within ℚ!
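The inductive construction above translates directly into code. Here’s a sketch using exact rationals (`embed` and `precedes` are my own names; when A or B is empty we just go one below or above everything assigned so far):

```python
from fractions import Fraction

def embed(enumeration, precedes):
    """Order-preserving map into the rationals.  `enumeration` lists the
    elements in any order; `precedes(a, b)` is the set's own order.
    Returns a dict {element: Fraction}."""
    f = {}
    for x in enumeration:
        below = [q for a, q in f.items() if precedes(a, x)]   # images of A
        above = [q for a, q in f.items() if precedes(x, a)]   # images of B
        if not below and not above:
            f[x] = Fraction(0)
        elif not below:
            f[x] = min(above) - 1
        elif not above:
            f[x] = max(below) + 1
        else:
            f[x] = (max(below) + min(above)) / 2   # density of the rationals
    return f

# Example: pairs under lexicographic order (a chunk of order type ω·3),
# deliberately enumerated in a scrambled order.
import random
elems = [(i, j) for i in range(3) for j in range(6)]
random.shuffle(elems)
f = embed(elems, lambda a, b: a < b)
print(all(f[a] < f[b] for a in elems for b in elems if a < b))   # True
```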

No Uncountable Ordinals Can Be Embedded Into ℝ

In a well-ordered set X, every non-maximal element of X has an immediate successor (i.e. a least element that’s greater than it). Proof: Take any non-maximal x ∈ X. Consider the subset of X consisting of all elements greater than x: {y ∈ X | x < y}. This set is non-empty because x is not maximal. Any non-empty subset of a well-ordered set has a least element, so this subset has a least element. I.e., there’s a least element greater than x. Call this element S(x), for “the successor of x”.

Now, take any well-ordered subset X ⊆ ℝ (with the usual order). Since it’s well-ordered, every element has an immediate successor (by the previous paragraph). We will construct an injection f from X into ℚ, using the fact that ℚ is dense in ℝ (i.e. that there’s a rational between any two reals). To each non-maximal element x ∈ X, let f(x) be any rational such that x < f(x) < S(x). To complete this, just map the maximal element of X (if one exists) to any rational larger than all the others. And f really is injective: the intervals (x, S(x)) are pairwise disjoint, so distinct elements of X get distinct rationals. There we go, we’ve constructed an injection from X into ℚ!
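Here’s a tiny Python sketch of this map for a finite increasing list of reals standing in for a well-ordered subset of ℝ (my own illustration, not from the text). `Fraction(x)` converts a float to the exact rational it represents, so the midpoint below really is a rational strictly between x and S(x):

```python
from fractions import Fraction

def map_to_rationals(X):
    """X: a strictly increasing list of reals. Each non-maximal x maps to
    a rational strictly inside (x, S(x)); those intervals are disjoint,
    so the map is injective. The maximal element goes above everything."""
    f = {}
    for x, s in zip(X, X[1:]):
        f[x] = (Fraction(x) + Fraction(s)) / 2  # a rational in (x, S(x))
    f[X[-1]] = Fraction(X[-1]) + 1              # maximal element: any larger rational
    return f
```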

The implication of this is that every well-ordered subset of the reals is at most countable. In other words, even though ℝ is uncountably large, we can’t embed uncountable ordinals inside it! The set of ordinals we can embed within ℝ is exactly the set of ordinals we can embed within ℚ! (This set of ordinals is exactly ω₁: the set of all countable ordinals.)

Final Note

Notice that the previous proof relied on the fact that between any two reals you can find a rational. So this same proof would NOT go through for the hyperreals! There’s no rational number (or real number, for that matter!) in between 1 and 1+ϵ. And in fact, you CAN embed ω₁ into the hyperreals! This is especially interesting because the hyperreals have the same cardinality as the reals! So the embeddability of ω₁ here is really a consequence of the order type of the hyperreals being much larger than that of the reals. And if we want to take a step towards even crazier extensions of ℝ, EVERY SINGLE ordinal can be embedded within the surreal numbers!

Visualizing the rationals

I made some pretty pictures of the rational numbers! Take a look.

This first plot shows the denominators of rational numbers between -1 and 1. It’s obtained by searching through every rational number in fully reduced form with numerator and denominator between -1000 and 1000.
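The original plotting code isn’t shown, but the data-generation step is easy to reconstruct. Here’s a minimal sketch (my reconstruction; plotting itself is omitted) of collecting the denominator of every fully reduced rational in [-1, 1]:

```python
from math import gcd

def denominators(limit=1000):
    """Collect (value, denominator) for every fully reduced rational p/q
    with 1 <= q <= limit and -1 <= p/q <= 1 (i.e. |p| <= q)."""
    pts = []
    for q in range(1, limit + 1):
        for p in range(-q, q + 1):
            if gcd(p, q) == 1:        # fully reduced form only
                pts.append((p / q, q))
    return pts
```

Scattering the resulting (value, denominator) pairs produces the kind of picture described here.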

Now here’s a plot of the numerators of rational numbers between -1 and 1.

Here’s the same thing, but only looking at rationals from 0 to 1:

And for kicks, here are the inverses of the numerators:

The same, for rationals between 0 and 1:

These pictures are quite pretty to look at and interesting to think about. Some features of the image are artifacts of the particular way I chose rational numbers (i.e. looking only at rationals with numerator and denominator in a fixed range), but some other features might reflect some deeper structure of the rationals.

The order type of the rational numbers

Take any set and place an order on it. If the order is a well-order (i.e. if every subset has a least element), then the set has the same order type as some particular ordinal. But the notion of order type can be extended beyond well-ordered sets. Any two ordered sets are said to have the same order type if they are order isomorphic: if there’s a bijective map f from one set to the other such that both f and f⁻¹ preserve the ordering of elements.

One structure that has a very interesting order type is the rational numbers. After all, the rationals are a countable set, but every rational number resembles a “limit ordinal” in the sense that it has no immediate predecessor.

One question that we can ask to get some insight about the order type of the rational numbers is: what ordinals can be found within the rationals? That is, take some well-ordered subset of the rationals. Look at the order type of this subset. This order type corresponds to some ordinal. And different choices of well-ordered subsets of the rationals give us different ordinals! So which ordinals can we “find” within the rationals in this sense?

First off, every finite ordinal can obviously be found. To find the ordinal n, just take the subset {0, 1, 2, …, n-1}. We can also find ω! You can just take the subset {0, 1, 2, 3, …}. What about ω+1? Try for yourself: can you construct a subset of the rationals with the order type of ω+1?

There’s no unique way to do it, but one easy way is to take the subset {1/2, 3/4, 7/8, …, 1}. This set has a countable infinity of elements, one for each natural number, after all of which comes a single element: exactly the order type of ω+1!

If we want a subset of the rationals with order type ω+2, we can use the same trick: {1/2, 3/4, 7/8, …, 1, 2}. And clearly this extends for any ω+n. But how about ω+ω? Can you construct a subset of the rationals with this order type? (Do it yourself before reading on!)

Here’s one way: {1/2, 3/4, 7/8, …, 1+1/2, 1+3/4, 1+7/8, …}. It should be easy to see how to extend this trick to get ω⋅3, ω⋅4, and indeed ω⋅n for any finite n. You can even naturally extend this to get to ω⋅ω. But then what? Are we finished?
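To make this concrete, here’s a small Python helper (my own, not from the text) that generates the first few terms of each of the n copies of ω sitting inside ℚ, using the same 1 − 1/2^m trick shifted by integers:

```python
from fractions import Fraction

def omega_times(n, terms=5):
    """First `terms` elements of each block of {k + 1 - 2^-m : 0 <= k < n, m >= 1},
    a subset of Q with order type ω·n. Block k lives inside [k + 1/2, k + 1),
    so every element of block k precedes every element of block k+1."""
    return [k + 1 - Fraction(1, 2 ** m)
            for k in range(n)
            for m in range(1, terms + 1)]
```

Listing the elements block by block already produces them in increasing order, mirroring the ω⋅2 example above.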

If you guessed no, you’re right! We can find subsets of the rationals with order type ω³, and ωⁿ for any finite n, and even ω^ω. (Try it!) And we can keep going beyond that as well. So how high can you go?

Turns out that EVERY countable ordinal can be embedded into the rationals, under the usual order! So in some sense the order type of the rationals is as complicated as possible while still being countable. Also, remember that the set of countable ordinals is uncountably large. So this means that the rationals are a countable ordered set that has all of the uncountably many countable ordinals embedded within it! Isn’t that great?

(One thing that makes this seem less insane is that when we’re looking at what ordinals can be embedded in the rationals, we’re searching through different subsets of the rationals. And even though the rationals are countable, the set of all subsets of the rationals is uncountably large.)

One more interesting thing: the set of all subsets of the rationals has cardinality ℶ₁. But the set of all ordinals that can be embedded into the rationals is ω₁, which has cardinality ℵ₁. So if we start with the set of all subsets of the rationals, then strip away all but those that are well-ordered, and then choose just a single representative for each order type, we get a set of cardinality ℵ₁. And whether this final set has the same cardinality as the original set is precisely the continuum hypothesis! This gives a way to “construct” a real-life set with cardinality ℵ₁: look at the set of well-ordered subsets of the rationals, and split it into order-type equivalence classes.

Can you compute all the countable ordinals?

One can live without knowing the ordinals, to be sure, but not as well.

Quote from this paper

What is an ordinal number? If you read my previous post, you might be convinced that an ordinal number is a particular type of set that contains all the ordinal numbers less than it. In particular, the ordinal 0 is the empty set, the ordinal 1 is the set {0}, the ordinal 2 is the set {0,1}, and so on. This is a fine way to think about ordinals, but there’s a much deeper view of them that I want to present in this post.

The fundamental concept is that of order types. Order types are to order-preserving bijections as cardinalities are to bijections. If you’ve seen pictures like these before, then they’re good to have in mind when you think about what order types are:


Let’s get into more detail about order types.

Take any two sets. If there is a bijection between them, then we say that they have the same cardinality. So we can think of each cardinal number as a class of sets, all in bijective correspondence with one another.

Take any two well-ordered sets. (Recall, a set is well-ordered if each of its subsets has a least element.) If there is an order-preserving bijection between them, then we say that the two sets have the same order type. So we can think of each ordinal number as a class of sets, all of the same order type.

For instance, the cardinal number “2” is the class of all sets with two elements (as all such sets are in bijective correspondence with one another). And the ordinal number “2” is the class of all well-ordered two-element sets. In particular, Von Neumann’s ordinal 2 = {0, 1} = {∅, {∅}} has this order type.

The finite cardinalities and finite order types line up nicely: any two well-ordered sets with the same finite cardinality are guaranteed to have the same order type. It’s in the realm of the infinite where order types and cardinalities come apart.

The cardinal number ℵ₀ is the class of all sets that can be put into bijective correspondence with the set of natural numbers ω. This includes the set of even numbers and the set of rational numbers. So how many order types are there of sets with cardinality ℵ₀?

Consider the following two ordered sets: X = {0 < 1 < 2 < 3 < …} and Y = {1 < 2 < 3 < … < 0}. They both have cardinality ℵ₀, but do they have the same order type? If so, then there must be a bijection f: X → Y such that f is order-preserving (for all a ∈ X and b ∈ X, if a < b then f(a) < f(b)). But what element of X is mapped to 0 in Y? In the set Y, 0 is larger than every element besides itself, but there is no element of X that has this property! So no, X and Y do not have the same order type.

X is the order type of ω (a countably infinite list of elements with no elements that have infinitely many predecessors), and Y is the order type of ω+1 (a countably infinite list of elements with a single element coming after all of them). This tells us that there are at least two different order types within the cardinality ℵ₀.

Can we construct an order on the set of natural numbers that has the same order type as ω+2? Well, all we need is for there to be two elements that are larger than all the rest. So we can write something like {2 < 3 < 4 < … < 0 < 1}. This generalizes easily to ω+n, for any finite n: {n < n+1 < n+2 < … < 0 < 1 < … < n-1} has the same order type as ω+n.

But now what about ω+ω, i.e. ω⋅2? To get a set of naturals with the same order type, we have to use a new trick. We need two countably infinite sequences of numbers that we can place beside one another. An easy way to accomplish this is by using the evens and odds: {0 < 2 < 4 < … < 1 < 3 < 5 < …}.
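The evens-then-odds order is simple enough to write down as a computable comparison, a theme we’ll return to at the end of this post. This little Python predicate is my own illustration:

```python
def omega_times_2_less(x, y):
    """The order {0 < 2 < 4 < … < 1 < 3 < 5 < …} on the naturals, which
    has order type ω⋅2: all evens precede all odds, and within each
    parity class the usual order applies. Tuple comparison is lexicographic,
    so parity is compared first and the usual order breaks ties."""
    return (x % 2, x) < (y % 2, y)
```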

We can get an order on the naturals with the same order type as ω⋅2 + 1 by just choosing a single natural number to place after all the others. For instance: {2 < 4 < 6 < … < 1 < 3 < 5 < … < 0}. It should be easy to see how to get ω⋅2 + n for any finite n: just do {n < n+2 < n+4 < … < n+1 < n+3 < n+5 < … < 0 < 1 < 2 < … < n-1} And now how about ω⋅3?

(Try it out for yourself before reading on!)

To get ω⋅3, we need to place three countably infinite sequences of naturals side by side. For instance: {0 < 3 < 6 < … < 1 < 4 < 7 < … < 2 < 5 < 8 < …}.

For ω⋅4, we can use a similar trick: {0 < 4 < 8 < … < 1 < 5 < 9 < … < 2 < 6 < 10 < … < 3 < 7 < 11 < …}. Again, we can quite easily generalize this to ω⋅n for any finite n: {0 < n < 2n < … < 1 < n+1 < 2n+1 < … < 2 < n+2 < 2n+2 < … < … < n-1 < 2n-1 < 3n-1 < …}. But how about ω⋅ω? How do we deal with ω²?

This is a fun exercise to try for yourself at home. Construct an order on the set of natural numbers that has the same order type as ω². Then try to generalize the trick to get ω³, ω⁴, and ωⁿ for any finite n.
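If you want to check your answer, here’s one way to do the ω² construction explicitly (just one of many possible tricks; the encoding is my own choice). Every positive integer factors uniquely as 2^a⋅(2b+1), which gives a bijection between ℕ and pairs (a, b); ordering the pairs lexicographically stacks ω-many copies of ω:

```python
def to_pair(x):
    """Bijection N -> N x N: write x + 1 = 2^a * (2b + 1) and return (a, b)."""
    x += 1
    a = 0
    while x % 2 == 0:
        x //= 2
        a += 1
    return a, (x - 1) // 2

def omega_squared_less(x, y):
    """An order on the naturals with order type ω²: compare (a, b) pairs
    lexicographically, so the naturals with a = 0 form the first copy
    of ω, those with a = 1 the second copy, and so on."""
    return to_pair(x) < to_pair(y)
```

Iterating the pairing trick (triples, quadruples, …) yields ω³, ω⁴, and beyond.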

Next construct an order on the naturals with the same order type as ω^ω! Can you do it for ω^ω^ω? And ω^ω^ω^…? One thing that should become clear is that the larger an ordinal you’re dealing with, the harder it becomes to construct an order on the naturals with the same order type. The natural question is: is it always possible to do this type of construction, or do we at some point run out of clever tricks to use to get to higher and higher countable ordinals?

Well, if an ordinal α is countable, then it is a well-ordered set with the same cardinality as the naturals. So there always exists some order that can be placed on the naturals to mimic the order type of α. But is this order always computable?

We call an order on the naturals computable if there’s some Turing machine which takes as input two naturals x and y and outputs whether x < y according to this order. We call an order type computable if there’s some computable order on the naturals with that order type. The standard order (0 < 1 < 2 < 3 < …) is computable, and it’s easy to see how to compute all the other orders we’ve discussed so far. But are ALL the countable order types computable?
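Before answering, it’s worth seeing what such a decision procedure looks like in practice. Here’s the ω+1 order {1 < 2 < 3 < … < 0} from earlier as a tiny Python predicate (my own sketch of the Turing machine in question):

```python
def omega_plus_1_less(x, y):
    """The order 1 < 2 < 3 < … < 0 on the naturals (order type ω+1):
    0 sits above everything else; the rest keep their usual order."""
    if x == 0:
        return False  # 0 is the maximum: less than nothing
    if y == 0:
        return True   # every nonzero natural is below 0
    return x < y
```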

The wonderful and strange answer is no. There are countable order types that are uncomputable! There must exist orders on the naturals with such order types, but these orders are so immensely complicated and strange that they cannot be defined by ANY Turing machine! Here’s a quick and easy proof of this. (The next three paragraphs are the coolest and most important part of this whole post, so pay attention!)

Turn your mind back to Von Neumann’s construction of the ordinals, according to which each ordinal was exactly the set of all smaller ordinals. More technically, a set α is a Von Neumann ordinal iff α is well ordered with respect to ∈ AND every element of α is also a subset of α. Consider now the set of all countable ordinals. It can be seen that this set is itself an ordinal. Can it be a countable ordinal? No, because if it were then it would have to be an element of itself! (This is not allowed in ZFC by the axiom of regularity.) This means that there are uncountably many countable ordinals.

Ok, so far so good. Now we just observe that there are only countably many Turing machines. So there are only countably many Turing machines that compute orders on the naturals. And therefore there are only countably many computable ordinals.

Putting this together, we see that there are uncountably many uncomputable countable order types (say that sentence out loud). After all, there are uncountably many countable ordinals, and only countably many computable ordinals!

And if there’s only countably many computable ordinals, then there’s some countable Von Neumann ordinal consisting of the set of all computable ordinals! This set couldn’t possibly be computable, because then it would be contained within itself. So it’s the smallest uncomputable ordinal! This set is called the Church-Kleene ordinal, written ω₁^CK. Its order type is literally so convoluted and complex that no Turing machine can compute it!

Immoral or Inconsistent?

In front of you sits a button. If you press this button, an innocent person will be subjected to the worst possible forms of torture constantly for the next year. (There will be no “resistance” built up over time – the torture will be just as bad on day 364 as it was on day 1.) If you don’t press it, N people will receive a slight pin-prick in their arm, just sharp enough to be noticeably unpleasant.

Clearly if N is something small, like, say, 10, you should choose the pin-pricks rather than the torture. But what about for very very large N? Is there any N so large that you’d switch to the torture option? And if so, what is it?

I’m going to take it as axiomatic that no value of N is high enough that it’s worth inflicting the year long torture. Even if you were faced with the choice between year long torture for one and momentary pin-prick for a literal infinity of people, the right choice would be the pin-pricks.

I feel safe in taking this as axiomatic for three main reasons. Firstly, it’s the obvious answer to the average person, who hasn’t spent lots of time thinking about normative ethics. (This is just in my experience of posing the problem to lay-people.)

Secondly, even among those that choose the torture, most of them do so reluctantly, citing an allegiance to some particular moral framework. Many of them say that they think that it’s technically what they should do, but that in reality they probably wouldn’t, or would at least be extremely hesitant to.

Thirdly, my own moral intuitions deliver a clear and unambiguous judgement on this case. I would 100% choose an infinity of people having the pin-pricks, and wouldn’t feel the tiniest bit of guilt about it. My intuition here is at the same level of obviousness as something like “Causing unhappiness is worse than causing happiness” or “Satisfying somebody’s preferences is a moral good, so long as those preferences don’t harm anybody else.”

With all that said, there are some major problems with this view. If you already know what these problems are, then the rest of this post will probably not be very interesting to you. But if you feel convinced that the torture option is unambiguously worse, and don’t see what the problem could be with saying that, then read on!

First of all, pain is on a spectrum. This spectrum is not continuous, but nonetheless there’s a progression of pains from “slight pin-prick” to “year long torture” such that each step is just barely noticeably worse than the previous one.

Second of all, for each step along this progression, there’s a trade-off to be made. For instance, a pin-prick for one person is better than a very slightly worse pin-prick for one person. And a pin-prick for each of one million people is worse than a slightly more painful pin-prick for one person. So there’s some number of people N that will cause you to change your choice from “pin-pricks for N people” to “slightly worse pin-prick for one person.”

Let’s formalize this a little bit. We’ll call our progression of pains p₁, p₂, p₃, …, pₙ, where p₁ is the pain of a slight pin-prick and pₙ is the pain of yearlong torture. And we’ll use the symbol < to mean “is less bad than”. What we’ve just said is that for each k, (pₖ for one person) < (pₖ₊₁ for one person) AND (pₖ for one million people) > (pₖ₊₁ for one person). (The choice of a million really doesn’t matter here; all that we need is that there’s some number for which the slighter pain becomes worse. One million is probably high enough to do the job.)

Now, if (pₖ for N) > (pₖ₊₁ for 1), then surely (pₖ for 2N) > (pₖ₊₁ for 2). The second is just the first one, but two times! If the tradeoff was worth it the first time, then the exact same tradeoff should be worth it the second time. But now what this gives us is the following:

p₁ for 1,000,000^(n-1) > p₂ for 1,000,000^(n-2) > … > pₙ₋₁ for 1,000,000 > pₙ for 1

In other words, if we are willing to trade off at each step along the progression, then we are forced on pain of inconsistency to trade off between the slight pin-prick and the yearlong torture! I.e. there has to be some number (namely 1,000,000^(n-1)) for which we choose the torture for 1 over the pin-pricks for that number of people.

Writing this all out:

  1. There’s a progression of pains p₁, p₂, …, pₙ₋₁, pₙ such that for each k, (pₖ for one million people) > (pₖ₊₁ for one person).
  2. If (p for n) > (q for m), then (p for k⋅n) > (q for k⋅m).
  3. > is transitive.
  4. Therefore, (p₁ for 1,000,000^(n-1)) > (pₙ for 1)
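The arithmetic behind step 4 is just repeated scaling by the trade-off ratio. A toy Python calculation (the ratio of one million is the illustrative choice from above) makes the size of the resulting number vivid:

```python
def pin_prick_count(n, ratio=10 ** 6):
    """Chain premises 1-3 down a progression of n pains: each of the
    n - 1 steps trades pain p(k+1) for m people against pain pk for
    ratio * m people, so p1 for ratio**(n - 1) people ends up ranked
    as worse than pn (the torture) for a single person."""
    people = 1
    for _ in range(n - 1):
        people *= ratio  # one trade-off per step of the progression
    return people
```

Even a modest progression of 10 barely distinguishable pains already forces a trade against 10^54 pin-pricks.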

If you accept the first three premises, then you must accept the fourth. And the problem is, all three premises seem very hard to deny!

For premise 1: It seems just as clearly immoral to choose to inflict a pain on a million people rather than inflict a slightly worse pain on one person as it does to choose to inflict torture on one rather than a pin-prick on an arbitrarily large number of people. Or to make the point stronger, no sane moral system would say that it’s worse to inflict a pain on one person than a very slightly less bad pain on an arbitrarily large number of people. We have to allow some tradeoff between barely different pains.

Premise 2: if you are offered a choice between two options and choose the first, and then before you gain any more information you are offered the exact same choice, with no connection between the consequences of the first choice and the consequences of the second choice, then it seems to me that you’re bound by consistency to choose the first once more. And if you’re offered that choice k times, then you should take the first every time.

(Let me anticipate a potential “counter-example” to this. Suppose you have the choice to either get a million dollars for yourself or for charity. Then you are given that same choice a second time. It’s surely not inconsistent to choose the million for yourself the first time and the million for charity the second time. I agree, it’s not inconsistent! But this example is different than what we’re talking about in Premise 2, because the choice you make the second time is not the same as the choice you made the first. Why? Because a million dollars to a millionaire is not the same as a million dollars to you or me. In other words, the goodness/badness of the consequences of the second choice are dependent on the first choice. In our torture/pin-prick example, this is not the case; the consequences of the second choice are being enacted on an entirely different group of people, and are therefore independent of the first choice.)

Premise 3: Maybe this seems like the most abstract of the three premises, and hence potentially the most easy to deny. But the problem with denying premise 3 is that it introduces behavioral inconsistency. If you think that moral badness is not transitive, then you think there’s an A, B, and C such that you’d choose A over B, B over C, but not A over C. But if you choose A over B and B over C, then you have in effect chosen A over C, while denying that you would do so. In other words, a moral system that denies transitivity of badness cannot be a consistent guide of action, as it will tell you not to choose A over C, while also telling you to take actions that are exactly equivalent to choosing A over C.

And so we’re left with the conclusion that pin-pricks for 1,000,000^(n-1) people is worse than torture for one person for a year.

Okay, but what’s the take-away of this? The take-away is that no consistent moral system can agree with all of the following judgements:

  • Barely distinguishable pains are morally tradeoffable.
  • There’s a progression of pains from “momentary pin-prick” to “torture for a year”, where each step is just barely distinguishably worse than the last.
  • Torturing one person for a year is worse than inflicting momentary pin-pricks on any number of people.

You must either reject one of these, or accept an inconsistent morality.

This seems like a big problem for anybody that’s really trying to take morality seriously. The trilemma tells us in essence that there is no perfect consistent moral framework. No matter how long you reflect on morality and how hard you work at figuring out what moral principles you should endorse, you will always have to choose at least one of these three statements to reject. And whichever you reject, you’ll end up with a moral theory that is unacceptable (i.e. a moral theory which commits you to immoral courses of action).