How uncomputable are the Busy Beaver numbers?

The Busy Beaver numbers are an undecidable set; if they were decidable, we could figure out BB(n) for each n, enabling us to decide the halting problem. They are also not recursively enumerable, but for a trickier reason. Recursive enumerability alone would not let you figure out BB(n), since it only gives us the ability to list out the BB numbers in some order, not necessarily in increasing order. But since the sequence is strictly increasing, recursive enumerability would enable us to put an upper bound on BB(n), which is just as good for the purposes of solving the halting problem. Simply enumerate the BB numbers in whatever order your algorithm allows, and once you’ve listed n of them, you know that the largest of those n is at least as big as BB(n) (after all, it’s a BB number that’s larger than at least n-1 other BB numbers).

So the Busy Beaver numbers are also not recursively enumerable. But curiously, they’re Turing equivalent to the halting problem, and the halting problem is recursively enumerable. So what gives?

The answer is that the Busy Beaver numbers are co-recursively enumerable. This means that there is an algorithm that takes in a number N and returns False if N is not a Busy Beaver number, and runs forever otherwise. Here’s how that algorithm works:

First, we use the fact that BB(k) is always greater than k. This means that if N is a Busy Beaver number, say N = BB(k), then k < N, and so N is the running time of some Turing machine with at most N states. There are finitely many such Turing machines, so we just run them all in parallel. We wait N steps, and then see which machines halt at exactly step N.

If no machines halt at exactly step N, then we return False. If some set of machines {M1, M2, …, Mk} halt at step N, then we continue waiting. If N is not a Busy Beaver number, then for each of these machines there must be another machine of the same size that halts later. So if N is not a Busy Beaver number, then for each Mi there will eventually appear a machine Mi′ with the same number of states as Mi that halts after some number of steps Ni′ > N. Once this happens, we rule out Mi as a candidate for the Busy Beaver champion. Eventually, every single candidate machine is ruled out, and we can return False.

On the other hand, if N is a Busy Beaver number, then there will be some candidate machine M such that no matter how long we wait, we never find another machine with the same number of states that halts after it. In this case, we’ll keep waiting forever and never return False.
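Here’s a minimal Python sketch of that procedure. The helpers machines_with_at_most(n), num_states(M), and halt_step(M, t) (which runs M for up to t steps and returns the exact step at which it halted, or None if it’s still running) are hypothetical stand-ins:

def is_not_busy_beaver(N):
    machines = machines_with_at_most(N)    # finitely many machines to consider
    candidates = [M for M in machines if halt_step(M, N) == N]
    t = N
    # If N is a Busy Beaver number, some candidate below is never beaten,
    # and this loop runs forever, just as described above.
    while candidates:
        t += 1                             # run everything a little longer
        for M in candidates[:]:
            beaten = any(num_states(M2) == num_states(M)
                         and halt_step(M2, t) is not None
                         and halt_step(M2, t) > N
                         for M2 in machines)
            if beaten:
                candidates.remove(M)       # M is not a champion after all
    return False                           # every candidate was ruled out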

It’s pretty cool to think that for any number that isn’t a Busy Beaver number, we can in principle eventually rule it out. If a civilization ran this algorithm to find the first N Busy Beaver numbers, they would over time rule out more and more candidates, and after a finite amount of time, they would have the complete list of the first N numbers. Of course, the nature of co-recursive enumerability is that they would never know for sure whether they had reached that point; they would forever be waiting to see if one of the numbers on their list would be invalidated and replaced by a much larger number. But in the limit of infinite time, this process converges perfectly to truth.

✯✯✯

Define H to be the set of numbers that encode Turing machines that halt on the empty input, and B to be the set of Busy Beaver numbers. H and B are Turing equivalent. The proof of this is two-part:

1. H is decidable with an oracle for B

We are given as input a Turing machine M (encoded as some natural number) and asked whether it will halt. We use our oracle for B to find the value of BB(n), where n is the number of states that M has: query the oracle on 1, 2, 3, … and take the n-th number it accepts, which must be BB(n), since the BB sequence is strictly increasing. We then run M for BB(n) steps. If M hasn’t halted by this point, then we know that it will never halt and we return False. And if it has already halted, we return True.
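In Python-flavored pseudocode (a sketch, reusing the hypothetical helpers from before; in_B is the membership oracle for B):

def BB(n, in_B):
    # The n-th member of B in increasing order is exactly BB(n),
    # since the Busy Beaver sequence is strictly increasing.
    found, m = 0, 0
    while True:
        m += 1
        if in_B(m):
            found += 1
            if found == n:
                return m

def halts(M, in_B):
    bound = BB(num_states(M), in_B)
    return halt_step(M, bound) is not None   # halted within BB(n) steps?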

2. B is decidable with an oracle for H

We are given as input a number N and asked whether it’s a Busy Beaver number. We collect all Turing machines with at most N states and apply H to determine which of them halt. We throw out the non-halting machines and run all the rest until each one has halted, noting how many steps each ran for and how many states it had. At the end we check whether N was the running time of some machine, and whether any other machine with the same number of states halted later. If some machine ran for exactly N steps and no same-sized machine beat it, we return True; otherwise we return False.
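And a sketch of this direction, where halts_oracle decides the halting problem and run_to_completion (another hypothetical helper) runs a machine that is known to halt and counts its steps:

def is_busy_beaver(N, halts_oracle):
    machines = [M for M in machines_with_at_most(N) if halts_oracle(M)]
    runtimes = {M: run_to_completion(M) for M in machines}  # all of these halt
    for M, t in runtimes.items():
        if t == N and all(runtimes[M2] <= N for M2 in machines
                          if num_states(M2) == num_states(M)):
            return True    # N = BB(num_states(M))
    return False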

So H and B have the same Turing degree. And yet B is co-recursively enumerable and H is recursively enumerable (given a Turing machine M, just run it and return True if it ever halts). This is actually not so strange; the difference between recursive enumerability and co-recursive enumerability is not a difference in “difficulty to decide”, just a difference in whether it’s the True instances or the False instances that get confirmed.

As a very simple example of the same phenomenon, consider the set of all halting Turing machines H and the set of all non-halting Turing machines Hc. H is recursively enumerable and Hc is co-recursively enumerable. And obviously, given an oracle for either one, we can decide the other. More generally, for any set X, consider Xc = {n : n ∉ X}. X and Xc are Turing equivalent, and if X is recursively enumerable then Xc is co-recursively enumerable. What’s more, if X is Σn then Xc is Πn.

The Arithmetic Hierarchy and Computability

In this post you’ll learn about a deep connection between sentences of first order arithmetic and degrees of uncomputability. You’ll learn how to look at a logical sentence and determine the degree of uncomputability of the set of numbers that it defines. And you’ll learn much of the content of and intuitive justification for Post’s theorem, which is perhaps the most important result in computability theory.

The arithmetic hierarchy is a classification system for sets of natural numbers. It relates sentences of Peano arithmetic to degrees of uncomputability. There’s a level of the hierarchy for each natural number, and on each level there are three classes: Σn, Πn, and Δn. Where on the hierarchy a set of naturals lies indicates the complexity of the formulas that define it. Here’s how that works:

Σ0 = Π0 = Δ0

On the zeroth level are all the sets that are definable by a sentence in the language of first-order Peano arithmetic that has no unbounded quantifiers. Examples:

x = 0
(x = 0) ∨ (x = 3) ∨ (x = 200)
∃y < 200 (x = y)
(x > 1) ∧ ∀y < x ∀z ≤ y (y⋅z = x → z = 1)
∃y < x (x = y + y)

The first two examples have no quantifiers at all. The first, x = 0, is true of exactly one natural number (namely, 0), so the set that it defines is just {0}. The second is true of three natural numbers (0, 3, and 200), so the set it defines is {0, 3, 200}. Thus both {0} and {0, 3, 200} are on the zeroth level of the hierarchy. It should be clear that just by stringing together a long enough disjunction of sentences, we can define any finite set of natural numbers. So every finite set of natural numbers is on the zeroth level of the hierarchy.

The third example has a quantifier in it, but it’s a bounded quantifier. In that sentence, y only ranges over 200 different natural numbers. In some sense, this sentence could be expanded out and written as a long disjunction of the form (x = 0) ∨ (x = 1) ∨ … ∨ (x = 199). It should be easy to see that the set this defines is {0, 1, 2, … 199}.

The fourth example is a little more interesting, for two reasons. First of all, the quantifiers are now bound by variables, not numbers. This is okay, because no matter what value of x we’re looking at, we’re only checking finitely many values of y, and for each value of y, we only check finitely many values of z. Secondly, this formula can be translated as “x is a prime number.” This tells us that the set of primes {2, 3, 5, 7, 11, …} is on the zeroth level of the hierarchy, our first infinite set!

The fifth example also gives us an infinite set; work out which set this is for yourself before moving on!

Notice that each of the sets we’ve discussed so far can be decided by a Turing machine. For each set on the zeroth level, the defining sentence involves only bounded quantifiers and the basic arithmetic operations of Peano arithmetic (+, ⋅, and <), each of which is computable. So for instance, we can write an algorithm that decides prime membership for x by looking at the sentence defining the set of primes:

(x > 1) ∧ ∀y < x ∀z ≤ y (y⋅z = x → z = 1)

if x <= 1:
    return False
for y in range(x):
    for z in range(y + 1):
        if (y * z == x) and (z != 1):
            return False
return True

Another example:

(x = 0) ∨ (x = 3) ∨ (x = 200)

if (x == 0) or (x == 3) or (x == 200):
    return True
else:
    return False

Every sentence with only bounded quantifiers can be converted in a similar way to an algorithm that decides membership in the set it defines. We just translate bounded quantifiers as for loops! Thus every set on the zeroth level is decidable. Conversely, we’ll treat every decidable set as living on the zeroth level: we can run the process backwards and, given an algorithm deciding membership in the set, convert it into a sentence of first-order Peano arithmetic. (Strictly speaking, saying “the algorithm halts and accepts” takes an unbounded quantifier over running times, so some presentations simply define the zeroth level to be the decidable sets; the distinction won’t matter in what follows.)

On the zeroth level, the three classes Σ0, Π0, and Δ0 are all equal. So every decidable set can be said to be in each of Σ0, Π0, and Δ0. The reason for the three different names will become clear when we move on to the next level of the hierarchy.

Σ1, Π1, Δ1

Entering into the first level, we’re now allowed to use unbounded quantifiers in our definitions of the set. But not just any pattern of unbounded quantifiers: they must all be the same type of quantifier, and they must all appear as a block at the beginning of the sentence. For Σ1 sets, the quantifiers up front must all be existential. For example:

∃y (y² + y + 1 = x)
∃y ∃z (y is prime ∧ z is prime ∧ x = y + z ∧ x is even)

The first defines the infinite set {1, 3, 7, 13, 21, 31, …}, and the second defines the set of all even numbers that can be written as the sum of two primes (whether this includes every even number is unknown!). The shorthand I’ve used in the second sentence is permitted, because we know from the previous section that the sentences “x is prime” and “x is even” can be written without unbounded quantifiers.

It so happens that each of these sets can also be defined by formulas on the zeroth level. One easy way to see this is that we can bound each quantifier by x without changing the set defined. But are there any Σ1 sets that aren’t also Σ0?

Yes, there are! Every decidable set is in Σ0, so these examples won’t be as simple. To write them out, I’ll have to use some shorthand that I haven’t really justified:

∃y (the Turing machine encoded by x halts in y steps)
∃y (y Gödel-codes a proof from PA of the sentence Gödel-coded by x)

Both of these rely on the notion of numbers encoding other objects. In the first we encode Turing machines as numbers, and in the second we encode sentences of Peano arithmetic as numbers using Gödel numbering. In general, any finite object whatsoever can be encoded as a natural number, and properties of that object can be discussed as properties of numbers.
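As a tiny illustration of how such encodings get off the ground, here’s the standard Cantor pairing function, which codes a pair of naturals as a single natural; iterating it lets you code arbitrary finite tuples, and hence machines, sentences, and proofs:

def pair(a, b):
    # a bijection between ℕ×ℕ and ℕ
    return (a + b) * (a + b + 1) // 2 + b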

Let’s take a close look at the first sentence. The set defined by this sentence is the set of numbers that encode Turing machines that halt. (For readability, in the future I’ll just say “the set of Turing machines such that …” rather than “the set of natural numbers that encode Turing machines with the property …”.) This set is undecidable, by the standard proof of the undecidability of the halting problem. This tells us that the set cannot be defined using only bounded quantifiers!

The second sentence defines the set of natural numbers that encode theorems of PA. Why is this set not decidable? Suppose that it were. Then we could ask “is 0=1 a theorem of PA?” and, assuming the consistency of PA, after a finite amount of time get back the answer “No.” This would allow us to prove the consistency of PA, and crucially, the Turing machine that we use to decide this set could be coded as a number and used by PA to prove the same thing! Then PA would prove its own consistency, making it inconsistent by Gödel’s second incompleteness theorem and contradicting our assumption. On the other hand, if PA is inconsistent, then this set actually is decidable: everything is provable from an inconsistent theory, so the set of theorems is just the (decidable) set of all sentences! So this is only a good example of “Σ1 but not Σ0” if you have faith in the consistency of PA.

The Σ0 sets were the decidable sets, so what are the Σ1 sets? They are the recursively enumerable sets (also called semi-decidable). These are the sets for which an algorithm exists that takes in an input N and returns True if N is in the set, and runs forever otherwise. And just like before, we can use the sentences of PA to construct such an algorithm! Let’s look at the first example:

∃y (y² + y + 1 = x)

y = 0
while True:
    if y**2 + y + 1 == x:
        return True
    y += 1

For the unbounded quantifier, we introduce a variable and a while loop that increases its value by 1 at each step. This program runs until it finds a “yes” answer, and if no such answer exists, it runs forever. (For a block of several ∃s we can’t simply nest while loops, since the innermost loop would never return control to the outer ones; instead we enumerate all tuples of values, as sketched below.)
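Here’s a sketch of that enumeration for a two-quantifier block ∃y ∃z φ(x, y, z), where phi stands in for the decidable bounded-quantifier part:

s = 0
while True:
    for y in range(s + 1):
        z = s - y
        if phi(x, y, z):    # every pair (y, z) is reached at stage y + z
            return True
    s += 1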

Another example:

∃y (the Turing machine encoded by x halts in y steps)

y = 0
while True:
    if TM(x) halts in y steps:
        return True
    y += 1

I made up some notation here on the fly; TM(x) is a function that when fed a natural number returns the Turing machine it encodes. So what this program does is run TM(x) for increasing lengths of time, returning True if TM(x) ever halts. If TM(x) never halts, then the program runs forever. So this program semi-decides the halting problem.

Moving on to Π1 sets! Σ1 sets were those defined by sentences that started with a block of existential quantifiers, and Π1 sets are defined by sentences that start with a block of unbounded universal quantifiers. For example:

(x > 1) ∧ ∀y ∀z (x = y⋅z → (y = 1 ∨ z = 1))
∀y > x ∃z < y (x⋅z = y)

The first example is hopefully clear enough. It defines the set of prime numbers in a similar way to before. The second is meant to be slightly deceiving. That first quantifier might look bounded, until you notice that the variable it supposedly binds ranges over infinitely many values. On the other hand, the second quantifier is genuinely bounded, as for any given y it only ranges over finitely many values. See if you can identify the set it defines.

Both of these sets are in fact also in Σ0. Like with Σ1 sets, it’s difficult to find simple examples of Π1 sets that aren’t also Σ0. Here’s a cheap example:

∀y (the Turing machine encoded by x doesn’t halt in y steps)

This defines the set of non-halting Turing machines. If we were to translate this into a program, it would look something like:

y = 0
while True:
    if TM(x) halts in y steps:
        return False
    y += 1

The difference between this and the earlier example is that this program can only successfully determine that natural numbers aren’t in the set, not that they are in the set. So we can only decide the “no” cases rather than the “yes” cases. Sets like this are called co-recursively enumerable.

So far we have that Σ0 = Π0 = Δ0 = {decidable sets of naturals}, Σ1 = {recursively enumerable sets of naturals}, and Π1 = {co-recursively enumerable sets of naturals}. To finish off level one of the hierarchy, we need to discuss Δ1.

A set is Δ1 if it is both Σ1 and Π1; it can be defined by either type of sentence. In other words, it is both recursively enumerable (we can confirm the True instances) and co-recursively enumerable (we can confirm the False instances). Such a set is decidable: run the two semi-deciding algorithms in parallel, and whichever one is guaranteed to answer for your input will eventually do so. Thus a set is Δ1 if and only if it’s decidable! So every Δ1 set is also a Δ0 set.
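A sketch of that parallel run, with two hypothetical step-bounded helpers: accepts_within(x, k) is True if the Σ1 procedure accepts x within k steps, and rejects_within(x, k) is True if the Π1 procedure rules x out within k steps:

k = 0
while True:
    k += 1
    if accepts_within(x, k):    # the recursively enumerable side fires
        return True
    if rejects_within(x, k):    # the co-recursively enumerable side fires
        return False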

A diagram might help at this point to see what we’ve learned:

(“Recursive” is another term for “decidable”)

Σ2, Π2, Δ2

We now enter the second level of the hierarchy. Having exhausted the recursively enumerable, co-RE, and decidable sets, we know that the new sets we encounter can no longer be semi-decided by any algorithms. We’ve thus entered a new realm of uncomputability.

A set is Σ2 if it can be defined by a sentence with a block of ∃s, then a block of ∀s, and finally a Σ0 sentence (free of unbounded quantifiers). Let’s start with some simple examples:

∃y ∀z (y⋅z < x)
∃y ∃z ∀w ∀u ((y⋅w < x) ∨ ((z + u) > x))

As with any examples you can write as simply as that, the sets these define are also Σ0. So what about sentences that define genuinely new sets that we haven’t seen before? Here’s one:

∃y ∀z (TM(x) doesn’t halt on input y after z steps)

This defines the set of Turing machines that compute partial functions (i.e. functions that run forever on some inputs).

Something funny happens when you start trying to translate this sentence into a program. We can start by translating the existential quantifier up front:

y = 0
while True:
    if ∀z (TM(x) doesn't halt on input y after z steps):
        return True
    y += 1

Now let’s see what it would look like to expand the universal quantifier:

z = 0
while True:
    if TM(x) halts on input y after z steps:
        return False
    z += 1

There’s a problem here. The first program only halts if the second one returns True. But the second one never returns “True”; it’s Π1, co-recursively enumerable! We’ve mixed a recursively enumerable program with a co-recursively enumerable one, and as a result the program we wrote never halts on any input!

This is what happens when we go above the first level of the hierarchy. We’ve entered the realm of true uncomputability.

On the other hand, suppose that we had access to an oracle for the halting problem. In other words, suppose that we are allowed to make use of a function Halts(M, i) that decides for any Turing machine M and any input i, whether M halts on i. Now we could write the following program:

y = 0
while True:
    if not Halts(TM(x), y):
        return True
    y += 1

And just like that, we have a working program! Well, not quite. This program halts so long as its input x is actually in the set of numbers that encode TMs that compute partial functions. If x isn’t in that set, then the program runs forever. This is the equivalent of recursive enumerability, but for programs with access to an oracle for the halting problem!

This is what characterizes Σ2 sets: they are recursively enumerable using a Turing machine with an oracle for the halting problem.

Perhaps predictably, Π2 sets are those that are co-recursively enumerable using a Turing machine with an oracle for the halting problem. Equivalently, these are sets that are definable with a sentence that has a block of ∀s, then a block of ∃s, and finally a Σ0 sentence. An example of such a set is the set of Turing machines that compute total functions. Another is the set of Busy Beaver numbers.

And just like on level one of the hierarchy, the Δ2 sets are those that are both Σ2 and Π2. Such sets are both recursively enumerable and co-recursively enumerable with the help of an oracle for the halting problem. In other words, they are decidable using an oracle for the halting problem. (Notice that again, Δ2 sets are “easier to decide” than Σ2 and Π2 sets.)

The Busy Beaver numbers are an example of a Δ2 set. After all, if one had access to an oracle for the halting problem, they could compute the value of the nth Busy Beaver number. Simply list out all the n-state Turing machines, use your oracle to see which halt, and run all of those that do. Once they’ve all stopped, record the number of steps taken by the longest-running. With the ability to compute the nth Busy Beaver number, we do the following:

n = 1
while True:
    if x == BB(n):
        return True
    n += 1

This returns True for all inputs that are Busy Beaver numbers, but runs forever on those that aren’t. We can fix this: since the Busy Beaver numbers form a strictly increasing sequence, once we’ve reached an n such that x < BB(n), we can return False, as shown below. This gives us an algorithm that decides the set of Busy Beaver numbers!
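Concretely, the repaired loop looks like this (a sketch, with BB(n) computed using the halting oracle as described above):

n = 1
while True:
    if x == BB(n):
        return True
    if x < BB(n):
        return False    # BB is strictly increasing, so x can never appear later
    n += 1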

In fact, the Busy Beaver numbers are even better than Δ2; they’re Π1, co-recursively enumerable! But that’s a tale for another time.

Σn, Πn, Δn

Perhaps you’re now beginning to see the pattern.

Σ3 sets are those that can be defined by a sentence that has a block of ∃s, then a block of ∀s, then a block of ∃s, and finally a Σ0 sentence. We could more concisely say that Σ3 sets are those that can be defined by a sentence that has a block of ∃s and then a Π2 sentence.

Π3 sets are those definable by a sentence that starts with a block of ∀s, then alternates twice between groups of quantifiers before ending in a sentence with only bound quantifiers.

And Δ3 sets are those that are Σ3 and Π3. If you’ve picked up the pattern, you might guess that the Δ3 sets are exactly those that are decidable with an oracle for the halting problem for Turing machines that have an oracle for the halting problem. The Σ3 sets are those that are recursively enumerable with such an oracle, and the Π3 sets are co-RE with the oracle.

And so on for every natural number.

Σn sets are definable by a sentence that looks like ∃x1 ∃x2 … ∃xk φ(x, x1, x2, …, xk), where φ is Πn-1.
Πn sets are definable by a sentence that looks like ∀x1 ∀x2 … ∀xk φ(x, x1, x2, …, xk), where φ is Σn-1.
And Δn sets are both Σn and Πn.

At each level of the hierarchy, we introduce new oracle Turing machines with more powerful oracles. Define TM1s to be ordinary Turing machines, and for each n, define TMn+1s to be Turing machines equipped with an oracle for the halting problem for TMns. Then:

Σn sets are recursively enumerable by TMns.
Πn sets are co-recursively enumerable by TMns.
Δn sets are decidable by TMns.

One might reasonably wonder if at some stage we exhaust all the sets of natural numbers. Or do we keep finding higher and higher levels of uncomputability?

Yes, we do! For each n, Σn is a proper subset of Σn+1 and Πn is a proper subset of Πn+1. One way to see this is to relativize the undecidability of the halting problem: the usual diagonal argument works just as well with oracles attached, so the halting problem for TMns is Σn but not decidable by TMns, and hence not Σn-1 (any Σn-1 set is decidable by TMns, which can simply ask their oracle whether the relevant semi-decider halts). A cardinality observation points the same way: there are uncountably many sets of naturals, and only countably many sets of naturals on each level of the hierarchy. (This is because each level of the hierarchy corresponds to the sets that are computable/recursively enumerable/co-RE using a TM with a particular oracle, and there are only countably many such TMs.)

This tells us that the hierarchy doesn’t collapse at any finite stage. Each step up, we find new sets that are even harder to decide than all those we’ve encountered so far. But (and now prepare yourself for some wildness) we can make the same cardinality argument about the finite stages of the hierarchy to tell us that even these don’t exhaust all the sets. There are only countably many finite stages of the hierarchy, and each stage contains only countably many sets of naturals!

What this means is that even if we combined all the finite stages of the hierarchy to form one massive set of sets of naturals decidable with any number of oracles for halting problems for earlier levels, we would still have not made a dent in the set of all sets of naturals. To break free of the arithmetic hierarchy and talk about these even more uncomputable levels, we need to move on to talk about the set of Turing degrees, the structure of which is incredibly complex and beautiful.

A Self-Interpreting Book

A concept: a book that starts by assuming the understanding of the reader and using concepts freely, and as you go on it introduces a simple formal procedure for defining words. As you proceed, more and more words are defined in terms of the basic formal procedure, so that halfway through, half of the words being used are formally defined, and by the end the entire thing is formally defined. Once you’ve read through the whole book, you can start it over and read from the beginning with no problem.

I just finished a set theory textbook that felt kind of like that. It started with the extremely sparse language of ZFC: first-order logic with a single non-logical symbol, ∈. So the alphabet of the formal language consisted of the following symbols: ∈ ( ) ∧ ∨ ¬ → ↔ ∀ ∃ x ′. It could have even started with a sparser formal language if it was optimizing for alphabet economy: ∈ ( ∧ ¬ ∀ x ′ would suffice. As time passed and you got through more of the book, more and more things were defined in terms of the alphabet of ZFC: subsets, ordered pairs, functions from one set to another, transitivity, partial orders, finiteness, natural numbers, order types, induction, recursion, countability, real numbers, and limits. By the last chapter it was breathtaking to read a sentence filled with complex concepts and realize that every single one of these concepts was ultimately grounded in this super simple formal language we started with, with a finitistic sound and complete system of rules for how to use each one.

But could it be possible to really fully define ALL the terms used by the end of the book? And even if it were, could the book be written in such a way as to allow an alien that begins understanding nothing of your language to read it and, by the end, understand everything in the book? Even worse, what if the alien not only understands nothing of your language, but also starts out understanding none of the concepts involved? This might be a nonsensical notion; an alien that can read a book and do any level of sophisticated reasoning but doesn’t understand concepts like “and” and “or”.

One way that language is learned is by “pointing”: somebody asks me what a tree is, so I point to some examples of trees and some examples of non-trees, clarifying which is and which is not. It would be helpful if in this book we could point to simple concepts by means of interactive programs. So, for instance, an e-book where an alien reading the book encounters some exceedingly simple programs that they can experiment with, putting in inputs and seeing what results. So for instance, we might have a program that takes as input either 00, 01, 10, or 11, and outputs the ∧ operation applied to the two input digits. Nothing else would be allowed as inputs, so after playing with the program for a little bit you learn everything that it can do.
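A sketch of one such program, for the alien to experiment with:

while True:
    s = input("> ")
    if s in ("00", "01", "10", "11"):
        print("1" if s == "11" else "0")   # the ∧ of the two digits
    else:
        print("?")                         # everything else is rejected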

One feature of such a book would be that it would probably use nothing above first-order logical concepts. The reason is that the semantics of second-order logic cannot be captured by any sound and complete proof system, meaning that there’s no finitistic set of rules one could explain to an alien so that they know how to use the concepts involved correctly. Worse, the set of second-order tautologies is not even recursively enumerable (worse than the set of first-order tautologies, which is merely undecidable), so no amount of pointing-to-programs would suffice. First-order ZFC can define a lot, but can it define enough to write a book on what it can define?

Epsilon-induction and the cumulative hierarchy

The axiom of foundation says that every non-empty set must contain an element that it is disjoint with. One implication of this is that no set can contain itself. (Why? Suppose ∃x (x ∈ x). Then you can pair x with itself to form {x}. {x} is non-empty, so it must contain an element that it’s disjoint with. But its only element is x, and that element is not disjoint with {x}: both contain x as an element.)

Another implication of foundation is that there can be no infinite descending sequence of sets. A sequence of sets is a function f whose domain is ω. For each k ∈ ω we write f(k) as fk. Suppose that f is a function with the property that for each n ∈ ω, fn+1 ∈ fn. Then by the axiom of replacement, there would exist the set {f0, f1, f2, …}. Each element of this set contains the following element, so none of the elements are disjoint with the entire set. So the sequence we started with could not exist.

This allows us to prove a pretty powerful and surprising result about set theory. The result: For any sentence Φ, if Φ holds of at least one set, then there must exist a set X such that Φ(X) holds, but Φ doesn’t hold for any element of X.

Formally:

(∃A Φ(A)) → ∃X (Φ(X) ∧ ∀Y ∈ X (¬Φ(Y))).

Suppose this was false. Then Φ would be satisfied by at least one set, and every set that satisfied Φ would contain an element that satisfied Φ. We can use this to build an infinite descending sequence of sets. Take any set that satisfies Φ and call it X0. Since X0 satisfies Φ, it must contain at least one element that satisfies Φ. Define X1 to be any such element. X1 satisfies Φ as well, so it must contain at least one element that satisfies Φ, which we’ll call X2. And so on forever. Since no set can contain itself, Xn+1 is always distinct from Xn. We can use recursion and then replacement to construct the set {X0, X1, X2, …}, which is an infinite descending sequence of sets. Contradiction!

This peculiar principle turns out to be useful to prove some big results, and the proofs are always a little funny and enjoyable to me. I’m not aware of any name for it, so I’ll call it the “far-from-the-tree” principle: for any property that holds of at least one set, there’s a set that has that property, none of whose elements have the property. I’ll now show two big results that are proven by this principle.

Everything’s in the Cumulative Hierarchy

The cumulative hierarchy is constructed recursively as follows:

V0 = ∅
Vα+1 = 𝒫(Vα), for each ordinal α
Vλ = ∪{Vβ : β < λ}, for each limit ordinal λ

The first few levels:

V0 = ∅
V1 = {∅}
V2 = {∅, {∅}}
V3 = {∅, {∅}, {{∅}}, {∅, {∅}}}

One can prove that Vα is transitive for each ordinal α (in other words, that every element of Vα is also a subset of Vα). Also, due to the nature of the power-set operation, for any two ordinals α < β, Vα ⊆ Vβ.

Now, the reason the cumulative hierarchy is interesting is that we can prove that every set is on some level of the hierarchy. We do so with the far-from-the-tree principle:

Suppose that there was a set X that wasn’t on the hierarchy. This means that ∀α, X ∉ Vα. We use the far-from-the-tree principle with the property of “not being on the hierarchy” to conclude that there must be a set Y such that Y is not on the hierarchy, but every element of Y is on the hierarchy. So for each element y ∈ Y there’s some ordinal αy such that y ∈ Vαy. Let α be the supremum of all these αy (it exists by replacement), so that every element of Y is in Vα. Now just go up by one level! Since every element of Y is in Vα, Y is a subset of Vα, and so Y ∈ 𝒫(Vα) = Vα+1. So Y actually is on the hierarchy! Contradiction.

This shows that any set you choose shows up on some level of the cumulative hierarchy. Incidentally, the rank of a set is defined as the first level that the set appears on, and one can prove that for each ordinal α, rank(α) = α.

∈-Induction

∈-induction is a really strong principle of induction, because it is induction over all sets, not just the natural numbers or the ordinals. The principle of ∈-induction is the following:

Suppose that for every set X, if Φ(x) holds for every x ∈ X, then Φ(X) holds.
Then Φ holds of every set.

Suppose that this principle was false. Then it’s true that for every set X, if Φ(x) holds for every x ∈ X, then Φ(X) holds, but false that Φ holds of every set. So there’s some set Y such that ¬Φ(Y). We use the far-from-the-tree principle with the property “¬Φ”. This tells us that there’s some set Z such that ¬Φ(Z), but Φ(z) for every z in Z. But we already said that if Φ(z) for every z in Z, then Φ(Z)! Contradiction.

I hope you like these applications as much as I do. They feel to me like very strong and surprising results, especially given how simply they can be proved.

Nonstandard integers, rationals, and reals

In discussions of first-order logic and set theory, one often hears about nonstandard natural numbers and their unusual properties. But rarely do you hear about nonstandard integers, nonstandard rationals, or nonstandard reals. This post will be devoted to the ways in which these other number systems pick up unintended interpretations in first-order logic, and in particular first-order ZFC.

Nonstandard ℕ

Nonstandard interpretations of the natural numbers are the most well-known, so I won’t spend much time on them. In ZFC, the set of natural numbers is called ω, and is defined to be the intersection of all inductive sets. An inductive set is one that includes zero and is closed under successorship (if n is in the set, then so is S(n)).

This definition feels so right when you think about it for a little while that it’s hard to see how it could go wrong. How could there be anything OTHER than the standard natural numbers in the intersection of all inductive sets?

Well, one can prove from the compactness of first order logic that there must be such nonstandard numbers, and the proof is so simple and pretty that it’s probably already appeared on this blog six or seven times. Add to ZFC a constant K and the axioms “K ∈ ω”, “K > 0”, “K > 1”, “K > 2”, and so on for all the standard naturals (this is a decidable set of sentences). Every finite subset of this new theory has a model (assuming ZFC itself does: just interpret K as a large enough standard natural), so by compactness the theory as a whole also has a model. This model is nonstandard by construction, and is also a model of ZFC by monotonicity (removing axioms never removes models).

So how do nonstandard naturals appear in the intersection of all inductive sets? The key term to focus on is the “all” in “all inductive sets”. While ZFC guarantees the existence of an inductive set (by the axiom of infinity) and every definable subset of this set (by comprehension/separation), it does not actually guarantee the existence of ALL subsets of this set. (This is a general feature of first-order theories: they can guarantee the existence of all definable subsets of a set, but not all subsets.) There are models where the inductive set given to us by the axiom of infinity is enormous (uncountably large, viewed from outside), and every inductive subset of it that exists in the model contains, in addition to the standard naturals, nonstandard elements: numbers with infinitely many predecessors, sitting above every standard natural. In these models, ω contains not just the standard naturals but also these infinite elements. And the proof from compactness shows us that we can’t eliminate these nonstandard natural numbers.

Nonstandard ℤ

The usual way of defining the integers in ZFC is as equivalence classes of pairs of natural numbers. In particular, we define the equivalence relation ~ on ω×ω as follows: (a, b) ~ (c, d) if and only if a + d = b + c. Two ordered pairs of naturals are equivalent under this relation so long as their difference is the same. So (1, 0) is in the same class as (2, 1), which is in the same class as (15, 14).

The idea here is that the integer represented by the equivalence class of (a, b) is supposed to be what we think of as a – b. So the integer 2 is the equivalence class {(2, 0), (3, 1), (4, 2), (5, 3), …}. And the integer -2 is the equivalence class {(0, 2), (1, 3), (2, 4), (3, 5), …}. For shorthand, we’ll write the equivalence class of (a, b) as [a, b]. Thus each positive integer n can be written as [n, 0], and each negative integer -n as [0, n].

Addition and multiplication can be defined on the integers in terms of addition and multiplication on ω.

[a, b] + [c, d] = [a + c, b + d]
[a, b] ⋅ [c, d] = [a⋅c + b⋅d, a⋅d + b⋅c]

Check to make sure that this makes sense in light of our interpretation of [a, b] as a – b. We can also define new operations on the integers that didn’t apply to the naturals, like negation.

-[a, b] = [b, a]

And we can define an order on the integers as follows:

[a, b] < [c, d] if and only if a + d < b + c

Check to make sure that this makes sense.
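To make this concrete, here’s a small Python sketch (my own illustration, not from the original) modeling integers as raw pairs of naturals with the operations above:

def same_int(p, q):
    (a, b), (c, d) = p, q
    return a + d == b + c          # the equivalence (a, b) ~ (c, d)

def add(p, q):
    (a, b), (c, d) = p, q
    return (a + c, b + d)

def mul(p, q):
    (a, b), (c, d) = p, q
    return (a * c + b * d, a * d + b * c)

def neg(p):
    a, b = p
    return (b, a)

def less(p, q):
    (a, b), (c, d) = p, q
    return a + d < b + c

# For example, 2 ⋅ (-3) should be -6:
assert same_int(mul((2, 0), (0, 3)), (0, 6))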

Now, the integers are built out of ω, so it makes sense that the nonstandard interpretations of ω carry over to nonstandard interpretations of ℤ. But there are in fact two ways in which integers can be nonstandard.

(1) The natural number 0 refers to the same object (the empty set) in every model of ZFC. But the set of natural numbers ω refers to different objects in different models, as we saw a moment ago. In this sense, ω is not categorically defined, while 0 is. But since integers are infinite sets of pairs of elements of ω, individual integers aren’t categorically defined either!

For instance, the integer 0 is the set {(n, m) ∈ ω×ω | n = m}. The number of elements in this set is the same as the number of elements of ω. So the integer 0 has as many elements as ω! This means that in a nonstandard model of ω that has nonstandard numbers like K, 0 will also contain nonstandard ordered pairs like (K, K) and (K+1, K+1).

(2) Consider the nonstandard element of ω that we’re calling K, which is larger than every standard natural number. There’s an integer [K, 0], which is the equivalence class of the ordered pair (K, 0). [K, 0] > [1, 0], because K + 0 > 0 + 1. And [K, 0] > [55, 0], because K + 0 > 0 + 55. And so on for every finite integer. Just like ω, there are models in which ℤ has integers larger than all the standard integers.

But unlike ω, in nonstandard models ℤ also has nonstandard integers less than all standard integers. Consider -[K, 0] = [0, K]. Again, you can prove that this integer is less than the integer 0, the integer -55, and so on for every standard integer you can think of.

From now on I’ll write the integer [n, 0] as n and the integer [0, n] as -n.

Nonstandard ℚ

Just as the integers were equivalence classes of pairs of naturals, the rationals are equivalence classes of pairs of integers. We define a new equivalence relation (which I’ll use the same symbol for) as follows:

For integers a, b, c, and d, (a, b) ~ (c, d) if and only if a⋅d = b⋅c.

This equates any two pairs of integers that have the same ratio. So (1, 2) ~ (2, 4) ~ (15, 30), and (-33, 17) ~ (66, -34) ~ (-99, 51). The equivalence classes of this relationship are the rational numbers. Like before, we’ll write [a, b] to represent the equivalence class that (a, b) belongs to. You can think of [a, b] as representing the rational number a/b.

Defining addition, multiplication, subtraction, and division on the rationals is pretty easy:

[a, b] + [c, d] = [a⋅d + b⋅c, b⋅d]
[a, b] ⋅ [c, d] = [a⋅c, b⋅d]
[a, b] – [c, d] = [a⋅d – b⋅c, b⋅d]
[a, b] / [c, d] = [a⋅d, b⋅c]

And the order is similarly easy to define in terms of the order on integers:

[a, b] < [c, d] if and only if a⋅d < b⋅c

Once we’ve moved to ℚ, there’s an explosion of nonstandard elements. For instance, we still have nonstandard rationals larger than every ordinary rational (like [K, 1]). But now we also have rationals like [1, K]. This rational is smaller than every ordinary positive rational [1, 10], [1, 100], [1, 1000], and so on. And it is greater than the rational number 0! So in other words we now have infinitesimal rational numbers!

What’s more, we can construct a rational like [K+1, K]. This guy is infinitesimally larger than the rational number 1. So is [2⋅K + 1, 2⋅K], but this one is even closer to 1 than the previous. And we also have [K² + 1, K²], which is even closer!

The nonstandard rational number line is crowded with these infinitesimal and infinite elements. The ordinary rationals are usually thought of as dense, but we’ve just seen that there are nonstandard rationals that are “too close together” to fit any standard rationals between them. However, between any two distinct nonstandard rationals q and q′, there’s always another nonstandard rational: (q + q′) / 2.

If you’ve heard of the hyperreals, you might be wondering whether the nonstandard rationals have the same structure. The simple reason they don’t is that the hyperreals contain solutions to equations that have no rational solutions, standard or not. For instance, ZFC can prove that there is no rational [a, b] such that [a, b] ⋅ [a, b] = 2, but there is a hyperreal with that property. To fill in these gaps in the number line, we have to move on to the next step, the reals.

Nonstandard ℝ

The construction of the reals from the rationals is slightly different from the previous constructions. Instead of considering ordered pairs of rationals, we’ll consider sets of rationals. In particular, we’ll look at nonempty sets of rationals that are closed downwards (if n is in the set and m < n, then m is in the set also), and have no greatest element, but don’t include all rationals.

Each real number is one of these subsets, and ℝ is the set of all such subsets. So for instance, the real number √2 is the set {x ∈ ℚ : x ≤ 0 ∨ x² < 2}. The order on the reals is simple to describe:

x ≤ y if and only if x ⊆ y

Addition is also easily defined:

x + y = {r + s : (r ∈ x) ∧ (s ∈ y)}

Multiplication of reals has a messier definition, but it’s nothing too crazy. And importantly, ZFC can prove that ℝ has the following feature: any non-empty subset of ℝ which is bounded above has a least upper bound.

Now, in nonstandard models, there are real numbers like K, 1/K, 1 + 1/K² and so on. But now there are also real numbers like √K, 2^K, log(K), and any other function you can define on the reals, applied to infinite reals and infinitesimals. So in one sense, the nonstandard models have many more reals than the “ordinary reals” we learn about in calculus.

But the way we constructed the reals as subsets of the rationals opens up a new type of nonstandard phenomenon, in which there end up being many fewer reals than we expect there to be. Remember that we identified reals with subsets of the rationals, and that earlier we said that there is in general no way to guarantee the existence of all subsets of an infinite set. The same applies here: when we say that ℝ = {A ⊆ ℚ | A is a Dedekind cut}, we are actually only guaranteeing the existence of those reals that are definable as Dedekind cuts. So for instance, ZFC can prove the existence of √2 as a real number, because ZFC can define the Dedekind cut {x ∈ ℚ : x ≤ 0 ∨ x² < 2}. Same with most real numbers you’re probably familiar with, like π and e. But considering that there are only countably many definitions in ZFC, and uncountably many reals, there must be uncountably many reals that are undefinable! These undefinable reals cannot be proven to exist, and thus there are models in which they don’t exist.

In fact, the set ℝ is the first of the sets we’ve discussed that not only has nonstandard interpretations in which it’s too large, but also nonstandard interpretations in which it’s too small. There are models of ZFC where ℝ has only countably many elements! There’s a subtlety here, which is that ZFC can prove that |ℝ| > |ω|. How is this possible if there are models where ℝ is countable?

It’s possible because these models, which are missing many real numbers, are ALSO missing lots of functions, in particular the functions that would put ℝ and ω in bijective correspondence! So even though ℝ is “in reality” countable in these models, the model itself doesn’t know that, because it’s missing the functions that prove its countability.

Transfinite Nim: uncomputable games and games whose winner depends on the Continuum Hypothesis

In the game of Nim, you start with piles of various (whole number) heights. Each step, a player chooses one pile and shrinks it by some non-zero amount. Once a pile’s height has been shrunk to zero, it can no longer be selected by a player for shrinking. The winner of the game is the one that takes the last pile to zero.

Here’s a sample game of Nim:

Starting state: 3, 2
After Frank’s move: 2, 2
After Marie’s move: 2, 1
After Frank’s move: 0, 1
After Marie’s move: 0, 0

Marie takes the last pile to zero, so she is the winner. Frank’s second move was a big mistake; by reducing the first pile from 2 to 0, he left the only remaining pile free to be taken by Marie. In a game of Nim, you should never leave only one pile remaining at the end of your turn. If Frank had instead shrunk the first pile from 2 to 1, then the state of the piles would be (1, 1). Marie would be forced to shrink one of the two piles to zero, leaving Frank to take the final pile and win.

The strategy of Nim with two piles is extremely simple: in your turn you should always even out the two piles if possible. This is only possible if the heights are different at the start of your turn. See if you can figure out why this strategy guarantees a win!
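A sketch of that strategy (an illustration of mine, not from the post):

def best_move(a, b):
    # Returns the pile heights to move to, or None if we're in the losing
    # position of two equal piles (whatever we do, the opponent mirrors it).
    if a == b:
        return None
    small = min(a, b)
    return (small, small)    # shrink the larger pile to match the smaller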

Transfinite Nim is a version of Nim where the piles are allowed to take infinite ordinal values. So for instance, a game might have the starting position:

Starting state: ω² + ω, ω1 + ε0

If Marie is moving first, then can she guarantee a win? What move should she make?

It turns out that the strategy for two-pile Transfinite Nim is exactly the same as for Finite Nim. Marie has a guaranteed win, because the two piles are different values. Each move she’ll just even the piles out. So for her first move, she should do the following:

Starting state: ω² + ω, ω1 + ε0
After Marie’s move: ω² + ω, ω² + ω

No matter what Frank does next, Marie can just “copy” that move on the other pile, guaranteeing that Marie always has a move as long as Frank does. This proves that Marie must have the last move, and therefore win.

One important feature of Transfinite Nim is that even though we’re dealing with infinitely large piles, every game can only last finitely long. In other words, Frank has no strategy for delaying his loss infinitely long, and thus forcing a sort of “stalemate by exhaustion.” This is because the ordinals are well-ordered, and any decreasing sequence of well-ordered items must terminate. (Why? Just consider the definition of a well-ordered set: every subset has a least element. If the game were to continue infinitely long, each step decreasing the state but never terminating, then the sequence of states would be a subset of the ordinals which has no least element!)

Although the strategy of Transfinite Nim is no more interesting than Finite Nim, the game does have some interesting features that it inherits from the ordinals. There are sets of ordinal numbers such that the ordering between them is uncomputable (i.e. that no Turing machine can take as input any two ordinals from that set and correctly assess whether one is larger than the other). For such sets, the ability to compute a winning strategy is called into question.

For instance, the set of all countable ordinals is uncomputable. The quick proof is that there are uncountably many countable ordinals – otherwise in ZFC the set of countable ordinals would itself be a countable ordinal and would thus contain itself – and any Turing machine can only compare countably many things.

The smallest uncomputable ordinal (which, in ZFC, is exactly the set of all computable ordinals) is called the Church-Kleene ordinal and written ω1CK. Imagine the starting state of the game is two different ordinals that are both larger than ω1CK. If you’re moving first, then you have to determine which of the two ordinals is larger, in order to even them out. But this is not in general possible! So even if you go first and the two piles are different sizes, you might not be able to guarantee a win.

Suppose Marie is allowed uncomputable strategies, and Frank is only allowed computable strategies. Suppose further that the starting state involves two ordinals A and B, both larger than the Church-Kleene, and that the ordinals are expressed in some standard notation (so that you can’t write the same ordinal two different ways). There are a few cases.

Case 1: A = B, Marie goes first.
Marie decreases one of the two ordinals. Despite not being able to compute the order on the ordinals, Frank can just mimic her move. This will continue until Frank wins.

Case 2: A = B, Frank goes first.
Frank decreases one of the two ordinals, and Marie mimics. Marie eventually wins.

Case 3: A ≠ B, Marie goes first.
Marie can tell which of the ordinals is larger, and decreases that one to even out the two piles. Marie wins.

Case 4: A ≠ B, Frank goes first.
Frank can’t tell which of the ordinals is larger and can’t try to even them out, as doing so might result in an invalid move (trying to increase the smaller pile to the height of the larger one). So Frank does some random move, after which Marie is able to even out the two piles. Marie wins.

Finally, here’s a starting state for a game of Transfinite Nim:

ω1, ℶ1

ω1 is the first uncountable ordinal, and ℶ1 is the first ordinal with continuum cardinality. Frank goes first. Does he have a winning strategy?

It depends on whether ω1 = ℶ1. If the two are equal, then Frank can’t win, because he’s starting with two even piles. And if ω1 < ℶ1, then Marie can’t win, because Frank can decrease the ℶ1 pile to ω1.

If we suppose that the players must be able to prove a move’s validity in ZFC before playing that move, then the first player couldn’t decrease the ℶ1 pile to ω1. The first player still has to do something, and whatever he does will change the state to two ordinals that are comparable by ZFC.

What about larger starting ordinals whose size comparison is independent of ZFC, like ω15 and ℶ15? If the new state after the first player’s move also involves two ordinals whose size comparison is independent of ZFC, then the second player will also be unable to even them out. This continues until one of them eventually decreases a pile to an ordinal whose size is comparable by ZFC to the other pile. So the winner will depend on who knows more pairs of ordinals less than the starting values whose sizes ZFC can’t compare. In fact, each player wants to force the other player to make the values ZFC-comparable, so that they’ll be able to even the piles out on their turn.

A Coloring Problem Equivalent to the Continuum Hypothesis

Summary
Is there an infinity in between ℵ0 and 2^ℵ0? Is there a way to color the plane with countably many colors so that there are no monochromatic right triangles lined up with the x and y axes? If these questions seem unrelated to you, then we had the same initial reaction. But it turns out that an intermediate infinity exists if and only if no such coloring exists.

CH and Coloring

Imagine taking the plane ℝ2 and coloring it entirely. You are allowed infinitely many colors, but only a countable infinity (so no using the whole continuous spectrum). Some example colorings:

The colorings don’t have to be contiguous on the plane like in the ones you just saw. They can have individual points of one color such that neighborhoods of arbitrarily small radius around them are colored differently. So we can get wacky colorings like:

Think about right triangles on the plane that line up with the x and y axes. For instance:

For each of these triangles, of the three sides one must be parallel to the x-axis and another to the y-axis. Now, for a given coloring, we can ask if there exist any triangles whose corners are all the same color. Importantly, we’re only interested in the three corner points, not the entire side length. For the earlier colorings we looked at, we can find such triangles fairly easily:

Even for our wacky coloring, it shouldn’t be too hard to find three points of the same color that meet the criterion.

Now here’s a question for you to ponder: is there any way to color the plane so that no such monochromatic right triangles exist?

I encourage you to pause here and try your best to come up with such a coloring. Use this as a chance to develop some intuitions on whether you think that a coloring like this should exist or not. It’s important to keep in mind that the number of possible right triangles on the plane is 2^ℵ0, while the number of colors available to you is just ℵ0.

(…)

(pause for thought)

(…)

Now, I shall prove to you that the existence of such a coloring is equivalent to the Continuum Hypothesis! The rest of this post will be more involved, and even though I’ve tried to make it less scary with pictures aplenty, it will most likely take a slow and attentive read-through to grasp. But the proof is really cool, so it’s worth it. Oh and also, I have assumed some things as background knowledge for the sake of space. In particular, understanding the proof requires that you understand what ordinals are and why the set of all countable ordinals is the first uncountable ordinal.

If the Continuum Hypothesis is true, then a coloring with no monochromatic right triangles exists.

The proof outline: Consider the first uncountable ordinal, ω1, which is the set of all countable ordinals. We construct a coloring on ω1 × ω1 that satisfies the no-monochromatic-right-triangles property. If the Continuum Hypothesis is true, then |ω1| = |ℝ|, so we form a bijection from ω1 to ℝ. Finally, we use this bijection to map our coloring on ω1 × ω1 to a coloring on ℝ × ℝ, and show that coloring also satisfies the same property.

Alright, so consider ω1 × ω1. If we were to visualize it, it would look something like this:

This “plane” is discrete, so in a certain sense it much more closely resembles ℕ² than ℝ². The set of colors we’ll be using will be labeled with integers. We split the colors into two categories: the negatives N and the non-negatives P.

N = {-1, -2, -3, …}
P = {0, 1, 2, 3,…}

For each α in ω1, we construct two injections f: α → N and g: α → P. These exist because every element of ω1 is countable.

We use f and g to construct our coloring. For each α in ω1, we assign the color f(β) to (α, β) for each β < α, and we assign the color g(β) to (β, α), for each β ≤ α.

Since f and g are injective, no colors are repeated along the path from (α, 0) to (α, α) to (0, α). And we can easily show that no monochromatic right triangles exist under this coloring!

If a triangle crosses y = x, then it has at least one point colored from N and another from P. N and P are disjoint, so the triangle cannot be monochromatic.

And if a triangle doesn’t cross y = x, then it has two points lined up so as to have different colors:

So we have no monochromatic triangles! Now, assuming the Continuum Hypothesis, |ω1| = |ℝ|, so we can form a bijection M from ω1 to ℝ. We use M to map our original coloring to a new coloring on ℝ × ℝ: if (a, b) ∈ ω1 × ω1 is colored C, then we give (M(a), M(b)) ∈ ℝ × ℝ the color C as well. Under this map, non-monochromatic triangles stay non-monochromatic. So we’ve constructed a coloring on ℝ × ℝ with no monochromatic triangles!
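As a sanity check on the hook construction, here’s a brute-force test of its finite analogue on an n×n grid (my own illustration, not from the post), using the injections f(β) = -(β+1) below the diagonal and g(β) = β on or above it:

def color(x, y):
    if y < x:
        return -(y + 1)    # below the diagonal: colors from N
    return x               # on or above the diagonal: colors from P

n = 25
for a in range(n):
    for b in range(n):
        for c in range(n):
            for d in range(n):
                if c != a and d != b:
                    # corners (a, b), (a, d), (c, b): an axis-aligned right triangle
                    assert not (color(a, b) == color(a, d) == color(c, b))
print("no monochromatic right triangles on the grid")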

If a coloring with no monochromatic right triangles exists, then the Continuum Hypothesis is true.

The proof outline here is to show that any infinite subset A of ℝ has cardinality either ℵ0 or 2^ℵ0. This shows that there are no intermediate cardinalities between these two.

Consider any infinite subset A ⊆ ℝ. For any a, define Ya to be the y-coordinates of points along the vertical line through (a, 0) that have no color repeats on that line. So Ya = {y ∈ ℝ | the color of (a, y) is different from (a, y’) for all y’ ≠ y}.

Notice that Ya must be countable, because all its elements have different colors and there are only countably many different colors to have. So |Ya| ≤ ℵ0.

Define YA to be the union of Ya, for each a in A. Since each Ya is countable and A is at least ℵ0, |YA| ≤ |A|⋅ℵ0 = |A|. There are two cases: either YA = ℝ, or YA ≠ ℝ.

Case 1: YA = ℝ. In this case |ℝ| = |YA| ≤ |A|, so |A| ≥ |ℝ|. Since A is a subset of ℝ, this implies that |A| = |ℝ|.

Case 2: YA ≠ ℝ. In this case, there’s some real number y* in ℝ – YA. Since y* is outside YA, it is outside each Ya with a ∈ A; so on each vertical line {a} × ℝ with a ∈ A, there is at least one other point with the same color as (a, y*).

Each (x, y*) with x ∈ A must have a different color. Proof: Suppose not. Then we have two elements of A, x1 and x2, such that the color of (x1, y*) is the same as the color of (x2, y*). By the last paragraph’s result, we can pick a third point (x1, y’) that has the same color as (x1, y*). These three points form a monochromatic right triangle, but we’ve assumed that’s impossible! Contradiction.

So each point in A × {y*} must have a different color. Thus A × {y*} must be countable. A is an infinite set, so |A| = ℵ0.

So in case one |A| = |ℝ| = 2^ℵ0, and in case two |A| = ℵ0. And this is true for every infinite subset A of ℝ. So every subset of ℝ is either finite, countable, or has the same cardinality as ℝ. And this is the continuum hypothesis!

No known unknowns in Peano arithmetic

Reports that say that something hasn’t happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don’t know we don’t know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones.

Donald Rumsfeld

Rumsfeld’s statement may have been correct about American politics, but he was wrong in the context of Peano arithmetic. In Peano arithmetic, every known is a known known, and every unknown is an unknown unknown. Let me explain.

A paragraph for background: PA expresses enough arithmetic to allow it to encode sentences like “φ is provable from the axiom set T” as arithmetic properties of the natural numbers. But Peano arithmetic also has nonstandard models containing objects that aren’t natural numbers. It’s possible for a sentence about arithmetic to be true of these nonstandard models and false of the standard model. And in particular, the sentences that Gödel-encode proofs of other sentences can take on a different meaning in nonstandard models. For instance, the Gödel encoding of a sentence like “PA is consistent” looks like “There is no number with the property that it encodes a proof of 0=1 from the axioms of PA”. This might be true of the standard natural numbers, but false in some nonstandard model: there might be some nonstandard number that satisfies Gödel’s arithmetic formula for a proof of 0=1 from the axioms of PA, but which doesn’t actually encode any such proof (as only standard natural numbers Gödel-encode valid proofs). Gödel encodings are equivalent to the statements they encode only in the standard model of the natural numbers.

Of all the sentences of the form “φ is provable from the axiom set T”, which are provable from the axioms of Peano arithmetic? How much does Peano arithmetic know about what arbitrary formal theories prove? For all of the following results, I will assume that PA is consistent.

Our first result: If PA proves “φ is provable from T”, then φ really is provable from T.

This follows from the soundness of first-order logic. If PA proves “φ is provable from T”, then this sentence must be true in all models of PA. In particular, it’s true in the standard model. And if it’s true in the standard model, then it must actually be true that φ is provable from T.
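Schematically, writing Prov_T(⌜φ⌝) for the Gödel encoding of “φ is provable from T” (the first arrow is soundness, the second is the fact that the encoding means what it says in the standard model):

$$\mathrm{PA} \vdash \mathrm{Prov}_T(\ulcorner \varphi \urcorner) \;\Rightarrow\; \mathbb{N} \models \mathrm{Prov}_T(\ulcorner \varphi \urcorner) \;\Rightarrow\; T \vdash \varphi$$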

Second result: If φ is provable from T, then PA proves “φ is provable from T.”

This follows from three facts: (1) that every model of PA contains a copy of the standard natural numbers as an initial segment, (2) that the sentence “φ is provable from T” is a statement of the form “There exists a number with some property”, and (3) that first-order logic is complete.

The Gödel encoding of “φ is provable from T” is something like “there’s some number n with the property that n encodes a proof of φ from T.” Now, suppose that φ is provable from T. Then there’s a standard natural number that encodes this proof. And since every model of PA contains a copy of every standard natural number, this witness exists in all these models as well! So the statement is true in all models of PA. So by completeness, it’s provable from PA.
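To see why such “there exists a number n…” sentences are so well-behaved, here’s a minimal sketch in Python of the search procedure they describe. This is an illustration, not actual PA machinery: encodes_proof stands in for the mechanical, decidable proof-checking predicate that Gödel encoding provides, and is supplied by the caller.

def provable(encodes_proof, T, phi):
    """Semidecide 'phi is provable from T': halts with True iff some number
    encodes a T-proof of phi; searches forever otherwise."""
    n = 0
    while True:
        if encodes_proof(n, T, phi):  # checking any single n is decidable
            return True
        n += 1                        # no luck yet; try the next candidate

If φ is provable, a standard number witnesses this and the search halts; that’s exactly the sense in which the truth of the sentence is certified by a single checkable number.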

So far we have that PA proves “φ is provable from T” if and only if φ is actually provable from T. Loosely, if a theory can prove something, then Peano arithmetic knows this. Even though ZFC is a stronger theory than Peano arithmetic, there’s nothing that ZFC can prove that PA doesn’t know it can prove. In particular, ZFC proves the consistency of PA, so PA proves that ZFC proves the consistency of PA! This of course doesn’t mean that PA proves its own consistency, because PA doesn’t trust ZFC to be true (it doesn’t accept the axioms of ZFC).

Furthermore, for every sentence that PA can prove, PA can prove that PA can prove it. In this sense, everything that PA knows is a known known. What we’ll see next is that for PA, every unknown is an unknown unknown. For Peano arithmetic, and in fact for ANY consistent theory that expresses enough arithmetic to be subject to Gödel’s theorems, there are no known unknowns.

Third result: If φ is not provable from PA, then “φ is not provable from PA” is not provable from PA.

This follows from Gödel’s second incompleteness theorem and the principle of explosion.

Now, the proof of our third result. Suppose “φ is not provable from PA” is provable from PA. By the principle of explosion, if PA were inconsistent, then EVERYTHING would be provable from it. So by proving that something isn’t provable from its axioms, PA proves its own consistency! But then, by Gödel’s second incompleteness theorem, PA must be inconsistent. This contradicts our background assumption that PA is consistent.
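In one line (the first arrow is the formalized principle of explosion — exhibiting any unprovable sentence implies consistency — and the second is Gödel’s second incompleteness theorem):

$$\mathrm{PA} \vdash \neg\mathrm{Prov}_{\mathrm{PA}}(\ulcorner \varphi \urcorner) \;\Rightarrow\; \mathrm{PA} \vdash \mathrm{Con}(\mathrm{PA}) \;\Rightarrow\; \mathrm{PA}\ \text{is inconsistent}$$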

Note that the sentence “φ is not provable from PA” isn’t the same type of sentence as “φ is provable from PA”. As we saw a moment ago, the second is a sentence of the form “there’s some number n with a particular property,” and all of these sentences are provable from PA if and only if they’re true of the standard naturals. But the first sentence is of the form “there’s no number n with a particular property”, which is not the same! It’s possible for this sentence to be true of the standard model but false in some nonstandard model. All we need is for there to be no standard number, but at least one nonstandard number, with that property.

So if PA is consistent, then it can never prove that it doesn’t prove something. Now, notice that all I’ve said applies more generally than just Peano arithmetic. For any consistent mathematical theory that expresses sufficient arithmetic for Gödel’s incompleteness theorems to apply, every known is a known known and every unknown is an unknown unknown.

The subtlety of Gödel’s second incompleteness theorem

Gödel’s second incompleteness theorem is an order of magnitude more subtle than his first. It’s commonly summarized as “no consistent theory strong enough to do arithmetic can prove its own consistency.” But there’s a lot of subtlety in both the “strong enough to do arithmetic” and the “prove its own consistency.” First of all, what exactly counts as strong enough to do arithmetic? Peano arithmetic certainly does, but it has an infinite axiom schema. Do any finitely axiomatized theories meet the criterion of “strong enough to do arithmetic”? It turns out that the answer is yes! Robinson arithmetic, which is what you get if you remove the infinite axiom schema of induction from PA, meets the requirement.

There are weaker theories of arithmetic, like Presburger arithmetic (which has only addition) and Skolem arithmetic (which has only multiplication), that don’t meet the criterion, and are therefore not subject to the incompleteness theorems. And it turns out that both of these theories are actually decidable! This is even stronger than being sound and complete; soundness and completeness tell us that there’s an algorithm that will determine, in a finite amount of time, that a given sentence is a theorem, but not necessarily an algorithm that can determine that a sentence is NOT a theorem. Decidability gives us both: not only can we classify any theorem as such in a finite time, we can also classify any non-theorem as such in a finite time.

The amount of arithmetic we need is exactly the amount that Gödel uses to prove the incompleteness theorems. This has two parts: one, the theory must express enough arithmetic to do Gödel encoding (and therefore to express the notion of “provability”), and two, the theory must be able to formalize diagonalization. Dan Willard came up with theories that formalize enough arithmetic to do the first but not the second: these are theories that can talk about their own provability via Gödel coding, but are still too weak to be subject to the incompleteness theorems. Thus, these theories can actually prove their own consistency! These fascinating theories are called self-verifying theories.

Everything I’ve said so far has been about the subtlety of the notion of “strong enough to do arithmetic”. Now I want to talk about the subtlety of the notion of “proving one’s own consistency.” I’ll do this by first taking a brief interlude to talk about an argument I recently saw against the Consistency Thesis in philosophy of math that uses this notion. A friend of mine writes:

Consistency is a weak soundness property.

Here’s what I’ll call the Consistency Thesis: mathematical sentences are justified and/or true when they are part of a consistent theory. What I want to do is to raise a problem for that thesis. Maintaining the Consistency Thesis leads to bizarre mathematical beliefs which don’t lend themselves to being true or justified. A merely ‘consistent’ theory can claim P(0), P(1), and so on, and yet also claim that there’s some n for which ¬P(n). Such a theory is omega-inconsistent.

There are mathematical theories which, despite their syntactic consistency, claim to be inconsistent. They will claim to be able to derive that 0=1 and yet never derive 0=1. One example of a consistent but unsound theory would be this: [suppose we add to ZFC the new axiom “ZFC is inconsistent”. If ZFC is consistent, then the new theory ZFC’ is consistent (by Gödel’s second incompleteness theorem), even though it falsely proves that ZFC is inconsistent.]

So far, I’ve only mentioned omega-inconsistent theories. I should also mention that there are omega-consistent theories which are arithmetically unsound. A much stronger soundness property would be soundness in omega-logic, which entails consistency, omega-consistency, and arithmetical soundness.

Here’s the response I gave:

You say that an adherent to the Consistency Thesis believes in mathematical theories that are in fact consistent (no contradictions can be proved from them) but claim to be inconsistent; for instance ZFC + ¬Con(ZFC). This bit about “claiming to be inconsistent” is subtler than it might initially seem. There’s a very important difference between Con(ZFC) and “ZFC is consistent”. A first-order theory can’t talk directly about its own consistency; it can only talk about properties of the objects in its models. We are allowed an indirect method of talking about the consistency of theories, Gödel encoding. But this method has problems.

Gödel encoding allows us to write down statements that, if understood to be about the natural numbers, are equivalent to the assertion that a theory proves a contradiction. But this “if understood to be about the natural numbers” is a very important qualification, because no first order theory categorically defines the natural numbers (i.e. no first order theory has as its only model the natural numbers). More generally, no theory within a logic that has a sound, complete, and finitary proof system categorically describes the natural numbers (these are the only logics that a Formalist will see as well-defined, by the way).

What this means is that when we write “Con(ZFC)”, we’re actually using a shorthand for a complicated sentence about the objects in our models, and this complicated sentence is NOT equivalent to the claim that no contradiction can be proven from ZFC. In particular, Con(ZFC) can be false in a model even if ZFC really is consistent, so long as that model is not the standard natural numbers.

So the adherent of the Consistency Thesis is not actually committed to weird beliefs about a theory being consistent while claiming its own inconsistency; they are just committed to the belief that the natural numbers are not well-defined.

The same objection applies to the claim that they have to accept as valid theories that claim P(0), P(1), and so on, but also that there’s some n for which ¬P(n). That’s true, and that’s fine! One can just say that such theories are not theories of the standard natural numbers; the n for which ¬P(n) is some other type of mathematical object that is not a natural number.

A TL;DR for my response: “Con(T)” only means “T is consistent” if T is about the natural numbers. Furthermore, the theories that assert their own inconsistency never have the natural numbers as a model. So it’s ultimately not very weird that these theories assert “¬Con(T)”… this statement doesn’t actually mean “T is inconsistent” in any of the models of the theory!

Undecidability results on lambda diagrams

For background on how lambda diagrams work, look HERE.

The equivalence of lambda calculus with Turing machine computations, and the logical limitations on computation, combine to give us interesting limitative results on lambda diagrams.

Reducibility, Absolute and Contingent

We begin by defining a notion of reducibility. A lambda diagram is reduced if there is no beta reduction that can be applied to the diagram. Here are some examples of reduced diagrams:

And here are some examples of non-reduced diagrams:

For each of these diagrams, if you keep applying beta reductions in any order you will eventually get to a point where the diagram is reduced. But this is not the case for all lambda diagrams. Check out the following diagram for M M (the Mockingbird function applied to itself):

Applying beta reduction gives you the following:

And we’re back where we started! Since this is the only beta reduction available, repeating it is the only possible reduction strategy, and it never terminates. This tells us that our diagram is irreducible.

Sometimes there are multiple beta reductions that can be applied. In these cases, it might be that one sequence of beta reductions reduces the diagram, but another sequence goes forever without reducing it. For instance, here is the diagram for False (M M):

If we first feed (M M) as input to False, then we get a reduced diagram:

But if we first feed M to M, then what we get back is the same thing we started with:

If we keep feeding M to M, we keep getting back the same diagram and never fully reduce. When a diagram reduces on some series of beta reductions and never reduces on others, we call the diagram contingently reducible. If every strategy of beta reduction eventually reduces the diagram, we call it absolutely reducible. And if no strategy reduces the diagram, we call it absolutely irreducible.
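The distinction is easy to play with concretely. Here’s a minimal sketch in Python (my own toy term encoding with de Bruijn indices, operating on terms rather than on the diagrams themselves): normal-order reduction reduces False (M M) in one step, while an argument-first strategy rewrites M M to itself forever.

def shift(t, d, cutoff=0):
    """Shift free variable indices in t by d."""
    if t[0] == 'var':
        return ('var', t[1] + d) if t[1] >= cutoff else t
    if t[0] == 'lam':
        return ('lam', shift(t[1], d, cutoff + 1))
    return ('app', shift(t[1], d, cutoff), shift(t[2], d, cutoff))

def subst(t, j, s):
    """Substitute s for variable j in t."""
    if t[0] == 'var':
        return s if t[1] == j else t
    if t[0] == 'lam':
        return ('lam', subst(t[1], j + 1, shift(s, 1)))
    return ('app', subst(t[1], j, s), subst(t[2], j, s))

def beta(f, a):
    """Contract the redex (lam body) a."""
    return shift(subst(f[1], 0, shift(a, 1)), -1)

def step_normal(t):
    """Normal order: contract the leftmost-outermost redex, or return None."""
    if t[0] == 'app':
        if t[1][0] == 'lam':
            return beta(t[1], t[2])
        r = step_normal(t[1])
        if r is not None:
            return ('app', r, t[2])
        r = step_normal(t[2])
        return ('app', t[1], r) if r is not None else None
    if t[0] == 'lam':
        r = step_normal(t[1])
        return ('lam', r) if r is not None else None
    return None

def step_arg_first(t):
    """Reduce inside the argument before contracting the outer redex."""
    if t[0] == 'app':
        r = step_arg_first(t[2])
        if r is not None:
            return ('app', t[1], r)
        if t[1][0] == 'lam':
            return beta(t[1], t[2])
        r = step_arg_first(t[1])
        return ('app', r, t[2]) if r is not None else None
    if t[0] == 'lam':
        r = step_arg_first(t[1])
        return ('lam', r) if r is not None else None
    return None

M = ('lam', ('app', ('var', 0), ('var', 0)))   # λx. x x  (the Mockingbird)
FALSE = ('lam', ('lam', ('var', 0)))           # λx. λy. y
term = ('app', FALSE, ('app', M, M))           # False (M M)

print(step_normal(term))   # ('lam', ('var', 0)) — reduced to λy. y in one step

t = term
for _ in range(100):       # argument-first keeps rewriting M M to itself
    t = step_arg_first(t)
print(t == term)           # True: no progress after 100 steps

(Normal order is known to reach a reduced form whenever some strategy does, a standard fact about the lambda calculus, which is why it succeeds here.)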

Some interesting observations:

It’s possible for two diagrams A and B to both be absolutely reducible, but for A B to be absolutely irreducible.

It’s also possible for a diagram A to be absolutely irreducible, but the diagram A B to be contingently reducible (though never absolutely reducible).
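Concrete instances of both observations (my own examples, so take them with appropriate caution): for the first, let A = B = M. Each of these is already reduced, so each is trivially absolutely reducible, yet M M is absolutely irreducible. For the second, A = λx. x (M M) appears to work: on its own, the only reductions available to A are inside M M, so A is absolutely irreducible; but A False can be reduced by contracting the outer redex first, giving False (M M), which reduces as we saw above, while the strategy that stays inside M M goes on forever.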

It’s not, however, possible for A to be absolutely irreducible but A B to be absolutely reducible. Why? Simply because a reduction strategy for A B can follow, within the sub-diagram A, any reduction strategy for A on its own; since every such strategy fails to reduce A, this gives a strategy that never reduces A B. More generally, if any sub-diagram of a lambda diagram fails to be absolutely reducible (whether it’s contingently reducible or absolutely irreducible), then the whole diagram cannot be absolutely reducible: just choose a reduction strategy for the whole diagram that follows a failing strategy for the sub-diagram. This strategy fails for the sub-diagram, and so also fails for the whole diagram.

Here’s a question that I don’t know the answer to: Is it possible for an application of an irreducible diagram to an irreducible diagram to be reducible? I suspect not.

The distinction between contingent and absolute reducibility goes away the moment you fix a general reduction strategy that for any diagram gives a unique prescription for which reduction should be applied (if any are possible). Henceforth, I will assume that a particular reduction strategy has been fixed, and so will only speak of reducibility and irreducibility, no absolutes and contingents.

Reducibility Oracles

It might be convenient if we could always know whether a given diagram reduces or not. We might even wonder if there’s a lambda diagram that “computes” the answer to this question. Let’s suppose there is such a diagram, which we’ll call a “reducibility oracle.” It’s a black box, so we’ll denote its diagram as a box with an R:

We’ll define the behavior of the reducibility oracle as follows:

We’ll now design a larger diagram R’ that uses R:

Let’s see how R’ behaves when given an input (call it F).

Notice that if F F is reducible, then R’ F is irreducible. And if F F is irreducible, then R’ F is reducible.

So what happens if we feed R’ to itself? Well, by the logic of the above paragraph, if R’ R’ is reducible, then R’ R’ is irreducible. And if R’ R’ is irreducible, then R’ R’ is reducible. We have obtained a contradiction! And so no such reducibility oracle diagram can exist.

But wait, there’s more!

Let’s define S(N) to be the size of the smallest lambda diagram that reduces to the Church numeral N, where size means the number of lines required to build the diagram.

There’s a tight correspondence between lambda calculus and programming languages. In fact, lambda calculus can be thought of as just one specific highly abstract functional programming language (along the lines of Haskell and Lisp). Recall now that the Kolmogorov complexity K of a string s (given a particular programming language) is defined as the length of the shortest program that outputs that string. If we choose our programming language to be lambda calculus, then we get a correspondence between K(N) and S(N). This correspondence gives us the following results:

There is no lambda diagram that serves as a “size oracle” – i.e. a lambda diagram that when fed a number N, returns S(N). (Analogous to the uncomputability of Kolmogorov complexity)

For any sound proof system F, there’s a number L such that F cannot prove, of any particular number, that the smallest lambda diagram reducing to it has more than L lines. (Analogous to Chaitin’s incompleteness theorem.)

Sizes of lambda diagrams

Let me make some final notes on the function S(N). Every number has a “standard form”: λfx. f (f (… (f x) …)), with N copies of f. The lambda diagram for this looks like an upside-down ladder with N steps:

This has 2N + 3 lines, so we know that for all N, S(N) ≤ 2N + 3. We can define a number N as “irreducibly complex” if S(N) = 2N + 3. For instance, 0 and 1 are irreducibly complex, and I think that 2 and 3 are as well:

This raises an interesting question: What is the smallest number that isn’t irreducibly complex? Another question is whether there are arbitrarily large irreducibly complex numbers (I strongly suspect not, but am not positive).
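For what it’s worth, the 2N + 3 count is easy to verify mechanically if we count one line per lambda, one per variable occurrence, and one per application — my reading of the line-counting convention, which reproduces the figure above. A small sketch, reusing the tuple encoding from the reducer earlier:

def church(n):
    """The standard form of n: λf. λx. f (f (... (f x)))."""
    body = ('var', 0)                     # x
    for _ in range(n):
        body = ('app', ('var', 1), body)  # apply f once more
    return ('lam', ('lam', body))

def lines(t):
    """Count lines: one per lambda, variable occurrence, and application."""
    if t[0] == 'var':
        return 1
    if t[0] == 'lam':
        return 1 + lines(t[1])
    return 1 + lines(t[1]) + lines(t[2])

for n in range(6):
    print(n, lines(church(n)))            # prints 3, 5, 7, ... — i.e. 2n + 3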

We can find some upper bounds for S(N), where N is a product or a power, as follows:

In general, for any computable function f whatsoever, there is a lambda diagram for f, to which we can append the diagram for any number N to represent f(N). The size of this diagram will just be the size of the diagram for f, plus the size of the diagram for N, plus 1 (for the line connecting the two). This means that if a number M can be written as f(N) for some computable f, then we can produce a diagram for M whose size is some constant + 2N. Thus S(f(N)) grows at most linearly with respect to N, no matter how fast-growing f is.
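Spelled out, with S_f denoting the size of the fixed diagram for f:

$$S(f(N)) \;\leq\; S_f + (2N + 3) + 1 \;=\; 2N + (S_f + 4)$$

The dependence on N is confined to the 2N term, no matter how fast-growing f is.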