The Hypergame Paradox

Credit to Joel David Hamkins, whom I heard discussing this paradox on an episode of the podcast My Favorite Theorem.

Define a finite game to be any two-player turn-based game such that every possible playthrough ends after finitely many turns. For example, tic-tac-toe is a finite game because every game ends in at most nine turns. So is chess with the 50-move rule enforced (if 50 moves pass without any pawn advance or capture, then the game ends in a draw). In both these examples, there’s an upper bound to how long the game can last, but this is not required. A game of transfinite Nim would count as finite; every game lasts for only finitely many turns, even though there is no upper bound on the number of turns it takes.

Now, consider the game Hypergame. To play Hypergame, Player 1 begins by choosing any finite game G. Then Player 2 plays the first move of G, Player 1 plays the second move of G, and so on until the game is completed. (Since Player 1 chose a finite game, this will always happen after some finite amount of time.)

Is Hypergame a finite game? Yes, we can easily see that it must be. Whatever game Player 1 chooses will be over after n steps, for some finite n. So that playthrough of Hypergame will have taken n+1 steps.

But if Hypergame is a finite game, then it is a valid choice for the first move of Hypergame! So we can now imagine the following playthrough of Hypergame:

A Troubling Playthrough of Hypergame
Player 1: For the finite game that we shall play, I pick Hypergame.
Player 2: Hm, okay. So now I’m playing the first move of Hypergame. So I must now choose any finite game. I’ll choose Hypergame!
Player 1: Alright so I’m again playing the first move of this new game of Hypergame. I’ll choose Hypergame again.
Player 2: And I choose Hypergame again as well.
So on forever…

At no point does either player violate the rules of Hypergame. And yet, we ended up with an infinite playthrough of Hypergame, which we proved was impossible! So we have a contradiction. What is the resolution?

✵✵✵

Here’s one possible resolution, analogous to the resolutions of similar set-theoretic paradoxes.

We can think about a game as a directed rooted tree. The vertices of the tree correspond to game states, and the edges correspond to the allowed moves. The root of the tree corresponds to the starting game state, and Player 1 gets to choose which edge to travel along first. From the new vertex, Player 2 decides the next edge to travel along. And so on. The tree’s leaves correspond to ending states of the game, and each leaf is labelled according to which player won in that ending state.

In this framework, what is a finite game? As I defined it above, a finite game is just any directed rooted tree such that every path starting at the root ends at a leaf after passing through finitely many edges. This corresponds perfectly to the idea that every possible playthrough of the game takes only finitely many turns. Notice that a finite game is not necessarily a finite tree! The game tree of a finite game is only finite if it’s also finitely branching. In other words, for a game to have a finite tree requires not just that every playthrough is finitely long, but that each player always has only finitely many choices on their turn.

For instance, the game tree of Hypergame is not a finite tree, because Player 1 has infinitely many possible finite games to choose from on his first turn. How big exactly is the game tree of Hypergame? We know that we have to have a vertex corresponding to the start of any finite game, so it must be at least as large as the set of all finite games. But how large is this set?

This is where we run into problems. The game tree of a finite game can be arbitrarily large. Consider the game in which Player 1 chooses any real number and then immediately loses. The height of the game tree is 1, but its width is the cardinality of the continuum. Similarly, for any set X, we can find a game whose tree has cardinality at least |X|. This means that there are finite games of arbitrarily large cardinality. But then, since there is no largest cardinality, we know that there is no set of all finite games! And this implies that Hypergame has no game tree! More precisely, there is no set corresponding to the game tree of Hypergame as we defined it.

Couldn’t we instead think about Hypergame as a proper class? Sure! But then when we choose a finite game in our first move, we couldn’t be picking Hypergame, as the property “is a finite game” would only apply to sets and not proper classes. This means that we can’t actually select Hypergame as our first move! And so we avoid the paradoxical conclusion that we can keep picking Hypergame ad infinitum.

I find it quite fascinating that the seemingly innocent notion of a finite game can lead us into paradoxes involving proper classes!

Hilbert-type Infinitary Logics

I want to describe a hierarchy of infinitary logics, and show some properties of one of these logics in particular.

First, a speedy review of first-order logic. In the language of first-order logic we have access to parentheses {(, )}, the propositional connectives {∧, ∨, ¬, →}, the equals sign {=}, quantifiers {∀, ∃}, and a countably infinite store of variables to quantify over {x1, x2, x3, …}. These are the logical symbols common to any first-order language, but to complete the language we additionally specify a set of constant symbols {c1, c2, c3, …}, function symbols {f1, f2, f3, …}, and relation symbols {R1, R2, R3, …}. These sets can be of any cardinality whatsoever. We can then define the set of grammatical sentences (“well-formed formulas”) of this language. We also interpret these symbols in a fairly straightforward way: a first-order structure has a universe of objects, and constants are assigned referents within that universe, function symbols are assigned n-ary functions on the universe, and relation symbols are assigned n-ary relations on the universe.

One consequence of the construction is that all of our sentences are finite, which puts an important cap on expressive power. Consider a language of arithmetic with constants for every natural number: {0, 1, 2, 3, …}. We might naturally want to say that the set of constants exhausts all the objects in our universe. But this takes an infinitely long sentence: ∀x (x=0 ∨ x=1 ∨ x=2 ∨ x=3 ∨ …). You might think you could be clever and find a finite expression of this idea, but it turns out that you can’t. (This is a consequence of the compactness theorem, which guarantees that any such theory also has nonstandard models.)

So a natural extension of first-order logic is to allow infinitely long sentences. For any two cardinal numbers α and β, define Lα,β to be first-order logic, but with conjunctions of any length < α allowed and blocks of quantifiers of any length < β. (Note that allowing infinite conjunctions gives us infinite disjunctions as well, via De Morgan’s laws.) For example, Lω,ω is ordinary first-order logic: conjunctions and quantifier blocks of any finite length. Lω1,ω is first-order logic plus countably infinite conjunctions, but only finite quantifiers. Lω,ω1 has finite conjunctions but countably infinite quantifiers. Question: what logic is this equivalent to?

Notice that countably infinitely many quantifiers use countably infinitely many variables, but if you only have finite conjunctions you can only use finitely many of them. So this ends up being equivalent to Lω,ω. For this same reason, if β > α then Lα,β is no different from Lα,α.

Lω1,ω is especially interesting to logicians, because it’s significantly stronger than first-order logic, but not so much stronger as to lose all nice proof-theoretic properties. In particular, there’s a sound and complete proof system for Lω1,ω! It’s also quite simple: just the FOL proof system plus one new axiom and one infinitary deduction rule:

New axiom
For any m ∈ ω and any set of sentences {φn | n ∈ ω},
(∧n∈ω φn) → φm

New inference rule
φ0, φ1, φ2, … ⊢ ∧n∈ω φn

If we allow deductions of any countably infinite (successor) ordinal type, then we get a sound and complete proof system. This means that for any countable set of Lω1,ω-sentences Σ and any Lω1,ω-sentence φ, you can deduce φ from Σ just in case Σ actually (ω1,ω)-logically implies φ.

Infinite conjunctions give you a massive boost in expressive power going from FOL to Lω1,ω. You can categorically define the natural numbers: Take Peano arithmetic and add to it the axiom: ∀x ∨n∈ω (x = n). See why this works? We’ll refer back to this theory as PA*.

So Lω1,ω is powerful enough to be able to categorically define the natural numbers. We might wonder if we can also categorically define all other countable structures. It turns out that while we can’t do that, we can do something slightly weaker. For any countable structure, there’s a single Lω1,ω-sentence that defines it up to isomorphism among countable structures. The complexity of this sentence is a way of measuring the complexity of the structure, and has close connections to other measures of structure complexity. (The key word to learn more about this is Scott rank.)

What else can you do with Lω1,ω? Here’s something really cool: Tarski’s theorem on the undefinability of truth tells us that in first-order logic you cannot define a truth predicate. But in Lω1,ω, you can! Take a countable first-order language L. Add on the language of arithmetic (0, S, +, ×, ≤) to L. L is still countable, so enumerate all its sentences {φ1, φ2, φ3, …}. Now, the sentence Tr(x) := ∨n∈ω (x = n ∧ φn) is a truth predicate: for any n, if φn is true then Tr(n) is true and if φn is false then Tr(n) is false.

You can express the idea that “infinitely many things satisfy φ(x)” with the following sentence:

∧n∈ω ∀x0∀x1…∀xn ∃y (φ(y) ∧ ∧k≤n (y ≠ xk)).

Expanding this out:

∀x0 ∃y (φ(y) ∧ y≠x0)

∀x0∀x1 ∃y (φ(y) ∧ y≠x0 ∧ y≠x1)

∀x0∀x1∀x2 ∃y (φ(y) ∧ y≠x0 ∧ y≠x1 ∧ y≠x2)

The first line says “there’s at least two things satisfying φ”, the second says “there’s at least three things satisfying φ”, and so on forever.

Finally, unlike FOL, Lω1,ω is not compact. This means that there are sets of sentences without models, where every finite subset has a model. But it’s actually even worse than that! Lω1,ω is strongly non-compact. You can have an unsatisfiable set of sentences, every countable subset of which has a model! Once again, this pretty remarkable fact has a simple proof:

Take the language of Peano arithmetic and add on ℵ1-many constant symbols: {cα | α < ω1}. Now add to PA* the set of sentences {cα ≠ cβ | α ≠ β, both in ω1}. Let’s call this theory Σ. Every countable subset of Σ has a model, but Σ itself doesn’t. A countable subset of Σ mentions only countably many of the constants, and you can always assign countably many constants distinct referents in ℕ. But any model of PA* looks exactly like ℕ, and you can’t assign uncountably many constants distinct referents in ℕ, by a simple cardinality argument.

There’s an interesting twist here: Consider the set of sentences {cα ≠ cβ | α ≠ β, both < ω1CK} ⊂ Σ, where ω1CK is the Church-Kleene ordinal, the smallest uncomputable ordinal. ω1CK is countable, so this is a countable set of sentences. This set of sentences has a model, but to obtain it you must choose a bijection from the set of constants {cα | α < ω1CK} to the natural numbers. By the definition of ω1CK, this bijection is uncomputable! There are uncountably many countable ordinals above ω1CK (there are countably many computable ordinals, and uncountably many countable ordinals), so while every countable subset of Σ has a model, uncountably many of these models will be uncomputable!

Hypernaturals Simplified (Ultra Series 2)

Previous: Introduction

What is a hypernatural number? It is a collection of infinitely long sequences of natural numbers. More precisely, it is an equivalence class of these infinite sequences.

An equivalence class under what equivalence relation? This is a little tricky to describe.

I’ll start with a slight lie to simplify the story. When we see the trouble that results from our simple definition, I will reveal the true nature of the equivalence relation that gives the hypernaturals. In the process you’ll see how the notion of an ultrafilter naturally arises.

So, hypernaturals are all about infinite sequences of natural numbers. Some examples of these:

(0,1,2,3,4,5,6,7,8,9,…)
(0,1,0,2,0,3,0,4,0,5,…)
(1,2,1,2,1,2,1,2,1,2,…)
(0,2,4,6,8,10,12,14,…)
(3,1,4,1,5,9,2,6,5,3,…)

We’ll define an equivalence relation ~ between sequences as follows:

Let x and y be infinite sequences of natural numbers.
Then x ~ y iff x and y agree in all but finitely many places.

For example, (0,1,2,3,4,5,6,…) ~ (19,1,2,3,4,5,6,…), because these two sequences only disagree at one spot (the zero index).

(1,1,2,2,4,4,…) and (1,2,4,8,…) are not equivalent, because these sequences disagree at infinitely many indices (every index besides the zeroth index).

Same with (0,1,2,3,4,5,6,…) and (1,2,3,4,5,6,7,…); even though they look similar, these sequences disagree everywhere.

(2,4,6,8,10,12,14,…) and (2,0,6,0,10,0,14,…) are not equivalent, because these sequences disagree at infinitely many indices (every odd index).
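To make this concrete, here’s a minimal Python sketch (my own encoding, not part of the definition above) that represents sequence representatives as functions from indices to values and lists their disagreement indices up to a bound. A finite check like this can exhibit a pattern of disagreement, but it can never prove cofinite agreement, which is a claim about all but finitely many indices.

def disagreements(a, b, upto):
    # Indices below `upto` where the two sequences differ.
    return [k for k in range(upto) if a(k) != b(k)]

evens = lambda k: 2 * (k + 1)                        # (2, 4, 6, 8, ...)
gappy = lambda k: 2 * (k + 1) if k % 2 == 0 else 0   # (2, 0, 6, 0, 10, ...)

print(disagreements(evens, gappy, 10))  # [1, 3, 5, 7, 9] -- every odd index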

One can easily check that ~ is an equivalence relation, and thus it partitions the set of sequences of naturals into equivalence classes. We’ll denote the equivalence class of the sequence (a1, a2, a3, …) as [a1, a2, a3, …]. These equivalence classes are (our first stab at the definition of) the hypernaturals!

For instance, the equivalence class of (0,0,0,0,0,…) contains (1,4,2,0,0,0,0,0,…), as well as (0,2,4,19,0,0,0,0,…), and every other sequence that eventually agrees with (0,0,0,0,…) forever. So all of these correspond to the same hypernatural number: [0,0,0,0,…]. This object is our first hypernatural number! It is in fact the hypernatural number that corresponds exactly to the ordinary natural number 0. In other words 0 = [0,0,0,0,…].

[1,1,1,1,…] is a distinct equivalence class from [0,0,0,0,…]. After all, the sequences (0,0,0,0,…) and (1,1,1,1,…) disagree everywhere. You might guess that [1,1,1,1,…] is the hypernatural analogue to the natural number 1, and you’d be right!

For any standard natural number N, the corresponding hypernatural number is [N,N,N,N,N,…], the equivalence class of the sequence consisting entirely of Ns.

Now consider the hypernatural [0,1,2,3,4,5,6,…]. Let’s call it K. Does K = N for any standard natural number N? In other words, is (0,1,2,3,4,5,6,…) ~ (N,N,N,N,N,…) true for any finite N? No! Whatever N you choose, it will only agree with (0,1,2,3,4,5,6,…) at one location. We need cofinite agreement, and here we have merely finite agreement. Not good enough! This means that K is our first nonstandard natural number!

How does K relate to the standard naturals in terms of order? We haven’t talked about how to define < on the hypernaturals yet, but it’s much the same as our definition of =.

[a1, a2, a3, …] = [b1, b2, b3, …]
iff
{ k∈ℕ | ak = bk } is cofinite

[a1, a2, a3, …] < [b1, b2, b3, …]
iff
{ k∈ℕ | ak < bk } is cofinite

Exercise: Verify that this is in fact a well-defined relation. Every equivalence class has many different possible representatives; why does the choice of representatives not matter for the purpose of describing order?

How does K relate to the standard naturals in terms of order? Now we can see that K > N for every standard N. Look at (0,1,2,3,4,5,…) and (N,N,N,N,…). The elements of the first sequence are less than the elements of the second only at the first N indices; they’re equal at index N, and after that the elements of K’s sequence are greater forever. So the elements of K’s representative sequence are greater than the elements of N’s representative sequence on a cofinite set of indices. Thus, K > N for every standard N. So K is an infinitely large number!

Here’s another one: K’ = [0,2,4,6,8,…]. You can see that K’ > K, because the elements of K’ are greater than those of K at all but one index (the first one). So we have another, bigger, infinite number.

Addition and multiplication are defined elementwise, so

K + K
= [0,1,2,3,4,…] + [0,1,2,3,4,…]
= [0+0, 1+1, 2+2, 3+3, 4+4, …]
= [0,2,4,6,8,…]
= K’

K’
= [0,2,4,6,8,…]
= [2⋅0, 2⋅1, 2⋅2, 2⋅3, 2⋅4, …]
= 2⋅[0,1,2,3,4,…]
= 2⋅K
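
Here’s a tiny Python sketch of these elementwise operations, using the same function-based encoding of representative sequences (again, my own encoding, just for illustration):

K = lambda k: k      # representative of K: (0, 1, 2, 3, ...)
two = lambda k: 2    # representative of the standard number 2: (2, 2, 2, ...)

def add(a, b): return lambda k: a(k) + b(k)   # elementwise addition
def mul(a, b): return lambda k: a(k) * b(k)   # elementwise multiplication

# The representatives of K + K and 2*K agree at every index, so [K + K] = [2*K] = K'.
print([add(K, K)(k) for k in range(8)])    # [0, 2, 4, 6, 8, 10, 12, 14]
print([mul(two, K)(k) for k in range(8)])  # [0, 2, 4, 6, 8, 10, 12, 14]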

Predictably, we get many, many infinities. In fact, there are continuum many nonstandard hypernatural numbers!

Proof: we construct an injection f from ℝ to *ℕ. If x is a real number, then f(x) := [floor(x), floor(10x), floor(100x), floor(1000x), …]. For example, f(35.23957…) = [35, 352, 3523, 35239, 352395, …]. For any two distinct reals x and y, the sequences f(x) and f(y) will eventually disagree forever. So each real is mapped to a distinct hypernatural, meaning that there are no more reals than hypernaturals. At the same time, there are no more hypernaturals than reals, because there are only continuum many countable sequences of natural numbers. So |*ℕ| = |ℝ|.
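
Here’s a quick Python sketch of the first few terms of this map (using floats just for illustration; long prefixes would need exact arithmetic):

import math

def f_prefix(x, n):
    # First n entries of the representative sequence for f(x):
    # (floor(x), floor(10x), floor(100x), ...).
    return [math.floor(x * 10**k) for k in range(n)]

print(f_prefix(35.23957, 5))  # [35, 352, 3523, 35239, 352395]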

It turns out that every nonstandard hypernatural number is also larger than every standard natural number. We’ll see why in a bit, but it’ll take a bit of subtlety that I’ve yet to introduce.

Now, > is transitive in ℕ. Is it also transitive in *ℕ? Yes! Suppose A > B and B > C. Choose any representative sequences (a1, a2, a3, …), (b1, b2, b3, …), and (c1, c2, c3, …) for A, B, and C. Then X = { k∈ℕ | ak > bk } and Y = { k∈ℕ | bk > ck } are both cofinite. The intersection of cofinite sets is also cofinite, so X⋂Y = { k∈ℕ | ak > bk and bk > ck } is cofinite. And X⋂Y is a subset of { k∈ℕ | ak > ck }, so that set is cofinite as well. So A > C!
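
You can see the key step concretely by representing a cofinite set by its finite complement (a little Python sketch with my own encoding):

def complement_of_intersection(missing_x, missing_y):
    # The complement of an intersection is the union of the complements,
    # so intersecting two cofinite sets leaves the complement finite.
    return missing_x | missing_y

X_missing = {0, 3}     # X = N \ {0, 3}
Y_missing = {3, 7, 9}  # Y = N \ {3, 7, 9}
print(complement_of_intersection(X_missing, Y_missing))  # the finite set {0, 3, 7, 9}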

It’s a good sign that > is transitive. But unfortunately, the story I’ve told you thus far starts to break down here. The greater-than relation is a total order on the natural numbers. For any naturals a and b, exactly one of the following is true: a = b, a > b, b > a. But this is not true of the hypernaturals!

Consider the two hypernatural numbers n = [0,1,0,1,0,1,…] and m = [1,0,1,0,1,0,…]. Are n and m equal? Clearly not; they disagree everywhere. So n ≠ m.

Is n > m? No. The set of indices where n’s sequence is greater than m’s sequence is {1, 3, 5, 7, …}, which is not cofinite.

So is m > n? No! The set of indices where m’s sequence is greater than n’s sequence is {0, 2, 4, 6, …}, which is also not cofinite!

So as we’ve currently defined the hypernatural numbers, the > relation is not a total relation on them. This might be fine for some purposes, but we’ll be interested in defining the hypernaturals to mirror the properties of the naturals as closely as possible. So we’ll have to tweak our definition of the hypernaturals. The tweak will occur way back at the start where we defined our equivalence relation on sequences of naturals.

Recall: we said that two sequences (a1, a2, a3, …) and (b1, b2, b3, …) are equivalent if they agree in all but finitely many places. Said another way: a ~ b if { k∈ℕ | ak = bk } is cofinite. We defined > similarly: a > b if { k∈ℕ | ak > bk } is cofinite.

The problem with this definition was that it wasn’t definitive enough. There are cases where the agreement set is neither cofinite nor finite. (Like in our previous example, where the agreement set was the evens.) In such cases, our system gives us no direction as to whether a > b or b > a. We need a criterion that still deals with all the cofinite and finite cases appropriately, but also gives us a definitive answer in every other case. In other words, for ANY possible set X of indices of agreement, either X or X’s complement must be considered “large enough” to decide in its favor.

For example, maybe we say that if { k∈ℕ | ak = bk } = {0,2,4,6,8,…}, then a > b. Now our criterion for whether a > b is: the set of indices for which ak = bk is either cofinite OR it’s the evens. This implies that [1,0,1,0,1,0,…] > [0,1,0,1,0,1,…].

Once we’ve made this choice, consistency forces us to also accept other sets besides the evens as decisive. For instance, now compare (0,0,1,0,1,0,…) and (0,1,0,1,0,1,…). The set of indices where the first is greater than the second is {2,4,6,8,…}. But notice that the first differs only cofinitely from (1,0,1,0,…), meaning that [0,0,1,0,1,0,…] = [1,0,1,0,1,0,…]. The conclusion is that [0,0,1,0,1,0,…] > [0,1,0,1,0,1,…], which says that the set of indices {2,4,6,8,…} must also be decisive. And in general, once we’ve accepted the evens as a decisive set of indices, we must also accept the evens minus any finite set.

The three criteria we’ve seen as desirable for which sets of indices will count as decisive are (1) includes all cofinite sets, (2) for any set X, either X or X’s complement is decisive, and (3) consistency. These requirements turn out to correspond perfectly to a type of mathematical object called a free ultrafilter!

In the next post, we will define ultrafilters and finalize our definition of the hypernatural numbers!

Ultrafilters, Ultraproducts, and Hypernaturals 1: Introduction

This is the series I wish had existed a month ago when I first started learning about ultrafilters and ultraproducts. First of all, what’s the motivation for learning about these ultra-objects? I imagine that if you’re here, you probably already have some degree of interest in ultrafilters, ultraproducts, or nonstandard models of arithmetic. But I’ll see if I can bolster that interest a little bit.

The most exciting application to me personally is that ultraproducts give you a recipe for constructing new mathematical structures out of familiar ones. Out of some model M one can construct a new model *M, which typically has a much richer and stranger structure, but is nonetheless elementarily equivalent to M (this is called Łoś’s theorem). Elementary equivalence means that all the expressive resources of first-order logic are insufficient to tell M and *M apart: they agree on all first-order sentences.

For example, the ultraproduct of ℝ (the real numbers) is *ℝ, the hyperreal numbers. The hyperreals contain an enormous supply of infinitesimal quantities clustered around every real, as well as infinitely large quantities. And all the usual operations on ℝ, like addition and multiplication, are already nicely defined in *ℝ, meaning that these infinitesimals and infinities have a well-defined algebraic structure. And Łoś’s theorem tells us that ℝ and *ℝ are elementarily equivalent: any first-order sentence true of the reals is also true of the hyperreals.

Another example: the ultraproduct of ℕ (the natural numbers) is *ℕ, the hypernaturals. The hypernaturals don’t contain any infinitesimals, but they do contain an uncountable infinity of infinite numbers. And since *ℕ and ℕ are elementarily equivalent, *ℕ is a model of true arithmetic! This is super exciting to me. True arithmetic is this unimaginably complicated set of sentences; there’s no Turing machine that decides which sentences reside in true arithmetic, nor is there a Turing machine with a halting oracle that does the job, nor even a Turing machine with a halting oracle for Turing machines with halting oracles! The computational complexity of true arithmetic is the limit of this sequence of progressively harder halting problems: you can only decide the set if you have an oracle for the halting problem, plus an oracle for the halting problem for TMs with oracles for the halting problem, plus an oracle for the halting problem for TMs with oracles for the halting problem for TMs with oracles for the halting problem, and so on. This level of complexity is known as 0^(ω).

So true arithmetic is this impossibly uncomputable set of sentences. We know trivially that ℕ is a model of true arithmetic, because that’s how true arithmetic is defined: { φ | ℕ ⊨ φ }. And we also know that true arithmetic has nonstandard models, because it’s a first-order theory and no first-order theory can categorically define ℕ (I think the easiest way to see why this is true is to apply the Löwenheim-Skolem theorem, but – self-promotion alert – there’s also this very nice and simple proof from first principles). And ultraproducts allow us to construct an explicit example of one of these nonstandard models! You can actually write down some of these nonstandard numbers (they look like infinite sequences of natural numbers) and discover their strange properties. For example, (2, 3, 5, 7, 11, …) is a nonstandard prime number, (1, 2, 4, 8, 16, …) is infinitely even, and (0!, 1!, 2!, 3!, 4!, …) is a multiple of every standard natural number. We’ll dive into all of this soon enough.

Ultrafilters and ultraproducts also have applications outside of logic. A fairly basic result about ultrafilters (that every ultrafilter over a finite set contains a singleton) is equivalent to Arrow’s impossibility theorem in voting theory (that any voting system with unanimity, independence of irrelevant alternatives (IIA), and finitely many voters contains a dictator). And the existence of a singleton-free ultrafilter over an infinite set (a free ultrafilter) shows that with infinitely many voters, there’s a non-dictatorial voting system with unanimity and IIA. There’s a pretty good description of those results here, but finish reading my post first!

Nick Bostrom in an old paper titled Infinite Ethics describes how to apply ultrafilters to resolve some of the issues that arise when trying to apply utilitarian ethics in a universe with infinitely many experiencers. Suppose that the current universe contains an infinite number of experiencers, each of whom is having an experience with a valence of +1. By pressing a button, you can immediately transition to a universe where everybody is now experiencing a valence of +2. Clearly pressing the button is a utilitarian good. But 1+1+1+… = ∞ = 2+2+2+…. So it looks like a standard account of utility as real numbers says that the utility of the +2 world is the same as the utility of the +1 world (both just ∞). But if we treat utilities as hyperreal numbers instead of ordinary real numbers, then we can do the comparison and get the right answer. It’s worth noting that this approach doesn’t fix all the problems raised by an infinite universe; for instance, if pressing the button only increases a finite number of experiencers from valence +1 to valence +2, then the net utility of the universe is unchanged, even with hyperreal utilities. (This corresponds to a property that we’ll see soon, which is that a free ultrafilter contains no finite sets.)

Okay, introduction over. Hopefully you’re now excitedly asking yourself the question “So what are these ultrafilters and ultraproducts anyway??” The story begins in the next post!

Next: Hypernaturals Simplified

Asymmetry between the infinite and finite

This is well-trod territory for this blog, but I want to give a little reminder of some of the subtleties involving infinity in logic. In particular, there’s a fun asymmetry between the finite and the infinite in terms of the expressive power of first order logic.

Consider the set of all first order sentences that are true in every infinite model. This includes any tautology (like P ∨ ¬ P), as well as some other sentences like ∃x ∃y (x ≠ y) (which says that there are at least two distinct things). Are there any finite models that satisfy this entire set of sentences?

The answer is no: for any finite n you can form a sentence that says “there are at least n things”. E.g. for n = 4 we say ∃x ∃y ∃z ∃w (x≠y ∧ x≠z ∧ x≠w ∧ y≠z ∧ y≠w ∧ z≠w). This basically says “there exists an x, y, z, and w that are all distinct”, i.e. there are at least four things. This can be easily generalized to any finite n. Sentences of this form all appear in our set, because they’re true in every infinite model. But of course, no finite model satisfies all of them. Any finite model contains only so many elements, say N, and the sentence which says “there are at least N+1 things” will be false in this model.
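Here’s a little Python sketch that builds the “there are at least n things” sentence for any n ≥ 2 (pure string manipulation, just to show the pattern; the function name is mine):

from itertools import combinations

def at_least(n):
    # Build the first-order sentence saying "there are at least n things".
    vs = [f"x{i+1}" for i in range(n)]
    prefix = " ".join(f"∃{v}" for v in vs)
    body = " ∧ ".join(f"{a}≠{b}" for a, b in combinations(vs, 2))
    return f"{prefix} ({body})"

print(at_least(4))
# ∃x1 ∃x2 ∃x3 ∃x4 (x1≠x2 ∧ x1≠x3 ∧ x1≠x4 ∧ x2≠x3 ∧ x2≠x4 ∧ x3≠x4)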

What about the reverse question? If you consider the set of all first order sentences that are true in every finite model, are there any infinite models that satisfy this set?

The answer is yes! If there weren’t, then you’d have a collection of sentences that suffices to rule out all infinite models while keeping all finite models. And ruling out all and only the infinite models turns out to be a superpower that’s inconsistent with any sound and complete finitary logic like first-order logic. The reasoning, in ultra-condensed form: finitary proofs + soundness + completeness imply compactness, and compactness gives a quick proof that any theory with arbitrarily large finite models has an infinite model. (Add infinitely many constant symbols to your theory and declare them pairwise unequal. Every finite subset of these declarations is satisfied by some finite model of the theory, so compactness says that the whole lot is satisfied by some model, which must be an infinite model of the starting theory.)

This is an interesting asymmetry between the finite and the infinite. If you assert every sentence that’s true in all infinite models, you’ve automatically excluded all finite models. But if you assert every sentence that’s true in all finite models, you haven’t excluded all infinite models. A listener that patiently waited for you to finish saying all those infinitely many sentences might still take you to be talking about some infinite model. (And in fact, by the Löwenheim-Skolem theorem, they could understand you to be talking about a model of any infinite cardinality, which is pretty dramatic.)

TL;DR: There are no finite models that satisfy the set of all first-order sentences that are true in every infinite model. On the other hand, there are infinite models that satisfy the set of all first-order sentences that are true in every finite model.

Computing truth values of sentences of arithmetic, or: Math is hard

Previously I talked about the arithmetic hierarchy for sets, and how it relates to the decidability of sets. There’s also a parallel notion of the arithmetic hierarchy for sentences of Peano arithmetic, and it relates to the difficulty of deciding the truth value of those sentences.

Truth value here and everywhere else in this post refers to truth value in the standard model of arithmetic. Truth value in the sense of “being true in all models of PA” is a much simpler matter; PA is recursively axiomatizable and first order logic is sound and complete, so any sentence that’s true in all models of PA can be eventually proven by a program that enumerates all the theorems of PA. So if a sentence is true in all models of PA, then there’s an algorithm that will tell you that in a finite amount of time (though it will run forever on an input that’s false in some models).

Not so for truth in the standard model! As we’ll see, whether a sentence evaluates to true in the standard model of arithmetic turns out to be much more difficult to determine in general. Only for the simplest sentences can you decide their truth value using an ordinary Turing machine. And the set of all true sentences is in some sense infinitely uncomputable (you’ll see in a bit exactly in what sense).

What we’ll discuss is a way to convert sentences of Peano arithmetic to computer programs. Before diving into that, though, one note of caution is necessary: the arithmetic hierarchy for sentences is sometimes talked about purely syntactically (just by looking at the sentence as a string of symbols) and other times is talked about semantically (by looking at logically equivalent sentences). Here I will be primarily interested in the entirely-syntactic version of the arithmetic hierarchy. If you’ve only been introduced to the semantic version of the hierarchy, what you see here might differ a bit from what you recognize.

Let’s begin!

The simplest types of sentences have no quantifiers at all. For instance…

0 = 0
2 ⋅ 2 < 7
(2 + 2 = 4) → (2 ⋅ 2 = 4)

Each of these sentences can be translated into a program quite easily, since +, ⋅, =, and < are computable. We can translate the → in the third sentence by converting it into a conjunction:

## (2 + 2 = 4) → (2 ⋅ 2 = 4)
not(2 + 2 == 4 and not 2 * 2 == 4)

Slightly less simple-looking are sentences with bounded quantifiers:

∀x < 10 (x + 0 = x)
∃x < 100 (x + x = x)
∀x < 5 ∃y < 7 (x > 1 → x⋅y = 12)
∃x < 5 ∀y < x ∀z < y (y⋅z ≠ x)

In each of these examples, the bounded quantifier could in principle be expanded out, leaving us with a finite quantifier-free sentence. This should suggest to us that adding bounded quantifiers doesn’t actually increase the computational difficulty.

We can translate sentences with bounded quantifiers into programs by converting each bounded quantifier to a for loop. The translation differs slightly depending on whether the quantifier is universal or existential:

def Aupto(n, phi):
    for x in range(n):
        if not phi(x):
            return False
    return True
def Elessthan(n, phi):
    for x in range(n):
        if phi(x):
            return True
    return False

Note that the second input needs to be a function, reflecting that phi is a formula with a free variable. Now we can quite easily translate each of the examples, using lambda notation to conveniently define the necessary functions:

## ∀x<10 (x + 0 = x)
Aupto(10, lambda x: x + 0 == x)

## ∃x<100 (x + x = x)
Elessthan(100, lambda x: x + x == x)

## ∀x<5 ∃y<7 ((x > 1) → (x*y = 12))
Aupto(5, lambda x: Elessthan(7, lambda y: not (x > 1 and x * y != 12)))

## ∃x<5 ∀y<x ∀z<y (y⋅z ≠ x)
Elessthan(5, lambda x: Aupto(x, lambda y: Aupto(y, lambda z: y * z != x)))

Each of these programs, when run, determines whether or not the sentence is true. Hopefully it’s clear how we can translate any sentence with bounded quantifiers into a program of this form. And when we run the program, it will determine the truth value of the sentence in a finite amount of time.

So far, we’ve only talked about the simplest kinds of sentences, with no unbounded quantifiers. There are two names that both refer to this class: Π0 and Σ0. So now you know how to write a program that determines the truth value of any Σ0 (equivalently, Π0) sentence!

We now move up a level in the hierarchy, by adding unbounded quantifiers. These quantifiers must all appear out front and be the same type of quantifier (all universal or all existential).

Σ1 sentences: ∃x1 ∃x2 … ∃xk Φ(x1, x2, …, xk), where Φ is Π0.
Π1 sentences: ∀x1 ∀x2 … ∀xk Φ(x1, x2, …, xk), where Φ is Σ0.

Some examples of Σ1 sentences:

∃x ∃y (x⋅x = y)
∃x (x⋅x = 5)
∃x ∀y < x (x+y > x⋅y)

And some Π1 sentences:

∀x (x + 0 = x)
∀x ∀y (x + y < 10)
∀x ∃y < 10 (y⋅y + y = x)

We can translate unbounded quantifiers as while loops:

def A(phi):
    x = 0
    while True:
        if not phi(x):
            return False
        x += 1

def E(phi):
    x = 0
    while True:
        if phi(x):
            return True
        x += 1

There’s a radical change here from the bounded case, which is that these functions are no longer guaranteed to terminate. A(Φ) never returns True, and E(Φ) never returns False. This reflects the nature of unbounded quantifiers. An unbounded universal quantifier is claiming something to be true of all numbers, and thus there are infinitely many cases to be checked. Of course, the moment you find a case that fails, you can return False. But if the universally quantified statement is true of all numbers, then the function will have to keep searching through the numbers forever, hoping to find a counterexample. With an unbounded existential quantifier, all one needs to do is find a single example where the statement is true and then return True. But if there is no such example (i.e. if the statement is always false), then the program will have to search forever.

I encourage you to think about these functions for a few minutes until you’re satisfied that not only do they capture the unbounded universal and existential quantifiers, but that there’s no better way to define them.

Now we can quite easily translate our example sentences as programs:

## ∃x ∃y (x⋅x = y)
E(lambda x: E(lambda y: x * x == y))

## ∃x (x⋅x = 5)
E(lambda x: x * x == 5)

## ∃x ∀y < x (x+y > x⋅y)
E(lambda x: Aupto(x, lambda y: x + y > x * y))

## ∀x (x + 0 = x)
A(lambda x: x + 0 == x)

## ∀x ∀y (x + y < 10)
A(lambda x: A(lambda y: x + y < 10))

## ∀x ∃y < 10 (y⋅y + y = x)
A(lambda x: Elessthan(10, lambda y: y * y + y == x))

The first is a true Σ1 sentence, so it terminates and returns True. The second is a false Σ1 sentence, so it runs forever. See if you can figure out if the third ever halts, and then run the program for yourself to see!

The fourth is a true Π1 sentence, which means that it will never halt (it will keep looking for a counterexample and failing to find one forever). The fifth is a false Π1 sentence, so it does halt at the first moment it finds a value of x and y whose sum is 10. And figure out the sixth for yourself!

The next level of the hierarchy involves alternating quantifiers.

Σ2 sentences: ∃x1 ∃x2 … ∃xk Φ(x1, x2, …, xk), where Φ is Π1.
Π2 sentences: ∀x1 ∀x2 … ∀xk Φ(x1, x2, …, xk), where Φ is Σ1.

So now we’re allowed sentences with a block of one type of unbounded quantifier followed by a block of the other type of unbounded quantifier, and ending with a Σ0 sentence. You might guess that the Python functions we’ve defined already are strong enough to handle this case (and indeed, all higher levels of the hierarchy), and you’re right. At least, partially. Try running some examples of Σ2 or Π2 sentences and see what happens. For example:

## ∀x ∃y (x > y)
A(lambda x: E(lambda y: x > y))

It runs forever! If we were to look into the structure of this program, we’d see that A(Φ) only halts if it finds a counterexample to Φ, and E(Φ) only halts if it finds an example of Φ. In other words A(E(Φ)) only halts if A finds out that E(Φ) is false; but E(Φ) never halts if it’s false! The two programs’ goals are diametrically opposed, and as such, brought together like this they never halt on any input.

The same goes for a sentence like ∃x ∀y (x > y): for this program to halt, it would require that ∀y (x > y) is found to be true for some value of x. But ∀y (x > y) will never be found true, because universally quantified sentences can only be found false! This has nothing to do with the (x > y) being quantified over; it’s entirely about the structure of the quantifiers.

No Turing machine can decide the truth values of Σ2 and Π2 sentences. There’s a caveat here, related to the semantic version of the arithmetic hierarchy. It’s often possible to take a Π2 sentence like ∀x ∃y (y + y = x) and convert it to a logically equivalent Π1 sentence like ∀x ∃y≤x (y + y = x). This translation works because y + y = x can only be true if y is less than or equal to x. Now we have a false Π1 sentence rather than a false Π2 sentence, and as such we can find a counterexample and halt.
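
Using the translation functions from earlier, the bounded version becomes a program that actually halts (rendering ∃y≤x as Elessthan(x + 1, …)):

## ∀x ∃y≤x (y + y = x) -- false, and now refutable: x = 1 is a counterexample
A(lambda x: Elessthan(x + 1, lambda y: y + y == x))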

We can talk about a sentence’s essential level on the arithmetic hierarchy, which is the lowest level attained by any logically equivalent sentence. It’s important to note here that “logically equivalent sentence” is a cross-model notion: A and B are logically equivalent if and only if they have the same truth values in every model of PA, not just the standard model. The soundness and completeness of first-order logic, and the recursive nature of the axioms of PA, tell us that the set of sentences that are logically equivalent to a given sentence of PA is recursively enumerable. So we can generate these sentences by searching for PA proofs of equivalence and keeping track of the lowest level of the arithmetic hierarchy attained so far.

Even when we do this, we will still find sentences that have no logical equivalents below Σ2 or Π2. These sentences are essentially uncomputable; not just uncomputable in virtue of their form, but truly uncomputable in all of their logical equivalents. However, while they are uncomputable, they would become computable if we had a stronger Turing machine. Let’s take another look at the last example:

## ∀x ∃y (x > y)
A(lambda x: E(lambda y: x > y))

Recall that the problem was that A(E(Φ)) only halts if E(Φ) returns False, and E(Φ) can only return True. But if we had a TM equipped with an oracle for the truth value of E(Φ) sentences, then maybe we could evaluate A(E(Φ))!

Let’s think about that for a minute more. What would an oracle for the truth value of Σ1 sentences be like? One thing that would work is if we could run E(Φ) “to infinity” and see if it ever finds an example, and if not, then return False. So perhaps an infinite-time Turing machine would do the trick. Another way would be if we could simply ask whether E(Φ) ever halts! If it does, then ∃y (x > y) must be true, and if not, then it must be false.

So a halting oracle suffices to decide the truth values of Σ1 sentences! Same for Π1 sentences: we just ask if A(Φ) ever halts and return False if so, and True otherwise.

If we run the above program on a Turing machine equipped with a halting oracle, what will we get? Now we can evaluate the inner existential quantifier for any given value of x. So in particular, for x = 0, we will find that ∃y (x > y) is false. We’ve found a counterexample, so our program will terminate and return False.

On the other hand, if our sentence was true, then we would be faced with the familiar feature of universal quantifiers: we’d run forever looking for a counterexample and never find one. So to determine that this sentence is true, we’d need an oracle for the halting problem for this new more powerful Turing machine!

Here’s a summary of what we have so far:

TM = Ordinary Turing Machine
TM2 = TM + oracle for TM
TM3 = TM + oracle for TM2

         True    False
Σ1       TM      TM2
Π1       TM2     TM
Σ2       TM2     TM3
Π2       TM3     TM2

The table shows what type of machine suffices to decide the truth value of a sentence, depending on where on the arithmetic hierarchy the sentence falls and whether the sentence is true or false.

We’re now ready to generalize. In general, Σn sentences start with a block of existential quantifiers, and then alternate between blocks of existential and universal quantifiers n – 1 times before ending in a Σ0 sentence. Πn sentences start with a block of universal quantifiers, alternate quantifiers n – 1 times, and then end in a Σ0 sentence. And as you move up the arithmetic hierarchy, it requires more and more powerful halting oracles to decide whether sentences are true:

         True     False
Σn       TMn      TMn+1
Πn       TMn+1    TMn

(TM = TM1 = ordinary Turing machine, TMn+1 = TM + oracle for TMn)
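
As an aside, the purely syntactic classification is easy to compute. Here’s a small Python sketch that reads an unbounded-quantifier prefix as a string of E’s and A’s (bounded quantifiers assumed already stripped out; the function name is mine):

def prefix_level(prefix):
    # Count alternating blocks: "EE" -> Σ1, "AE" -> Π2, "EAE" -> Σ3.
    if not prefix:
        return "Σ0"  # no unbounded quantifiers (equivalently, Π0)
    blocks = 1
    for a, b in zip(prefix, prefix[1:]):
        if a != b:
            blocks += 1
    return ("Σ" if prefix[0] == "E" else "Π") + str(blocks)

print(prefix_level("EE"))   # Σ1
print(prefix_level("AAE"))  # Π2
print(prefix_level("EAE"))  # Σ3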

If we define Σω to be the union of all the Σ classes in the hierarchy, and Πω the union of the Π classes, then deciding the truth value of Σω ⋃ Πω (the set of all arithmetic sentences) would require a TMω – a Turing machine with an oracle for TM, TM2, TM3, and so on. Thus the theory of true arithmetic (the set of all first-order sentences that are true of ℕ) is not only undecidable, it’s undecidable even with a TM2, a TM3, or a TMn for any n ∈ ℕ. At every level of the arithmetic hierarchy, we get new sentences that are essentially on that level (not just sentences that are superficially on that level in light of their syntactic form, but sentences which, in their simplest possible logically equivalent form, lie on that level).

This gives some sense of just how hard math is. Just understanding the first-order truths of arithmetic requires an infinity of halting oracles, each more powerful than the last. And that says nothing about the second-order truths of arithmetic! That would require even stronger Turing machines than TMω – Turing machines that have halting oracles for TMω, and then TMs with oracles for that, and so on to unimaginable heights (just how high we must go is not currently known).

The Rise of Anti-Set Theory

This report details the discovery and proliferation of a radical new form of set theory that has come to be known in math departments across the world as Anti-Set Theory. The earliest appearance of Anti-Set Theory is shrouded in mystery, but my best efforts have traced it to the work of two little-known logicians named Narl Cowman and Bishi Ranger. The germinal notion that would eventually grow into Anti-Set Theory was the idea of a theory of sets obtained by taking all of our familiar intuitions about sets and requiring their exact opposites to hold. Any property that would hold of all sets would of necessity hold of none of the objects in this theory. An attractive idea by any standard, and one self-evidently worthy of pursuit. And thus Anti-Set Theory was born.

The development of Anti-Set Theory progressed in three phases: the Era of Simple Negation (which was plagued with unattractively permissive axioms and a proliferation of models), the Age of Negation after the Universal Quantifier (where the previous problems were resolved, but new problems of consistency and paradox arose), and finally the Modern Age of Axiomatic Anti-Set Theory (where there arose the split between “revolutionary” and “traditionalist” Anti-Settists, as they came to be known). Each of these phases will be explained in more depth in what follows.

The Era of Simple Negation

The first formulations of Anti-Set Theory suffered from practical problems that were obvious in retrospect, but took embarrassingly long to remedy. The early Anti-Settists naively took their project to be to simply take the axioms of existing set theories, negate each of them, collect the negations into a new axiomatic theory, and explore the consequences. While the idea to build off of existing set theories was a worthy one, there was a serious problem with this approach. Namely, the axioms of existing set theories were designed to be quite strong and restrictive, which made their negations overly weak and permissive. Many axioms had the form “All sets have this property”, the negation of which becomes “At least one set doesn’t have this property”, not “NO sets have this property.” The latter was the ambition of the original Anti-Settists, but it would take them until the Age of Negation After the Universal Quantifier, about a half hour later, to realize the crucial error they were making.

For instance, in the axioms of ZF (the orthodox axiomatization of set theory until Anti-Set Theory took the world by storm), there is an axiom known as “the Axiom of Pairing.” This axiom says, in plain English, that for any two sets x and y, there exists a set containing just x and y and nothing else (which we typically write as {x, y}). In the language of first-order logic, it said “∀x∀y∃z∀w (w∈z ↔ (w=x ∨ w=y))” (or in a more readable short-hand, “∀x∀y∃z (z = {x, y})”). Anti-Settists saw clearly the obvious benefit of having a set theory with the opposite property, namely that for any two sets x and y, the pair set {x, y} must NOT exist. But their original formulation of the Axiom of Anti-Pairing used simple negation: “¬∀x∀y∃z (z = {x, y})”, which is equivalent to “∃x∃y∀z (z ≠ {x, y})”, or “there are two sets x and y such that no set contains just x and y.” This was obviously too weak for what was desired.

The same problem cropped up for anti-union, anti-comprehension, anti-foundation, and virtually every other axiom of ZF. A new idea was needed. And soon enough, it came: we should place our negations AFTER the initial block of universal quantifiers, not before. The brilliance of this breakthrough is hard to overestimate. Fields Medals were awarded and backs were patted. This brings us to the second phase of the history of Anti-Set Theory.

The Age of Negation After the Universal Quantifier

Now that the Anti-Settists knew how they were going to proceed, they got busy axiomatizing their new theory. Every set has a power set? No more, now no set has a power set. The union of each set exists? Nope! Unions are banned. Infinite sets? Absolutely not.

The first few axioms to be proposed were anti-pairing, anti-union, anti-powerset, anti-foundation, and anti-infinity, and they took the following form:

Anti-Pairing: ∀x∀y¬∃z∀w (w ∈ z ↔ (w = x ∨ w = y))
Anti-Union: ∀x¬∃y∀z (z ∈ y ↔ ∃w (w ∈ x ∧ z ∈ w))
Anti-Powerset: ∀x∀y∃z (z ⊆ y ∧ z ∉ x)
Anti-Foundation: ∀x∀y (y ∈ x → ∃z (z ∈ x ∧ z ∈ y))
Anti-Infinity: ¬∃x∀y (y ∈ x → ∀z (z = y⋃{y} → z ∈ x))

But soon they began running into trouble. The first hint that something was amiss appeared with the Axiom of Anti-Comprehension. The Axiom of Comprehension says that for any property Φ definable as a sentence of first-order ZF, and for any set X, there exists the subset of X consisting of those elements that satisfy Φ. Anti-Comprehension could thus be written:

Anti-Comprehension: ∀x∀y (y ≠ {z∈x | φ(z)})

Perhaps you can see the problem. What if φ is a tautology? Or in other words, what if φ(z) is a property satisfied by ANY and EVERY set, like, say, z=z? Then our axiom tells us that for any set x, the subset consisting of those elements that are equal to themselves cannot exist. But that’s just x itself! In other words, for any set x, x does not exist. The first Anti-Settists had unwittingly destroyed the entire universe of sets!

Another problem arose with Anti-Replacement. Recall that Replacement says that for any function F(x) definable in a sentence of first-order ZF, and for any set X, the image of X under F exists. So Anti-Replacement states that this image does not exist. But what if F is the identity function? Then Anti-Replacement tells us that the image of X under the identity map (namely, X itself) does not exist. And again, we’ve proved that no sets exist.

Quite simply, the problem the Anti-Settists were running into was that while their original axioms were too weak, their new axioms were too strong! So strong that they DESTROYED THE UNIVERSE. That’s strong.

The solution? Anti-Comprehension and Anti-Replacement were discarded. But there was another problem in the Axiom of Anti-Extensionality. Extensionality says that any two sets are the same if they share all the same elements. So Anti-Extensionality says that if two sets share all the same elements, then they are not the same. But if we compare any set X with itself using this standard, we find that X cannot equal X! This was even worse than before, because it violates the first-order tautology ∀x (x = x). Not only did this second wave of Anti-Settists destroy the universe, they also broke the rules of logic!

So Anti-Extensionality had to go. That much everybody agreed on. And the removal of this axiom settled the last of the paradoxes of Anti-Set Theory. But no sooner had the dust settled than a new controversy arose as to the nature of Extensionality…

The Great Schism: The Modern Age of Anti-Set Theory

So far, the Anti-Settists had compiled the following list of axioms:

Anti-Pairing: ∀x∀y¬∃z∀w (w ∈ z ↔ (w=x ∨ w=y))
Anti-Union: ∀x¬∃y∀z (z∈y ↔ ∃w (w∈x ∧ z∈w))
Anti-Powerset: ∀x∀y∃z (z⊆y ∧ z∉x)
Anti-Foundation: ∀x∀y (y∈x → ∃z (z∈x ∧ z∈y))
Anti-Infinity: ¬∃x∀y (y ∈ x → ∀z (z = y⋃{y} → z ∈ x))

You might have noticed that the Axiom of Anti-Infinity looks a little strange. The Axiom of Infinity tells us that there’s a set X that contains all empty sets, and such that for any set Y contained in X, if Y has a successor then that successor is also in X. So the Axiom of Anti-Infinity will say that there is no such set. We’ll prove shortly that in fact, the other axioms entail that there are no empty sets, so it’s vacuously true of every set that it “contains all empty sets.” Thus the only restriction placed on our universe by the Axiom of Anti-Infinity is that there’s no set that contains the successors of all its members with successors.

For a while, everybody agreed on this list of axioms and all was well. But after a couple of hours, new rumbles arose about Extensionality. A new contingent of Anti-Settists began arguing in favor of including the Axiom of Extensionality. It’s hard to describe exactly what was going on in the minds of these individuals, who appeared to be turning their backs on everything that Anti-Set Theory was all about by accepting an ORDINARY axiom alongside all of their beautiful Anti-Axioms. Some of them expressed a concern that they had gone too far by abandoning extensionality, and worried that Anti-Set Theory was so far removed from our intuitions that it no longer deserved to be called a theory of sets. Others pointed out that Anti-Set Theory as it was currently formulated ruled out certain models that they felt deserved to belong to the pantheon of Anti-Set Universes. The rest of the Anti-Settists called them crazy, but they persisted. The fighting reached fever pitch one afternoon in a meeting of the leading Anti-Settists, where one individual who shan’t be named accused another of “selling out” to Traditional Set Theory, and a fist-fight nearly broke out. This led to the Great Schism.

Anti-Settists fractured into two contingents that became known as the traditionalists, who advocated an extensional Anti-Set Theory, and the revolutionaries, who wanted an intensional Anti-Set Theory in which two sets could share all the same elements but still be different. In recent history the fighting has cooled off, probably because both sides noticed that there didn’t actually seem to be all that huge a difference between extensional and intensional Anti-Set Theory. The primary realization was that the other axioms banned objects based on the elements they contain (so that Anti-Pairing, for instance, really says that there can be no set with the same elements as the pair of x and y, not just that the pair of x and y doesn’t exist).

Extensional Anti-Set Theory
Extensionality: ∀x∀y (∀z (z ∈ x ↔ z ∈ y) → x = y)
Anti-Pairing: ∀x∀y¬∃z∀w (w ∈ z ↔ (w = x ∨ w = y))
Anti-Union: ∀x¬∃y∀z (z ∈ y ↔ ∃w (w ∈ x ∧ z ∈ w))
Anti-Powerset: ∀x∀y∃z (z⊆y ∧ z∉x)
Anti-Foundation: ∀x∀y (y ∈ x → ∃z (z ∈ x ∧ z ∈ y))

Intensional Anti-Set Theory
Anti-Pairing: ∀x∀y¬∃z∀w (w ∈ z ↔ (w = x ∨ w = y))
Anti-Union: ∀x¬∃y∀z (z ∈ y ↔ ∃w (w ∈ x ∧ z ∈ w))
Anti-Powerset: ∀x∀y∃z (z⊆y ∧ z∉x)
Anti-Foundation: ∀x∀y (y ∈ x → ∃z (z ∈ x ∧ z ∈ y))

That covers the history of Anti-Set Theory up to modern times. Let’s now take a look at some of the peculiar details of the theory.

Theorem 1: No anti-sets contain exactly one element.

Proof: Suppose that there was an anti-set X such that X contained Y and nothing else. Then the sentence “∀w (w ∈ X ↔ (w = Y ∨ w = Y))” would be true. But this would be a violation of anti-pairing, as there can be no set whose elements are the same as the pair {Y, Y}. So no such anti-set can exist.

Theorem 2: No anti-sets contain exactly two elements.

Proof: Suppose there was an anti-set X such that X contained Y, Z, and nothing else. Then the sentence “∀w (w ∈ X ↔ (w = Y ∨ w = Z))” would be true. But this would be a violation of anti-pairing. So no such anti-set can exist.

Theorem 3: No anti-sets contain an empty anti-set.

Proof: Suppose that some set X contained a set Y, where Y is empty. By Anti-Foundation, every element of X must share an element with X. So Y must share some element with X. But Y contains nothing, so it can’t share any element with X. Contradiction. So no such anti-set exists.

Theorem 4: No anti-set can be its own union.

Proof: Follows trivially from the Axiom of Anti-Union: no anti-set with the same elements as ⋃X can exist, so if X = ⋃X, then X itself can’t exist. (This rules out anti-sets with the same structure as limit ordinals.)

Theorem 5: No empty anti-sets exist.

Proof: Suppose there exists an empty anti-set X. Then the union of X is also empty, so X has exactly the same elements as its own union. But the Axiom of Anti-Union says that there can be no anti-set with the same elements as the union of an anti-set. Contradiction, so no empty anti-set exists.

These theorems tell us a lot about the structure of anti-sets. In particular, every anti-set contains at least three elements, and each of those elements in turn contains at least three elements, and so on forever. So we only have infinite descending membership-chains of anti-sets.

Also, we’ve ruled out anti-sets with zero, one, and two elements, so you might think that no three-element anti-sets can exist either. But the same argument we used for one- and two-element anti-sets doesn’t work any longer, since pairing never produces three-element sets. In fact, the first model of Anti-Set Theory discovered contained exactly five anti-sets, each of which had three elements. This model is an intensional model, as it’s crucial that two of the sets (C and X) contain the same elements. We’ll call this the Primordial Anti-Set Model.

Universe: P, A, B, C, X
P = {A, B, C}
A = {B, C, X}
B = {A, C, X}
C = {A, B, X}
X = {A, B, X}


Beautiful! Let’s check each of the axioms to convince ourselves that it really is a valid model.

Anti-Union: ⋃P = A ⋃ B ⋃ C = {A, B, C, X}. Similarly, you can show that ⋃A, ⋃B, ⋃C, and ⋃X all contain A, B, C, and X. And {A, B, C, X} is not an anti-set in the universe, so no violations of Anti-Union occur.

Anti-Pairing: This one’s easy, since each of our sets contains three elements, and no three-element set can be formed by pairing.

Anti-Powerset: Note that the axiom only forbids a set whose elements are exactly the existing subsets of some X; it doesn't bring into existence any new subsets. In this model, the only existing subset of P is P itself, so the power set of P would be {P} – and {P} doesn't exist! Likewise the only subset of A is A, and the only subset of B is B. Since C and X contain exactly the same elements, they are subsets of each other, so the power set of either one would be {C, X}. None of these candidates ({P}, {A}, {B}, {C, X}) exist in the universe, so Anti-Powerset is satisfied.

Anti-Foundation: Does any set contain an element with which it shares no elements? Checking each set by inspection shows that every element of every anti-set shares at least one element with it, so Anti-Foundation holds.
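
In fact, with a universe this small we can verify all four axioms mechanically. Here's a minimal Python sketch of such a check (my own illustration, not part of the original theory; the helper names are ad hoc, and Anti-Powerset is read in the exact form above, as forbidding a set whose elements are exactly the existing subsets):

ext = {
    'P': {'A', 'B', 'C'},
    'A': {'B', 'C', 'X'},
    'B': {'A', 'C', 'X'},
    'C': {'A', 'B', 'X'},
    'X': {'A', 'B', 'X'},
}
universe = set(ext)

def exists_with_elements(s):
    # Is there an anti-set in the universe whose elements are exactly s?
    return any(ext[t] == s for t in universe)

# Anti-Pairing: no anti-set has exactly the elements {x, y} (including x = y).
assert not any(exists_with_elements({x, y}) for x in universe for y in universe)

# Anti-Union: no anti-set has exactly the elements of the union of some anti-set.
assert not any(exists_with_elements(set().union(*(ext[w] for w in ext[x])))
               for x in universe)

# Anti-Powerset: no anti-set has exactly the existing subsets of x as its elements.
assert not any(exists_with_elements({z for z in universe if ext[z] <= ext[x]})
               for x in universe)

# Anti-Foundation: every element of every anti-set shares an element with it.
assert all(ext[y] & ext[x] for x in universe for y in ext[x])

print('All four axioms check out.')

The same script, with the universe swapped out, also verifies the extensional model below.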

Anti-Infinity: (EDIT: this model and the following one actually violate Anti-Infinity! Realized this after originally posting. A future post will provide more details.)

Now, this was a model only of Intensional Anti-Set Theory, not Extensional Anti-Set Theory. Here’s a way to modify it to get a model that works in both theories:

Universe: T, A, B, C
T = {A, B, C}
A = {T, B, C}
B = {A, T, C}
C = {A, B, T}

Again, convince yourself that this universe satisfies the axioms.

Using a similar construction, we can show that there are models of Anti-Set Theory with exactly N sets, for every N ≥ 4. The universe is T, A_1, A_2, …, A_{N−1}, where T = {A_n | n ∈ {1, 2, …, N−1}} and A_k = {T} ⋃ {A_j | j ∈ {1, 2, …, N−1}, j ≠ k}.
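
Here's a hedged Python sketch of that general construction (the function name is mine), whose output can be fed straight into the checking script from earlier:

def n_model(N):
    # Universe T, A_1, ..., A_{N-1} with T = {A_1, ..., A_{N-1}}
    # and A_k = {T} ∪ {A_j | j ≠ k}.
    names = ['A%d' % i for i in range(1, N)]
    ext = {'T': set(names)}
    for k, name in enumerate(names, start=1):
        ext[name] = {'T'} | {'A%d' % j for j in range(1, N) if j != k}
    return ext

print(n_model(4))   # exactly the structure of the four-set model above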

Many lines of research remain open in the field of Anti-Set Theory. Can Anti-Sets serve as a new foundation for mathematics? Are there models of Anti-Set Theory that contain structures isomorphic to the natural numbers? Is the theory still consistent if we include an Axiom of Anti-Choice? (Adding Anti-Choice rules out the models we’ve discussed so far. Perhaps only infinite universes are allowed with Anti-Choice.) Other proposed additions to the axioms include the simple negation of extensionality, and an “Axiom of Undefinability”, which says that no set exists whose elements satisfy a definable property. How would these additions affect the universe of sets?

I want to close by noting that set theory sometimes gets a bad rap among mathematicians and the wider community for being too abstract or not "useful enough." We hope that the advent of Anti-Set Theory will make it plain to the public that there is a future world in which set theory can be of GREAT practical use to everybody. The possible applications of Anti-Set Theory are too numerous to count – so numerous, in fact, that I regard it as unnecessary to name any specific examples.

The Transfinite Factorial

Chapter 1: When Circularity is not Paradoxical

Circular definitions can be quite bad. Take the definition “a is the natural number that is equal to a”, or worse, “a is the natural number that is NOT equal to a”. Or, for an even more perverse one, “a+1, where a is the smallest number that can be defined in fewer than 50 words.” But just as not all self-reference is paradoxical, similarly not all circularity is vicious. Circular definitions form a crucial component of every area of math, where they go under the more polite term “recursion.”

The key component of a benign circular definition is that it avoids infinite regress. So "a = a" is unhelpful, because to evaluate a you must first know what a evaluates to. But what about "a_n = n ⋅ a_{n−1}, and a_0 = 1"? Now to know a_500, you only need to know a_499, which in turn only requires knowing a_498, which requires… and so on, each step moving a little closer to the base case "a_0 = 1", at which point you can travel all the way back up the chain to discover that a_500 = 500!. (That's 500 factorial, not an excitable 500.)

Douglas Hofstadter has a parable about the “porpuquine”, a rare cousin of the porcupine that grows smaller porpuquines on its back rather than quills. The market value of such a beast is determined by the total number of porpuquine noses you can find on it. And if you’re worried about an infinite regress, don’t fear! There’s a smallest porpuquine, which is entirely bald.

Recursive definitions are extremely nice when we want to define a function whose outputs are related to each other in a more simple way than they are to the inputs. Some examples (in which S stands for the successor function, which takes 0 to 1 to 2 to …):

Addition
∀x (x + 0 = x)
∀x∀y (x + S(y) = S(x+y))

Multiplication
∀x (x ⋅ 0 = 0)
∀x∀y (x ⋅ S(y) = x⋅y + x)

Notice the similarity between these two definitions. In each case we define what the operation does when its second argument is 0, and then we define what it does when the second argument is a successor. This covers all the natural numbers! Let's show this in action by proving that 9 + 3 = 12.

9 + 3
9 + S(2)
S(9 + 2)
S(9 + S(1))
S(S(9 + 1))
S(S(9 + S(0)))
S(S(S(9 + 0)))
S(S(S(9)))
S(S(10))
S(11)
12

The pivotal step is where we hit our base case (evaluating x + 0), at which point we need merely pass our answers up the levels to get back to the solution of our original problem.

Another: Let’s show that 3⋅2 = 6.

3⋅2
3⋅S(1)
3⋅1 + 3
3⋅S(0) + 3
(3⋅0 + 3) + 3
(0 + 3) + 3
(0 + S(2)) + 3
S(0 + 2) + 3
S(0 + S(1)) + 3
S(S(0 + 1)) + 3
S(S(0 + S(0))) + 3
S(S(S(0 + 0))) + 3
S(S(S(0))) + 3
S(S(1)) + 3
S(2) + 3
3 + 3
3 + S(2)
S(3 + 2)
S(3 + S(1))
S(S(3 + 1))
S(S(3 + S(0)))
S(S(S(3 + 0)))
S(S(S(3)))
S(S(4))
S(5)
6

Three of these steps use a "base case rule" – the first is the base case for multiplication, the second and third are the base case for addition. I like that you can see the complexity growing in the length of the lines until we hit a base case, at which point the lines start shortening until we next need to apply the recursive step.
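
To make this concrete, here's a minimal Python sketch of the same recursive scheme (my own illustration; the encoding of numerals is ad hoc), with numerals built from 0 by a successor function:

def succ(x):
    # S(x)
    return ('S', x)

def add(x, y):
    if y == 0:                        # x + 0 = x
        return x
    return succ(add(x, y[1]))         # x + S(y) = S(x + y)

def mul(x, y):
    if y == 0:                        # x ⋅ 0 = 0
        return 0
    return add(mul(x, y[1]), x)       # x ⋅ S(y) = x⋅y + x

def numeral(n):
    # Encode a Python int as a numeral: numeral(3) = S(S(S(0))).
    return 0 if n == 0 else succ(numeral(n - 1))

def decode(x):
    # Decode a numeral back into a Python int.
    return 0 if x == 0 else 1 + decode(x[1])

print(decode(add(numeral(9), numeral(3))))   # 12
print(decode(mul(numeral(3), numeral(2))))   # 6

Tracing the recursive calls of add and mul reproduces the shape of the two derivations above.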

Chapter 2: Infinite Recursion

Now what happens if we want to add x + y, but y is neither 0 nor a successor? "Huh? Isn't every non-zero number the successor of some number?" I hear you ask. Not infinity! Or more precisely, not ω. Our recursive definitional scheme is all well and good for finite numbers, but it fails once we enter the realm of transfinite ordinals.

No problem! There’s an easy patch. The problem is that we don’t know how to add or multiply x and y when y is a limit ordinal. So we just add on an extra rule to cover this case!

Addition
α + 0 = α
α + S(β) = S(α + β)
α + λ = ⋃{α + β | β ∈ λ}

Multiplication
α ⋅ 0 = 0
α ⋅ S(β) = α⋅β + α
α ⋅ λ = ⋃{α ⋅ β | β ∈ λ}

If you’re scratching your head right now at the notion of “the union of a set of numbers” or “a number being an element of another number”, go read this for a thorough summary. In short, in ZFC we encode each number as the set of all smaller numbers (e.g. 3 = {0, 1, 2}), and ω (the first infinite number) is defined to be the set of all natural numbers (ω = {0, 1, 2, 3, …}). So 3 ⋃ 5 = {0,1,2} ⋃ {0,1,2,3,4} = {0,1,2,3,4} = 5. In general, the union of a set of numbers is just the maximum number among them (or if there is no maximum number, then it’s the supremum of all the numbers).

Outside the context of ZFC, we can equally write α + λ = lim (α + β) as β → λ. We can also write α + λ = sup {α + β | β < λ}. And for those more familiar with ZFC, we don’t need to take the union over all β ∈ λ, we can make do with any cofinal subset (we’ll do this to simplify calculations greatly later in the post).
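
As a quick sanity check on this encoding, here's a tiny Python sketch (mine, purely illustrative) of the von Neumann numerals, confirming that the union of two numbers is just the larger one:

def von_neumann(n):
    # n = {0, 1, ..., n-1}, built by S(x) = x ∪ {x}.
    s = frozenset()
    for _ in range(n):
        s = s | {s}
    return s

three, five = von_neumann(3), von_neumann(5)
print(three | five == five)   # True: 3 ⋃ 5 = 5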

Now we’ve extended + and × into the infinite, and what we get is a little unusual. The first major sign that things have changed is that addition and multiplication are no longer commutative.

1 + ω = ⋃{1 + n | n ∈ ω} = ⋃{1, 2, 3, 4, …} = ω
ω + 1 = ω + S(0) = S(ω + 0) = S(ω)

ω certainly doesn’t equal S(ω), so ω + 1 ≠ 1 + ω.

Distributivity also fails! For instance (ω + 3) ⋅ 2 = ω⋅2 + 3, not ω⋅2 + 6. But we don’t lose every nice algebraic feature. Associativity still holds for both addition and multiplication! I.e. α + (β + γ) = (α + β) + γ and α ⋅ (β ⋅ γ) = (α ⋅ β) ⋅ γ. This result is really important.
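
To see the asymmetry concretely, here's a hedged Python sketch of ordinal addition for ordinals below ω^ω (my own illustration; the representation is ad hoc). An ordinal is stored in Cantor normal form as a list of (exponent, coefficient) pairs with natural-number exponents in decreasing order:

def ord_add(a, b):
    # Terms of a with exponents below b's leading exponent are absorbed;
    # a term of a matching b's leading exponent merges coefficients.
    if not b:
        return a
    e = b[0][0]
    head = [(x, c) for (x, c) in a if x > e]
    merged = list(b)
    for (x, c) in a:
        if x == e:
            merged[0] = (e, c + merged[0][1])
    return head + merged

one, omega = [(0, 1)], [(1, 1)]       # 1 = ω^0⋅1, ω = ω^1⋅1
print(ord_add(one, omega))            # [(1, 1)]: 1 + ω = ω
print(ord_add(omega, one))            # [(1, 1), (0, 1)]: ω + 1 = S(ω)

The "absorption" of the smaller left-hand terms is exactly the α << β phenomenon defined below.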

Earlier we showed that 1 + ω = ω. One can easily show that in general, n + ω = ω for any finite n, and more generally that α + β = β whenever α is "infinitely smaller" than β. This notion of "infinitely smaller" is a little subtle; ω is not infinitely smaller than ω⋅2, but it is infinitely smaller than ω^2. (It should be noted that everything up to here has been standard textbook stuff, but from here on I will be introducing original notions that I haven't seen elsewhere. So treat what follows with skepticism.)

A useful definition for us will be α << β (read as “α is infinitely less than β”) if α + β = β. We’ll also say that α and β are “comparable” if neither is infinitely smaller than the other: α ~ β if ¬(α << β) and ¬(β << α). Some conjectures I have about the comparability relation:

Conjecture 1: Commutativity holds between comparable ordinals.

Conjecture 2: α ~ β if and only if there’s some finite n such that α < β⋅n and β < α⋅n.

I’m actually fairly confident that Conjecture 1 is false, and would love to see a counterexample.

Here are some more useful identities for transfinite addition and multiplication:

Theorem 1: For finite n and limit ordinal λ, α⋅(λ + n) = α⋅λ + α⋅n

Proof sketch: α⋅(λ + n) = α⋅(λ + (n-1)) + α = α⋅(λ + (n-2)) + α + α = … = α⋅(λ + 0) + α + α + … + α = α⋅λ + α⋅n.

Theorem 2: For finite n and β << α, (α + β)⋅n = α⋅n + β.

Proof sketch: (α + β)⋅n = (α + β)⋅(n-1) + (α + β) = (α + β)⋅(n-2) + (α + β) + (α + β) = … = (α + β)⋅0 + (α + β) + … + (α + β) = (α + β) + (α + β) + … + (α + β) + (α + β) = α + (β + α) + (β + … + α) + β = α + α + α + … + α + β = α⋅n + β.

In case it's not clear what's going on in this proof, the idea is that we expand (α + β)⋅n out to (α + β) + … + (α + β) (n times). Then we shift parentheses to enclose all the middle βs, making them look like (β + α). But then by assumption, since β << α, the β vanishes in each of these middle terms, leaving just the final one: α + α + α + … + α + β = α⋅n + β.

Infinite exponentiation, tetration and so on can be defined in the exact same way as addition and multiplication: first cover the 0 case, then the successor case, and finally the limit case. And the limit case is defined by just taking the union of all smaller cases.

Chapter 3: Transfinite Factorials

Now, what’s the first function you think of when you think of recursively defined functions? That’s right, the factorial! I mentioned it at the beginning of this post, long ago: n! = n⋅(n-1)! and 0! = 1. So let’s extend this definition to obtain a transfinite factorial!

Factorial
0! = 1
S(α)! = α! ⋅ S(α)
λ! = ⋃{β! | β ∈ λ}

Now we can ask what ω! is.

Theorem: ω! = ω

Proof: ω! = ⋃{n! | n ∈ ω} = ⋃{1, 2, 6, 24, 120, …} = ω (since {1, 2, 6, 24, 120, …} is a cofinal subset of ω).

Next, (ω+1)! = ω! ⋅ (ω + 1) = ω ⋅ (ω + 1) = ω⋅ω + ω⋅1 = ω^2 + ω. Note that to prove this, we've used Theorem 1 from before.

And we can keep going from there.

Theorem: (ω + n)! = ω^(n+1) + ω^n⋅n + ω^(n−1)⋅(n−1) + … + ω^2⋅2 + ω.

Try to prove this for yourself!

Now that we know what (ω + n)! is for each finite n, we are able to calculate what (ω⋅2)! is (using the fact that {ω + n | n ∈ ω} is a cofinal subset of ω⋅2). I’ll write its value, as well as the next few interesting ordinals:

(ω⋅2)! = ⋃{ω!, (ω+1)!, (ω+2)!, …} = ω^ω
(ω⋅3)! = ω^(ω⋅2)
(ω⋅n)! = ω^(ω⋅(n−1))

Using the same trick as before, this enables us to solve (ω^2)!

(ω^2)! = ⋃{ω!, (ω⋅2)!, (ω⋅3)!, …} = ⋃{ω, ω^ω, ω^(ω⋅2), ω^(ω⋅3), …} = ω^(ω^2)

From here, things suddenly take on a remarkable regularity (for a while at least). Looking at limit ordinals above ω^2, they appear to have the following structure, at least up until we arrive at the first ordinal where standard notation fails us: ω^ω^ω^… = ε0.

Conjecture: For λ a limit ordinal such that ω^2 ≤ λ < ε0, λ! = ω^λ

If this conjecture is right, then (ω^ω)! = ω^(ω^ω), and (ω^(ω^ω))! = ω^(ω^(ω^ω)), and so on. We can also write this using tetration notation by saying that (ⁿω)! = ⁿ⁺¹ω.

But something different happens once we get to ε0!

Conjecture: ε0! = ⋃{(ⁿω)! | n ∈ ω} = ⋃{ω, ²ω, ³ω, ⁴ω, …} = ε0

If this is right, then this is pretty interesting behavior! The fixed points of the factorial function are 1, ω, and ε0, at the very least. What happens after ε0? The factorial function "loops" in a certain sense! Just as (ω+1)! was ω^2 + ω, so (ε0+1)! ends up being ε0^2 + ε0. And (ε0⋅2)! = ε0^ε0, paralleling (ω⋅2)! = ω^ω. And so on.

Conjecture: εn is a fixed point of the factorial function for all n.

There are plenty more questions to be answered. Is ω1 another fixed point of the factorial function? What exactly does the list of all the fixed points look like? Are there any cardinals that aren't equal to their own factorial? What exactly is the general pattern for factorials of successor ordinals? How does α! compare to α^α in general?

Finiteness can’t be captured in a sound, complete, finitary proof system

Consider the sentence “This blog has finitely many posts.” Do you understand what this sentence means? Then no set of rules (even infinite, even uncomputable!) can model your reasoning. This claim may sound shocking, but it can be justified on solid metamathematical grounds.

Another example: the sentence “There are finitely many planets in the universe.” You don’t have to think it’s true, you just have to think you understand what it means. What’s the common theme? It’s the notion of there being ‘finitely many’ of some class of objects. Let’s imagine building a language that has the expressive resources of first-order logic (which are quite modest), plus an additional quantifier F, whose semantics are given by the following rule: Fx φ(x) is satisfied by a model iff there are only finitely many objects in that model that satisfy φ(x).

It turns out that any language consisting of first order logic plus the quantifier F can’t be axiomatized in any sound, complete, and finitary proof system. Notice that effective enumerability of the rules of the proof system is not a requirement here! So long as the language is strong enough to express the semantics of {∧, ¬, ∀, F, variables xn, and relations Rn}, no set of sentences and sentence-manipulation rules in that language will suffice to capture these semantics.

Here's a proof: consider the first-order theory of Peano arithmetic. This theory has nonstandard models (as any theory of arithmetic must have in a logic that is compact). All of these nonstandard models have the following feature: they contain numbers that are larger than infinitely many numbers. Think about what this means: this is a common feature of all nonstandard models, so if we could write a sentence expressing this feature, then we could rule them all out! This is where the quantifier F steps in. With F, we can write the following simple sentence and add it to PA as an axiom:

∀x Fy (y < x)

In English: every number has finitely many numbers less than it. And with that, we’ve ruled out all nonstandard models! So now we have a theory that is categorical for ℕ. And that’s a big deal, metamathematically speaking!

Why? Well, as I've talked about in a previous post, you can prove some pretty strong limitative results about any logic that can produce a theory that's categorical for ℕ. In particular, if we can produce such a theory then its logic cannot be compact. Quick proof: suppose a set of sentences Σ is categorical for ℕ. Add to Σ a new constant c along with the axioms "c ≠ 0", "c ≠ 1", "c ≠ 2", and so on, and call this new set Σ'. Every finite subset of Σ' is satisfiable in ℕ (just interpret c as a sufficiently large number). So by compactness, Σ' has a model. But this model is nonstandard – the interpretation of c differs from every natural number. And since Σ is a subset of Σ', any model of Σ' is also a model of Σ. So Σ has a nonstandard model, contradicting its categoricity.

So compactness implies that no theory is categorical for ℕ. But compactness follows from three properties of a proof system: soundness (if Σ ⊢ α then Σ ⊨ α), completeness (if Σ ⊨ α then Σ ⊢ α), and the fact that all proofs are only finitely long (try expressing this property without F!). Quick proof: If a set of sentences is finitely satisfiable, then every finite subset of it has a model (by definition), so no finite subset of it can be refuted (by soundness), so the entire set can't be refuted (since any proof is finite and uses only finitely many premises), so the entire set is satisfiable (by completeness).

So soundness + completeness + finitary proofs ⇒ compactness ⇒ the existence of nonstandard models of arithmetic in any theory of ℕ. Which means that the semantics of F cannot be captured in any sound, complete, and finitary proof system!

Take your pick: either you don’t really understand the semantics of the “finitely many” quantifier F, or no set of rules (not even requiring this set to be finite or computable) can fully capture your reasoning in finite-length proofs.

More information about related extensions of first-order logic and their limitations can be found here. The result I describe here is a rephrasing of results discussed there.

The Mindblowing Goodstein Sequences

Goodstein sequences are amazingly strange and unintuitive. I want to tell you all their unusual properties up-front, but I also don’t want to spoil the surprise. So let me start by defining the Goodstein sequence of a number.

We start by defining the hereditary base-n representation of a number m. We first write m as a sum of powers of n (e.g. for m = 266 and n = 2, we write 266 = 2^8 + 2^3 + 2^1). Now we write each exponent as a sum of powers of n. (So now we have 266 = 2^(2^3) + 2^(2+1) + 2^1.) And we continue this until all numbers in the representation are ≤ n. For 266, our representation stabilizes at 266 = 2^(2^(2+1)) + 2^(2+1) + 2^1.

Now we define Gn(m) as follows: if m = 0, then Gn(m) = 0. Otherwise, Gn(m) is the number produced by replacing every n in the hereditary base-n representation of m by n+1 and then subtracting 1. So G2(266) = G2(2^(2^(2+1)) + 2^(2+1) + 2^1) = 3^(3^(3+1)) + 3^(3+1) + 3^1 − 1 ≈ 4.4 × 10^38.
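
Here's a small Python sketch of this construction (my own, with ad hoc helper names). A hereditary base-b numeral is stored as a list of (exponent, coefficient) pairs, where each exponent is itself such a list:

def hereditary(m, b):
    # Hereditary base-b representation of m, highest power first.
    terms, e = [], 0
    while m > 0:
        m, r = divmod(m, b)
        if r:
            terms.append((hereditary(e, b), r))
        e += 1
    return terms[::-1]

def value(terms, b):
    # Evaluate a hereditary representation with every base set to b.
    return sum(c * b ** value(e, b) for e, c in terms)

def goodstein_step(m, b):
    # G_b(m): rewrite m in hereditary base b, bump every b to b+1, subtract 1.
    return 0 if m == 0 else value(hereditary(m, b), b + 1) - 1

print(goodstein_step(266, 2))   # 3^(3^(3+1)) + 3^(3+1) + 3 − 1 ≈ 4.4 × 10^38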

Finally, we define the Goodstein sequence for a number n as the following:

a0 = n
a1 = G2(a0)
a2 = G3(a1)
a3 = G4(a2)
And so on.
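
Stringing the steps together (again just a sketch, reusing goodstein_step from above):

def goodstein(n, max_steps):
    # First values of the Goodstein sequence of n, truncated at max_steps.
    seq, b = [n], 2
    while len(seq) <= max_steps and seq[-1] != 0:
        seq.append(goodstein_step(seq[-1], b))
        b += 1
    return seq

print(goodstein(4, 5))   # [4, 26, 41, 60, 83, 109]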

Let’s take a small starting number and look at its Goodstein sequence. Suppose we start with 4.

a0 = 4 = 2^2
a1 = 3^3 − 1 = 2⋅3^2 + 2⋅3 + 2 = 26
a2 = 2⋅4^2 + 2⋅4 + 1 = 41
a3 = 2⋅5^2 + 2⋅5 = 60
a4 = 2⋅6^2 + 2⋅6 − 1 = 2⋅6^2 + 6 + 5 = 83
a5 = 2⋅7^2 + 7 + 4 = 109

Now, notice that the sequence always seems to increase. And this makes sense! After all, each step we take, we increase the base of each exponent by 1 (a big increase in the value) and only subtract 1. Here’s a plot of the first 100 values of the Goodstein sequence of 4:

And for further verification, here are the first million values of the Goodstein sequence of 4:

Looks like exponential growth to me! Which, again, is exactly what you’d expect. In addition, the higher the starting value, the more impressive the exponential growth we see. Here we have the first 100 values of the Goodstein sequence of 6:

Notice how after only 100 steps, we're already at a value of 5 × 10^10! This type of growth only gets more preposterous for larger starting values. Let's look at the Goodstein sequence starting with 19.

a0 = 19
a1 ≈ 7 × 10^12
a2 ≈ 1.3 × 10^154
a3 ≈ 1.8 × 10^2184
And so on.

We can’t even plot a sequence like this, as it’ll just look like two perpendicular lines:

Alright, so here’s a conjecture that hopefully you’re convinced of: Goodstein sequences are always growing. In fact, let’s conjecture something even weaker: Goodstein sequences never terminate (i.e. never go to zero).

We’re finally ready for our first big reveal! Both of these conjectures are wrong! Not only do some Goodstein sequences terminate, it turns out that EVERY Goodstein sequence eventually terminates!

This is pretty mind-blowing. What about all those graphs I showed? Well, the sequences do increase for a very very long time, but they eventually turn around. How long is eventually? The Goodstein sequence of 4 takes 3 × 2^402,653,210 − 1 steps to turn around!! This number is too large for me to be able to actually show you a plot of the sequence turning around. But we do know that once the Goodstein sequence stops increasing, it actually stays fixed for 3 × 2^402,653,209 steps before finally beginning its descent to zero. Heuristically, it would look something like this:

(Why does it stay constant, incidentally? The reason is that when we get to a number whose hereditary base-B notation just looks like B + n for some n < B, we increase B by 1 and decrease n by 1 each step. This doesn’t change the sequence’s value! So it stays constant until we get to n = 0. At that point, we decrease by 1 until we descend all the way to zero.)
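You can watch a small plateau directly with the goodstein_step sketch from earlier – for instance, 8 is 5 + 3 in base 5:

print(goodstein_step(8, 5))   # 8  (6 + 2 in base 6)
print(goodstein_step(8, 6))   # 8  (7 + 1 in base 7)
print(goodstein_step(8, 7))   # 8  (8 = 8^1 in base 8)
print(goodstein_step(8, 8))   # 8  (8^1 becomes 9^1, minus 1: 8 is now a digit)
print(goodstein_step(8, 9))   # 7  (from here the value drops by 1 each step)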

So, how do we prove this? We do it using infinite ordinals. (Now you see why this topic caught my eye recently!) The plan is to associate with each Goodstein sequence a "matching sequence" of ordinals. The way to do this is pretty simple: for a number written in hereditary base-n notation, just replace every n in that representation with ω. Now you have an infinite ordinal!

So, for example, here’s the first few numbers in the Goodstein sequence of 4, along with their translation to ordinals:

4 = 2^2 becomes ω^ω
26 = 3^3 − 1 = 2⋅3^2 + 2⋅3 + 2 becomes ω^2⋅2 + ω⋅2 + 2
41 = 2⋅4^2 + 2⋅4 + 1 becomes ω^2⋅2 + ω⋅2 + 1
60 = 2⋅5^2 + 2⋅5 becomes ω^2⋅2 + ω⋅2
83 = 2⋅6^2 + 2⋅6 − 1 = 2⋅6^2 + 6 + 5 becomes ω^2⋅2 + ω + 5
109 = 2⋅7^2 + 7 + 4 becomes ω^2⋅2 + ω + 4
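
Mechanically, this translation is just the hereditary representation with the base rendered as ω. A sketch, reusing hereditary() from before:

def to_ordinal(terms):
    # Render a hereditary representation as an ordinal: every base becomes ω.
    if not terms:
        return '0'
    parts = []
    for e, c in terms:
        exp = to_ordinal(e)
        if exp == '0':
            parts.append(str(c))                       # ω^0⋅c is just c
        else:
            base = 'ω' if exp == '1' else 'ω^(%s)' % exp
            parts.append(base if c == 1 else '%s⋅%d' % (base, c))
    return ' + '.join(parts)

print(to_ordinal(hereditary(4, 2)))    # ω^(ω)
print(to_ordinal(hereditary(26, 3)))   # ω^(2)⋅2 + ω⋅2 + 2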

Examine this sequence of ordinals for a couple of minutes. You might notice something interesting: at each step, the ordinal is decreasing! This turns out to always hold true. Let’s see why.

Notice that there are two cases: either the ordinal is a successor ordinal (like ω^2 + 4), or it's a limit ordinal (like ω^2 + ω). If it's a successor ordinal, then the next ordinal in the sequence is just its predecessor. (So ω^2 + 4 becomes ω^2 + 3.) This is obviously a smaller ordinal.

What if it's a limit ordinal? Then the smallest limit ordinal in its expanded representation drops to a smaller ordinal, and some finite number is added. So for instance, in the ordinal ω^2 + ω⋅2, the ω⋅2 will drop to ω and some finite number will be added on, depending on which step in the sequence you're at. So we'll end up with ω^2 + ω + n, for some finite n. This ordinal is smaller than the original one, because we've made an infinitely large jump downwards (from one limit ordinal to a lower limit ordinal), and only increased by a finite amount.

So in either case, we go to a smaller ordinal. And now the key realization is that every decreasing sequence of ordinals MUST terminate! (This is one of my favorite facts about ordinals, by the way. Even though we’re looking at arbitrarily large infinite ordinals, it’s still the case that there is no decreasing sequence that goes on forever. Whenever we hit a limit ordinal, we have to make an infinitely large jump downwards to get to the next element in the sequence, as the ordinal has no predecessor.)

So we've paired every Goodstein sequence with a decreasing sequence of ordinals. And every decreasing sequence of ordinals eventually terminates. So the Goodstein sequence must terminate as well! (One easy way to see this is to notice that every value in a Goodstein sequence is ≤ the ordinal assigned to it. Another way is to notice that the only number that could possibly be assigned to the ordinal 0 is 0 itself. So when the decreasing sequence of ordinals hits 0, as it must eventually, so must the Goodstein sequence.) And that's our proof!

That Goodstein sequences always terminate is the big flashy result for this post. But there’s much more that’s cool about them. For instance, the proof that we just used did not rely on the fact that we just increased the base by 1 at each step. In fact, even if we update the base by an arbitrarily large function at each stage, the sequences still must terminate!

Even cooler, the proof that all Goodstein sequences terminate cannot be formalized in Peano arithmetic! And in fact, no proof of this theorem can be found using PA. Laurie Kirby and Jeff Paris proved in 1982 that the statement that Goodstein sequences always terminate could be reduced to a theorem of Gentzen from which the consistency of PA could be deduced. So if PA could prove that Goodstein sequences always terminate, then it could prove its own consistency! Gödel's second incompleteness theorem tells us that this cannot be, and thus, assuming that PA really is consistent, it cannot prove that these sequences all terminate.

This was historically the third result to be found independent of PA, and it was the first that was purely number-theoretic in character (as opposed to meta-mathematical, like the Gödel sentences). It gives us a very salient way of expressing the limitations of Peano arithmetic: simply write a program that computes the Goodstein sequence of any given input, terminating only when it reaches 0 and outputting the length of the sequence. (Code Golf Stack Exchange shows that this can be done using just 77 characters of Haskell! And here’s a nice way to do it in Python). Now, it happens to be true that this program will always terminate for any natural number input (if run on sufficiently powerful hardware). But Peano arithmetic cannot prove this!

As a final fun surprise, let's define G(n) to be the length of the Goodstein sequence starting at n. As we've already seen, these sequences get really big really quickly. But exactly how quickly does G grow? It turns out that it grows much, much faster than other commonly known fast-growing computable functions. For instance, G(n) grows much faster than Ackermann's function, and G(12) is already greater than Graham's number!