ZFC, and getting the right answer by accident

There’s something I’m confused about with regards to ZFC. We already know that no first order theory allows you to perfectly capture the semantics of concepts like the natural numbers or finitely many. The best we can do if restricted to first-order logic (which is where most of modern mathematics tries to stay) is to have theories with surrogates for those concepts.

For instance, there’s this object called ω that stands in for the natural numbers in ZFC, even though there are all these models of ZFC in which ω contains many more objects besides the natural numbers (in fact, uncountably many in some models). And “X is finite” is replaced by “there’s a bijection from X to an element of ω.”

In formalizing these surrogate concepts, we make sure to not include any false statements about the concepts, which gives us a type of soundness of our surrogate concept. I.e., ZFC isn’t going to prove things about ω that are false of the set of natural numbers, because in one model ω is the set of natural numbers.

But it doesn’t give us completeness. We aren’t going to be able to prove ALL the true first-order sentences about the natural numbers, or about the concept of finiteness. (This is of course just a product of the first-order nature of the theories in which we’re defining these surrogates.)

So in each of these cases there will be true statements involving the concept that our theory will be agnostic about. We can look for “really good” surrogates, surrogates which allow us to prove most of the important and interesting true statements about the concept, and only fail in very abstract and unusual settings. The degree to which we can make good surrogates is the degree to which a first-order thinker can make sense of and usefully apply non-first-order concepts. (A first-order thinker being one that has no built-in understanding of non-first-order concepts.)

So one general question is: how good is a given surrogate? And another is, how do we know based on the axioms how good of a surrogate we’ll be getting? This is the thing I’m confused about.

In ZFC, there’s this weird phenomenon of the theory “getting the right answers accidentally.” It’s a little tough to put into words, but here’s an example:

ZFC can prove that |P(X)| > |X| for all X. So for instance ZFC can prove that |P(ω)| > |ω|. Meaning that ZFC can prove that the power set of the naturals is uncountable. But ZFC has countable models! (As does every first-order theory.) In those models, the power set of the naturals is NOT uncountable.

First order logic is sound, so what’s going on ISN’T that ZFC is proving a sentence that’s false in some of its models. It’s that the sentence is false in that model, if interpreted to be about the actual concept, and true in that model if interpreted to be about the surrogate concept. The surrogate for “P(ω) is uncountable” in ZFC is “there’s no bijection from P(ω) to ω”. And in the countable models of ZFC, the bijection that would prove the equinumerosity of ω and P(ω) is missing! So in those models, it’s actually TRUE that “there’s no bijection from P(ω) to ω”, even though P(ω) and ω really do have the same number of elements.

This is the sort of subtle nonsense you get used to in mathematical logic. Two accidents cancel out: first ZFC makes the mistake of having models where P(ω) is countable, and second it makes the mistake of losing track of the bijection from P(ω) and ω. And as a result, ZFC is able to prove the correct thing.

This seems really weird to me, and it’s an unsettling way for a supposed foundations of mathematics to be getting the right answers. This type of thing happens a lot, and it feels like we keep getting “lucky” in that our unintended interpretations of a sentence interfere with each other and cancel out their problematic features. It makes me wonder why we should be confident that ZFC will continue giving us the right answers (as opposed to being agnostic on important basic questions). And in fact we do have some examples of important and basic questions that ZFC is agnostic on, most dramatically that if |X| < |Y|, then |P(X)| < |P(Y)|!

It’s not that I doubt that ZFC’s surrogates for non-first order concepts end up allowing us to prove an enormous amount of the true facts about these concepts, because they clearly do. But is there some principled reason we can give for WHY the axioms of ZFC lead us to such a robust framework for mathematics in which we can prove many true statements?

One thing that suggests this is the power of the axiom of replacement. It’s just an extremely strong and useful axiom that allows us to prove a whole lot. But that doesn’t seem to help explain the “right-by-accident” phenomenon. So what does?

Who would win in a fight, logic or computation?

Which paradigm is stronger? When I say logic here, I’ll be referring to standard formal theories like ZFC and Peano arithmetic. If one uses an expansive enough definition to include computability theory, then logic encompasses computation and wins by default.

If you read this post, you might immediately jump to the conclusion that logic is stronger. After all, we concluded that in first order Peano arithmetic one can unambiguously define not only the decidable sets, but the sets decidable with halting oracles, and those decidable with halting oracles for halting oracles, and so on forever. We saw that the Busy Beaver numbers, while undecidable, are definable by a Π1 sentence. And Peano arithmetic is a relatively weak theory; much more is definable in a theory like ZFC!

On the other hand, remember that there’s a difference between what’s absolutely definable and what’s definable-relative-to-ℕ. A set of numbers is absolutely definable if there’s a sentence that is true of just those numbers in every model of PA. It’s definable-relative-to-ℕ if there’s a sentence that’s true of just those numbers in the standard model of PA (the natural numbers, ℕ). Definable-relative-to-ℕ is an unfair standard for the strength of PA, given that PA, and in general no first order system, is able to categorically define the natural numbers. On the other hand, absolute definability is the standard that meets the criterion of “unambiguous definition”. With an absolute definition, there’s no room for confusion about which mathematical structure is being discussed.

All those highly undecidable sets we discussed earlier were only definable-relative-to-ℕ. In terms of what PA is able to define absolutely, it’s limited to only the finite sets. Compare this to what sets of numbers can be decided by a Turing machine. Every finite set is decidable, plus many infinite sets, including the natural numbers!

This is the first example of computers winning out over formal logic in their expressive power. While no first order theory (and in general no theory in a logic with a sound and complete proof system) can categorically define the natural numbers, it’s incredibly simple to write a program that defines that set using the language of PA. In regex, it’s just “S*0”. And here’s a Python script that does the job:

def check(s):
    if s == '':
        return False
    elif s == ‘0’:
        return True
    elif s[0] == ’s’:
        return check(s[1:])
    else:
        return False

We don’t even need a Turing machine for this; we can build a simple finite state machine:

In general, the sets decidable by computer can be much more complicated and interesting than those absolutely definable by a first-order theory. While first-order theories like ZFC have “surrogates” for infinite sets of numbers (ω and its definable subsets), these surrogate sets include nonstandard elements in some models. As a result, ZFC may fail to prove some statements about these sets which hold true of ℕ but fail for nonstandard models. For instance, it may be that the Collatz conjecture is true of the natural numbers but can’t be proven from ZFC. This would be because ZFC’s surrogate for the set of natural numbers (i.e. ω) includes nonstandard elements in some models, and the Collatz conjecture may fail for some such nonstandard elements.

On top of that, by taking just the first Turing jump into uncomputability, computation with a halting oracle, we are able to prove the consistency of ZFC and PA. Since ZFC is recursively axiomatizable, there’s a program that runs forever listing out theorems, such that every theorem is output at some finite point. We can use this to produce a program that looks to see if “0 = 1” is a theorem of ZFC. If ZFC is consistent, then the program will search forever and never find this theorem. But if ZFC is not consistent, then the program will eventually find the theorem “0 = 1” and will terminate at that point. Now we just apply our halting oracle to this program! If ZFC is consistent, then the halting oracle tells us that the program doesn’t halt, in which case we return “True”. If ZFC is not consistent, then the halting oracle tells us that the program does halt, in which case we return “False”. And just like that, we’ve decided the truth value of Con(ZFC)!

The same type of argument applies to Con(PA), and Con(T) for any T that is recursively axiomatizable. If you’re not a big fan of ZFC but have some other favored system for mathematics, then so long as this system’s axioms can be recognized by a computer, its consistency can be determined by a halting oracle.

Some summary remarks.

On the one hand, there’s a very simple explanation for why logic appears over and over again to be weaker than computation: we only choose to study formal theories that are recursively axiomatizable and have computable inference rules. Of course we can’t do better than a computer using a logical theory that can be built into a computer! If on the other hand, we chose true arithmetic, the set of all first-order sentences of PA that are true of ℕ, to be our de jure theory of mathematics, then the theorems of mathematics could not be recursively enumerated by any Turing machine, or indeed any oracle machine below 0(ω). So perhaps there’s nothing too mysterious here after all.

On the other hand, there is something very philosophically interesting about truths of mathematics that cannot be proven just using mathematics, but could be proven if it happened that our universe gave us access to an oracle for the halting problem. If a priori reasoning is thought to be captured by a formal system like ZFC, then it’s remarkable that there are facts about the a priori (like Con(ZFC)) that cannot possibly be established by a priori reasoning. Any consistency proof cannot be provided by a consistent system itself, and going outside to a stronger system whose consistency is more in doubt doesn’t help at all. The only possible way to learn the truth value of such statements is contingent; we can learn it if the universe contains some physical manifestation of a halting oracle!

Computing truth values of sentences of arithmetic, or: Math is hard

Previously I talked about the arithmetic hierarchy for sets, and how it relates to the decidability of sets. There’s also a parallel notion of the arithmetic hierarchy for sentences of Peano arithmetic, and it relates to the difficulty of deciding the truth value of those sentences.

Truth value here and everywhere else in this post refers to truth value in the standard model of arithmetic. Truth value in the sense of “being true in all models of PA” is a much simpler matter; PA is recursively axiomatizable and first order logic is sound and complete, so any sentence that’s true in all models of PA can be eventually proven by a program that enumerates all the theorems of PA. So if a sentence is true in all models of PA, then there’s an algorithm that will tell you that in a finite amount of time (though it will run forever on an input that’s false in some models).

Not so for truth in the standard model! As we’ll see, whether a sentence evaluates to true in the standard model of arithmetic turns out to be much more difficult to determine in general. Only for the simplest sentences can you decide their truth value using an ordinary Turing machine. And the set of all sentences is in some sense infinitely uncomputable (you’ll see in a bit in what sense exactly this is).

What we’ll discuss is a way to convert sentences of Peano arithmetic to computer programs. Before diving into that, though, one note of caution is necessary: the arithmetic hierarchy for sentences is sometimes talked about purely syntactically (just by looking at the sentence as a string of symbols) and other times is talked about semantically (by looking at logically equivalent sentences). Here I will be primarily interested in the entirely-syntactic version of the arithmetic hierarchy. If you’ve only been introduced to the semantic version of the hierarchy, what you see here might differ a bit from what you recognize.

Let’s begin!

The simplest types of sentences have no quantifiers at all. For instance…

0 = 0
2 ⋅ 2 < 7
(2 + 2 = 4) → (2 ⋅ 2 = 4)

Each of these sentences can be translated into a program quite easily, since +, ⋅, =, and < are computable. We can translate the → in the third sentence by converting it into a conjunction:

## (2 + 2 = 4) → (2 ⋅ 2 = 4)
not(2 + 2 == 4 and not 2 * 2 == 4)

Slightly less simple-looking are sentences with bounded quantifiers:

∀x < 10 (x + 0 = x)
∃x < 100 (x + x = x)
∀x < 5 ∃y < 7 (x > 1 → x⋅y = 12)
∃x < 5 ∀y < x ∀z < y (y⋅z ≠ x)

In each of these examples, the bounded quantifier could in principle be expanded out, leaving us with a finite quantifier-free sentence. This should suggest to us that adding bounded quantifiers doesn’t actually increase the computational difficulty.

We can translate sentences with bounded quantifiers into programs by converting each bounded quantifier to a for loop. The translation slightly differently depending on whether the quantifier is universal or existential:

def Aupto(n, phi):
    for x in range(n):
        if not phi(x):
            return False
    return True
def Elessthan(n, phi):
    for x in range(n):
        if phi(x):
            return True
    return False

Note that the second input needs to be a function; reflecting that it’s a sentence with free variables. Now we can quite easily translate each of the examples, using lambda notation to more conveniently define the necessary functions

## ∀x<10 (x + 0 = x)
Aupto(10, lambda x: x + 0 == x)

## ∃x<100 (x + x = x)
Elessthan(100, lambda x: x + x == x)

## ∀x<5 ∃y<7 ((x > 1) → (x*y = 12))
Aupto(5, lambda x: Elessthan(7, lambda y: not (x > 1 and x * y != 12)))

## ∃x<5 ∀y<x ∀z<y (y⋅z ≠ x)
Elessthan(5, lambda x: Aupto(x, lambda y: Aupto(y, lambda z: y * z != x)))

Each of these programs, when run, determines whether or not the sentence is true. Hopefully it’s clear how we can translate any sentence with bounded quantifiers into a program of this form. And when we run the program, it will determine the truth value of the sentence in a finite amount of time.

So far, we’ve only talked about the simplest kinds of sentences, with no unbounded quantifiers. There are two names that both refer to this class: Π0 and Σ0. So now you know how to write a program that determines the truth value of any Σ00 sentence!

We now move up a level in the hierarchy, by adding unbounded quantifiers. These quantifiers must all appear out front and be the same type of quantifier (all universal or all existential).

Σ1 sentences: ∃x1 ∃x2 … ∃xk Phi(x1, x2, …, xk), where Phi is Π0.
Π1 sentences: ∀x1 ∀x2 … ∀xk Phi(x1, x2, …, xk), where Phi is Σ0.

Some examples of Σ1 sentences:

∃x ∃y (x⋅x = y)
∃x (x⋅x = 5)
∃x ∀y < x (x+y > x⋅y)

And some Π1 sentences:

∀x (x + 0 = x)
∀x ∀y (x + y < 10)
∀x ∃y < 10 (y⋅y + y = x)

We can translate unbounded quantifiers as while loops:

def A(phi):
    x = 0
    while True:
        if not phi(x):
            return False
        x += 1

def E(phi):
    x = 0
    while True:
        if phi(x):
            return True
        x += 1

There’s a radical change here from the bounded case, which is that these functions are no longer guaranteed to terminate. A(Φ) never returns True, and E(Φ) never returns False. This reflects the nature of unbounded quantifiers. An unbounded universal quantifier is claiming something to be true of all numbers, and thus there are infinitely many cases to be checked. Of course, the moment you find a case that fails, you can return False. But if the universally quantified statement is true of all numbers, then the function will have to keep searching through the numbers forever, hoping to find a counterexample. With an unbounded existential quantifier, all one needs to do is find a single example where the statement is true and then return True. But if there is no such example (i.e. if the statement is always false), then the program will have to search forever.

I encourage you to think about these functions for a few minutes until you’re satisfied that not only do they capture the unbounded universal and existential quantifiers, but that there’s no better way to define them.

Now we can quite easily translate our example sentences as programs:

## ∃x ∃y (x⋅x = y)
E(lambda x: E(lambda y: x * x == y))

## ∃x (x⋅x = 5)
E(lambda x: x * x == 5)

## ∃x ∀y < x (x+y > x⋅y)
E(lambda x: Aupto(x, lambda y: x + y > x * y))

## ∀x (x + 0 = x)
A(lambda x: x + 0 == x)

## ∀x ∀y (x + y < 10)
A(lambda x: A(lambda y: x + y < 10))

## ∀x ∃y < 10 (y⋅y + y = x)
A(lambda x: Elessthan(10, y * y + y == x))

The first is a true Σ1 sentence, so it terminates and returns True. The second is a false Σ1 sentence, so it runs forever. See if you can figure out if the third ever halts, and then run the program for yourself to see!

The fourth is a true Π1 sentence, which means that it will never halt (it will keep looking for a counterexample and failing to find one forever). The fifth is a false Π1 sentence, so it does halt at the first moment it finds a value of x and y whose sum is 10. And figure out the sixth for yourself!

The next level of the hierarchy involves alternating quantifiers.

Σ2 sentences: ∃x1 ∃x2 … ∃xk Φ(x1, x2, …, xk), where Φ is Π1.
Π2 sentences: ∀x1 ∀x2 … ∀xk Φ(x1, x2, …, xk), where Φ is Σ1.

So now we’re allowed sentences with a block of one type of unbounded quantifier followed by a block of the other type of unbounded quantifier, and ending with a Σ0 sentence. You might guess that the Python functions we’ve defined already are strong enough to handle this case (and indeed, all higher levels of the hierarchy), and you’re right. At least, partially. Try running some examples of Σ2 or Π2 sentences and see what happens. For example:

## ∀x ∃y (x > y)
A(lambda x: E(lambda y: x > y))

It runs forever! If we were to look into the structure of this program, we’d see that A(Φ) only halts if it finds a counterexample to Φ, and E(Φ) only halts if it finds an example of Φ. In other words A(E(Φ)) only halts if A finds out that E(Φ) is false; but E(Φ) never halts if it’s false! The two programs’ goals are diametrically opposed, and as such, brought together like this they never halt on any input.

The same goes for a sentence like ∃x ∀y (x > y): for this program to halt, it would require that ∀y (x > y) is found to be true for some value of x, But ∀y (x > y) will never be found true, because universally quantified sentences can only be found false! This has nothing to do with the (x > y) being quantified over, it’s entirely about the structure of the quantifiers.

No Turing machine can decide the truth values of Σ2 and Π2 sentences. There’s a caveat here, related to the semantic version of the arithmetic hierarchy. It’s often possible to take a Π2 sentence like ∀x ∃y (y + y = x) and convert it to a logically equivalent but Π1 sentence like ∀x ∃y<x (y + y = x). This translation works, because y + y = x is only going to be true if y is less than or equal to x. Now we have a false Π1 sentence rather than a false Π2 sentence, and as such we can find a counterexample and halt.

We can talk about a sentence’s essential level on the arithmetic hierarchy, which is the lowest level of the logically equivalent sentence. It’s important to note here that “logically equivalent sentence” is a cross-model notion: A and B are logically equivalent if and only if they have the same truth values in every model of PA, not just the standard model. The soundness and completeness of first order logic, and the recursive nature of the axioms of PA, tells us that the set of sentences that are logically equivalent to a given sentence of PA is recursively enumerable. So we can generate these sentences by searching for PA proofs of equivalence and keeping track of the lowest level of the arithmetic hierarchy attained so far.

Even when we do this, we will still find sentences that have no logical equivalents below Σ2 or Π2. These sentences are essentially uncomputable; not just uncomputable in virtue of their form, but truly uncomputable in all of their logical equivalents. However, while they are uncomputable, they would become computable if we had a stronger Turing machine. Let’s take another look at the last example:

## ∀x ∃y (x > y)
A(lambda x: E(lambda y: x > y))

Recall that the problem was that A(E(Φ)) only halts if E(Φ) returns False, and E(Φ) can only return True. But if we had a TM equipped with an oracle for the truth value of E(Φ) sentences, then maybe we could evaluate A(E(Φ))!

Let’s think about that for a minute more. What would an oracle for the truth value of Σ1 sentences be like? One thing that would work is if we could run E(Φ) “to infinity” and see if it ever finds an example, and if not, then return False. So perhaps an infinite-time Turing machine would do the trick. Another way would be if we could simply ask whether E(Φ) ever halts! If it does, then ∃y (x > y) must be true, and if not, then it must be false.

So a halting oracle suffices to decide the truth values of Σ1 sentences! Same for Π1 sentences: we just ask if A(Φ) ever halts and return False if so, and True otherwise.

If we run the above program on a Turing machine equipped with a halting oracle, what will we get? Now we can evaluate the inner existential quantifier for any given value of x. So in particular, for x = 0, we will find that Ey (x > y) is false. We’ve found a counterexample, so our program will terminate and return False.

On the other hand, if our sentence was true, then we would be faced with the familiar feature of universal quantifiers: we’d run forever looking for a counterexample and never find one. So to determine that this sentence is true, we’d need an oracle for the halting problem for this new more powerful Turing machine!

Here’s a summary of what we have so far:

TM = Ordinary Turing Machine
TM2 = TM + oracle for TM
TM3 = TM + oracle for TM2

The table shows what type of machine suffices to decide the truth value of a sentence, depending on where on the arithmetic hierarchy the sentence falls and whether the sentence is true or false.

We’re now ready to generalize. In general, Σn sentences start with a block of existential quantifiers, and then alternate between blocks of existential and universal quantifiers n – 1 times before ending in a Σ0 sentence. Πn sentences start with a block of universal quantifiers, alternates quantifiers n – 1 times, and then ends in a Σ0 sentence. And as you move up the arithmetic hierarchy, it requires more and more powerful halting oracles to decide whether sentences are true:

(TM = ordinary Turing machine, TMn+1 = TM + oracle for TMn)

If we define Σω to be the union of all the Σ classes in the hierarchy, and Πω the union of the Π classes, then deciding the truth value of Σω ⋃ Πω (the set of all arithmetic sentences) would require a TMω – a Turing machine with an oracle for TM, TM2, TM3, and so on. Thus the theory of true arithmetic (the set of all first-order sentences that are true of ℕ), is not only undecidable, it’s undecidable with a TM2, TM3, and TMn for every n ∈ ℕ. At every level of the arithmetic hierarchy, we get new sentences that are essentially on that level (not just sentences that are superficially on that level in light of their syntactic form, but sentences which, in their simplest possible logically equivalent form, lie on that level).

This gives some sense of just how hard math is. Just understanding the first-order truths of arithmetic requires an infinity of halting oracles, each more powerful than the last. And that says nothing about the second-order truths of arithmetic! That would require even stronger Turing machines than TMω – Turing machines that have halting oracles for TMω, and then TMs with oracles for that, and so on to unimaginable heights (just how high we must go is not currently known).

A Self-Interpreting Book

A concept: a book that starts by assuming the understanding of the reader and using concepts freely, and as you go on it introduces a simple formal procedure for defining words. As you proceed, more and more words are defined in terms of the basic formal procedure, so that halfway through, half of the words being used are formally defined, and by the end the entire thing is formally defined. Once you’re read through the whole book, you can start it over and read from the beginning with no problem.

I just finished a set theory textbook that felt kind of like that. It started with the extremely sparse language of ZFC: first-order logic with a single non-logical symbol, ∈. So the alphabet of the formal language consisted of the following symbols: ∈ ( ) ∧ ∨ ¬ → ↔ ∀ ∃ x ‘. It could have even started with a sparser formal language if it was optimizing for alphabet economy: ∈ ( ∧ ¬ ∀ x ‘ would suffice. As time passed and you got through more of the book, more and more things were defined in terms of the alphabet of ZFC: subsets, ordered pairs, functions from one set to another, transitivity, partial orders, finiteness, natural numbers, order types, induction, recursion, countability, real numbers, and limits. By the last chapter it was breathtaking to read a sentence filled with complex concepts and realize that every single one of these concepts was ultimately grounded in this super simple formal language we started with, with a finitistic sound and complete system of rules for how to use each one.

But could it be possible to really fully define ALL the terms used by the end of the book? And even if it were, could the book be written in such a way as to allow an alien that begins understanding nothing of your language to read it and, by the end, understand everything in the book? Even worse, what if the alien not only understands nothing of your language, but starts understanding nothing of the concepts involved? This might be a nonsensical notion; an alien that can read a book and do any level of sophisticated reasoning but doesn’t understand concepts like “and” and “or“.

One way that language is learned is by “pointing”: somebody asks me what a tree is, so I point to some examples of trees and some examples of non-trees, clarifying which is and which is not. It would be helpful if in this book we could point to simple concepts by means of interactive programs. So, for instance, an e-book where an alien reading the book encounters some exceedingly simple programs that they can experiment with, putting in inputs and seeing what results. So for instance, we might have a program that takes as input either 00, 01, 10, or 11, and outputs the ∧ operation applied to the two input digits. Nothing else would be allowed as inputs, so after playing with the program for a little bit you learn everything that it can do.

One feature of such a book would be that it would probably use nothing above first-order logical concepts. The reason is that the semantics of second-order logic cannot be captured by any sound and complete proof system, meaning that there’s no finitistic set of rules one could explain to an alien so that they know how to use the concepts involved correctly. Worse, the set of second-order tautologies is not even recursively enumerable (worse than the set of first-order tautologies, which is merely undecidable), so no amount of pointing-to-programs would suffice. First-order ZFC can define a lot, but can it define enough to write a book on what it can define?

The subtlety of Gödel’s second incompleteness theorem

Gödel’s second incompleteness theorem is an order of magnitude more subtle than his first. It’s commonly summarized as “no consistent theory strong enough to do arithmetic can prove its own consistency.” But there’s a lot of subtlety in both the “strong enough to do arithmetic” and the “prove its own consistency.” First of all, what exactly counts as strong enough to do arithmetic? Peano arithmetic certainly does, but it has an infinite axiom schema. Do any finite theories meet the criterion of “strong enough to do arithmetic”? It urns out that the answer to this is yes! Robinson arithmetic, which is what you get if you remove the infinite axiom schema of induction from PA, meets the requirement.

There are weaker theories of arithmetic, like Presburger arithmetic (which has only addition) and Skolem arithmetic (only multiplication), that don’t meet the criterion, and are therefore not subject to the incompleteness theorems. And it turns out that both of these theories are actually decidable! This is even stronger than being sound and complete; soundness and completeness tell us that there’s an algorithm that determines after a finite amount of time that a given sentence is a tautology, but not necessarily that such an algorithm exists to determine that a sentence is NOT a tautology. Decidability gives us both: not only can we classify any tautology as such in a finite time, we can also classify any non-tautology as such in a finite time.)

The amount of arithmetic we need is exactly the amount that Gödel uses to prove the incompleteness theorems. This has two parts: one, the theory must express enough arithmetic to do Gödel encoding (and therefore to express the notion of “provability”), and two, the theory must be able to formalize diagonalization. Dan Willard came up with theories that formalize enough arithmetic to do the first but not the second: these are theories that can talk about their own provability via Gödel coding, but are still too weak to be subject to the incompleteness theorems. Thus, these theories can actually prove their own consistency! These fascinating theories are called self-verifying theories.

Everything I’ve said so far has been about the subtlety of the notion of “strong enough to do arithmetic”. Now I want to talk about the subtlety of the notion of “proving one’s own consistency.” I’ll do this by first taking a brief interlude to talk about an argument I recently saw against the Consistency Thesis in philosophy of math that uses this notion. A friend of mine writes:

Consistency is a weak soundness property.

Here’s what I’ll call the Consistency Thesis: (Mathematical sentences are justified and/or true when they are part of a consistent theory.) What I want to do is to raise a problem for that thesis. Maintaining the Consistency Thesis leads to bizarre mathematical beliefs which don’t lend themselves to being true or justified. A merely ‘consistent’ theory can claim P(0), P(1), and so on, and yet claim that there’s some n for which ¬P(n). Such a theory is omega-inconsistent.

There are mathematical theories which despite their syntactic consistency, claim to be inconsistent. They will claim to be able to derive that 0=1 and yet never derive 0=1. One example of a consistent but unsound theory would be this: [suppose we add to ZFC the new axiom “ZFC is inconsistent”. If ZFC is consistent, then the new theory ZFC’ is consistent (by Godel’s second incompleteness theorem), even though it falsely proves that ZFC is inconsistent.]

So far, I’ve only mentioned omega-inconsistent theories. I should also mention that there are omega-consistent theories which are arithmetically unsound. A much stronger soundness property would be soundness in omega-logic, which entails consistency, omega-consistency, and arithmetical soundness.

Here’s the response I gave:

You say that an adherent to the Consistency Thesis believes in mathematical theories that are in fact consistent (no contradictions can be proved from them) but claim to be inconsistent; for instance ZFC + ¬Con(ZFC). This bit about “claiming to be inconsistent” is subtler than it might initially seem. There’s a very important difference between Con(ZFC) and “ZFC is consistent”. A first order theory can’t talk directly about its own consistency, it can only talk about properties of the objects in its models. We are allowed an indirect method to talk about consistency of theories, Gödel encoding. But this method has problems.

Gödel encoding allows us to write down statements that, if understood to be about the natural numbers, are equivalent to the assertion that a theory proves a contradiction. But this “if understood to be about the natural numbers” is a very important qualification, because no first order theory categorically defines the natural numbers (i.e. no first order theory has as its only model the natural numbers). More generally, no theory within a logic that has a sound, complete, and finitary proof system categorically describes the natural numbers (these are the only logics that a Formalist will see as well-defined, by the way).

What this means is that when we write “Con(ZFC)”, we’re actually using a short-hand for a complicated sentence about the objects in our models, and this complicated sentence is NOT equivalent to the claim that no contradiction can be proven from ZFC. Con(ZFC) could be false in a model even if ZFC is consistent, and Con(ZFC) could be true in a model even if ZFC is inconsistent, so long as that model is not the standard natural numbers.

So the adherent of the Consistency Thesis is not actually committed to weird beliefs about a theory being consistent and claiming its own consistency; they are just committed to the belief that the natural numbers are not well-defined.

The same objection applies to the claim that they have to accept as valid theories that claim P(0), P(1), and so on, but also that there’s some n for which ¬P(n). That’s true, and that’s fine! One can just say that such theories are not theories of the standard natural numbers; the n for which ¬P(n) is some other type of mathematical object that is not a natural number.

A TL;DR for my response: “Con(T)” only means “T is consistent” if T is about the natural numbers. Furthermore, the theories that assert their own inconsistency never have the natural numbers as a model. So it’s ultimately not very weird that these theories assert “¬Con(T)”… this statement doesn’t actually mean “T is inconsistent” in any of the models of the theory!

The Anti-Set Program

In ZFC set theory, we specify a collection of sentences within a first-order language to count as our axioms. The models of this collection of sentences are the set-theoretic universes (and many of these models are “unintended” – pesky perversions in which the set of naturals ω is uncountably large, as one example – but we’ll put this aside for this post). Most of the axioms act as constraints that all sets must follow. For instance, the axiom of pairing says that “For any sets x and y, there must exist another set containing as elements just x and y and nothing else”. This is an axiom that begins with universal quantification over the sets of the universe, and then states some requirement that must hold of all these sets.

The anti-set program is what you get when you take each of these restriction axioms, and negate the restriction it imposes on all sets. So, for instance, the axiom of anti-pairing says that for any sets x and y, there must NOT exist any set {x, y}. Contrast this with the simple negation of pairing, which would tell us only that there exist two sets x and y such that their pair doesn’t exist. The anti-axiom is much stronger than the negated axiom, in that it requires NO pairs to exist.

Not all axioms begin with universal quantifiers, in particular the axiom of infinity, which simply asserts that a set exists that satisfies a certain property. To form the axiom of anti-infinity, we simply negate the original axiom (so that no sets with that property exist).

As it turns out, the anti-set program, if applied to ALL the axioms of ZFC, ends in disaster and paradox. In particular, a contradiction can be derived from anti-comprehension, from anti-replacement, and from anti-extensionality. We don’t handle these cases the same way. Anti-comprehension and anti-replacement are simply discarded, being too difficult to patch. By contrast, anti-extensionality is replaced by ordinary extensionality. What’s up with that? The philosophical justification is simply that extensionality, being the most a priori of the bunch, is needed to justify us calling the objects in our universe sets at all.

There’s one last consideration we must address, which regards the axiom of anti-choice. I am currently uncertain as to whether adding this axiom makes the theory inconsistent. One thing that is currently known about the axiom of anti-choice is that with its addition, out go all finite models (there’s a really pretty proof of this that I won’t include here). In the rest of this post, I will be excluding anti-choice from the axioms and only exploring models of anti-ZF.

With that background behind us, let me list the axioms of anti-ZF.

Anti-Pairing
∀x∀y∀z∃w (w ∈ z ∧ w ≠ x ∧ w ≠ y)
No set is the pair of two others.

Anti-Union
∀x∀y∃z (z ∈ y ∧ ¬∃w (z ∈ w ∧ w ∈ x))
No set is the union of another.

Anti-Powerset
∀x∀y∃z (z ∈ y ∧ ∃w (w ∈ z ∧ w ∉ x))
No set contains all existing subsets of another.

Anti-Foundation
∀x∀y (y ∈ x → ∃z (z ∈ x ∧ z ∈ y))
Every set’s members have at least one element in common with it.

Anti-Infinity
∀x∃y ((y is empty ∧ y ∉ x) ∨ (y ∈ x ∧ ∃z (z = S(y) ∧ z ∉ x)))
Every set either doesn’t contain all empty sets, or has an element whose successor is outside the set.

If the axioms of ZF are considered to be a maximally nice setting for mathematics, then perhaps the axioms of anti-ZF can be considered to be maximally bad for mathematics.

We need to now address some issues involving the axiom of anti-infinity. First, the abbreviations used: “y is empty” is shorthand for “∀z (z ∉ y)” and “z = S(y)” is shorthand for “∀w (w ∈ z ↔ (w ∈ y ∨ w = y))” (i.e. z = y ⋃ {y}).

Second, it turns out that from the other axioms we can prove that there are no empty sets. So the first part of the disjunction is always false, meaning that the second part must always be true. Thus we can simplify the axiom of anti-infinity to the following statement, which is logically equivalent in the context of the other axioms:

Anti-Infinity
∀x∃y (y ∈ x ∧ ∃z (z = S(y) ∧ z ∉ x))
Every set has an element whose successor is outside the set.

Let’s now prove some elementary consequences of these axioms.

Theorems

>> There are no empty sets.
Anti-Union ⊢ ¬∃x∀y (y ∉ x)
Suppose ∃x∀y (y ∉ x). Call this set ∅. So ∀y (y ∉ ∅) Then ⋃∅ = ∅. But this implies that the union of ∅ exists. This contradicts anti-union.

>> There are no one-element sets.
Anti-Pairing ⊢ ¬∃x∃y∀z (z ∈ x ↔ z = y)
Suppose that ∃x∃y∀z (z ∈ x ↔ z = y). Then ∃x∃y∀z (z ∈ x ↔ (z = y ∨ z = y)). But this is a violation of anti-pairing, as then x would be the pair of y and y. Contradiction.

>> There are no two-element sets.
Anti-Pairing ⊢ ¬∃x∃y∃z∀w (w ∈ x ↔ (w = y ∨ w = z))
Suppose that ∃x∃y∃z∀w (w ∈ x ↔ (w = y ∨ w = z)). Then w is the pair of y and z, so anti-pairing is violated. Contradiction.

>> No models of anti-ZF have just N-element sets (for any finite N).
Suppose that a model of anti-ZF had only N-element sets. Take any of these sets and call it X. By anti-infinity, X must contain an element with a successor that is outside the set. Call this element Y and its successor S(Y). Y cannot be its own successor, as then its successor would be inside the set. This means that Y ∉ Y. Also, Y is an N-element set by assumption. But since Y ≠ S(Y), S(Y) must contain all the elements of Y in addition to Y itself. So S(Y) contains N+1 elements. Contradiction.

>> There is no set of all sets.
Suppose that there is such a set, and call it X. By Anti-Infinity, X must contain an element Y with a successor that’s outside of X. But no set is outside of X, by assumption! Contradiction.

>> No N-element model of anti-ZF can have N-1 sets that each contain N-1 elements.
Suppose that this were true. Consider any of these sets that contain N-1 elements, and call it X. By the same argument made two above, this set must contain an element whose successor contains N elements. But there are only N sets in the model, so this is a set of all sets! But we already know that no such set can exist. Contradiction.

>> Every finite set of disjoint sets must be its own choice function.
By a choice function for a set X, I mean a set that contains exactly one element in common with each element of X. Suppose that X is a finite set of disjoint sets. Let’s give the elements of X names: A1, A2, A3, …, AN. By anti-foundation, each An must contain at least one element of X. Since the Ans are disjoint, these elements cannot be the same. Also, since there are only N elements of X, no An can contain more than one element of X. So each element of X contains exactly one element of X. Thus X is a choice function for X!

The next results come from a program I wrote that finds models of anti-ZF of a given size.

>> There are no models of size one, two, three or four.

>> There are exactly two non-isomorphic models of size five.

Here are pictures of the two:

Now, some conjectures! I’m pretty sure of each of these, especially Conjecture 2, but haven’t been able to prove them.

Conjecture 1: There’s always a set that contains itself.

Conjecture 2: There can be no sets of disjoint sets.

Conjecture 3: In an N-element model, there are never less than three sets with fewer than N-1 elements.

Fundamentals of Logic: Syntax, Semantics, and Proof

A while back, I wrote a several-part introduction to mathematical logic. I wrote this when I was first wading into the subject and noticed that a lot of the stuff I was learning was not easy to find online in a concise beginner-level essay. Since then, I’ve learned an enormous amount more and become a lot more comfortable with the basics of the field. Looking back on my introduction series, I’m pretty happy with it. There’s nothing in there that’s incorrect. But I do notice that when I read it over, I wish that I had framed certain things slightly differently. In particular, one of the distinctions that I think is absolutely central to mathematical logic – the distinction between syntax, semantics, and proof – is muddled a bit in the write-up, and proof is bundled in with syntax in a way that I’m not totally satisfied with.

Given that, and given that this series is for whatever reason now one of the most popular reads on my blog, I want to provide a brief update post. I sort of did this a week or so ago, with this post, which linked to the slides for a presentation I gave for a discussion group. But I want to do it for REAL now, as an actual post of its own. And honestly, I also just love this subject and am happy for any excuse to write about it more.

So, let me begin by laying out what seems to me to be the primary trichotomy that you’ll see pop up over and over again in logic: syntax, semantics, and proof. One at a time, then.

Syntax

The syntax of a logic is a detailing of the allowed symbols of a language and the conditions for their grammatical use. It’s a set of characters, along with a function that defines the grammar of the language by taking in (finite, in most logics) strings of these characters and returning true or false. This function is usually inductively defined. So for instance in propositional logic we have some set of characters designated to be our propositional variables (we’ll denote them p, q, r, …), and another set of characters designated to be our logical symbols (∧, ∨, ¬, →, and parentheses). We inductively define our grammar function G as follows: Every propositional variable is grammatical. If X and Y are grammatical strings, then so are (¬X), (X ∧ Y), (X ∨ Y), and (X → Y). And no other string is grammatical. This definition is what allows us to say that (p ∧ (¬(q ∨ r))) is grammatical but (p ∧ (¬(q ∨ ¬))) is not. The grammar for propositional logic is especially nice and simple, and first-order logic is only mildly more complicated. The important thing is that the syntax for these logics as well as higher order logics is algorithmically checkable; it’s possible to write a simple program that verifies whether any given input string counts as grammatical or not.

Semantics

The semantics of a logic is a system for assigning some sort of meaning to the grammatical strings of the language. There are different types of meanings that might be assigned, but the primary one for classical logic is truth and falsity. For propositional logic, our semantics is a set of truth functions, which are functions that take in grammatical strings and return either true or false (you’ve encountered these before if you’ve seen truth tables; each row in a truth table corresponds to a particular truth function). Not just any such function will do; these functions will have to satisfy certain constraints, such as that whatever a truth function assigns to a string X, it must assign the opposite value to the string (¬X), and that (A ∧ B) is assigned true if and only if both A and B are assigned true. These constraints on our functions are really what endow our strings with meaning; they induce a sort of structure on the truth values of strings that lines up with our intended interpretation of them.

For propositional logic, we give an inductive definition for our collection of valid truth functions. We construct a truth function by first doing any assignment of truth values to the propositional variables (p, q, r, and so on), and then defining what the function does to the rest of the strings in terms of these truth values. So for any strings X and Y, the truth function assigns true to (¬X) if and only if it assigns false to X. It assigns true to (X ∧ Y) if and only if it assigns true to X and true to Y. And so on. By our construction of the grammar of propositional logic, we’ve guaranteed that our function is defined on all grammatical strings. Of course, there are many such truth functions (one for every way of assigning truth values to the propositional variables) – and this collection of truth functions is what defines our semantics for propositional logic.

First order logic is again more complicated than propositional logic in its semantics, but not hugely so. When talking about the semantics of first-order logic, we call its truth functions models (or structures), and each model comes with a universe of objects that the quantifiers of the logic range over. An important difference between the semantics of propositional logic and the semantics of first order logic is that in the former, we can construct an algorithm that directly searches through the possible truth functions. In the latter, getting a similar sort of direct access to the models is much more difficult. Consider, for instance, that some models will have universes with infinitely many elements, meaning that to verify the truth of a universally quantified statement requires verifying it for infinitely many elements.

Last thing I need to say about semantics before moving on to proof: there is the all-important concept of semantic entailment. A set of sentences (call it A) is said to semantically entail another sentence (call it X), if every truth function that assigns everything in A true also assigns X true. When this is the case, we write: A ⊨ X.

Examples
{(p ∧ q)} ⊨ p
{(p ∨ q), (¬q)} ⊨ p
{p, (p→q), (q→r)} ⊨ r

Proof

Notice that we defined our semantics using complicated words like “functions” and “inductive definition”. More troubling, perhaps, we seem to have engaged in a little bit of circularity when we said in our definition of the semantics that (A ∧ B) is assigned true only when A and B are both assigned true. Suppose that some pesky philosopher were to come up to us and ask us what we meant by “only when A and B are both assigned true”. What would we have to say? We couldn’t refer to our formal definition of the semantics of ∧ to ground our meaning; as this definition uses the concept of “and” in it explicitly! The circularity is even more embarrassingly vivid when in first order logic we’re asked to define the semantics of “∀”, and we say “∀x φ(x) is true whenever φ(x) is true of every object in our model”. In other words, there’s a certain sense in which our semantics is a little hand-wavy and non-rigorous. It relies on the reader already having an understanding of mathematical concepts that are usually much more advanced than the ones being defined, and indeed implicitly relies on the reader’s understanding of the ones being defined in the first place!

With this criticism of semantics fresh in our heads, we might want to ask for a purely mechanical system that would take in some starting strings and churn out new strings that follow from them according to the semantics. Such a system wouldn’t rely on the user’s understanding of any higher-order concepts; instead a mere understanding of the concept of string manipulations would suffice. That is, we want to rigorously define such a procedure so that the results of the procedure mirror the structure that we know the semantics has, but without relying on any high-level concepts or circularity. We can instead just point to the mechanical process by which strings of these symbols are manipulated and be content with that. A proof system is just such a mechanical system.

There are multiple styles of proof systems, varying primarily with respect to how many inference rules and axioms they have. On the side of many axioms and few inference rules we have Hilbert-style systems. And on the side of many inference rules and few (sometimes zero) axioms we have Gentzen-style natural deduction systems. But hold on, what are these “axioms” and “inference rules” I’m suddenly bringing up? Axioms are simply strings that can be used in a proof at any time whatsoever. Inference rules are functions that take in some set of strings and produce new ones. A proof is just an ordered list of strings, where each is either an axiom or a result of applying some inference rule to earlier strings.

For propositional logic, one common proof system involves just three axioms and one inference rule. The three: X → (Y → X), (X → (Y → Z)) → ((X → Y) → (X → Z)), and ((¬Y) → (¬X)) → (X → Y). The inference rule is the famous modus ponens, which takes the form of a function that takes as input X and X → Y, and produces as output Y.

(The discerning amongst you might notice that the “three axioms” are actually three infinite axiom schemas; for each you can populate X, Y, and Z with any three grammatical strings you like and you get a valid instance of the axioms. The even more discerning might notice that these axioms only involve the → and ¬ symbols, but we seem to be missing ∧ and ∨. That’s correct; we neglect them for economy of our proof system and are allowed to do so because all sentences with ∧ and ∨ can be translated into sentences using just → and ¬. To expand our proof system to deal with ∧ and ∨, we simply add a few more axioms to incorporate these translations.)

Soundness and Completeness

I’ve said that the purpose of a proof system is to mimic the semantics of a logic. I will now say more precisely what this mimicry amounts to.

There is a close cousin to the semantic-entailment relation (⊨). This is known as the syntactic entailment relation, and is written ⊢. We say that a set of sentences A syntactically entails a sentence X if there exists some proof that uses the sentences in A as assumptions and concludes with the sentence X. So A ⊢ X can be thought of as “X can be proven, provided that we assume A.”

To say that a proof system for a logic mimics the semantics of the logic is to say that the syntactic entailments and semantic entailments line up perfectly. This has two components:

Soundness: If A ⊢ X, then A ⊨ X.
Completeness: If A ⊨ X, then A ⊢ X.

There’s one other feature of a proof system that’s totally essential if it’s to do the job we want it to do; namely, that it should be actually computable. We should be able to write an algorithm that verifies whether any given sequence of strings counts as a valid proof according to the proof system. This is known as effective enumerability.

To have a sound, complete, and effectively enumerable proof system for a logic is to know that our semantics is on solid ground; that we’re not just waving our hands around and saying things that cannot be cashed out in any concrete way, but that in fact everything we are saying can be grounded in some algorithm. There’s a general philosophy in play here, something like “if I can compute it then it’s real.” We can rest a lot more comfortably if we know that our logic is computable in this sense; that our deductive reasoning can be reduced to a set of clear and precise rules that fully capture its structure.

Perhaps predictably, our next question is whether we really do have a sound, complete, and effectively enumerable proof system for a logic. It turns out that the answer depends on which logic is being discussed. This is already perhaps a little disconcerting; we might have hoped that every logic of any use can be nailed down concretely with a proof system. But alas, it’s not the case.

With that said, there do exist sound, complete, and effectively enumerable proof systems for propositional logic. Maybe that’s not so exciting, since propositional logic is pretty limited in terms of what sort of math you can do with it. But it turns out that sound complete and effective proof systems ALSO exist for first-order logic! And first-order logic is where we form theories like Peano arithmetic (for natural number arithmetic) and ZFC (for set theory).

In other words, there’s a logic within which one can talk about things like numbers and sets, and the semantics of this logic can be made precise with a proof system. That’s great news! Math is on sound grounds, right? Well…

You might be skeptical right now, the phrase “incompleteness theorems” bubbling into your mind, and you’re right to be. While we can form languages in first order logic whose sentences look to be describing natural numbers (in PA) or sets (in ZFC), it turns out that none of these languages have the expressive power to actually pin down these structures uniquely. Said more carefully, there is no first-order collection of sentences which has as a unique model the natural numbers. There’s a theorem called the Löwenheim-Skolem Theorem, which is one of the most shocking results in the field of mathematical logic. It says that if a first-order theory has any infinite model, then it has infinite models of every cardinality. So if we want to describe the natural numbers, we have to also be simultaneously describing structures of every cardinality. At this point, this is getting into well-covered territory for this blog, so I’ll just direct you to these posts for more detail about how this works: here, here, and here.

So, in first order logic you have a proof system that perfectly mirrors your semantics but no theory whose semantics line up with the natural numbers. There are many ways we can strengthen our semantics to give us the strength to talk about the natural numbers categorically, the most popular of which is second-order logic. In second order logic, we are allowed to quantify not only over objects in our universe, but over sets of objects as well. This turns out to be a really big deal. In second-order logic, we can actually write down a set of sentences that uniquely pick out the natural numbers as their only model. But guess what? Second-order logic has no sound and complete proof system!

This is a pattern that holds very generally. Either we have a logic with the expressive power to talk categorically about the natural numbers, in which case we have no sound and complete proof system, OR we have a logic with a sound and complete proof system, in which case we lack the ability to talk categorically about the natural numbers. No matter how we go, we end up at a loss as to how to ground the semantics of ℕ in an algorithmic procedure. And as a result, we also end up unable to have any algorithmic procedure that enumerates all the truths about natural numbers. This is what has become known as the incompleteness phenomenon.

I’ll stop there for now! Here’s a brief summary of the main concepts I’ve discussed. A logic defined by its syntax, semantics, and proof system. The syntax details the allowed symbols of the language and which combinations are grammatical. The semantics details the possible truth assignments to these sentences, consistent with our intended interpretation of the symbols. And the proof system provides a mechanical procedure for deriving new strings from old strings. A set of strings X semantically entails a string Y if every truth assignment that satisfies all the strings in X also satisfies Y. A set of strings X syntactically entails Y if there’s some proof of Y from the assumptions of X. A proof system is said to be sound and complete if its syntactic entailment relation perfectly mirrors the semantic entailment relation. And it’s said to be effectively enumerable if there’s an algorithm for determining whether a given sequence of strings counts as a proof or not. First-order logic has a sound, complete, and effectively enumerable proof system. But it lacks the expressive power to talk categorically about the natural numbers. No set of sentences in a first-order language has the natural numbers as a unique model. Second-order logic has the expressive power to talk categorically about the natural numbers, but it has no sound and complete proof system. In general, no logic has both (1) a sound, complete, and effectively enumerable proof system, and (2) theories that are categorical for the natural numbers.

The Transfinite Factorial

Chapter 1: When Circularity is not Paradoxical

Circular definitions can be quite bad. Take the definition “a is the natural number that is equal to a”, or worse, “a is the natural number that is NOT equal to a”. Or, for an even more perverse one, “a+1, where a is the smallest number that can be defined in fewer than 50 words.” But just as not all self-reference is paradoxical, similarly not all circularity is vicious. Circular definitions form a crucial component of every area of math, where they go under the more polite term “recursion.”

The key component of a benign circular definition is that it avoids infinite regress. So “a=a” is unhelpful, because to evaluate a you must first know what a evaluates to. But what about “an = n ⋅ an-1, and a0 = 1″? Now to know a500, you only need to know a499, which in turn only requires you know a498, which requires… and so on, each step moving a little closer to the base case “a0 = 1″, at which point you can travel all the way back up the chain to discover that a500 = 500!. (That’s 500 factorial, not an excitable 500.)

Douglas Hofstadter has a parable about the “porpuquine”, a rare cousin of the porcupine that grows smaller porpuquines on its back rather than quills. The market value of such a beast is determined by the total number of porpuquine noses you can find on it. And if you’re worried about an infinite regress, don’t fear! There’s a smallest porpuquine, which is entirely bald.

Recursive definitions are extremely nice when we want to define a function whose outputs are related to each other in a more simple way than they are to the inputs. Some examples (in which S stands for the successor function, which takes 0 to 1 to 2 to …):

Addition
∀x (x + 0 = x)
∀x∀y (x + S(y) = S(x+y))

Multiplication
∀x (x ⋅ 0 = 0)
∀x∀y (x ⋅ S(y) = x⋅y + x)

Notice the similarity between these two definitions. In each case we define what happens when we add 0 to a number, and then we define what happens when we add a successor to a number. This covers all the natural numbers! Let’s show this in action by proving that 9+3 = 12.

9 + 3
9 + S(2)
S(9 + 2)
S(9 + S(1))
S(S(9 + 1))
S(S(9 + S(0)))
S(S(S(9 + 0)))
S(S(S(9)))
S(S(10))
S(11)
12

The colorful step is where we hit our base case (evaluating x+0), at which point we need merely to pass our answers up the levels to get back to the solution of our original problem.

Another: Let’s show that 3⋅2 = 6.

3⋅2
3⋅S(1)
3⋅1 + 3
3⋅S(0) + 3
(3⋅0 + 3) + 3
(0 + 3) + 3
(0 + S(2)) + 3
S(0 + 2) + 3
S(0 + S(1)) + 3
S(S(0 + 1)) + 3
S(S(0 + S(0))) + 3
S(S(S(0 + 0))) + 3
S(S(S(0))) + 3
S(S(1)) + 3
S(2) + 3
3 + 3
3 + S(2)
S(3 + 2)
S(3 + S(1))
S(S(3 + 1))
S(S(3 + S(0)))
S(S(S(3 + 0)))
S(S(S(3)))
S(S(4))
S(5)
6

Each green line is where we use a “base case rule” – the first is the base case for multiplication, the second and third for addition. I like that you can see the complexity growing in the length of the line until you hit the base case, at which point the lines start shortening until we next need to apply the recursive step.

Chapter 2: Infinite Recursion

Now what happens if we want to add x + y but y is neither 0 nor a successor. “Huh? Isn’t every non-zero number a successor of some number?” I hear you ask. Not infinity! Or more precisely, not ω. Our recursive definitional scheme is all well and good for finite numbers, but it fails when we enter the realm of transfinite ordinals.

No problem! There’s an easy patch. The problem is that we don’t know how to add or multiply x and y when y is a limit ordinal. So we just add on an extra rule to cover this case!

Addition
α + 0 = α
α + S(β) = S(α + β)
α + λ = U{α + β | β ∈ λ}

Multiplication
α ⋅ 0 = 0
α ⋅ S(β) = α⋅β + α
α + λ = U{α ⋅ β | β ∈ λ}

If you’re scratching your head right now at the notion of “the union of a set of numbers” or “a number being an element of another number”, go read this for a thorough summary. In short, in ZFC we encode each number as the set of all smaller numbers (e.g. 3 = {0, 1, 2}), and ω (the first infinite number) is defined to be the set of all natural numbers (ω = {0, 1, 2, 3, …}). So 3 ⋃ 5 = {0,1,2} ⋃ {0,1,2,3,4} = {0,1,2,3,4} = 5. In general, the union of a set of numbers is just the maximum number among them (or if there is no maximum number, then it’s the supremum of all the numbers).

Outside the context of ZFC, we can equally write α + λ = lim (α + β) as β → λ. We can also write α + λ = sup {α + β | β < λ}. And for those more familiar with ZFC, we don’t need to take the union over all β ∈ λ, we can make do with any cofinal subset (we’ll do this to simplify calculations greatly later in the post).

Now we’ve extended + and × into the infinite, and what we get is a little unusual. The first major sign that things have changed is that addition and multiplication are no longer commutative.

1 + ω = ⋃{1 + n: n ∈ ω} = U{1, 2, 3, 4, …} = ω
ω + 1 = ω + S(0) = S(ω + 0) = S(ω)

ω certainly doesn’t equal S(ω), so ω + 1 ≠ 1 + ω.

Distributivity also fails! For instance (ω + 3) ⋅ 2 = ω⋅2 + 3, not ω⋅2 + 6. But we don’t lose every nice algebraic feature. Associativity still holds for both addition and multiplication! I.e. α + (β + γ) = (α + β) + γ and α ⋅ (β ⋅ γ) = (α ⋅ β) ⋅ γ. This result is really important.

Earlier we showed that 1 + ω = ω. One can easily show that in general, n + ω = ω for any finite n, and the same holds for α + β is “infinitely smaller” than β. This notion of “infinitely smaller” is a little subtle; ω is not infinitely smaller than ω⋅2, but it is infinitely smaller than ω2. (It should be noted that everything up to here has been standard textbook stuff, but from here on I will be introducing original notions that I haven’t seen elsewhere. So treat what follows this with skepticism.)

A useful definition for us will be α << β (read as “α is infinitely less than β”) if α + β = β. We’ll also say that α and β are “comparable” if neither is infinitely smaller than the other: α ~ β if ¬(α << β) and ¬(β << α). Some conjectures I have about the comparability relation:

Conjecture 1: Commutativity holds between comparable ordinals.

Conjecture 2: α ~ β if and only if there’s some finite n such that α < β⋅n and β < α⋅n.

I’m actually fairly confident that Conjecture 1 is false, and would love to see a counterexample.

Here are some more useful identities for transfinite addition and multiplication:

Theorem 1: For finite n and limit ordinal λ, α⋅(λ + n) = α⋅λ + α⋅n

Proof sketch: α⋅(λ + n) = α⋅(λ + (n-1)) + α = α⋅(λ + (n-2)) + α + α = … = α⋅(λ + 0) + α + α + … + α = α⋅λ + α⋅n.

Theorem 2: For finite n and β << α, (α + β)⋅n = α⋅n + β.

Proof sketch: (α + β)⋅n = (α + β)⋅(n-1) + (α + β) = (α + β)⋅(n-2) + (α + β) + (α + β) = … = (α + β)⋅0 + (α + β) + … + (α + β) = (α + β) + (α + β) + … + (α + β) + (α + β) = α + (β + α) + (β + … + α) + β = α + α + α + … + α + β = α⋅n + β.

In case it’s not clear what’s going with this proof, the idea is that we expand (α + β)⋅n out to (α + β) + … + (α + β) (n times). Then we shift parentheses to enclose all the middle βs, making them look like (β + α). But then by assumption, since β << α, the β vanishes in each of these middle terms, leaving just the final one: α + α + α + … + α + β = α⋅n + β.

Infinite exponentiation, tetration and so on can be defined in the exact same way as addition and multiplication: first cover the 0 case, then the successor case, and finally the limit case. And the limit case is defined by just taking the union of all smaller cases.

Chapter 3: Transfinite Factorials

Now, what’s the first function you think of when you think of recursively defined functions? That’s right, the factorial! I mentioned it at the beginning of this post, long ago: n! = n⋅(n-1)! and 0! = 1. So let’s extend this definition to obtain a transfinite factorial!

Factorial
0! = 1
S(α)! = α! ⋅ S(α)
λ! = U{β! | β ∈ λ}

Now we can ask ω! is.

Theorem: ω! = ω

Proof: ω! = U{n! | n ∈ ω} = U{1, 2, 6, 24, 120, …} = ω (since {1, 2, 6, 24, 120, …} is a cofinal subset of ω).

Next, (ω+1)! = ω! ⋅ (ω + 1) = ω ⋅ (ω + 1) = ω⋅ω + ω⋅1 = ω2 + ω. Note that to prove this, we’ve used Theorem 1 from before.

And we can keep going from there.

Theorem: (ω + n)! = ωn+1 + ωn⋅n + ωn-1⋅(n – 1) + … + ω2⋅2 + ω.

Try to prove this for yourself!

Now that we know what (ω + n)! is for each finite n, we are able to calculate what (ω⋅2)! is (using the fact that {ω + n | n ∈ ω} is a cofinal subset of ω⋅2). I’ll write its value, as well as the next few interesting ordinals:

(ω⋅2)! = U{ω!, (ω+1)!, (ω+2)!, …} = ωω
(ω⋅3)! = ωω⋅2
(ω⋅n)! = ωω⋅(n-1)

Using the same trick as before, this enables us to solve (ω2)!

2)! = U{ω!, (ω⋅2)!, (ω⋅3)!, …} = U{ω, ωω, ωω⋅2, ωω⋅3, …} = ωω2

From here, things suddenly take on a remarkable regularity (for a while at least). Looking at limit ordinals above ω2, they appear to have the following structure, at least up until we arrive at the first ordinal where standard notation fails us: ωωω = ε0.

Conjecture: For λ a limit ordinal such that ω2 ≤ λ < ε0, λ! = ωλ

If this conjecture is right, then (ωω)! = ωωω, and (ωωω)! = ωωωω, and so on. We can also write this using tetration notation by saying that (nω)! = n+1ω.

But something different happens once we get to ε0!

Conjecture: ε0! = U{(nω)! | n ∈ ω} = U{ω, 2ω, 3ω, 4ω, …} = ε0

If this is right, then this is pretty interesting behavior! The fixed points of the factorial function are 1, ω, and ε0, at the very least. What happens after ε0? The factorial function “loops” in a certain sense! Just as (ω+1)! was ω2 + ω, so (ε0+1)! ends up being ε02 + ε0. And (ε0⋅2)! = ε0ε0, paralleling (ω⋅2)! = ωω. And so on.

Conjecture: εn is a fixed point of the factorial function for all n.

There are plenty more questions to be answered. Is ω1 another fixed point of the factorial function? What exactly does the list of all the fixed points look like? Are there any cardinals that aren’t equal to their own factorial? What exactly is the general pattern for factorials of successor ordinals? How does α! compare to αα in general?

On Self-Hating Theories of Arithmetic

Gödel’s second incompleteness theorem tells us that no (sufficiently powerful) consistent theory can prove the statement of its own consistency. But did you know that such a theory can prove the statement of its own inconsistency? A consistent theory that claims to be inconsistent is what I call a self-hating theory.

My convention in what follows: ℕ refers to the real, true natural numbers, the set consisting of {0, 1, 2, 3, …} and nothing else. ω refers to the formal object that exists in a theory of arithmetic that is “supposed to be” ℕ, but inevitably (in first-order logic) fails to be.

When I write ℕ ⊨ ψ, I am saying that the sentence ψ is true of the natural numbers. When I write T ⊢ ψ (resp. T ⊬ ψ), I am saying that the sentence ψ can be (resp. can’t be) proven from the axioms of the theory T. And when I write T ⊨ ψ, I am saying that the axioms of T semantically entail the truth of ψ (or in other words, that ψ comes out true in all models of T). The next two paragraphs will give some necessary background on Gödel’s encoding, and then we’ll explore the tantalizing claims I started with.

Gödel’s breathtakingly awesome insight was that within any language that is expressive enough to talk about natural number arithmetic, one can encode sentences as numbers and talk about syntactic features of these sentences as properties of numbers. When a number n encodes a sentence ψ, we write n = ⟦ψ⟧. Then Gödel showed that you can have sentences talking about the provability of other sentences. (The next step, of course, was showing that you can have sentences talking about their own provability – sneaking in self-reference through the back door of any structure attempting to describe arithmetic.)

In particular, in any theory of natural number arithmetic T, one can write a sentence that on its surface appears to just be a statement about the properties of natural numbers, but when looked at through the lens of Gödel’s encoding, ends up actually encoding the sentence “T ⊢ ψ”. And this sentence is itself encoded as some natural number! So there’s a natural number n such that n = ⟦T ⊢ ψ⟧. It’s a short step from here to generating a sentence that encodes the statement of T’s own consistency. We merely need to encode the sentence “¬∃n (n = ⟦T ⊢ 0=1⟧)”, or in English, there’s no number n such that n encodes a proof of “0=1” from the axioms of T. In even plainer English, no number encodes a proof of contradiction from T (from which it follows that there IS no proof of contradiction from T, as any proof of contradiction would be represented by some number). We write this sentence as Con(T).

Okay, now we’re in a position to write the original claim of this post more formally. If a theory T is consistent, then ℕ ⊨ Con(T). And Gödel’s second incompleteness theorem tells us that if ℕ ⊨ Con(T), then T ⊬ Con(T). But if T doesn’t prove the sentence Con(T), then no contradiction can be derived by adding ¬Con(T) as an axiom! So (T + ¬Con(T)) is itself a consistent theory, i.e. ℕ ⊨ Con(T + ¬Con(T)). But hold on! (T + ¬Con(T)) can prove its own inconsistency! Why? Because (T + ¬Con(T)) ⊢ ¬Con(T), i.e. it proves that a contradiction can be derived from the axioms of T, and it also has as axioms every one of the axioms of T! So the same number that encodes a proof of the inconsistency of T, also counts as a proof of the inconsistency of (T + ¬Con(T))!

Summarizing this all:

ℕ ⊨ Con(T)

T ⊬ Con(T)

ℕ ⊨ Con(T + ¬Con(T)),
but
(T + ¬Con(T)) ⊢ ¬Con(T + ¬Con(T))

There we have it, a theory that is consistent but proves its own inconsistency!

Expressed another way:

T ⊢ ∃n (n = ⟦T ⊢ 0=1⟧),
but
T ⊬ 0=1

Ok, so believe it or not, a lot of the strangeness of this can be explained away by thinking about the implications of nonstandard models of arithmetic. One easy way to see this is to reflect on the fact that, as we saw above, “T is consistent” becomes in Gödel’s encoding, “There is no natural number n such that n encodes a proof of T’s inconsistency.” Or more precisely, “T is consistent” becomes “There is no natural number n such that n = ⟦T ⊢ 0=1⟧.”

Now, no first-order theory can pin down the natural numbers.

(I’ve written about this here and here.) I.e. no first order theory can express a quantification like “there is no natural number N such that …”. You can try, for sure, by defining some object ω and adding axioms to restrict its structure to look more and more like ℕ, but no matter how hard you try, no matter how many axioms you add, there will always be models of the theory in which ω ≠ ℕ. In particular, ω will be a strict superset of ℕ in all of these nonstandard models (ℕ ⊂ ω), so that ω contains all the naturals but also additional nonstandard numbers.

So now consider what happens when we try to quantify over the naturals by saying “∀x ∈ ω”. This quantifier inevitably ranges over ALL of the elements of ω in each model, so it also touches the nonstandard numbers in the nonstandard models. This means that the theory only semantically entails quantified statements that are true of all possible nonstandard numbers! (Remember, T ⊨ ψ means that ψ is true in ALL models of T.)

One nice consequence of this is that if T has a model in which ω = ℕ then in this model “∀x∈ω Φ(x)” is true only if Φ(x) is true of all natural numbers. By the completeness of first-order logic, this means that T can’t prove “∀x∈ω Φ(x)” unless it’s true of ℕ. This is reassuring; if T ⊢ ∀x∈ω Φ(x) and T has a model in which ω = ℕ, then ℕ ⊨ ∀x∈ω Φ(x).

But the implication doesn’t go the other way! ℕ ⊨ ∀x∈ω Φ(x) does not guarantee us that T ⊢ ∀x∈ω Φ(x), because T can only prove that which is true in EVERY model. So T can only prove “∀x∈ω Φ(x)” if Φ(x) is true of all the naturals and every nonstandard number in every model of T!

This is the reason that we don’t know for sure that if Goldbach’s conjecture is true of ℕ, then it’s provable in Peano arithmetic. On the face of it, this looks quite puzzling; Goldbach’s conjecture can be written as a first-order sentence and first-order logic is complete, so if it’s true then how could we possibly not prove it? The answer is hopefully clear enough now: Goldbach’s conjecture might be true of all of ℕ but false of some nonstandard models of Peano arithmetic (henceforth PA).

You might be thinking “Well if so, then we can just add Goldbach’s conjecture as an axiom to PA and get rid of those nonstandard models!” And you’re right, you will get rid of those nonstandard models. But you won’t get rid of all the nonstandard models in which Goldbach’s conjecture is true! You can keep adding as axioms statements that are true of ℕ but false of some nonstandard model, and as you do this you rule out more and more nonstandard models. At the end of this process (once your theory consists of literally all the first-order sentences that are true of ℕ), you will have created what is known as “True Arithmetic”: {ψ | ℕ ⊨ ψ}.

But guess what! At this point, have you finally ruled out all the nonstandard models? No! There’s still many many more (infinitely many, in fact! Nonstandard models of every cardinality! So many models that no cardinality is large enough to describe how many!) Pretty depressing, right? There are all these models that agree with ℕ on every first order sentence! But they are still not ℕ (most obviously because they contain numbers larger than 0, 1, 2, and all the rest of ℕ).

The nonstandard models of True Arithmetic are the models that are truly irremovable in any first-order theory of arithmetic. Any axiom you add to try to remove them will also remove ℕ as a model. And when you remove ℕ as a model, some pretty wacky stuff begins to happen.

Fully armed now with new knowledge of nonstandard numbers, let’s return to the statement I started with at the top of this post: there are consistent theories that prove their own inconsistency. The crucial point, the thing that explains this apparent paradox, is that all such theories lack ℕ as a model.

If you think about this for a minute, it should make sense why this must be the case. If a theory T is consistent, then the sentence “∀x∈ω (x ≠ ⟦T ⊢ 0 = 1⟧)” is true in a model where ω = ℕ. So if T has such a model, then T simply can’t prove its own inconsistency, as it’s actually not inconsistent and the model where ω = ℕ will be able to see that! And once more, T can only prove what’s true in all of its models.

Okay, so now supposing T is consistent (i.e. ℕ ⊨ Con(T)), by Gödel’s second incompleteness theorem, T cannot prove its own consistency. This means that (T + ¬Con(T)) is a consistent theory! But (T + ¬Con(T)) no longer has ℕ as a model. Why? Because ℕ ⊨ Con(T) and (T + ¬Con(T)) ⊨ ¬Con(T). So for any consistent theory T, (T + ¬Con(T)) only has nonstandard models. What does this mean about the things that T + ¬Con(T) proves? It means that they no longer have to be true of ℕ. So for instance, even though ℕ ⊨ Con(T + ¬Con(T)), (T + ¬Con(T)) might end up proving ¬Con(T + ¬Con(T)). And in fact, it does prove this! As we saw up at the top of this post, a moment’s examination will show that (T + ¬Con(T)) asserts as an axiom that a contradiction can be derived from the axioms of T, but also contains all the axioms of T! So by monotonicity, (T + ¬Con(T)) proves ¬Con(T + ¬Con(T)).

What do we say of this purported proof of contradiction from (T + ¬Con(T))? Well, we know for sure that it’s not a standard proof, one that would be accepted by a mathematician. I.e., it asserts that there’s some n in ω that encodes a proof of contradiction from (T + ¬Con(T)). But this n is not actually a natural number, it’s a nonstandard number. And nonstandards encode proofs only in the syntactical sense; a nonstandard proof is a proof according to Gödel’s arithmetic encoding, but Gödel’s arithmetic encoding only applies to natural numbers. So if we attempted to translate n, we’d find that the “proof” it encoded was actually nonsense all along: a fake proof that passes as acceptable by wearing the arithmetic guise of a real proof, but in actuality proves nothing whatsoever.

Summarizing:

In first order logic, every theory of arithmetic has nonstandard models that foil our attempts to prove all the truths of ℕ. Theories of arithmetic with ONLY nonstandard models and no standard model can prove things that don’t actually hold true of ℕ. In particular, since theories of arithmetic can encode statements about their own consistency, theories that don’t have ℕ as a model can prove their own inconsistency, even if they really are consistent.

So much for first order logic. What about

Second Order Logic?

As you might already know, second order logic is capable of ruling out all nonstandard models. There are second order theories that are categorical for ℕ. But there’s a large price tag for this achievement: second order logic has no sound and complete proof system!

Sigh. People sometimes talk about nature being tricky, trying to hide aspects of itself from us. Often you hear this in the context of discussions about quantum mechanics and black holes. But I think that the ultimate trickster is logic itself! Want a logic that’s sound and complete? Ok, but you’ll have to give up the expressive power to allow yourself to talk categorically about ℕ! Want to have a logic with the expressive power to talk about ℕ? Ok, but you’ll have to give up the possibility of a sound and complete proof system! The ultimate structure of ℕ remains shifty, slipping from our view as soon as we try to look closely enough at it.

Suppose that T is a second order theory that is categorical for ℕ. Then for every second-order sentence ψ that is true of ℕ, T ⊨ ψ. But we can’t make the leap from T ⊨ ψ to T ⊢ ψ without a complete proof system! So there will be semantic implications of T that cannot actually be proven from T.

In particular, suppose T is consistent. Then T ⊨ Con(T), but T ⊬ Con(T), by Gödel’s second. And since T ⊬ Con(T), (T + ¬Con(T)) is consistent. But since T ⊨ Con(T), (T + ¬Con(T)) ⊨ Con(T). So (T + ¬Con(T)) ⊨ Con(T) ∧ ¬Con(T)!

In other words, T + ¬Con(T) actually has no model! But it’s consistent! There are consistent second-order theories that are actually not logically possible – that semantically entail a contradiction and have no models. How’s that for trickiness?

Updated Introduction to Mathematical Logic

I’ve noticed that some of my posts that have gotten the most hits are my introductions to mathematical logic. Overall, I’m pretty happy with that series of posts, though I think since then I’ve come up with clearer and more precise ways of talking about the primary distinctions: syntax, semantics, and proof systems.

A little while ago, I gave a talk on the subject of “true but unprovable” statements. The goal of the talk was just to explore this notion and explain what exactly it means to a logician. In the course of developing the talk, I ended up re-summarizing the basics of mathematical logic. I think this summary makes a good “companion piece” to my earlier introductory series, as well as being more standard in its presentation. The most important slide is this one:

Here’s the link to the full slideshow: True but Unprovable

And here you can look through it on this blog (but some of the formatting got a little wacky in the embedding process).