ZFC as One of Humankind’s Great Inventions

Recently I told a friend that I thought ZFC was one of humankind’s greatest inventions. He pointed out that it was pretty bold to claim this about something that most of mankind has never heard of, which I thought was a fair objection. After thinking for a bit, I reflected that the sense of greatness I meant wasn’t really consequentialist, and thus it was independent of how many people know what ZFC is, or even how many people’s lives are affected in any way by it. Instead I intended greatness in a sort of aesthetic and intellectual sense.

The closest analogy to ZFC outside of math is the idea of a “theory of everything” for physics. If we found a theory of everything for physics, it’d likely have a bunch of important practical consequences, and that’d be part of what makes it a great invention. But it would also be a great invention in an intellectual sense, as a discovery of something fundamental and unifying of many seemingly disparate phenomena we observe. This is what ZFC is like: a mathematical theory of everything. One reason this analogy is imperfect is that due to the incompleteness theorems, we know that there can be no “theory of everything” for mathematics. (Any theory of everything will have at least one thing it can’t prove, namely its own consistency.) So ZFC’s greatness can’t come from being a perfect theory of everything, because we know that it is not. Nonetheless, ZFC serves as a foundation for virtually all known mathematics, and this is what I think is so incredible about it.

What does it mean for something to “serve as a foundation” for math? ZFC is a foundation in (at least) three ways: (1) in terms of its ability to define virtually all mathematical concepts, (2) in terms of its structures being rich enough to contain objects that come from virtually all fields of math, and (3) in terms of being an axiom system that suffices to prove virtually every result in known mathematics.

Syntax

Virtually every mathematical concept you can think of has a definition in the language of ZFC. For example, we have definitions for numbers like “π” and “√2”, sets like ℕ and ℝ, algebraic objects like the group S5 and the ring ℚ[x], geometric objects like Platonic solids and differential manifolds, computational objects like Turing machines and cellular automata, and even logical entities like models of first order theories and proofs within formal systems. What makes this especially impressive is the simplicity of the language: it uses nothing besides the basic symbols of first order logic and one binary relation symbol: ∈. So one thing that ZFC teaches us is that virtually every concept in mathematics can be defined just in terms of the set membership relation, and all mathematics can be understood as exploring the properties of this relation.
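To get a feel for how ordinary objects reduce to pure membership, here is a minimal sketch in Python, using frozenset as a stand-in for finite sets. It uses the standard von Neumann encoding of the naturals and the Kuratowski encoding of ordered pairs; the function names are my own, chosen for illustration.

```python
def von_neumann(n: int) -> frozenset:
    """Encode n as the set of all smaller naturals: 0 = {}, n+1 = n ∪ {n}."""
    s = frozenset()
    for _ in range(n):
        s = s | frozenset({s})
    return s

def kuratowski(a, b) -> frozenset:
    """Encode the ordered pair (a, b) as the set {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})

zero, two, three = von_neumann(0), von_neumann(2), von_neumann(3)

# "2 < 3" becomes a statement about membership alone.
print(two in three)   # True
print(three in two)   # False

# Sets are unordered, yet the encoded pairs still remember their order.
print(kuratowski(zero, two) == kuratowski(two, zero))  # False
```

Nothing here goes beyond finite sets, but the same trick (pick an encoding, then restate every property in terms of ∈) is how ℝ, groups, manifolds and Turing machines get their definitions too.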

Semantics

Models of ZFC are insanely richly structured. You can navigate within them to find sets corresponding to every object that mathematicians study. π has a representative set within any model of ZFC, as do the Monster group and the torus. These representative sets are not always perfect: there are models of ZFC where ℝ is countable when viewed from outside the model, for instance. But within the model, they nonetheless share enough similarities with the original objects that virtually everything you can prove about the original object remains true of the ZFC-representative.

Proof

Finally, ZFC is a computable set of sentences, and we may inquire about what can be proven from it. Keeping up the ambition of the previous two sections, we might want to claim that all mathematical truths can be proven from ZFC. But due to the limitations of first order logic discovered over the last century, we now know that this goal is not achievable. The set of all first order truths of arithmetic is not computable, and so there must be some such truths that aren’t logical consequences of ZFC. Nonetheless, it is commonly claimed that virtually all mathematical truths can be derived from ZFC using the usual proof system for first order logic.

This is especially remarkable given the simplicity of ZFC. I believe that the intuitive content of each axiom could be explained to a smart middle schooler. Additionally, these axioms are extremely intuitively appealing. The most controversial of them has been choice, which is equivalent to the statement that the Cartesian product of non-empty sets is also non-empty. Second most controversial is probably the axiom of infinity, which just says that there’s an infinite set. The rest are even easier to accept than these.

Now, the fact that you can prove virtually everything from ZFC doesn’t mean that you should. So don’t interpret me as saying that ZFC is of practical use to the daily work of mathematicians trying to prove things outside of set theory and logic. Again, an analogy to physics: we might discover a theory of everything that we know reproduces all the known phenomena of GR and QM, but find that it’s so hard to prove things that we are practically never better off using this theory to calculate things. Nonetheless, ZFC as a theory of everything teaches us that most of math can be understood as conceptually quite simple: the logical consequences of a fairly simple and computable set of sentences about sets. People make a big deal out of Euclid’s axiomatization of geometry, but this is a small feat relative to the axiomatization of all of mathematics.

Metamath

And not only can ZFC prove virtually everything in ordinary mathematics, but ZFC can prove much of what we know in metamathematics and logic itself. When logicians are studying model theory, or even when set theorists are studying ZFC, they are almost always working with ZFC as their meta-theory, meaning that they are making sure that all of their proofs could ultimately be expanded out as ZFC proofs. So the big results of logic, like the completeness theorem, the compactness theorem, the incompleteness theorems, the Löwenheim-Skolem theorems, are all theorems of ZFC.

The fact that ZFC can even talk about these model theoretic notions means that models of ZFC are able to talk about models of ZFC, which is where things get very meta. One can prove that every model of ZFC – every one of these crazily richly-structured universes containing virtually all of mathematics – contains another such model of ZFC. This follows from the reflection theorem, which again can be proven in ZFC!

Hopefully I have now roused enough interest in you to get you to take a look at some of the actual mathematics. You might be curious to know what exactly this theory is. And you’re in luck, it’s simple enough that I can write the whole theory in just nine lines!
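Here is one standard way of writing the nine axioms out, as a sketch in LaTeX. Presentations differ in small details (for instance the exact form of Infinity and Replacement), so treat this as one reasonable rendering rather than the canonical one. Specification and Replacement are schemas, with one axiom for each formula φ in which y does not occur free, and the Choice line uses the function notation f : x → ⋃x and the symbol ∅ as shorthand.

```latex
\begin{align*}
\textbf{Extensionality:}\quad & \forall z\,(z \in x \leftrightarrow z \in y) \rightarrow x = y \\
\textbf{Pairing:}\quad & \exists z\,\forall w\,(w \in z \leftrightarrow (w = x \lor w = y)) \\
\textbf{Union:}\quad & \exists y\,\forall z\,(z \in y \leftrightarrow \exists w\,(z \in w \land w \in x)) \\
\textbf{Powerset:}\quad & \exists y\,\forall z\,(z \in y \leftrightarrow \forall w\,(w \in z \rightarrow w \in x)) \\
\textbf{Specification:}\quad & \exists y\,\forall z\,(z \in y \leftrightarrow (z \in x \land \varphi(z))) \\
\textbf{Replacement:}\quad & \forall u\,(u \in x \rightarrow \exists v\,\forall w\,(\varphi(u,w) \leftrightarrow w = v)) \rightarrow \exists y\,\forall v\,(v \in y \leftrightarrow \exists u\,(u \in x \land \varphi(u,v))) \\
\textbf{Infinity:}\quad & \exists x\,(\exists y\,(y \in x \land \forall z\,\lnot(z \in y)) \land \forall y\,(y \in x \rightarrow \exists z\,(z \in x \land \forall w\,(w \in z \leftrightarrow (w \in y \lor w = y))))) \\
\textbf{Regularity:}\quad & \exists y\,(y \in x) \rightarrow \exists y\,(y \in x \land \forall z\,\lnot(z \in x \land z \in y)) \\
\textbf{Choice:}\quad & \forall y\,(y \in x \rightarrow y \neq \varnothing) \rightarrow \exists f\colon x \to \textstyle\bigcup x\;\;\forall y\,(y \in x \rightarrow f(y) \in y)
\end{align*}
```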

Note that with the exception of the final axiom, Choice, the only symbols I’ve used are logical symbols and ∈. I used shorthand for Choice for the sake of readability, but this could be expanded out just like the others. I’m also using a convention where any free variables are considered to be universally quantified over, which shortens things further.

I’ll close with a one-sentence description for each axiom.

Extensionality: No two distinct sets have all the same elements.
Pairing: For any two sets, there’s a set containing just those two.
Union: The union of any set of sets exists.
Powerset: There is a set of all subsets of any set.
Specification: For any property Φ and any set x, you can form a set out of just those elements of x with that property.
Replacement: For any definable function and any set, the image of that set under the function exists.
Infinity: There’s an infinite set.
Regularity: Every non-empty set has a member that it shares nothing with.
Choice: For any set of nonempty sets, there is a function that picks out one element from each.
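Several of these axioms can be played with directly on finite sets. The following is a minimal sketch in Python, with frozenset standing in for sets and with function names that are my own. Extensionality comes for free, since two frozensets are equal exactly when they have the same elements. Infinity cannot be shown in a finite toy like this, and Choice is trivial for finite families, so both are left out.

```python
from itertools import combinations

def pairing(x, y):
    """Pairing: the set containing just x and y."""
    return frozenset({x, y})

def union(x):
    """Union: everything that is a member of a member of x."""
    return frozenset(e for member in x for e in member)

def powerset(x):
    """Powerset: the set of all subsets of x."""
    items = list(x)
    return frozenset(
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    )

def specification(x, phi):
    """Specification: the members of x that satisfy the property phi."""
    return frozenset(e for e in x if phi(e))

def replacement(x, f):
    """Replacement: the image of x under the function f."""
    return frozenset(f(e) for e in x)

def regularity_witness(x):
    """Regularity: a member of the non-empty set x that shares no members with x."""
    return next(e for e in x if not (e & x))

empty = frozenset()
one = pairing(empty, empty)   # {∅}
two = pairing(empty, one)     # {∅, {∅}}

print(union(pairing(one, two)) == two)                          # True
print(len(powerset(two)))                                       # 4
print(len(specification(powerset(two), lambda s: len(s) == 1))) # 2
print(replacement(two, lambda s: pairing(s, s)) == pairing(one, pairing(one, one)))  # True
print(regularity_witness(two) == empty)                         # True
```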

The fact that you can prove everything from the infinitude of primes to Fermat’s Last Theorem from just these basic principles is really quite mind-blowing.

One nonstandard is worth infinitely many standards

Suppose that M is a nonstandard model of true arithmetic (the set of all first-order sentences in the language of PA that are true in the standard model of arithmetic, ℕ). Now, take any formula φ(x) with one free variable. Suppose that there’s some nonstandard number k in M such that φ holds of k. Since k is larger than every standard natural, the following infinite set of sentences are all true in M:

∃x (x > 0 ∧ φ(x))
∃x (x > 1 ∧ φ(x))
∃x (x > 2 ∧ φ(x))
…
∃x (x > 1000000 ∧ φ(x))
…

Since these sentences are true in M and M is a model of true arithmetic, these sentences must also be true in the standard model ℕ. So it must be true for every standard natural that there’s a larger standard natural that satisfies φ. In other words, you can guarantee that there are infinitely many standard naturals that satisfy a property φ just by finding a single nonstandard number k that satisfies φ in a model of true arithmetic!

Furthermore, since in ℕ it is true that every standard natural has a larger standard natural satisfying φ, the sentence ∀x ∃y (y > x ∧ φ(y)) is true in ℕ. So this sentence must be true in every model of true arithmetic, including M! This means that just by finding a single nonstandard satisfying φ, you can immediately be sure that there are infinitely many standard numbers AND infinitely many nonstandard numbers (in every nonstandard model of TA!) satisfying φ. This is pretty dramatic!

As an example, consider the twin prime conjecture. We can construct the predicate isPrime(x) in first-order PA with the formula x > 1 ∧ ∀y (∃z (y⋅z = x) → (y=1 ∨ y=x)), where the clause x > 1 keeps 1 from counting as prime. Then the predicate isTwinPrime(x) is just: isPrime(x) ∧ isPrime(x+2). Now the twin prime conjecture just says that ∀x ∃y (y > x ∧ isTwinPrime(y)), which is exactly the form we saw in the last paragraph! So to prove the twin prime conjecture, it suffices to demonstrate a single nonstandard twin prime in a model of true arithmetic.
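On the standard naturals, these predicates can be checked by brute force. The sketch below mirrors the shape of the formulas, with the quantifiers bounded by x and an extra guard x > 1 so that 1 is not counted as prime. The helper next_twin_prime_after and its search_limit parameter are my own additions for illustration; a finite search like this can only ever find witnesses, never settle the conjecture.

```python
def is_prime(x: int) -> bool:
    """x > 1, and every y that divides x is either 1 or x."""
    return x > 1 and all(y == 1 or y == x for y in range(1, x + 1) if x % y == 0)

def is_twin_prime(x: int) -> bool:
    """isPrime(x) ∧ isPrime(x + 2)."""
    return is_prime(x) and is_prime(x + 2)

def next_twin_prime_after(x: int, search_limit: int):
    """Search for a witness y > x; None only means none was found below search_limit."""
    return next((y for y in range(x + 1, search_limit) if is_twin_prime(y)), None)

print([x for x in range(50) if is_twin_prime(x)])  # [3, 5, 11, 17, 29, 41]
print(next_twin_prime_after(1000, 2000))           # 1019
```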

This post is lying to you

In classical logic there are exactly two truth values, True and False, and every sentence has exactly one of these truth values. Consider the Liar sentence: “This sentence is false.” If this sentence is True, then it must be False. And if it’s False, then it must not be False. If we apply classical logic to this sentence, then it must have one of the two truth values. But whichever way we go, we find ourselves in trouble. And so the Liar sentence glitches out classical logic and produces an error.

But not so fast. For the Liar sentence to glitch out classical logic, it must first be the case that the sentence can actually be produced in classical logic. We start with an English sentence “This sentence is not true” and reason intuitively about what classical logic would do with this sentence, but we must be able to form this sentence in classical logic in order for classical logic to have anything to say about it.

There are two major hurdles with importing “This sentence is not true” into classical logic: translating “this sentence” and translating “is not true”. The first requires us to be able to have self-referential sentences. It’s not so obvious to see how we accomplish this with the machinery of classical logic. The second requires us to have a truth predicate. This perhaps seems prima facie easier to do, but it turns out that it has its own host of issues.

Let’s deal with the second issue first: how do we make sense of “is not true”? We’ll start by looking at “is true” (once we have this, then we can just negate it to get “is not true”). So, how do we make sense of “is true”?

When we say “X is true”, we aren’t assigning equality between an object X and an object True. We’re instead attributing the property of truthiness to the sentence X. So “is true” seems to act kind of like a predicate in first-order logic. But there’s a problem. Predicates apply to objects in the domain of a model, not to sentences in the language itself. For instance, the predicate “is red” might apply to firetrucks and Clifford, as objects in the universe, but it’s not going to apply to sentences in the language.

With this in mind, it appears that “is true” is actually more like a modal operator. It applies to sentences, not objects. We could introduce a modal operator T such that TX is interpreted as “X is true”, and add an axiom schema that says for every sentence φ, φ ↔ Tφ. The problem with this is that we will never get self reference with this approach. We want to create a sentence P that says “P is not true”. We could try something like P ↔ ¬TP, but this ends up just being a false sentence, not paradoxical. What’s missing is that the sentence “P ↔ ¬TP” is not equivalent to the sentence P: they’re just different sentences.

So the modal approach to interpreting “is true” has failed for our purposes. It’s simply not subtle enough to allow us to express self-reference. So let’s return to the predicate approach. The problem was that predicates apply to objects, not sentences. But what if the sentences were themselves objects? Of course, the sentences cannot literally be objects: they are purely syntactical items, whereas objects exist in the semantics (the interpretation of the language). But each sentence could have some sort of representative object in the domain.

What Gödel showed is that this is indeed possible. He designed a coding technique such that every sentence in the language gets assigned a particular natural number, and no two sentences have the same number. And if sentences correspond to numbers, then properties of those sentences can be translated into properties of those numbers!
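To make the idea concrete, here is a minimal sketch of one such coding in Python, in the spirit of the prime-power scheme: the i-th symbol of a sentence becomes the exponent of the i-th prime. The symbol alphabet below is a hypothetical one I made up for illustration, not the one Gödel used, and unique prime factorization is what guarantees that no two sentences share a number.

```python
# Hypothetical alphabet for a toy language of arithmetic; each symbol gets a code >= 1.
SYMBOLS = ["0", "S", "+", "*", "=", "(", ")", "~", "&", "A", "x", "'"]
CODE = {s: i + 1 for i, s in enumerate(SYMBOLS)}
SYMBOL = {i: s for s, i in CODE.items()}

def primes():
    """Yield 2, 3, 5, ... by trial division (slow, but fine for short sentences)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def godel_number(sentence: str) -> int:
    """Encode symbols c1 c2 c3 ... as 2^c1 * 3^c2 * 5^c3 * ..."""
    n = 1
    for p, ch in zip(primes(), sentence):
        n *= p ** CODE[ch]
    return n

def decode(n: int) -> str:
    """Recover the sentence by reading off the exponent of each prime in turn."""
    out = []
    for p in primes():
        if n == 1:
            break
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        out.append(SYMBOL[exponent])
    return "".join(out)

print(godel_number("0=0"))                 # 2430, i.e. 2^1 * 3^5 * 5^1
print(decode(godel_number("S0+S0=SS0")))   # S0+S0=SS0
```

Once sentences are numbers, a syntactic property like “starts with a negation sign” turns into an arithmetic property of the code (here: “is divisible by 2 exactly 8 times”).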

Now if our language is sufficiently expressive to talk about natural number arithmetic, then our sentences can express properties of other sentences! In other words, we want a theory in a logic that has ℕ as a model. And we also want it to be sufficiently expressive to be able to talk about properties of numbers, like “being prime” or “being twice-divisible by 7”. Then we can imagine a predicate True(x), such that True(x) is True if and only if the sentence encoded by the number x is True.

For notational convenience, we’ll write “the number that encodes the sentence P” as ⟦P⟧. Then what we want of our truth predicate is that for every sentence φ, True(⟦φ⟧) ↔ φ.

Now, returning to the Liar sentence, we’ve dealt with “is true”, but now have to deal with “this sentence”. Remember that we want a sentence φ that asserts that the truth predicate does not apply to itself. In other words, we want φ to be the same thing as ¬True(⟦φ⟧). But how can this be? Clearly these are two different sentences, no?

Well, it’s not so obvious that φ and ¬True(⟦φ⟧) are actually distinct sentences. Remember that ⟦φ⟧ is just some number. So the sentence φ might be ¬True(9129828475651384). This is only a genuine liar sentence if 9129828475651384 encodes the sentence φ.

So really what we need to do is to look for some natural number n such that the sentence encoded by n is exactly “¬True(n)”. This would be a sentence which if true must be false, and if false must be true. It’s not at all obvious that such a natural number exists. But in 1934, Carnap proved the diagonal lemma, the tool necessary to construct such a number.

The diagonal lemma says that in any theory that can express natural number arithmetic (specifically, a theory that can define all primitive recursive functions), and for every predicate P(x), there’s a sentence ψ such that ψ ↔ P(⟦ψ⟧) is provable. Let P(x) be equal to ¬True(x), and we get that there’s a sentence ψ such that ψ ↔ ¬True(⟦ψ⟧) is provable!
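The trick behind the lemma can be imitated with strings. The sketch below is only an analogy under loud assumptions: Python strings play the role of sentences, repr plays the role of the coding ⟦·⟧, and diag and fixed_point are names I made up. The real lemma does the same substitution arithmetically, on Gödel numbers.

```python
def diag(template: str) -> str:
    """Replace the placeholder X in template with a quotation of template itself."""
    return template.replace("X", repr(template))

def fixed_point(predicate_name: str) -> str:
    """Build a string psi of the form predicate_name(<expression denoting psi>).

    Assumes predicate_name does not itself contain the placeholder letter X.
    """
    return diag(f"{predicate_name}(diag(X))")

psi = fixed_point("notTrue")
print(psi)  # notTrue(diag('notTrue(diag(X))'))

# The argument inside psi is an expression; evaluating it gives back psi itself,
# so psi "says" notTrue of its own quotation.
argument = psi[len("notTrue("):-1]
print(eval(argument, {"diag": diag}) == psi)  # True
```

The point of the design is that psi never contains a literal copy of itself, which would be impossible. It contains a recipe (apply diag to a quoted template) whose output happens to be psi.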

In other words, there’s a sentence ψ encoded by a number n, such that ψ is true if and only if ¬True(n). This is exactly the liar paradox! We’ve succeeded at sneaking in a contradiction to classical logic! So what does this mean? Is classical logic ultimately all inconsistent? Do we need to rebuild logic from the ground up?

Not quite! Notice that to actually get to the point where we could express the Liar sentence, we had to take on a lot of assumptions. Let’s list them all out:

  1. Our language allows us to express natural number arithmetic.
  2. Our theory of natural numbers is strong enough to allow Gödel coding.
  3. Our theory of natural numbers is strong enough to express every primitive recursive function.
  4. There is a truth predicate.

From these assumptions, we were able to prove an inconsistency. But this doesn’t mean that classical logic is therefore inconsistent! Rather, it means that any consistent theory has to violate at least one of these assumptions! In particular, if we have a consistent theory that allows us to do both Gödel coding and to express primitive recursive functions, then this theory cannot have a truth predicate!
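Spelled out, the inconsistency takes only a few lines. The following LaTeX sketch assumes the stmaryrd package for the \llbracket and \rrbracket brackets, and the labels (T), (D) and (T_ψ) are my own.

```latex
\begin{align*}
&\text{(T)}\quad \mathrm{True}(\llbracket \varphi \rrbracket) \leftrightarrow \varphi \quad\text{for every sentence } \varphi && \text{(assumption 4)}\\
&\text{(D)}\quad \psi \leftrightarrow \lnot\mathrm{True}(\llbracket \psi \rrbracket) && \text{(diagonal lemma with } P(x) := \lnot\mathrm{True}(x)\text{)}\\
&\text{(T}_\psi\text{)}\quad \mathrm{True}(\llbracket \psi \rrbracket) \leftrightarrow \psi && \text{(the instance of (T) for } \psi\text{)}\\
&\phantom{\text{(D)}}\quad \psi \leftrightarrow \lnot\psi && \text{(from (D) and (T}_\psi\text{))}
\end{align*}
```

The last line is a contradiction in classical logic, so a consistent theory cannot prove both (T) and (D).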

It’s important to understand that #4 here really is an assumption. When I described a truth predicate, I only said that “we can imagine a predicate True(x)”. I never showed you how to explicitly construct it! We could always explicitly add in a truth predicate T to a theory of arithmetic, and then assert as axioms φ ↔ T(⟦φ⟧) for every sentence φ. And the above line of reasoning shows us that doing so will render our theory inconsistent. If we don’t explicitly add in a truth predicate, then we could try to construct it from primitive relation and function symbols of the language. But the above line of reasoning shows us that no matter how hard we try, we will never succeed in this construction!

It’s interesting to note that (2) and (3) are actually different assumptions. (3) implies (2), but (2) doesn’t imply (3). In other words, you can have very weak theories of arithmetic that are expressive enough to do Gödel coding, but not expressive enough to prove the diagonal lemma! The amazing feature of these theories is that it’s possible for them to prove their own consistency without becoming inconsistent!

Finally, notice that the diagonal lemma was quite a bit more powerful than what we strictly required for our reasoning above. In particular, it allowed us to talk about ANY predicate whatsoever, not just a truth predicate. Consider what happens when instead of using “is true” as our predicate, we use “is provable”. You might get a somewhat interesting result!

Some half-baked thoughts on moral arbitrariness

I find that there’s often a crucial assumption implicit in discussions of abortion ethics. It comes up at mentions of when personhood arises in the development from zygote to fetus to baby. One person claims that some particular moment is the threshold at which personhood arises. The other points out that zooming in to that moment and looking at extremely nearby moments, we see no particular reason to privilege one over the others. This arbitrariness is taken to be a fatal blow for the account of personhood.

This raises an interesting question. Could the fundamental moral laws be arbitrary? By analogy, think about the laws of physics. The laws of physics contain certain parameters like the gravitational constant G and m_e/m_p, the ratio of the mass of an electron to the mass of a proton, whose values are likely arbitrary to some degree. Even taking into account fine-tuning for life, it’s probable that the fine-tuning isn’t infinitely precise and there’ll be some level of arbitrariness in the 1000th decimal place value of G.

If the laws of physics can be arbitrary, why not the laws of morality? Perhaps there’s just one arbitrary point at which moral personhood emerges, and there’s not much motivation for that point over any other. How strange you consider this to be will likely depend on what your meta-ethical theory is. Trivially, if you don’t think there are moral facts at all, then this puzzle never even arises for you. If you think there are moral facts, but they’re somehow socially or biologically determined, then it’s not so puzzling that there would be arbitrariness in the moral facts. But if you’re a moral realist that believes in an objectively true set of laws governing morality, then this view starts to look strange.

Among moral objectivists, it seems to me like anti-Humeans would not be okay with arbitrariness in the laws of morality. In meta-ethics, anti-Humeans are those who believe that moral facts are intrinsically motivating. This doesn’t mesh well with arbitrariness. If the moral laws are arbitrary, then why should I follow them rather than a neighboring set of laws that work just as well? Almost by definition, arbitrariness in the moral laws implies a lack of motivation, both motivation for the letter of the laws and motivation to live by the laws. On the other hand, if one takes a Humean stance on meta-ethics, perhaps arbitrariness is not so puzzling.

Moral arbitrariness might also be troubling to divine command theorists, who believe that the moral rules are set by God. There’s something that seems quite strange about saying that God’s commands are arbitrary to some extent (though to be fair, I say this from a very atheistic perspective, so perhaps my intuitions differ from theists here). But if this feels strange, then why shouldn’t it feel just as strange to say that the laws of the physical universe are arbitrary? Presumably God also decided on the precise values of all the physical parameters, and there seems to be arbitrariness there. Is there something particularly troubling about the idea that God’s choice of moral laws is arbitrary?

Moral arbitrariness seems like an inevitable consequence of most, maybe all, moral systems. A rights-based approach has to deal with tradeoffs between different rights: how severe a breach of bodily autonomy is severe enough that it’s better to violate a person’s right to life? Any binary account of personhood seems bound to end up drawing the line at some arbitrary point. And gradualist accounts of personhood come with their own type of arbitrariness: why should the curve of increasing personhood with time look like precisely this, rather than some other very similar curve? Virtue theoretic approaches talk about virtues arising in a happy medium between two vices (e.g. bravery arising between cowardice and foolhardiness), but where is the precise middle point? If one were to completely codify virtue ethics, they would have to say precisely what level of riskiness is bravery, and when it tips over into foolhardiness. But there will always be thought experiments that place you just barely on either side of this threshold and reveal that there is no apparent moral difference between one side and the other.

Perhaps the framework that has the least trouble with moral arbitrariness is consequentialism. Something like utilitarianism says that the threshold for when you should choose Act 1 over Act 2 is exactly when the expected net utility produced by Act 1 exceeds the expected net utility produced by Act 2 (where utility is something like “happiness minus sadness”). Unfortunately, I think that this approach runs into problems as well. Happiness is not one-dimensional, and neither is suffering. How do you make different types of happiness commensurable? How many sips of hot chocolate are equivalent to a roller-coaster ride? How many minutes in front of a fire on a cold night are equivalent to the moment of insight when you solve a tough mathematical problem? I find it hard to imagine that non-arbitrary answers to these types of questions exist.

If it’s true that most all moral frameworks contain fundamental arbitrariness, as I believe it is, then I think that this turns into a powerful argument against many types of moral realism. If you’re an anti-Humean, then you have to either deny the arbitrariness or explain why arbitrary moral laws would be intrinsically motivating to us. If you think that God created the moral laws, then you have to reckon with the apparent arbitrariness of those laws. Presumably God always makes the optimal choice when one exists, but what does God do when faced with a choice where there is no optimum?

Biblical inerrancy

A puzzling phenomenon is the existence of Biblical inerrantists. It seems to me to be impossible to have both (1) carefully read the Bible and (2) come to the conclusion that it’s inerrant. Bart Ehrman talks about a possible explanation for this phenomenon in the order in which people read the Gospels: if you read the Gospels all the way through, one at a time, rather than reading them simultaneously, side by side, then it’s much easier for you to fail to notice all of the discrepancies. And boy are there discrepancies!

Virtually no story that appears in multiple Gospels is identical in the different tellings. That’s not hyperbole; literally from the story of Jesus’s birth all the way to his crucifixion and resurrection, there are unambiguous contradictions to be found the entire way. I don’t think that these contradictions make it rationally impossible to be a Christian, but they certainly do make it rationally impossible to be a Christian of an inerrantist tradition. And more liberal Christians face a serious challenge in justifying why they place such enormous stock in the wording of a text that is known to be error-ridden.

There are just too many examples of blatant contradictions to go through them all. It’s a remarkable fact that many Christians that have read these stories throughout their lives are completely unaware that they disagree with one another! What I want to do here is just to pick one of the most well-known stories, the empty tomb. As I go through the story as each Gospel tells it, if at any moment you feel skeptical of what I’m saying, just go look at the verses being cited for yourself! The source of all the following quotes is the New International Version (NIV).

I’ve copied the entire “empty tomb” story as it’s told in each of the Gospels and highlighted the differences.

Now, let’s test your reading comprehension! In the story of the empty tomb, how many women come to the tomb, one, two, or three? Depends on who you read! According to Mark, it was three (the two Marys and Salome). According to Matthew, it’s two (the two Marys). According to John it’s just Mary Magdalene. And according to Luke it’s some unspecified number more than 1.

How many entities do they encounter at the tomb, and are they ordinary men or angels? According to Mark, they see one young man already inside the tomb. Luke says that after they enter, two men suddenly appear beside them. Matthew describes a violent earthquake preceding the arrival of an angel from heaven, while still outside the tomb. And John describes two angels inside the tomb (seen by Mary from the outside of the tomb). What’s more, in John and Matthew the woman/women see and talk to Jesus at the tomb! You have to agree that this is a pretty noteworthy thing for Mark and Luke to leave out.

Ok, how about the stone in front of the entrance? When the women/woman arrive(s), is the stone already rolled away from the tomb (as in Mark, Luke, and John), or is it moved later (as in Matthew)?

When the one/two men/angels speak to the woman/women, do they say that Jesus will meet the disciples in Galilee? In Mark and Matthew, yes. But in Luke and John there’s no mention of this! And in fact, in Luke the disciples don’t go to Galilee to meet Jesus at all! Jesus appears to disciples in Jerusalem and tells them to stay there, which they do until the end of the gospel (Luke 24:51-53). He ascends right outside of Jerusalem (Acts 1:9,12).

When the woman/women leave the tomb, do they describe what they saw to the disciples or not? According to Luke, Matthew, and John, yes. But not according to Mark; in Mark, the women flee and in their fear “say nothing to anyone”!

Did Peter ever visit the tomb? Not in Matthew or Mark, but in Luke and John he does. Is he by himself or with another disciple? Luke says he’s by himself, John describes another disciple with him.

So much for the empty tomb! This level of contradiction is not special to this story. Think about Jesus’s death. When did he die? This is one of the most blatant contradictions in the Bible, because both John and Mark take great pains to explicitly lay out their chronology. According to Mark, Jesus and his disciples have their last supper on the evening of the Passover (Mark 14:12-17), and the following morning he is taken to be crucified (Mark 15:1). In John there is no last supper! John explicitly states in John 19:14 that Jesus is taken away for crucifixion on “the day of Preparation of the Passover”, that is, the day before Passover!

This business about the last supper is actually pretty interesting; in John there is no last supper, but the author still manages to fit in some of the dinner table discussion early in the narrative. In Matthew, Mark, and Luke, it is during the last supper that Jesus says that the bread is his body (Mark 14:22, Matthew 26:26, Luke 22:19) and the wine his blood (Mark 14:24, Matthew 26:28, Luke 22:20). In John, these things are said some 12 chapters before his arrest and crucifixion (John 6:32-58). The context is ENTIRELY different within John; there, it comes up after the miracle where he multiplies the loaves of bread and fish. His disciples talk to him about this miracle, and he responds with the famous line “I am the bread of life”. This upsets some of the Jews around him, and starts an argument about how Jesus can BE the bread, in response to which Jesus doubles down and says “I am the living bread that came down from heaven” and “Very truly I tell you, unless you eat the flesh of the Son of Man and drink his blood, you have no life in you.” Thus the origin of this doctrine. Given the importance that Catholics place on the wording here, it should be disturbing to them that there’s such dramatic disagreement between the Gospels on the context in which he said it!

Let’s now quickly run through some more Biblical contradictions. Who was the father of Joseph, Jesus’s father? According to Matthew 1:16, “Jacob begat Joseph, the husband of Mary”, whereas according to Luke 3:23, Joseph is said to be “the son of Heli”. The genealogies presented in Matthew and Luke are virtually in complete disagreement starting two generations up from Jesus. Apologists will often argue that one of the two is presenting the maternal lineage rather than the paternal line, but this is far from obvious when you look at the wording, which is specifically about Joseph’s father, not Mary’s (plus the fact that in both genealogies, the entire rest of the list follows only the paternal line).

In Mark 5:21-24, Jairus comes to Jesus before his daughter dies and asks him to heal her (“My little daughter is dying. Please come and put your hands on her so that she will be healed and live”), but in Matthew 9:18-20 the daughter has already died by the time Jairus comes to Jesus (“My daughter has just died. But come and put your hand on her, and she will live”).

In Mark 15:37-39, the curtain that separates the holy of holies from the rest of the temple rips in two after Jesus dies. And in Luke 23:45-46, it rips before.

In Matthew 2:1-23, Joseph, Mary, and Jesus flee to Egypt (250 miles away) after Jesus’s birth, where they stay until King Herod dies, after which they resettle in Nazareth. But in Luke 2:1-40, Joseph and the fam do their rites of purification in Bethlehem after birth and return to Nazareth directly, 41 days after Jesus’s birth.

Also in Luke 2, it is described that Joseph and Mary travel from Galilee to Bethlehem for a census declared by decree by Caesar Augustus to be “taken of the entire Roman world.” The problem with this is that we have good historical records of Caesar Augustus, and no such census took place!

One final one: in Mark 2:25-26, Jesus references an Old Testament passage about David unlawfully eating consecrated bread “in the days of Abiathar the high priest.” There’s a big problem with this: Jesus made a mistake! In the Old Testament passage, Abiathar wasn’t the high priest! The high priest was Ahimelech, whose son Abiathar would much later become high priest (1 Sam 21:1-7). So Christians have a choice to make between either Jesus not knowing his Old Testament or Mark not being an inerrant recording of Jesus’s sayings.

✯✯✯

All these contradictions are begging for an explanation. Is one or more of the authors lying? Not necessarily. Lying implies intention, and it’s worth keeping in mind the timeline of the Bible. Jesus is purported to have lived from 0 to 30 AD. Scholars unanimously agree that the earliest of the Gospels is Mark, and that it was originally written around 70 AD. Next were Matthew and Luke, both around 80-85 AD, and finally came John, around 90-95 AD. That’s a gap of 40 to 65 years from the events that are described! What’s more, the authors of Mark, Matthew, and John were almost certainly not the actual historical Mark, Matthew, and John (for a bunch of reasons I won’t get into now, most obviously that these texts were written in Greek by highly educated individuals and all three of these individuals were uneducated and would not have known Greek). And of course, Luke wasn’t a disciple and never met Jesus personally.

So the first texts that are written are from non-eyewitnesses recording an oral tradition that had started forty to sixty-five years before! In a forty-year game of telephone, nobody needs to have lied in order for the original story to become warped beyond recognition. Anybody that doubts that stories can become so dramatically altered over time need only think about the many Trump supporters that to this day insist that Trump’s inauguration had more attendees than Obama’s, despite LITERAL TIME LAPSE FOOTAGE of the entire thing and photographs all throughout. In one poll, Trump and Clinton voters were handed these two photos, one from Obama’s inauguration and the other from Trump’s:

And guess what? 15% of Trump voters said that the left photo has more people! Suffice it to say, in the presence of emotionally charged topics like religion and politics, human brains start to act funny. Put this in context: this is an event from four years ago that we have video records of. And it’s somehow supposed to be unimaginable that 40 years of word-of-mouth transmission by religious believers made any significant changes to the original stories?

It’s even worse than this. These first texts are not what our modern Gospels are based off of, simply because we don’t have any copies of these first texts! The first copies of the texts that we possess come over A HUNDRED YEARS LATER, meaning that we have more than a hundred years of scribes making copies of copies of copies of the original texts. We know for a fact that these scribes were not perfect copyists, from the thousands of copies of the Gospels we possess, which abound in mistakes as small as spelling errors to as large as entire missing stories or new stories that had never appeared before. I’m sure that you know the story of the adulteress who Jesus defends, saying “he that is without sin among you, let him first cast a stone at her.” Did you know that this story appears in none of our earliest copies of the Gospels? Scholars unanimously agree that this story was added by scribes hundreds of years after the original writing, both because it literally doesn’t appear in copies earlier than that and also because the writing style is different from the rest of John.

So it seems to me like there really is no mystery here once you learn about the actual history of the texts. There are contradictions in the Bible because the Bible is an extremely imperfect copy of a copy of a copy of a … copy of a text written by non-eyewitnesses that heard stories told by people who had heard stories told by… people who had heard stories told by eyewitnesses to the events.

Formal Semantics 1: Historical Prelude and Compositionality

English is really complicated. For a long time, logicians looking at natural languages believed that there could be no formal system detailing their grammar and semantics. They resigned themselves to extremely simple idealized fragments of English, like propositional logic (formalizing “and”, “not”, and “or”) and first-order logic (formalizing “every”, “some”, and “is”).

The slogan of the time was “ordinary language has no logic” (Bertrand Russell and Peter Strawson). Chomsky famously argued that the languages invented by logicians were too artificial and entirely unlike natural languages, and that therefore the methods of logicians simply couldn’t be applied to this more complex realm.

This attitude has changed over time. Perhaps the most important figure in the “logic of natural language” movement is Richard Montague, a student of the giant of logic Alfred Tarski. The first line of his paper English as a Formal Language reads “I reject the contention that an important theoretical difference exists between formal and natural languages”, and he follows this up by more or less single-handedly inventing formal semantics, now a thriving field. Hilariously, Montague apparently saw this work as child’s play, writing:

I (…) sat down one day and proceeded to do something that I previously regarded, and continue to regard, as both rather easy and not very important — that is, to analyze ordinary language.

(This had to hit hard for linguists of his time.)

Alright, enough prologue. In the next few posts I want to describe a naive first pass at formalizing a fairly substantial fragment of English, modeled off of Montague semantics. The key concept throughout will be the notion of compositionality, which I’ll briefly describe now.

Compositionality

Compositionality is all about how to construct the meaning of phrases from their smaller components. Take a sentence like “The cat sat on the mat.” The meaning of this sentence clearly has something to do with the meanings of “the cat” and “sat on the mat”. Similarly, the meaning of “sat on the mat” must have something to do with the meanings of “sat”, “on”, “the”, and “mat”.

The compositionality thesis says that this is all that determines the meaning of “the cat sat on the mat.” In other words, the meaning of any phrase is a function of the meanings of the individual words within it. These meanings are composed together in some way to form the meaning of the sentence as a whole.

The natural question that arises now is, what is the nature of this composition? Take a very simple example: “Epstein died.” According to compositionality, the meaning of “Epstein died” depends only on the meanings of “Epstein” and “died”. That seems pretty reasonable. What about: “Epstein died suspiciously”? How do we compose the meanings of the individual words when there are three?

One proposal is to compose all three simultaneously. That’s possible, but a simpler framework would have us build up the meanings of our sentences iteratively, composing two units of meaning at a time until we’ve generated the entire sentence’s meaning.

Let me now introduce some notation that allows us to say this compactly. If X is some word, phrase, or sentence, we’ll denote the meaning of X as ⟦X⟧. Then the principle of binary compositionality is just that there’s some function F such that ⟦X Y⟧ = F(⟦X⟧, ⟦Y⟧).

There are two major questions that arise at this point.

First, in which order should we compose our units of meaning? Should we combine “Epstein” with “died” first, and then combine that with “suspiciously”? Or should it be “Epstein” and “suspiciously” first, then that with “died”? Or should we combine “Epstein” with the combination of “suspiciously” and “died”?

One might suggest here that the order actually doesn’t matter; no matter what order we combine the meanings in, we should still get the same meaning. The problem with this is that “The Clintons killed Epstein” has a different meaning than “Epstein killed the Clintons.” If order of composition didn’t matter, then we’d expect these to mean the same thing.

Second, how exactly does composing two meanings work? Is there a single rule for composition, or are there multiple different rules that apply in different contexts? It would be most elegant if we could find a single universal rule for generating meanings of complicated phrases from simple ones, but maybe that’s overambitious.

For instance, you might model the meaning of “died” as a set of objects, namely all those objects that died at some moment in the past, and the meaning of “Epstein” as one particular object in the universe. Then we might have our composition rule be the following: ⟦Epstein died⟧ will be a truth value, and it will be True if and only if the object denoted by “Epstein” is within the set of objects denoted by “died”. So in this framework, ⟦X Y⟧ = True if and only if ⟦X⟧ ∈ ⟦Y⟧.

This works nicely for “Epstein died”. But what about “Epstein died suspiciously”? Now we have two compositions to do, and the order of composition will matter. The problem is that no matter how we compose things, it seems not to work. Suppose that we combine “died” and “suspiciously” first, then combine “Epstein” with that. Using our model, ⟦died suspiciously⟧ will be True if and only if ⟦died⟧ ∈ ⟦suspiciously⟧, which is already a little bit weird. But even worse, ⟦Epstein died suspiciously⟧ will be True if and only if ⟦Epstein⟧ ∈ ⟦died suspiciously⟧. But what would it mean for the object denoted by “Epstein” to be an element of a truth value? It looks like in this framework, most three-word sentences end up becoming vacuously false.
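To make the failure concrete, here is a minimal sketch of the membership rule in Python. The toy sets below are made up purely for illustration (in particular, the choice of what ⟦suspiciously⟧ contains is arbitrary), and `compose_membership` is a hypothetical helper name.

```python
# Toy model: every set here is invented for illustration only.
EPSTEIN = "Epstein"
died = frozenset({"Epstein", "Caesar"})   # [[died]]: the things that died
suspiciously = frozenset({died})          # [[suspiciously]]: arbitrary toy set containing [[died]]


def compose_membership(x_meaning, y_meaning):
    """[[X Y]] = True iff [[X]] is an element of [[Y]].

    Nothing can be an element of a non-set (such as a truth value),
    so in that case the result is False.
    """
    if not isinstance(y_meaning, frozenset):
        return False
    return x_meaning in y_meaning


two_words = compose_membership(EPSTEIN, died)                  # "Epstein died"
died_suspiciously = compose_membership(died, suspiciously)     # a truth value, not a set
three_words = compose_membership(EPSTEIN, died_suspiciously)   # "Epstein died suspiciously"
```

Here `two_words` comes out True, but `three_words` is False no matter what the facts are, because ⟦died suspiciously⟧ has already collapsed to a truth value.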

Anyway, the last two paragraphs only show us that one particular attempt to formalize composition fails to be universal. It doesn’t show that it’s impossible in general. In fact, we’ll end up doing pretty well with a small set of composition rules centered around function application. The idea can be very simply phrased as: ⟦X Y⟧ = ⟦X⟧(⟦Y⟧). And in particular, the meaning of “Epstein died suspiciously” will be ⟦suspiciously⟧(⟦died⟧)(⟦Epstein⟧). And that’s enough warm-up! Next we’ll explore this idea further and dive into our Montague-style system.
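As a rough preview of the function-application idea, here is a small Python sketch. Everything in it is an assumption made for illustration: the toy facts, the choice to model ⟦died⟧ as a function from entities to truth values, and the choice to model ⟦suspiciously⟧ as a function that takes one such function and returns another. It is not the Montague-style system itself.

```python
from typing import Callable

Entity = str
Predicate = Callable[[Entity], bool]

# Hypothetical toy facts, invented for this example.
DEAD = {"Epstein", "Caesar"}
SUSPICIOUS = {"Epstein"}


def died(x: Entity) -> bool:
    """[[died]]: maps an entity to a truth value."""
    return x in DEAD


def suspiciously(pred: Predicate) -> Predicate:
    """[[suspiciously]]: maps a predicate to a narrower predicate."""
    return lambda x: pred(x) and x in SUSPICIOUS


# [[Epstein died suspiciously]] = [[suspiciously]]([[died]])([[Epstein]])
epstein_died_suspiciously = suspiciously(died)("Epstein")
caesar_died_suspiciously = suspiciously(died)("Caesar")
```

Unlike the membership rule, nothing breaks at three words: ⟦died suspiciously⟧ is still a function that can be applied to ⟦Epstein⟧.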

ZFC, and getting the right answer by accident

There’s something I’m confused about with regard to ZFC. We already know that no first order theory allows you to perfectly capture the semantics of concepts like “the natural numbers” or “finitely many”. The best we can do if restricted to first-order logic (which is where most of modern mathematics tries to stay) is to have theories with surrogates for those concepts.

For instance, there’s this object called ω that stands in for the natural numbers in ZFC, even though there are all these models of ZFC in which ω contains many more objects besides the natural numbers (in fact, uncountably many in some models). And “X is finite” is replaced by “there’s a bijection from X to an element of ω.”

In formalizing these surrogate concepts, we make sure to not include any false statements about the concepts, which gives us a type of soundness of our surrogate concept. I.e., ZFC isn’t going to prove things about ω that are false of the set of natural numbers, because in one model ω is the set of natural numbers.

But it doesn’t give us completeness. We aren’t going to be able to prove ALL the true first-order sentences about the natural numbers, or about the concept of finiteness. (This is of course just a product of the first-order nature of the theories in which we’re defining these surrogates.)

So in each of these cases there will be true statements involving the concept that our theory will be agnostic about. We can look for “really good” surrogates, surrogates which allow us to prove most of the important and interesting true statements about the concept, and only fail in very abstract and unusual settings. The degree to which we can make good surrogates is the degree to which a first-order thinker can make sense of and usefully apply non-first-order concepts. (A first-order thinker being one that has no built-in understanding of non-first-order concepts.)

So one general question is: how good is a given surrogate? And another is, how do we know based on the axioms how good of a surrogate we’ll be getting? This is the thing I’m confused about.

In ZFC, there’s this weird phenomenon of the theory “getting the right answers accidentally.” It’s a little tough to put into words, but here’s an example:

ZFC can prove that |P(X)| > |X| for all X. So for instance ZFC can prove that |P(ω)| > |ω|. Meaning that ZFC can prove that the power set of the naturals is uncountable. But ZFC has countable models! (As does every consistent first-order theory in a countable language.) In those models, the power set of the naturals is NOT uncountable.
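The post doesn’t spell out how |P(X)| > |X| is proven; the usual route is Cantor’s diagonal argument, which shows that no function from X to P(X) can hit every subset. Here is a small finite sanity check of that argument in Python. The helper names are hypothetical, and a finite check is only an illustration of the idea, not a proof of the general theorem.

```python
from itertools import combinations, product


def power_set(X):
    """All subsets of X, as frozensets."""
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def diagonal_set(X, f):
    """D = {x in X : x not in f(x)}, where f maps each x to a subset of X."""
    return frozenset(x for x in X if x not in f[x])


X = {0, 1, 2}
subsets = power_set(X)

total = 0
missed = 0
# Try every possible function f: X -> P(X) (8^3 = 512 of them).
for images in product(subsets, repeat=len(X)):
    f = dict(zip(sorted(X), images))
    total += 1
    if diagonal_set(X, f) not in f.values():
        missed += 1  # the diagonal set is not in the image of f
```

Every one of the 512 functions misses its own diagonal set, so none of them is a surjection onto P(X).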

First order logic is sound, so what’s going on ISN’T that ZFC is proving a sentence that’s false in some of its models. It’s that the sentence is false in that model, if interpreted to be about the actual concept, and true in that model if interpreted to be about the surrogate concept. The surrogate for “P(ω) is uncountable” in ZFC is “there’s no bijection from P(ω) to ω”. And in the countable models of ZFC, the bijection that would prove the equinumerosity of ω and P(ω) is missing! So in those models, it’s actually TRUE that “there’s no bijection from P(ω) to ω”, even though P(ω) and ω really do have the same number of elements.

This is the sort of subtle nonsense you get used to in mathematical logic. Two accidents cancel out: first ZFC makes the mistake of having models where P(ω) is countable, and second it makes the mistake of losing track of the bijection between P(ω) and ω. And as a result, ZFC is able to prove the correct thing.

This seems really weird to me, and it’s an unsettling way for a supposed foundation of mathematics to be getting the right answers. This type of thing happens a lot, and it feels like we keep getting “lucky” in that our unintended interpretations of a sentence interfere with each other and cancel out their problematic features. It makes me wonder why we should be confident that ZFC will continue giving us the right answers (as opposed to being agnostic on important basic questions). And in fact we do have some examples of important and basic questions that ZFC is agnostic on, most dramatically that if |X| < |Y|, then |P(X)| < |P(Y)|!

It’s not that I doubt that ZFC’s surrogates for non-first order concepts end up allowing us to prove an enormous amount of the true facts about these concepts, because they clearly do. But is there some principled reason we can give for WHY the axioms of ZFC lead us to such a robust framework for mathematics in which we can prove many true statements?

One thing that suggests this is the power of the axiom of replacement. It’s just an extremely strong and useful axiom that allows us to prove a whole lot. But that doesn’t seem to help explain the “right-by-accident” phenomenon. So what does?

Who would win in a fight, logic or computation?

Which paradigm is stronger? When I say logic here, I’ll be referring to standard formal theories like ZFC and Peano arithmetic. If one uses an expansive enough definition to include computability theory, then logic encompasses computation and wins by default.

If you read this post, you might immediately jump to the conclusion that logic is stronger. After all, we concluded that in first order Peano arithmetic one can unambiguously define not only the decidable sets, but the sets decidable with halting oracles, and those decidable with halting oracles for halting oracles, and so on forever. We saw that the Busy Beaver numbers, while undecidable, are definable by a Π1 sentence. And Peano arithmetic is a relatively weak theory; much more is definable in a theory like ZFC!

On the other hand, remember that there’s a difference between what’s absolutely definable and what’s definable-relative-to-ℕ. A set of numbers is absolutely definable if there’s a sentence that is true of just those numbers in every model of PA. It’s definable-relative-to-ℕ if there’s a sentence that’s true of just those numbers in the standard model of PA (the natural numbers, ℕ). Definable-relative-to-ℕ is an unfair standard for the strength of PA, given that neither PA nor any other first order system is able to categorically define the natural numbers. On the other hand, absolute definability is the standard that meets the criterion of “unambiguous definition”. With an absolute definition, there’s no room for confusion about which mathematical structure is being discussed.

All those highly undecidable sets we discussed earlier were only definable-relative-to-ℕ. In terms of what PA is able to define absolutely, it’s limited to only the finite sets. Compare this to what sets of numbers can be decided by a Turing machine. Every finite set is decidable, plus many infinite sets, including the natural numbers!

This is the first example of computers winning out over formal logic in their expressive power. While no first order theory (and in general no theory in a logic with a sound and complete proof system) can categorically define the natural numbers, it’s incredibly simple to write a program that defines that set using the language of PA. In regex, it’s just “S*0”. And here’s a Python script that does the job:

def check(s):
    # Accept exactly the numerals of PA: zero or more 'S' followed by a single '0'.
    if s == '':
        return False
    elif s == '0':
        return True
    elif s[0] == 'S':
        return check(s[1:])
    else:
        return False

We don’t even need a Turing machine for this; we can build a simple finite state machine:
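As a rough sketch of such a machine in Python, matching the regex “S*0” from above: the state names and the transition table here are illustrative choices, not taken from the post’s diagram.

```python
# States: "start" (still reading S's), "accept" (just read the final 0),
# and an implicit "dead" state for anything else.
TRANSITIONS = {
    ("start", "S"): "start",
    ("start", "0"): "accept",
}


def accepts(s):
    """Run the finite state machine on s; accept iff we end in "accept"."""
    state = "start"
    for ch in s:
        # Any transition missing from the table leads to the dead state,
        # and the dead state has no way out.
        state = TRANSITIONS.get((state, ch), "dead")
    return state == "accept"
```

The machine needs only a fixed, finite amount of memory (its current state), which is why no Turing machine is required here.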

In general, the sets decidable by computer can be much more complicated and interesting than those absolutely definable by a first-order theory. While first-order theories like ZFC have “surrogates” for infinite sets of numbers (ω and its definable subsets), these surrogate sets include nonstandard elements in some models. As a result, ZFC may fail to prove some statements about these sets which hold true of ℕ but fail for nonstandard models. For instance, it may be that the Collatz conjecture is true of the natural numbers but can’t be proven from ZFC. This would be because ZFC’s surrogate for the set of natural numbers (i.e. ω) includes nonstandard elements in some models, and the Collatz conjecture may fail for some such nonstandard elements.

On top of that, by taking just the first Turing jump into uncomputability, computation with a halting oracle, we are able to prove the consistency of ZFC and PA. Since ZFC is recursively axiomatizable, there’s a program that runs forever listing out theorems, such that every theorem is output at some finite point. We can use this to produce a program that looks to see if “0 = 1” is a theorem of ZFC. If ZFC is consistent, then the program will search forever and never find this theorem. But if ZFC is not consistent, then the program will eventually find the theorem “0 = 1” and will terminate at that point. Now we just apply our halting oracle to this program! If ZFC is consistent, then the halting oracle tells us that the program doesn’t halt, in which case we return “True”. If ZFC is not consistent, then the halting oracle tells us that the program does halt, in which case we return “False”. And just like that, we’ve decided the truth value of Con(ZFC)!

The same type of argument applies to Con(PA), and Con(T) for any T that is recursively axiomatizable. If you’re not a big fan of ZFC but have some other favored system for mathematics, then so long as this system’s axioms can be recognized by a computer, its consistency can be determined by a halting oracle.

Some summary remarks.

On the one hand, there’s a very simple explanation for why logic appears over and over again to be weaker than computation: we only choose to study formal theories that are recursively axiomatizable and have computable inference rules. Of course we can’t do better than a computer using a logical theory that can be built into a computer! If on the other hand, we chose true arithmetic, the set of all first-order sentences of PA that are true of ℕ, to be our de jure theory of mathematics, then the theorems of mathematics could not be recursively enumerated by any Turing machine, or indeed any oracle machine below 0^(ω). So perhaps there’s nothing too mysterious here after all.

On the other hand, there is something very philosophically interesting about truths of mathematics that cannot be proven just using mathematics, but could be proven if it happened that our universe gave us access to an oracle for the halting problem. If a priori reasoning is thought to be captured by a formal system like ZFC, then it’s remarkable that there are facts about the a priori (like Con(ZFC)) that cannot possibly be established by a priori reasoning. Any consistency proof cannot be provided by a consistent system itself, and going outside to a stronger system whose consistency is more in doubt doesn’t help at all. The only possible way to learn the truth value of such statements is contingent; we can learn it if the universe contains some physical manifestation of a halting oracle!

Computing truth values of sentences of arithmetic, or: Math is hard

Previously I talked about the arithmetic hierarchy for sets, and how it relates to the decidability of sets. There’s also a parallel notion of the arithmetic hierarchy for sentences of Peano arithmetic, and it relates to the difficulty of deciding the truth value of those sentences.

Truth value here and everywhere else in this post refers to truth value in the standard model of arithmetic. Truth value in the sense of “being true in all models of PA” is a much simpler matter; PA is recursively axiomatizable and first order logic is sound and complete, so any sentence that’s true in all models of PA can be eventually proven by a program that enumerates all the theorems of PA. So if a sentence is true in all models of PA, then there’s an algorithm that will tell you that in a finite amount of time (though it will run forever on an input that’s false in some models).

Not so for truth in the standard model! As we’ll see, whether a sentence evaluates to true in the standard model of arithmetic turns out to be much more difficult to determine in general. Only for the simplest sentences can you decide their truth value using an ordinary Turing machine. And the set of all sentences is in some sense infinitely uncomputable (you’ll see in a bit in what sense exactly this is).

What we’ll discuss is a way to convert sentences of Peano arithmetic to computer programs. Before diving into that, though, one note of caution is necessary: the arithmetic hierarchy for sentences is sometimes talked about purely syntactically (just by looking at the sentence as a string of symbols) and other times is talked about semantically (by looking at logically equivalent sentences). Here I will be primarily interested in the entirely-syntactic version of the arithmetic hierarchy. If you’ve only been introduced to the semantic version of the hierarchy, what you see here might differ a bit from what you recognize.

Let’s begin!

The simplest types of sentences have no quantifiers at all. For instance…

0 = 0
2 ⋅ 2 < 7
(2 + 2 = 4) → (2 ⋅ 2 = 4)

Each of these sentences can be translated into a program quite easily, since +, ⋅, =, and < are computable. We can translate the → in the third sentence by rewriting A → B as the negated conjunction ¬(A ∧ ¬B):

## (2 + 2 = 4) → (2 ⋅ 2 = 4)
not(2 + 2 == 4 and not 2 * 2 == 4)
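For completeness, here is how all three quantifier-free examples might look as Python expressions. The `implies` helper is a hypothetical convenience, not something from the post; it uses `(not p) or q`, which is equivalent to the negated conjunction used above.

```python
def implies(p, q):
    """Material implication: p -> q is false only when p is true and q is false."""
    return (not p) or q


sentence_1 = (0 == 0)                          # 0 = 0
sentence_2 = (2 * 2 < 7)                       # 2 ⋅ 2 < 7
sentence_3 = implies(2 + 2 == 4, 2 * 2 == 4)   # (2 + 2 = 4) → (2 ⋅ 2 = 4)
```

Each expression is evaluated in a fixed, finite number of steps, so the truth value of any quantifier-free sentence is decidable.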

Slightly less simple-looking are sentences with bounded quantifiers:

∀x < 10 (x + 0 = x)
∃x < 100 (x + x = x)
∀x < 5 ∃y < 7 (x > 1 → x⋅y = 12)
∃x < 5 ∀y < x ∀z < y (y⋅z ≠ x)

In each of these examples, the bounded quantifier could in principle be expanded out, leaving us with a finite quantifier-free sentence. This should suggest to us that adding bounded quantifiers doesn’t actually increase the computational difficulty.

We can translate sentences with bounded quantifiers into programs by converting each bounded quantifier to a for loop. The translation differs slightly depending on whether the quantifier is universal or existential:

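def Aupto(n, phi):
    # Bounded universal quantifier: True iff phi(x) holds for every x < n.
    for x in range(n):
        if not phi(x):
            return False
    return True

def Elessthan(n, phi):
    # Bounded existential quantifier: True iff phi(x) holds for some x < n.
    for x in range(n):
        if phi(x):
            return True
    return False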

Note that the second input needs to be a function, reflecting that it’s a sentence with free variables. Now we can quite easily translate each of the examples, using lambda notation to more conveniently define the necessary functions:

## ∀x<10 (x + 0 = x)
Aupto(10, lambda x: x + 0 == x)

## ∃x<100 (x + x = x)
Elessthan(100, lambda x: x + x == x)

## ∀x<5 ∃y<7 ((x > 1) → (x*y = 12))
Aupto(5, lambda x: Elessthan(7, lambda y: not (x > 1 and x * y != 12)))

## ∃x<5 ∀y<x ∀z<y (y⋅z ≠ x)
Elessthan(5, lambda x: Aupto(x, lambda y: Aupto(y, lambda z: y * z != x)))

Each of these programs, when run, determines whether or not the sentence is true. Hopefully it’s clear how we can translate any sentence with bounded quantifiers into a program of this form. And when we run the program, it will determine the truth value of the sentence in a finite amount of time.
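For what it’s worth, Python’s built-in `all` and `any` already behave like bounded quantifiers over a finite range, so the same four examples can be written as below. The names `forall_below` and `exists_below` are hypothetical stand-ins for the loop-based helpers, used here so the snippet runs on its own.

```python
def forall_below(n, phi):
    """Bounded universal quantifier: phi(x) for every x < n."""
    return all(phi(x) for x in range(n))


def exists_below(n, phi):
    """Bounded existential quantifier: phi(x) for some x < n."""
    return any(phi(x) for x in range(n))


results = [
    # ∀x<10 (x + 0 = x)
    forall_below(10, lambda x: x + 0 == x),
    # ∃x<100 (x + x = x)
    exists_below(100, lambda x: x + x == x),
    # ∀x<5 ∃y<7 ((x > 1) → (x*y = 12))
    forall_below(5, lambda x: exists_below(7, lambda y: not (x > 1 and x * y != 12))),
    # ∃x<5 ∀y<x ∀z<y (y⋅z ≠ x)
    exists_below(5, lambda x: forall_below(x, lambda y: forall_below(y, lambda z: y * z != x))),
]
```

All four entries of `results` come out True. The last one is true vacuously at x = 0, since there is no y below 0 to check.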

So far, we’ve only talked about the simplest kinds of sentences, with no unbounded quantifiers. There are two names that both refer to this class: Π0 and Σ0. So now you know how to write a program that determines the truth value of any Σ0 sentence!

We now move up a level in the hierarchy, by adding unbounded quantifiers. These quantifiers must all appear out front and be the same type of quantifier (all universal or all existential).

Σ1 sentences: ∃x1 ∃x2 … ∃xk Φ(x1, x2, …, xk), where Φ is Π0.
Π1 sentences: ∀x1 ∀x2 … ∀xk Φ(x1, x2, …, xk), where Φ is Σ0.

Some examples of Σ1 sentences:

∃x ∃y (x⋅x = y)
∃x (x⋅x = 5)
∃x ∀y < x (x+y > x⋅y)

And some Π1 sentences:

∀x (x + 0 = x)
∀x ∀y (x + y < 10)
∀x ∃y < 10 (y⋅y + y = x)

We can translate unbounded quantifiers as while loops:

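def A(phi):
    # Unbounded universal quantifier: halts (with False) only on a counterexample.
    x = 0
    while True:
        if not phi(x):
            return False
        x += 1

def E(phi):
    # Unbounded existential quantifier: halts (with True) only on a witness.
    x = 0
    while True:
        if phi(x):
            return True
        x += 1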

There’s a radical change here from the bounded case, which is that these functions are no longer guaranteed to terminate. A(Φ) never returns True, and E(Φ) never returns False. This reflects the nature of unbounded quantifiers. An unbounded universal quantifier is claiming something to be true of all numbers, and thus there are infinitely many cases to be checked. Of course, the moment you find a case that fails, you can return False. But if the universally quantified statement is true of all numbers, then the function will have to keep searching through the numbers forever, hoping to find a counterexample. With an unbounded existential quantifier, all one needs to do is find a single example where the statement is true and then return True. But if there is no such example (i.e. if the statement is always false), then the program will have to search forever.

I encourage you to think about these functions for a few minutes until you’re satisfied that not only do they capture the unbounded universal and existential quantifiers, but that there’s no better way to define them.

Now we can quite easily translate our example sentences as programs:

## ∃x ∃y (x⋅x = y)
E(lambda x: E(lambda y: x * x == y))

## ∃x (x⋅x = 5)
E(lambda x: x * x == 5)

## ∃x ∀y < x (x+y > x⋅y)
E(lambda x: Aupto(x, lambda y: x + y > x * y))

## ∀x (x + 0 = x)
A(lambda x: x + 0 == x)

## ∀x ∀y (x + y < 10)
A(lambda x: A(lambda y: x + y < 10))

## ∀x ∃y < 10 (y⋅y + y = x)
A(lambda x: Elessthan(10, lambda y: y * y + y == x))

The first is a true Σ1 sentence, so it terminates and returns True. The second is a false Σ1 sentence, so it runs forever. See if you can figure out if the third ever halts, and then run the program for yourself to see!

The fourth is a true Π1 sentence, which means that it will never halt (it will keep looking for a counterexample and failing to find one forever). The fifth is a false Π1 sentence, so it does halt at the first moment it finds a value of x and y whose sum is 10. And figure out the sixth for yourself!
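If you’d rather not kill a runaway process by hand, one option is a search with a step budget. This is a hedged sketch, not part of the translation scheme above: `E_budget` and `A_budget` are hypothetical names, the budget of 1000 is arbitrary, and a result of `None` means only “no answer found within the budget”, not that the sentence is false or true.

```python
def E_budget(phi, budget=1000):
    """Like an unbounded ∃, but gives up after `budget` candidates."""
    for x in range(budget):
        if phi(x) is True:
            return True      # witness found: the sentence is true
    return None              # unknown: a real search would keep going


def A_budget(phi, budget=1000):
    """Like an unbounded ∀, but gives up after `budget` candidates."""
    for x in range(budget):
        if phi(x) is False:
            return False     # counterexample found: the sentence is false
    return None              # unknown: a real search would keep going


first = E_budget(lambda x: E_budget(lambda y: x * x == y))    # ∃x ∃y (x⋅x = y)
second = E_budget(lambda x: x * x == 5)                       # ∃x (x⋅x = 5)
fourth = A_budget(lambda x: x + 0 == x)                       # ∀x (x + 0 = x)
fifth = A_budget(lambda x: A_budget(lambda y: x + y < 10))    # ∀x ∀y (x + y < 10)
```

The first and fifth get definite answers (True and False respectively), while the second and fourth come back as `None`. That matches the discussion above: a false Σ1 sentence and a true Π1 sentence are exactly the cases where the unbounded search never halts.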

The next level of the hierarchy involves alternating quantifiers.

Σ2 sentences: ∃x1 ∃x2 … ∃xk Φ(x1, x2, …, xk), where Φ is Π1.
Π2 sentences: ∀x1 ∀x2 … ∀xk Φ(x1, x2, …, xk), where Φ is Σ1.

So now we’re allowed sentences with a block of one type of unbounded quantifier followed by a block of the other type of unbounded quantifier, and ending with a Σ0 sentence. You might guess that the Python functions we’ve defined already are strong enough to handle this case (and indeed, all higher levels of the hierarchy), and you’re right. At least, partially. Try running some examples of Σ2 or Π2 sentences and see what happens. For example:

## ∀x ∃y (x > y)
A(lambda x: E(lambda y: x > y))

It runs forever! If we were to look into the structure of this program, we’d see that A(Φ) only halts if it finds a counterexample to Φ, and E(Φ) only halts if it finds an example of Φ. In other words A(E(Φ)) only halts if A finds out that E(Φ) is false; but E(Φ) never halts if it’s false! The two programs’ goals are diametrically opposed, and as such, brought together like this they never halt on any input.

The same goes for a sentence like ∃x ∀y (x > y): for this program to halt, it would require that ∀y (x > y) is found to be true for some value of x. But ∀y (x > y) will never be found true, because universally quantified sentences can only be found false! This has nothing to do with the (x > y) being quantified over; it’s entirely about the structure of the quantifiers.

No Turing machine can decide the truth values of Σ2 and Π2 sentences. There’s a caveat here, related to the semantic version of the arithmetic hierarchy. It’s often possible to take a Π2 sentence like ∀x ∃y (y + y = x) and convert it to a logically equivalent Π1 sentence like ∀x ∃y≤x (y + y = x). This translation works because y + y = x is only going to be true if y is less than or equal to x. Now we have a false Π1 sentence rather than a false Π2 sentence, and as such we can find a counterexample and halt.
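Here is a small sketch of that rewritten sentence as a program. It bounds y by x inclusive, following the remark that any witness is at most x; `exists_upto` is a hypothetical helper introduced for this example, and `A` is redefined so the snippet runs on its own.

```python
def A(phi):
    """Unbounded universal quantifier: halts only when a counterexample is found."""
    x = 0
    while True:
        if not phi(x):
            return False
        x += 1


def exists_upto(n, phi):
    """Bounded existential quantifier with an inclusive bound: some y <= n has phi(y)."""
    return any(phi(y) for y in range(n + 1))


# ∀x ∃y≤x (y + y = x)
result = A(lambda x: exists_upto(x, lambda y: y + y == x))
```

This halts almost immediately with False, because x = 1 has no y with y + y = 1. The unbounded version, with E in place of `exists_upto`, would get stuck searching for such a y forever.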

We can talk about a sentence’s essential level on the arithmetic hierarchy, which is the lowest level attained by any logically equivalent sentence. It’s important to note here that “logically equivalent sentence” is a cross-model notion: A and B are logically equivalent if and only if they have the same truth values in every model of PA, not just the standard model. The soundness and completeness of first order logic, and the recursive nature of the axioms of PA, tell us that the set of sentences that are logically equivalent to a given sentence of PA is recursively enumerable. So we can generate these sentences by searching for PA proofs of equivalence and keeping track of the lowest level of the arithmetic hierarchy attained so far.

Even when we do this, we will still find sentences that have no logical equivalents below Σ2 or Π2. These sentences are essentially uncomputable; not just uncomputable in virtue of their form, but truly uncomputable in all of their logical equivalents. However, while they are uncomputable, they would become computable if we had a stronger Turing machine. Let’s take another look at the last example:

## ∀x ∃y (x > y)
A(lambda x: E(lambda y: x > y))

Recall that the problem was that A(E(Φ)) only halts if E(Φ) returns False, and E(Φ) can only return True. But if we had a TM equipped with an oracle for the truth value of E(Φ) sentences, then maybe we could evaluate A(E(Φ))!

Let’s think about that for a minute more. What would an oracle for the truth value of Σ1 sentences be like? One thing that would work is if we could run E(Φ) “to infinity” and see if it ever finds an example, and if not, then return False. So perhaps an infinite-time Turing machine would do the trick. Another way would be if we could simply ask whether E(Φ) ever halts! If it does, then ∃y (x > y) must be true, and if not, then it must be false.

So a halting oracle suffices to decide the truth values of Σ1 sentences! Same for Π1 sentences: we just ask if A(Φ) ever halts and return False if so, and True otherwise.

If we run the above program on a Turing machine equipped with a halting oracle, what will we get? Now we can evaluate the inner existential quantifier for any given value of x. So in particular, for x = 0, we will find that ∃y (x > y) is false. We’ve found a counterexample, so our program will terminate and return False.
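A real halting oracle can’t be written in Python, but the shape of the computation can be sketched by passing the oracle in as a parameter. Everything below is an assumption made for illustration: `fake_sigma1_oracle` is NOT an oracle, just a bounded search that happens to give the right answer for the toy sentences used here, and `max_x` is an artificial cutoff so the demo terminates.

```python
def fake_sigma1_oracle(phi, bound=10_000):
    """Stand-in for "does E(phi) halt?".

    NOT a real oracle: it is only correct when a witness, if one exists,
    lies below `bound`.
    """
    return any(phi(y) for y in range(bound))


def A_with_oracle(phi, oracle, max_x=None):
    """Evaluate ∀x ∃y phi(x, y), asking the oracle about each inner ∃y."""
    x = 0
    while max_x is None or x < max_x:
        if not oracle(lambda y: phi(x, y)):
            return False     # the oracle says ∃y phi(x, y) fails: counterexample
        x += 1
    return None              # gave up at the cutoff without finding a counterexample


# ∀x ∃y (x > y): fails at x = 0, so we halt with False.
false_pi2 = A_with_oracle(lambda x, y: x > y, fake_sigma1_oracle)

# ∀x ∃y (y > x): true, so without the cutoff this would run forever.
true_pi2_probe = A_with_oracle(lambda x, y: y > x, fake_sigma1_oracle, max_x=50)
```

The first call returns False at x = 0. The second returns `None` only because of the cutoff, which is the oracle-machine analogue of a true universal sentence never halting.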

On the other hand, if our sentence were true, then we would be faced with the familiar feature of universal quantifiers: we’d run forever looking for a counterexample and never find one. So to determine that this sentence is true, we’d need an oracle for the halting problem for this new more powerful Turing machine!

Here’s a summary of what we have so far:

TM = Ordinary Turing Machine
TM2 = TM + oracle for TM
TM3 = TM + oracle for TM2

The table shows what type of machine suffices to decide the truth value of a sentence, depending on where on the arithmetic hierarchy the sentence falls and whether the sentence is true or false.

We’re now ready to generalize. In general, Σn sentences start with a block of existential quantifiers, and then alternate between blocks of existential and universal quantifiers n – 1 times before ending in a Σ0 sentence. Πn sentences start with a block of universal quantifiers, alternate quantifiers n – 1 times, and then end in a Σ0 sentence. And as you move up the arithmetic hierarchy, it requires more and more powerful halting oracles to decide whether sentences are true:

(TM = ordinary Turing machine, TMn+1 = TM + oracle for TMn)
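The pattern can also be written as a small lookup. This is my own reconstruction from the reasoning above (a successful search needs no extra oracle, while confirming that a search never ends needs one more level), so treat it as a sketch and not a definitive table. machine_needed is a hypothetical helper, TM1 stands for the ordinary TM, and I’m assuming Σ0 sentences can be checked by an ordinary TM whether they are true or false.

```python
def machine_needed(kind, n, truth):
    """Return k such that a TMk suffices to settle a sentence of the given kind.

    kind  -- "Sigma" or "Pi"
    n     -- level on the arithmetic hierarchy (0, 1, 2, ...)
    truth -- whether the sentence is actually true
    """
    if kind not in ("Sigma", "Pi"):
        raise ValueError("kind must be 'Sigma' or 'Pi'")
    if n == 0:
        # ASSUMPTION: Sigma0 sentences are decidable by an ordinary TM.
        return 1
    # A true Sigma sentence is settled by finding a witness, and a false Pi
    # sentence by finding a counterexample. Either way the outer search halts.
    search_succeeds = (kind == "Sigma") == truth
    return n if search_succeeds else n + 1


for n in range(1, 4):
    for kind in ("Sigma", "Pi"):
        for truth in (True, False):
            label = "true" if truth else "false"
            print(f"{kind}{n}, {label}: TM{machine_needed(kind, n, truth)}")
```

For the running example, a false Π2 sentence comes out as TM2 and a true one as TM3, matching the discussion of ∀x ∃y (x > y).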

If we define Σω to be the union of all the Σ classes in the hierarchy, and Πω the union of the Π classes, then deciding the truth value of Σω ∪ Πω (the set of all arithmetic sentences) would require a TMω: a Turing machine with an oracle for TM, TM2, TM3, and so on. Thus the theory of true arithmetic (the set of all first-order sentences that are true of ℕ) is not only undecidable, it remains undecidable even with a TM2, a TM3, or a TMn for any n ∈ ℕ. At every level of the arithmetic hierarchy, we get new sentences that are essentially on that level (not just sentences that are superficially on that level in light of their syntactic form, but sentences which, in their simplest possible logically equivalent form, lie on that level).

This gives some sense of just how hard math is. Just understanding the first-order truths of arithmetic requires an infinity of halting oracles, each more powerful than the last. And that says nothing about the second-order truths of arithmetic! That would require even stronger Turing machines than TMω – Turing machines that have halting oracles for TMω, and then TMs with oracles for that, and so on to unimaginable heights (just how high we must go is not currently known).

A Self-Interpreting Book

A concept: a book that starts by assuming the understanding of the reader and using concepts freely, and as you go on it introduces a simple formal procedure for defining words. As you proceed, more and more words are defined in terms of the basic formal procedure, so that halfway through, half of the words being used are formally defined, and by the end the entire thing is formally defined. Once you’ve read through the whole book, you can start it over and read from the beginning with no problem.

I just finished a set theory textbook that felt kind of like that. It started with the extremely sparse language of ZFC: first-order logic with a single non-logical symbol, ∈. So the alphabet of the formal language consisted of the following symbols: ∈ ( ) ∧ ∨ ¬ → ↔ ∀ ∃ x ′. It could have even started with a sparser formal language if it was optimizing for alphabet economy: ∈ ( ∧ ¬ ∀ x ′ would suffice. As time passed and you got through more of the book, more and more things were defined in terms of the alphabet of ZFC: subsets, ordered pairs, functions from one set to another, transitivity, partial orders, finiteness, natural numbers, order types, induction, recursion, countability, real numbers, and limits. By the last chapter it was breathtaking to read a sentence filled with complex concepts and realize that every single one of these concepts was ultimately grounded in this super simple formal language we started with, with a finitistic, sound, and complete system of rules for how to use each one.

But could it be possible to really fully define ALL the terms used by the end of the book? And even if it were, could the book be written in such a way as to allow an alien that begins understanding nothing of your language to read it and, by the end, understand everything in the book? Even worse, what if the alien not only understands nothing of your language, but starts understanding nothing of the concepts involved? This might be a nonsensical notion: an alien that can read a book and do any level of sophisticated reasoning but doesn’t understand concepts like “and” and “or”.

One way that language is learned is by “pointing”: somebody asks me what a tree is, so I point to some examples of trees and some examples of non-trees, clarifying which is and which is not. It would be helpful if in this book we could point to simple concepts by means of interactive programs. Picture an e-book in which an alien reading the book encounters some exceedingly simple programs that they can experiment with, putting in inputs and seeing what results. For instance, we might have a program that takes as input either 00, 01, 10, or 11, and outputs the ∧ operation applied to the two input digits. Nothing else would be allowed as inputs, so after playing with the program for a little bit you learn everything that it can do.
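Here’s a minimal sketch of what such a pointing program might look like. The name and_program and the choice to take the input as a two-character string are my own assumptions for illustration.

```python
ALLOWED_INPUTS = ("00", "01", "10", "11")


def and_program(bits):
    # Reject everything except the four permitted inputs, so the whole
    # behavior of the program can be learned by trying each one.
    if bits not in ALLOWED_INPUTS:
        raise ValueError("input must be one of 00, 01, 10, 11")
    # Output 1 only when both input digits are 1.
    return "1" if bits == "11" else "0"


# Playing with the program: try every allowed input and look at the output.
for bits in ALLOWED_INPUTS:
    print(bits, "->", and_program(bits))
```

Four experiments exhaust the program, which is the point: the concept being pointed at is fully pinned down by its input-output behavior.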

One feature of such a book would be that it would probably use nothing above first-order logical concepts. The reason is that the semantics of second-order logic cannot be captured by any sound and complete proof system, meaning that there’s no finitistic set of rules one could explain to an alien so that they know how to use the concepts involved correctly. Worse, the set of second-order tautologies is not even recursively enumerable (unlike the set of first-order tautologies, which is undecidable but at least recursively enumerable), so no amount of pointing-to-programs would suffice. First-order ZFC can define a lot, but can it define enough to write a book on what it can define?