Biblical inerrancy

A puzzling phenomenon is the existence of Biblical inerrantists. It seems to me to be impossible to have both (1) carefully read the Bible and (2) come to the conclusion that it’s inerrant. Bart Ehrman talks about a possible explanation for this phenomenon in the order in which people read the Gospels: if you read the Gospels all the way through, one at a time, rather than reading them simultaneously, side by side, then it’s much easier for you to fail to notice all of the discrepancies. And boy are there discrepancies!

Virtually no story that appears in multiple Gospels is identical in the different tellings. That’s not hyperbole; literally from the story of Jesus’s birth all the way to his crucifixion and resurrection, there are unambiguous contradictions to be found the entire way. I don’t think that these contradictions make it rationally impossible to be a Christian, but they certainly do make it rationally impossible to be a Christian of an inerrantist tradition. And more liberal Christians face a serious challenge: how do they justify placing such enormous stock in the wording of a text that is known to be error-ridden?

There are just too many examples of blatant contradictions to go through them all. It’s a remarkable fact that many Christians that have read these stories throughout their lives are completely unaware that they disagree with one another! What I want to do here is just to pick one of the most well-known stories, the empty tomb. As I go through the story as each Gospel tells it, if at any moment you feel skeptical of what I’m saying, just go look at the verses being cited for yourself! The source of all the following quotes is the New International Version (NIV).

I’ve copied the entire “empty tomb” story as it’s told in each of the Gospels and highlighted the differences.

Now, let’s test your reading comprehension! In the story of the empty tomb, how many women come to the tomb, one, two, or three? Depends on who you read! According to Mark, it was three (the two Marys and Salome). According to Matthew, it’s two (the two Marys). According to John it’s just Mary Magdalene. And according to Luke it’s some unspecified number more than 1.

How many entities do they encounter at the tomb, and are they ordinary men or angels? According to Mark, they see one young man already inside the tomb. Luke says that after they enter, two men suddenly appear beside them. Matthew describes a violent earthquake preceding the arrival of an angel from heaven, while still outside the tomb. And John describes two angels inside the tomb (seen by Mary from the outside of the tomb). What’s more, in John and Matthew the woman/women see and talk to Jesus at the tomb! You have to agree that this is a pretty noteworthy thing for Mark and Luke to leave out.

Ok, how about the stone in front of the entrance? When the women/woman arrive(s), is the stone already rolled away from the tomb (as in Mark, Luke, and John), or is it moved later (as in Matthew)?

When the one/two men/angels speak to the woman/women, do they say that Jesus will meet the disciples in Galilee? In Mark and Matthew, yes. But in Luke and John there’s no mention of this! And in fact, in Luke the disciples don’t go to Galilee to meet Jesus at all! Jesus appears to disciples in Jerusalem and tells them to stay there, which they do until the end of the gospel (Luke 24:51-53). He ascends right outside of Jerusalem (Acts 1:9, 12).

When the woman/women leave the tomb, do they describe what they saw to the disciples or not? According to Luke, Matthew, and John, yes. But not according to Mark; in Mark, the women flee and in their fear “say nothing to anyone”!

Did Peter ever visit the tomb? Not in Matthew or Mark, but in Luke and John he does. Is he by himself or with another disciple? Luke says he’s by himself, John describes another disciple with him.

So much for the empty tomb! This level of contradiction is not special to this story. Think about Jesus’s death. When did he die? This is one of the most blatant contradictions in the Bible, because both John and Mark take great pains to explicitly lay out their chronology. According to Mark, Jesus and his disciples have their last supper on the evening of the Passover (Mark 14:12-17), and the following morning he is taken to be crucified (Mark 15:1). In John there is no last supper! John explicitly states in John 19:14 that Jesus is taken away for crucifixion on “the day of Preparation of the Passover”, that is, the day before Passover!

This business about the last supper is actually pretty interesting; in John there is no last supper, but the author still manages to fit in some of the dinner table discussion early in the narrative. In Matthew, Mark, and Luke, it is during the last supper that Jesus says that the bread is his body (Mark 14:22, Matthew 26:26, Luke 22:19) and the wine his blood (Mark 14:24, Matthew 26:28, Luke 22:20). In John, these things are said some 12 chapters before his arrest and crucifixion (John 6:32-58). The context is ENTIRELY different within John; there, it comes up after the miracle where he multiplies the loaves of bread and fish. His disciples talk to him about this miracle, and he responds with the famous line “I am the bread of life”. This upsets some of the Jews around him, and starts an argument about how Jesus can BE the bread, in response to which Jesus doubles down and says “I am the living bread that came down from heaven” and “Very truly I tell you, unless you eat the flesh of the Son of Man and drink his blood, you have no life in you.” Thus the origin of this doctrine. Given the importance that Catholics place on the wording here, it should be disturbing to them that there’s such dramatic disagreement between the Gospels on the context in which he said it!

Let’s now quickly run through some more Biblical contradictions. Who was the father of Joseph, Jesus’s father? According to Matthew 1:16, “Jacob begat Joseph, the husband of Mary”, whereas according to Luke 3:23, Joseph is said to be “the son of Heli”. The genealogies presented in Matthew and Luke are virtually in complete disagreement starting two generations up from Jesus. Apologists will often argue that one of the two is presenting the maternal lineage rather than the paternal line, but this is far from obvious when you look at the wording, which is specifically about Joseph’s father not Mary’s (plus the fact that in both genealogies, the entire rest of the list follows only the paternal line.)

In Mark 5:21-24, Jairus comes to Jesus before his daughter dies and asks him to heal her (“My little daughter is dying. Please come and put your hands on her so that she will be healed and live”), but in Matthew 9:18-20 the daughter has already died by the time Jairus comes to Jesus (“My daughter has just died. But come and put your hand on her, and she will live”).

In Mark 15:37-39, the curtain that separates the holy of holies from the rest of the temple rips in two after Jesus dies. And in Luke 23:45-46, it rips before.

In Matthew 2:1-23, Joseph, Mary, and Jesus flee to Egypt (250 miles away) after Jesus’s birth, where they stay until King Herod dies, after which they resettle in Nazareth. But in Luke 2:1-40, Joseph and the fam do their rites of purification in Jerusalem after the birth and return to Nazareth directly, 41 days after Jesus’s birth.

Also in Luke 2, it is described that Joseph and Mary travel from Galilee to Bethlehem for a census declared by decree by Caesar Augustus to be “taken of the entire Roman world.” The problem with this is that we have good historical records of Caesar Augustus, and no such census took place!

One final one: in Mark 2:25-26, Jesus references an Old Testament passage about David unlawfully eating consecrated bread “in the days of Abiathar the high priest.” There’s a big problem with this: Jesus made a mistake! In the Old Testament passage, Abiathar wasn’t the high priest! The high priest was Ahimelech, whose son Abiathar would much later become high priest (1 Sam 21:1-7). So Christians have a choice to make between either Jesus not knowing his Old Testament or Mark not being an inerrant recording of Jesus’s sayings.

✯✯✯

All these contradictions are begging for an explanation. Is one or more of the authors lying? Not necessarily. Lying implies intention, and it’s worth keeping in mind the timeline of the Bible. Jesus is purported to have lived from 0 to 30 AD. Scholars unanimously agree that the earliest of the Gospels is Mark, and that it was originally written around 70 AD. Next were Matthew and Luke, both around 80-85 AD, and finally came John, around 90-95 AD. That’s a gap of 40 to 65 years from the events that are described! What’s more, the authors of Mark, Matthew, and John were almost certainly not the actual historical Mark, Matthew, and John (for a bunch of reasons I won’t get into now, most obviously that these texts were written in Greek by highly educated individuals and all three of these individuals were uneducated and would not have known Greek). And of course, Luke wasn’t a disciple and never met Jesus personally.

So the first texts that are written are from non-eyewitnesses recording an oral tradition that had started forty to sixty-five years before! In a forty-year game of telephone, nobody needs to have lied in order for the original story to become warped beyond recognition. Anybody that doubts that stories can become so dramatically altered over time need only think about the many Trump supporters that to this day insist that Trump’s inauguration had more attendees than Obama’s, despite LITERAL TIME LAPSE FOOTAGE of the entire thing and photographs all throughout. In one poll, Trump and Clinton voters were handed these two photos, one from Obama’s inauguration and the other from Trump’s:

And guess what? 15% of Trump voters said that the left photo has more people! Suffice it to say, in the presence of emotionally charged topics like religion and politics, human brains start to act funny. Put this in context: this is an event from four years ago that we have video records of. And it’s somehow supposed to be unimaginable that 40 years of word-of-mouth transmission by religious believers made any significant changes to the original stories?

It’s even worse than this. These first texts are not what our modern Gospels are based off of, simply because we don’t have any copies of these first texts! The first copies of the texts that we possess come over A HUNDRED YEARS LATER, meaning that we have more than a hundred years of scribes making copies of copies of copies of the original texts. We know for a fact that these scribes were not perfect copyists, from the thousands of copies of the Gospels we possess, which abound in mistakes as small as spelling errors to as large as entire missing stories or new stories that had never appeared before. I’m sure that you know the story of the adulteress who Jesus defends, saying “he that is without sin among you, let him first cast a stone at her.” Did you know that this story appears in none of our earliest copies of the Gospels? Scholars unanimously agree that this story was added by scribes hundreds of years after the original writing, both because it literally doesn’t appear in copies earlier than that and also because the writing style is different from the rest of John.

So it seems to me like there really is no mystery here once you learn about the actual history of the texts. There are contradictions in the Bible because the Bible is an extremely imperfect copy of a copy of a copy of a … copy of a text written by non-eyewitnesses that heard stories told by people who had heard stories told by… people who had heard stories told by eyewitnesses to the events.

Formal Semantics 1: Historical Prelude and Compositionality

English is really complicated. For a long time, logicians looking at natural languages believed that there could be no formal system detailing their grammar and semantics. They resigned themselves to extremely simple idealized fragments of English, like propositional logic (formalizing “and”, “not”, and “or”) and first-order logic (formalizing “every”, “some”, and “is”).

The slogan of the time was “ordinary language has no logic” (Bertrand Russell and Peter Strawson). Chomsky famously argued that the languages invented by logicians were too artificial and entirely unlike natural languages, and that therefore the methods of logicians simply couldn’t be applied to this more complex realm.

This attitude has changed over time. Perhaps the most important figure in the “logic of natural language” movement is Richard Montague, a student of the giant of logic Alfred Tarski. The first line of his paper English as a Formal Language reads “I reject the contention that an important theoretical difference exists between formal and natural languages”, and he follows this up by more or less single-handedly inventing formal semantics, now a thriving field. Hilariously, Montague apparently saw this work as child’s play, writing:

I (…) sat down one day and proceeded to do something that I previously regarded, and continue to regard, as both rather easy and not very important — that is, to analyze ordinary language.

(This had to hit hard for linguists of his time.)

Alright, enough prologue. In the next few posts I want to describe a naive first pass at formalizing a fairly substantial fragment of English, modeled off of Montague semantics. The key concept throughout will be the notion of compositionality, which I’ll briefly describe now.

Compositionality

Compositionality is all about how to construct the meaning of phrases from their smaller components. Take a sentence like “The cat sat on the mat.” The meaning of this sentence clearly has something to do with the meanings of “the cat” and “sat on the mat”. Similarly, the meaning of “sat on the mat” must have something to do with the meanings of “sat”, “on”, “the”, and “mat”.

The compositionality thesis says that this is all that determines the meaning of “the cat sat on the mat.” In other words, the meaning of any phrase is a function of the meanings of the individual words within it. These meanings are composed together in some way to form the meaning of the sentence as a whole.

The natural question that arises now is, what is the nature of this composition? Take a very simple example: “Epstein died.” According to compositionality, the meaning of “Epstein died” depends only on the meanings of “Epstein” and “died”. That seems pretty reasonable. What about: “Epstein died suspiciously”? How do we compose the meanings of the individual words when there are three?

One proposal is to compose all three simultaneously. That’s possible, but a simpler framework would have us build up the meanings of our sentences iteratively, composing two units of meaning at a time until we’ve generated the entire sentence’s meaning.

Let me now introduce some notation that allows us to say this compactly. If X is some word, phrase, or sentence, we’ll denote the meaning of X as ⟦X⟧. Then the principle of binary compositionality is just that there’s some function F such that ⟦X Y⟧ = F(⟦X⟧, ⟦Y⟧).

There are two major questions that arise at this point.

First, in which order should we compose our units of meaning? Should we combine “Epstein” with “died” first, and then combine that with “suspiciously”? Or should it be “Epstein” and “suspiciously” first, then that with “died”? Or should we combine “Epstein” with the combination of “suspiciously” and “died”?

One might suggest here that the order actually doesn’t matter; no matter what order we combine the meanings in, we should still get the same meaning. The problem with this is that “The Clintons killed Epstein” has a different meaning than “Epstein killed the Clintons.” If order of composition didn’t matter, then we’d expect these to mean the same thing.

Second, how exactly does composing two meanings work? Is there a single rule for composition, or are there multiple different rules that apply in different contexts? It would be most elegant if we could find a single universal rule for generating meanings of complicated phrases from simple ones, but maybe that’s overambitious.

For instance, you might model the meaning of “died” as a set of objects, namely all those objects that died at some moment in the past, and the meaning of “Epstein” as one particular object in the universe. Then we might have our composition rule be the following: ⟦Epstein died⟧ will be a truth value, and it will be True if and only if the object denoted by “Epstein” is within the set of objects denoted by “died”. So in this framework, ⟦X Y⟧ = True if and only if ⟦X⟧ ∈ ⟦Y⟧.

This works nicely for “Epstein died”. But what about “Epstein died suspiciously”? Now we have two compositions to do, and the order of composition will matter. The problem is that no matter how we compose things, it seems not to work. Suppose that we combine “died” and “suspiciously” first, then combine “Epstein” with that. Using our model, ⟦died suspiciously⟧ will be True if and only if ⟦died⟧ ∈ ⟦suspiciously⟧, which is already a little bit weird. But even worse, ⟦Epstein died suspiciously⟧ will be True if and only if ⟦Epstein⟧ ∈ ⟦died suspiciously⟧. But what would it mean for the object denoted by “Epstein” to be an element of a truth value? It looks like in this framework, most three-word sentences end up becoming vacuously false.
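Here’s a minimal Python sketch of the membership rule and the way it breaks down. The little universe of objects and the set I’ve assigned to “suspiciously” are made-up placeholders, and treating “membership in a truth value” as simply False is my own modeling choice, not part of any standard framework.

```python
# Toy model: a name denotes an object, a verb denotes a set of objects.
# Every denotation below is a made-up placeholder for illustration.
EPSTEIN = "epstein"
DIED = frozenset({"epstein", "caesar"})
SUSPICIOUSLY = frozenset()  # unclear what set an adverb should denote


def compose(x, y):
    """Membership rule: [[X Y]] is True iff [[X]] is an element of [[Y]]."""
    try:
        return x in y
    except TypeError:
        # y is not a container (for example, it is a truth value), so
        # membership is undefined; this sketch treats that case as False.
        return False


epstein_died = compose(EPSTEIN, DIED)              # True
died_suspiciously = compose(DIED, SUSPICIOUSLY)    # a truth value, not a set
three_words = compose(EPSTEIN, died_suspiciously)  # False, whatever the facts
```

The last line is the problem in miniature: once ⟦died suspiciously⟧ has collapsed to a truth value, there is nothing left for ⟦Epstein⟧ to be an element of.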

Anyway, the last two paragraphs only show us that one particular attempt to formalize composition fails to be universal. It doesn’t show that it’s impossible in general. In fact, we’ll end up doing pretty well with a small set of composition rules centered around function application. The idea can be very simply phrased as: ⟦X Y⟧ = ⟦X⟧(⟦Y⟧). And in particular, the meaning of “Epstein died suspiciously” will be ⟦suspiciously⟧(⟦died⟧)(⟦Epstein⟧). And that’s enough warm-up! Next we’ll explore this idea further and dive into our Montague-style system.
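As a rough preview of the function-application idea, here’s a Python sketch in which ⟦died⟧ is a function from objects to truth values and ⟦suspiciously⟧ is a function from such functions to new ones. The facts in `DEATHS` and `SUSPICIOUS_EVENTS` are hypothetical placeholders, and modeling an adverb as a lookup on (predicate, object) pairs is a crude simplification of mine rather than the Montague-style treatment itself.

```python
# Hypothetical facts of a toy world.
DEATHS = {"epstein", "caesar"}


def died(x):
    """[[died]]: maps an object to a truth value."""
    return x in DEATHS


# Pairs (predicate, object) for which the event counts as suspicious.
SUSPICIOUS_EVENTS = {(died, "epstein")}


def suspiciously(predicate):
    """[[suspiciously]]: maps a predicate to a more restrictive predicate."""
    def modified(x):
        return predicate(x) and (predicate, x) in SUSPICIOUS_EVENTS
    return modified


# [[Epstein died suspiciously]] = [[suspiciously]]([[died]])([[Epstein]])
sentence_meaning = suspiciously(died)("epstein")  # True in this toy world
```

Every composition step here is the same operation, applying one meaning to another, and the result of `suspiciously(died)` is again a predicate, so nothing ever has to be an element of a truth value.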

Two more short and sweet proofs of propositional compactness

Proof 1 (countable language)

Let S be a countable set of sentences in a propositional language with atomic sentences p0, p1, p2, ….  Assume that S is finitely satisfiable. We want to build a truth assignment V that satisfies S.

We’ll assign truth values to the atomic sentences one at a time. Vn will be the partial truth assignment after assigning the first n truth values. If W agrees with Vn on its domain, then call W an extension of Vn.

We’ll now prove by induction that for each n, every finite subset of S is satisfied by some extension of Vn.

First of all, V0 is just the empty set, and every function is an extension of the empty set. So the hypothesis follows trivially from the finite satisfiability of S. 

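Now, assume that every finite subset of S is satisfied by some extension of Vn. We’ll show that pn can be assigned a truth value so that the same holds of Vn+1; that is, Vn+1 can be chosen to be either Vn ⋃ {(pn, T)} or Vn ⋃ {(pn, F)}.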

Suppose not. Then both of the following must be true:
(1) Some finite subset S0 of S is not satisfied by any extension of Vn ⋃ {(pn, T)}
(2) Some finite subset S1 of S is not satisfied by any extension of Vn ⋃ {(pn, F)}

S0 ⋃ S1 is not satisfied by any extension of Vn ⋃ {(pn, T)}, or any extension of Vn ⋃ {(pn, F)}. So it’s not satisfied by any extension of Vn, contradicting the inductive hypothesis since S0 ⋃ S1 is a finite set. This proves that every finite subset of S is satisfied by some extension of Vn, for each n.

Define V = ⋃ {Vn : n ∈ ω}. We now show that V satisfies S. Consider any sentence φ ∈ S. φ contains a finite number of atomic sentences, so there’s some n large enough that Vn assigns truth values to all sentence letters in φ. {φ} is a finite subset of S, so some extension of Vn satisfies it. Every extension of Vn agrees on the assignments to the atomic sentences that appear in φ. So since some extension of Vn makes φ true, all extensions of Vn must make φ true. In particular, V makes φ true.
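To get a feel for the construction, here’s a small Python sketch of the “one atom at a time” step, run on a finite example. It rests on a big simplifying assumption: S is finite, so “every finite subset of S is satisfied by some extension” collapses to “S is satisfied by some extension”, which can be checked by brute force. The representation of sentences as Python functions and the names `satisfiable_extension` and `build_assignment` are mine, purely for illustration.

```python
from itertools import product


def satisfiable_extension(partial, atoms, sentences):
    """True if some total assignment extending `partial` satisfies all sentences."""
    free = [a for a in atoms if a not in partial]
    for values in product([True, False], repeat=len(free)):
        total = {**partial, **dict(zip(free, values))}
        if all(sentence(total) for sentence in sentences):
            return True
    return False


def build_assignment(atoms, sentences):
    """Assign atoms one at a time, always keeping some satisfying extension alive."""
    v = {}
    for p in atoms:
        for value in (True, False):
            candidate = {**v, p: value}
            if satisfiable_extension(candidate, atoms, sentences):
                v = candidate
                break
        else:
            # Neither value works, so the invariant already failed before p.
            raise ValueError("the set of sentences is not satisfiable")
    return v


atoms = ["p0", "p1", "p2"]
S = [
    lambda v: v["p0"] or v["p1"],   # p0 or p1
    lambda v: not v["p0"],          # not p0
    lambda v: v["p1"] == v["p2"],   # p1 iff p2
]
V = build_assignment(atoms, S)
# V == {"p0": False, "p1": True, "p2": True}
```

The inner loop is the inductive step of the proof: the argument above guarantees that at least one of the two candidate values for pn preserves the invariant, so the `else` branch is never reached when S is satisfiable.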

Proof 2 (any language)

Suppose S is a finitely satisfiable set of sentences, and let L be the set of atomic sentences.

A partial function g: L → {T, F} is called good if each finite subset of S is satisfied by some extension of g.

The good partial functions are a poset under ⊆, and since S is finitely satisfiable, ∅ is good. Furthermore, the union of any chain of good functions is a good function. Thus every chain has an upper bound, namely its union.

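By Zorn’s lemma, there’s a maximal good function g. Since g is maximal, the domain of g is all of L: if some atomic sentence p were missing from the domain, then by the same argument as in the inductive step of Proof 1, one of g ⋃ {(p, T)} and g ⋃ {(p, F)} would be good, contradicting maximality.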

We now show that g satisfies every element of S.
… Suppose φ ∈ S.
… Since g is good, it has an extension satisfying φ.
… But g already has all of L as its domain, so g satisfies φ.

So S is satisfiable!

The Theory of Knots

Knots

A knot is a closed loop in three dimensional space that doesn’t intersect itself. Some examples:

Knot diagrams are what you just saw: two dimensional representations of the three dimensional objects. It should be obvious that any one knot has many different diagrams. This raises an immediate question: given two knot diagrams, do they represent the same knot?

We need to be clear on what we mean by “the same knot”, of course. The intuition here is fairly obvious: you can rotate, stretch, and wiggle any section of a knot without fundamentally changing it.

What you CANNOT do is cut and reconnect the knot or pass one section of a knot through another of its sections.

Looking back at our starting five example knots, ask yourself if any of them represent the same knot:

It might not be too obvious, but 2 and 4 represent the same knot, and so do 1 and 5! These equivalences (especially the equivalence between 1 and 5) should hopefully give a flavor for the difficulty of the general problem of determining equivalence between knots. While it’s sometimes possible to answer this by mere examination, it’s more often impossible. Even in cases where it seems very intuitively obvious (like, for instance, that the trefoil is not just an unknot in disguise), it’s hard to come up with a way to PROVE this fact.

When faced by a hard problem, we should begin thinking of the structure of a general solution. What we’d really like is a recipe for an algorithm that takes in two knot diagrams and decides whether they represent the same knot. (For economy of space and ease of reading, I’ll proceed to refer to “knot diagrams that represent equivalent knots” as “equivalent knot diagrams”.)

Here’s a naive first pass at a solution: Take either of the two knot diagrams you’re given. Start wiggling it around in all the allowed ways. At each step, check if you’ve obtained the other knot diagram. If you ever do, return True. Otherwise return False.

Maybe you see the problem here. In fact there are at least two major problems. (I said it was a naive first pass.)

The first big problem is in the “otherwise return False”. At what point in the algorithm do you actually reach this “otherwise” clause? Assuming that the two knot diagrams really do represent different knots, then you can keep manipulating the starting knot diagram in more and more complicated ways forever. At each stage you might get new knot diagrams, and at no stage are you justified in giving up and returning False. In other words, even if this algorithm worked as I described it, it could only ever decide equivalence of two equivalent knot diagrams. If handed two inequivalent knot diagrams, it would run forever. Thus this is really an algorithm for semi-deciding knot-diagram equivalence, not deciding it.

The second big problem is in the “start wiggling it around in all the allowed ways.” How would we describe the set of allowed transformations of a knot diagram to a computer? In particular, could we simplify down the allowed transformations to a finite set that would ensure that we really had done ALL possible manipulations? The answer to this is obviously no. Consider the following transformation:

Each transformation like this produces a new knot diagram, simply by wiggling one side in an arbitrary way. But how many possible wiggles are there? Clearly an uncountable infinity: you can do arbitrarily small disruptions centered on any point along the knot, leaving the rest undisturbed, and get a distinct knot diagram each time.

The “uncountable infinity” aspect of this second problem can actually be quickly resolved by sharpening our original question. Rather than regarding a knot diagram as any two-dimensional projection of a knot, we can think about a knot diagram as defined by its various crossings. For instance, we can label each uninterrupted “arc” in a knot diagram, as well as each crossing point.

We now produce an alternative description of knot diagrams, obtained by describing how each arc interacts with each crossing point.

Crucially, our new description is not affected by the uncountably many small wiggles that do not make any substantial changes to the knot diagram. I.e., they do not affect the way that the various arcs cross under or above each other. We call this “equivalence of knot diagrams up to planar isotopy”, where planar isotopy is meant to refer to changes in a knot that don’t affect the structure of the crossings.

That’s one helpful step towards fixing up our algorithm. However, we are still in need of a finite set of transformations that we are sure generates ALL possible transformations on knot diagrams. I encourage you to think about this problem for a moment, and come up with a set of transformations to knot diagrams that you think is sufficient to produce every equivalent knot diagram.

(…)

(…)

I’ll leave a little space for you to try to come up with an answer for yourself. It’s not an easy exercise, but might be fun to test your knot-theoretic intuitions on.

(…)

(…)

Reidemeister Moves

It turns out that a set of six very simple transformations suffices for this task, discovered by Kurt Reidemeister. There are three categories of transformations:

If two knot diagrams are equivalent, then there is a sequence of Reidemeister moves that transforms one to the other (up to planar isotopy).

It’s pretty fun to try to construct such sequences for oneself, and in doing so you’ll develop an intuition for why repeated Reidemeister moves suffice to generate all transformations. Try the earlier example of the two diagrams for the trefoil! I can do it in ten moves, can you do better?

So, this is great news! Our naive first attempt at building an algorithm to determine equivalence can be implemented (though it still only semi-decides equivalence rather than deciding it). Choose one of the diagrams, and run through all finite sequences of Reidemeister moves in any order. At each stage we check if we get the other diagram by application of these moves, and if so return True.
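The shape of this search can be sketched in Python as a breadth-first enumeration of move sequences. To be clear about what’s assumed: the `moves` function is a hypothetical stand-in for “apply every possible Reidemeister move to this diagram”, which I am not implementing here, and the toy example at the bottom uses integers instead of knot diagrams. The `max_depth` cap is also my addition, there only so the demo terminates.

```python
from collections import deque


def semi_decide(start, target, moves, max_depth=None):
    """Breadth-first search over all finite sequences of moves.

    Returns True as soon as `target` is reached. It never returns False:
    with max_depth=None and infinitely many reachable states it simply
    keeps searching, and if the frontier runs dry it returns None,
    meaning "no verdict within the part of the space explored".
    """
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        state, depth = frontier.popleft()
        if state == target:
            return True
        if max_depth is not None and depth >= max_depth:
            continue
        for nxt in moves(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return None


# Toy stand-in for diagrams and moves: integers, with moves "add 2" and "subtract 2".
def step(n):
    return [n + 2, n - 2]


found = semi_decide(0, 6, step, max_depth=5)     # True, reached in three moves
no_verdict = semi_decide(0, 5, step, max_depth=5)  # None, 5 is never reached
```

In the toy example, 5 really is unreachable from 0, but the search has no way of knowing that on its own. That gap is exactly what the rest of this post is about.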

So with the help of the Reidemeister moves, we’ve successfully semidecided the equivalence problem. But it turns out that we can do better. Reidemeister moves not only allow us to determine that two diagrams are equivalent, but they lead us to a powerful tool to determine that two diagrams are inequivalent. The key concept here is invariants.

Invariants

Imagine that we found some quantity Q, calculated from a knot diagram, such that no matter how we alter the knot diagram, that quantity never changes. Now if we’re given two knot diagrams D1 and D2, and find that Q(D1) ≠ Q(D2), then we have conclusively demonstrated the inequivalence of the two diagrams! Why? Because if there were some way to transform D1 into D2, then since every transformation leaves Q unchanged, Q(D2) would have to equal Q(D1)!

The hard part, of course, is finding invariants. But this is much easier when we have the help of Reidemeister moves! To show that Q(D) is an invariant quantity, all we need to do is show that the value of Q doesn’t change when we apply any Reidemeister move to the diagram D. This basically amounts to checking six simple cases, which is not too bad.

So, let’s go to our first invariant quantity: three-colorability.

3-Colorability

Three-colorability is quite simple: a knot is three-colorable if it’s possible to color each arc such that at each crossing either all three arcs are the same color, or all three are different.

Notice that any knot can be trivially 3-colored by just letting every arc be the same color; to avoid this we add the requirement that for a 3-coloring to be valid, it must use at least two distinct colors.

It’s a fun exercise to show that this quantity is invariant using Reidemeister moves. The simplest case is the twist. Essentially all we need to do is show that if we apply a twist to a 3-colorable diagram, we can still 3-color the new diagram, AND that there’s a way to “untwist” a 3-colorable diagram so that we still retain 3-colorability. The proof is actually quite trivial; note that the crossing at any twist must be monochrome, since two of the involved arcs are actually the same.

Try convincing yourself that 3-colorability is preserved under the other Reidemeister moves!

So, 3-colorability is an invariant. There’s a very nice consequence of this. The trefoil is three-colorable, and the unknot (having only one arc) is not.

So this proves that the trefoil cannot be unknotted! This was probably intuitively obvious from the outset, but we can now rest comfortably in the certainty of a mathematical proof.
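If you’d rather let a computer do the coloring, here’s a brute-force Python sketch. The encoding is my own assumption: a diagram is a number of arcs plus a list of crossings, each crossing being a triple of arc indices (over, under, under), and `TREFOIL` is my transcription of the standard three-crossing trefoil diagram, in which every crossing involves all three arcs.

```python
from itertools import product


def is_three_colorable(num_arcs, crossings):
    """Try every coloring of the arcs with colors 0, 1, 2."""
    for coloring in product(range(3), repeat=num_arcs):
        if len(set(coloring)) < 2:
            continue  # a valid coloring must use at least two colors
        # At each crossing the arcs must be all one color or all different,
        # i.e. the set of colors seen there has size 1 or 3.
        if all(len({coloring[arc] for arc in crossing}) in (1, 3)
               for crossing in crossings):
            return True
    return False


# Standard trefoil diagram: arcs 0, 1, 2; arc i passes over crossing i.
TREFOIL = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]
# Unknot drawn as a plain circle: one arc, no crossings.
UNKNOT = []

trefoil_ok = is_three_colorable(3, TREFOIL)  # True
unknot_ok = is_three_colorable(1, UNKNOT)    # False
```

This checks 3^n colorings for n arcs, so it’s only sensible for small diagrams, but it’s enough to confirm the trefoil-versus-unknot claim.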

p-Colorability

A fun exercise: try to 3-color the figure-eight knot, and see why it’s not possible.

This shows that the figure-eight knot is not equivalent to the trefoil. But what about the unknot? Neither the unknot nor the figure-eight knot is 3-colorable, and yet they are not equivalent! So 3-colorability, while quite cool, isn’t the whole story. However, there’s a nice generalization of 3-colorability to p-colorability for any prime p.

Think about colors as numbers. So our set of three colors from the last section will just be {0, 1, 2}. A coloring is then a function mapping arcs to {0, 1, 2}. This gives us an algebraic way of expressing the 3-colorability property:

If x, y, and z are the colors of the three arcs connecting at a crossing, with z being the arc that passes above the crossing, then 2z – x – y = 0 (mod 3). It’s easily checkable that this is equivalent to the condition that each crossing is either monochrome, or else the three arcs are different colors.
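The “easily checkable” part can be handed to a few lines of Python, which just run through all 27 ways of coloring the three arcs at a crossing. The function names are mine.

```python
from itertools import product


def algebraic(x, y, z):
    """The condition 2z - x - y = 0 (mod 3), with z the over-arc."""
    return (2 * z - x - y) % 3 == 0


def combinatorial(x, y, z):
    """All three arcs the same color, or all three different."""
    return len({x, y, z}) in (1, 3)


conditions_agree = all(
    algebraic(x, y, z) == combinatorial(x, y, z)
    for x, y, z in product(range(3), repeat=3)
)
# conditions_agree is True
```

The reason behind it: if all three colors are equal, 2z – x – y is literally 0; if all three differ, x + y + z = 0 + 1 + 2 = 3, so 2z – x – y = 3z – (x + y + z) is a multiple of 3; and if exactly two agree, the expression reduces to plus or minus a nonzero difference of colors (possibly doubled), which is never 0 mod 3.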

We generalize this to any prime p as follows:

Our set of colors will be {0, 1, …, p – 1}. If for each crossing, 2z – x – y = 0 mod p, and at least two colors are used, then the knot is p-colorable.

For each prime p, p-colorability is an invariant, meaning that we have now gone from one invariant quantity to a countable infinity of invariants! We also can now distinguish the figure-eight knot from the unknot: the first is 5-colorable, while the unknot is not.

Five-coloring the figure-eight – check for yourself that the algebraic condition (2z – x – y = 0 mod 5) is satisfied
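Here’s a brute-force check of the algebraic condition in Python. As before, the diagram encoding is my own assumption: crossings are triples of arc indices (over, under, under), and `FIGURE_EIGHT` is my transcription of the standard four-crossing alternating diagram, read off from the closure of the braid σ1 σ2⁻¹ σ1 σ2⁻¹. Your arc labels will differ if you label the picture differently.

```python
from itertools import product


def is_p_colorable(p, num_arcs, crossings):
    """Search for a coloring with 2z - x - y = 0 (mod p) at every crossing."""
    for coloring in product(range(p), repeat=num_arcs):
        if len(set(coloring)) < 2:
            continue  # at least two colors must be used
        if all((2 * coloring[z] - coloring[x] - coloring[y]) % p == 0
               for (z, x, y) in crossings):
            return True
    return False


# Four arcs 0..3, four crossings, each written as (over, under, under).
FIGURE_EIGHT = [(0, 1, 3), (2, 0, 1), (3, 2, 0), (1, 3, 2)]

five_colorable = is_p_colorable(5, 4, FIGURE_EIGHT)   # True
three_colorable = is_p_colorable(3, 4, FIGURE_EIGHT)  # False
unknot_five = is_p_colorable(5, 1, [])                # False
```

One coloring it finds assigns colors 0, 1, 3, 4 to arcs 0, 1, 2, 3; you can plug those into the four crossings by hand to see each expression come out to a multiple of 5.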

An exercise: try 5-coloring the trefoil. Is it possible?

However, this is not the end of the story. For one thing, it’s not so easy to determine if an arbitrary knot is p-colorable. It’s doable, sure, by trying all the possible colorings, but we want a quicker method. This will be addressed in the next section.

Much more importantly, we’ll see in the next section that there are knot diagrams that have the same colorability profile (i.e. D1 is p-colorable iff D2 is p-colorable, for each p), and yet represent different knots. So we need an even more powerful invariant to discriminate these from one another.

Knot Determinant

Take any knot and label its arcs and crossings with natural numbers.

Then construct a matrix M, where the (i, j) component corresponds to arc i and crossing j. If arc i isn’t involved in crossing j, then Mij = 0. If arc i is involved in crossing j, and passes over it, then Mij = 2. And if arc i passes under crossing j, then Mij = -1.

Now, cross out any row and column of your choice, and take the determinant of the remaining matrix. Take its absolute value, and you have the knot determinant.
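The recipe translates fairly directly into Python. This is a sketch under my own encoding assumptions: arcs and crossings are numbered from 0, each crossing is a triple (over, under, under) of arc indices, the diagram has as many arcs as crossings, and the two example diagrams are my transcriptions of the standard trefoil and figure-eight diagrams. I always cross out the last row and column, and use cofactor expansion for the determinant since the matrices are tiny.

```python
def coloring_matrix(num_arcs, crossings):
    """M[i][j] is 2 if arc i passes over crossing j, -1 if under, else 0."""
    m = [[0] * len(crossings) for _ in range(num_arcs)]
    for j, (over, under_a, under_b) in enumerate(crossings):
        # Accumulating (rather than assigning) also copes with an arc
        # that meets the same crossing more than once, as in a twist.
        m[over][j] += 2
        m[under_a][j] -= 1
        m[under_b][j] -= 1
    return m


def det(m):
    """Integer determinant by cofactor expansion along the first row."""
    if not m:
        return 1
    if len(m) == 1:
        return m[0][0]
    total = 0
    for col, entry in enumerate(m[0]):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * entry * det(minor)
    return total


def knot_determinant(num_arcs, crossings):
    m = coloring_matrix(num_arcs, crossings)
    reduced = [row[:-1] for row in m[:-1]]  # cross out last row and column
    return abs(det(reduced))


TREFOIL = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]
FIGURE_EIGHT = [(0, 1, 3), (2, 0, 1), (3, 2, 0), (1, 3, 2)]

trefoil_det = knot_determinant(3, TREFOIL)            # 3
figure_eight_det = knot_determinant(4, FIGURE_EIGHT)  # 5
```

For the trefoil the reduced matrix is [[2, -1], [-1, 2]], whose determinant is 4 – 1 = 3.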

You know what I’m going to say next: the knot determinant is an invariant! Not only is it an invariant, but it also fully determines the colorability profile of a knot! The rule is that if a knot has knot determinant N, then for any prime p > 2, the knot is p-colorable if and only if p divides N.

For instance, say we calculate that a knot has determinant 33. We immediately conclude that the knot is 3-colorable, 11-colorable, and not p-colorable for any other prime.

This is pretty incredible. We now have a simple deterministic procedure for looking at a knot diagram and producing a number that gives us the entire colorability profile of the knot. But wait, there’s more!

Suppose we have two knot diagrams, one with determinant 15, and the other with determinant 45. These two indicate the same colorability profile: 3-colorable, 5-colorable, and nothing else. But since the determinants are different, the two knots are not equivalent! In other words, any two knots with different determinants that have all the same prime factors (but with possibly different powers of those prime factors) will have the exact same colorability profile, and will nonetheless be proven inequivalent by the knot determinant!
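To make the comparison concrete, here is a small sketch that reads the colorability profile off a determinant by listing its odd prime factors. The function name is made up for this illustration, and the rule it applies is the one stated above (p-colorable iff p divides the determinant, for primes p > 2).

```python
def colorable_primes(determinant):
    """Odd primes dividing the determinant, i.e. the colorability profile."""
    if determinant < 1:
        raise ValueError("expected a positive determinant")
    n = determinant
    while n % 2 == 0:  # the rule only concerns primes p > 2
        n //= 2
    primes, d = [], 3
    while d * d <= n:
        if n % d == 0:
            primes.append(d)
            while n % d == 0:
                n //= d
        d += 2
    if n > 1:
        primes.append(n)
    return primes

print(colorable_primes(33))  # [3, 11]
print(colorable_primes(15) == colorable_primes(45), 15 != 45)  # True True
```

The last line is the whole point: same profile, different determinants, so different knots.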

We’ve now reached the end of this dive into knot theory, but only because we’ve begun to near the end of my knowledge of the subject. By no means does the theory of knots end here. The knot determinant, amazing though it is, still does not fully pin down the different knots up to equivalence. There are distinct knots with the same determinant. If we were to dive further, we’d next look into a polynomial invariant, the Alexander polynomial. Going deeper, there are hyperbolic invariants, involving the complement of a knot in hyperbolic space. Then there are links, which are structures consisting of multiple knots linked together in some way, which have their own theory and periodic table. And we can even consider knotting spheres in four-dimensional space, or in general n-dimensional spheres in (n+2)-dimensional space! Plus there’s the operation of the knot sum, which we can use to combine distinct knots and get a new one. This operation is commutative and associative, and gives rise to the notion of prime and composite knots. And the coolness just keeps going on from here!

If nothing else, I hope I’ve convinced you that the subject of knots is of great mathematical interest and inspired you to look into it yourself. I’ll leave you with the first few rows of a (necessarily infinite) periodic table of knots, showing the prime knots up to seven crossings:

A Compact Proof of the Compactness Theorem

I’ve written up a proof of the compactness theorem before, but I recently looked it over and think that the proof can be expressed more, heh heh, compactly, than before.

So, what is the compactness theorem? It is the statement that if a set of statements is finitely satisfiable (each of its finite subsets has a model), then it’s satisfiable. The converse of this (a satisfiable set of statements is also finitely satisfiable) is trivial: a model of a set of sentences is also a model of every subset of that set. The following proof will be for propositional logic, but can be easily extended to first order logic.

The short version

Suppose that a set of sentences A is finitely satisfiable. We’ll extend A to a larger set B by giving A an “opinion” on every sentence. We can build up this extension in a series of stages: B0 is just A. Now, take any sentence a. If B0 ⋃ {a} is finitely satisfiable, define B1 to be B0 ⋃ {a}. Otherwise, define B1 to be B0 ⋃ {¬a}. Either way, B1 will be finitely satisfiable, because B0 cannot be inconsistent with both a and ¬a. When we’ve gone through every sentence, we take the union of all these extensions to form B. B is finitely satisfiable, since every finite subset of B is also a finite subset of one of the extensions that make it up. Now, we define a truth assignment V that assigns True to each propositional variable p if and only if p is in B. V satisfies B (which you can show by induction on sentences), and since A is a subset of B, V also satisfies A. So A is satisfiable.

The long(er) version

Suppose that a set of sentences A is finitely satisfiable. If a set of sentences A’ is finitely satisfiable, then for any sentence a, at least one of A’ ⋃ {a} and A’ ⋃ {¬a} is finitely satisfiable. (If neither were, then we’d have two finite subsets of A’, one that entails ¬a and the other that entails a, and the union of these would be an unsatisfiable finite subset of A’.) So given any well-ordering of the sentences of the language, we can extend A one sentence at a time, maintaining finite satisfiability at each step. The union of all these extensions, call it B, is still finitely satisfiable, because any finite subset of B is also a finite subset of one of the extensions. B is also complete: for any sentence b either b ∈ B or ¬b ∈ B.

Now, define the truth assignment V as follows: for any atomic sentence b, V(b) = True iff b ∈ B. V satisfies B, which we prove by induction on sentences:

  • If b is an atomic sentence, then V(b) = True iff b ∈ B, by construction.
  • If b = ¬c for some c (for which V(c) = True iff c ∈ B), then V(b) = True iff V(c) = False iff c ∉ B iff b ∈ B.
  • If b = c ∧ d for some c and d that satisfy the induction hypothesis, then V(b) = V(c ∧ d) = True iff V(c) = V(d) = True iff c ∈ B and d ∈ B. If both c and d are in B, then ¬(c ∧ d) can’t be in B by finite satisfiability, so c ∧ d ∈ B by completeness of B. And if c ∧ d ∈ B, then neither ¬c nor ¬d can be in B by finite satisfiability. So c and d are both in B by completeness of B.

This proof by induction covers the connectives ¬ and ∧, with which you can express all other connectives, so this shows that V satisfies B. And since B is a superset of A, V satisfies A as well. This shows that any finitely satisfiable set of sentences is also satisfiable. Note that we used the axiom of choice implicitly with the instruction to well-order the language. As the language can be any size, this is equivalent to the well-ordering principle, which is equivalent in ZF to the axiom of choice. The same sort of assumption arises in the proof of the completeness theorems for first-order and propositional logics. If you reject choice, then you should also be skeptical that propositional logic and first-order logics are complete!
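The extension step of the proof can be mimicked in a toy setting. The sketch below is only an illustration under strong assumptions: everything is finite (so finite satisfiability is just satisfiability, checked by brute force), sentences are encoded as nested tuples, and the enumeration is a short list containing every atomic sentence. It shows the "give A an opinion on each sentence" loop and the truth assignment read off from the result, not the theorem itself.

```python
from itertools import product

# Sentences are nested tuples: ("var", "p"), ("not", f), ("and", f, g).

def variables(f):
    if f[0] == "var":
        return {f[1]}
    return set().union(*(variables(g) for g in f[1:]))

def evaluate(f, v):
    if f[0] == "var":
        return v[f[1]]
    if f[0] == "not":
        return not evaluate(f[1], v)
    return evaluate(f[1], v) and evaluate(f[2], v)  # the "and" case

def satisfiable(sentences):
    """Brute force over all truth assignments (fine for a toy example)."""
    names = sorted(set().union(*(variables(f) for f in sentences)))
    return any(
        all(evaluate(f, dict(zip(names, values))) for f in sentences)
        for values in product([False, True], repeat=len(names))
    )

def extend(a, enumeration):
    """Add each sentence if that keeps the set satisfiable, else add its negation."""
    b = list(a)
    for s in enumeration:
        b.append(s if satisfiable(b + [s]) else ("not", s))
    return b

def read_off_valuation(b):
    """V(p) = True iff the atomic sentence p is in B."""
    names = set().union(*(variables(f) for f in b))
    return {name: ("var", name) in b for name in names}

p, q = ("var", "p"), ("var", "q")
A = [("not", ("and", p, ("not", q))), ("not", q)]  # "p implies q" and "not q"
B = extend(A, [p, q])
V = read_off_valuation(B)
print(V, all(evaluate(f, V) for f in B))
```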

ZFC, and getting the right answer by accident

There’s something I’m confused about with regard to ZFC. We already know that no first order theory allows you to perfectly capture the semantics of concepts like the natural numbers or finitely many. The best we can do if restricted to first-order logic (which is where most of modern mathematics tries to stay) is to have theories with surrogates for those concepts.

For instance, there’s this object called ω that stands in for the natural numbers in ZFC, even though there are all these models of ZFC in which ω contains many more objects besides the natural numbers (in fact, uncountably many in some models). And “X is finite” is replaced by “there’s a bijection from X to an element of ω.”

In formalizing these surrogate concepts, we make sure to not include any false statements about the concepts, which gives us a type of soundness of our surrogate concept. I.e., ZFC isn’t going to prove things about ω that are false of the set of natural numbers, because in one model ω is the set of natural numbers.

But it doesn’t give us completeness. We aren’t going to be able to prove ALL the true first-order sentences about the natural numbers, or about the concept of finiteness. (This is of course just a product of the first-order nature of the theories in which we’re defining these surrogates.)

So in each of these cases there will be true statements involving the concept that our theory will be agnostic about. We can look for “really good” surrogates, surrogates which allow us to prove most of the important and interesting true statements about the concept, and only fail in very abstract and unusual settings. The degree to which we can make good surrogates is the degree to which a first-order thinker can make sense of and usefully apply non-first-order concepts. (A first-order thinker being one that has no built-in understanding of non-first-order concepts.)

So one general question is: how good is a given surrogate? And another is, how do we know based on the axioms how good of a surrogate we’ll be getting? This is the thing I’m confused about.

In ZFC, there’s this weird phenomenon of the theory “getting the right answers accidentally.” It’s a little tough to put into words, but here’s an example:

ZFC can prove that |P(X)| > |X| for all X. So for instance ZFC can prove that |P(ω)| > |ω|. Meaning that ZFC can prove that the power set of the naturals is uncountable. But ZFC has countable models! (As does every consistent first-order theory in a countable language.) In those models, the power set of the naturals is NOT uncountable.

First order logic is sound, so what’s going on ISN’T that ZFC is proving a sentence that’s false in some of its models. It’s that the sentence is false in that model, if interpreted to be about the actual concept, and true in that model if interpreted to be about the surrogate concept. The surrogate for “P(ω) is uncountable” in ZFC is “there’s no bijection from P(ω) to ω”. And in the countable models of ZFC, the bijection that would prove the equinumerosity of ω and P(ω) is missing! So in those models, it’s actually TRUE that “there’s no bijection from P(ω) to ω”, even though P(ω) and ω really do have the same number of elements.

This is the sort of subtle nonsense you get used to in mathematical logic. Two accidents cancel out: first ZFC makes the mistake of having models where P(ω) is countable, and second it makes the mistake of losing track of the bijection between P(ω) and ω. And as a result, ZFC is able to prove the correct thing.

This seems really weird to me, and it’s an unsettling way for a supposed foundation of mathematics to be getting the right answers. This type of thing happens a lot, and it feels like we keep getting “lucky” in that our unintended interpretations of a sentence interfere with each other and cancel out their problematic features. It makes me wonder why we should be confident that ZFC will continue giving us the right answers (as opposed to being agnostic on important basic questions). And in fact we do have some examples of important and basic questions that ZFC is agnostic on, most dramatically that if |X| < |Y|, then |P(X)| < |P(Y)|!

It’s not that I doubt that ZFC’s surrogates for non-first order concepts end up allowing us to prove an enormous amount of the true facts about these concepts, because they clearly do. But is there some principled reason we can give for WHY the axioms of ZFC lead us to such a robust framework for mathematics in which we can prove many true statements?

One thing that suggests this is the power of the axiom of replacement. It’s just an extremely strong and useful axiom that allows us to prove a whole lot. But that doesn’t seem to help explain the “right-by-accident” phenomenon. So what does?

Polish Notation and Garden-Path Sentences

Polish notation is a mathematical notation system that allows you to eliminate parentheses without ambiguity. It’s called “Polish” because the name of its Polish creator, Jan Łukasiewicz, was too difficult for people to pronounce.

A motivating example: Suppose somebody says “p and q implies r”. There are two possible interpretations of this: “(p and q) implies r” and “p and (q implies r)”. The usual way to disambiguate these two is to simply add in parentheses like I just did. Another way is to set an order-of-operations convention, like that “and” always applies before “implies”. This is what’s used in basic algebra, and what allows you to write 2 + 2 ⋅ 4 without any fear that you’ll be interpreted as meaning (2 + 2) ⋅ 4.

Łukasiewicz’s method was to make all binary connectives into prefixes. So “A and B” becomes “and A B”, “P implies Q” becomes “implies P Q”, and so on. In this system, “(p and q) implies r” translates to “implies and p q r”, and “p and (q implies r)” translates to “and p implies q r”. Since the two expressions are different, there’s no need for parentheses! And in general, no ambiguity ever arises from lack of parentheses when using Polish notation.

If this is your first time encountering Polish notation, your first reaction might be to groan and develop a slight headache. But there’s something delightfully puzzling about reading an expression written in Polish notation and trying to understand what it means. Try figuring out what this means: “implies and not p or q s r”. Algebra can be written in Polish notation just as easily, removing the need for both parentheses AND order-of-operations. “2 + 2 = 4” becomes “+ 2 2 = 4”, or even better, “= + 2 2 4”.
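If you’d rather let a computer do the deciphering, here is a minimal sketch of a Polish-notation reader in Python. The list of connectives and the function names are assumptions of this sketch; anything not in the connective lists is treated as an atom. It works because each connective knows how many arguments it takes, so reading left to right never requires a parenthesis.

```python
BINARY = {"and", "or", "implies", "+", "="}
UNARY = {"not"}

def parse(tokens):
    """Read one expression off the front of tokens; return (tree, leftover tokens)."""
    if not tokens:
        raise ValueError("ran out of tokens")
    head, rest = tokens[0], tokens[1:]
    if head in BINARY:
        left, rest = parse(rest)
        right, rest = parse(rest)
        return (head, left, right), rest
    if head in UNARY:
        arg, rest = parse(rest)
        return (head, arg), rest
    return head, rest  # an atom such as "p" or "2"

def to_infix(tree):
    """Write the tree back out with explicit parentheses."""
    if isinstance(tree, str):
        return tree
    if len(tree) == 2:
        return f"({tree[0]} {to_infix(tree[1])})"
    return f"({to_infix(tree[1])} {tree[0]} {to_infix(tree[2])})"

def polish_to_infix(text):
    tree, rest = parse(text.split())
    if rest:
        raise ValueError(f"leftover tokens: {rest}")
    return to_infix(tree)

print(polish_to_infix("implies and p q r"))  # ((p and q) implies r)
print(polish_to_infix("and p implies q r"))  # (p and (q implies r))
print(polish_to_infix("= + 2 2 4"))          # ((2 + 2) = 4)
```

Feeding it the puzzle sentence above gives the answer, so try it by hand first.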

Other binary connectives can be treated in Polish notation as well, creating gems like: “If and you’re happy you know it clap your hands!” “When life is what happens you’re busy making plans.” “And keep calm carry on.” “Therefore I think, I am.” (This last one is by of the author the Meditations). Hopefully you agree with me that these sentences have a nice ring to them, though the meaning is somewhat obscured.

But putting connectives in front of the two things being connected is not unheard of. Some examples in English: “ever since”, “because”, “notwithstanding”, “whenever”, “when”, “until”, “unless”. Each of these connects two sentences, and yet can appear in front of both. When we hear a sentence like “Whenever he cheated on a test the professor caught him”, we don’t have any trouble parsing it. (And presumably you had no trouble parsing that entire last sentence either!) One could imagine growing up in a society where “and” and “or” are treated the same way as “ever since” and “until”, and perhaps in this society Polish notation would seem much more natural!

Slightly related to sentential connectives are verbs, which connect subjects and objects. English places its verbs squarely between the subject and the object, as do Chinese, French, and Spanish. But in fact the most common ordering is subject-object-verb! 45% of languages, including Hindi, Japanese, Korean, Latin, and Ancient Greek, use this pattern. So for instance, instead of “She burned her hand”, one would say “she her hand burned”. This is potentially weirder to English-speakers than Polish notation; it’s reverse Polish notation!

9% of languages use Polish notation for verbs (the verb-subject-object pattern). These include Biblical Hebrew, Arabic, Irish, and Filipino. In such languages, it would be grammatical to say “Loves she him” but not “She loves him”. (3% of languages are VOS – loves him she – 1% are OVS – him loves she – and just a handful are OSV – him she loves).

Let’s return to English. Binary prepositions like “until” appear out front, but they also swap the order of the two things that they connect. For instance, “Until you do your homework, you cannot go outside” is the same as “You cannot go outside until you do your homework”, not “You do your homework until you cannot go outside”, which sounds a bit more sinister.

I came up with some examples of sentences with several layers of these binary prepositions to see if the same type of confusion as we get when examining Polish notation for “and” or “implies” sets in here, and oh boy does it.

Single connective
Since when the Americans dropped the bomb the war ended, some claimed it was justified.

Two connectives, unlayered
Since when the Americans dropped the bomb the war ended, when some claimed it was an atrocity others argued it was justified.

Still pretty readable, no? Now let’s layer the connectives.

One layer
Whenever he was late she would weep.
She would weep whenever he was late.

Two layers
Since whenever he was late she would weep, he hurried over.
He hurried over, since she would weep whenever he was late.

Three layers
Because since whenever he was late she would weep he hurried over, he left his wallet at home.
He left his wallet at home, because he hurried over since she would weep whenever he was late.

Four layers
Because because since whenever he was late she would weep he hurried over he left his wallet at home, when he was pulled over the officer didn’t give him a ticket.
The officer didn’t give him a ticket when he was pulled over, because he left his wallet at home because he hurried over since she would weep whenever he was late.

Five layers
When he heard because because since whenever he was late she would weep he hurried over he left his wallet at home, when he was pulled over the officer didn’t give the man a ticket, the mayor was outraged at the lawlessness.
The mayor was outraged at the lawlessness when he heard the officer didn’t give the man a ticket when he was pulled over because he left his wallet at home because he hurried over since she would weep whenever he was late.

Read that last one out loud to a friend and see if they believe you that it makes grammatical sense! With each new layer, things become more and more… Polish. That is, indecipherable. (Incidentally, Polish is SVO just like English). Part of the problem is that when we have multiple layers like this, phrases that are semantically connected can become more and more distant in the sentence. It reminds me of my favorite garden-path sentence pattern:

The mouse the cat the dog chased ate was digested.
(The mouse that (the cat that the dog chased) ate) was digested.
The mouse (that the cat (that the dog chased) ate) was digested.

The phrases that are meant to be connected, like “the mouse” and “was digested” are sandwiched on either side of the sentence, and can be made arbitrarily distant by the addition of more “that the X verbed” clauses.
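The pattern is mechanical enough to generate. Here is a small sketch, with a made-up function name and a made-up convention for the inputs: nouns are listed outermost first, and verbs[k] is what nouns[k+1] did to nouns[k]. All the nouns pile up at the front, and the verbs come back out in reverse order.

```python
def center_embed(nouns, verbs, predicate):
    """Build 'N0 N1 ... Nk Vk ... V1 predicate.' from nested relative clauses."""
    if len(verbs) != len(nouns) - 1:
        raise ValueError("need exactly one verb per embedded noun")
    words = " ".join(nouns + verbs[::-1] + [predicate])
    return words[0].upper() + words[1:] + "."

print(center_embed(["the mouse", "the cat", "the dog"],
                   ["ate", "chased"],
                   "was digested"))
# The mouse the cat the dog chased ate was digested.
```

Each extra noun and verb pushes “the mouse” and “was digested” one clause further apart.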

Does anybody know of any languages where “and” comes before the two conjuncts? What about “or”? English does this with “if”, so it might not be too much of a stretch.

Defining uncountability

Most of the shocking results in mathematical logic have to do with how limited we are in terms of producing good deductive systems for mathematical concepts. (Good here means sound, complete, finitary, and computable.) No logic with a good deductive system can categorically define the natural numbers. Adding a “for finitely many” quantifier to first-order logic makes it impossible to have a good deductive system. Ditto for a “for all subsets of” quantifier. And so on.

Once you’ve accustomed yourself to the extreme logical limitations imposed on us, it comes as a bit of a shock when you realize that we still have the ability to describe ANY complicated mathematical concept. What I learned recently was that you can extend first-order logic by adding a “for uncountably many” quantifier, and still have a good deductive system for the logic! It says something that I was so impressed by this fairly moderate ability to distinguish between the countable and the uncountable. But it certainly does feel puzzling that while you can distinguish between the countable and the uncountable, you can’t distinguish between the finite and the infinite, which seems like a much more basic and fundamental division!

What’s even cooler is that the deductive system for this extended logic is extremely simple. It amounts to taking the axioms and inference rules of first-order logic, and then tacking on four additional axiom schemas. We’ll write “for uncountably many x, Φ(x)” as Ux Φ(x).

  1. ∀y∀z¬Ux (x = y ∨ x = z)
  2. ∀x (Φ(x) → Ψ(x)) → (Ux Φ(x) → Ux Ψ(x))
  3. Ux Φ(x) ↔ Uy Φ(y), where y isn’t free in Φ(x)
  4. Uy∃x Φ(x,y) → (∃xUy Φ(x,y) ∨ Ux∃y Φ(x,y))

Each of these axiom schemas is conceptually quite easy to make sense of. The first says that for any two elements y and z, there’s not uncountably many things that are each equal to one or the other. In other words, uncountable is bigger than two.

The second says that if the set of things that satisfy Φ is a subset of the set of things that satisfy Ψ, and uncountably many things satisfy Φ, then uncountably many things satisfy Ψ. In other words, if a set has an uncountable subset, then it must be uncountable itself.

The third is obvious enough that it doesn’t need explanation. The fourth, on the other hand, does. I find it helpful to think of Φ(x,y) as saying “x points at y.” Then Uy∃x Φ(x,y) becomes “uncountably many things are pointed at”, ∃xUy Φ(x,y) becomes “some person points at uncountably many things”, and Ux∃y Φ(x,y) becomes “there are uncountably many people pointing.”

So the fourth axiom just says that if uncountably many things are pointed at, then either somebody is pointing at uncountably many things, or there are uncountably many people pointing at things. Otherwise you’d only have countably many people pointing at countably many things each, and that’s not enough to get uncountably many things pointed at! So this fourth axiom can really be understood as saying that the union of countably many countable sets is countable.

✯✯✯

There’s a caveat to the above claims. This is that first-order logic extended by the “for uncountably many” quantifier only has a good deductive system IF the language is restricted to only allow countably many constant symbols. Here’s the reasoning:

First of all, if a logic has a sound, complete and finitary deductive system, then it is compact. Discussed here.

Second, if a logic L is compact and a set of sentences Σ in L has a countably infinite model, then Σ also has an uncountable model. Why? Simply append to L an uncountable set A of constant symbols, and add to Σ an uncountable set of sentences declaring that each of these constants refers to a distinct object. Now we have an uncountable model of Σ’s extension by compactness (the countably infinite model is a model of every finite subset of Σ’s extension), and this uncountable model must also satisfy Σ. (This is the only place where we make use of uncountably many constant symbols.)

Third, if a logic L is compact, and a set of sentences Σ in L has models of arbitrarily large finite size, then Σ has infinite models. The proof of this is virtually identical to the last paragraph; add infinitely many constant symbols and declare them all to refer to distinct objects. Every finite subset of this extension of Σ has a model, so by compactness the extension of Σ also has a model. This model is infinite by construction, and as a model of an extension of Σ must also be a model of Σ.

That sets up all the background model theory we need. Now, suppose that FOL+U had a good deductive system that applied even with uncountably many constant symbols. By soundness, completeness, and finitariness, FOL+U would be compact. Consider now the FOL+U theory T consisting of the single axiom ¬Ux (x = x). T cannot have uncountable models, by assumption that the quantifier Ux is really capturing the semantics of “for uncountably many x”.

But if T can’t have uncountable models, then it can’t have countably infinite models either, by the second result above! So T doesn’t have any infinite models. And then by the third result, T cannot have models of arbitrarily large finite size! This means that by asserting “there aren’t uncountably many objects”, we actually end up ruling out all countably infinite models, as well as all models larger than some finite size! This is inconsistent with the claim that the U quantifier is correctly capturing the semantics of “uncountably many”; after all, the negation of “for uncountably many” should be “for countably many”, not “for small enough finite”!

Summing up the argument:

  1. If a logic has a sound, complete, and finitary deductive system, then it is compact.
  2. If a logic L is compact, and a set of sentences Σ in L has a countably infinite model, then Σ has an uncountable model.
  3. If a logic L is compact, and a set of sentences Σ in L has models of arbitrarily large finite size, then Σ has infinite models.
  4. FOL+U = First order logic + “for uncountably many” has a sound, complete, and finitary deductive system.
  5. The FOL+U theory T = {¬Ux (x = x)} doesn’t have uncountable models.
  6. By 1 and 4, FOL+U is compact.
  7. By 2 and 6, if a set of sentences Σ in FOL+U has a countably infinite model, then Σ has an uncountable model.
  8. Suppose T had countably infinite models. Then by 7, it would also have to have uncountable models, contradicting 5. So T doesn’t have countably infinite models.
  9. Suppose T had arbitrarily large finite models. Then by 3 and 6, T would have infinite models, contradicting 5 and 8. So T doesn’t have arbitrarily large finite models.
  10. Rephrasing 9, there’s some finite n such that T has no models of cardinality larger than n.

✯✯✯

What’s great about the logic FOL+U is that it allows you to talk about the property of uncountability, so long as your theories are all only countably large (which isn’t too bad a restriction). So for instance, consider the FOL+U theory consisting of the axioms of first-order Peano arithmetic plus one additional axiom: ¬Ux (x = x). This theory successfully rules out all uncountable models! This is something that you couldn’t do in ordinary first order logic, since it obeys the upward Lowenheim-Skolem property (a model of one infinite cardinality implies models of every larger infinite cardinality). The order type of all countable nonstandard models of PA is known (each is ω + ℤ⋅ℚ, a natural number line followed by countably many integer lines arranged like the rationals), and much of the difficulty of the model theory of PA involves the uncountable models. So this extension of PA is a big improvement over first-order PA!

Remember that ZFC has those pesky countable models, in which “the set of all subsets” turns out to be missing some subsets? We can get rid of all of those countable models with the FOL+U theory consisting of the axioms of ZFC + the additional axiom Ux (x = x). I’m pretty sure that this does not fix all the issues with correctly pinning down the notion of the power set — that would be too good to be true — but it is certainly an improvement!

Who would win in a fight, logic or computation?

Which paradigm is stronger? When I say logic here, I’ll be referring to standard formal theories like ZFC and Peano arithmetic. If one uses an expansive enough definition to include computability theory, then logic encompasses computation and wins by default.

If you read this post, you might immediately jump to the conclusion that logic is stronger. After all, we concluded that in first order Peano arithmetic one can unambiguously define not only the decidable sets, but the sets decidable with halting oracles, and those decidable with halting oracles for halting oracles, and so on forever. We saw that the Busy Beaver numbers, while undecidable, are definable by a Π1 sentence. And Peano arithmetic is a relatively weak theory; much more is definable in a theory like ZFC!

On the other hand, remember that there’s a difference between what’s absolutely definable and what’s definable-relative-to-ℕ. A set of numbers is absolutely definable if there’s a sentence that is true of just those numbers in every model of PA. It’s definable-relative-to-ℕ if there’s a sentence that’s true of just those numbers in the standard model of PA (the natural numbers, ℕ). Definable-relative-to-ℕ is an unfair standard for the strength of PA, given that neither PA nor any other first order system is able to categorically define the natural numbers. On the other hand, absolute definability is the standard that meets the criterion of “unambiguous definition”. With an absolute definition, there’s no room for confusion about which mathematical structure is being discussed.

All those highly undecidable sets we discussed earlier were only definable-relative-to-ℕ. In terms of what PA is able to define absolutely, it’s limited to only the finite sets. Compare this to what sets of numbers can be decided by a Turing machine. Every finite set is decidable, plus many infinite sets, including the natural numbers!

This is the first example of computers winning out over formal logic in their expressive power. While no first order theory (and in general no theory in a logic with a sound and complete proof system) can categorically define the natural numbers, it’s incredibly simple to write a program that defines that set using the language of PA. In regex, it’s just “S*0”. And here’s a Python script that does the job:

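def check(s):
    """Return True iff s is a numeral: zero or more 'S' followed by a single '0'."""
    if s == '':
        # The empty string is not a numeral.
        return False
    elif s == '0':
        return True
    elif s[0] == 'S':
        # Strip one successor symbol and check what's left.
        return check(s[1:])
    else:
        return False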

We don’t even need a Turing machine for this; we can build a simple finite state machine:
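One way to write such a machine down, as a minimal sketch: a transition table with two named states plus an implicit dead state. The state names are made up for this illustration. The check at the end compares the machine against the regex “S*0” using Python’s re.fullmatch.

```python
import re

# Transitions not listed here lead to the "dead" state, which can never accept.
TRANSITIONS = {
    ("reading", "S"): "reading",   # keep consuming successor symbols
    ("reading", "0"): "accepted",  # a single 0 ends the numeral
}

def accepts(s):
    state = "reading"
    for ch in s:
        state = TRANSITIONS.get((state, ch), "dead")
    return state == "accepted"

samples = ["", "0", "S0", "SSS0", "S", "0S", "S00", "s0"]
for s in samples:
    # The machine and the regex should agree on every sample.
    assert accepts(s) == bool(re.fullmatch(r"S*0", s))
print([s for s in samples if accepts(s)])  # ['0', 'S0', 'SSS0']
```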

In general, the sets decidable by computer can be much more complicated and interesting than those absolutely definable by a first-order theory. While first-order theories like ZFC have “surrogates” for infinite sets of numbers (ω and its definable subsets), these surrogate sets include nonstandard elements in some models. As a result, ZFC may fail to prove some statements about these sets which hold true of ℕ but fail for nonstandard models. For instance, it may be that the Collatz conjecture is true of the natural numbers but can’t be proven from ZFC. This would be because ZFC’s surrogate for the set of natural numbers (i.e. ω) includes nonstandard elements in some models, and the Collatz conjecture may fail for some such nonstandard elements.

On top of that, by taking just the first Turing jump into uncomputability, computation with a halting oracle, we are able to prove the consistency of ZFC and PA. Since ZFC is recursively axiomatizable, there’s a program that runs forever listing out theorems, such that every theorem is output at some finite point. We can use this to produce a program that looks to see if “0 = 1” is a theorem of ZFC. If ZFC is consistent, then the program will search forever and never find this theorem. But if ZFC is not consistent, then the program will eventually find the theorem “0 = 1” and will terminate at that point. Now we just apply our halting oracle to this program! If ZFC is consistent, then the halting oracle tells us that the program doesn’t halt, in which case we return “True”. If ZFC is not consistent, then the halting oracle tells us that the program does halt, in which case we return “False”. And just like that, we’ve decided the truth value of Con(ZFC)!
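The shape of that argument can be sketched in Python, with one big caveat: no real halting oracle exists, so the oracle is passed in as a parameter, and the one used below is a clearly labeled stand-in that just gives up after a fixed number of steps. The toy "theories" are likewise invented for the illustration; they are infinite lists of strings, not ZFC.

```python
from itertools import count

def search_for_contradiction(enumerate_theorems, contradiction="0 = 1"):
    """A program that halts iff the contradiction is ever enumerated.
    Written as a generator so that a caller can run it one step at a time."""
    for theorem in enumerate_theorems():
        if theorem == contradiction:
            return  # found it: halt
        yield       # one more theorem checked, keep going

def is_consistent(enumerate_theorems, halts):
    """Decide Con(T), given a (hypothetical) halting oracle `halts`."""
    return not halts(search_for_contradiction(enumerate_theorems))

def bounded_stand_in(program, max_steps=10_000):
    """NOT a halting oracle: it only reports whether the program
    halts within max_steps. Good enough for the toy theories below."""
    for _ in range(max_steps):
        try:
            next(program)
        except StopIteration:
            return True
    return False

def toy_consistent():
    for n in count():
        yield f"{n} = {n}"

def toy_inconsistent():
    for n in count():
        yield "0 = 1" if n == 500 else f"{n} = {n}"

print(is_consistent(toy_consistent, bounded_stand_in))    # True
print(is_consistent(toy_inconsistent, bounded_stand_in))  # False
```

With a genuine oracle in place of bounded_stand_in and a genuine theorem enumerator for ZFC, is_consistent would be exactly the procedure described above.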

The same type of argument applies to Con(PA), and Con(T) for any T that is recursively axiomatizable. If you’re not a big fan of ZFC but have some other favored system for mathematics, then so long as this system’s axioms can be recognized by a computer, its consistency can be determined by a halting oracle.

Some summary remarks.

On the one hand, there’s a very simple explanation for why logic appears over and over again to be weaker than computation: we only choose to study formal theories that are recursively axiomatizable and have computable inference rules. Of course we can’t do better than a computer using a logical theory that can be built into a computer! If, on the other hand, we chose true arithmetic, the set of all first-order sentences of PA that are true of ℕ, to be our de jure theory of mathematics, then the theorems of mathematics could not be recursively enumerated by any Turing machine, or indeed any oracle machine below 0^(ω). So perhaps there’s nothing too mysterious here after all.

On the other hand, there is something very philosophically interesting about truths of mathematics that cannot be proven just using mathematics, but could be proven if it happened that our universe gave us access to an oracle for the halting problem. If a priori reasoning is thought to be captured by a formal system like ZFC, then it’s remarkable that there are facts about the a priori (like Con(ZFC)) that cannot possibly be established by a priori reasoning. A consistent system of this kind cannot provide its own consistency proof, and going outside to a stronger system whose consistency is more in doubt doesn’t help at all. The only possible way to learn the truth value of such statements is contingent; we can learn it if the universe contains some physical manifestation of a halting oracle!

Computing truth values of sentences of arithmetic, or: Math is hard

Previously I talked about the arithmetic hierarchy for sets, and how it relates to the decidability of sets. There’s also a parallel notion of the arithmetic hierarchy for sentences of Peano arithmetic, and it relates to the difficulty of deciding the truth value of those sentences.

Truth value here and everywhere else in this post refers to truth value in the standard model of arithmetic. Truth value in the sense of “being true in all models of PA” is a much simpler matter; PA is recursively axiomatizable and first order logic is sound and complete, so any sentence that’s true in all models of PA can be eventually proven by a program that enumerates all the theorems of PA. So if a sentence is true in all models of PA, then there’s an algorithm that will tell you that in a finite amount of time (though it will run forever on an input that’s false in some models).

Not so for truth in the standard model! As we’ll see, whether a sentence evaluates to true in the standard model of arithmetic turns out to be much more difficult to determine in general. Only for the simplest sentences can you decide their truth value using an ordinary Turing machine. And the set of all sentences is in some sense infinitely uncomputable (you’ll see in a bit in what sense exactly this is).

What we’ll discuss is a way to convert sentences of Peano arithmetic to computer programs. Before diving into that, though, one note of caution is necessary: the arithmetic hierarchy for sentences is sometimes talked about purely syntactically (just by looking at the sentence as a string of symbols) and other times is talked about semantically (by looking at logically equivalent sentences). Here I will be primarily interested in the entirely-syntactic version of the arithmetic hierarchy. If you’ve only been introduced to the semantic version of the hierarchy, what you see here might differ a bit from what you recognize.

Let’s begin!

The simplest types of sentences have no quantifiers at all. For instance…

0 = 0
2 ⋅ 2 < 7
(2 + 2 = 4) → (2 ⋅ 2 = 4)

Each of these sentences can be translated into a program quite easily, since +, ⋅, =, and < are computable. We can translate the → in the third sentence by rewriting A → B as the negated conjunction ¬(A ∧ ¬B):

## (2 + 2 = 4) → (2 ⋅ 2 = 4)
not(2 + 2 == 4 and not 2 * 2 == 4)
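As a quick check, here is a small sketch that evaluates all three quantifier-free examples. The helper name `implies` is my own and not part of the translation above; it just packages the negated-conjunction trick.

```python
def implies(p: bool, q: bool) -> bool:
    # A → B is equivalent to ¬(A ∧ ¬B)
    return not (p and not q)


s1 = (0 == 0)                          # 0 = 0
s2 = (2 * 2 < 7)                       # 2 ⋅ 2 < 7
s3 = implies(2 + 2 == 4, 2 * 2 == 4)   # (2 + 2 = 4) → (2 ⋅ 2 = 4)

print(s1, s2, s3)  # True True True
```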

Slightly less simple-looking are sentences with bounded quantifiers:

∀x < 10 (x + 0 = x)
∃x < 100 (x + x = x)
∀x < 5 ∃y < 7 (x > 1 → x⋅y = 12)
∃x < 5 ∀y < x ∀z < y (y⋅z ≠ x)

In each of these examples, the bounded quantifier could in principle be expanded out, leaving us with a finite quantifier-free sentence. This should suggest to us that adding bounded quantifiers doesn’t actually increase the computational difficulty.
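One way to see the "expanding out" concretely, assuming we read a bounded ∀ as a finite conjunction and a bounded ∃ as a finite disjunction, is to use Python's built-in `all` and `any` over a `range`. This is only a sketch of the idea; the variable names are mine.

```python
# ∀x < 10 (x + 0 = x): a conjunction of 10 quantifier-free sentences
s1 = all(x + 0 == x for x in range(10))

# ∃x < 100 (x + x = x): a disjunction of 100 quantifier-free sentences
s2 = any(x + x == x for x in range(100))

# ∀x < 5 ∃y < 7 (x > 1 → x⋅y = 12), with A → B written as (not A) or B
s3 = all(any((not x > 1) or x * y == 12 for y in range(7)) for x in range(5))

# ∃x < 5 ∀y < x ∀z < y (y⋅z ≠ x)
s4 = any(
    all(all(y * z != x for z in range(y)) for y in range(x))
    for x in range(5)
)

print(s1, s2, s3, s4)  # True True True True
```

The last sentence is true already at x = 0, where the inner bounded universal quantifier ranges over nothing and so holds vacuously.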

We can translate sentences with bounded quantifiers into programs by converting each bounded quantifier to a for loop. The translation works slightly differently depending on whether the quantifier is universal or existential:

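def Aupto(n, phi):
    # Bounded universal quantifier: phi must hold for every x < n.
    for x in range(n):
        if not phi(x):
            return False  # counterexample found
    return True

def Elessthan(n, phi):
    # Bounded existential quantifier: phi must hold for some x < n.
    for x in range(n):
        if phi(x):
            return True  # witness found
    return False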

Note that the second input needs to be a function, reflecting that it’s a sentence with a free variable. Now we can quite easily translate each of the examples, using lambda notation to define the necessary functions more conveniently:

## ∀x<10 (x + 0 = x)
Aupto(10, lambda x: x + 0 == x)

## ∃x<100 (x + x = x)
Elessthan(100, lambda x: x + x == x)

## ∀x<5 ∃y<7 ((x > 1) → (x*y = 12))
Aupto(5, lambda x: Elessthan(7, lambda y: not (x > 1 and x * y != 12)))

## ∃x<5 ∀y<x ∀z<y (y⋅z ≠ x)
Elessthan(5, lambda x: Aupto(x, lambda y: Aupto(y, lambda z: y * z != x)))

Each of these programs, when run, determines whether or not the sentence is true. Hopefully it’s clear how we can translate any sentence with bounded quantifiers into a program of this form. And when we run the program, it will determine the truth value of the sentence in a finite amount of time.

So far, we’ve only talked about the simplest kinds of sentences, with no unbounded quantifiers. There are two names that both refer to this class: Π0 and Σ0. So now you know how to write a program that determines the truth value of any Σ0 sentence!

We now move up a level in the hierarchy, by adding unbounded quantifiers. These quantifiers must all appear out front and be the same type of quantifier (all universal or all existential).

Σ1 sentences: ∃x1 ∃x2 … ∃xk Φ(x1, x2, …, xk), where Φ is Π0.
Π1 sentences: ∀x1 ∀x2 … ∀xk Φ(x1, x2, …, xk), where Φ is Σ0.

Some examples of Σ1 sentences:

∃x ∃y (x⋅x = y)
∃x (x⋅x = 5)
∃x ∀y < x (x+y > x⋅y)

And some Π1 sentences:

∀x (x + 0 = x)
∀x ∀y (x + y < 10)
∀x ∃y < 10 (y⋅y + y = x)

We can translate unbounded quantifiers as while loops:

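def A(phi):
    # Unbounded universal quantifier: halts only if a counterexample exists.
    x = 0
    while True:
        if not phi(x):
            return False
        x += 1

def E(phi):
    # Unbounded existential quantifier: halts only if a witness exists.
    x = 0
    while True:
        if phi(x):
            return True
        x += 1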

There’s a radical change here from the bounded case, which is that these functions are no longer guaranteed to terminate. A(Φ) never returns True, and E(Φ) never returns False. This reflects the nature of unbounded quantifiers. An unbounded universal quantifier is claiming something to be true of all numbers, and thus there are infinitely many cases to be checked. Of course, the moment you find a case that fails, you can return False. But if the universally quantified statement is true of all numbers, then the function will have to keep searching through the numbers forever, hoping to find a counterexample. With an unbounded existential quantifier, all one needs to do is find a single example where the statement is true and then return True. But if there is no such example (i.e. if the statement is always false), then the program will have to search forever.

I encourage you to think about these functions for a few minutes until you’re satisfied that not only do they capture the unbounded universal and existential quantifiers, but that there’s no better way to define them.
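If you'd like to experiment without hanging your interpreter, here is a hedged variant with a step budget. The names `E_budget` and `A_budget` and the three-valued return convention (`True`, `False`, or `None` for "no verdict yet") are my own additions, not part of the translation above. Note that a `None` tells you nothing about the actual truth value; it only means the search hasn't finished.

```python
def E_budget(phi, budget):
    # Like E, but gives up after `budget` candidates.
    for x in range(budget):
        if phi(x):
            return True   # witness found: the sentence is true
    return None           # no verdict: a witness might still exist further out


def A_budget(phi, budget):
    # Like A, but gives up after `budget` candidates.
    for x in range(budget):
        if not phi(x):
            return False  # counterexample found: the sentence is false
    return None           # no verdict: a counterexample might still exist


print(E_budget(lambda x: x * x == 25, 1000))  # True
print(E_budget(lambda x: x * x == 5, 1000))   # None
print(A_budget(lambda x: x + 0 == x, 1000))   # None
print(A_budget(lambda x: x < 10, 1000))       # False
```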

Now we can quite easily translate our example sentences as programs:

## ∃x ∃y (x⋅x = y)
E(lambda x: E(lambda y: x * x == y))

## ∃x (x⋅x = 5)
E(lambda x: x * x == 5)

## ∃x ∀y < x (x+y > x⋅y)
E(lambda x: Aupto(x, lambda y: x + y > x * y))

## ∀x (x + 0 = x)
A(lambda x: x + 0 == x)

## ∀x ∀y (x + y < 10)
A(lambda x: A(lambda y: x + y < 10))

## ∀x ∃y < 10 (y⋅y + y = x)
A(lambda x: Elessthan(10, lambda y: y * y + y == x))

The first is a true Σ1 sentence, so it terminates and returns True. The second is a false Σ1 sentence, so it runs forever. See if you can figure out if the third ever halts, and then run the program for yourself to see!

The fourth is a true Π1 sentence, which means that it will never halt (it will keep looking for a counterexample and failing to find one forever). The fifth is a false Π1 sentence, so it does halt at the first moment it finds values of x and y whose sum is 10 (here x = 0 and y = 10). And figure out the sixth for yourself!

The next level of the hierarchy involves alternating quantifiers.

Σ2 sentences: ∃x1 ∃x2 … ∃xk Φ(x1, x2, …, xk), where Φ is Π1.
Π2 sentences: ∀x1 ∀x2 … ∀xk Φ(x1, x2, …, xk), where Φ is Σ1.

So now we’re allowed sentences with a block of one type of unbounded quantifier followed by a block of the other type of unbounded quantifier, and ending with a Σ0 sentence. You might guess that the Python functions we’ve defined already are strong enough to handle this case (and indeed, all higher levels of the hierarchy), and you’re right. At least, partially. Try running some examples of Σ2 or Π2 sentences and see what happens. For example:

## ∀x ∃y (x > y)
A(lambda x: E(lambda y: x > y))

It runs forever! If we were to look into the structure of this program, we’d see that A(Φ) only halts if it finds a counterexample to Φ, and E(Φ) only halts if it finds an example of Φ. In other words A(E(Φ)) only halts if A finds out that E(Φ) is false; but E(Φ) never halts if it’s false! The two programs’ goals are diametrically opposed, and as such, brought together like this they never halt on any input.

The same goes for a sentence like ∃x ∀y (x > y): for this program to halt, it would require that ∀y (x > y) is found to be true for some value of x. But ∀y (x > y) will never be found true, because universally quantified sentences can only be found false! This has nothing to do with the (x > y) being quantified over; it’s entirely about the structure of the quantifiers.
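To watch the deadlock without actually running forever, here is a sketch using budgeted searches that return `None` when they run out of steps. The helper names and the `None` convention are assumptions of mine for illustration. Both a false and a true Π2 sentence end with no verdict, no matter how large the budget is.

```python
def E_budget(phi, budget):
    # Existential search: the only verdict it can ever reach is True.
    for y in range(budget):
        if phi(y) is True:
            return True
    return None  # budget exhausted, no verdict


def A_budget(phi, budget):
    # Universal search: the only verdict it can ever reach is False.
    for x in range(budget):
        verdict = phi(x)
        if verdict is False:
            return False
        if verdict is None:
            return None  # the inner search gave no verdict, so neither can we
    return None          # budget exhausted, no verdict


BUDGET = 50

# ∀x ∃y (x > y): false, but the inner search at x = 0 never reports False
false_pi2 = A_budget(lambda x: E_budget(lambda y: x > y, BUDGET), BUDGET)

# ∀x ∃y (y > x): true, but a universal search can never report True
true_pi2 = A_budget(lambda x: E_budget(lambda y: y > x, BUDGET), BUDGET)

print(false_pi2, true_pi2)  # None None
```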

No Turing machine can decide the truth values of Σ2 and Π2 sentences. There’s a caveat here, related to the semantic version of the arithmetic hierarchy. It’s often possible to take a Π2 sentence like ∀x ∃y (y + y = x) and convert it to a logically equivalent Π1 sentence like ∀x ∃y<x+1 (y + y = x). This translation works because y + y = x can only be true if y is less than or equal to x. Now we have a false Π1 sentence rather than a false Π2 sentence, and as such we can find a counterexample and halt.
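Here is a small sketch of the bounded version actually halting. I bound y by "less than or equal to x" (written as `range(x + 1)`), and `first_counterexample` is a hypothetical helper of mine that behaves like the unbounded universal search but returns the failing value instead of just False.

```python
def first_counterexample(phi):
    # Unbounded universal search: halts only if some x fails phi.
    x = 0
    while True:
        if not phi(x):
            return x
        x += 1


# ∀x ∃y ≤ x (y + y = x), with the inner quantifier now bounded
def has_half(x):
    return any(y + y == x for y in range(x + 1))


print(first_counterexample(has_half))  # 1
```

The search stops at x = 1, the first odd number, so the sentence is false.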

We can talk about a sentence’s essential level on the arithmetic hierarchy, which is the lowest level attained by any sentence logically equivalent to it. It’s important to note here that “logically equivalent sentence” is a cross-model notion: A and B are logically equivalent if and only if they have the same truth values in every model of PA, not just the standard model. The soundness and completeness of first order logic, and the recursive nature of the axioms of PA, tell us that the set of sentences that are logically equivalent to a given sentence of PA is recursively enumerable. So we can generate these sentences by searching for PA proofs of equivalence and keeping track of the lowest level of the arithmetic hierarchy attained so far.

Even when we do this, we will still find sentences that have no logical equivalents below Σ2 or Π2. These sentences are essentially uncomputable; not just uncomputable in virtue of their form, but truly uncomputable in all of their logical equivalents. However, while they are uncomputable, they would become computable if we had a stronger Turing machine. Let’s take another look at the last example:

## ∀x ∃y (x > y)
A(lambda x: E(lambda y: x > y))

Recall that the problem was that A(E(Φ)) only halts if E(Φ) returns False, and E(Φ) can only return True. But if we had a TM equipped with an oracle for the truth value of E(Φ) sentences, then maybe we could evaluate A(E(Φ))!

Let’s think about that for a minute more. What would an oracle for the truth value of Σ1 sentences be like? One thing that would work is if we could run E(Φ) “to infinity” and see if it ever finds an example, and if not, then return False. So perhaps an infinite-time Turing machine would do the trick. Another way would be if we could simply ask whether E(Φ) ever halts! If it does, then ∃y (x > y) must be true, and if not, then it must be false.

So a halting oracle suffices to decide the truth values of Σ1 sentences! Same for Π1 sentences: we just ask if A(Φ) ever halts and return False if so, and True otherwise.

If we run the above program on a Turing machine equipped with a halting oracle, what will we get? Now we can evaluate the inner existential quantifier for any given value of x. So in particular, for x = 0, we will find that ∃y (x > y) is false. We’ve found a counterexample, so our program will terminate and return False.

On the other hand, if our sentence was true, then we would be faced with the familiar feature of universal quantifiers: we’d run forever looking for a counterexample and never find one. So to determine that this sentence is true, we’d need an oracle for the halting problem for this new more powerful Turing machine!
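The oracle-assisted run can be mimicked in Python, with one big caveat: `sigma1_oracle` below is a fake. A real halting oracle can't be programmed, so this stand-in just inspects the first `budget` candidates and answers False if none of them works. That happens to give the right answers for this particular sentence, but it is an assumption for illustration, not a general method.

```python
def sigma1_oracle(phi, budget=1000):
    # FAKE stand-in for "does E(phi) halt?". A real oracle would answer this
    # for every phi; this one only looks at y = 0 .. budget - 1.
    return any(phi(y) for y in range(budget))


def A(phi):
    # Unbounded universal search: halts only if a counterexample exists.
    x = 0
    while True:
        if not phi(x):
            return False
        x += 1


# ∀x ∃y (x > y), with the inner ∃ answered by the oracle instead of a search
result = A(lambda x: sigma1_oracle(lambda y: x > y))
print(result)  # False
```

The outer search stops immediately at x = 0, since no natural number y satisfies 0 > y. Swapping in a true sentence such as ∀x ∃y (y > x) would bring back the infinite loop in `A`, which is why the next oracle up is needed.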

Here’s a summary of what we have so far:

TM = Ordinary Turing Machine
TM2 = TM + oracle for TM
TM3 = TM + oracle for TM2

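The table below shows what type of machine suffices to decide the truth value of a sentence, depending on where on the arithmetic hierarchy the sentence falls and whether the sentence is true or false.

Sentence class | If true | If false
Σ0 / Π0 | TM | TM
Σ1 | TM | TM2
Π1 | TM2 | TM
Σ2 | TM2 | TM3
Π2 | TM3 | TM2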

We’re now ready to generalize. In general, Σn sentences start with a block of existential quantifiers, and then alternate between blocks of universal and existential quantifiers n – 1 times before ending in a Σ0 sentence. Πn sentences start with a block of universal quantifiers, alternate between blocks of existential and universal quantifiers n – 1 times, and then end in a Σ0 sentence. And as you move up the arithmetic hierarchy, it requires more and more powerful halting oracles to decide whether sentences are true:
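Since this is the purely syntactic hierarchy, a sentence's level can be read straight off its prefix of unbounded quantifiers. Here is a minimal sketch, assuming the sentence is already written with all unbounded quantifiers out front and that we encode that prefix as a string of `E` (for ∃) and `A` (for ∀). The function name and the encoding are mine.

```python
from itertools import groupby


def classify(prefix: str) -> str:
    """Return the syntactic level for a prefix of unbounded quantifiers.

    `prefix` lists the leading unbounded quantifiers in order, e.g. "AAE"
    for ∀x ∀y ∃z. Bounded quantifiers are ignored: they belong to the Σ0 part.
    """
    # Collapse runs of the same quantifier into blocks: "AAE" -> ["A", "E"]
    blocks = [kind for kind, _ in groupby(prefix)]
    if not blocks:
        return "Σ0 = Π0"
    letter = "Σ" if blocks[0] == "E" else "Π"
    return letter + str(len(blocks))


print(classify(""))     # Σ0 = Π0
print(classify("EE"))   # Σ1
print(classify("AAE"))  # Π2
print(classify("EAE"))  # Σ3
```

The number of blocks is n, which is the same as saying the quantifier type switches n – 1 times.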

(TM = ordinary Turing machine, TM(n+1) = TM + oracle for TMn)

If we define Σω to be the union of all the Σ classes in the hierarchy, and Πω the union of the Π classes, then deciding the truth value of Σω ⋃ Πω (the set of all arithmetic sentences) would require a TMω – a Turing machine with an oracle for TM, TM2, TM3, and so on. Thus the theory of true arithmetic (the set of all first-order sentences that are true of ℕ), is not only undecidable, it’s undecidable with a TM2, TM3, and TMn for every n ∈ ℕ. At every level of the arithmetic hierarchy, we get new sentences that are essentially on that level (not just sentences that are superficially on that level in light of their syntactic form, but sentences which, in their simplest possible logically equivalent form, lie on that level).

This gives some sense of just how hard math is. Just understanding the first-order truths of arithmetic requires an infinity of halting oracles, each more powerful than the last. And that says nothing about the second-order truths of arithmetic! That would require even stronger Turing machines than TMω – Turing machines that have halting oracles for TMω, and then TMs with oracles for that, and so on to unimaginable heights (just how high we must go is not currently known).