Computing truth values of sentences of arithmetic, or: Math is hard

Previously I talked about the arithmetic hierarchy for sets, and how it relates to the decidability of sets. There’s also a parallel notion of the arithmetic hierarchy for sentences of Peano arithmetic, and it relates to the difficulty of deciding the truth value of those sentences.

Truth value here and everywhere else in this post refers to truth value in the standard model of arithmetic. Truth value in the sense of “being true in all models of PA” is a much simpler matter; PA is recursively axiomatizable and first order logic is sound and complete, so any sentence that’s true in all models of PA can be eventually proven by a program that enumerates all the theorems of PA. So if a sentence is true in all models of PA, then there’s an algorithm that will tell you that in a finite amount of time (though it will run forever on an input that’s false in some models).

Not so for truth in the standard model! As we’ll see, whether a sentence evaluates to true in the standard model of arithmetic turns out to be much more difficult to determine in general. Only for the simplest sentences can you decide their truth value using an ordinary Turing machine. And the set of all sentences is in some sense infinitely uncomputable (you’ll see in a bit in what sense exactly this is).

What we’ll discuss is a way to convert sentences of Peano arithmetic to computer programs. Before diving into that, though, one note of caution is necessary: the arithmetic hierarchy for sentences is sometimes talked about purely syntactically (just by looking at the sentence as a string of symbols) and other times is talked about semantically (by looking at logically equivalent sentences). Here I will be primarily interested in the entirely-syntactic version of the arithmetic hierarchy. If you’ve only been introduced to the semantic version of the hierarchy, what you see here might differ a bit from what you recognize.

Let’s begin!

The simplest types of sentences have no quantifiers at all. For instance…

0 = 0
2 ⋅ 2 < 7
(2 + 2 = 4) → (2 ⋅ 2 = 4)

Each of these sentences can be translated into a program quite easily, since +, ⋅, =, and < are computable. We can translate the → in the third sentence by converting it into a conjunction:

## (2 + 2 = 4) → (2 ⋅ 2 = 4)
not(2 + 2 == 4 and not 2 * 2 == 4)

Slightly less simple-looking are sentences with bounded quantifiers:

∀x < 10 (x + 0 = x)
∃x < 100 (x + x = x)
∀x < 5 ∃y < 7 (x > 1 → x⋅y = 12)
∃x < 5 ∀y < x ∀z < y (y⋅z ≠ x)

In each of these examples, the bounded quantifier could in principle be expanded out, leaving us with a finite quantifier-free sentence. This should suggest to us that adding bounded quantifiers doesn’t actually increase the computational difficulty.

We can translate sentences with bounded quantifiers into programs by converting each bounded quantifier to a for loop. The translation slightly differently depending on whether the quantifier is universal or existential:

def Aupto(n, phi):
    for x in range(n):
        if not phi(x):
            return False
    return True
def Elessthan(n, phi):
    for x in range(n):
        if phi(x):
            return True
    return False

Note that the second input needs to be a function; reflecting that it’s a sentence with free variables. Now we can quite easily translate each of the examples, using lambda notation to more conveniently define the necessary functions

## ∀x<10 (x + 0 = x)
Aupto(10, lambda x: x + 0 == x)

## ∃x<100 (x + x = x)
Elessthan(100, lambda x: x + x == x)

## ∀x<5 ∃y<7 ((x > 1) → (x*y = 12))
Aupto(5, lambda x: Elessthan(7, lambda y: not (x > 1 and x * y != 12)))

## ∃x<5 ∀y<x ∀z<y (y⋅z ≠ x)
Elessthan(5, lambda x: Aupto(x, lambda y: Aupto(y, lambda z: y * z != x)))

Each of these programs, when run, determines whether or not the sentence is true. Hopefully it’s clear how we can translate any sentence with bounded quantifiers into a program of this form. And when we run the program, it will determine the truth value of the sentence in a finite amount of time.

So far, we’ve only talked about the simplest kinds of sentences, with no unbounded quantifiers. There are two names that both refer to this class: Π0 and Σ0. So now you know how to write a program that determines the truth value of any Σ00 sentence!

We now move up a level in the hierarchy, by adding unbounded quantifiers. These quantifiers must all appear out front and be the same type of quantifier (all universal or all existential).

Σ1 sentences: ∃x1 ∃x2 … ∃xk Phi(x1, x2, …, xk), where Phi is Π0.
Π1 sentences: ∀x1 ∀x2 … ∀xk Phi(x1, x2, …, xk), where Phi is Σ0.

Some examples of Σ1 sentences:

∃x ∃y (x⋅x = y)
∃x (x⋅x = 5)
∃x ∀y < x (x+y > x⋅y)

And some Π1 sentences:

∀x (x + 0 = x)
∀x ∀y (x + y < 10)
∀x ∃y < 10 (y⋅y + y = x)

We can translate unbounded quantifiers as while loops:

def A(phi):
    x = 0
    while True:
        if not phi(x):
            return False
        x += 1

def E(phi):
    x = 0
    while True:
        if phi(x):
            return True
        x += 1

There’s a radical change here from the bounded case, which is that these functions are no longer guaranteed to terminate. A(Φ) never returns True, and E(Φ) never returns False. This reflects the nature of unbounded quantifiers. An unbounded universal quantifier is claiming something to be true of all numbers, and thus there are infinitely many cases to be checked. Of course, the moment you find a case that fails, you can return False. But if the universally quantified statement is true of all numbers, then the function will have to keep searching through the numbers forever, hoping to find a counterexample. With an unbounded existential quantifier, all one needs to do is find a single example where the statement is true and then return True. But if there is no such example (i.e. if the statement is always false), then the program will have to search forever.

I encourage you to think about these functions for a few minutes until you’re satisfied that not only do they capture the unbounded universal and existential quantifiers, but that there’s no better way to define them.

Now we can quite easily translate our example sentences as programs:

## ∃x ∃y (x⋅x = y)
E(lambda x: E(lambda y: x * x == y))

## ∃x (x⋅x = 5)
E(lambda x: x * x == 5)

## ∃x ∀y < x (x+y > x⋅y)
E(lambda x: Aupto(x, lambda y: x + y > x * y))

## ∀x (x + 0 = x)
A(lambda x: x + 0 == x)

## ∀x ∀y (x + y < 10)
A(lambda x: A(lambda y: x + y < 10))

## ∀x ∃y < 10 (y⋅y + y = x)
A(lambda x: Elessthan(10, y * y + y == x))

The first is a true Σ1 sentence, so it terminates and returns True. The second is a false Σ1 sentence, so it runs forever. See if you can figure out if the third ever halts, and then run the program for yourself to see!

The fourth is a true Π1 sentence, which means that it will never halt (it will keep looking for a counterexample and failing to find one forever). The fifth is a false Π1 sentence, so it does halt at the first moment it finds a value of x and y whose sum is 10. And figure out the sixth for yourself!

The next level of the hierarchy involves alternating quantifiers.

Σ2 sentences: ∃x1 ∃x2 … ∃xk Φ(x1, x2, …, xk), where Φ is Π1.
Π2 sentences: ∀x1 ∀x2 … ∀xk Φ(x1, x2, …, xk), where Φ is Σ1.

So now we’re allowed sentences with a block of one type of unbounded quantifier followed by a block of the other type of unbounded quantifier, and ending with a Σ0 sentence. You might guess that the Python functions we’ve defined already are strong enough to handle this case (and indeed, all higher levels of the hierarchy), and you’re right. At least, partially. Try running some examples of Σ2 or Π2 sentences and see what happens. For example:

## ∀x ∃y (x > y)
A(lambda x: E(lambda y: x > y))

It runs forever! If we were to look into the structure of this program, we’d see that A(Φ) only halts if it finds a counterexample to Φ, and E(Φ) only halts if it finds an example of Φ. In other words A(E(Φ)) only halts if A finds out that E(Φ) is false; but E(Φ) never halts if it’s false! The two programs’ goals are diametrically opposed, and as such, brought together like this they never halt on any input.

The same goes for a sentence like ∃x ∀y (x > y): for this program to halt, it would require that ∀y (x > y) is found to be true for some value of x, But ∀y (x > y) will never be found true, because universally quantified sentences can only be found false! This has nothing to do with the (x > y) being quantified over, it’s entirely about the structure of the quantifiers.

No Turing machine can decide the truth values of Σ2 and Π2 sentences. There’s a caveat here, related to the semantic version of the arithmetic hierarchy. It’s often possible to take a Π2 sentence like ∀x ∃y (y + y = x) and convert it to a logically equivalent but Π1 sentence like ∀x ∃y<x (y + y = x). This translation works, because y + y = x is only going to be true if y is less than or equal to x. Now we have a false Π1 sentence rather than a false Π2 sentence, and as such we can find a counterexample and halt.

We can talk about a sentence’s essential level on the arithmetic hierarchy, which is the lowest level of the logically equivalent sentence. It’s important to note here that “logically equivalent sentence” is a cross-model notion: A and B are logically equivalent if and only if they have the same truth values in every model of PA, not just the standard model. The soundness and completeness of first order logic, and the recursive nature of the axioms of PA, tells us that the set of sentences that are logically equivalent to a given sentence of PA is recursively enumerable. So we can generate these sentences by searching for PA proofs of equivalence and keeping track of the lowest level of the arithmetic hierarchy attained so far.

Even when we do this, we will still find sentences that have no logical equivalents below Σ2 or Π2. These sentences are essentially uncomputable; not just uncomputable in virtue of their form, but truly uncomputable in all of their logical equivalents. However, while they are uncomputable, they would become computable if we had a stronger Turing machine. Let’s take another look at the last example:

## ∀x ∃y (x > y)
A(lambda x: E(lambda y: x > y))

Recall that the problem was that A(E(Φ)) only halts if E(Φ) returns False, and E(Φ) can only return True. But if we had a TM equipped with an oracle for the truth value of E(Φ) sentences, then maybe we could evaluate A(E(Φ))!

Let’s think about that for a minute more. What would an oracle for the truth value of Σ1 sentences be like? One thing that would work is if we could run E(Φ) “to infinity” and see if it ever finds an example, and if not, then return False. So perhaps an infinite-time Turing machine would do the trick. Another way would be if we could simply ask whether E(Φ) ever halts! If it does, then ∃y (x > y) must be true, and if not, then it must be false.

So a halting oracle suffices to decide the truth values of Σ1 sentences! Same for Π1 sentences: we just ask if A(Φ) ever halts and return False if so, and True otherwise.

If we run the above program on a Turing machine equipped with a halting oracle, what will we get? Now we can evaluate the inner existential quantifier for any given value of x. So in particular, for x = 0, we will find that Ey (x > y) is false. We’ve found a counterexample, so our program will terminate and return False.

On the other hand, if our sentence was true, then we would be faced with the familiar feature of universal quantifiers: we’d run forever looking for a counterexample and never find one. So to determine that this sentence is true, we’d need an oracle for the halting problem for this new more powerful Turing machine!

Here’s a summary of what we have so far:

TM = Ordinary Turing Machine
TM2 = TM + oracle for TM
TM3 = TM + oracle for TM2

The table shows what type of machine suffices to decide the truth value of a sentence, depending on where on the arithmetic hierarchy the sentence falls and whether the sentence is true or false.

We’re now ready to generalize. In general, Σn sentences start with a block of existential quantifiers, and then alternate between blocks of existential and universal quantifiers n – 1 times before ending in a Σ0 sentence. Πn sentences start with a block of universal quantifiers, alternates quantifiers n – 1 times, and then ends in a Σ0 sentence. And as you move up the arithmetic hierarchy, it requires more and more powerful halting oracles to decide whether sentences are true:

(TM = ordinary Turing machine, TMn+1 = TM + oracle for TMn)

If we define Σω to be the union of all the Σ classes in the hierarchy, and Πω the union of the Π classes, then deciding the truth value of Σω ⋃ Πω (the set of all arithmetic sentences) would require a TMω – a Turing machine with an oracle for TM, TM2, TM3, and so on. Thus the theory of true arithmetic (the set of all first-order sentences that are true of ℕ), is not only undecidable, it’s undecidable with a TM2, TM3, and TMn for every n ∈ ℕ. At every level of the arithmetic hierarchy, we get new sentences that are essentially on that level (not just sentences that are superficially on that level in light of their syntactic form, but sentences which, in their simplest possible logically equivalent form, lie on that level).

This gives some sense of just how hard math is. Just understanding the first-order truths of arithmetic requires an infinity of halting oracles, each more powerful than the last. And that says nothing about the second-order truths of arithmetic! That would require even stronger Turing machines than TMω – Turing machines that have halting oracles for TMω, and then TMs with oracles for that, and so on to unimaginable heights (just how high we must go is not currently known).

No known unknowns in Peano arithmetic

Reports that say that something hasn’t happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don’t know we don’t know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones.

Donald Rumsfeld

Rumsfeld’s statement may have been correct about American politics, but he was wrong in the context of Peano arithmetic. In Peano arithmetic, every known is a known known, and every unknown is an unknown unknown. Let me explain.

A paragraph for background: PA expresses enough arithmetic to allow it to encode sentences like “φ is provable from the axiom set T” as arithmetic properties of the natural numbers. But Peano arithmetic also has nonstandard models containing objects that aren’t natural numbers. It’s possible for a sentence about arithmetic to be true of these nonstandard models and false of the standard models. And in particular, the sentences that Gödel-encode proofs of other sentences can take on a different meaning in nonstandard models. For instance, the Gödel encoding of a sentence like “PA is consistent” looks like “There is no number with the property that it encodes a proof of 0=1 from the axioms of PA”. This might be true of the standard natural numbers, but false in some nonstandard model. There might be some nonstandard number that satisfies Gödel’s arithmetic formula for a proof of 0=1 from the axioms of PA, but which doesn’t actually encode any such proof (as only standard natural numbers Gödel-encode valid proofs). Gödel encodings are only logically equivalent to the statements they encode in the standard model of the natural numbers.

Of all the sentences of the form “φ is provable from the axiom set T”, which are provable from the axioms of Peano arithmetic? How much does Peano arithmetic know about what arbitrary formal theories prove? For all of the following results, I will assume that PA is consistent.

Our first result: If PA proves “φ is provable from T”, then φ really is provable from T.

This follows from the soundness of first-order logic. If PA proves “φ is provable from T”, then this sentence must be true in all models. In particular, it’s true in the standard model. And if it’s true in the standard model, then it must actually be true that φ is provable from T.

Second result: If φ is provable from T, then PA proves “φ is provable from T.”

This follows from three facts: (1) that the standard natural numbers are a subset of every nonstandard model, (2) that the sentence “φ is provable from T” is a statement of the form “There exists a number with some property”, and (3) that first-order logic is complete.

The Gödel encoding of “φ is provable from T” is something like “there’s some number n with the property that n encodes a proof of φ from T.” Now, suppose that φ is provable from T. Then there’s a standard natural number that encodes this proof. And since every nonstandard model contains every natural number, this number exists in all these models as well! So the statement is true in all models. So by completeness, it’s provable from PA.

So far we have that PA proves “φ is provable from T” if and only if φ is actually provable from T. Loosely, if a theory can prove something, then Peano arithmetic knows this. Even though ZFC is a stronger theory than Peano arithmetic, there’s nothing that ZFC can prove that PA doesn’t know it can prove. In particular, ZFC proves the consistency of PA, so PA proves that ZFC proves the consistency of PA! This of course doesn’t mean that PA proves its own consistency, because PA doesn’t trust ZFC to be true (it doesn’t accept the axioms of ZFC).

Furthermore, for every sentence that PA can prove, PA can prove that PA can prove it. In this sense, everything that PA knows is a known known. What we’ll see next is that for PA, every unknown is an unknown unknown. For Peano arithmetic, and in fact for ANY consistent theory that expresses enough arithmetic to be subject to Gödel’s theorems, there are no known unknowns.

Third result: If φ is not provable from PA, then “φ is not provable from PA” is not provable from PA.

This follows from Gödel’s second incompleteness theorem and the principle of explosion.

Now, the proof of our third result. Suppose “φ is not provable from PA” is provable from PA. By the principle of explosion, if PA was inconsistent then EVERYTHING would be provable from it. So by finding something that isn’t provable from its axioms, PA is able to prove its own consistency! But then, by Gödel’s second incompleteness theorem, PA must be inconsistent. This contradicts our background assumption that PA is consistent.

Note that the sentence “φ is not provable from PA” isn’t the same type of sentence as “φ is provable from PA”. As we saw a moment ago, the second is a sentence of the form “there’s some number n with a particular property,” and all of these sentences are provable from PA if and only if they’re true of the standard naturals. But the first sentence is of the form “there’s no number n with a particular property”, which is not the same! It’s possible for this sentence to be true of the standards but false of the nonstandards. All we need is for there to be no standard numbers and at least one nonstandard number with that property.

So if PA is consistent, then it can never prove that it doesn’t prove something. Now, notice that all that I’ve said applies more generally than just Peano arithmetic. For any consistent mathematical theory that express sufficient arithmetic for Gödel’s incompleteness theorems to apply, every known is a known known and every unknown is an unknown unknown.