This post is lying to you

In classical logic there are exactly two truth values, True and False, and every sentence has exactly one of these truth values. Consider the Liar sentence: “This sentence is false.” If this sentence is True, then it must be False. And if it’s False, then it must not be False, which (given bivalence) means it must be True. If we apply classical logic to this sentence, then it must have one of the two truth values. But whichever way we go, we find ourselves in trouble. And so the Liar sentence glitches out classical logic and produces an error.

But not so fast. For the Liar sentence to glitch out classical logic, it must first be the case that the sentence can actually be produced in classical logic. We start with an English sentence, “This sentence is not true” (which, given bivalence, says the same thing as the Liar sentence above), and reason intuitively about what classical logic would do with it. But we must be able to form this sentence in classical logic in order for classical logic to have anything to say about it.

There are two major hurdles with importing “This sentence is not true” into classical logic: translating “this sentence” and translating “is not true”. The first requires us to have self-referential sentences, and it’s not obvious how to accomplish this with the machinery of classical logic. The second requires us to have a truth predicate. This seems prima facie easier to do, but it turns out to have its own host of issues.

Let’s deal with the second issue first: how do we make sense of “is not true”? We’ll start by looking at “is true” (once we have this, then we can just negate it to get “is not true”). So, how do we make sense of “is true”?

When we say “X is true”, we aren’t asserting equality between an object X and an object True. We’re instead attributing the property of truth to the sentence X. So “is true” seems to act kind of like a predicate in first-order logic. But there’s a problem. Predicates apply to objects in the domain of a model, not to sentences in the language itself. For instance, the predicate “is red” might apply to firetrucks and Clifford, as objects in the universe, but it’s not going to apply to sentences in the language.

With this in mind, it appears that “is true” is actually more like a modal operator. It applies to sentences, not objects. We could introduce a modal operator T such that TX is interpreted as “X is true”, and add an axiom schema saying that for every sentence φ, φ ↔ Tφ. The problem is that we will never get self-reference this way. We want to create a sentence P that says “P is not true”. We could try something like P ↔ ¬TP, but this ends up just being a false sentence, not a paradoxical one. What’s missing is that the sentence “P ↔ ¬TP” is not the same as the sentence P: they’re just different sentences.

So the modal approach to interpreting “is true” has failed for our purposes. It’s simply not subtle enough to allow us to express self-reference. So let’s return to the predicate approach. The problem was that predicates apply to objects, not sentences. But what if the sentences were themselves objects? Of course, the sentences cannot literally be objects: they are purely syntactical items, whereas objects exist in the semantics (the interpretation of the language). But each sentence could have some sort of representative object in the domain.

What Gödel showed is that this is indeed possible. He designed a coding technique such that every sentence in the language gets assigned a particular natural number, and no two sentences have the same number. And if sentences correspond to numbers, then properties of those sentences can be translated into properties of those numbers!
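To make this concrete, here is a minimal Python sketch of one possible coding scheme (prime-power packing over a toy alphabet). It is not Gödel’s exact scheme, and the alphabet and function names are illustrative inventions, but it has the key property: the sentence can always be recovered from its number, so distinct sentences get distinct numbers.

```python
# A toy Gödel-style coding: give each symbol a positive code, then pack a
# sentence into one number as a product of prime powers. Unique prime
# factorization guarantees the sentence can be recovered from its number.

SYMBOLS = "0S+*=()¬∧∨→↔∀∃x'"                       # a made-up alphabet
CODE = {s: i + 1 for i, s in enumerate(SYMBOLS)}    # symbol -> 1, 2, 3, ...

def primes():
    """Yield 2, 3, 5, 7, ... (slow trial division, fine for a toy)."""
    found, n = [], 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def godel_number(sentence):
    """Encode a sentence as 2**c1 * 3**c2 * 5**c3 * ... for symbol codes c_i."""
    result = 1
    for p, symbol in zip(primes(), sentence):
        result *= p ** CODE[symbol]
    return result

def decode(number):
    """Read the symbol codes back off the prime exponents, in order."""
    symbols = []
    for p in primes():
        if number == 1:
            break
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        symbols.append(SYMBOLS[exponent - 1])
    return "".join(symbols)

sentence = "∀x(x=x)"
assert decode(godel_number(sentence)) == sentence   # the number determines the sentence
```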

Now if our language is sufficiently expressive to talk about natural number arithmetic, then our sentences can express properties of other sentences! In other words, we want a theory in a logic that has ℕ as a model. And we also want it to be sufficiently expressive to talk about properties of numbers, like “being prime” or “being divisible by 7 twice over (that is, by 49)”. Then we can imagine a predicate True(x), such that True(x) is True if and only if the sentence encoded by the number x is True.

For notational convenience, we’ll write “the number that encodes the sentence P” as ⟦P⟧. Then what we want of our truth predicate is that for every sentence φ, True(⟦φ⟧) ↔ φ.

Now, returning to the Liar sentence, we’ve dealt with “is true”, but now have to deal with “this sentence”. Remember that we want a sentence φ that asserts that the truth predicate does not apply to itself. In other words, we want φ to be the same thing as ¬True(⟦φ⟧). But how can this be? Clearly these are two different sentences, no?

Well, it’s not so obvious that φ and ¬True(⟦φ⟧) are actually distinct sentences. Remember that ⟦φ⟧ is just some number. So the sentence φ might be ¬True(9129828475651384). This is only a genuine liar sentence if 9129828475651384 encodes the sentence φ.

So really what we need to do is to look for some natural number n such that the sentence encoded by n is exactly “¬True(n)”. This would be a sentence which if true must be false, and if false must be true. It’s not at all obvious that such a natural number exists. But in 1934, Carnap proved the diagonal lemma, the tool necessary to construct such a number.

The diagonal lemma says that in any theory that can express natural number arithmetic (specifically, any theory that can define all primitive recursive functions), for every predicate P(x) there’s a sentence ψ such that ψ ↔ P(⟦ψ⟧) is provable. Let P(x) be ¬True(x), and we get that there’s a sentence ψ such that ψ ↔ ¬True(⟦ψ⟧) is provable!
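The construction behind the lemma is essentially the trick behind quines: a sentence that applies a predicate to a recipe for building that very sentence. Here is a toy string-level sketch in Python; the diag function, the {x} slot convention, and the NotTrue label are all illustrative inventions, not a real proof system.

```python
# A string-level caricature of the diagonal lemma's fixed-point trick.
# A "formula" is a string with one free slot {x}; quoting is just repr().

def diag(formula):
    """Substitute a quotation of the formula into its own free slot."""
    return formula.format(x=repr(formula))

def fixed_point(predicate):
    """Build a sentence that applies `predicate` to (a recipe for) itself."""
    template = predicate + "(diag({x}))"
    return diag(template)

liar = fixed_point("NotTrue")
print(liar)
# -> NotTrue(diag('NotTrue(diag({x}))'))
# The argument of NotTrue, diag('NotTrue(diag({x}))'), evaluates to the very
# string stored in `liar`: the sentence asserts NotTrue of itself.
assert diag('NotTrue(diag({x}))') == liar
```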

In other words, there’s a sentence ψ, encoded by a number n, such that ψ is true if and only if ¬True(n). Combine this with the truth schema True(⟦ψ⟧) ↔ ψ from before, and we get ψ ↔ ¬ψ: a flat-out contradiction. This is exactly the liar paradox! We’ve succeeded at sneaking a contradiction into classical logic! So what does this mean? Is classical logic ultimately inconsistent? Do we need to rebuild logic from the ground up?

Not quite! Notice that to actually get to the point where we could express the Liar sentence, we had to take on a lot of assumptions. Let’s list them all out:

  1. Our language allows us to express natural number arithmetic.
  2. Our theory of natural numbers is strong enough to allow Gödel coding.
  3. Our theory of natural numbers is strong enough to express every primitive recursive function.
  4. There is a truth predicate.

From these assumptions, we were able to prove an inconsistency. But this doesn’t mean that classical logic is therefore inconsistent! Rather, it means that any consistent theory has to violate at least one of these assumptions! In particular, if we have a consistent theory that allows us to do both Gödel coding and to express primitive recursive functions, then this theory cannot have a truth predicate!

It’s important to understand that #4 here really is an assumption. When I described a truth predicate, I said that we could imagine such a predicate exists; I never showed you how to explicitly construct it! We could always explicitly add a truth predicate T to a theory of arithmetic, and then assert as axioms φ ↔ T(⟦φ⟧) for every sentence φ. The above line of reasoning shows us that doing so will render our theory inconsistent. If we don’t explicitly add a truth predicate, then we could try to construct one from the primitive relation and function symbols of the language. But the above line of reasoning shows us that no matter how hard we try, we will never succeed in this construction!

It’s interesting to note that (2) and (3) are actually different assumptions. (3) implies (2), but (2) doesn’t imply (3). In other words, you can have very weak theories of arithmetic that are expressive enough to do Gödel coding, but not expressive enough to prove the diagonal lemma! The amazing feature of these theories is that it’s possible for them to prove their own consistency without becoming inconsistent!

Finally, notice that the diagonal lemma was quite a bit more powerful than what we strictly required for our reasoning above. In particular, it allowed us to talk about ANY predicate whatsoever, not just a truth predicate. Consider what happens when instead of using “is true” as our predicate, we use “is provable”. You might get a somewhat interesting result!

Formal Semantics 1: Historical Prelude and Compositionality

English is really complicated. For a long time, logicians looking at natural languages believed that there could be no formal system detailing their grammar and semantics. They resigned themselves to extremely simple idealized fragments of English, like propositional logic (formalizing “and”, “not”, and “or”) and first-order logic (formalizing “every”, “some”, and “is”).

The slogan of the time was “ordinary language has no logic” (Bertrand Russell and Peter Strawson). Chomsky famously argued that the languages invented by logicians were too artificial and entirely unlike natural languages, and that therefore the methods of logicians simply couldn’t be applied to this more complex realm.

This attitude has changed over time. Perhaps the most important figure in the “logic of natural language” movement is Richard Montague, a student of the giant of logic Alfred Tarski. The first line of his paper English as a Formal Language reads “I reject the contention that an important theoretical difference exists between formal and natural languages”, and he follows this up by more or less single-handedly inventing formal semantics, now a thriving field. Hilariously, Montague apparently saw this work as child’s play, writing:

I (…) sat down one day and proceeded to do something that I previously regarded, and continue to regard, as both rather easy and not very important — that is, to analyze ordinary language.

(This had to hit hard for linguists of his time.)

Alright, enough prologue. In the next few posts I want to describe a naive first pass at formalizing a fairly substantial fragment of English, modeled off of Montague semantics. The key concept throughout will be the notion of compositionality, which I’ll briefly describe now.

Compositionality

Compositionality is all about how to construct the meaning of phrases from their smaller components. Take a sentence like “The cat sat on the mat.” The meaning of this sentence clearly has something to do with the meanings of “the cat” and “sat on the mat”. Similarly, the meaning of “sat on the mat” must have something to do with the meanings of “sat”, “on”, “the”, and “mat”.

The compositionality thesis says that this is all that determines the meaning of “the cat sat on the mat.” In other words, the meaning of any phrase is a function of the meanings of the individual words within it. These meanings are composed together in some way to form the meaning of the sentence as a whole.

The natural question that arises now is, what is the nature of this composition? Take a very simple example: “Epstein died.” According to compositionality, the meaning of “Epstein died” depends only on the meanings of “Epstein” and “died”. That seems pretty reasonable. What about: “Epstein died suspiciously”? How do we compose the meanings of the individual words when there are three?

One proposal is to compose all three simultaneously. That’s possible, but a simpler framework would have us build up the meanings of our sentences iteratively, composing two units of meaning at a time until we’ve generated the entire sentence’s meaning.

Let me now introduce some notation that allows us to say this compactly. If X is some word, phrase, or sentence, we’ll denote the meaning of X as ⟦X⟧. Then the principle of binary compositionality is just that there’s some function F such that ⟦X Y⟧ = F(⟦X⟧, ⟦Y⟧).

There are two major questions that arise at this point.

First, in which order should we compose our units of meaning? Should we combine “Epstein” with “died” first, and then combine that with “suspiciously”? Or should it be “Epstein” and “suspiciously” first, then that with “died”? Or should we combine “Epstein” with the combination of “suspiciously” and “died”?

One might suggest here that the order actually doesn’t matter; no matter what order we combine the meanings in, we should still get the same meaning. The problem with this is that “The Clintons killed Epstein” has a different meaning than “Epstein killed the Clintons.” If order of composition didn’t matter, then we’d expect these to mean the same thing.

Second, how exactly does composing two meanings work? Is there a single rule for composition, or are there multiple different rules that apply in different contexts? It would be most elegant if we could find a single universal rule for generating meanings of complicated phrases from simple ones, but maybe that’s overambitious.

For instance, you might model the meaning of “died” as a set of objects, namely all those objects that died at some moment in the past, and the meaning of “Epstein” as one particular object in the universe. Then we might have our composition rule be the following: ⟦Epstein died⟧ will be a truth value, and it will be True if and only if the object denoted by “Epstein” is within the set of objects denoted by “died”. So in this framework, ⟦X Y⟧ = True if and only if ⟦X⟧ ∈ ⟦Y⟧.
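Here is that rule rendered as a tiny Python sketch, with a made-up two-word lexicon purely for illustration.

```python
# Toy meanings: an individual for "Epstein", a set of individuals for "died".
meanings = {
    "Epstein": "epstein",
    "died": {"epstein", "caesar"},
}

def compose(m_x, m_y):
    """The membership rule: [[X Y]] = True iff [[X]] is an element of [[Y]]."""
    return m_x in m_y

print(compose(meanings["Epstein"], meanings["died"]))   # True: Epstein died
# Adding a third word breaks this rule: however we group
# "Epstein died suspiciously", one of the two compositions ends up asking
# whether something is an element of a truth value.
```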

This works nicely for “Epstein died”. But what about “Epstein died suspiciously”? Now we have two compositions to do, and the order of composition will matter. The problem is that no matter how we compose things, it seems not to work. Suppose that we combine “died” and “suspiciously” first, then combine “Epstein” with that. Using our model, ⟦died suspiciously⟧ will be True if and only if ⟦died⟧ ∈ ⟦suspiciously⟧, which is already a little bit weird. But even worse, ⟦Epstein died suspiciously⟧ will be True if and only if ⟦Epstein⟧ ∈ ⟦died suspiciously⟧. But what would it mean for the object denoted by “Epstein” to be an element of a truth value? It looks like in this framework, most three-word sentences end up becoming vacuously false.

Anyway, the last two paragraphs only show us that one particular attempt to formalize composition fails to be universal. They don’t show that a universal rule is impossible in general. In fact, we’ll end up doing pretty well with a small set of composition rules centered around function application. The idea can be very simply phrased as: ⟦X Y⟧ = ⟦X⟧(⟦Y⟧), applying whichever of the two meanings is a function to the other. And in particular, the meaning of “Epstein died suspiciously” will be ⟦suspiciously⟧(⟦died⟧)(⟦Epstein⟧). And that’s enough warm-up! Next we’ll explore this idea further and dive into our Montague-style system.
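As a preview, here is a hedged Python sketch of the function-application idea, with invented meanings: “died” as a function from individuals to truth values, and “suspiciously” as a function that turns one such function into a stricter one. The particular sets and names are assumptions for illustration only.

```python
# Made-up extensions, purely for illustration.
DIED = {"epstein", "caesar"}
DIED_SUSPICIOUSLY = {"epstein"}

meanings = {
    "Epstein": "epstein",
    # "died" maps an individual to a truth value:
    "died": lambda individual: individual in DIED,
    # "suspiciously" maps a verb meaning to a stricter verb meaning:
    "suspiciously": lambda verb: (
        lambda individual: verb(individual) and individual in DIED_SUSPICIOUSLY
    ),
}

# [[Epstein died]] = [[died]]([[Epstein]])
print(meanings["died"](meanings["Epstein"]))                              # True
# [[Epstein died suspiciously]] = [[suspiciously]]([[died]])([[Epstein]])
print(meanings["suspiciously"](meanings["died"])(meanings["Epstein"]))    # True
```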

A Self-Interpreting Book

A concept: a book that starts by assuming the understanding of the reader and using concepts freely, and as you go on, it introduces a simple formal procedure for defining words. As you proceed, more and more words are defined in terms of the basic formal procedure, so that halfway through, half of the words being used are formally defined, and by the end the entire thing is formally defined. Once you’ve read through the whole book, you can start it over and read from the beginning with no problem.

I just finished a set theory textbook that felt kind of like that. It started with the extremely sparse language of ZFC: first-order logic with a single non-logical symbol, ∈. So the alphabet of the formal language consisted of the following symbols: ∈ ( ) ∧ ∨ ¬ → ↔ ∀ ∃ x ‘. It could have started with an even sparser alphabet if it was optimizing for economy: ∈ ( ) ∧ ¬ ∀ x ‘ would suffice, since the other connectives and ∃ are definable from these. As time passed and you got through more of the book, more and more things were defined in terms of the alphabet of ZFC: subsets, ordered pairs, functions from one set to another, transitivity, partial orders, finiteness, natural numbers, order types, induction, recursion, countability, real numbers, and limits. By the last chapter it was breathtaking to read a sentence filled with complex concepts and realize that every single one of these concepts was ultimately grounded in this super simple formal language we started with, along with a finitistic sound and complete system of rules for how to use each one.

But could it be possible to really fully define ALL the terms used by the end of the book? And even if it were, could the book be written in such a way as to allow an alien that begins understanding nothing of your language to read it and, by the end, understand everything in the book? Even worse, what if the alien not only understands nothing of your language, but starts out understanding nothing of the concepts involved? This might be a nonsensical notion: an alien that can read a book and do any level of sophisticated reasoning but doesn’t understand concepts like “and” and “or”.

One way that language is learned is by “pointing”: somebody asks me what a tree is, so I point to some examples of trees and some examples of non-trees, clarifying which is a tree and which is not. It would be helpful if in this book we could point to simple concepts by means of interactive programs. Imagine an e-book where an alien reading the book encounters some exceedingly simple programs that they can experiment with, putting in inputs and seeing what results. For instance, we might have a program that takes as input either 00, 01, 10, or 11, and outputs the ∧ operation applied to the two input digits. Nothing else would be allowed as input, so after playing with the program for a little bit you learn everything that it can do.
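As a concrete illustration, here is a minimal Python version of that program (the exact interface is just an assumption; the point is only that the program’s whole behavior can be learned by exhausting its four legal inputs).

```python
# The "pointing" program described above: it accepts exactly the strings
# 00, 01, 10, 11 and returns the ∧ of the two digits, rejecting all else.
def and_program(bits):
    if bits not in ("00", "01", "10", "11"):
        raise ValueError("only 00, 01, 10, or 11 are allowed")
    return "1" if bits == "11" else "0"

for example in ("00", "01", "10", "11"):
    print(example, "->", and_program(example))   # 0, 0, 0, 1
```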

One feature of such a book would be that it would probably use nothing above first-order logical concepts. The reason is that the semantics of second-order logic cannot be captured by any sound and complete proof system, meaning that there’s no finitistic set of rules one could explain to an alien so that they know how to use the concepts involved correctly. Worse, the set of second-order tautologies is not even recursively enumerable (worse than the set of first-order tautologies, which is merely undecidable), so no amount of pointing-to-programs would suffice. First-order ZFC can define a lot, but can it define enough to write a book on what it can define?

On Self-Hating Theories of Arithmetic

Gödel’s second incompleteness theorem tells us that no (sufficiently powerful) consistent theory can prove the statement of its own consistency. But did you know that such a theory can prove the statement of its own inconsistency? A consistent theory that claims to be inconsistent is what I’ll call a self-hating theory.

My convention in what follows: ℕ refers to the real, true natural numbers, the set consisting of {0, 1, 2, 3, …} and nothing else. ω refers to the formal object that exists in a theory of arithmetic that is “supposed to be” ℕ, but (in first-order logic) cannot be guaranteed to be so.

When I write ℕ ⊨ ψ, I am saying that the sentence ψ is true of the natural numbers. When I write T ⊢ ψ (resp. T ⊬ ψ), I am saying that the sentence ψ can be (resp. can’t be) proven from the axioms of the theory T. And when I write T ⊨ ψ, I am saying that the axioms of T semantically entail the truth of ψ (or in other words, that ψ comes out true in all models of T). The next two paragraphs will give some necessary background on Gödel’s encoding, and then we’ll explore the tantalizing claims I started with.

Gödel’s breathtakingly awesome insight was that within any language that is expressive enough to talk about natural number arithmetic, one can encode sentences as numbers and talk about syntactic features of these sentences as properties of numbers. When a number n encodes a sentence ψ, we write n = ⟦ψ⟧. Then Gödel showed that you can have sentences talking about the provability of other sentences. (The next step, of course, was showing that you can have sentences talking about their own provability – sneaking in self-reference through the back door of any structure attempting to describe arithmetic.)

In particular, in any theory of natural number arithmetic T, one can write a sentence that on its surface appears to just be a statement about the properties of natural numbers, but when looked at through the lens of Gödel’s encoding, ends up actually encoding the sentence “T ⊢ ψ”. And this sentence is itself encoded as some natural number! So there’s a natural number n such that n = ⟦T ⊢ ψ⟧. It’s a short step from here to generating a sentence that encodes the statement of T’s own consistency. We merely need to encode the sentence “¬∃n (n = ⟦T ⊢ 0=1⟧)”, or in English, there’s no number n such that n encodes a proof of “0=1” from the axioms of T. In even plainer English, no number encodes a proof of contradiction from T (from which it follows that there IS no proof of contradiction from T, as any proof of contradiction would be represented by some number). We write this sentence as Con(T).

Okay, now we’re in a position to write the original claim of this post more formally. If a theory T is consistent, then ℕ ⊨ Con(T). And Gödel’s second incompleteness theorem tells us that if ℕ ⊨ Con(T), then T ⊬ Con(T). But if T doesn’t prove the sentence Con(T), then no contradiction can be derived by adding ¬Con(T) as an axiom! So (T + ¬Con(T)) is itself a consistent theory, i.e. ℕ ⊨ Con(T + ¬Con(T)). But hold on! (T + ¬Con(T)) can prove its own inconsistency! Why? Because (T + ¬Con(T)) ⊢ ¬Con(T), i.e. it proves that a contradiction can be derived from the axioms of T, and it also has as axioms every one of the axioms of T! So any number that encodes a proof of the inconsistency of T also counts as a proof of the inconsistency of (T + ¬Con(T))!

Summarizing this all:

ℕ ⊨ Con(T)

T ⊬ Con(T)

ℕ ⊨ Con(T + ¬Con(T)),
but
(T + ¬Con(T)) ⊢ ¬Con(T + ¬Con(T))

There we have it, a theory that is consistent but proves its own inconsistency!

Expressed another way (writing T now for the self-hating theory itself, e.g. the old T + ¬Con(T)):

T ⊢ ∃n (n = ⟦T ⊢ 0=1⟧),
but
T ⊬ 0=1

Ok, so believe it or not, a lot of the strangeness of this can be explained away by thinking about the implications of nonstandard models of arithmetic. One easy way to see this is to reflect on the fact that, as we saw above, “T is consistent” becomes in Gödel’s encoding, “There is no natural number n such that n encodes a proof of T’s inconsistency.” Or more precisely, “T is consistent” becomes “There is no natural number n such that n = ⟦T ⊢ 0=1⟧.”

Now, no first-order theory can pin down the natural numbers.

(I’ve written about this here and here.) That is, no first-order theory can express a quantification like “there is no natural number N such that …” in a way that is guaranteed to range over exactly the natural numbers. You can try, for sure, by defining some object ω and adding axioms to restrict its structure to look more and more like ℕ, but no matter how hard you try, no matter how many axioms you add, there will always be models of the theory in which ω ≠ ℕ. In particular, ω will be a strict superset of ℕ in all of these nonstandard models (ℕ ⊂ ω), so that ω contains all the naturals but also additional nonstandard numbers.

So now consider what happens when we try to quantify over the naturals by saying “∀x ∈ ω”. This quantifier inevitably ranges over ALL of the elements of ω in each model, so it also touches the nonstandard numbers in the nonstandard models. This means that the theory only semantically entails quantified statements that are true of all possible nonstandard numbers! (Remember, T ⊨ ψ means that ψ is true in ALL models of T.)

One nice consequence of this is that if T has a model in which ω = ℕ, then in this model “∀x∈ω Φ(x)” is true only if Φ(x) is true of all natural numbers. By the soundness of first-order logic, this means that T can’t prove “∀x∈ω Φ(x)” unless it’s true of ℕ. This is reassuring; if T ⊢ ∀x∈ω Φ(x) and T has a model in which ω = ℕ, then ℕ ⊨ ∀x∈ω Φ(x).

But the implication doesn’t go the other way! ℕ ⊨ ∀x∈ω Φ(x) does not guarantee us that T ⊢ ∀x∈ω Φ(x), because T can only prove that which is true in EVERY model. So T can only prove “∀x∈ω Φ(x)” if Φ(x) is true of all the naturals and every nonstandard number in every model of T!

This is the reason that we don’t know for sure that if Goldbach’s conjecture is true of ℕ, then it’s provable in Peano arithmetic. On the face of it, this looks quite puzzling; Goldbach’s conjecture can be written as a first-order sentence and first-order logic is complete, so if it’s true then how could we possibly not prove it? The answer is hopefully clear enough now: Goldbach’s conjecture might be true of all of ℕ but false of some nonstandard models of Peano arithmetic (henceforth PA).

You might be thinking “Well if so, then we can just add Goldbach’s conjecture as an axiom to PA and get rid of those nonstandard models!” And you’re right, you will get rid of those nonstandard models. But you won’t get rid of all the nonstandard models in which Goldbach’s conjecture is true! You can keep adding as axioms statements that are true of ℕ but false of some nonstandard model, and as you do this you rule out more and more nonstandard models. At the end of this process (once your theory consists of literally all the first-order sentences that are true of ℕ), you will have created what is known as “True Arithmetic”: {ψ | ℕ ⊨ ψ}.

But guess what! At this point, have you finally ruled out all the nonstandard models? No! There are still many, many more (infinitely many, in fact! Nonstandard models of every infinite cardinality! So many models that no cardinality is large enough to describe how many!) Pretty depressing, right? There are all these models that agree with ℕ on every first-order sentence! But they are still not ℕ (most obviously because they contain numbers larger than 0, 1, 2, and all the rest of ℕ).

The nonstandard models of True Arithmetic are the models that are truly irremovable in any first-order theory of arithmetic. Any axiom you add to try to remove them will also remove ℕ as a model. And when you remove ℕ as a model, some pretty wacky stuff begins to happen.

Fully armed now with new knowledge of nonstandard numbers, let’s return to the statement I started with at the top of this post: there are consistent theories that prove their own inconsistency. The crucial point, the thing that explains this apparent paradox, is that all such theories lack ℕ as a model.

If you think about this for a minute, it should make sense why this must be the case. If a theory T is consistent, then the sentence “∀x∈ω (x ≠ ⟦T ⊢ 0 = 1⟧)” is true in a model where ω = ℕ. So if T has such a model, then T simply can’t prove its own inconsistency, as it’s actually not inconsistent and the model where ω = ℕ will be able to see that! And once more, T can only prove what’s true in all of its models.

Okay, so now supposing T is consistent (i.e. ℕ ⊨ Con(T)), by Gödel’s second incompleteness theorem, T cannot prove its own consistency. This means that (T + ¬Con(T)) is a consistent theory! But (T + ¬Con(T)) no longer has ℕ as a model. Why? Because ℕ ⊨ Con(T) and (T + ¬Con(T)) ⊨ ¬Con(T). So for any consistent theory T, (T + ¬Con(T)) only has nonstandard models. What does this mean about the things that T + ¬Con(T) proves? It means that they no longer have to be true of ℕ. So for instance, even though ℕ ⊨ Con(T + ¬Con(T)), (T + ¬Con(T)) might end up proving ¬Con(T + ¬Con(T)). And in fact, it does prove this! As we saw up at the top of this post, a moment’s examination will show that (T + ¬Con(T)) asserts as an axiom that a contradiction can be derived from the axioms of T, but also contains all the axioms of T! So by monotonicity, (T + ¬Con(T)) proves ¬Con(T + ¬Con(T)).

What do we say of this purported proof of contradiction from (T + ¬Con(T))? Well, we know for sure that it’s not a standard proof, one that would be accepted by a mathematician. That is, the theory asserts that there’s some n in ω that encodes a proof of contradiction from (T + ¬Con(T)). But this n is not actually a natural number; it’s a nonstandard number. And nonstandard numbers “encode proofs” only in the sense that they satisfy the formal proof predicate inside the model; Gödel’s arithmetic encoding only translates actual natural numbers into actual proofs. So if we attempted to decode n, we’d find that the “proof” it encodes was never really a proof at all: a fake that passes as acceptable by wearing the arithmetic guise of a real proof, but in actuality proves nothing whatsoever.

Summarizing:

In first order logic, every theory of arithmetic has nonstandard models that foil our attempts to prove all the truths of ℕ. Theories of arithmetic with ONLY nonstandard models and no standard model can prove things that don’t actually hold true of ℕ. In particular, since theories of arithmetic can encode statements about their own consistency, theories that don’t have ℕ as a model can prove their own inconsistency, even if they really are consistent.

So much for first order logic. What about

Second Order Logic?

As you might already know, second order logic is capable of ruling out all nonstandard models. There are second order theories that are categorical for ℕ. But there’s a large price tag for this achievement: second order logic has no sound and complete proof system!

Sigh. People sometimes talk about nature being tricky, trying to hide aspects of itself from us. Often you hear this in the context of discussions about quantum mechanics and black holes. But I think that the ultimate trickster is logic itself! Want a logic that’s sound and complete? Ok, but you’ll have to give up the expressive power to allow yourself to talk categorically about ℕ. Want to have a logic with the expressive power to talk about ℕ? Ok, but you’ll have to give up the possibility of a sound and complete proof system. The ultimate structure of ℕ remains shifty, slipping from our view as soon as we try to look closely enough at it.

Suppose that T is a second order theory that is categorical for ℕ. Then for every second-order sentence ψ that is true of ℕ, T ⊨ ψ. But we can’t make the leap from T ⊨ ψ to T ⊢ ψ without a complete proof system! So there will be semantic implications of T that cannot actually be proven from T.

In particular, suppose T is consistent. Then T ⊨ Con(T), but T ⊬ Con(T), by Gödel’s second. And since T ⊬ Con(T), (T + ¬Con(T)) is consistent. But since T ⊨ Con(T), (T + ¬Con(T)) ⊨ Con(T). So (T + ¬Con(T)) ⊨ Con(T) ∧ ¬Con(T)!

In other words, T + ¬Con(T) actually has no model! But it’s consistent! There are consistent second-order theories that are actually not logically possible – that semantically entail a contradiction and have no models. How’s that for trickiness?

Finiteness can’t be captured in a sound, complete, finitary proof system

Consider the sentence “This blog has finitely many posts.” Do you understand what this sentence means? Then no set of rules (even infinite, even uncomputable!) can model your reasoning. This claim may sound shocking, but it can be justified on solid metamathematical grounds.

Another example: the sentence “There are finitely many planets in the universe.” You don’t have to think it’s true, you just have to think you understand what it means. What’s the common theme? It’s the notion of there being ‘finitely many’ of some class of objects. Let’s imagine building a language that has the expressive resources of first-order logic (which are quite modest), plus an additional quantifier F, whose semantics are given by the following rule: Fx φ(x) is satisfied by a model iff there are only finitely many objects in that model that satisfy φ(x).

It turns out that any language consisting of first-order logic plus the quantifier F can’t be axiomatized in any sound, complete, and finitary proof system. Notice that effective enumerability of the rules of the proof system is not a requirement here! So long as the language is strong enough to express the semantics of {∧, ¬, ∀, F, variables xₙ, and relations Rₙ}, no set of sentences and sentence-manipulation rules in that language will suffice to capture these semantics.

Here’s a proof: consider the first-order theory of Peano arithmetic. This theory has nonstandard models (as any theory of arithmetic must have in a logic that is compact). All of these nonstandard models have the following feature: that there are numbers that are larger than infinitely many numbers. Think about what this means: this is a common feature of all nonstandards, so if we could write a sentence to express this feature then we could rule them all out! This is where the quantifier F steps in. With F, we can write the following simple sentence and add it to PA as an axiom:

∀x Fy (y < x)

In English: every number has finitely many numbers less than it. And with that, we’ve ruled out all nonstandard models! So now we have a theory that is categorical for ℕ. And that’s a big deal, metamathematically speaking!

Why? Well, as I’ve talked about in a previous post, you can prove some pretty strong limitative results about any logic that can produce a theory that’s categorical for ℕ. In particular, if we can produce such a theory then its logic cannot be compact. Quick proof: suppose a set of sentences Σ is categorical for ℕ. Add to Σ a constant c and the axioms “c ≠ 0”, “c ≠ 1”, “c ≠ 2”, and so on, and call this new set Σ’. Every finite subset of Σ’ is satisfied by ℕ (just interpret c as a large enough natural number). So by compactness, Σ’ has a model. But this model is nonstandard – it contains a number, namely the interpretation of c, that is distinct from every natural number. And since Σ is a subset of Σ’, any model of Σ’ is also a model of Σ, contradicting the assumption that ℕ is the only model of Σ.

So compactness implies that no theory is categorical for ℕ. But compactness follows from three properties of a proof system: soundness (if Σ ⊢ α then Σ ⊨ α), completeness (if Σ ⊨ α then Σ ⊢ α), and the finiteness of proofs – every proof is finitely long, and so uses only finitely many sentences from Σ (try expressing this property without F!). Quick proof: if every finite subset of a set of sentences has a model, then no finite subset of it can be refuted (by soundness), so the entire set can’t be refuted (since any refutation would be a finite proof using only finitely many of the sentences), so the entire set has a model (by completeness).

So soundness + completeness + finiteness ⇒ compactness ⇒ the existence of nonstandard models of arithmetic in any theory that has ℕ as a model. Which means that the semantics of F cannot be captured in any sound, complete, and finitary proof system!

Take your pick: either you don’t really understand the semantics of the “finitely many” quantifier F, or no set of rules (not even requiring this set to be finite or computable) can fully capture your reasoning in finite-length proofs.

More information about related extensions of first-order logic and their limitations can be found here. The result I describe here is a rephrasing of results discussed there.

Meaning ain’t in the brain

I don’t know if there’s a name for the position that the meanings of our terms are pinned down by facts about the brain. The closest I know is semantic internalism, but a semantic internalist could think that meaning is pinned down by facts about qualia, which happen to not be facts about the brain. So I’ll make up a name for this position: call it physicalist semantic internalism.

Now, here’s an argument against physicalist semantic internalism that seems totally right to me.

What I mean by “second-order logical concepts” is the concepts of “and”, “or”, “not”, second-order quantifiers (“for all” and “for some”, ranging over not just objects but properties of objects), and the notions of functions, relations, and concepts.

  1. The semantics of second order logic captures what I mean when I use second-order logical concepts.
  2. No finite set of rules (and correspondingly no finite machine) can pin down the semantics of second order logic.
  3. So no finite machine pins down what I mean when I use second-order logical concepts.
  4. My brain is a finite machine.
  5. So my brain does not pin down what I mean when I use second-order logical concepts.

And here’s another argument along similar lines:

  1. The truth values of sentences about integers are determined by what we mean by integers.
  2. The statement of the satisfiability of each Diophantine equation has a determinate truth value.
  3. The statement of the satisfiability of each Diophantine equation is a statement about integers.
  4. So the satisfiability of each Diophantine equation is fixed by what we mean by integers.
  5. No finite machine can fix the satisfiability of each Diophantine equation.
  6. Our brain is a finite machine.
  7. So the meaning of integers is not contained in the brain.