Constructing ordinal numbers in ZFC

Today I want to talk about ordinal numbers in ZFC set theory. Vsauce does a great job introducing his viewers to the concepts of ordinal vs cardinal numbers, and giving a glimpse into the weird and wild world of mathematical infinity. I want to go a bit deeper, and show exactly how the ordinals are constructed in ZFC. Let’s begin!

Okay, so first of all, we’re talking about first-order ZFC, which is an axiomatic formalization of set theory in first-order logic. As a quick reminder, first-order logic gives us access to the following alphabet of symbols: ( ) , ∧ ∨ ¬ → ↔ ∀ ∃ =, as well as an infinite store of variables (x, y, z, w, v, u, and so on). A first-order language also includes a store of constant symbols, relation symbols, and function symbols.

For first-order set theory, we are going to add only a single extra-logical symbol to our alphabet: the “is-an-element-of” relation ∈. This is pretty remarkable when you consider that almost all of mathematics can be done with just ZFC! In some sense, you can give a pretty good description of mathematics as the study of the elementhood relation! Using just ∈ we can define everything we need, from ⊆ and ⋃ and ⋂ to the empty set ∅ and the power set function P(x). In fact, as we’ll see, we’re even going to define numbers using ∈!

The elementhood relation ∈ is given its intended meaning by the axiom of extensionality: ∀x∀y (x=y ↔ ∀z (z∈x ↔ z∈y)). In plain English this says that two sets are the same exactly when they have all the same elements.

The semantics of first-order logic has two parts: a “universe” of individuals that are quantified over by ∀ and ∃, and an interpretation of each of the constant symbols (as individual objects in the universe), the relation symbols (as maps from tuples of individuals to truth values), and the function symbols (as maps from tuples of individuals back into the universe).

Our universe is going to be entirely composed of sets. This means that sets won’t be composed of non-set elements; the elements of non-empty sets are always themselves sets. And those sets themselves, if non-empty, are made out of sets. It’s sets all the way down!

Now, the topic of this essay is ordinal numbers in ZFC. So if everything in ZFC is a set, guess what ordinal numbers will be? You got it, sets! What sets? We can translate from ordinals to sets in a few words: the ordinal 0 is translated as the empty set, and every other ordinal is translated as the set of all smaller ordinals.

This tells us that 1 = {0}, 2 = {0, 1}, 3 = {0, 1, 2}, and so on. If we were to write these ordinals entirely in set notation, it would look like: ∅, {∅}, {∅,{∅}}, {∅,{∅},{∅,{∅}}}, and so on. The choice to associate these particular sets with the natural numbers is a convention introduced by John von Neumann (it is, however, an exceedingly wise convention, and has many virtues that competing conventions do not have, as will become clearer once we ascend to the transfinite).
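
To make the convention concrete, here is a minimal sketch that models hereditarily finite sets as Python frozensets (a modeling assumption, not part of ZFC; the names `von_neumann` and `render` are hypothetical). Each ordinal is built literally as the set of all smaller ordinals, and `render` writes it back out in the nested-brace notation used above.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def von_neumann(n: int) -> frozenset:
    """Return the finite von Neumann ordinal n: the set of all smaller ordinals."""
    return frozenset(von_neumann(k) for k in range(n))


def render(x: frozenset) -> str:
    """Write a finite ordinal in brace notation, with ∅ for the empty set."""
    if not x:
        return "∅"
    # Distinct finite ordinals have distinct sizes, so sorting by size lists 0, 1, 2, ... in order.
    return "{" + ",".join(render(e) for e in sorted(x, key=len)) + "}"


for n in range(4):
    print(n, render(von_neumann(n)))
```

The last line printed is `3 {∅,{∅},{∅,{∅}}}`, matching the expansion of 3 given above.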

So, now we know that we are going to associate the finite ordinals with the empty set and with sets built up out of the empty set. But of course, we haven’t yet even shown that the empty set exists in ZFC! To actually construct the empty set in ZFC, we have to add some more axioms. Let’s start with the obvious one: a sentence that asserts the existence of the empty set:

Axiom of Empty Set: ∃x∀y ¬(y∈x)

In other words, there’s some set x that contains no sets. Notice that we didn’t refer to the empty set by name. We can’t refer to it by name in our axioms, because we haven’t included any constant symbols in our language! To actually talk about the particular set ∅ (equivalently, 0), we can use the rule of existential instantiation, which allows us to remove any existential quantifier, as long as we change the name of the quantified variable to something that has not been previously used. So, for example, in any particular proof, we can do the following:

1. ∃x∀y ¬(y∈x) (Axiom of Empty Set)
2. ∀y ¬(y∈0) (from 1 by existential instantiation)

This is allowed so long as the symbol 0 has not appeared anywhere previously in the proof.

Now that we have the empty set, we need to be able to construct 1={0} and 2={0,1}. To do this, we introduce the axiom of pairing:

Axiom of Pairing: ∀x∀y∃z∀w (w∈z ↔ (w=x ∨ w=y))

This says that we can take any two sets x and y and form a new set z = {x, y}. We can right away use this axiom to construct the number 1.

3. ∀x∀y∃z∀w (w∈z ↔ (w=x ∨ w=y)) (Axiom of Pairing)
4. ∃z∀w (w∈z ↔ (w=0 ∨ w=0)) (from 3 by universal instantiation)
5. ∀w (w∈1 ↔ (w=0 ∨ w=0)) (from 4 by existential instantiation)
6. ∀w (w∈1 ↔ w=0)

We went from 3 to 4 by using universal instantiation (instantiating both variables x and y as 0), and from 4 to 5 by using existential instantiation (instantiating z as 1, which is allowed because we haven’t used the symbol 1 yet). The step from 5 to 6 is technically skipping a bunch of steps. Even though it’s obvious that we can replace (w=0 ∨ w=0) with w=0 inside the formula, there isn’t any particular rule of inference in first-order logic that does it in one step. But since it is obvious that 5 semantically entails 6, and since first-order logic has a complete proof system (this is Gödel’s completeness theorem), we know that 5 also syntactically entails 6.

To construct 2, we can simply use pairing again with 0 and 1:

7. ∃z∀w (w∈z ↔ (w=0 ∨ w=1)) (from 3 by universal instantiation)
8. ∀w (w∈2 ↔ (w=0 ∨ w=1)) (from 7 by existential instantiation)

We might think that we could simply use pairing once more to get 3 = {0,1,2}. But this won’t quite work. If we use pairing on 1 and 2 we get {1,2}, and if we use pairing again on this and 0 we get {0,{1,2}}, not {0,1,2}. In fact, we can easily see that any single use of pairing produces a set with at most two elements (exactly one when we pair a set with itself). And the set 3 has three elements! So pairing is not enough to get us where we want to go. To get to 3, we need a stronger axiom, which will allow us to take the union of sets.
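
Here is a small sketch of that argument, with finite sets modeled as Python frozensets (an assumption of the sketch; `pair` is a hypothetical helper standing in for the Axiom of Pairing):

```python
from itertools import combinations_with_replacement


def pair(x: frozenset, y: frozenset) -> frozenset:
    """Axiom of Pairing: the set {x, y}, which collapses to {x} when x == y."""
    return frozenset({x, y})


zero = frozenset()
one = pair(zero, zero)   # {0}
two = pair(zero, one)    # {0, 1}
three = frozenset({zero, one, two})  # the target {0, 1, 2}, written down directly for comparison

# Pairing 1 with 2, then pairing that result with 0, gives {0, {1, 2}}.
attempt = pair(zero, pair(one, two))
print(attempt == three, len(attempt))  # False 2

# However the inputs are chosen, a single pairing never has more than two elements.
pool = [zero, one, two, pair(one, two)]
print(max(len(pair(a, b)) for a, b in combinations_with_replacement(pool, 2)))  # 2
```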

Axiom of Union: ∀x∃y∀z (z∈y ↔ ∃w(z∈w ∧ w∈x))

This says that for any set x, we can construct a new set y which consists of the union of all sets in x. In other words, z is an element of y exactly when z is contained in some set w that’s an element of x.

Now, here’s how we’re going to construct 3. First we construct the set {2} by pairing 2 with itself. Then we construct the set {2,{2}} with pairing again. Then we union all the elements of this set to get 2⋃{2}. Now remember that 2 = {0,1}, so 2⋃{2} = {0,1}⋃{2} = {0,1,2}. And that’s 3!

Constructing {2}:

9. ∃z∀w (w∈z ↔ (w=2 ∨ w=2)) (from 3 by universal instantiation)
10. ∀w (w∈{2} ↔ (w=2 ∨ w=2)) (from 9 by existential instantiation)
11. ∀w (w∈{2} ↔ w=2)

Constructing {2,{2}}:

12. ∃z∀w (w∈z ↔ (w=2 ∨ w={2})) (from 3 by universal instantiation)
13. ∀w (w∈{2,{2}} ↔ (w=2 ∨ w={2})) (from 12 by existential instantiation)

Constructing 3:

14. ∀x∃y∀z (z∈y ↔ ∃w(z∈w ∧ w∈x)) (Axiom of Union)
15. ∃y∀z (z∈y ↔ ∃w(z∈w ∧ w∈{2,{2}})) (from 14 by universal instantiation)
16. ∀z (z∈3 ↔ ∃w(z∈w ∧ w∈{2,{2}})) (from 15 by existential instantiation)

Technically, we’ve now constructed 3. But let’s neaten this up and show that this set is really what we wanted (using more of the intuitive semantic arguments from before to skip many tedious steps).

17. ∀z (z∈3 ↔ ∃w(z∈w ∧ (w=2 ∨ w={2}))) (from 13,16)
18. ∀z (z∈3 ↔ ∃w((z∈w ∧ w=2) ∨ (z∈w ∧ w = {2})))
19. ∀z (z∈3 ↔ (z∈2 ∨ z∈{2}))
20. ∀z (z∈3 ↔ (z∈2 ∨ z=2)) (from 11,19)
21. ∀z (z∈3 ↔ (z=0 ∨ z=1 ∨ z=2)) (from 8,20)
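
The same three steps can be mirrored with Python frozensets. This is a sketch that assumes frozensets are a faithful stand-in for finite sets; `pair` and `union` are hypothetical helpers playing the roles of the Axioms of Pairing and Union, and the comments point back to the proof lines they imitate.

```python
def pair(x: frozenset, y: frozenset) -> frozenset:
    """Axiom of Pairing: the set {x, y}."""
    return frozenset({x, y})


def union(x: frozenset) -> frozenset:
    """Axiom of Union: every z that is an element of some w in x."""
    return frozenset(z for w in x for z in w)


zero = frozenset()
one = pair(zero, zero)            # {0}
two = pair(zero, one)             # {0, 1}

singleton_two = pair(two, two)    # lines 9-11: {2}
outer = pair(two, singleton_two)  # lines 12-13: {2, {2}}
three = union(outer)              # lines 14-16: the union of {2, {2}}

# Line 21: the elements of 3 are exactly 0, 1 and 2.
print(three == frozenset({zero, one, two}))  # True
```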

We can construct 4 in pretty much the exact same way: first use pairing to construct {3} and then {3,{3}}, and then use union to construct 3⋃{3} = {0,1,2}⋃{3} = {0,1,2,3} = 4. I’ll go through this all formally one more time, more quickly than before:

22. ∀w (w∈{3} ↔ w=3) (pair 3 with 3)
23. ∀w (w∈{3,{3}} ↔ (w=3 ∨ w={3})) (pair 3 with {3})
24. ∀z (z∈4 ↔ ∃w(z∈w ∧ w∈{3,{3}})) (union {3,{3}})
25. ∀z (z∈4 ↔ (z=0 ∨ z=1 ∨ z=2 ∨ z=3)) (neatening up)

It should be clear that this process can continue indefinitely. From the ordinal 4 we can construct 5 by taking the union of 4 and {4}, and from 5 we can construct 6 = 5⋃{5}. And so on. In fact, we can define the successor of any set x as exactly the set x⋃{x}: S(x) = x⋃{x}. And using the above construction, we know that this successor set will always exist!
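
As a sketch (again assuming Python frozensets as a model of finite sets; `successor` and `nat` are hypothetical names), the successor operation can be assembled from a pairing of x with itself, a second pairing, and a union, just as in the constructions of 3 and 4, and then iterated from the empty set:

```python
def successor(x: frozenset) -> frozenset:
    """S(x) = x ⋃ {x}, built as the union of the pair {x, {x}}."""
    singleton = frozenset({x})                      # pair x with itself: {x}
    x_and_singleton = frozenset({x, singleton})     # pair x with {x}: {x, {x}}
    return frozenset(z for w in x_and_singleton for z in w)  # union of {x, {x}}


def nat(n: int) -> frozenset:
    """Apply the successor operation n times, starting from the empty set."""
    x = frozenset()
    for _ in range(n):
        x = successor(x)
    return x


four = nat(4)
print(len(four), all(nat(k) in four for k in range(4)))  # 4 True
print(successor(four) == four | {four})                  # True
```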

Wonderful! So now we have an outline for the construction of every natural number. What’s next? What comes after 0, 1, 2, 3, 4, and so on? The first infinite ordinal, ω! Just as 4 is the set {0,1,2,3}, and 10 is the set of all the ordinals before it, ω is defined as exactly the set of all previous ordinals. In other words, ω is the set of all natural numbers! ω = {0,1,2,3,…}.

Now, how can we construct ω in ZFC? Can we do it using the axioms we have so far? You might be tempted to say something like “Sure! You’ve just demonstrated a process that constructs n+1 from any n, and we know how to construct 0 already. So don’t we already have the ability to construct all the natural numbers?”

Well, hypothetical you, it’s true that we now know how to construct each natural number. But constructing the infinite set containing all natural numbers is an entirely different matter. Remember that proofs are only allowed to be finitely long! So in any proof using only the methods we’ve used so far, we can only construct finitely many natural numbers (proportional to the length of the proof). To get ω, we need something more than the axioms we have so far. Introducing: the axiom of infinity!

But before we get there, I want to construct a handy bit of shorthand which will make what comes next a lot easier to swallow. What we’ll do is write out as a first-order sentence the assertion “x is the successor of y”, as well as the sentence “x is the empty set”, and then introduce a shorthand notation for them. Trust me, it will make life a lot easier.

First, “x is the successor of y”, which we can also write as “x = y⋃{y}”. Try this for yourself before reading on! Ok, now that you’re back, here it is: ∀z (z∈x ↔ (z∈y ∨ z=y)). We’ll call this sentence Succ(x,y). So if you ever see “Succ(x,y)” in the future, read it as “x is the successor of y” and know that if we wanted to be fully formal about it we could replace it with “∀z (z∈x ↔ (z∈y ∨ z=y))”.

Good! Now, let’s do the same with the sentence “x is the empty set”, which is the same thing as “x is 0”. Try it for yourself! And now, here it is: ∀y ¬(y∈x). We’ll call this sentence isZero(x).
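
Since the quantifiers in isZero and Succ range over the entire universe of sets, a program cannot evaluate them literally. The sketch below assumes that a small hand-picked collection of sets can stand in for the universe; because the collection contains every element of each of its members, the bounded checks agree with the real ones for sets drawn from it. `UNIVERSE`, `is_zero` and `succ` are hypothetical names.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def ordinal(n: int) -> frozenset:
    """Finite von Neumann ordinal n = {0, 1, ..., n-1}."""
    return frozenset(ordinal(k) for k in range(n))


# Stand-in universe: the ordinals 0..6 plus the non-ordinals {2} and {5, 1}.
UNIVERSE = [ordinal(k) for k in range(7)] + [
    frozenset({ordinal(2)}),
    frozenset({ordinal(5), ordinal(1)}),
]


def is_zero(x: frozenset) -> bool:
    """isZero(x): ∀y ¬(y∈x), with y ranging over UNIVERSE."""
    return all(y not in x for y in UNIVERSE)


def succ(x: frozenset, y: frozenset) -> bool:
    """Succ(x, y): ∀z (z∈x ↔ (z∈y ∨ z=y)), with z ranging over UNIVERSE."""
    return all((z in x) == (z in y or z == y) for z in UNIVERSE)


print(is_zero(ordinal(0)), is_zero(ordinal(1)))               # True False
print([k for k in range(7) if succ(ordinal(4), ordinal(k))])  # [3]
```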

Now we’re ready for the axiom of infinity!

Axiom of Infinity: ∃x∀y ((isZero(y) → y∈x) ∧ (y∈x → ∃z (Succ(z,y) ∧ z∈x)))

If this axiom looks like a lot to comprehend, imagine it without our shorthand! Conceptually, what’s going on with this axiom is actually pretty simple. We’re just asserting that there exists a set x that contains 0 and is closed under the successor operation, which forces x to be infinite. So this set is guaranteed to contain 0, as well as the successor of 0 (1), and the successor of the successor of 0 (2), and the successor of this (3), and so on forever. (Bonus question: what does the set theoretic universe look like if we remove the axiom of infinity and add its negation as an axiom instead? What mathematical structure is it isomorphic to?)

Quiz question for you: have we now constructed ω? That is, the axiom of infinity does guarantee us the existence of a set, but are we sure that that set is exactly the set of natural numbers and nothing more?

The answer is no. The axiom of infinity does guarantee us the existence of an infinite set, and we know for sure that this set contains all the natural numbers, but there’s nothing guaranteeing that it doesn’t also contain other sets! To actually obtain ω, we need one more axiom. This axiom will be the most powerful one we’ve seen yet: the axiom of comprehension.

Axiom of Comprehension: ∀x∃y∀z (z∈y ↔ (z∈x ∧ φ(z)))

This tells us that for any set x, we can construct a new set y, which consists of exactly the elements of x that have a certain property φ. In set-builder notation, we can write: y = {z∈x: φ(z)}. (Bonus question: why do we have to define y as the subset of x that satisfies φ? Why not just say that there exists a set of all sets that satisfy φ? This unrestricted comprehension axiom appeared in the early formalizations of set theory, but there was a big problem with it. What was it?)

You may notice that there’s something different about this axiom than the previous ones. What’s up with that symbol φ(z)? Well, φ(z) is a stand-in for any well-formed formula in the language of ZFC, so long as φ contains only z as a free variable. What that means is that there’s actually not one single axiom of comprehension, but a countably infinite axiom schema, one for each well-formed formula φ.

For instance, we have as one instance of the axiom of comprehension that ∀x∃y∀z (z∈y ↔ (z∈x ∧ z=z)). As another instance of the axiom, we have ∀x∃y∀z (z∈y ↔ (z∈x ∧ ¬(z=z))). Both of these are pretty trivial examples: in the first case the set y is exactly the same as x (as all sets are self-identical), and in the second case y is the empty set (as no set z satisfies ¬(z=z)). But we can do the same thing for any property whatsoever, so long as it can be encoded in a formula of first-order ZFC. (Another bonus question: One of the axioms I’ve mentioned before has now been obviated by the introduction of these new axioms. Can you figure out which it is, and produce its derivation?)
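
In programming terms, comprehension is a filter over a set that already exists. Here is a sketch with finite sets modeled as Python frozensets (an assumption; `separate` is a hypothetical name). An arbitrary Python predicate is not the same thing as a first-order formula, so this only illustrates the shape of the axiom, using the two instances above plus one more:

```python
def separate(x: frozenset, phi) -> frozenset:
    """Comprehension: the subset {z ∈ x : φ(z)}, with φ given as a Python predicate."""
    return frozenset(z for z in x if phi(z))


zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})
x = frozenset({zero, one, two})  # the ordinal 3

print(separate(x, lambda z: z == z) == x)                         # True: z=z gives x back
print(separate(x, lambda z: not (z == z)) == zero)                # True: ¬(z=z) gives ∅
print(separate(x, lambda z: zero in z) == frozenset({one, two}))  # True: elements containing 0
```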

We use the axiom of comprehension to “carve ω out” from the set whose existence is guaranteed by the axiom of infinity (remember, we already know for sure that this set contains all the natural numbers, it’s just that it might contain more elements as well). So what we need is to construct a formula φ(z) that is satisfied by exactly the natural numbers: every natural number z satisfies φ(z), and no other set does.

There are several such sentences. I’ll briefly present one simple one here. Again we’ll introduce a convenient shorthand for the sake of sanity. Take a look at this sentence that we saw earlier: “∀y ((isZero(y) → y∈x) ∧ (y∈x → ∃z (Succ(z,y) ∧ z∈x)))”. What this sentence says is that x is a superset of ω (it contains 0 and is closed under successorhood). So we’ll call this sentence “hasAllNaturals(x)”.

Now, we can write the following sentence: ∀x (hasAllNaturals(x) → z∈x).

Consider what this sentence says. It tells us that z is an element of every set that contains all the naturals. But one such set is the smallest set containing all the naturals, i.e. ω! So z must be an element of ω. In other words, z is a natural number. Conversely, every natural number belongs to every set that contains all the naturals, so every natural number satisfies this sentence. So this sentence will do for our definition of φ(z).

φ(z): ∀x (hasAllNaturals(x) → z∈x)

Just like with hasAllNaturals(x) and Succ(x,y) and isZero(x), you should read φ(z) as simply a shorthand for the above sentence. If we really wanted to torture ourselves with formality, we could write out the entire sentence using only the allowed symbols of first-order ZFC.

Now we can finally get ω. Let’s continue with our proof progression from earlier. We left off at 25, so:

26. ∃x∀y ((isZero(y) → y∈x) ∧ (y∈x → ∃z (Succ(z,y) ∧ z∈x))) (Axiom of Infinity)
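27. ∀y ((isZero(y) → y∈inf) ∧ (y∈inf → ∃z (Succ(z,y) ∧ z∈inf))) (from 26 by existential instantiation)
28. ∀x∃y∀z (z∈y ↔ (z∈x ∧ φ(z))) (Axiom of Comprehension)
29. ∃y∀z (z∈y ↔ (z∈inf ∧ φ(z))) (from 28 by universal instantiation)
30. ∀z (z∈ω ↔ (z∈inf ∧ φ(z))) (from 29 by existential instantiation)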

In going from line 26 to 27, we gave the infinite set guaranteed us by the axiom of infinity a placeholder name, “inf”. Line 30 is what we’ve been aiming for for the last few hundred words, and it honestly looks a little underwhelming. It’s not so immediately clear from this line that ω has all the properties that we want of the natural numbers. But at the same time, we couldn’t write something like “∀z (z∈ω ↔ (z=0 ∨ z=1 ∨ z=2 ∨ …))”, because first-order logic is finitary (we aren’t allowed infinitely long sentences). So we have to make do with a definition of ω that may look a little more abstract than we might like. Suffice it to say that line 30 really does serve as an adequate definition of ω. It tells us that ω is the smallest set that contains all natural numbers. From this, we can pretty easily show that any particular natural number is an element of ω, and (less easily) that any other set (say, the set {2} or {5,1}) is not an element of ω.

If you’ve followed so far, give yourself a serious pat on the back. Together we’ve ascended past the realm of the finite to our first transfinite ordinal. This is no small accomplishment. But our journey does not end here. In fact, it has only barely begun. We’re going to start picking up speed from here on out, because as you’ll see, the ground we have yet to cover is much, much greater than the ground we’ve covered so far.

The first step is easy. We already saw earlier how you can construct for any set x its successor set x⋃{x}. This construction didn’t rely on our sets being finite; it works just as well for the set ω. So ω has a successor! We’ll call it ω+1! Don’t believe me? I’ll prove it to you:

31. ∀x (x∈{ω} ↔ x=ω) (pair ω with ω)
32. ∀x (x∈{ω,{ω}} ↔ (x=ω ∨ x={ω})) (pair ω with {ω})
33. ∀x (x∈ω+1 ↔ ∃y(x∈y ∧ y∈{ω,{ω}})) (union {ω,{ω}})
34. ∀x (x∈ω+1 ↔ (x∈ω ∨ x=ω)) (neatening up)

There we have it! ω+1 = {0,1,2,3,…,ω}

And it doesn’t stop there: we can construct ω+2 = {0,1,2,3,…,ω,ω+1}. And ω+3 = {0,1,2,3,…,ω,ω+1,ω+2}. And so on, forever! By allowing the existence of one infinity, we’ve actually entailed the existence of an infinity of infinities!

But what’s next? We now have all the finite ordinals, and all the infinite ordinals of the form ω+n for finite n. What comes after this? Clearly, the next ordinal is just the set of all finite ordinals as well as all ordinals of the form ω+n! This ordinal is the smallest ordinal that’s larger than ω+n for every finite n. So a natural name for it is ω+ω, or ω⋅2!

So conceptually ω⋅2 makes sense, but can we actually construct it? At first glance, this may seem unlikely. The axiom of infinity contains no guarantee that the infinite set it grants us contains ω, to say nothing of ω+1, ω+2, and the rest. And no finite amount of constructing successors will allow us to make the jump to ω⋅2 (for much the same reason as we needed the axiom of infinity to make the jump to ω). So perhaps we need to have a new axiom asserting the existence of ω⋅2? And then maybe we need a new axiom for ω⋅3, and ω⋅4, and so on forever! That would be a sad situation.

Well, things aren’t quite that bad. It turns out that we do need a new axiom. But we can do better than just guaranteeing the existence of ω⋅2. We’ll introduce an axiom schema that is by far the most powerful of all the axioms of ZFC. This axiom schema will take us far beyond ω⋅2, beyond ω⋅ω even, and far far beyond ω^ω and ω^ω^ω^… with infinitely many exponents. Introducing: the Axiom of Replacement! (dramatic music plays)

But first, we’re going to need to talk a little bit about the set theoretic notion of functions. I promise, it’ll be as quick as I can manage. Remember how we chose our language for ZFC to have no function symbols, only the single relation symbol ∈? This means that we have to build in functions through different means. We’ve already seen a hint of it when we talked about the sentence Succ(x,y) which said that x was the successor of y. This sentence is true for exactly the sets x and y such that x is the successor of y. We can think of the sentence as “selecting” ordered pairs (x,y) such that x = y⋃{y}. In other words, this sentence is filling the role of defining the function y ↦ y⋃{y}.

The same applies more generally. For any function f(x), we can construct the sentence F(x,y) which asserts “y = f(x)”. For instance, the identity function id(x) = x will be defined by the sentence Id(x,y): “y=x”. Notice that not all functions from sets to sets can be defined in this way, as we only have countably many sentences in our language to work with.

Suppose we’re handed a sentence φ(x,y). How are we to tell if φ(x,y) represents a function or just an ordinary sentence? Well, functions have the property that any input is mapped to exactly one output. We can write this formally: “∀x∀y∀z ((φ(x,y) ∧ φ(x,z)) → y=z)”. For shorthand, we’ll call this sentence isAFunction(φ).
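
Here is a sketch of that check, assuming the quantifiers can be restricted to a small finite collection of sets and modeling formulas as two-argument Python predicates (the names `UNIVERSE`, `is_a_function` and the three relations are hypothetical). It confirms that the identity relation and the successor relation are functional, while membership is not, since 2 contains both 0 and 1:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def ordinal(n: int) -> frozenset:
    """Finite von Neumann ordinal n = {0, 1, ..., n-1}."""
    return frozenset(ordinal(k) for k in range(n))


# Stand-in for the universe that the quantifiers of isAFunction range over.
UNIVERSE = [ordinal(k) for k in range(6)]


def is_a_function(phi) -> bool:
    """isAFunction(φ): ∀x∀y∀z ((φ(x,y) ∧ φ(x,z)) → y=z), bounded to UNIVERSE."""
    return all(
        y == z
        for x in UNIVERSE
        for y in UNIVERSE
        for z in UNIVERSE
        if phi(x, y) and phi(x, z)
    )


def identity(x: frozenset, y: frozenset) -> bool:
    return y == x  # Id(x, y): "y = x"


def successor_of(x: frozenset, y: frozenset) -> bool:
    return y == x | {x}  # "y = x ⋃ {x}", i.e. Succ(y, x)


def member_of(x: frozenset, y: frozenset) -> bool:
    return y in x  # "y ∈ x": relates 2 to both 0 and 1


print(is_a_function(identity), is_a_function(successor_of), is_a_function(member_of))
# True True False
```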

And now we’re ready for the axiom schema of replacement.

Axiom of Replacement: isAFunction(φ) → ∀x∃y∀z (z∈y ↔ ∃w (w∈x ∧ φ(w,z)))

In English, this says that for any definable function f (defined by φ), and for any set of inputs x, the image of x under f exists. Like with the axiom schema of comprehension, this is a countably infinite collection of axioms, one for each well formed formula φ(w,z).

Let’s use this axiom to construct ω⋅2. First we define a function f that takes any finite number n to the ordinal ω+n. (Challenge: Try to explicitly define this function! Notice that the most obvious method for defining the function requires second-order logic. Try to come up with a trick that allows it to be defined in first-order ZFC!) Then we prove that this really is a function. Then we apply the axiom of replacement with our domain as ω (the set of all natural numbers). The image of ω under f is the set {ω,ω+1,ω+2,ω+3,…}. Not quite what we want. But we’re almost there!

Next we use the axiom of pairing to pair this newly constructed set with ω itself, giving us {{0,1,2,…}, {ω,ω+1,ω+2,…}}. And finally, we apply the axiom of union to this set, giving us {0,1,2,…,ω,ω+1,ω+2,…} = ω⋅2!
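
The replace-then-pair-then-union recipe can be tried out on a finite analogue, with 3 playing the role of ω: replacing each n ∈ 3 by 3 + n gives {3, 4, 5}, and pairing that with 3 and taking the union gives 6 = 3 + 3. This is only a sketch under stated assumptions: finite sets are modeled as Python frozensets, the universe that z ranges over is cut down to the ordinals 0 through 9, and all helper names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def ordinal(n: int) -> frozenset:
    """Finite von Neumann ordinal n = {0, 1, ..., n-1}."""
    return frozenset(ordinal(k) for k in range(n))


# Stand-in for the universe that z ranges over: the ordinals 0..9.
UNIVERSE = [ordinal(k) for k in range(10)]


def replace(x: frozenset, phi) -> frozenset:
    """Replacement: {z : ∃w (w∈x ∧ φ(w,z))}, with z ranging over UNIVERSE."""
    return frozenset(z for z in UNIVERSE if any(phi(w, z) for w in x))


def union(x: frozenset) -> frozenset:
    """Union: every element of every element of x."""
    return frozenset(z for w in x for z in w)


def shift_by_three(w: frozenset, z: frozenset) -> bool:
    """φ(w, z): "z = 3 + w". A finite ordinal's size equals its value, so len()
    is used as a shortcut here; this is not itself a first-order formula."""
    return len(z) == 3 + len(w)


three = ordinal(3)
image = replace(three, shift_by_three)     # {3, 4, 5}, playing the role of {ω, ω+1, ω+2, …}
result = union(frozenset({three, image}))  # pair with 3, then take the union
print(result == ordinal(6))                # True: the finite analogue of ω+ω
```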

Phew, that was a lot of work just to make the jump from ω to ω⋅2. But it was worth it! Now we can also jump to ω⋅3 in the exact same way!

Define a function f that maps n to ω⋅2+n. Now use replacement with ω as the domain to construct the set {ω⋅2, ω⋅2+1, ω⋅2+2, …}. Pair this set with ω⋅2, and union the result. This gives ω⋅3!

In exactly the same way, we can use the axiom of replacement to get ω⋅4, ω⋅5, ω⋅6, and so on to ω⋅n for any finite n! But it doesn’t stop there. We’ve just described a procedure to get ω⋅n for any finite n. So we can write it as a function!

Define f as the function that maps n to ω⋅n. Now use replacement of f with ω as the domain to get the set {0, ω, ω⋅2, ω⋅3, ω⋅4, ω⋅5, …}. Apply the axiom of union to this set, and what do you get? Well it has to contain ω⋅n for every finite n, since each ω⋅n is contained in ω⋅m for every larger m. So it’s larger than all ordinals of the form ω⋅n for finite n. This new ordinal we’ve constructed is called ω⋅ω, or ω².

But of course, our journey doesn’t stop there. We can use replacement to generate ω² + ω, and ω² + ω⋅2, ω² + ω⋅3, and so on. But we can go further and use replacement to generate ω² + ω², or ω²⋅2! And from there we get ω²⋅3, ω²⋅4, and so on. And applying replacement to the function from n to ω²⋅n, we get a new ordinal larger than everything before, called ω³.

As you might be suspecting, this goes on forever. We can continually apply the axiom of replacement to get mind-bogglingly large infinities, each greater than the last. But here’s the kicker: all of these infinite ordinals that we’ve created so far? They all have the same cardinality.

Yes, that’s right. You may have thought that we had transcended ω in cardinality long ago, but no. For each infinite ordinal we’ve created so far, there’s a one-to-one mapping between it and ω. Think about it: every time we used replacement, we applied a function to a countable domain (ω), and the image of a set under a function can never have a larger cardinality than the set itself. So each time we use replacement with a countable domain, we get a new countable set, each of whose elements is countable. And then if we use pairing or union, we’re always going to stay in the domain of the countable (unioning two countable sets just gets you another countable set, as does unioning a countable infinity of countable sets).
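
One way to see this for ω²: every ordinal below ω² can be written uniquely as ω⋅a + b with a and b natural numbers, so it is enough to match pairs (a, b) one-to-one with the naturals. Cantor's pairing function does exactly that. The Python below is a sketch of that bijection (the function names are hypothetical); `cantor_unpair(n)` returns the pair (a, b) standing for ω⋅a + b.

```python
import math
from typing import Tuple


def cantor_pair(a: int, b: int) -> int:
    """Cantor's pairing function: a bijection from pairs of naturals onto the naturals."""
    return (a + b) * (a + b + 1) // 2 + b


def cantor_unpair(n: int) -> Tuple[int, int]:
    """Inverse of cantor_pair: recover the unique (a, b) with cantor_pair(a, b) == n."""
    s = (math.isqrt(8 * n + 1) - 1) // 2  # largest s with s(s+1)/2 <= n, i.e. s = a + b
    b = n - s * (s + 1) // 2
    return s - b, b


# Read (a, b) as the ordinal ω⋅a + b below ω².
print([cantor_unpair(n) for n in range(6)])
# [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
```

So this enumeration of ω² begins 0, ω, 1, ω⋅2, ω+1, 2, … and every ω⋅a + b shows up exactly once.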

So now turn your mind to the following set: the set of all countable ordinals. Is this set itself an ordinal? It is if it’s the smallest uncountable ordinal, as then it contains every ordinal smaller than itself by definition! And what’s the cardinality of this set? It can’t be countable, because then it would have to be an element of itself! (Bonus question: Why is it that no ordinal can be an element of itself? Use the axiom of extensionality! Bonus question 2: In general, no set can be an element of itself. But this is not guaranteed by the axioms I’ve presented so far. The axiom that does the job is called the axiom of regularity/foundation, and it says that every nonempty set must have an element that it is disjoint with. Why does this prevent us from having sets that contain themselves?)

So this ordinal is the first uncountable ordinal. Its name is ω₁. The cardinality of ω₁ is the smallest cardinality that follows the cardinality of ω, and it’s called “aleph one” and written ℵ₁.

Now, I’ve already said that the same old paradigm we’ve used so far can’t get us to ω₁. So can we get there with ZFC? The answer is yes, although at the moment I’m not quite sure exactly how. In fact, not only can ZFC construct ω₁, it can also construct ω₂ (the first ordinal with a larger cardinality than ω₁), ω₃, ω₄, and so on forever.

So does ZFC have any limits? Or can we in principle construct every ordinal, using ever more ingenious means to transcend the limitations of our previous paradigms? The answer is no: ZFC is limited. There are ordinals so large that even ZFC cannot prove their existence (these ordinals have what’s called inaccessible cardinalities). To construct these new infinities, we must add them in as axioms, just as we had to for infinity (and indeed, for 0).

One might think that when we get to ordinals that are this mind-bogglingly large, nothing of any consequence could follow from asserting their existence. But this is not the case! Remarkably, if you add these new infinities to your theory, you can prove the consistency of ZFC. (That is, the consistency of the theory I’ve presented so far, which does not have these large cardinal axioms.) And to prove the consistency of this new theory, you must add even larger infinities. And now to prove the consistency of this one, you must expand the universe of sets again! And so on forever.

One might ask: “So how big is the universe of sets really?” At what point do we content ourselves to stop axiomatically asserting new and larger infinities, and say that we’ve obtained an adequate formalization of what the universe of sets looks like? I’m really not sure how to think about this question. Anyway, next up will be ordinal notation, and the notion of computable ordinals!

Meaning ain’t in the brain

I don’t know if there’s a name for the position that the meanings of our terms are pinned down by facts about the brain. The closest I know is semantic internalism, but a semantic internalist could think that meaning is pinned down by facts about qualia, which happen to not be facts about the brain. So I’ll make up a name for this position: call it physicalist semantic internalism.

Now, here’s an argument against physicalist semantic internalism that seems totally right to me.

By “second-order logical concepts” I mean the concepts of “and”, “or”, “not”, second-order quantifiers (“for all” and “for some”, ranging over not just objects but properties of objects), and the notions of functions, relations, and concepts.

  1. The semantics of second-order logic captures what I mean when I use second-order logical concepts.
  2. No finite set of rules (and correspondingly no finite machine) can pin down the semantics of second-order logic.
  3. So no finite machine pins down what I mean when I use second-order logical concepts.
  4. My brain is a finite machine.
  5. So my brain does not pin down what I mean when I use second-order logical concepts.

And here’s another argument along similar lines:

  1. The truth values of sentences about integers are determined by what we mean by integers.
  2. The statement of the satisfiability of each Diophantine equation has a determinate truth value.
  3. The statement of the satisfiability of each Diophantine equation is a statement about integers.
  4. So the satisfiability of each Diophantine equation is fixed by what we mean by integers.
  5. No finite machine can fix the satisfiability of each Diophantine equation.
  6. Our brain is a finite machine.
  7. So the meaning of integers is not contained in the brain.
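
Premise 5 rests on a real theorem: by the MRDP theorem (Matiyasevich, Robinson, Davis, and Putnam), no algorithm decides, for every Diophantine equation, whether it has an integer solution. The half of the job that a finite machine can do is sketched below: a brute-force search that halts when a solution exists but, when none exists, never halts and so never delivers a verdict. The polynomial is passed as an ordinary Python function, and `find_solution` and its optional `max_radius` cap (there only so the example terminates) are hypothetical.

```python
from itertools import count, product


def find_solution(poly, num_vars: int, max_radius=None):
    """Search the integer points in boxes [-r, r]^num_vars (num_vars >= 1) for a root of poly.

    With max_radius=None this halts exactly when a solution exists; if there is
    none, it runs forever. A finite max_radius makes it give up and return None.
    """
    radii = count() if max_radius is None else range(max_radius + 1)
    for r in radii:
        for point in product(range(-r, r + 1), repeat=num_vars):
            # Points strictly inside the box were already checked at a smaller radius.
            if max(abs(c) for c in point) == r and poly(*point) == 0:
                return point
    return None


print(find_solution(lambda x, y: x * x + y * y - 25, 2))      # (-4, -3)
print(find_solution(lambda x: x * x - 2, 1, max_radius=50))  # None: the capped search gave up
```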

On philosophical progress

A big question in the philosophy of philosophy is whether philosophers make progress over time. One relevant piece of evidence that gets brought up in these discussions is the lack of consensus on age old questions like free will, normative ethics, and the mind body problem. If a discipline is progressing steadily towards truth with time, the argument goes, then we should expect that questions that have been discussed for thousands of years should be more or less settled by now. After all, that is what we see in the hard sciences; there are no lingering disputes over the validity of vitalism or the realm of applicability of Newtonian mechanics.

There are a few immediate responses to this line of argument. It might be that the age old questions of philosophy are simply harder than the questions that get addressed by physicists or biologists. “Harder” doesn’t mean “requires more advanced mathematics to grapple with” here, but something more like “it’s unclear what even would count as a conclusive argument for one position or another, and therefore much less clear how to go about building consensus.” Try to imagine what sort of argument would convince you of the nonexistence of libertarian free will with the same sort of finality as a demonstration of time dilation convinces you of the inadequacy of nonrelativistic mechanics.

A possible rejoinder at this point would be to take after the logical positivists and deny the meaningfulness or at least truth-aptness of the big questions of philosophy as a whole. This may go too far; it may well be that a query is meaningful but, due to certain epistemic limitations of ours, forever beyond our ability to decide. (We know for sure that such queries can exist, due to Gödelian discoveries in mathematics. For instance, we know of the existence of a sequence of numbers that are perfectly well defined, but for which no algorithm can exist to compute all of them. The later numbers in this sequence will forever be a mystery to us, and not for lack of meaningfulness.)

I think that the roughly correct position to take is that science is largely about examining empirical facts-of-the-matter, whereas philosophy is largely about analyzing and refining our conceptual framework. While we have a fairly clear set of standards for how to update theories about the empirical world, we are not in possession of such a set of standards for evaluating different conceptual frameworks. The question of “what really are the laws governing the behavior of stuff out there” has much clearer truth conditions than a question like “what is the best way to think about the concepts of right and wrong”; i.e., it’s clearer what counts as a good answer and what counts as a bad answer.

When we’re trying to refine our concepts, we are taking into account our pre-theoretical intuitions (e.g. any good theory of the concept of justice must have something to do with our basic intuitive conception of justice). But we’re not just satisfied to describe the concept solely as the messy inconsistent bundle of intuitions that constitute our starting position on it. We also aim to describe the concept simply, by developing a “theory of justice” that relies on a small set of axioms and from which (the hope is) the rest of our conclusions about justice follow. We want our elaboration of the concept to be consistent, in that we shouldn’t simultaneously affirm that A is an instance of the concept and that A is not an instance of the concept. Often we also want our theory to be precise, even when the concept itself has vague boundaries.

Maybe there are other standards besides these four (intuitiveness, simplicity, consistency, and precision). And the application of these standards is very rarely made explicit. But one thing that’s certain is that different philosophers have different mixes of these values. One philosopher might value simplicity more or less than another, and it’s not clear that one of them is doing something wrong by having different standards. Put another way, I’m not convinced that there is one unique right set of standards for conceptual refinement.

We may want to be subjectivists to some degree about philosophy, and say that there are a range of rationally permissible standards for conceptual refinement, none better than any other. This would have the result that on some philosophical questions, multiple distinct answers may be acceptable but some crazy enough answers are not. Maybe compatibilism and nihilism are acceptable stances on free will but libertarianism is not. Maybe dualism and physicalism are okay but not epiphenomenalism. And so on.

This view allows for a certain type of philosophical progress, namely the gradual ruling out of some philosophical positions as TOO weird. It also allows for formation of consensus, through the discovery of philosophical positions that are the best according to all or most of the admissible sets of standards. I think that one example of this would be the relatively recent rise of Bayesian epistemology in philosophy of science, and in particular the Bayesian view of scientific evidence as being quantified by the Bayes factor. In brief, what does it mean to say that an observation O gives evidence for a hypothesis H? The Bayesian not only has an answer to this, but to the more detailed question of to what degree O gives evidence for H. The quantity is cr(O | H) / cr(O), where cr(.) is a credence function encoding somebody’s beliefs before observing O. If this quantity is equal to 1, then O is no evidence for H. If it is greater than 1, then O is evidence for H. And if it’s less than 1, then O is evidence against H.
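A minimal sketch of this measure in Python, with made-up credences for illustration (the function names are hypothetical). One terminological note: the name “Bayes factor” is more often given to the likelihood ratio cr(O | H) / cr(O | ¬H). Assuming 0 < cr(H) < 1, that ratio is above, at, or below 1 exactly when cr(O | H) / cr(O) is, so both classify O as evidence for, neutral toward, or against H in the same way.

```python
from fractions import Fraction

def evidence_ratio(p_h, p_o_given_h, p_o_given_not_h):
    """cr(O | H) / cr(O), with cr(O) expanded by the law of total probability."""
    p_o = p_o_given_h * p_h + p_o_given_not_h * (1 - p_h)
    return p_o_given_h / p_o

def bayes_factor(p_o_given_h, p_o_given_not_h):
    """The likelihood ratio cr(O | H) / cr(O | not-H)."""
    return p_o_given_h / p_o_given_not_h

# Hypothetical credences: cr(H) = 1/4, cr(O | H) = 3/5, cr(O | not-H) = 1/5.
prior = Fraction(1, 4)
ratio = evidence_ratio(prior, Fraction(3, 5), Fraction(1, 5))
factor = bayes_factor(Fraction(3, 5), Fraction(1, 5))
print(ratio, factor)  # 2 3
```

Here cr(O) = 3/10, so cr(O | H) / cr(O) = 2, which also equals the ratio of posterior (1/2) to prior (1/4); both measures exceed 1, so O counts as evidence for H.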

Not everything in Bayesian epistemology is perfectly uncontroversial, but I would argue that on this particular issue – the issue of how to best formalize the notion of scientific evidence – the Bayesian definition survives all its challenges unscathed. What are some other philosophical questions on which you think there has been definite progress?

Logic on Planet Zorko

A group of Zorkan mathematicians are sitting around having a conversation in a language that you are unfamiliar with. You are listening in with a translator. This translator is an expert in formal logic, and has decided to play the following game with you. He says:

“After listening to the full conversation, I will translate all the sentences that were said for you. But I won’t translate them into English; I want something more universal. Instead, I will choose a formal language that captures the mathematical content of all the sentences said, while leaving out the vagaries and subtleties of the Zorkan language. I will describe to you the semantics of the formal language I choose, if you don’t already know it.”

“Furthermore,” (says the translator) “I happen to be intimately familiar with Zorkan society and culture. The Zorkans are having a discussion about one particular mathematical structure, and I know which one that is. The mathematicians are all fantastically precise reasoners, such that none of them ever says a sentence that is false of the structure that they are discussing.”

(So for instance if they are talking about the natural numbers, then no mathematician will say “0 = 1”, and if they are talking about abelian groups, then no mathematician will say “∃x∃y (xy ≠ yx)”. But they could say “∃x∃y (xy ≠ yx)” if they are talking about non-abelian groups.)

You know nothing about Zorkan psychology, besides that the Zorkan way of life is so utterly foreign to you that you cannot reliably assume that the mathematical structures that come most naturally to you will also come naturally to them. It might be, for instance, that nonstandard models of arithmetic are much more intuitive to them than the natural numbers. You cannot assume that the structure they are discussing is the one that you think is “most natural”; you can only conclude this if one of them says a sentence that is true of that model and no others.

The conversation finishes, and you are tasked with answering the following two questions:

(1) What structure are they talking about?
(2) Can you come up with a verification procedure for the mathematicians’ sentences (including possible future sentences they might say on the topic)?

So, that’s the setup. Now, the question I want you to consider is the following: Suppose that the structure that the mathematicians have in mind is actually the natural numbers. Is there some conversation, any conversation at all (even allowing infinitely long conversations, and uncomputable conversations – conversations which cannot be produced as the output of any Turing machine), that the mathematicians could have, and some translation of this conversation, such that you can successfully answer both (1) and (2)? If so, what is that conversation? And if not, then why not?

✯✯✯

Let’s work out some simple examples.

Example 1

Suppose the conversation is translated into a propositional language with three atomic propositions {P, Q, R}.

Mathematician A: “P ∨ Q”
Mathematician B: “(Q ∨ R) → (¬P)”
Mathematician C: “R”

From this conversation, you can deduce that the model they are talking about is the one that assigns “False” to P, “True” to Q, and “True” to R.

M: {P is false, Q is true, R is true}

This is the answer to question 1!

As for the second question, we want to know if there’s some general procedure that produces all the future statements the mathematicians could make. For instance, the set generated by our procedure should include (Q ∧ R) but not (Q ∧ P).

It turns out that such a procedure does exist, and is not too difficult to write out and implement.
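Here is one such procedure, sketched in Python under the assumption that each sentence is encoded as a predicate on a truth assignment (an encoding chosen here for illustration; the helper names are hypothetical). It simply checks all 2³ = 8 truth assignments:

```python
from itertools import product

ATOMS = ("P", "Q", "R")

def implies(a, b):
    # Material conditional: false only when a is true and b is false.
    return (not a) or b

# The three sentences of Example 1, as predicates on a valuation v.
conversation = [
    lambda v: v["P"] or v["Q"],                       # P ∨ Q
    lambda v: implies(v["Q"] or v["R"], not v["P"]),  # (Q ∨ R) → ¬P
    lambda v: v["R"],                                 # R
]

def models(sentences):
    """Every truth assignment to the atoms that makes all sentences true."""
    found = []
    for values in product([False, True], repeat=len(ATOMS)):
        v = dict(zip(ATOMS, values))
        if all(s(v) for s in sentences):
            found.append(v)
    return found

def follows(sentences, candidate):
    """Decide whether candidate is true in every model of sentences."""
    return all(candidate(v) for v in models(sentences))

print(models(conversation))  # [{'P': False, 'Q': True, 'R': True}]
print(follows(conversation, lambda v: v["Q"] and v["R"]))  # Q ∧ R: True
print(follows(conversation, lambda v: v["Q"] and v["P"]))  # Q ∧ P: False
```

Since this conversation has exactly one model, `follows` agrees with truth in M, so it decides every possible future statement.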

Example 2

Take the above conversation and modify it slightly:

Mathematician A: “P ∨ Q”
Mathematician B: “(Q ∨ R) → (¬P)”
Mathematician C: “¬R”

If you work it out, you’ll see that question 1 can no longer be answered unambiguously. The problem is that there are multiple models of the sentences that the mathematicians are saying:

M1: {P is false, Q is true, R is false}
M2: {P is true, Q is false, R is false}

So even though they have one particular structure in mind, you don’t have enough information from their conversation to figure out exactly what that structure is.

Now let’s think about the answer to question 2. We don’t know whether the mathematicians are thinking about M1 or M2, and M1 and M2 differ in what truth value they assign the proposition P. So we can’t construct an algorithm that will generate the set of all their possible future statements, as this would require us to know, in particular, whether P is true or false in the model that they have in mind.
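Running the same brute-force search on Example 2 shows both models and their disagreement about P. The helpers are re-declared so the sketch stands alone; the encoding is again illustrative:

```python
from itertools import product

ATOMS = ("P", "Q", "R")

def implies(a, b):
    # Material conditional.
    return (not a) or b

# Example 2: identical to Example 1 except that Mathematician C says ¬R.
conversation = [
    lambda v: v["P"] or v["Q"],                       # P ∨ Q
    lambda v: implies(v["Q"] or v["R"], not v["P"]),  # (Q ∨ R) → ¬P
    lambda v: not v["R"],                             # ¬R
]

def models(sentences):
    valuations = (dict(zip(ATOMS, vals)) for vals in product([False, True], repeat=len(ATOMS)))
    return [v for v in valuations if all(s(v) for s in sentences)]

found = models(conversation)
print(found)  # [{'P': False, 'Q': True, 'R': False}, {'P': True, 'Q': False, 'R': False}]
# The two models disagree about P, so neither P nor ¬P is entailed.
print(sorted(v["P"] for v in found))  # [False, True]
```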

We might suspect that this holds true generally: if you can’t answer question 1, then you won’t be able to answer question 2 either. But we might also wonder: if we can answer question 1, then can we also always answer question 2?

The answer is no, as the next example will show.

Example 3

For this conversation, the translation is in second-order logic. This will allow us to talk about more interesting mathematical structures than before; namely, structures that have a domain of objects on which functions and predicates can act. In particular, we’re in a second-order language with one constant symbol “c” and one function symbol “f”. Here’s the conversation:

Mathematician A: ¬∃x (f(x) = c)
Mathematician B: ¬∃x∃y ((f(x) = f(y)) ∧ ¬(x = y))
Mathematician C: ∀R ((R(c) ∧ ∀x(R(x) → R(f(x)))) → ∀x R(x))

Notice that the only truly second-order sentence is the third one, in which we quantify over a predicate variable R rather than over individual variables x, y, z, …. Because this sentence is second-order, the translator could not possibly have translated this conversation into a first-order language, much less a propositional one.

This time, questions 1 and 2 are much harder to answer than before. But if you work it out, you’ll see that there is exactly one mathematical structure (up to isomorphism) that satisfies all three of the mathematicians’ statements. And that structure is the natural numbers!

So, we know exactly what structure the mathematicians have in mind. But can we also answer question 2 in the positive? Can we produce some verification procedure that will allow us to generate all the future possible sentences the mathematicians could say? Unfortunately, the answer is no. There is no sound and complete proof system for second-order logic, so in particular, we have no general algorithm for producing all the truths in this second order language. So sad.

Example 4

Now let’s move to first-order logic for our final example. The language of translation will be a first-order language with a constant symbol for every natural number {0,1,2,3,…}, function symbols for ordinary arithmetic {+, ×}, and a relation symbol for order {≥}.

Imagine that the conversation consists of literally all the first-order sentences in the language that are true of the natural numbers. Anything which you can say in the language, and which is true as a statement about ℕ, will be said at some point. This will obviously be a very long conversation, and in fact infinitely long, but that’s fine. It will include sentences like “0 ≠ 1”, “0 ≠ 2”, “0 ≠ 3”, and so on.  (These Zorkans are extremely thorough.)

Given this conversation, can we answer (1) and (2)? Take a guess; the answer may surprise you!

It turns out that even though we can answer (2) positively (first-order logic has a sound and complete proof system with finitely long proofs, so every sentence that follows from what the mathematicians have said can be mechanically verified by a proof from their statements), we cannot answer (1) positively! There are multiple distinct, non-isomorphic mathematical structures that are compatible with the entirety of true statements about natural numbers in the language. Earlier we hypothesized that any time we have a negative answer to (1), we will also have a negative answer to (2). But this is not true! We can verify all the true statements about natural numbers in the language… without even knowing that we’re actually talking about the natural numbers! This is an important and unintuitive consequence of the expressive limitations (and in particular, of the compactness) of first-order logic.

The Takeaway

We had an example where we could answer both (1) and (2) for a simple mathematical structure (a model of propositional logic). And we saw examples for natural numbers where we could answer (1) but not (2), as well as examples where we could answer (2) but not (1). But we haven’t yet seen an example for natural numbers where we had both (1) and (2). This is no coincidence!

It is actually a consequence of the theorem I proved and discussed in my last post that no such conversation can exist. When structures at least as complicated as the natural numbers are being discussed in some language (call it L), you cannot simultaneously (1) know for sure what structure is being talked about and (2) have an algorithmic verification system for L-sentences about the structure.

A result on the incompleteness of mathematics

Suppose somebody comes up to you (keeping a healthy six-foot distance) and claims to have a formal proof system that is simultaneously

(1) sound (if x is provable from X then x is logically entailed by X)
(2) complete (if x is logically entailed by X then x is provable from X), and
(3) finite (all proofs are finitely long).

They also tell you that this proof system is in a language that is rich enough to express natural number arithmetic. Then what you can immediately conclude is that their language and semantics are insufficient to define the natural numbers. That is, there is no set of sentences X in their language such that ℕ is the only model of X. For any set of sentences you choose, there will be other non-standard models of arithmetic that are consistent with all these sentences.

I want the strength of this to sink in. Our only assumptions are syntactic in nature (our proof system being sound and complete, and proofs only being finitely long). And our result is a semantic limitation: that the structure of the natural numbers is not definable by any set of sentences. And importantly, when I say “any set of sentences”, I mean any set of sentences, not just any recursively enumerable set of sentences. You could literally just define the set of all sentences in L that correspond to true statements about the natural numbers, and that would still not be enough to uniquely pin down the natural numbers! There will always be these extra non-standard interpretations of all your sentences that correspond to very different mathematical structures with bizarre properties.

Here’s an outline of the proof.

Let F be a proof system for some semantics Ω such that the following hold:

  1. F is sound.
    If X ⊢ x, then X ⊨ x.
  2. F is complete.
    If X ⊨ x, then X ⊢ x.
  3. Every proof in F is finite.

From these three assumptions we can prove compactness. Compactness is the property of a logic that says that a set of sentences X has a model if and only if every finite subset of X has a model. And from compactness, there’s a really nice little proof that ℕ is not definable.

So, our proof is in two steps: first we show compactness, then we show that compactness entails the undefinability of ℕ. Let’s go through in more detail now.

Compactness

Definition of compactness: A set of sentences X is satisfiable if and only if every finite subset of X is satisfiable.

First, briefly:

X is satisfiable
iff (by soundness and completeness)
X is consistent
iff (by finite proofs)
Every finite subset of X is consistent
iff (by soundness and completeness)
Every finite subset of X is satisfiable.

And now, in more detail:

  • X is satisfiable
    iff (by definition)
  • X has a model
    iff (by definition)
  • For no a does X ⊨ a and X ⊨ (¬a)
    iff (by soundness and completeness)
  • For no a does X ⊢ a and X ⊢ (¬a)
    iff (by finite proofs)
  • For every finite Y ⊆ X, for no a does Y ⊢ a and Y ⊢ (¬a)
    iff (by soundness and completeness)
  • For every finite Y ⊆ X, for no a does Y ⊨ a and Y ⊨ (¬a)
    iff (by definition)
  • For every finite Y ⊆ X, Y has a model
    iff (by definition)
  • For every finite Y ⊆ X, Y is satisfiable

And that’s the proof! Ok, now that we’ve established compactness, let’s establish some of its consequences (of which there are many; compactness has massive consequences for the expressive capabilities of a logic).

Undefinability of ℕ

Suppose that ℕ were definable in some language L where compactness holds. Then there is some set of L-sentences X such that ℕ is a model of X, and no other structures are a model of X. Intuitively, this means that the set of sentences X “pins down” the natural numbers, so that you know that when you start by assuming X, you really are talking about the natural numbers and no other mathematical structures. Now we’ll derive a contradiction.

We start by adding to L a new constant symbol, call it c. We now create a new set of sentences Y by starting with X and adding assertions that c ≠ n, for every natural number n. Y = X ∪ {c ≠ 0, c ≠ 1, c ≠ 2, …}. (The exact terms for 0, 1, 2, and so on will depend on your language; for instance in standard first order PA, Y would be X ∪ {c ≠ 0, c ≠ S0, c ≠ SS0, …}, where S is the successor function).

Now we ask: Does Y have a model? The answer is yes! To prove this we apply compactness. Y has a model as long as every finite subset of Y has a model. But any finite subset of Y only includes finitely many of the sentences in X (for which we already know ℕ is a model) as well as finitely many of the sentences of the form “c ≠ n”. But this means that no finite subset of Y asserts c to be distinct from every natural number! For any finite subset of the sentences in Y, we can always find some natural number to assign to c by just choosing one that hasn’t yet been referenced! And therefore ℕ, with c interpreted as that unreferenced number, is a model of every finite subset of Y.
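Choosing the value of c is a simple mechanical step. A minimal sketch, assuming the finite subset is represented by the set of numbers n that appear in its “c ≠ n” sentences (the name `witness_for` is hypothetical). Because c does not occur in X, ℕ satisfies the X-part of the subset no matter what c denotes.

```python
def witness_for(excluded):
    """Return a natural number different from every n in the finite set `excluded`."""
    # One more than the largest mentioned number; 0 if nothing is mentioned.
    return max(excluded, default=-1) + 1

# A hypothetical finite subset of {c ≠ 0, c ≠ 1, c ≠ 2, ...}, listed by the n it mentions.
finite_part = {0, 3, 17, 42}
c = witness_for(finite_part)
print(c)                                 # 43
print(all(c != n for n in finite_part))  # True
```

The construction only works because the subset is finite: for the full infinite set of “c ≠ n” sentences there is no largest n, and no natural number can serve as c.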

So since every finite subset of Y has a model, Y itself has a model! But this model must satisfy every sentence in Y, and Y contains the assertion that c is none of the natural numbers. So in this model, there is some number that is distinct from all the natural numbers. In other words, this model is distinctly not ℕ! Let’s call it ℕ*.

Alright, so we have that the new set of sentences Y that we constructed has a non-standard model ℕ*. But what about our original set of sentences X, which was supposed to categorically describe ℕ? Unfortunately, ℕ* is also a model of X! Why? Well, X is strictly weaker than Y. We got from X to Y by adding axioms, so all that we could have done in that process is remove potential models, not add any new ones. Any model of Y has to have been a model of X from the start! So X does not uniquely describe the natural numbers.

Proof by contradiction! ℕ cannot be defined in a compact logic.

Summary

From just three simple and intuitively desirable properties of proof systems (soundness, completeness, and finite proofs), we derived compactness. And then we showed that no compact logic is capable of defining natural numbers. It turns out that from compactness you can also prove the undefinability of the collection of finite groups, of connected graphs, of the real numbers, of finite-diameter graphs, and many other structures of interest.

This result is pretty stunning in the limitations it implies for the axiomatic method of mathematics. In one sense, it’s actually stronger than the incompleteness theorems, because we never said anything about our set of sentences used to characterize the natural numbers being effectively enumerable, much less decidable. What this means is that no language that has a proof system meeting the above three requirements has the expressive capabilities to pin down the natural numbers. Even by just collecting all the true sentences of arithmetic in the language, we still don’t have enough to fully specify the structure we’re trying to talk about!

To phrase it as the contrapositive: If you want to talk about the natural numbers, you have to be willing to accept a logic which is either unsound, incomplete, or has infinitely long proofs. And none of these seem very appealing to me! The choice is yours: if you want to speak about natural numbers, the language you use to do so will either have insufficient expressive capabilities to actually pin down the topic of conversation (such that everything you say might be interpreted as having to do with a completely different mathematical structure), OR it will have an unsatisfactory proof system (such that you have no good procedure for verifying the truth of arbitrary sentences).

A Gödelian Logic Puzzle

There’s an island on which there live exactly two types of people: truthers and liars. Truthers always say true statements, and liars always say false statements. One day a brilliant logician comes to visit the island. The logician knows all of the above-stated facts about the island. It also happens that the logician is a perfectly sound reasoner – he never proves anything that is false.

The logician encounters an individual named ‘Jal’ that lives on the island. The logician knows that Jal lives on the island, and so is either a truther or a liar. Now, Jal makes a statement from which it logically follows that Jal is a truther. But the logician could never possibly prove that Jal is a truther! (Remember, we never asserted that the logician proves all true things, just that the logician proves only true things). What type of statement could accomplish this?

This puzzle is from a paper by Raymond Smullyan on mathematical logic. Try to answer it for yourself before reading on!

(…)

Alright, so here’s one possible answer. Jal could say to the logician: “You will never prove that I am a truther.” I claim that this sentence logically entails that Jal is a truther, and yet the logician cannot possibly prove it.

First of all, why does it entail that Jal is a truther? Let’s prove it by contradiction. Suppose that Jal is not a truther. Then, since Jal is either a truther or a liar, Jal must be a liar. That means that every statement Jal makes must be false. So in particular, Jal’s statement that “you will never prove that I am a truther” must be false. This entails that the logician must eventually prove that Jal is a truther. But we assumed that Jal isn’t a truther! So the logician must eventually prove a falsehood. But remember, we assumed that our logician’s proofs were always sound, so that he will never prove a falsehood. So we have a contradiction.

Therefore, Jal is a truther.

Now, why can the logician not prove that Jal is a truther? This can be seen very straightforwardly: we just proved that Jal is a truther, which means that all of Jal’s statements must be true. So in particular, Jal’s statement that “you will never prove that I am a truther” must be true. So in other words, it’s true that the logician will never prove that Jal is a truther!

So there you have it, a statement that appears to satisfy both of the criteria!

But now the next question I have for you is a bit trickier. It appears from the line of reasoning above that we have just proven that Jal is a truther. So why couldn’t the logician just run through that exact same line of reasoning? It appears to be perfectly valid, and to use nothing more advanced than basic predicate logic.

But if the logician does go through that line of reasoning, then he will conclude that Jal is a truther, which will make Jal’s statement false, which is a contradiction! So we’ve gone from something which was maybe just unintuitive to an actual paradox. Can you see how to resolve this paradox? (Again, see if you can figure it out yourself before reading on!)

(…)

Okay, so here’s the resolution. If we say that the logician can go through the same line of reasoning as us, then we reach a contradiction (that a truther tells a false statement). So we must deny that the logician can go through the same line of reasoning as us. But why not? As I said above, the reasoning is nothing more complicated than basic predicate logic. So it’s not that we’re using some magical supernatural rules of inference that no mortal logician could get his hands on. It must be that one of the assumptions we used in the argument is an assumption that the logician cannot use.

So look back through the argument, and carefully consider each of the assumptions we used:

First of all, why does it entail that Jal is a truther? Let’s prove it by contradiction. Suppose that Jal is not a truther. Then, since Jal is either a truther or a liar, Jal must be a liar. That means that every statement Jal makes must be false. So in particular, Jal’s statement that “you will never prove that I am a truther” must be false. This entails that the logician must eventually prove that Jal is a truther. But we assumed that Jal isn’t a truther! So the logician must eventually prove a falsehood. But remember, we assumed that our logician’s proofs were always sound, so that he will never prove a falsehood. So we have a contradiction.

In order, we made use of the assumptions that (1) Jal is either a truther or a liar, (2) every statement made by a liar is false, and (3) the logician is a sound reasoner.

I told you at the beginning that facts (1) and (2) are both known to the logician, but I did not say the same of (3)! The logician can only run through this argument if he knows that he is a sound reasoner (that he only proves true things). And this is the problem assumption, which must be rejected.

It’s not that no logician can actually ever be sound (a logician who only ever reasons in first order logic and nothing more fancy would be sound). It’s that the logician, though he really is sound, cannot know himself to be sound. In other words, no sound system can prove its own soundness!

This is very similar to Gödel’s second incompleteness theorem. Among proof systems strong enough to encode arithmetic, the only one that can prove its own consistency is an inconsistent one, and the only type of logician that can prove his own soundness will end up being unsound. Here’s the argument that the logician might make if they believe in their own soundness:

Supposing Jal is a liar, then his statement is false, so I could eventually prove that he is a truther. But then I’d have proven something false, which I know I can never do, so Jal must not be a liar. So he must be a truther. 

Since the logician has now produced a proof that Jal is a truther, Jal’s statement is false. This means that Jal cannot be a truther, so the logician has proven a false statement!

Crazy conditionals

It’s well known that the material implication → of propositional logic does not do a perfect job of capturing what we mean when we make “if… then…” statements in English. The usual examples of failure rest on the fact that any material conditional with a false antecedent is vacuously true (so “if 2 is odd then 2 is even” turns out to be true). But over time, philosophers have come up with a whole lot of different ways in which → can catch us by surprise.

Here’s a list of some such cases. In each case, I will present an argument using if…then… statements that is clearly invalid, but which is actually valid in propositional logic if the if…then… statements are translated as the material conditional!

1. Harper

If I put sugar into my coffee, it will taste fine.
Therefore, if I put sugar and motor oil into my coffee, it will taste fine.

S → F
(S ∧ M) → F

2. Distributivity

If I pull both switch A and switch B, the engine will start.
Therefore, either the engine will start if I pull switch A or the engine will start if I pull switch B.

(A ∧ B) → S
(A → S) ∨ (B → S)

3. Transitivity

If Biden dies before the election, Trump will win.
If Trump wins the election, Biden will retire to his home.
Therefore, if Biden dies before the election, Biden will retire to his home.

B → T
T → R
B → R

4. Principle of Explosion

Either zombies will rise from the ground if I bury a chicken head in my backyard, or zombies will rise from the ground if I don’t bury a chicken head in my backyard.

(B → D) ∨ (¬B → D) is a tautology

5. Contraposition

If I buy a car, I won’t buy a Pontiac.
Therefore, if I buy a Pontiac, I won’t buy a car.

C → ¬P
P → ¬C

6. Simplification

If John is in London then he’s in England, and if he’s in Paris then he’s in France.
Therefore, either (1) if John’s in London he’s in France or (2) if John’s in Paris then he’s in England.

(L → E) ∧ (P → F)
(L → F) ∨ (P → E)

7. False Antecedent

It’s not the case that if God exists then human life is a product of random chance.
Therefore, God exists.

¬(G → C)
G

8. True Consequent

If I will have eternal life if I believe in God, then God must exist.
I do not believe in God.
Therefore, God exists.

(B → E) → G
¬B
G

You can check for yourself that each of these is logically valid! Can you figure out what’s going wrong in each case?
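The validity claims can be checked by brute force over truth tables. This is a minimal sketch, assuming each argument is encoded as its variable letters, a list of premise predicates, and a conclusion predicate (the encoding and helper names are illustrative, and the letters follow the post). It confirms validity only; what goes wrong in each case is left to you.

```python
from itertools import product

def imp(a, b):
    # Material conditional.
    return (not a) or b

def valid(variables, premises, conclusion):
    """True iff every valuation satisfying all premises also satisfies the conclusion."""
    for values in product([False, True], repeat=len(variables)):
        v = dict(zip(variables, values))  # one letter per character of `variables`
        if all(p(v) for p in premises) and not conclusion(v):
            return False
    return True

arguments = {
    "Harper": ("SMF", [lambda v: imp(v["S"], v["F"])],
               lambda v: imp(v["S"] and v["M"], v["F"])),
    "Distributivity": ("ABS", [lambda v: imp(v["A"] and v["B"], v["S"])],
                       lambda v: imp(v["A"], v["S"]) or imp(v["B"], v["S"])),
    "Transitivity": ("BTR", [lambda v: imp(v["B"], v["T"]), lambda v: imp(v["T"], v["R"])],
                     lambda v: imp(v["B"], v["R"])),
    "Explosion": ("BD", [], lambda v: imp(v["B"], v["D"]) or imp(not v["B"], v["D"])),
    "Contraposition": ("CP", [lambda v: imp(v["C"], not v["P"])],
                       lambda v: imp(v["P"], not v["C"])),
    "Simplification": ("LEPF", [lambda v: imp(v["L"], v["E"]) and imp(v["P"], v["F"])],
                       lambda v: imp(v["L"], v["F"]) or imp(v["P"], v["E"])),
    "False Antecedent": ("GC", [lambda v: not imp(v["G"], v["C"])], lambda v: v["G"]),
    "True Consequent": ("BEG", [lambda v: imp(imp(v["B"], v["E"]), v["G"]), lambda v: not v["B"]],
                        lambda v: v["G"]),
}

for name, (variables, premises, conclusion) in arguments.items():
    print(name, valid(variables, premises, conclusion))  # all eight print True

# Sanity check that the checker can say no: inferring B → A from A → B is invalid.
print("Converse", valid("AB", [lambda v: imp(v["A"], v["B"])], lambda v: imp(v["B"], v["A"])))
```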

A Dice Puzzle

Today I have a wonderfully counterintuitive puzzle to share!

You and a friend each throw a die. Each of you can see how your own die landed, but not how your friend’s die landed. Each of you is to guess how the other’s die landed. If you both guess correctly, then you each get a reward. But if only one of you guesses correctly, neither of you gets anything.

The two die rolls are independent and you are not allowed to communicate with your friend after the dice have been thrown, though you can coordinate beforehand. Given this, you would expect that you each have a 1 in 6 chance of guessing the other’s roll correctly, coming out to a total chance of 1 in 36 of getting the reward.

The question is: Is it possible to do any better?

Answer below, but only read on after thinking about it for yourself!

 

(…)

 

(…)

 

(Spoiler space)

 

(…)

 

(…)

 

The answer is that remarkably, yes, you can do better! In fact, you can get your chance of getting the reward as high as 1 in 6. This should seem totally crazy. You and your friend each have zero information about how the other die roll turned out. So certainly each of you has a 1 in 6 chance of guessing correctly. The only way for the chance of both guessing correctly to rise above 1 in 36 would be if the two guesses being correct were somehow dependent on each other. But the two die rolls are independent of one another, and no communication of any kind is allowed once the dice have been rolled! So from where does the dependence come? Sure you can coordinate beforehand, but it’s hard to imagine how this could help out.

It turns out that the coordination beforehand does in fact make a huge difference. Here’s the strategy that both can adopt in order to get a 1 in 6 chance of getting the reward: Each guesses that the other’s die landed the same way that their own die landed. So if my die lands 3, I guess that my friend’s die landed 3 as well. This strategy succeeds whenever the dice actually do land the same way. And what’s the chance of this? 6 out of 36, or 1 out of 6!

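All 36 equally likely outcomes, with the six matching outcomes (on which the strategy wins) in brackets:

[1 1]     2 1      3 1      4 1      5 1      6 1
 1 2     [2 2]     3 2      4 2      5 2      6 2
 1 3      2 3     [3 3]     4 3      5 3      6 3
 1 4      2 4      3 4     [4 4]     5 4      6 4
 1 5      2 5      3 5      4 5     [5 5]     6 5
 1 6      2 6      3 6      4 6      5 6     [6 6]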
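The two strategies can be compared exactly by enumerating all 36 equally likely outcomes. In this sketch (the function name is hypothetical), each strategy is a function from a player’s own roll to that player’s guess for the other die:

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)

def win_probability(guess_a, guess_b):
    """Exact chance that both players guess right, over all 36 equally likely rolls.

    guess_a(x) is A's guess for B's die when A rolled x; guess_b likewise for B.
    """
    wins = sum(1 for a, b in product(FACES, FACES) if guess_a(a) == b and guess_b(b) == a)
    return Fraction(wins, 36)

# Baseline: each always guesses 1, ignoring their own roll.
print(win_probability(lambda x: 1, lambda x: 1))  # 1/36
# The coordinated strategy: guess that the other die matches your own.
print(win_probability(lambda x: x, lambda x: x))  # 1/6
```

Under the matching strategy each player is still right only 1 time in 6 on their own; what coordination changes is that the two players are right on exactly the same outcomes, so their successes are perfectly correlated.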

Proving the Completeness of Propositional Logic

The completeness of a logic is a really nice property to establish. For a logic to be complete, every semantic entailment must also be a syntactic entailment. Said more simply, whatever logically follows from a set of assumptions must be provable from them. Gödel’s incompleteness theorems showed us that we cannot have such high hopes for mathematics in general, but we can still establish completeness for some simple logics, such as propositional and first order logic.

I want to post a proof of the completeness of propositional logic here in full for future reference. Roughly the first half of what’s below is just establishing some necessary background, so that this post is fairly self-contained and doesn’t reference lemmas that are proved elsewhere.

The only note I’ll make before diving in is that the notation I(A,P) is a way to denote the smallest set that contains A and is closed under the operations in P. It’s a handy way to inductively define sets that would be enormously complicated to define otherwise. With that out of the way, here we go!

First we define the proof system for propositional logic.

Axiom 1: α→(β→α)
Axiom 2: (α→(β→γ)) → ((α→β)→(α→γ))
Axiom 3: ((¬α)→(¬β)) → (β→α)

The α, β, and γ symbols in these axioms are meant to stand for any well-formed formula. What this means is that we actually have a countable infinity of axioms that fall into the three categories above. For simplicity, I’ll keep calling them “Axioms 1, 2, and 3”, and assume you don’t find it too confusing.

You might also notice that the axioms only involve the symbols → and ¬, but neglect ∧ and ∨. This is okay because → and ¬ are adequate connectives for the semantics of propositional logic (which is to say that any truth function can be expressed in terms of them).
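As a small check (not a proof of full adequacy), the usual abbreviations of ∨ and ∧ in terms of → and ¬ can be verified row by row. The particular definitions below are one standard choice, assumed here rather than taken from the post:

```python
from itertools import product

def imp(a, b):
    # Material conditional.
    return (not a) or b

# Assumed abbreviations:
#   α ∨ β  :=  (¬α) → β
#   α ∧ β  :=  ¬(α → (¬β))
def or_via_imp(a, b):
    return imp(not a, b)

def and_via_imp(a, b):
    return not imp(a, not b)

for a, b in product([False, True], repeat=2):
    assert or_via_imp(a, b) == (a or b)
    assert and_via_imp(a, b) == (a and b)
print("definitions agree on all four rows")
```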

Axioms = {Axiom 1, Axiom 2, Axiom 3}
Deduction rule = Modus Ponens (MP)
….. MP(α, α→β) = β

The set of all provable sentences is just the smallest set of sentences that includes the axioms and is closed under modus ponens.

Theorems = I(Axioms, MP)

We can also easily talk about the set of sentences that can be proven from assumptions in a set Σ:

Th(Σ) = I(Axioms ∪ Σ, MP)
Notation: Σ ⊢ α iff α ∈ Th(Σ)
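The I(A, P) construction is easy to mimic in code when the starting set is finite. Below is a minimal Python sketch, using a tuple encoding of formulas assumed just for this example, that computes I(Σ, MP) for a finite Σ by applying modus ponens until nothing new appears. Note that it leaves out the axioms: Th(Σ) itself is infinite, because there are infinitely many axiom instances, so this only shows the closure part of the definition.

```python
# Illustrative encoding (assumed for this sketch): a variable is a string,
# ("not", a) is ¬a, and ("imp", a, b) is a→b.

def mp_closure(formulas):
    """Return I(formulas, MP): the smallest superset closed under modus ponens.

    This terminates for a finite input, because MP only ever produces
    a subformula of a formula that is already in the set.
    """
    closed = set(formulas)
    changed = True
    while changed:
        changed = False
        for f in list(closed):
            # If both a and a→b are present, modus ponens adds b.
            if isinstance(f, tuple) and f[0] == "imp" and f[1] in closed and f[2] not in closed:
                closed.add(f[2])
                changed = True
    return closed

sigma = {"p", ("imp", "p", "q"), ("imp", "q", "r"), ("imp", "s", "t")}
print(sorted(map(str, mp_closure(sigma) - sigma)))  # ['q', 'r']
```

Since s is never derived, s→t never fires, and t stays out of the closure.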

With that out of the way, let’s establish some basic but important results about the propositional proof system.

Monotonicity: If Σ ⊆ Σ’, then Th(Σ) ⊆ Th(Σ’).

Strong monotonicity: If Σ ⊢ Σ’, then Th(Σ’) ⊆ Th(Σ).

Intuitively, monotonicity says that if you expand the set of assumptions, you never shrink the set of theorems. Strong monotonicity says that if Σ can prove everything in Σ’, then Σ’ cannot be stronger than Σ. Both of these follow pretty directly from the definition of Th(Σ).

Soundness: If ⊢ α, then ⊨ α.
Proof by structural induction
….. Each axiom is a tautology.
….. The set of tautologies is closed under MP (if ⊨ α and ⊨ (α→β), then ⊨ β).
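The base case of the soundness proof can be spot-checked mechanically. The Python sketch below (formulas as nested tuples, an encoding assumed for this example) builds instances of Axioms 1 to 3 from a small, arbitrarily chosen sample of formulas and confirms by truth table that each instance is a tautology. Checking finitely many instances only illustrates the claim; the real argument is that each schema comes out true under every valuation, whatever α, β, and γ are.

```python
from itertools import product

# Illustrative encoding (assumed): variables are strings, ("not", a) is ¬a,
# and ("imp", a, b) is a→b.
def evaluate(formula, valuation):
    if isinstance(formula, str):
        return valuation[formula]
    if formula[0] == "not":
        return not evaluate(formula[1], valuation)
    return (not evaluate(formula[1], valuation)) or evaluate(formula[2], valuation)

def variables(formula):
    """The set of propositional variables occurring in a formula."""
    if isinstance(formula, str):
        return {formula}
    return set().union(*(variables(sub) for sub in formula[1:]))

def is_tautology(formula):
    """True iff the formula is true on every row of its truth table."""
    vs = sorted(variables(formula))
    return all(
        evaluate(formula, dict(zip(vs, row)))
        for row in product([True, False], repeat=len(vs))
    )

# The three axiom schemas, as functions from formulas to axiom instances.
def axiom1(a, b):
    return ("imp", a, ("imp", b, a))

def axiom2(a, b, c):
    return ("imp", ("imp", a, ("imp", b, c)), ("imp", ("imp", a, b), ("imp", a, c)))

def axiom3(a, b):
    return ("imp", ("imp", ("not", a), ("not", b)), ("imp", b, a))

# An arbitrary sample of formulas to substitute for α, β, and γ.
samples = ["p", "q", ("not", "p"), ("imp", "p", "q"), ("not", ("imp", "q", "p"))]

instances = (
    [axiom1(a, b) for a in samples for b in samples]
    + [axiom2(a, b, c) for a in samples for b in samples for c in samples]
    + [axiom3(a, b) for a in samples for b in samples]
)
print(all(is_tautology(f) for f in instances))  # True
print(is_tautology(("imp", "p", "q")))          # False
```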

Extended Soundness: If Σ ⊢ α, then Σ ⊨ α.
Proof by structural induction
….. If α ∈ Axioms or α ∈ Σ, then Σ ⊨ α.
….. If Σ ⊨ α and Σ ⊨ (α→β), then Σ ⊨ β.

Law of identity: ⊢ (α→α)
….. α→((α→α)→α), Axiom 1
….. (α→((α→α)→α))→((α→(α→α))→(α→α)), Axiom 2
….. (α→(α→α))→(α→α), MP
….. α→(α→α), Axiom 1
….. α→α, MP

Principle of Explosion: If Σ ⊢ α and Σ ⊢ (¬α), then Σ ⊢ β.
….. By strong monotonicity, it suffices to show that Σ ∪ {α} ∪ {¬α} ⊢ β.
……….. (¬α)→((¬β)→(¬α)), Axiom 1
……….. ¬α, Assumption
……….. (¬β)→(¬α), MP
……….. ((¬β)→(¬α))→(α→β), Axiom 3
……….. α→β, MP
……….. α, Assumption
……….. β, MP
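Hilbert-style proofs like the two above are mechanical enough to check by machine, which is part of their appeal. Here is a minimal Python checker, a sketch under my own conventions rather than any standard tool: formulas are nested tuples, each proof line carries a justification, and axiom lines are checked by matching against the three schemas, with metavariables A, B, C standing in for α, β, γ. It accepts the proof of α→α (taking α = p) and the explosion derivation (taking α = p, β = q).

```python
# Illustrative encoding (assumed): variables are strings, ("not", a) is ¬a,
# ("imp", a, b) is a→b, and ("meta", name) is a schema metavariable.
def NOT(a):
    return ("not", a)

def IMP(a, b):
    return ("imp", a, b)

A, B, C = ("meta", "A"), ("meta", "B"), ("meta", "C")
AXIOMS = {
    1: IMP(A, IMP(B, A)),
    2: IMP(IMP(A, IMP(B, C)), IMP(IMP(A, B), IMP(A, C))),
    3: IMP(IMP(NOT(A), NOT(B)), IMP(B, A)),
}

def match(schema, formula, binding):
    """Extend `binding` so that the schema instantiates to `formula`, if possible."""
    if isinstance(schema, tuple) and schema[0] == "meta":
        if schema[1] in binding:
            return binding[schema[1]] == formula
        binding[schema[1]] = formula
        return True
    if isinstance(schema, str) or isinstance(formula, str):
        return schema == formula
    return (schema[0] == formula[0] and len(schema) == len(formula)
            and all(match(s, f, binding) for s, f in zip(schema[1:], formula[1:])))

def check_proof(lines, assumptions=frozenset()):
    """Each line is (formula, reason). A reason is ("axiom", n), ("assumption",),
    or ("mp", i, j), meaning earlier line j is (earlier line i)→formula."""
    for k, (formula, reason) in enumerate(lines):
        if reason[0] == "axiom":
            ok = match(AXIOMS[reason[1]], formula, {})
        elif reason[0] == "assumption":
            ok = formula in assumptions
        elif reason[0] == "mp":
            i, j = reason[1], reason[2]
            ok = 0 <= i < k and 0 <= j < k and lines[j][0] == IMP(lines[i][0], formula)
        else:
            ok = False
        if not ok:
            return False
    return True

# Law of identity with α = p.
p, q = "p", "q"
pp = IMP(p, p)
identity = [
    (IMP(p, IMP(pp, p)), ("axiom", 1)),
    (IMP(IMP(p, IMP(pp, p)), IMP(IMP(p, pp), pp)), ("axiom", 2)),
    (IMP(IMP(p, pp), pp), ("mp", 0, 1)),
    (IMP(p, pp), ("axiom", 1)),
    (pp, ("mp", 3, 2)),
]

# Principle of explosion with α = p and β = q, from the assumptions {p, ¬p}.
explosion = [
    (IMP(NOT(p), IMP(NOT(q), NOT(p))), ("axiom", 1)),
    (NOT(p), ("assumption",)),
    (IMP(NOT(q), NOT(p)), ("mp", 1, 0)),
    (IMP(IMP(NOT(q), NOT(p)), IMP(p, q)), ("axiom", 3)),
    (IMP(p, q), ("mp", 2, 3)),
    (p, ("assumption",)),
    (q, ("mp", 5, 4)),
]

print(check_proof(identity))                # True
print(check_proof(explosion, {p, NOT(p)}))  # True
print(check_proof(explosion))               # False: the assumptions are missing
```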

And finally, our most important background theorem:

Deduction Theorem: Σ ⊢ (α→β) iff Σ ∪ {α} ⊢ β.
Proof =>
….. Suppose Σ ⊢ (α→β).
….. By monotonicity, Σ ∪ {α} ⊢ (α→β).
….. Also, clearly Σ ∪ {α} ⊢ α.
….. So Σ ∪ {α} ⊢ β.
Proof <=
….. Suppose Σ ∪ {α} ⊢ β. We show, by induction on the construction of Th(Σ ∪ {α}), that Σ ⊢ (α→β) for every such β.
….. Base cases
………. β ∈ Axioms. (β, Axiom; β→(α→β), Axiom 1; α→β, MP).
………. β ∈ Σ. (β, Assumption; β→(α→β), Axiom 1; α→β, MP).
………. β = α. (⊢ (α→α), so by monotonicity Σ ⊢ (α→α)).
….. Inductive step
………. Suppose δ is obtained by MP from γ and (γ→δ), where by the inductive hypothesis Σ ⊢ (α→γ) and Σ ⊢ (α→(γ→δ)).
………. By strong monotonicity, it suffices to show Σ ∪ {α→γ, α→(γ→δ)} ⊢ (α→δ).
……………. (α→(γ→δ)) → ((α→γ)→(α→δ)), Axiom 2
……………. α→(γ→δ), Assumption
……………. (α→γ)→(α→δ), MP
……………. α→γ, Assumption
……………. α→δ, MP
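The "<=" direction of the deduction theorem is really an algorithm: walk through a proof from Σ ∪ {α} line by line and emit a proof of α→(that line) from Σ, using exactly the base cases and inductive step above. The Python sketch below does this; the tuple encoding, the line format, and the names `deduce` and `check_proof` are my own assumptions for illustration. It includes a compact proof checker so the converted proof can be verified.

```python
# Illustrative encoding (assumed): variables are strings, ("not", a) is ¬a,
# ("imp", a, b) is a→b, and ("meta", name) is a schema metavariable.
def NOT(a):
    return ("not", a)

def IMP(a, b):
    return ("imp", a, b)

A, B, C = ("meta", "A"), ("meta", "B"), ("meta", "C")
AXIOMS = {1: IMP(A, IMP(B, A)),
          2: IMP(IMP(A, IMP(B, C)), IMP(IMP(A, B), IMP(A, C))),
          3: IMP(IMP(NOT(A), NOT(B)), IMP(B, A))}

def match(schema, formula, binding):
    """Can the schema's metavariables be filled in consistently to give `formula`?"""
    if isinstance(schema, tuple) and schema[0] == "meta":
        return binding.setdefault(schema[1], formula) == formula
    if isinstance(schema, str) or isinstance(formula, str):
        return schema == formula
    return (schema[0] == formula[0] and len(schema) == len(formula)
            and all(match(s, f, binding) for s, f in zip(schema[1:], formula[1:])))

def check_proof(lines, assumptions=frozenset()):
    """Lines are (formula, reason); reason is ("axiom", n), ("assumption",),
    or ("mp", i, j), meaning earlier line j is (earlier line i)→formula."""
    for k, (formula, reason) in enumerate(lines):
        if reason[0] == "axiom":
            ok = match(AXIOMS[reason[1]], formula, {})
        elif reason[0] == "assumption":
            ok = formula in assumptions
        else:
            _, i, j = reason
            ok = 0 <= i < k and 0 <= j < k and lines[j][0] == IMP(lines[i][0], formula)
        if not ok:
            return False
    return True

def deduce(proof, alpha):
    """Turn a proof of β from Σ ∪ {alpha} into a proof of alpha→β from Σ."""
    out, pos = [], {}  # pos[k] = index in `out` of the line alpha→(line k)
    for k, (phi, reason) in enumerate(proof):
        s = len(out)
        if phi == alpha:
            # Base case β = α: the five-line proof of α→α.
            aa = IMP(alpha, alpha)
            out += [(IMP(alpha, IMP(aa, alpha)), ("axiom", 1)),
                    (IMP(IMP(alpha, IMP(aa, alpha)), IMP(IMP(alpha, aa), aa)), ("axiom", 2)),
                    (IMP(IMP(alpha, aa), aa), ("mp", s, s + 1)),
                    (IMP(alpha, aa), ("axiom", 1)),
                    (aa, ("mp", s + 3, s + 2))]
        elif reason[0] in ("axiom", "assumption"):
            # Base cases β ∈ Axioms or β ∈ Σ: β, β→(α→β), α→β.
            out += [(phi, reason),
                    (IMP(phi, IMP(alpha, phi)), ("axiom", 1)),
                    (IMP(alpha, phi), ("mp", s, s + 1))]
        else:
            # Inductive step: phi came from γ and γ→phi by MP. We already have
            # α→γ and α→(γ→phi); Axiom 2 plus two uses of MP gives α→phi.
            _, i, j = reason
            gamma = proof[i][0]
            out += [(IMP(IMP(alpha, IMP(gamma, phi)), IMP(IMP(alpha, gamma), IMP(alpha, phi))), ("axiom", 2)),
                    (IMP(IMP(alpha, gamma), IMP(alpha, phi)), ("mp", pos[j], s)),
                    (IMP(alpha, phi), ("mp", pos[i], s + 1))]
        pos[k] = len(out) - 1
    return out

# Example: from Σ = {p→q} plus the extra assumption p, derive q; then discharge p.
p, q = "p", "q"
proof = [(p, ("assumption",)), (IMP(p, q), ("assumption",)), (q, ("mp", 0, 1))]
converted = deduce(proof, p)
print(converted[-1][0] == IMP(p, q))       # True
print(check_proof(converted, {IMP(p, q)})) # True
```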

Now, let’s go into the main body of the proof. The structure of the proof is actually quite similar to the proof of the compactness theorem I gave previously. First we show that every consistent set of sentences Σ has a maximally consistent extension Σ’. Then we show that Σ’ is satisfiable. Since Σ’ is satisfiable and is an extension of Σ, Σ must also be satisfiable. From there it’s a simple matter to show that the logic is complete.

So, let’s define some of the terms I just used.

Σ is consistent iff for no α does Σ ⊢ α and Σ ⊢ (¬α)
….. Equivalently: iff for some α, Σ ⊬ α

Σ is maximally consistent iff Σ is consistent and for every α, either Σ ∪ {α} is inconsistent or Σ ⊢ α.

One final preliminary result regarding consistency before diving into the main section of the proof:

If Σ is satisfiable, then Σ is consistent.
Proof
….. Suppose Σ is inconsistent.
….. Then there’s an α such that Σ ⊢ α and Σ ⊢ (¬α).
….. By extended soundness, Σ ⊨ α and Σ ⊨ (¬α).
….. A valuation satisfying Σ would then make both α and ¬α true, which is impossible.
….. So Σ is not satisfiable.

This is the converse of the result we actually want, but it’ll come in handy. Now, let’s begin to construct our extension!

Any consistent Σ can be extended to a maximally consistent Σ’
….. Choose any enumeration α₁, α₂, α₃, … of the well-formed formulas (there are countably many, since each is a finite string over a countable alphabet).
….. Define Σ₀ = Σ.
….. Σₙ₊₁ = Σₙ if Σₙ ⊢ (¬αₙ₊₁), and Σₙ ∪ {αₙ₊₁} otherwise.
….. For each n, (i) Σₙ is consistent and (ii) either Σₙ ⊢ αₙ or Σₙ ⊢ (¬αₙ).
……….. Base case: Σ₀ is consistent by assumption, and (ii) doesn’t apply.
……….. Inductive step: Suppose Σₙ satisfies (i) and (ii). Two cases:
…………….. If Σₙ ⊢ (¬αₙ₊₁), then Σₙ₊₁ = Σₙ. Clearly consistent and satisfies (ii).
…………….. If Σₙ ⊬ (¬αₙ₊₁), then Σₙ₊₁ = Σₙ ∪ {αₙ₊₁}. Clearly satisfies (ii), but is it consistent?
………………….. Suppose not. Then Σₙ ∪ {αₙ₊₁} ⊢ (¬αₙ₊₁), by explosion.
………………….. So Σₙ ⊢ (αₙ₊₁ → (¬αₙ₊₁)), by the deduction theorem.
………………….. ⊢ ((α→¬α)→¬α), so Σₙ ⊢ (¬αₙ₊₁), contradicting the case assumption!

….. Define Σ’ = ⋃ₙ Σₙ. Σ’ is maximally consistent.
……….. Maximality
…………….. Suppose not. Then for some αₙ, Σ’ ⊬ αₙ and Σ’ ∪ {αₙ} is consistent.
…………….. But Σₙ ⊆ Σ’, and either Σₙ ⊢ αₙ or Σₙ ⊢ (¬αₙ).
…………….. If Σₙ ⊢ αₙ, then by monotonicity Σ’ ⊢ αₙ. Contradiction. So Σₙ ⊢ (¬αₙ).
…………….. By monotonicity, Σ’ ⊢ (¬αₙ), so Σ’ ∪ {αₙ} ⊢ (¬αₙ).
…………….. But also Σ’ ∪ {αₙ} ⊢ αₙ. So Σ’ ∪ {αₙ} is inconsistent. Contradiction!
……….. Consistency
…………….. Suppose Σ’ is inconsistent. Then for some α, Σ’ ⊢ α and Σ’ ⊢ (¬α).
…………….. So there are proofs of α and (¬α) from Σ’.
…………….. Proofs are finite, so each proof uses only finitely many assumptions from Σ’.
…………….. Since Σ₀ ⊆ Σ₁ ⊆ Σ₂ ⊆ …, we can choose an n such that Σₙ contains all the needed assumptions.
…………….. Now both proofs from Σ’ are also proofs from Σₙ.
…………….. So Σₙ ⊢ α and Σₙ ⊢ (¬α).
…………….. So Σₙ is inconsistent. Contradiction!
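The bookkeeping in the construction of Σ’ can be sketched in a few lines of Python. Two big caveats: the real construction runs through an infinite enumeration of all formulas, while this sketch uses a short finite list; and it uses truth-table entailment as a stand-in for Σₙ ⊢ (¬αₙ₊₁), since there's no proof search here. Truth-table entailment only coincides with provability once completeness is established, so this illustrates the construction rather than forming part of the proof. The tuple encoding and the names `entails` and `extend` are assumptions of this sketch.

```python
from itertools import product

# Illustrative encoding (assumed): variables are strings, ("not", a) is ¬a,
# and ("imp", a, b) is a→b.
def evaluate(formula, valuation):
    if isinstance(formula, str):
        return valuation[formula]
    if formula[0] == "not":
        return not evaluate(formula[1], valuation)
    return (not evaluate(formula[1], valuation)) or evaluate(formula[2], valuation)

def variables(formula):
    if isinstance(formula, str):
        return {formula}
    return set().union(*(variables(sub) for sub in formula[1:]))

def entails(premises, conclusion):
    """Truth-table entailment, standing in for ⊢ in this sketch only."""
    formulas = list(premises) + [conclusion]
    vs = sorted(set().union(*(variables(f) for f in formulas)))
    for row in product([True, False], repeat=len(vs)):
        v = dict(zip(vs, row))
        if all(evaluate(f, v) for f in premises) and not evaluate(conclusion, v):
            return False
    return True

def extend(sigma, enumeration):
    """Σ₀ = Σ; Σₙ₊₁ = Σₙ if Σₙ "proves" ¬αₙ₊₁, and Σₙ ∪ {αₙ₊₁} otherwise."""
    current = set(sigma)
    for alpha in enumeration:
        if not entails(current, ("not", alpha)):
            current.add(alpha)
    return current

sigma = {("imp", "p", "q")}
enumeration = ["p", "q", ("not", "p"), ("not", "q"), ("imp", "q", "p")]
extended = extend(sigma, enumeration)
# Every enumerated formula is now either "proved" or "refuted" by the extension.
print(all(entails(extended, a) or entails(extended, ("not", a)) for a in enumeration))  # True
```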

Alright, we’re almost there! So now we have that for any consistent Σ, there’s an extension Σ’ that is maximally consistent. We’ll take it a little further and prove that not only is Σ’ maximally consistent, it’s also complete! (This is the purely syntactic sense of completeness, which is that for every sentence α, either Σ’ proves α or refutes α. This is different from the sense of logical completeness that we’re establishing with the proof.)

Σ’ is complete.
….. Every formula is αₙ for some n ≥ 1, and Σₙ ⊢ αₙ or Σₙ ⊢ (¬αₙ).
….. Since Σₙ ⊆ Σ’, by monotonicity Σ’ ⊢ αₙ or Σ’ ⊢ (¬αₙ).

Now we have everything we need to show that Σ’, and thus Σ, is satisfiable.

If Σ is consistent, then Σ is satisfiable.
Proof
….. Let Σ’ be a maximally consistent extension of Σ.
….. Define the valuation vΣ’ on propositional variables p:
….. vΣ’(p) = T if Σ’ ⊢ p and F if Σ’ ⊬ p
….. Let ṽΣ’ be the extension of vΣ’ to all formulas. Claim: ṽΣ’(α) = T iff Σ’ ⊢ α, by induction on α.
……….. Base case: Let α be a propositional variable. Then ṽΣ’(α) = T iff Σ’ ⊢ α by definition of vΣ’.
……….. Inductive steps:
……….. (¬α)
…………….. If Σ’ ⊢ (¬α), then by consistency Σ’ ⊬ α, so ṽΣ’(α) = F, so ṽΣ’(¬α) = T.
…………….. If Σ’ ⊬ (¬α), then by completeness Σ’ ⊢ α. So ṽΣ’(α) = T, so ṽΣ’(¬α) = F.
……….. (α→β)
…………….. Suppose Σ’ ⊢ (α→β). By completeness Σ’ ⊢ α or Σ’ ⊢ (¬α).
………………….. If Σ’ ⊢ α, then Σ’ ⊢ β by MP, so ṽΣ’(β) = T, so ṽΣ’(α→β) = T.
………………….. If Σ’ ⊢ (¬α), then by consistency Σ’ ⊬ α, so ṽΣ’(α) = F, so ṽΣ’(α→β) = T.
…………….. Suppose Σ’ ⊬ (α→β).
………………….. By completeness Σ’ ⊢ ¬(α→β).
………………….. ⊢ (β→(α→β)) by Axiom 1, so Σ’ ⊬ β, since otherwise MP would give Σ’ ⊢ (α→β). So ṽΣ’(β) = F.
………………….. Suppose ṽΣ’(α→β) = T. Then ṽΣ’(α) = F, so Σ’ ⊬ α, and by completeness Σ’ ⊢ (¬α).
………………….. ⊢ (¬α→(α→β)) (by explosion and the deduction theorem). So Σ’ ⊢ (α→β). Contradiction.
………………….. So ṽΣ’(α→β) = F.
….. So vΣ’ satisfies Σ’: every α ∈ Σ’ has Σ’ ⊢ α, so ṽΣ’(α) = T.
….. Σ ⊆ Σ’, so vΣ’ satisfies Σ.
….. So Σ is satisfiable!

Now our final result becomes a four-line proof.

If Σ ⊨ α, then Σ ⊢ α.
Proof
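….. Suppose Σ ⊬ α.
….. Then Σ ∪ {¬α} is consistent. (Otherwise Σ ∪ {¬α} ⊢ α by explosion, so Σ ⊢ (¬α→α) by the deduction theorem, and since ⊢ ((¬α→α)→α), Σ ⊢ α.)
….. So Σ ∪ {¬α} is satisfiable.
….. So Σ ⊭ α, since a valuation satisfying Σ ∪ {¬α} makes every member of Σ true and α false.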
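The last proof turns a question about entailment into a question about satisfiability: Σ ⊭ α exactly when Σ ∪ {¬α} has a satisfying valuation. For finitely many formulas, that reduction can be run directly by brute-force search over truth tables. The Python sketch below (formulas as nested tuples, an encoding assumed for this example; the function names are my own) returns such a countermodel when one exists, and None when Σ ⊨ α.

```python
from itertools import product

# Illustrative encoding (assumed): variables are strings, ("not", a) is ¬a,
# and ("imp", a, b) is a→b.
def evaluate(formula, valuation):
    if isinstance(formula, str):
        return valuation[formula]
    if formula[0] == "not":
        return not evaluate(formula[1], valuation)
    return (not evaluate(formula[1], valuation)) or evaluate(formula[2], valuation)

def variables(formula):
    if isinstance(formula, str):
        return {formula}
    return set().union(*(variables(sub) for sub in formula[1:]))

def satisfying_valuation(formulas):
    """Return a valuation making every formula true, or None if there is none."""
    vs = sorted(set().union(*(variables(f) for f in formulas)))
    for row in product([True, False], repeat=len(vs)):
        v = dict(zip(vs, row))
        if all(evaluate(f, v) for f in formulas):
            return v
    return None

def countermodel(sigma, alpha):
    """A valuation satisfying Σ ∪ {¬α}, i.e. a witness that Σ ⊭ α, or None."""
    return satisfying_valuation(set(sigma) | {("not", alpha)})

print(countermodel({"p", ("imp", "p", "q")}, "q"))  # None, so Σ ⊨ q
print(countermodel({("imp", "p", "q")}, "p"))       # {'p': False, 'q': True}
```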

And we’re done! We’ve shown that if any sentence α is semantically entailed by a set of sentences Σ, then it must also be provable from Σ! If you’ve followed this proof all the way, pat yourself on the back.

With the Completeness Theorem in hand, the proof of the Compactness Theorem goes from several pages to a few lines. It’s so nice and simple that I just have to include it here.

If Σ is finitely satisfiable, then Σ is satisfiable.
Proof
….. Suppose Σ is not satisfiable.
….. Then Σ is not consistent, since every consistent set is satisfiable.
….. So there is some α for which Σ ⊢ α and Σ ⊢ (¬α).
….. Since proofs are finite, there must be some finite subset Σ* of Σ such that Σ* ⊢ α and Σ* ⊢ (¬α).
….. By extended soundness, Σ* ⊨ α and Σ* ⊨ (¬α).
….. So Σ* is not satisfiable, and Σ is not finitely satisfiable!

In other words, if Σ is not satisfiable, then there’s some finite subset of Σ that’s also not satisfiable. This is the Compactness Theorem! Previously we proved it entirely from the semantics of propositional logic, but now we can see that it also follows from the finite nature of our proof system!

Sum and Product Puzzle

X and Y are two integers.

X < Y
X > 1
Y > 1
X + Y < 100

S and P are two perfect logicians. S knows X + Y, and P knows X × Y.

Everything I’ve just said is common knowledge. S and P have the following conversation:

S: “P, you don’t know X and Y.”
P: “Now I do know X and Y!”
S: “And now so do I!”

What are X and Y?

Once you’ve figured that out, here’s a follow-up question: if instead of X + Y < 100 we require X + Y < N, for what range of values of N does this puzzle have a unique solution?
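If you’d like to check your answer, or explore the question about N, here is a brute-force sketch in Python. It’s my own illustration, not part of the original puzzle: it assumes every statement is truthful and that “knowing” means exactly one candidate pair remains, and it turns each line of the conversation into a filter on candidate pairs. `solve(n)` returns every pair consistent with the conversation when the bound is X + Y < n; it doesn’t print anything, so no spoilers.

```python
from collections import defaultdict

def solve(n):
    """All (x, y) consistent with the conversation, given 1 < x < y and x + y < n."""
    pairs = [(x, y) for x in range(2, n) for y in range(x + 1, n - x)]
    by_sum, by_product = defaultdict(list), defaultdict(list)
    for x, y in pairs:
        by_sum[x + y].append((x, y))
        by_product[x * y].append((x, y))

    # S: "P, you don't know X and Y." S can only be sure of this if every way
    # of splitting the sum gives a product with more than one factorization.
    sums_1 = {s for s, splits in by_sum.items()
              if all(len(by_product[x * y]) > 1 for x, y in splits)}

    # P: "Now I do know X and Y!" Exactly one factorization of the product
    # has a sum that S could have made that statement about.
    products_2 = {p for p, factors in by_product.items()
                  if sum(1 for x, y in factors if x + y in sums_1) == 1}

    # S: "And now so do I!" Exactly one split of the sum has a product that
    # would have let P work out the answer.
    solutions = []
    for s in sums_1:
        hits = [(x, y) for x, y in by_sum[s] if x * y in products_2]
        if len(hits) == 1:
            solutions.append(hits[0])
    return sorted(solutions)
```

Calling `solve(100)` checks the puzzle as stated, and something like `[n for n in range(10, 200) if len(solve(n)) == 1]` is one way to start exploring the follow-up question.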