The subtlety of Gödel’s second incompleteness theorem

Gödel’s second incompleteness theorem is an order of magnitude more subtle than his first. It’s commonly summarized as “no consistent theory strong enough to do arithmetic can prove its own consistency.” But there’s a lot of subtlety in both the “strong enough to do arithmetic” and the “prove its own consistency.” First of all, what exactly counts as strong enough to do arithmetic? Peano arithmetic certainly does, but it has an infinite axiom schema. Do any finite theories meet the criterion of “strong enough to do arithmetic”? It turns out that the answer to this is yes! Robinson arithmetic, which is what you get if you remove the infinite axiom schema of induction from PA, meets the requirement.

There are weaker theories of arithmetic, like Presburger arithmetic (which has only addition) and Skolem arithmetic (only multiplication), that don’t meet the criterion, and are therefore not subject to the incompleteness theorems. And it turns out that both of these theories are actually decidable! This is even stronger than having a sound and complete proof system; a sound and complete proof system gives us an algorithm that determines after a finite amount of time that a given sentence is a tautology, but not necessarily an algorithm that determines that a sentence is NOT a tautology. Decidability gives us both: not only can we classify any tautology as such in a finite time, we can also classify any non-tautology as such in a finite time.

The amount of arithmetic we need is exactly the amount that Gödel uses to prove the incompleteness theorems. This has two parts: one, the theory must express enough arithmetic to do Gödel encoding (and therefore to express the notion of “provability”), and two, the theory must be able to formalize diagonalization. Dan Willard came up with theories that formalize enough arithmetic to do the first but not the second: these are theories that can talk about their own provability via Gödel coding, but are still too weak to be subject to the incompleteness theorems. Thus, these theories can actually prove their own consistency! These fascinating theories are called self-verifying theories.

Everything I’ve said so far has been about the subtlety of the notion of “strong enough to do arithmetic”. Now I want to talk about the subtlety of the notion of “proving one’s own consistency.” I’ll do this by first taking a brief interlude to talk about an argument I recently saw against the Consistency Thesis in philosophy of math that uses this notion. A friend of mine writes:

Consistency is a weak soundness property.

Here’s what I’ll call the Consistency Thesis: “Mathematical sentences are justified and/or true when they are part of a consistent theory.” What I want to do is to raise a problem for that thesis. Maintaining the Consistency Thesis leads to bizarre mathematical beliefs which don’t lend themselves to being true or justified. A merely ‘consistent’ theory can claim P(0), P(1), and so on, and yet claim that there’s some n for which ¬P(n). Such a theory is omega-inconsistent.

There are mathematical theories which despite their syntactic consistency, claim to be inconsistent. They will claim to be able to derive that 0=1 and yet never derive 0=1. One example of a consistent but unsound theory would be this: [suppose we add to ZFC the new axiom “ZFC is inconsistent”. If ZFC is consistent, then the new theory ZFC’ is consistent (by Gödel’s second incompleteness theorem), even though it falsely proves that ZFC is inconsistent.]

So far, I’ve only mentioned omega-inconsistent theories. I should also mention that there are omega-consistent theories which are arithmetically unsound. A much stronger soundness property would be soundness in omega-logic, which entails consistency, omega-consistency, and arithmetical soundness.

Here’s the response I gave:

You say that an adherent to the Consistency Thesis believes in mathematical theories that are in fact consistent (no contradictions can be proved from them) but claim to be inconsistent; for instance ZFC + ¬Con(ZFC). This bit about “claiming to be inconsistent” is subtler than it might initially seem. There’s a very important difference between Con(ZFC) and “ZFC is consistent”. A first order theory can’t talk directly about its own consistency; it can only talk about properties of the objects in its models. We do have an indirect method for talking about the consistency of theories, namely Gödel encoding. But this method has problems.

Gödel encoding allows us to write down statements that, if understood to be about the natural numbers, are equivalent to the assertion that a theory proves a contradiction. But this “if understood to be about the natural numbers” is a very important qualification, because no first order theory categorically defines the natural numbers (i.e. no first order theory has as its only model the natural numbers). More generally, no theory within a logic that has a sound, complete, and finitary proof system categorically describes the natural numbers (these are the only logics that a Formalist will see as well-defined, by the way).

What this means is that when we write “Con(ZFC)”, we’re actually using a short-hand for a complicated sentence about the objects in our models, and this complicated sentence is NOT equivalent to the claim that no contradiction can be proven from ZFC. Con(ZFC) could be false in a model even if ZFC is consistent, and Con(ZFC) could be true in a model even if ZFC is inconsistent, so long as that model is not the standard natural numbers.

So the adherent of the Consistency Thesis is not actually committed to weird beliefs about a theory being consistent while claiming its own inconsistency; they are just committed to the belief that the natural numbers are not well-defined.

The same objection applies to the claim that they have to accept theories that claim P(0), P(1), and so on, while also claiming that there’s some n for which ¬P(n). That’s true, and that’s fine! One can just say that such theories are not theories of the standard natural numbers; the n for which ¬P(n) is some other type of mathematical object that is not a natural number.

A TL;DR for my response: “Con(T)” only means “T is consistent” if T is about the natural numbers. Furthermore, the theories that assert their own inconsistency never have the natural numbers as a model. So it’s ultimately not very weird that these theories assert “¬Con(T)”… this statement doesn’t actually mean “T is inconsistent” in any of the models of the theory!

The Anti-Set Program

In ZFC set theory, we specify a collection of sentences within a first-order language to count as our axioms. The models of this collection of sentences are the set-theoretic universes (and many of these models are “unintended” – pesky perversions in which the set of naturals ω is uncountably large, as one example – but we’ll put this aside for this post). Most of the axioms act as constraints that all sets must follow. For instance, the axiom of pairing says that “For any sets x and y, there must exist another set containing as elements just x and y and nothing else”. This is an axiom that begins with universal quantification over the sets of the universe, and then states some requirement that must hold of all these sets.

The anti-set program is what you get when you take each of these restriction axioms, and negate the restriction it imposes on all sets. So, for instance, the axiom of anti-pairing says that for any sets x and y, there must NOT exist any set {x, y}. Contrast this with the simple negation of pairing, which would tell us only that there exist two sets x and y such that their pair doesn’t exist. The anti-axiom is much stronger than the negated axiom, in that it requires NO pairs to exist.

Not all axioms begin with universal quantifiers, in particular the axiom of infinity, which simply asserts that a set exists that satisfies a certain property. To form the axiom of anti-infinity, we simply negate the original axiom (so that no sets with that property exist).

As it turns out, the anti-set program, if applied to ALL the axioms of ZFC, ends in disaster and paradox. In particular, a contradiction can be derived from anti-comprehension, from anti-replacement, and from anti-extensionality. We don’t handle these cases the same way. Anti-comprehension and anti-replacement are simply discarded, being too difficult to patch. By contrast, anti-extensionality is replaced by ordinary extensionality. What’s up with that? The philosophical justification is simply that extensionality, being the most a priori of the bunch, is needed to justify us calling the objects in our universe sets at all.

There’s one last consideration we must address, which regards the axiom of anti-choice. I am currently uncertain as to whether adding this axiom makes the theory inconsistent. One thing that is currently known about the axiom of anti-choice is that with its addition, out go all finite models (there’s a really pretty proof of this that I won’t include here). In the rest of this post, I will be excluding anti-choice from the axioms and only exploring models of anti-ZF.

With that background behind us, let me list the axioms of anti-ZF.

Anti-Pairing
∀x∀y∀z∃w (w ∈ z ∧ w ≠ x ∧ w ≠ y)
No set is the pair of two others.

Anti-Union
∀x∀y∃z (z ∈ y ∧ ¬∃w (z ∈ w ∧ w ∈ x))
No set is the union of another.

Anti-Powerset
∀x∀y∃z (z ∈ y ∧ ∃w (w ∈ z ∧ w ∉ x))
No set contains all existing subsets of another.

Anti-Foundation
∀x∀y (y ∈ x → ∃z (z ∈ x ∧ z ∈ y))
Every set’s members have at least one element in common with it.

Anti-Infinity
∀x∃y ((y is empty ∧ y ∉ x) ∨ (y ∈ x ∧ ∃z (z = S(y) ∧ z ∉ x)))
Every set either doesn’t contain all empty sets, or has an element whose successor is outside the set.

If the axioms of ZF are considered to be a maximally nice setting for mathematics, then perhaps the axioms of anti-ZF can be considered to be maximally bad for mathematics.

We need to now address some issues involving the axiom of anti-infinity. First, the abbreviations used: “y is empty” is shorthand for “∀z (z ∉ y)” and “z = S(y)” is shorthand for “∀w (w ∈ z ↔ (w ∈ y ∨ w = y))” (i.e. z = y ∪ {y}).

Second, it turns out that from the other axioms we can prove that there are no empty sets. So the first part of the disjunction is always false, meaning that the second part must always be true. Thus we can simplify the axiom of anti-infinity to the following statement, which is logically equivalent in the context of the other axioms:

Anti-Infinity
∀x∃y (y ∈ x ∧ ∃z (z = S(y) ∧ z ∉ x))
Every set has an element whose successor is outside the set.

Let’s now prove some elementary consequences of these axioms.

Theorems

>> There are no empty sets.
Anti-Union ⊢ ¬∃x∀y (y ∉ x)
Suppose ∃x∀y (y ∉ x). Call this set ∅, so ∀y (y ∉ ∅). Then ⋃∅ = ∅, which means the union of ∅ exists. This contradicts anti-union.

>> There are no one-element sets.
Anti-Pairing ⊢ ¬∃x∃y∀z (z ∈ x ↔ z = y)
Suppose that ∃x∃y∀z (z ∈ x ↔ z = y). Then ∃x∃y∀z (z ∈ x ↔ (z = y ∨ z = y)). But this is a violation of anti-pairing, as then x would be the pair of y and y. Contradiction.

>> There are no two-element sets.
Anti-Pairing ⊢ ¬∃x∃y∃z∀w (w ∈ x ↔ (w = y ∨ w = z))
Suppose that ∃x∃y∃z∀w (w ∈ x ↔ (w = y ∨ w = z)). Then x is the pair of y and z, so anti-pairing is violated. Contradiction.

>> No models of anti-ZF have just N-element sets (for any finite N).
Suppose that a model of anti-ZF had only N-element sets. Take any of these sets and call it X. By anti-infinity, X must contain an element with a successor that is outside the set. Call this element Y and its successor S(Y). Y cannot be its own successor, as then its successor would be inside the set. This means that Y ∉ Y. Also, Y is an N-element set by assumption. But since Y ≠ S(Y), S(Y) must contain all the elements of Y in addition to Y itself. So S(Y) contains N+1 elements. Contradiction.

>> There is no set of all sets.
Suppose that there is such a set, and call it X. By Anti-Infinity, X must contain an element Y with a successor that’s outside of X. But no set is outside of X, by assumption! Contradiction.

>> No N-element model of anti-ZF can have N-1 sets that each contain N-1 elements.
Suppose that this were true. Consider any of these sets that contain N-1 elements, and call it X. By the same argument as two theorems above, X must contain an element whose successor contains N elements. But there are only N sets in the model, so a set with N elements contains every set in the model: it’s a set of all sets! But we already know that no such set can exist. Contradiction.

>> Every finite set of disjoint sets must be its own choice function.
By a choice function for a set X, I mean a set that contains exactly one element in common with each element of X. Suppose that X is a finite set of disjoint sets. Let’s give the elements of X names: A₁, A₂, A₃, …, A_N. By anti-foundation, each Aᵢ must contain at least one element of X. Since the Aᵢ are disjoint, these elements cannot be the same. Also, since there are only N elements of X, no Aᵢ can contain more than one element of X. So each element of X contains exactly one element of X. Thus X is a choice function for X!

The next results come from a program I wrote that finds models of anti-ZF of a given size.
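
The post doesn’t include the program itself, so here’s a minimal sketch of how such a search could work; the representation and function names are my own assumptions, not the original code. A candidate model is a membership relation on an n-element universe, and we brute-force all 2^(n²) relations, checking each axiom directly:

```python
from itertools import product

def find_models(n):
    """Brute-force search for models of anti-ZF with universe {0, ..., n-1}.

    elems[y] is the set of elements of y under a candidate membership
    relation. Anti-choice is excluded, matching the post. A sketch only.
    """
    U = range(n)
    models = []
    for bits in product([False, True], repeat=n * n):
        elems = {y: {x for x in U if bits[x * n + y]} for y in U}

        def is_successor(z, y):
            return elems[z] == elems[y] | {y}    # z = S(y) = y ∪ {y}

        # ordinary extensionality (kept, per the discussion above)
        extensionality = all(elems[x] != elems[y]
                             for x in U for y in U if x != y)
        # anti-pairing: ∀x∀y∀z ∃w (w ∈ z ∧ w ≠ x ∧ w ≠ y)
        anti_pairing = all(any(w not in (x, y) for w in elems[z])
                           for x in U for y in U for z in U)
        # anti-union: ∀x∀y ∃z (z ∈ y ∧ ¬∃w (z ∈ w ∧ w ∈ x))
        anti_union = all(any(not any(z in elems[w] and w in elems[x]
                                     for w in U)
                             for z in elems[y])
                         for x in U for y in U)
        # anti-powerset: ∀x∀y ∃z (z ∈ y ∧ ∃w (w ∈ z ∧ w ∉ x))
        anti_powerset = all(any(any(w not in elems[x] for w in elems[z])
                                for z in elems[y])
                            for x in U for y in U)
        # anti-foundation: ∀x∀y (y ∈ x → ∃z (z ∈ x ∧ z ∈ y))
        anti_foundation = all(elems[x] & elems[y]
                              for x in U for y in U if y in elems[x])
        # anti-infinity (simplified form): ∀x ∃y (y ∈ x ∧ S(y) ∉ x)
        anti_infinity = all(any(any(is_successor(z, y) and z not in elems[x]
                                    for z in U)
                                for y in elems[x])
                            for x in U)

        if all([extensionality, anti_pairing, anti_union, anti_powerset,
                anti_foundation, anti_infinity]):
            models.append(elems)
    return models

for n in range(1, 5):
    print(n, len(find_models(n)))    # expect 0 models for sizes 1 through 4
```

Brute force like this is only feasible up to about size four; confirming the size-five results (2^25 candidate relations, counted up to isomorphism) would call for symmetry pruning or a SAT solver.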

>> There are no models of size one, two, three or four.

>> There are exactly two non-isomorphic models of size five.

[Pictures of the two models omitted.]

Now, some conjectures! I’m pretty sure of each of these, especially Conjecture 2, but haven’t been able to prove them.

Conjecture 1: There’s always a set that contains itself.

Conjecture 2: There can be no sets of disjoint sets.

Conjecture 3: In an N-element model, there are never fewer than three sets with fewer than N-1 elements.

Fundamentals of Logic: Syntax, Semantics, and Proof

A while back, I wrote a several-part introduction to mathematical logic. I wrote this when I was first wading into the subject and noticed that a lot of the stuff I was learning was not easy to find online in a concise beginner-level essay. Since then, I’ve learned an enormous amount more and become a lot more comfortable with the basics of the field. Looking back on my introduction series, I’m pretty happy with it. There’s nothing in there that’s incorrect. But I do notice that when I read it over, I wish that I had framed certain things slightly differently. In particular, one of the distinctions that I think is absolutely central to mathematical logic – the distinction between syntax, semantics, and proof – is muddled a bit in the write-up, and proof is bundled in with syntax in a way that I’m not totally satisfied with.

Given that, and given that this series is for whatever reason now one of the most popular reads on my blog, I want to provide a brief update post. I sort of did this a week or so ago, with this post, which linked to the slides for a presentation I gave for a discussion group. But I want to do it for REAL now, as an actual post of its own. And honestly, I also just love this subject and am happy for any excuse to write about it more.

So, let me begin by laying out what seems to me to be the primary trichotomy that you’ll see pop up over and over again in logic: syntax, semantics, and proof. One at a time, then.

Syntax

The syntax of a logic is a detailing of the allowed symbols of a language and the conditions for their grammatical use. It’s a set of characters, along with a function that defines the grammar of the language by taking in (finite, in most logics) strings of these characters and returning true or false. This function is usually inductively defined. So for instance in propositional logic we have some set of characters designated to be our propositional variables (we’ll denote them p, q, r, …), and another set of characters designated to be our logical symbols (∧, ∨, ¬, →, and parentheses). We inductively define our grammar function G as follows: Every propositional variable is grammatical. If X and Y are grammatical strings, then so are (¬X), (X ∧ Y), (X ∨ Y), and (X → Y). And no other string is grammatical. This definition is what allows us to say that (p ∧ (¬(q ∨ r))) is grammatical but (p ∧ (¬(q ∨ ¬))) is not. The grammar for propositional logic is especially nice and simple, and first-order logic is only mildly more complicated. The important thing is that the syntax for these logics as well as higher order logics is algorithmically checkable; it’s possible to write a simple program that verifies whether any given input string counts as grammatical or not.
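
To illustrate that last claim, here’s a minimal sketch of such a grammar-checker for propositional logic (assuming, for simplicity, that propositional variables are single lowercase letters):

```python
def is_grammatical(s):
    """Decide grammaticality per the inductive definition above."""
    if len(s) == 1:
        return s.islower()                    # a propositional variable
    if len(s) >= 4 and s[0] == '(' and s[-1] == ')':
        inner = s[1:-1]
        if inner.startswith('¬'):             # the (¬X) case
            return is_grammatical(inner[1:])
        depth = 0                             # find the main connective
        for i, c in enumerate(inner):
            if c == '(':
                depth += 1
            elif c == ')':
                depth -= 1
            elif depth == 0 and c in '∧∨→':   # the (X ∧ Y) etc. cases
                return (is_grammatical(inner[:i])
                        and is_grammatical(inner[i + 1:]))
    return False

print(is_grammatical('(p∧(¬(q∨r)))'))   # True
print(is_grammatical('(p∧(¬(q∨¬)))'))   # False
```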

Semantics

The semantics of a logic is a system for assigning some sort of meaning to the grammatical strings of the language. There are different types of meanings that might be assigned, but the primary one for classical logic is truth and falsity. For propositional logic, our semantics is a set of truth functions, which are functions that take in grammatical strings and return either true or false (you’ve encountered these before if you’ve seen truth tables; each row in a truth table corresponds to a particular truth function). Not just any such function will do; these functions will have to satisfy certain constraints, such as that whatever a truth function assigns to a string X, it must assign the opposite value to the string (¬X), and that (A ∧ B) is assigned true if and only if both A and B are assigned true. These constraints on our functions are really what endow our strings with meaning; they induce a sort of structure on the truth values of strings that lines up with our intended interpretation of them.

For propositional logic, we give an inductive definition for our collection of valid truth functions. We construct a truth function by first doing any assignment of truth values to the propositional variables (p, q, r, and so on), and then defining what the function does to the rest of the strings in terms of these truth values. So for any strings X and Y, the truth function assigns true to (¬X) if and only if it assigns false to X. It assigns true to (X ∧ Y) if and only if it assigns true to X and true to Y. And so on. By our construction of the grammar of propositional logic, we’ve guaranteed that our function is defined on all grammatical strings. Of course, there are many such truth functions (one for every way of assigning truth values to the propositional variables) – and this collection of truth functions is what defines our semantics for propositional logic.

First order logic is again more complicated than propositional logic in its semantics, but not hugely so. When talking about the semantics of first-order logic, we call its truth functions models (or structures), and each model comes with a universe of objects that the quantifiers of the logic range over. An important difference between the semantics of propositional logic and the semantics of first order logic is that in the former, we can construct an algorithm that directly searches through the possible truth functions. In the latter, getting a similar sort of direct access to the models is much more difficult. Consider, for instance, that some models will have universes with infinitely many elements, meaning that to verify the truth of a universally quantified statement requires verifying it for infinitely many elements.
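
To make the contrast concrete, here’s a sketch of evaluating a first-order sentence in a finite model (the nested-tuple formula representation is my own invention for illustration). The clause for ∀ is an exhaustive loop over the universe, which is precisely the step that has no analogue when the universe is infinite:

```python
def holds(phi, universe, relations, env=None):
    """Evaluate a first-order formula, given as nested tuples, in a model."""
    env = env or {}
    op = phi[0]
    if op == 'rel':                        # ('rel', name, var, ...)
        return tuple(env[v] for v in phi[2:]) in relations[phi[1]]
    if op == 'not':
        return not holds(phi[1], universe, relations, env)
    if op == 'and':
        return (holds(phi[1], universe, relations, env)
                and holds(phi[2], universe, relations, env))
    if op == 'forall':                     # ('forall', var, body)
        return all(holds(phi[2], universe, relations, {**env, phi[1]: a})
                   for a in universe)      # exhaustive: finite universes only
    raise ValueError(f'unknown operator: {op}')

# ∀x ∃y R(x,y), with ∃y written as ¬∀y¬: "everything is related to something"
universe = {0, 1, 2}
relations = {'R': {(0, 1), (1, 2), (2, 0)}}
phi = ('forall', 'x',
       ('not', ('forall', 'y', ('not', ('rel', 'R', 'x', 'y')))))
print(holds(phi, universe, relations))     # True
```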

Last thing I need to say about semantics before moving on to proof: there is the all-important concept of semantic entailment. A set of sentences (call it A) is said to semantically entail another sentence (call it X), if every truth function that assigns everything in A true also assigns X true. When this is the case, we write: A ⊨ X.

Examples
{(p ∧ q)} ⊨ p
{(p ∨ q), (¬q)} ⊨ p
{p, (p→q), (q→r)} ⊨ r
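
In propositional logic, semantic entailment can be checked by brute force over all the truth functions. Here’s a minimal sketch, representing each sentence as a Python function from a truth assignment to a truth value in order to sidestep parsing:

```python
from itertools import product

def entails(premises, conclusion, variables):
    """A ⊨ X: every truth assignment satisfying all of A also satisfies X."""
    for values in product([False, True], repeat=len(variables)):
        v = dict(zip(variables, values))
        if all(p(v) for p in premises) and not conclusion(v):
            return False                  # found a countermodel
    return True

# {(p ∨ q), (¬q)} ⊨ p
print(entails([lambda v: v['p'] or v['q'], lambda v: not v['q']],
              lambda v: v['p'], ['p', 'q']))                      # True

# {p, (p → q), (q → r)} ⊨ r   (writing X → Y as (not X) or Y)
print(entails([lambda v: v['p'],
               lambda v: (not v['p']) or v['q'],
               lambda v: (not v['q']) or v['r']],
              lambda v: v['r'], ['p', 'q', 'r']))                 # True
```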

Proof

Notice that we defined our semantics using complicated words like “functions” and “inductive definition”. More troubling, perhaps, we seem to have engaged in a little bit of circularity when we said in our definition of the semantics that (A ∧ B) is assigned true only when A and B are both assigned true. Suppose that some pesky philosopher were to come up to us and ask us what we meant by “only when A and B are both assigned true”. What would we have to say? We couldn’t refer to our formal definition of the semantics of ∧ to ground our meaning, as this definition uses the concept of “and” in it explicitly! The circularity is even more embarrassingly vivid when in first order logic we’re asked to define the semantics of “∀”, and we say “∀x φ(x) is true whenever φ(x) is true of every object in our model”. In other words, there’s a certain sense in which our semantics is a little hand-wavy and non-rigorous. It relies on the reader already having an understanding of mathematical concepts that are usually much more advanced than the ones being defined, and indeed implicitly relies on the reader’s understanding of the ones being defined in the first place!

With this criticism of semantics fresh in our heads, we might want to ask for a purely mechanical system that would take in some starting strings and churn out new strings that follow from them according to the semantics. Such a system wouldn’t rely on the user’s understanding of any higher-order concepts; instead a mere understanding of the concept of string manipulations would suffice. That is, we want to rigorously define such a procedure so that the results of the procedure mirror the structure that we know the semantics has, but without relying on any high-level concepts or circularity. We can instead just point to the mechanical process by which strings of these symbols are manipulated and be content with that. A proof system is just such a mechanical system.

There are multiple styles of proof systems, varying primarily with respect to how many inference rules and axioms they have. On the side of many axioms and few inference rules we have Hilbert-style systems. And on the side of many inference rules and few (sometimes zero) axioms we have Gentzen-style natural deduction systems. But hold on, what are these “axioms” and “inference rules” I’m suddenly bringing up? Axioms are simply strings that can be used in a proof at any time whatsoever. Inference rules are functions that take in some set of strings and produce new ones. A proof is just an ordered list of strings, where each is either an axiom or a result of applying some inference rule to earlier strings.

For propositional logic, one common proof system involves just three axioms and one inference rule. The three: X → (Y → X), (X → (Y → Z)) → ((X → Y) → (X → Z)), and ((¬Y) → (¬X)) → (X → Y). The inference rule is the famous modus ponens, which takes the form of a function that takes as input X and X → Y, and produces as output Y.

(The discerning amongst you might notice that the “three axioms” are actually three infinite axiom schemas; for each you can populate X, Y, and Z with any three grammatical strings you like and you get a valid instance of the axioms. The even more discerning might notice that these axioms only involve the → and ¬ symbols, but we seem to be missing ∧ and ∨. That’s correct; we neglect them for economy of our proof system and are allowed to do so because all sentences with ∧ and ∨ can be translated into sentences using just → and ¬. To expand our proof system to deal with ∧ and ∨, we simply add a few more axioms to incorporate these translations.)
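
Here’s a sketch of a checker for this proof system, to show just how mechanical proof-checking is (the nested-tuple formula representation is my own). Note that it verifies a given proof; it makes no attempt to find proofs:

```python
def is_axiom(f):
    """Check whether formula f (nested tuples: ('imp', X, Y), ('neg', X),
    or a variable string) instantiates one of the three axiom schemas."""
    def m(f, pattern, bindings):
        # match f against a schema pattern, binding metavariables X, Y, Z
        if isinstance(pattern, str):
            if pattern in bindings:
                return bindings[pattern] == f
            bindings[pattern] = f
            return True
        return (isinstance(f, tuple) and f[0] == pattern[0]
                and len(f) == len(pattern)
                and all(m(a, b, bindings) for a, b in zip(f[1:], pattern[1:])))
    schemas = [
        ('imp', 'X', ('imp', 'Y', 'X')),
        ('imp', ('imp', 'X', ('imp', 'Y', 'Z')),
                ('imp', ('imp', 'X', 'Y'), ('imp', 'X', 'Z'))),
        ('imp', ('imp', ('neg', 'Y'), ('neg', 'X')), ('imp', 'X', 'Y')),
    ]
    return any(m(f, s, {}) for s in schemas)

def is_proof(lines, assumptions=()):
    """Each line must be an assumption, an axiom instance, or follow
    from two earlier lines by modus ponens."""
    for i, f in enumerate(lines):
        ok = (f in assumptions or is_axiom(f)
              or any(g == ('imp', h, f)
                     for g in lines[:i] for h in lines[:i]))
        if not ok:
            return False
    return True

# The classic five-line derivation of p → p from the first two schemas.
p = 'p'
proof = [
    ('imp', p, ('imp', ('imp', p, p), p)),                       # axiom 1
    ('imp', ('imp', p, ('imp', ('imp', p, p), p)),
            ('imp', ('imp', p, ('imp', p, p)), ('imp', p, p))),  # axiom 2
    ('imp', ('imp', p, ('imp', p, p)), ('imp', p, p)),           # modus ponens
    ('imp', p, ('imp', p, p)),                                   # axiom 1
    ('imp', p, p),                                               # modus ponens
]
print(is_proof(proof))   # True
```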

Soundness and Completeness

I’ve said that the purpose of a proof system is to mimic the semantics of a logic. I will now say more precisely what this mimicry amounts to.

There is a close cousin to the semantic-entailment relation (⊨). This is known as the syntactic entailment relation, and is written ⊢. We say that a set of sentences A syntactically entails a sentence X if there exists some proof that uses the sentences in A as assumptions and concludes with the sentence X. So A ⊢ X can be thought of as “X can be proven, provided that we assume A.”

To say that a proof system for a logic mimics the semantics of the logic is to say that the syntactic entailments and semantic entailments line up perfectly. This has two components:

Soundness: If A ⊢ X, then A ⊨ X.
Completeness: If A ⊨ X, then A ⊢ X.

There’s one other feature of a proof system that’s totally essential if it’s to do the job we want it to do; namely, that it should be actually computable. We should be able to write an algorithm that verifies whether any given sequence of strings counts as a valid proof according to the proof system. This is known as effective enumerability.

To have a sound, complete, and effectively enumerable proof system for a logic is to know that our semantics is on solid ground; that we’re not just waving our hands around and saying things that cannot be cashed out in any concrete way, but that in fact everything we are saying can be grounded in some algorithm. There’s a general philosophy in play here, something like “if I can compute it then it’s real.” We can rest a lot more comfortably if we know that our logic is computable in this sense; that our deductive reasoning can be reduced to a set of clear and precise rules that fully capture its structure.

Perhaps predictably, our next question is whether we really do have a sound, complete, and effectively enumerable proof system for a logic. It turns out that the answer depends on which logic is being discussed. This is already perhaps a little disconcerting; we might have hoped that every logic of any use can be nailed down concretely with a proof system. But alas, it’s not the case.

With that said, there do exist sound, complete, and effectively enumerable proof systems for propositional logic. Maybe that’s not so exciting, since propositional logic is pretty limited in terms of what sort of math you can do with it. But it turns out that sound, complete, and effectively enumerable proof systems ALSO exist for first-order logic! And first-order logic is where we form theories like Peano arithmetic (for natural number arithmetic) and ZFC (for set theory).

In other words, there’s a logic within which one can talk about things like numbers and sets, and the semantics of this logic can be made precise with a proof system. That’s great news! Math is on sound grounds, right? Well…

You might be skeptical right now, the phrase “incompleteness theorems” bubbling into your mind, and you’re right to be. While we can form languages in first order logic whose sentences look to be describing natural numbers (in PA) or sets (in ZFC), it turns out that none of these languages have the expressive power to actually pin down these structures uniquely. Said more carefully, there is no first-order collection of sentences which has as a unique model the natural numbers. There’s a theorem called the Löwenheim-Skolem Theorem, which is one of the most shocking results in the field of mathematical logic. It says that if a first-order theory has any infinite model, then it has infinite models of every cardinality. So if we want to describe the natural numbers, we have to also be simultaneously describing structures of every cardinality. At this point, this is getting into well-covered territory for this blog, so I’ll just direct you to these posts for more detail about how this works: here, here, and here.

So, in first order logic you have a proof system that perfectly mirrors your semantics but no theory whose semantics line up with the natural numbers. There are many ways we can strengthen our semantics to give us the strength to talk about the natural numbers categorically, the most popular of which is second-order logic. In second order logic, we are allowed to quantify not only over objects in our universe, but over sets of objects as well. This turns out to be a really big deal. In second-order logic, we can actually write down a set of sentences that uniquely pick out the natural numbers as their only model. But guess what? Second-order logic has no sound and complete proof system!

This is a pattern that holds very generally. Either we have a logic with the expressive power to talk categorically about the natural numbers, in which case we have no sound and complete proof system, OR we have a logic with a sound and complete proof system, in which case we lack the ability to talk categorically about the natural numbers. No matter how we go, we end up at a loss as to how to ground the semantics of ℕ in an algorithmic procedure. And as a result, we also end up unable to have any algorithmic procedure that enumerates all the truths about natural numbers. This is what has become known as the incompleteness phenomenon.

I’ll stop there for now! Here’s a brief summary of the main concepts I’ve discussed. A logic is defined by its syntax, semantics, and proof system. The syntax details the allowed symbols of the language and which combinations are grammatical. The semantics details the possible truth assignments to these sentences, consistent with our intended interpretation of the symbols. And the proof system provides a mechanical procedure for deriving new strings from old strings. A set of strings X semantically entails a string Y if every truth assignment that satisfies all the strings in X also satisfies Y. A set of strings X syntactically entails Y if there’s some proof of Y from the assumptions of X. A proof system is said to be sound and complete if its syntactic entailment relation perfectly mirrors the semantic entailment relation. And it’s said to be effectively enumerable if there’s an algorithm for determining whether a given sequence of strings counts as a proof or not. First-order logic has a sound, complete, and effectively enumerable proof system. But it lacks the expressive power to talk categorically about the natural numbers. No set of sentences in a first-order language has the natural numbers as a unique model. Second-order logic has the expressive power to talk categorically about the natural numbers, but it has no sound and complete proof system. In general, no logic has both (1) a sound, complete, and effectively enumerable proof system, and (2) theories that are categorical for the natural numbers.

The Transfinite Factorial

Chapter 1: When Circularity is not Paradoxical

Circular definitions can be quite bad. Take the definition “a is the natural number that is equal to a”, or worse, “a is the natural number that is NOT equal to a”. Or, for an even more perverse one, “a+1, where a is the smallest number that can be defined in fewer than 50 words.” But just as not all self-reference is paradoxical, similarly not all circularity is vicious. Circular definitions form a crucial component of every area of math, where they go under the more polite term “recursion.”

The key component of a benign circular definition is that it avoids infinite regress. So “a = a” is unhelpful, because to evaluate a you must first know what a evaluates to. But what about “aₙ = n ⋅ aₙ₋₁, and a₀ = 1”? Now to know a₅₀₀, you only need to know a₄₉₉, which in turn only requires you know a₄₉₈, which requires… and so on, each step moving a little closer to the base case “a₀ = 1”, at which point you can travel all the way back up the chain to discover that a₅₀₀ = 500!. (That’s 500 factorial, not an excitable 500.)

Douglas Hofstadter has a parable about the “porpuquine”, a rare cousin of the porcupine that grows smaller porpuquines on its back rather than quills. The market value of such a beast is determined by the total number of porpuquine noses you can find on it. And if you’re worried about an infinite regress, don’t fear! There’s a smallest porpuquine, which is entirely bald.

Recursive definitions are extremely nice when we want to define a function whose outputs are related to each other in a more simple way than they are to the inputs. Some examples (in which S stands for the successor function, which takes 0 to 1 to 2 to …):

Addition
∀x (x + 0 = x)
∀x∀y (x + S(y) = S(x+y))

Multiplication
∀x (x ⋅ 0 = 0)
∀x∀y (x ⋅ S(y) = x⋅y + x)

Notice the similarity between these two definitions. In each case we define what the function does with 0, and then we define what it does with a successor in terms of its value one step down. This covers all the natural numbers! Let’s show this in action by proving that 9+3 = 12.

9 + 3
9 + S(2)
S(9 + 2)
S(9 + S(1))
S(S(9 + 1))
S(S(9 + S(0)))
S(S(S(9 + 0)))
S(S(S(9)))
S(S(10))
S(11)
12

The turning point is the step where we hit our base case (evaluating 9 + 0), at which point we need merely pass our answers up the levels to get back to the solution of our original problem.

Another: Let’s show that 3⋅2 = 6.

3⋅2
3⋅S(1)
3⋅1 + 3
3⋅S(0) + 3
(3⋅0 + 3) + 3
(0 + 3) + 3
(0 + S(2)) + 3
S(0 + 2) + 3
S(0 + S(1)) + 3
S(S(0 + 1)) + 3
S(S(0 + S(0))) + 3
S(S(S(0 + 0))) + 3
S(S(S(0))) + 3
S(S(1)) + 3
S(2) + 3
3 + 3
3 + S(2)
S(3 + 2)
S(3 + S(1))
S(S(3 + 1))
S(S(3 + S(0)))
S(S(S(3 + 0)))
S(S(S(3)))
S(S(4))
S(5)
6

Three of the lines above apply a “base case rule”: the first the base case for multiplication, the second and third the base case for addition. I like that you can see the complexity growing in the length of the lines until we hit a base case, at which point the lines start shortening until we next need to apply a recursive step.
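
These recursive definitions translate directly into a program. Here’s a sketch with numbers represented as iterated successors, mirroring the two definitions above (the representation is mine):

```python
# Numbers as iterated successors: Z is 0, S(n) is the successor of n.
Z = 'Z'
def S(n):
    return ('S', n)

def add(x, y):
    if y == Z:                      # x + 0 = x
        return x
    return S(add(x, y[1]))          # x + S(y) = S(x + y)

def mul(x, y):
    if y == Z:                      # x ⋅ 0 = 0
        return Z
    return add(mul(x, y[1]), x)     # x ⋅ S(y) = x⋅y + x

def to_int(n):
    """Decode an iterated-successor numeral back to a Python int."""
    return 0 if n == Z else 1 + to_int(n[1])

three = S(S(S(Z)))
two = S(S(Z))
print(to_int(add(three, two)))   # 5
print(to_int(mul(three, two)))   # 6
```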

Chapter 2: Infinite Recursion

Now what happens if we want to add x + y, but y is neither 0 nor a successor? “Huh? Isn’t every non-zero number a successor of some number?” I hear you ask. Not infinity! Or more precisely, not ω. Our recursive definitional scheme is all well and good for finite numbers, but it fails when we enter the realm of transfinite ordinals.

No problem! There’s an easy patch. The problem is that we don’t know how to add or multiply x and y when y is a limit ordinal. So we just add on an extra rule to cover this case!

Addition
α + 0 = α
α + S(β) = S(α + β)
α + λ = ⋃{α + β | β ∈ λ}, for λ a limit ordinal

Multiplication
α ⋅ 0 = 0
α ⋅ S(β) = α⋅β + α
α ⋅ λ = ⋃{α ⋅ β | β ∈ λ}, for λ a limit ordinal

If you’re scratching your head right now at the notion of “the union of a set of numbers” or “a number being an element of another number”, go read this for a thorough summary. In short, in ZFC we encode each number as the set of all smaller numbers (e.g. 3 = {0, 1, 2}), and ω (the first infinite number) is defined to be the set of all natural numbers (ω = {0, 1, 2, 3, …}). So 3 ∪ 5 = {0,1,2} ∪ {0,1,2,3,4} = {0,1,2,3,4} = 5. In general, the union of a set of numbers is just the maximum number among them (or if there is no maximum number, then it’s the supremum of all the numbers).

Outside the context of ZFC, we can equally write α + λ = lim (α + β) as β → λ. We can also write α + λ = sup {α + β | β < λ}. And for those more familiar with ZFC, we don’t need to take the union over all β ∈ λ, we can make do with any cofinal subset (we’ll do this to simplify calculations greatly later in the post).

Now we’ve extended + and × into the infinite, and what we get is a little unusual. The first major sign that things have changed is that addition and multiplication are no longer commutative.

1 + ω = ⋃{1 + n | n ∈ ω} = ⋃{1, 2, 3, 4, …} = ω
ω + 1 = ω + S(0) = S(ω + 0) = S(ω)

ω certainly doesn’t equal S(ω), so ω + 1 ≠ 1 + ω.

Distributivity also fails! For instance (ω + 3) ⋅ 2 = ω⋅2 + 3, not ω⋅2 + 6. But we don’t lose every nice algebraic feature. Associativity still holds for both addition and multiplication! I.e. α + (β + γ) = (α + β) + γ and α ⋅ (β ⋅ γ) = (α ⋅ β) ⋅ γ. This result is really important.

Earlier we showed that 1 + ω = ω. One can easily show that in general, n + ω = ω for any finite n, and more generally that α + β = β whenever α is “infinitely smaller” than β. This notion of “infinitely smaller” is a little subtle; ω is not infinitely smaller than ω⋅2, but it is infinitely smaller than ω². (It should be noted that everything up to here has been standard textbook stuff, but from here on I will be introducing original notions that I haven’t seen elsewhere. So treat what follows with skepticism.)

A useful definition for us will be α << β (read as “α is infinitely less than β”) if α + β = β. We’ll also say that α and β are “comparable” if neither is infinitely smaller than the other: α ~ β if ¬(α << β) and ¬(β << α). Some conjectures I have about the comparability relation:

Conjecture 1: Commutativity holds between comparable ordinals.

Conjecture 2: α ~ β if and only if there’s some finite n such that α < β⋅n and β < α⋅n.

I’m actually fairly confident that Conjecture 1 is false, and would love to see a counterexample.

Here are some more useful identities for transfinite addition and multiplication:

Theorem 1: For finite n and limit ordinal λ, α⋅(λ + n) = α⋅λ + α⋅n

Proof sketch: α⋅(λ + n) = α⋅(λ + (n-1)) + α = α⋅(λ + (n-2)) + α + α = … = α⋅(λ + 0) + α + α + … + α = α⋅λ + α⋅n.

Theorem 2: For finite n and β << α, (α + β)⋅n = α⋅n + β.

Proof sketch: (α + β)⋅n = (α + β)⋅(n-1) + (α + β) = (α + β)⋅(n-2) + (α + β) + (α + β) = … = (α + β)⋅0 + (α + β) + … + (α + β) = (α + β) + (α + β) + … + (α + β) = α + (β + α) + (β + α) + … + (β + α) + β = α + α + … + α + β = α⋅n + β.

In case it’s not clear what’s going on with this proof, the idea is that we expand (α + β)⋅n out to (α + β) + … + (α + β) (n times). Then we shift parentheses to enclose all the middle βs, making them look like (β + α). But then by assumption, since β << α, the β vanishes in each of these middle terms, leaving just the final one: α + α + α + … + α + β = α⋅n + β.

Infinite exponentiation, tetration and so on can be defined in the exact same way as addition and multiplication: first cover the 0 case, then the successor case, and finally the limit case. And the limit case is defined by just taking the union of all smaller cases.

Chapter 3: Transfinite Factorials

Now, what’s the first function you think of when you think of recursively defined functions? That’s right, the factorial! I mentioned it at the beginning of this post, long ago: n! = n⋅(n-1)! and 0! = 1. So let’s extend this definition to obtain a transfinite factorial!

Factorial
0! = 1
S(α)! = α! ⋅ S(α)
λ! = ⋃{β! | β ∈ λ}, for λ a limit ordinal

Now we can ask what ω! is.

Theorem: ω! = ω

Proof: ω! = ⋃{n! | n ∈ ω} = ⋃{1, 2, 6, 24, 120, …} = ω (since {1, 2, 6, 24, 120, …} is a cofinal subset of ω).

Next, (ω+1)! = ω! ⋅ (ω + 1) = ω ⋅ (ω + 1) = ω⋅ω + ω⋅1 = ω² + ω. Note that to prove this, we’ve used Theorem 1 from before.

And we can keep going from there.

Theorem: (ω + n)! = ω^(n+1) + ωⁿ⋅n + ω^(n−1)⋅(n − 1) + … + ω²⋅2 + ω.

Try to prove this for yourself!

Now that we know what (ω + n)! is for each finite n, we are able to calculate what (ω⋅2)! is (using the fact that {ω + n | n ∈ ω} is a cofinal subset of ω⋅2). I’ll write its value, as well as the next few interesting ordinals:

(ω⋅2)! = ⋃{ω!, (ω+1)!, (ω+2)!, …} = ω^ω
(ω⋅3)! = ω^(ω⋅2)
(ω⋅n)! = ω^(ω⋅(n−1))

Using the same trick as before, this enables us to solve (ω²)!:

(ω²)! = ⋃{ω!, (ω⋅2)!, (ω⋅3)!, …} = ⋃{ω, ω^ω, ω^(ω⋅2), ω^(ω⋅3), …} = ω^(ω²)

From here, things suddenly take on a remarkable regularity (for a while at least). Looking at limit ordinals above ω², they appear to have the following structure, at least up until we arrive at the first ordinal where standard notation fails us: ω^ω^ω^… = ε₀.

Conjecture: For λ a limit ordinal such that ω² ≤ λ < ε₀, λ! = ω^λ

If this conjecture is right, then (ω^ω)! = ω^(ω^ω), and (ω^(ω^ω))! = ω^(ω^(ω^ω)), and so on. We can also write this using tetration notation by saying that (ⁿω)! = ⁿ⁺¹ω.

But something different happens once we get to ε₀!

Conjecture: ε₀! = ⋃{(ⁿω)! | n ∈ ω} = ⋃{ω, ²ω, ³ω, ⁴ω, …} = ε₀

If this is right, then this is pretty interesting behavior! The fixed points of the factorial function are 1, ω, and ε₀, at the very least. What happens after ε₀? The factorial function “loops” in a certain sense! Just as (ω+1)! was ω² + ω, so (ε₀+1)! ends up being ε₀² + ε₀. And (ε₀⋅2)! = ε₀^ε₀, paralleling (ω⋅2)! = ω^ω. And so on.

Conjecture: εₙ is a fixed point of the factorial function for all n.

There are plenty more questions to be answered. Is ω₁ another fixed point of the factorial function? What exactly does the list of all the fixed points look like? Are there any cardinals that aren’t equal to their own factorial? What exactly is the general pattern for factorials of successor ordinals? How does α! compare to α^α in general?

On Self-Hating Theories of Arithmetic

Gödel’s second incompleteness theorem tells us that no (sufficiently powerful) consistent theory can prove the statement of its own consistency. But did you know that such a theory can prove the statement of its own inconsistency? A consistent theory that claims to be inconsistent is what I’ll call a self-hating theory.

My convention in what follows: ℕ refers to the real, true natural numbers, the set consisting of {0, 1, 2, 3, …} and nothing else. ω refers to the formal object that exists in a theory of arithmetic that is “supposed to be” ℕ, but (in first-order logic) cannot be guaranteed to be so.

When I write ℕ ⊨ ψ, I am saying that the sentence ψ is true of the natural numbers. When I write T ⊢ ψ (resp. T ⊬ ψ), I am saying that the sentence ψ can be (resp. can’t be) proven from the axioms of the theory T. And when I write T ⊨ ψ, I am saying that the axioms of T semantically entail the truth of ψ (or in other words, that ψ comes out true in all models of T). The next two paragraphs will give some necessary background on Gödel’s encoding, and then we’ll explore the tantalizing claims I started with.

Gödel’s breathtakingly awesome insight was that within any language that is expressive enough to talk about natural number arithmetic, one can encode sentences as numbers and talk about syntactic features of these sentences as properties of numbers. When a number n encodes a sentence ψ, we write n = ⟦ψ⟧. Then Gödel showed that you can have sentences talking about the provability of other sentences. (The next step, of course, was showing that you can have sentences talking about their own provability – sneaking in self-reference through the back door of any structure attempting to describe arithmetic.)

In particular, in any theory of natural number arithmetic T, one can write a sentence that on its surface appears to just be a statement about the properties of natural numbers, but when looked at through the lens of Gödel’s encoding, ends up actually encoding the sentence “T ⊢ ψ”. And this sentence is itself encoded as some natural number! So there’s a natural number n such that n = ⟦T ⊢ ψ⟧. It’s a short step from here to generating a sentence that encodes the statement of T’s own consistency. We merely need to encode the sentence “¬∃n (n = ⟦T ⊢ 0=1⟧)”, or in English, there’s no number n such that n encodes a proof of “0=1” from the axioms of T. In even plainer English, no number encodes a proof of contradiction from T (from which it follows that there IS no proof of contradiction from T, as any proof of contradiction would be represented by some number). We write this sentence as Con(T).
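
To make the encoding idea concrete, here’s a toy sketch of one classic scheme, prime-power coding (the alphabet and details here are illustrative; Gödel’s actual encoding differs):

```python
def godel_number(sentence, alphabet):
    """Encode a string as a product of prime powers: the i-th symbol
    contributes (i-th prime) ** (1 + its alphabet index). A toy sketch."""
    def primes():
        found = []
        n = 2
        while True:
            if all(n % p for p in found):
                found.append(n)
                yield n
            n += 1
    code = 1
    ps = primes()
    for symbol in sentence:
        code *= next(ps) ** (alphabet.index(symbol) + 1)
    return code

alphabet = ['0', '=', 'S', '(', ')', '¬', '∃', 'x']
print(godel_number('0=0', alphabet))   # 2^1 · 3^2 · 5^1 = 90
```

Since prime factorizations are unique, the sentence can be mechanically recovered from its code; this is what lets arithmetic statements about numbers double as statements about sentences and proofs.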

Okay, now we’re in a position to write the original claim of this post more formally. If a theory T is consistent, then ℕ ⊨ Con(T). And Gödel’s second incompleteness theorem tells us that if ℕ ⊨ Con(T), then T ⊬ Con(T). But if T doesn’t prove the sentence Con(T), then no contradiction can be derived by adding ¬Con(T) as an axiom! So (T + ¬Con(T)) is itself a consistent theory, i.e. ℕ ⊨ Con(T + ¬Con(T)). But hold on! (T + ¬Con(T)) can prove its own inconsistency! Why? Because (T + ¬Con(T)) ⊢ ¬Con(T), i.e. it proves that a contradiction can be derived from the axioms of T, and it also has as axioms every one of the axioms of T! So the same number that encodes a proof of the inconsistency of T, also counts as a proof of the inconsistency of (T + ¬Con(T))!

Summarizing this all:

ℕ ⊨ Con(T)

T ⊬ Con(T)

ℕ ⊨ Con(T + ¬Con(T)),
but
(T + ¬Con(T)) ⊢ ¬Con(T + ¬Con(T))

There we have it, a theory that is consistent but proves its own inconsistency!

Expressed another way:

T ⊢ ∃n (n = ⟦T ⊢ 0=1⟧),
but
T ⊬ 0=1

Ok, so believe it or not, a lot of the strangeness of this can be explained away by thinking about the implications of nonstandard models of arithmetic. One easy way to see this is to reflect on the fact that, as we saw above, “T is consistent” becomes in Gödel’s encoding, “There is no natural number n such that n encodes a proof of T’s inconsistency.” Or more precisely, “T is consistent” becomes “There is no natural number n such that n = ⟦T ⊢ 0=1⟧.”

Now, no first-order theory can pin down the natural numbers.

(I’ve written about this here and here.) I.e. no first order theory can express a quantification like “there is no natural number N such that …”. You can try, for sure, by defining some object ω and adding axioms to restrict its structure to look more and more like ℕ, but no matter how hard you try, no matter how many axioms you add, there will always be models of the theory in which ω ≠ ℕ. In particular, ω will be a strict superset of ℕ in all of these nonstandard models (ℕ ⊂ ω), so that ω contains all the naturals but also additional nonstandard numbers.

So now consider what happens when we try to quantify over the naturals by saying “∀x ∈ ω”. This quantifier inevitably ranges over ALL of the elements of ω in each model, so it also touches the nonstandard numbers in the nonstandard models. This means that the theory only semantically entails quantified statements that are true of all possible nonstandard numbers! (Remember, T ⊨ ψ means that ψ is true in ALL models of T.)

One nice consequence of this is that if T has a model in which ω = ℕ then in this model “∀x∈ω Φ(x)” is true only if Φ(x) is true of all natural numbers. By the completeness of first-order logic, this means that T can’t prove “∀x∈ω Φ(x)” unless it’s true of ℕ. This is reassuring; if T ⊢ ∀x∈ω Φ(x) and T has a model in which ω = ℕ, then ℕ ⊨ ∀x∈ω Φ(x).

But the implication doesn’t go the other way! ℕ ⊨ ∀x∈ω Φ(x) does not guarantee us that T ⊢ ∀x∈ω Φ(x), because T can only prove that which is true in EVERY model. So T can only prove “∀x∈ω Φ(x)” if Φ(x) is true of all the naturals and every nonstandard number in every model of T!

This is the reason that we don’t know for sure that if Goldbach’s conjecture is true of ℕ, then it’s provable in Peano arithmetic. On the face of it, this looks quite puzzling; Goldbach’s conjecture can be written as a first-order sentence and first-order logic is complete, so if it’s true then how could we possibly not prove it? The answer is hopefully clear enough now: Goldbach’s conjecture might be true of all of ℕ but false of some nonstandard models of Peano arithmetic (henceforth PA).

You might be thinking “Well if so, then we can just add Goldbach’s conjecture as an axiom to PA and get rid of those nonstandard models!” And you’re right, you will get rid of those nonstandard models. But you won’t get rid of all the nonstandard models in which Goldbach’s conjecture is true! You can keep adding as axioms statements that are true of ℕ but false of some nonstandard model, and as you do this you rule out more and more nonstandard models. At the end of this process (once your theory consists of literally all the first-order sentences that are true of ℕ), you will have created what is known as “True Arithmetic”: {ψ | ℕ ⊨ ψ}.

But guess what! At this point, have you finally ruled out all the nonstandard models? No! There are still many, many more. (Infinitely many, in fact! Nonstandard models of every cardinality! So many models that no cardinality is large enough to count them all!) Pretty depressing, right? There are all these models that agree with ℕ on every first order sentence! But they are still not ℕ (most obviously because they contain numbers larger than 0, 1, 2, and all the rest of ℕ).

The nonstandard models of True Arithmetic are the models that are truly irremovable in any first-order theory of arithmetic. Any axiom you add to try to remove them will also remove ℕ as a model. And when you remove ℕ as a model, some pretty wacky stuff begins to happen.

Fully armed now with new knowledge of nonstandard numbers, let’s return to the statement I started with at the top of this post: there are consistent theories that prove their own inconsistency. The crucial point, the thing that explains this apparent paradox, is that all such theories lack ℕ as a model.

If you think about this for a minute, it should make sense why this must be the case. If a theory T is consistent, then the sentence “∀x∈ω (x ≠ ⟦T ⊢ 0 = 1⟧)” is true in a model where ω = ℕ. So if T has such a model, then T simply can’t prove its own inconsistency, as it’s actually not inconsistent and the model where ω = ℕ will be able to see that! And once more, T can only prove what’s true in all of its models.

Okay, so now supposing T is consistent (i.e. ℕ ⊨ Con(T)), by Gödel’s second incompleteness theorem, T cannot prove its own consistency. This means that (T + ¬Con(T)) is a consistent theory! But (T + ¬Con(T)) no longer has ℕ as a model. Why? Because ℕ ⊨ Con(T) and (T + ¬Con(T)) ⊨ ¬Con(T). So for any consistent theory T, (T + ¬Con(T)) only has nonstandard models. What does this mean about the things that T + ¬Con(T) proves? It means that they no longer have to be true of ℕ. So for instance, even though ℕ ⊨ Con(T + ¬Con(T)), (T + ¬Con(T)) might end up proving ¬Con(T + ¬Con(T)). And in fact, it does prove this! As we saw up at the top of this post, a moment’s examination will show that (T + ¬Con(T)) asserts as an axiom that a contradiction can be derived from the axioms of T, but also contains all the axioms of T! So by monotonicity, (T + ¬Con(T)) proves ¬Con(T + ¬Con(T)).

What do we say of this purported proof of contradiction from (T + ¬Con(T))? Well, we know for sure that it’s not a standard proof, one that would be accepted by a mathematician. I.e., it asserts that there’s some n in ω that encodes a proof of contradiction from (T + ¬Con(T)). But this n is not actually a natural number; it’s a nonstandard number. And nonstandard numbers encode proofs only in a formal sense: Gödel’s arithmetic encoding genuinely corresponds to proofs only when applied to natural numbers. So if we attempted to translate n, we’d find that the “proof” it encoded was actually nonsense all along: a fake proof that passes as acceptable by wearing the arithmetic guise of a real proof, but in actuality proves nothing whatsoever.

Summarizing:

In first order logic, every theory of arithmetic has nonstandard models that foil our attempts to prove all the truths of ℕ. Theories of arithmetic with ONLY nonstandard models and no standard model can prove things that don’t actually hold true of ℕ. In particular, since theories of arithmetic can encode statements about their own consistency, theories that don’t have ℕ as a model can prove their own inconsistency, even if they really are consistent.

So much for first order logic. What about

Second Order Logic?

As you might already know, second order logic is capable of ruling out all nonstandard models. There are second order theories that are categorical for ℕ. But there’s a large price tag for this achievement: second order logic has no sound and complete proof system!

Sigh. People sometimes talk about nature being tricky, trying to hide aspects of itself from us. Often you hear this in the context of discussions about quantum mechanics and black holes. But I think that the ultimate trickster is logic itself! Want a logic that’s sound and complete? Ok, but you’ll have to give up the expressive power to allow yourself to talk categorically about ℕ. Want to have a logic with the expressive power to talk about ℕ? Ok, but you’ll have to give up the possibility of a sound and complete proof system. The ultimate structure of ℕ remains shifty, slipping from our view as soon as we try to look closely enough at it.

Suppose that T is a second order theory that is categorical for ℕ. Then for every second-order sentence ψ that is true of ℕ, T ⊨ ψ. But we can’t make the leap from T ⊨ ψ to T ⊢ ψ without a complete proof system! So there will be semantic implications of T that cannot actually be proven from T.

In particular, suppose T is consistent. Then T ⊨ Con(T), but T ⊬ Con(T), by Gödel’s second. And since T ⊬ Con(T), (T + ¬Con(T)) is consistent. But since T ⊨ Con(T), (T + ¬Con(T)) ⊨ Con(T). So (T + ¬Con(T)) ⊨ Con(T) ∧ ¬Con(T)!

In other words, T + ¬Con(T) actually has no model! But it’s consistent! There are consistent second-order theories that are actually not logically possible – that semantically entail a contradiction and have no models. How’s that for trickiness?

Updated Introduction to Mathematical Logic

I’ve noticed that some of my posts that have gotten the most hits are my introductions to mathematical logic. Overall, I’m pretty happy with that series of posts, though I think since then I’ve come up with clearer and more precise ways of talking about the primary distinctions: syntax, semantics, and proof systems.

A little while ago, I gave a talk on the subject of “true but unprovable” statements. The goal of the talk was just to explore this notion and explain what exactly it means to a logician. In the course of developing the talk, I ended up re-summarizing the basics of mathematical logic. I think this summary makes a good “companion piece” to my earlier introductory series, as well as being more standard in its presentation. The most important slide is this one: [slide image omitted]

Here’s the link to the full slideshow: True but Unprovable

And here you can look through it on this blog (but some of the formatting got a little wacky in the embedding process).

Finiteness can’t be captured in a sound, complete, finitary proof system

Consider the sentence “This blog has finitely many posts.” Do you understand what this sentence means? Then no set of rules (even infinite, even uncomputable!) can model your reasoning. This claim may sound shocking, but it can be justified on solid metamathematical grounds.

Another example: the sentence “There are finitely many planets in the universe.” You don’t have to think it’s true, you just have to think you understand what it means. What’s the common theme? It’s the notion of there being ‘finitely many’ of some class of objects. Let’s imagine building a language that has the expressive resources of first-order logic (which are quite modest), plus an additional quantifier F, whose semantics are given by the following rule: Fx φ(x) is satisfied by a model iff there are only finitely many objects in that model that satisfy φ(x).

It turns out that any language consisting of first order logic plus the quantifier F can’t be axiomatized in any sound, complete, and finitary proof system. Notice that effective enumerability of the rules of the proof system is not a requirement here! So long as the language is strong enough to express the semantics of {∧, ¬, ∀, F, variables xn, and relations Rn}, no set of sentences and sentence-manipulation rules in that language will suffice to capture these semantics.

Here’s a proof: consider the first-order theory of Peano arithmetic. This theory has nonstandard models (as any theory of arithmetic must have in a logic that is compact). All of these nonstandard models have the following feature: that there are numbers that are larger than infinitely many numbers. Think about what this means: this is a common feature of all nonstandards, so if we could write a sentence to express this feature then we could rule them all out! This is where the quantifier F steps in. With F, we can write the following simple sentence and add it to PA as an axiom:

∀x Fy (y < x)

In English: every number has finitely many numbers less than it. And with that, we’ve ruled out all nonstandard models! So now we have a theory that is categorical for ℕ. And that’s a big deal, metamathematically speaking!

Why? Well, as I’ve talked about in a previous post, you can prove some pretty strong limitative results about any logic that can produce a theory that’s categorical for ℕ. In particular, if we can produce such a theory then its logic cannot be compact. Quick proof: suppose a set of sentences Σ is categorical for ℕ. Add to the language a constant c, add to Σ the axioms “c ≠ 0”, “c ≠ 1”, “c ≠ 2”, and so on, and call this new set Σ’. Every finite subset of Σ’ is satisfied by ℕ (just interpret c as a sufficiently large natural number). So by compactness, Σ’ has a model. But this model is nonstandard: it contains an object c that is not any natural number. And since Σ is a subset of Σ’, any model of Σ’ is also a model of Σ, contradicting the assumption that ℕ was Σ’s only model.

So compactness implies that no theory is categorical for ℕ. But compactness follows from the following three properties: soundness (if Σ ⊢ α then Σ ⊨ α), completeness (if Σ ⊨ α then Σ ⊢ α), and finitary proofs (all proofs are only finitely long; try expressing this property without F!). Quick proof: If every finite subset of a set of sentences has a model, then no finite subset of it can be refuted (by soundness), so the entire set can’t be refuted (since any refutation is a finite proof, using only finitely many of the sentences), so the entire set has a model (by completeness).

So soundness + completeness + finite proofs ⇒ compactness ⇒ the existence of nonstandard models of arithmetic for any theory whose axioms are true of ℕ. Which means that the semantics of F cannot be captured in any sound, complete, and finitary proof system!

Take your pick: either you don’t really understand the semantics of the “finitely many” quantifier F, or no set of rules (not even requiring this set to be finite or computable) can fully capture your reasoning in finite-length proofs.

More information about related extensions of first-order logic and their limitations can be found here. The result I describe here is a rephrasing of results discussed there.

Immoral or Inconsistent?

In front of you sits a button. If you press this button, an innocent person will be subjected to the worst possible forms of torture constantly for the next year. (There will be no “resistance” built up over time – the torture will be just as bad on day 364 as it was on day 1.) If you don’t press it, N people will receive a slight pin-prick in their arm, just sharp enough to be noticeably unpleasant.

Clearly if N is something small, like, say, 10, you should choose the pin-pricks rather than the torture. But what about for very very large N? Is there any N so large that you’d switch to the torture option? And if so, what is it?

I’m going to take it as axiomatic that no value of N is high enough that it’s worth inflicting the year long torture. Even if you were faced with the choice between year long torture for one and momentary pin-prick for a literal infinity of people, the right choice would be the pin-pricks.

I feel safe in taking this as axiomatic for three main reasons. Firstly, it’s the obvious answer to the average person, who hasn’t spent lots of time thinking about normative ethics. (This is just in my experience of posing the problem to lay-people.)

Secondly, even among those that choose the torture, most of them do so reluctantly, citing an allegiance to some particular moral framework. Many of them say that they think that it’s technically what they should do, but that in reality they probably wouldn’t, or would at least be extremely hesitant to.

Thirdly, my own moral intuitions deliver a clear and unambiguous judgement on this case. I would 100% choose an infinity of people having the pin-pricks, and wouldn’t feel the tiniest bit of guilt about it. My intuition here is at the same level of obviousness as something like “Causing unhappiness is worse than causing happiness” or “Satisfying somebody’s preferences is a moral good, so long as those preferences don’t harm anybody else.”

With all that said, there are some major problems with this view. If you already know what these problems are, then the rest of this post will probably not be very interesting to you. But if you feel convinced that the torture option is unambiguously worse, and don’t see what the problem could be with saying that, then read on!

First of all, pain is on a spectrum. This spectrum is not continuous, but nonetheless there’s a progression of pains from “slight pin-prick” to “year long torture” such that each step is just barely noticeably worse than the previous one.

Second of all, for each step along this progression, there’s a trade-off to be made. For instance, a pin-prick for one person is better than a very slightly worse pin-prick for one person. And a pin-prick for each of one million people is worse than a slightly more painful pin-prick for one person. So there’s some number of people N that will cause you to change your choice from “pin-pricks for N people” to “slightly worse pin-prick for one person.”

Let’s formalize this a little bit. We’ll call our progression of pains p1, p2, p3, …, pn, where p1 is the pain of a slight pin-prick and pn is the pain of yearlong torture. And we’ll use the symbol < to mean “is less bad than”. What we’ve just said is that for each k, (pk for one person) < (pk+1 for one person) AND (pk for one million people) > (pk+1 for one person). (The choice of million really doesn’t matter here, all that we need is that there’s some number for which the slighter pain becomes worse. One million is probably high enough to do the job.)

Now, if (pk for N) > (pk+1 for 1), then surely (pk for 2N) > (pk+1 for 2). The second is just the first one, but two times! If the tradeoff was worth it the first time, then the exact same tradeoff should be worth it the second time. But now what this gives us is the following:

p1 for 1,000,000^(n-1) > p2 for 1,000,000^(n-2) > … > pn-1 for 1,000,000 > pn for 1

In other words, if we are willing to trade off at each step along the progression, then we are forced on pain of inconsistency to trade off between the slight pin-prick and the yearlong torture! I.e. there has to be some number (namely 1,000,000^(n-1)) for which we choose the torture for 1 over the pin-pricks for that number of people.

Writing this all out:

  1. There’s a progression of pains p1, p2, …, pn-1, pn such that for each k, (pk for one million people) > (pk+1 for one person).
  2. If (p for n) > (q for m), then (p for k⋅n) > (q for k⋅m).
  3. > is transitive.
  4. Therefore, (p1 for 1,000,000^(n-1)) > (pn for 1)

If you accept the first three premises, then you must accept the fourth. And the problem is, all three premises seem very hard to deny!
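
If it helps to see how fast the premises compound, here’s a tiny Python sketch of my own (the one-million ratio comes from the setup above; the choice of n = 10 pain levels is a purely hypothetical stand-in):

    RATIO = 1_000_000   # people at pain level k traded against one person at level k+1
    n = 10              # hypothetical number of just-noticeable pain levels

    people = 1          # start at the top: p_n (torture) for one person
    for k in range(n - 1, 0, -1):
        # premise 1, scaled up by premise 2:
        # (p_k for RATIO * people) is worse than (p_{k+1} for people)
        people *= RATIO

    print(people)                      # 10**54
    assert people == RATIO ** (n - 1)

By transitivity (premise 3), the chain bottoms out at: p1 for 10^54 people is worse than pn for one person. The ratio you tolerate at each single step gets exponentiated across the whole progression.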

For premise 1: It seems just as clearly immoral to choose to inflict a pain on a million people rather than inflict a slightly worse pain on one person as it does to choose to inflict torture on one rather than a pin-prick on an arbitrarily large number of people. Or to make the point stronger, no sane moral system would say that it’s worse to inflict a pain on one person than a very slightly less bad pain on an arbitrarily large number of people. We have to allow some tradeoff between barely different pains.

Premise 2: if you are offered a choice between two options and choose the first, and then before you gain any more information you are offered the exact same choice, with no connection between the consequences of the first choice and the consequences of the second choice, then it seems to me that you’re bound by consistency to choose the first once more. And if you’re offered that choice k times, then you should take the first every time.

(Let me anticipate a potential “counter-example” to this. Suppose you have the choice to either get a million dollars for yourself or for charity. Then you are given that same choice a second time. It’s surely not inconsistent to choose the million for yourself the first time and the million for charity the second time. I agree, it’s not inconsistent! But this example is different than what we’re talking about in Premise 2, because the choice you make the second time is not the same as the choice you made the first. Why? Because a million dollars to a millionaire is not the same as a million dollars to you or me. In other words, the goodness/badness of the consequences of the second choice are dependent on the first choice. In our torture/pin-prick example, this is not the case; the consequences of the second choice are being enacted on an entirely different group of people, and are therefore independent of the first choice.)

Premise 3: Maybe this seems like the most abstract of the three premises, and hence potentially the most easy to deny. But the problem with denying premise 3 is that it introduces behavioral inconsistency. If you think that moral badness is not transitive, then you think there’s an A, B, and C such that you’d choose A over B, B over C, but not A over C. But if you choose A over B and B over C, then you have in effect chosen A over C, while denying that you would do so. In other words, a moral system that denies transitivity of badness cannot be a consistent guide of action, as it will tell you not to choose A over C, while also telling you to take actions that are exactly equivalent to choosing A over C.

And so we’re left with the conclusion that pin-pricks for 1,000,000^(n-1) people is worse than torture for one person for a year.

Okay, but what’s the take-away of this? The take-away is that no consistent moral system can agree with all of the following judgements:

  • Barely distinguishable pains are morally tradeoffable.
  • There’s a progression of pains from “momentary pin-prick” to “torture for a year”, where each step is just barely distinguishably worse than the last.
  • Torturing one person for a year is worse than inflicting momentary pin-pricks on any number of people.

You must either reject one of these, or accept an inconsistent morality.

This seems like a big problem for anybody that’s really trying to take morality seriously. The trilemma tells us in essence that there is no perfect consistent moral framework. No matter how long you reflect on morality and how hard you work at figuring out what moral principles you should endorse, you will always have to choose at least one of these three statements to reject. And whichever you reject, you’ll end up with a moral theory that is unacceptable (i.e. a moral theory which commits you to immoral courses of action).

Constructing ordinal numbers in ZFC

Today I want to talk about ordinal numbers in ZFC set theory. VSauce does a great job introducing his viewers to the concepts of ordinal vs cardinal numbers, and giving a glimpse into the weird and wild world of mathematical infinity. I want to go a bit deeper, and show exactly how the ordinals are constructed in ZFC. Let’s begin!

Okay, so first of all, we’re talking about first-order ZFC, which is an axiomatic formalization of set theory in first-order logic. As a quick reminder, first order logic gives us access to the following alphabet of symbols: ( ) , ∧ ∨ ¬ → ↔ ∀ ∃ =, as well as an infinite store of variables (x, y, z, w, v, u, and so on). A first order language also includes a store of constant symbols, relation symbols, and function symbols.

For first-order set theory, we are going to add only a single extra-logical symbol to our alphabet: the “is-an-element-of” relation ∈. This is pretty remarkable when you consider that almost all of mathematics can be done with just ZFC! In some sense, you can give a pretty good description of mathematics as the study of the elementhood relation! Using just ∈ we can define everything we need, from ⊆ and ⋃ and ⋂ to the empty set ∅ and the power set function P(x). In fact, as we’ll see, we’re even going to define numbers using ∈!

The elementhood relation ∈ is given its intended meaning by the axiom of extensionality: ∀x∀y (x=y ↔ ∀z (z∈x ↔ z∈y)). In plain English this says that two sets are the same exactly when they have all the same elements.

The semantics of first order logic has two parts: a “universe” of individuals that are quantified over by ∀ and ∃, and an interpretation of each of the constant symbols (as individual objects in the universe), the relation symbols (as maps from the universe to truth values), and the function symbols (as maps from the universe to itself).

Our universe is going to be entirely composed of sets. This means that sets won’t be composed of non-set elements; the elements of non-empty sets are always themselves sets. And those sets themselves, if non-empty, are made out of sets. It’s sets all the way down!

Now, the topic of this essay is ordinal numbers in ZFC. So if everything in ZFC is a set, guess what ordinal numbers will be? You got it, sets! What sets? We can translate from ordinals to sets in a few words: the ordinal 0 is translated as the empty set, and every other ordinal is translated as the set of all smaller ordinals.

This tells us that 1 = {0}, 2 = {0, 1}, 3 = {0, 1, 2}, and so on. If we were to write these ordinals entirely in set notation, it would look like: ∅, {∅}, {∅,{∅}}, {∅,{∅},{∅,{∅}}}, and so on. The choice to associate these particular sets with the natural numbers is a convention introduced by John Von Neumann (it is, however, an exceedingly wise convention, and has many virtues that competing conventions do not have, as will become clearer once we ascend to the transfinite).

So, now we know that we are going to associate the finite ordinals with the empty set and the sets built up from it. But of course, we haven’t yet even shown that the empty set exists in ZFC! To actually construct the empty set in ZFC, we have to add some more axioms. Let’s start with the obvious one: a sentence that asserts the existence of the empty set:

Axiom of Empty Set: ∃x∀y ¬(y∈x)

In other words, there’s some set x that contains no sets. Notice that we didn’t refer to the empty set by name. We can’t refer to it by name in our axioms, because we haven’t included any constant symbols in our language! To actually talk about the particular set ∅ (equivalently, 0), we can use the rule of existential instantiation, which allows us to remove any existential quantifier, as long as we change the name of the quantified variable to something that has not been previously used. So, for example, in any particular proof, we can do the following:

1.  ∃x∀y ¬(y∈x) (Axiom of Empty Set)
2. ∀y ¬(y∈0) (from 1 by existential instantiation)

This is allowed so long as the symbol 0 has not appeared anywhere previously in the proof.

Now that we have the empty set, we need to be able to construct 1={0} and 2={0,1}. To do this, we introduce the axiom of pairing:

Axiom of Pairing: ∀x∀y∃z∀w (w∈z ↔ (w=x ∨ w=y))

This says that we can take any two sets x and y and form a new set z = {x, y}. We can right away use this axiom to construct the number 1.

3. ∀x∀y∃z∀w (w∈z ↔ (w=x ∨ w=y)) (Axiom of Pairing)
4. ∃z∀w (w∈z ↔ (w=0 ∨ w=0)) (from 3 by universal instantiation)
5. ∀w (w∈1 ↔ (w=0 ∨ w=0)) (from 4 by existential instantiation)
6. ∀w (w∈1 ↔ w=0)

We went from 3 to 4 by using universal instantiation (instantiating both variables x and y as 0), and from 4 to 5 by using existential instantiation (instantiating z as 1, which is allowed because we haven’t used the symbol 1 yet). The step from 5 to 6 is technically skipping a bunch of steps. Even though it’s obvious that we can replace (w=0 ∨ w=0) with w=0 inside the formula, there isn’t any particular rule of inference in first order logic that does it in one step. But since it is obvious that 5 semantically entails 6, and since first order logic has a sound and complete proof system, we know that 5 also syntactically entails 6.

To construct 2, we can simply use pairing again with 0 and 1:

7. ∃z∀w (w∈z ↔ (w=0 ∨ w=1)) (from 3 by universal instantiation)
8. ∀w (w∈2 ↔ (w=0 ∨ w=1)) (from 7 by existential instantiation)

We might think that we could simply use pairing once more to get 3 = {0,1,2}. But this won’t quite work. If we use pairing on 1 and 2 we get {1,2}, and if we use pairing again on this and 0 we get {0,{1,2}}, not {0,1,2}. In fact, we can easily see that any usage of pairing produces a set with at most two elements (pairing a set with itself produces a singleton). And the set 3 has three elements! So pairing is not enough to get us where we want to go. To get to 3, we need a stronger axiom, which will allow us to take the union of sets.
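
To see concretely why pairing stalls out, here’s a quick sanity check in Python, with frozensets standing in for pure sets (an informal model of mine, not part of the formal development):

    # Pairing always yields a set with at most two elements,
    # so no amount of pairing alone can produce 3 = {0, 1, 2}.
    pair = lambda x, y: frozenset({x, y})

    zero = frozenset()          # 0 = ∅
    one  = pair(zero, zero)     # pairing 0 with itself gives the singleton {0} = 1
    two  = pair(zero, one)      # {0, 1} = 2

    attempt = pair(zero, pair(one, two))   # {0, {1,2}}, not {0, 1, 2}!
    assert len(attempt) == 2               # two elements, but 3 needs three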

Axiom of Union: ∀x∃y∀z (z∈y ↔ ∃w(z∈w ∧ w∈x))

This says that for any set x, we can construct a new set y which consists of the union of all sets in x. In other words, if z is an element of y, then z must be contained in some set w that’s an element of x.

Now, here’s how we’re going to construct 3. First we construct the set {2} by pairing 2 with itself. Then we construct the set {2,{2}} with pairing again. Then we union all the elements of this set to get 2⋃{2}. Now remember that 2 = {0,1}, so 2⋃{2} = {0,1}⋃{2} = {0,1,2}. And that’s 3!

Constructing {2}:

9. ∃z∀w (w∈z ↔ (w=2 ∨ w=2)) (from 3 by universal instantiation)
10. ∀w (w∈{2} ↔ (w=2 ∨ w=2)) (from 9 by existential instantiation)
11. ∀w (w∈{2} ↔ w=2)

Constructing {2,{2}}:

12. ∃z∀w (w∈z ↔ (w=2 ∨ w={2})) (from 3 by universal instantiation)
13. ∀w (w∈{2,{2}} ↔ (w=2 ∨ w={2})) (from 12 by existential instantiation)

Constructing 3:

14. ∀x∃y∀z (z∈y ↔ ∃w(z∈w ∧ w∈x)) (Axiom of Union)
15. ∃y∀z (z∈y ↔ ∃w(z∈w ∧ w∈{2,{2}})) (from 14 by universal instantiation)
16. ∀z (z∈3 ↔ ∃w(z∈w ∧ w∈{2,{2}})) (from 15 by existential instantiation)

Technically, we’ve now constructed 3. But let’s neaten this up and show that this set is really what we wanted (using more of the intuitive semantic arguments from before to skip many tedious steps).

17. ∀z (z∈3 ↔ ∃w(z∈w ∧ (w=2 ∨ w={2}))) (from 13,16)
18. ∀z (z∈3 ↔ ∃w((z∈w ∧ w=2) ∨ (z∈w ∧ w = {2})))
19. ∀z (z∈3 ↔ (z∈2 ∨ z∈{2}))
20. ∀z (z∈3 ↔ (z∈2 ∨ z=2)) (from 11,19)
21. ∀z (z∈3 ↔ (z=0 ∨ z=1 ∨ z=2)) (from 8,20)

We can construct 4 in pretty much the exact same way: first use pairing to construct {3} and then {3,{3}}, and then use union to construct 3⋃{3} = {0,1,2}⋃{3} = {0,1,2,3} = 4. I’ll go through this all formally one more time, more quickly than before:

22. ∀w (w∈{3} ↔ w=3) (pair 3 with 3)
23. ∀w (w∈{3,{3}} ↔ (w=3 ∨ w={3})) (pair 3 with {3})
24. ∀z (z∈4 ↔ ∃w(z∈w ∧ w∈{3,{3}})) (union {3,{3}})
25. ∀z (z∈4 ↔ (z=0 ∨ z=1 ∨ z=2 ∨ z=3)) (neatening up)

It should be clear that this process can continue indefinitely. From the ordinal 4 we can construct 5 by taking the union of 4 and {4}, and from 5 we can construct 6 = 5⋃{5}. And so on. In fact, we can define the successor of any set x as exactly the set x⋃{x}: S(x) = x⋃{x}. And using the above construction, we know that this successor set will always exist!
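
Here’s the whole construction so far compressed into a few lines of Python, again with frozensets standing in for pure sets (my informal model, not ZFC itself; pleasantly, Python’s == on frozensets even behaves like the axiom of extensionality):

    def succ(x):
        # S(x) = x ⋃ {x}: the pair-twice-then-union recipe in one step
        return x | frozenset({x})

    zero  = frozenset()     # ∅
    one   = succ(zero)      # {0}
    two   = succ(one)       # {0, 1}
    three = succ(two)       # {0, 1, 2}
    four  = succ(three)     # {0, 1, 2, 3}

    assert three == frozenset({zero, one, two})   # 3 really is {0, 1, 2}
    assert zero in four and three in four         # an ordinal contains every smaller ordinal
    assert len(four) == 4                         # the ordinal n has exactly n elements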

Wonderful! So now we have an outline for the construction of every natural number. What’s next? What comes after 0, 1, 2, 3, 4, and so on? The first infinite ordinal, ω! Just as 4 is the set {0,1,2,3}, and 10 is the set of all the ordinals before it, ω is defined as exactly the set of all previous ordinals. In other words, ω is the set of all natural numbers! ω = {0,1,2,3,…}.

Now, how can we construct ω in ZFC? Can we do it using the axioms we have so far? You might be tempted to say something like “Sure! You’ve just demonstrated a process that constructs n+1 from any n, and we know how to construct 0 already. So don’t we already have the ability to construct all the natural numbers?”

Well, hypothetical you, it’s true that we now know how to construct each natural number. But constructing the infinite set containing all natural numbers is an entirely different matter. Remember that proofs are only allowed to be finitely long! So in any proof using only the methods we’ve used so far, we can only construct finitely many natural numbers (proportional to the length of the proof). To get ω, we need something more than the axioms we have so far. Introducing: the axiom of infinity!

But before we get there, I want to construct a handy bit of shorthand which will make what comes next a lot easier to swallow. What we’ll do is write out as a first-order sentence the assertion “x is the successor of y”, as well as the sentence “x is the empty set”, and then introduce a shorthand notation for them. Trust me, it will make life a lot easier.

First, “x is the successor of y”, which we can also write as “x = y⋃{y}”. Try this for yourself before reading on! Ok, now that you’re back, here it is: ∀z (z∈x ↔ (z∈y ∨ z=y)). We’ll call this sentence Succ(x,y). So if you ever see “Succ(x,y)” in the future, read it as “x is the successor of y” and know that if we wanted to be fully formal about it we could replace it with “∀z (z∈x ↔ (z∈y ∨ z=y))”.

Good! Now, let’s do the same with the sentence “x is the empty set”, which is the same thing as “x is 0”. Try it for yourself! And now, here it is: ∀y ¬(y∈x). We’ll call this sentence isZero(x).

Now we’re ready for the axiom of infinity!

Axiom of Infinity: ∃x∀y ((isZero(y) → y∈x) ∧ (y∈x → ∃z (Succ(z,y) ∧ z∈x)))

If this axiom looks like a lot to comprehend, imagine it without our shorthand! Conceptually, what’s going on with this axiom is actually pretty simple. We’re just asserting that there exists an infinite set x that contains 0, and that is closed under the successor operation. So this set is guaranteed to contain 0, as well as the successor of 0 (1), and the successor of the successor of 0 (2), and the successor of this (3), and so on forever. (Bonus question: what does the set theoretic universe look like if we remove the axiom of infinity and add its negation as an axiom instead? What mathematical structure is it isomorphic to?)

Quiz question for you: have we now constructed ω? That is, the axiom of infinity does guarantee us the existence of a set, but are we sure that that set is exactly the set of natural numbers and nothing more?

The answer is no. The axiom of infinity does guarantee us the existence of an infinite set, and we know for sure that this set contains all the natural numbers, but there’s nothing guaranteeing that it doesn’t also contain other sets! To actually obtain ω, we need one more axiom. This axiom will be the most powerful one we’ve seen yet: the axiom of comprehension.

Axiom of Comprehension: ∀x∃y∀z (z∈y ↔ (z∈x ∧ φ(z)))

This tells us that for any set x, we can construct a new set y, which consists of exactly the elements of x that have a certain property φ. In set-builder notation, we can write: y = {z∈x: φ(z)}. (Bonus question: why do we have to define y as the subset of x that satisfies φ? Why not just say that there exists a set of all sets that satisfy φ? This unrestricted comprehension axiom appeared in the early formalizations of set theory, but there was a big problem with it. What was it?)

You may notice that there’s something different about this axiom than the previous ones. What’s up with that symbol φ(z)? Well, φ(z) is a stand-in for any well-formed formula in the language of ZFC, so long as φ contains only z as a free variable. What that means is that there’s actually not one single axiom of comprehension, but a countably infinite axiom schema, one for each well-formed formula φ.

For instance, we have as one instance of the axiom of comprehension that ∀x∃y∀z (z∈y ↔ (z∈x ∧ z=z)). As another instance of the axiom, we have ∀x∃y∀z (z∈y ↔ (z∈x ∧ z≠z)). Both of these are pretty trivial examples: in the first case the set y is exactly the same as x (as all sets are self-identical), and in the second case y is the empty set (as no set z satisfies the property z ≠ z). But we can do the same thing for any property whatsoever, so long as it can be encoded in a sentence of first-order ZFC. (Another bonus question: One of the axioms I’ve mentioned before has now been obviated by the introduction of these new axioms. Can you figure out which it is, and produce its derivation?)
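
If a programming analogy helps, the axiom of comprehension is precisely what a filtered set comprehension does in Python (a parallel of my own, restricted of course to finite sets):

    x = {0, 1, 2, 3, 4, 5}           # some already-constructed set

    y1 = {z for z in x if z == z}    # φ(z): z = z, gives back x itself
    y2 = {z for z in x if z != z}    # φ(z): z ≠ z, gives the empty set

    assert y1 == x and y2 == set()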

We use the axiom of comprehension to “carve ω out” from the set whose existence is guaranteed by the axiom of infinity (remember, we already know for sure that this set contains all the natural numbers, it’s just that it might contain more elements as well). So what we need is to construct a sentence φ(z) such that the only set z that satisfies the sentence is the set of all natural numbers ω.

There are several such sentences. I’ll briefly present one simple one here. Again we’ll introduce a convenient shorthand for the sake of sanity. Take a look at this sentence that we saw earlier: “∀y ((isZero(y) → y∈x) ∧ (y∈x → ∃z (Succ(z,y) ∧ z∈x)))”. What this sentence says is that x is a superset of ω (it contains 0 and is closed under successorhood). So we’ll call this sentence “hasAllNaturals(x)”.

Now, we can write the following sentence: ∀x (hasAllNaturals(x) → z∈x).

Consider what this sentence says. It tells us that z is an element of every set that contains all the naturals. But one such set is the smallest set containing all the naturals, i.e. ω! So z must be an element of ω. In other words, z is a natural number. So this sentence will do for our definition of φ(z).

φ(z): ∀x (hasAllNaturals(x) → z∈x)

Just like with hasAllNaturals(x) and Succ(x,y) and isZero(x), you should read φ(z) as simply a shorthand for the above sentence. If we really wanted to torture ourselves with formality, we could write out the entire sentence using only the allowed symbols of first-order ZFC.

Now we can finally get ω. Let’s continue with our proof progression from earlier. We left off at 25, so:

26. ∃x∀y ((isZero(y) → y∈x) ∧ (y∈x → ∃z (Succ(z,y) ∧ z∈x))) (Axiom of Infinity)
27. ∀y ((isZero(y) → y∈inf) ∧ (y∈inf → ∃z (Succ(z,y) ∧ z∈inf))) (from 26 by existential instantiation)
28. ∀x∃y∀z (z∈y ↔ (z∈x ∧ φ(z))) (Axiom of Comprehension)
29. ∃y∀z (z∈y ↔ (z∈inf ∧ φ(z))) (from 28 by universal instantiation)
30. ∀z (z∈ω ↔ (z∈inf ∧ φ(z))) (from 29 by existential instantiation)

In going from line 26 to 27, we gave the infinite set guaranteed us by the axiom of infinity a placeholder name, “inf”. Line 30 is what we’ve been aiming for for the last few hundred words, and it honestly looks a little underwhelming. It’s not so immediately clear from this line that ω has all the properties that we want of the natural numbers. But at the same time, we couldn’t write something like “∀z (z∈ω ↔ (z=0 ∨ z=1 ∨ z=2 ∨ …))”, because first-order logic is finitary (we aren’t allowed infinitely long sentences). So we have to make do with a definition of ω that may look a little more abstract than we might like. Suffice it to say that line 30 really does serve as an adequate definition of ω. It tells us that ω is the smallest set that contains all natural numbers. From this, we can pretty easily show that any particular natural number is an element of ω, and (less easily) that any other set (say, the set {2} or {5,1}) is not an element of ω.

If you’ve followed so far, give yourself a serious pat on the back. Together we’ve ascended past the realm of the finite to our first transfinite ordinal. This is no small accomplishment. But our journey does not end here. In fact, it has only barely begun. We’re going to start picking up speed from here on out, because as you’ll see, the ground we have yet to cover is much much greater than the ground we’ve covered so far.

The first step is easy. We already saw earlier how you can construct for any set x its successor set x⋃{x}. This construction didn’t rely on our sets being finite, it works just as well for the set ω. So ω has a successor! We’ll call it ω+1! Don’t believe me? I’ll prove it to you:

31. ∀x (x∈{ω} ↔ x=ω) (pair ω with ω)
32. ∀x (x∈{ω,{ω}} ↔ (x=ω ∨ x={ω})) (pair ω with {ω})
33. ∀x (x∈ω+1 ↔ ∃y(x∈y ∧ y∈{ω,{ω}})) (union {ω,{ω}})
34. ∀x (x∈ω+1 ↔ (x∈ω ∨ x=ω)) (neatening up)

There we have it! ω+1 = {0,1,2,3,…,ω}

And it doesn’t stop there: we can construct ω+2 = {0,1,2,3,…,ω,ω+1}. And ω+3 = {0,1,2,3,…,ω,ω+1,ω+2}. And so on, forever! By allowing the existence of one infinity, we’ve actually entailed the existence of an infinity of infinities!

But what’s next? We now have all the finite ordinals, and all the infinite ordinals of the form ω+n for finite n. What comes after this? Clearly, the next ordinal is just the set of all finite ordinals as well as all ordinals of the form ω+n! This ordinal is the smallest ordinal that’s larger than ω+n for any finite n. So a natural name for it is ω+ω, or ω⋅2!
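
One concrete way to see where ω⋅2 sits: encode each ordinal below ω⋅ω as a pair (a, b) standing for ω⋅a + b. Lexicographic order on pairs is then exactly the ordinal order (a toy encoding of my own, separate from the ZFC construction below):

    # (a, b) stands for the ordinal ω⋅a + b; Python compares tuples
    # lexicographically, which matches the ordinal order below ω⋅ω.
    omega         = (1, 0)
    omega_times_2 = (2, 0)

    assert all((0, n) < omega for n in range(1000))            # every finite n is below ω
    assert all((1, n) < omega_times_2 for n in range(1000))    # every ω+n is below ω⋅2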

So conceptually ω⋅2 makes sense, but can we actually construct it? At first glance, this may seem unlikely. The axiom of infinity contains no guarantee that the infinite set it grants us contains ω, to say nothing of ω+1, ω+2, and the rest. And no finite amount of constructing successors will allow us to make the jump to ω⋅2 (for much the same reason as we needed the axiom of infinity to make the jump to ω). So perhaps we need to have a new axiom asserting the existence of ω⋅2? And then maybe we need a new axiom for ω⋅3, and ω⋅4, and so on forever! That would be a sad situation.

Well, things aren’t quite that bad. It turns out that we do need a new axiom. But we can do better than just guaranteeing the existence of ω⋅2. We’ll introduce an axiom schema that is by far the most powerful of all the axioms of ZFC. This axiom schema will take us far beyond ω⋅2, beyond ω⋅ω even, and far far beyond ω^ω and ω^ω^ω^… with infinitely many exponents. Introducing: the Axiom of Replacement! (dramatic music plays)

But first, we’re going to need to talk a little bit about the set theoretic notion of functions. I promise, it’ll be as quick as I can manage. Remember how we chose our language for ZFC to have no function symbols, only the single relation symbol ∈? This means that we have to build in functions through different means. We’ve already seen a hint of it when we talked about the sentence Succ(x,y), which said that x was the successor of y. This sentence is true for exactly the sets x and y such that x is the successor of y. We can think of the sentence as “selecting” ordered pairs (x,y) such that x = y⋃{y}. In other words, this sentence is filling the role of defining the function y ↦ y⋃{y}.

The same applies more generally. For any function f(x), we can construct the sentence F(x,y) which asserts “y = f(x)”. For instance, the identity function id(x) = x will be defined by the sentence Id(x,y): “y=x”. Notice that not all functions from sets to sets can be defined in this way, as we only have countably many sentences in our language to work with.

Suppose we’re handed a sentence φ(x,y). How are we to tell if φ(x,y) represents a function or just an ordinary sentence? Well, functions have the property that any input is mapped to exactly one output. We can write this formally: “∀x∀y∀z ((φ(x,y) ∧ φ(x,z)) → y=z)”. For shorthand, we’ll call this sentence isAFunction(φ).
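
The same test in code (a sketch of mine, with the candidate function given as a finite set of input-output pairs):

    def is_a_function(pairs):
        # isAFunction(φ): no input may be related to two different outputs
        seen = {}
        for x, y in pairs:
            if x in seen and seen[x] != y:
                return False
            seen[x] = y
        return True

    succ_fragment = {(0, 1), (1, 2), (2, 3)}   # a finite piece of y ↦ y+1
    broken        = {(0, 1), (0, 2)}           # relates 0 to two different outputs

    assert is_a_function(succ_fragment)
    assert not is_a_function(broken)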

And now we’re ready for the axiom schema of replacement.

Axiom of Replacement: isAFunction(φ) → ∀x∃y∀z (z∈y ↔ ∃w (w∈x ∧ φ(w,z)))

In English, this says that for any definable function f (defined by φ), and for any set of inputs x, the image of x under f exists. Like with the axiom schema of comprehension, this is a countably infinite collection of axioms, one for each well formed formula φ(w,z).
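
Continuing the earlier programming analogy (again mine, and again finite): where comprehension filters a set, replacement maps a definable function over it and collects the image:

    def f(w):
        return w * w            # a stand-in for any definable function

    x = {0, 1, 2, 3}
    image = {f(w) for w in x}   # replacement: the image of x under f is a set
    assert image == {0, 1, 4, 9}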

Let’s use this axiom to construct ω⋅2. First we define a function f that takes any finite number n to the ordinal ω+n. (Challenge: Try to explicitly define this function! Notice that the most obvious method for defining the function requires second-order logic. Try to come up with a trick that allows it to be defined in first-order ZFC!) Then we prove that this really is a function. Then we apply the axiom of replacement with our domain as ω (the set of all natural numbers). The image of ω under f is the set {ω,ω+1,ω+2,ω+3,…}. Not quite what we want. But we’re almost there!

Next we use the axiom of pairing to pair this newly constructed set with ω itself, giving us {{0,1,2,…}, {ω,ω+1,ω+2,…}}. And finally, we apply the axiom of union to this set, giving us {0,1,2,…,ω,ω+1,ω+2,…} = ω⋅2!

Phew, that was a lot of work just to make the jump from ω to ω⋅2. But it was worth it! Now we can also jump to ω⋅3 in the exact same way!

Define a function f that maps n to ω⋅2+n. Now use replacement with ω as the domain to construct the set {ω⋅2, ω⋅2+1, ω⋅2+2, …}. Pair this set with ω⋅2, and union the result. This gives ω⋅3!

In exactly the same way, we can use the axiom of replacement to get ω⋅4, ω⋅5, ω⋅6, and so on to ω⋅n for any finite n! But it doesn’t stop there. We’ve just described a procedure to get ω⋅n for any finite n. So we write it as a function!

Define f as the function that maps n to ω⋅n. Now use replacement on f with ω as the domain to get the set {0, ω, ω⋅2, ω⋅3, ω⋅4, ω⋅5, …}. Apply the axiom of union to this set, and what do you get? Well, it has to contain ω⋅n for every finite n, since each ω⋅n is contained in ω⋅m for every larger m. So it’s larger than all ordinals of the form ω⋅n for finite n. This new ordinal we’ve constructed is called ω⋅ω, or ω^2.

But of course, our journey doesn’t stop there. We can use replacement to generate ω^2 + ω, and ω^2 + ω⋅2, ω^2 + ω⋅3, and so on. But we can go further and use replacement to generate ω^2 + ω^2, or ω^2⋅2! And from there we get ω^2⋅3, ω^2⋅4, and so on. And applying replacement to the function from n to ω^2⋅n, we get a new ordinal larger than everything before, called ω^3.

As you might be suspecting, this goes on forever. We can continually apply the axiom of replacement to get mind-bogglingly large infinities, each greater than the last. But here’s the kicker: all of these infinite ordinals that we’ve created so far? They all have the same cardinality.

Yes, that’s right. You may have thought that we had transcended ω in cardinality long ago, but no. For each infinite ordinal we’ve created so far, there’s a one-to-one mapping between it and ω. Think about it: every time we used replacement, we constructed a function that mapped an existing ordinal to a new one of the same cardinality. And in applying replacement to this function, we guaranteed that the new ordinal we created cannot be a larger cardinality than the existing ordinal! So each time we use replacement with a countable domain, we get a new countable set, each of whose elements is countable. And then if we use pairing or union, we’re always going to stay in the domain of the countable (unioning two countable sets just gets you another countable set, as does unioning a countable infinity of countable sets).
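
The workhorse fact in that last step, that a countable union of countable sets is countable, comes down to a bijection between ℕ×ℕ and ℕ. The standard Cantor pairing function does the job (the code is my own sketch):

    def cantor_pair(i, j):
        # a bijection ℕ×ℕ → ℕ: walk the grid diagonal by diagonal
        return (i + j) * (i + j + 1) // 2 + j

    # distinct pairs always get distinct codes, so countably many countable
    # sets can be merged into one countable enumeration
    codes = {cantor_pair(i, j) for i in range(100) for j in range(100)}
    assert len(codes) == 100 * 100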

So now turn your mind to the following set: the set of all countable ordinals. Is this set itself an ordinal? It is if it’s the smallest uncountable ordinal, as then it contains every ordinal smaller than itself by definition! And what’s the cardinality of this set? It can’t be countable, because then it would have to be an element of itself! (Bonus question: Why is it that no ordinal can be an element of itself? Use the axiom of extensionality! Bonus question 2: In general, no set can be an element of itself. But this is not guaranteed by the axioms I’ve presented so far. The axiom that does the job is called the axiom of regularity/foundation, and it says that every non-empty set must have an element that it is disjoint from. Why does this prevent us from having sets that contain themselves?)

So this ordinal is the first uncountable ordinal. Its name is ω₁. The cardinality of ω₁ is the smallest cardinality that follows the cardinality of ω, and it’s called “aleph one” and written ℵ₁.

Now, I’ve already said that the same old paradigm we’ve used so far can’t get us to ω₁. So can we get there with ZFC? It turns out that the answer is yes, and at the moment I’m not quite sure exactly how. It turns out that not only can ZFC construct ω₁, it can also construct ω₂ (the first ordinal with a larger cardinality than ω₁), ω₃, ω₄, and so on forever.

So does ZFC have any limits? Or can we in principle construct every ordinal, using ever more ingenious means to transcend the limitations of our previous paradigms? The answer is no: ZFC is limited. There are ordinals so large that even ZFC cannot prove their existence (these ordinals have what’s called inaccessible cardinalities). To construct these new infinities, we must add them in as axioms, just as we had to for infinity (and indeed, for 0).

One might think that when we get to ordinals that are this mind-bogglingly large, nothing of any consequence could follow from asserting their existence. But this is not the case! Remarkably, if you add these new infinities to your theory, you can prove the consistency of ZFC. (That is, the consistency of the theory I’ve presented so far, which does not have these large cardinal axioms.) And to prove the consistency of this new theory, you must add even larger infinities. And now to prove the consistency of this one, you must expand the universe of sets again! And so on forever.

One might ask: “So how big is the universe of sets really?” At what point do we content ourselves to stop axiomatically asserting new and larger infinities, and say that we’ve obtained an adequate formalization of what the universe of sets looks like? I’m really not sure how to think about this question. Anyway, next up will be ordinal notation, and the notion of computable ordinals!

Meaning ain’t in the brain

I don’t know if there’s a name for the position that the meanings of our terms are pinned down by facts about the brain. The closest I know is semantic internalism, but a semantic internalist could think that meaning is pinned down by facts about qualia, which happen to not be facts about the brain. So I’ll make up a name for this position: call it physicalist semantic internalism.

Now, here’s an argument against physicalist semantic internalism that seems totally right to me.

What I mean by “second-order logical concepts” is the concepts of “and”, “or”, “not”, second-order quantifiers (“for all” and “for some”, ranging over not just objects but properties of objects), and the notions of functions, relations, and concepts.

  1. The semantics of second order logic captures what I mean when I use second-order logical concepts.
  2. No finite set of rules (and correspondingly no finite machine) can pin down the semantics of second order logic.
  3. So no finite machine pins down what I mean when I use second-order logical concepts.
  4. My brain is a finite machine.
  5. So my brain does not pin down what I mean when I use second-order logical concepts.

And here’s another argument along similar lines:

  1. The truth values of sentences about integers are determined by what we mean by integers.
  2. The statement of the satisfiability of each Diophantine equation has a determinate truth value.
  3. The statement of the satisfiability of each Diophantine equation is a statement about integers.
  4. So the satisfiability of each Diophantine equation is fixed by what we mean by integers.
  5. No finite machine can fix the satisfiability of each Diophantine equation.
  6. Our brain is a finite machine.
  7. So the meaning of integers is not contained in the brain.