Fundamentals of Logic: Syntax, Semantics, and Proof

A while back, I wrote a several-part introduction to mathematical logic. I wrote this when I was first wading into the subject and noticed that a lot of the stuff I was learning was not easy to find online in a concise beginner-level essay. Since then, I’ve learned an enormous amount more and become a lot more comfortable with the basics of the field. Looking back on my introduction series, I’m pretty happy with it. There’s nothing in there that’s incorrect. But I do notice that when I read it over, I wish that I had framed certain things slightly differently. In particular, one of the distinctions that I think is absolutely central to mathematical logic – the distinction between syntax, semantics, and proof – is muddled a bit in the write-up, and proof is bundled in with syntax in a way that I’m not totally satisfied with.

Given that, and given that this series is for whatever reason now one of the most popular reads on my blog, I want to provide a brief update post. I sort of did this a week or so ago, with this post, which linked to the slides for a presentation I gave for a discussion group. But I want to do it for REAL now, as an actual post of its own. And honestly, I also just love this subject and am happy for any excuse to write about it more.

So, let me begin by laying out what seems to me to be the primary trichotomy that you’ll see pop up over and over again in logic: syntax, semantics, and proof. One at a time, then.

Syntax

The syntax of a logic is a detailing of the allowed symbols of a language and the conditions for their grammatical use. It’s a set of characters, along with a function that defines the grammar of the language by taking in (finite, in most logics) strings of these characters and returning true or false. This function is usually inductively defined. So for instance in propositional logic we have some set of characters designated to be our propositional variables (we’ll denote them p, q, r, …), and another set of characters designated to be our logical symbols (∧, ∨, ¬, →, and parentheses). We inductively define our grammar function G as follows: Every propositional variable is grammatical. If X and Y are grammatical strings, then so are (¬X), (X ∧ Y), (X ∨ Y), and (X → Y). And no other string is grammatical. This definition is what allows us to say that (p ∧ (¬(q ∨ r))) is grammatical but (p ∧ (¬(q ∨ ¬))) is not. The grammar for propositional logic is especially nice and simple, and first-order logic is only mildly more complicated. The important thing is that the syntax for these logics as well as higher order logics is algorithmically checkable; it’s possible to write a simple program that verifies whether any given input string counts as grammatical or not.
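To make this concrete, here's a minimal sketch of such a grammar-checking program in Python. The conventions are mine, not forced by the logic: propositional variables are single lowercase letters, and formulas are written exactly in the fully parenthesized style above.

```python
def is_grammatical(s):
    """Decide whether s is a grammatical string of propositional logic.
    Assumes propositional variables are single lowercase letters."""
    s = s.replace(' ', '')
    if len(s) == 1:
        return s.isalpha() and s.islower()   # base case: a propositional variable
    if len(s) < 4 or s[0] != '(' or s[-1] != ')':
        return False                          # every compound formula is parenthesized
    inner = s[1:-1]
    if inner.startswith('¬'):
        return is_grammatical(inner[1:])      # (¬X) is grammatical iff X is
    depth = 0
    for i, c in enumerate(inner):
        if c == '(':
            depth += 1
        elif c == ')':
            depth -= 1
        elif depth == 0 and c in '∧∨→':
            # unique readability: a grammatical string has exactly one
            # binary connective at parenthesis-depth zero
            return is_grammatical(inner[:i]) and is_grammatical(inner[i + 1:])
    return False

print(is_grammatical('(p ∧ (¬(q ∨ r)))'))  # True
print(is_grammatical('(p ∧ (¬(q ∨ ¬)))'))  # False
```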

Semantics

The semantics of a logic is a system for assigning some sort of meaning to the grammatical strings of the language. There are different types of meanings that might be assigned, but the primary one for classical logic is truth and falsity. For propositional logic, our semantics is a set of truth functions, which are functions that take in grammatical strings and return either true or false (you’ve encountered these before if you’ve seen truth tables; each row in a truth table corresponds to a particular truth function). Not just any such function will do; these functions will have to satisfy certain constraints, such as that whatever a truth function assigns to a string X, it must assign the opposite value to the string (¬X), and that (A ∧ B) is assigned true if and only if both A and B are assigned true. These constraints on our functions are really what endow our strings with meaning; they induce a sort of structure on the truth values of strings that lines up with our intended interpretation of them.

For propositional logic, we give an inductive definition for our collection of valid truth functions. We construct a truth function by first doing any assignment of truth values to the propositional variables (p, q, r, and so on), and then defining what the function does to the rest of the strings in terms of these truth values. So for any strings X and Y, the truth function assigns true to (¬X) if and only if it assigns false to X. It assigns true to (X ∧ Y) if and only if it assigns true to X and true to Y. And so on. By our construction of the grammar of propositional logic, we’ve guaranteed that our function is defined on all grammatical strings. Of course, there are many such truth functions (one for every way of assigning truth values to the propositional variables) – and this collection of truth functions is what defines our semantics for propositional logic.
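Here's a sketch of this inductive definition in code. The nested-tuple representation of formulas is my own choice; the clauses for each connective are exactly the constraints described above.

```python
def evaluate(formula, assignment):
    """Extend an assignment of True/False to the propositional variables
    into a full truth function, by recursion on the formula's structure.
    Formulas are nested tuples: (p ∧ (¬q)) is
    ('and', ('var', 'p'), ('not', ('var', 'q')))."""
    op = formula[0]
    if op == 'var':
        return assignment[formula[1]]   # base case: look up the variable
    if op == 'not':
        return not evaluate(formula[1], assignment)
    if op == 'and':
        return evaluate(formula[1], assignment) and evaluate(formula[2], assignment)
    if op == 'or':
        return evaluate(formula[1], assignment) or evaluate(formula[2], assignment)
    if op == 'imp':   # (X → Y) is false only when X is true and Y is false
        return not evaluate(formula[1], assignment) or evaluate(formula[2], assignment)
    raise ValueError(f'unknown connective: {op}')

p_and_not_q = ('and', ('var', 'p'), ('not', ('var', 'q')))
print(evaluate(p_and_not_q, {'p': True, 'q': False}))  # True
print(evaluate(p_and_not_q, {'p': True, 'q': True}))   # False
```

Each choice of truth values for the variables yields one truth function, just as each row of a truth table does.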

First order logic is again more complicated than propositional logic in its semantics, but not hugely so. When talking about the semantics of first-order logic, we call its truth functions models (or structures), and each model comes with a universe of objects that the quantifiers of the logic range over. An important difference between the semantics of propositional logic and the semantics of first order logic is that in the former, we can construct an algorithm that directly searches through the possible truth functions. In the latter, getting a similar sort of direct access to the models is much more difficult. Consider, for instance, that some models will have universes with infinitely many elements, meaning that to verify the truth of a universally quantified statement requires verifying it for infinitely many elements.

Last thing I need to say about semantics before moving on to proof: there is the all-important concept of semantic entailment. A set of sentences (call it A) is said to semantically entail another sentence (call it X), if every truth function that assigns everything in A true also assigns X true. When this is the case, we write: A ⊨ X.

Examples
{(p ∧ q)} ⊨ p
{(p ∨ q), (¬q)} ⊨ p
{p, (p→q), (q→r)} ⊨ r
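Since propositional logic has only finitely many relevant truth functions once the variables are fixed, semantic entailment can be checked by brute force. Here's a sketch reusing the evaluate function from the sketch above (the list of variables is supplied by hand):

```python
from itertools import product

def entails(premises, conclusion, variables):
    """A ⊨ X: every assignment that makes all premises true
    also makes the conclusion true."""
    for values in product([True, False], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(evaluate(f, assignment) for f in premises):
            if not evaluate(conclusion, assignment):
                return False
    return True

p, q, r = ('var', 'p'), ('var', 'q'), ('var', 'r')
print(entails([('and', p, q)], p, ['p', 'q']))                         # True
print(entails([('or', p, q), ('not', q)], p, ['p', 'q']))              # True
print(entails([p, ('imp', p, q), ('imp', q, r)], r, ['p', 'q', 'r']))  # True
```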

Proof

Notice that we defined our semantics using complicated words like “functions” and “inductive definition”. More troubling, perhaps, we seem to have engaged in a little bit of circularity when we said in our definition of the semantics that (A ∧ B) is assigned true only when A and B are both assigned true. Suppose that some pesky philosopher were to come up to us and ask us what we meant by “only when A and B are both assigned true”. What would we have to say? We couldn’t refer to our formal definition of the semantics of ∧ to ground our meaning, as this definition uses the concept of “and” explicitly! The circularity is even more embarrassingly vivid when in first order logic we’re asked to define the semantics of “∀”, and we say “∀x φ(x) is true whenever φ(x) is true of every object in our model”. In other words, there’s a certain sense in which our semantics is a little hand-wavy and non-rigorous. It relies on the reader already having an understanding of mathematical concepts that are usually much more advanced than the ones being defined, and indeed implicitly relies on the reader’s understanding of the ones being defined in the first place!

With this criticism of semantics fresh in our heads, we might want to ask for a purely mechanical system that would take in some starting strings and churn out new strings that follow from them according to the semantics. Such a system wouldn’t rely on the user’s understanding of any higher-order concepts; instead a mere understanding of the concept of string manipulations would suffice. That is, we want to rigorously define such a procedure so that the results of the procedure mirror the structure that we know the semantics has, but without relying on any high-level concepts or circularity. We can instead just point to the mechanical process by which strings of these symbols are manipulated and be content with that. A proof system is just such a mechanical system.

There are multiple styles of proof systems, varying primarily with respect to how many inference rules and axioms they have. On the side of many axioms and few inference rules we have Hilbert-style systems. And on the side of many inference rules and few (sometimes zero) axioms we have Gentzen-style natural deduction systems. But hold on, what are these “axioms” and “inference rules” I’m suddenly bringing up? Axioms are simply strings that can be used in a proof at any time whatsoever. Inference rules are functions that take in some set of strings and produce new ones. A proof is just an ordered list of strings, where each is either an axiom or a result of applying some inference rule to earlier strings.

For propositional logic, one common proof system involves just three axioms and one inference rule. The three: X → (Y → X), (X → (Y → Z)) → ((X → Y) → (X → Z)), and ((¬Y) → (¬X)) → (X → Y). The inference rule is the famous modus ponens, which takes the form of a function that takes as input X and X → Y, and produces as output Y.

(The discerning amongst you might notice that the “three axioms” are actually three infinite axiom schemas; for each you can populate X, Y, and Z with any three grammatical strings you like and you get a valid instance of the axioms. The even more discerning might notice that these axioms only involve the → and ¬ symbols, but we seem to be missing ∧ and ∨. That’s correct; we neglect them for economy of our proof system and are allowed to do so because all sentences with ∧ and ∨ can be translated into sentences using just → and ¬. To expand our proof system to deal with ∧ and ∨, we simply add a few more axioms to incorporate these translations.)
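To make the "mere string manipulation" point vivid, here's a sketch of a proof checker for this system, using the same tuple encoding of formulas as the earlier sketches. The schema-matching functions are my own phrasing of the three axiom schemas.

```python
def is_ax1(f):
    # X → (Y → X)
    return f[0] == 'imp' and f[2][0] == 'imp' and f[2][2] == f[1]

def is_ax2(f):
    # (X → (Y → Z)) → ((X → Y) → (X → Z))
    if f[0] != 'imp':
        return False
    a, b = f[1], f[2]
    if a[0] != 'imp' or a[2][0] != 'imp' or b[0] != 'imp':
        return False
    x, y, z = a[1], a[2][1], a[2][2]
    return b[1] == ('imp', x, y) and b[2] == ('imp', x, z)

def is_ax3(f):
    # ((¬Y) → (¬X)) → (X → Y)
    if f[0] != 'imp':
        return False
    a, b = f[1], f[2]
    return (a[0] == 'imp' and b[0] == 'imp' and a[1][0] == 'not'
            and a[2][0] == 'not' and a[1][1] == b[2] and a[2][1] == b[1])

def check_proof(lines, assumptions=()):
    """A proof is a list of formulas, each an axiom instance, an assumption,
    or the result of modus ponens applied to two earlier lines."""
    for i, f in enumerate(lines):
        if f in assumptions or is_ax1(f) or is_ax2(f) or is_ax3(f):
            continue
        if not any(g[0] == 'imp' and g[2] == f and g[1] in lines[:i]
                   for g in lines[:i]):
            return False
    return True

# The classic five-line proof of p → p:
def imp(x, y): return ('imp', x, y)
p = ('var', 'p')
pp = imp(p, p)
proof = [
    imp(p, imp(pp, p)),                            # axiom 1
    imp(imp(p, imp(pp, p)), imp(imp(p, pp), pp)),  # axiom 2
    imp(imp(p, pp), pp),                           # modus ponens, lines 1 and 2
    imp(p, pp),                                    # axiom 1
    pp,                                            # modus ponens, lines 4 and 3
]
print(check_proof(proof))  # True
```

Notice that the checker needs no understanding of truth at all; it just pattern-matches strings, which is exactly the point. The assumptions parameter corresponds to the syntactic entailment relation ⊢ defined below.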

Soundness and Completeness

I’ve said that the purpose of a proof system is to mimic the semantics of a logic. I will now say more precisely what this mimicry amounts to.

There is a close cousin to the semantic-entailment relation (⊨). This is known as the syntactic entailment relation, and is written ⊢. We say that a set of sentences A syntactically entails a sentence X if there exists some proof that uses the sentences in A as assumptions and concludes with the sentence X. So A ⊢ X can be thought of as “X can be proven, provided that we assume A.”

To say that a proof system for a logic mimics the semantics of the logic is to say that the syntactic entailments and semantic entailments line up perfectly. This has two components:

Soundness: If A ⊢ X, then A ⊨ X.
Completeness: If A ⊨ X, then A ⊢ X.

There’s one other feature of a proof system that’s totally essential if it’s to do the job we want it to do; namely, that it should be actually computable. We should be able to write an algorithm that verifies whether any given sequence of strings counts as a valid proof according to the proof system. This is known as effective enumerability.

To have a sound, complete, and effectively enumerable proof system for a logic is to know that our semantics is on solid ground; that we’re not just waving our hands around and saying things that cannot be cashed out in any concrete way, but that in fact everything we are saying can be grounded in some algorithm. There’s a general philosophy in play here, something like “if I can compute it then it’s real.” We can rest a lot more comfortably if we know that our logic is computable in this sense; that our deductive reasoning can be reduced to a set of clear and precise rules that fully capture its structure.

Perhaps predictably, our next question is whether we really do have a sound, complete, and effectively enumerable proof system for a logic. It turns out that the answer depends on which logic is being discussed. This is already perhaps a little disconcerting; we might have hoped that every logic of any use can be nailed down concretely with a proof system. But alas, it’s not the case.

With that said, there do exist sound, complete, and effectively enumerable proof systems for propositional logic. Maybe that’s not so exciting, since propositional logic is pretty limited in terms of what sort of math you can do with it. But it turns out that sound, complete, and effectively enumerable proof systems ALSO exist for first-order logic! And first-order logic is where we form theories like Peano arithmetic (for natural number arithmetic) and ZFC (for set theory).

In other words, there’s a logic within which one can talk about things like numbers and sets, and the semantics of this logic can be made precise with a proof system. That’s great news! Math is on sound grounds, right? Well…

You might be skeptical right now, the phrase “incompleteness theorems” bubbling into your mind, and you’re right to be. While we can form languages in first order logic whose sentences look to be describing natural numbers (in PA) or sets (in ZFC), it turns out that none of these languages have the expressive power to actually pin down these structures uniquely. Said more carefully, there is no first-order collection of sentences which has as a unique model the natural numbers. There’s a theorem called the Löwenheim-Skolem Theorem, which is one of the most shocking results in the field of mathematical logic. It says that if a first-order theory has any infinite model, then it has infinite models of every infinite cardinality. So if we want to describe the natural numbers, we have to also be simultaneously describing structures of every infinite cardinality. At this point, this is getting into well-covered territory for this blog, so I’ll just direct you to these posts for more detail about how this works: here, here, and here.

So, in first order logic you have a proof system that perfectly mirrors your semantics but no theory whose semantics line up with the natural numbers. There are many ways we can strengthen our semantics to give us the strength to talk about the natural numbers categorically, the most popular of which is second-order logic. In second order logic, we are allowed to quantify not only over objects in our universe, but over sets of objects as well. This turns out to be a really big deal. In second-order logic, we can actually write down a set of sentences that uniquely pick out the natural numbers as their only model. But guess what? Second-order logic has no sound and complete proof system!

This is a pattern that holds very generally. Either we have a logic with the expressive power to talk categorically about the natural numbers, in which case we have no sound and complete proof system, OR we have a logic with a sound and complete proof system, in which case we lack the ability to talk categorically about the natural numbers. No matter how we go, we end up at a loss as to how to ground the semantics of ℕ in an algorithmic procedure. And as a result, we also end up unable to have any algorithmic procedure that enumerates all the truths about natural numbers. This is what has become known as the incompleteness phenomenon.

I’ll stop there for now! Here’s a brief summary of the main concepts I’ve discussed. A logic is defined by its syntax, semantics, and proof system. The syntax details the allowed symbols of the language and which combinations are grammatical. The semantics details the possible truth assignments to these sentences, consistent with our intended interpretation of the symbols. And the proof system provides a mechanical procedure for deriving new strings from old strings. A set of strings X semantically entails a string Y if every truth assignment that satisfies all the strings in X also satisfies Y. A set of strings X syntactically entails Y if there’s some proof of Y from the assumptions of X. A proof system is said to be sound and complete if its syntactic entailment relation perfectly mirrors the semantic entailment relation. And it’s said to be effectively enumerable if there’s an algorithm for determining whether a given sequence of strings counts as a proof or not. First-order logic has a sound, complete, and effectively enumerable proof system. But it lacks the expressive power to talk categorically about the natural numbers. No set of sentences in a first-order language has the natural numbers as a unique model. Second-order logic has the expressive power to talk categorically about the natural numbers, but it has no sound and complete proof system. In general, no logic has both (1) a sound, complete, and effectively enumerable proof system, and (2) theories that are categorical for the natural numbers.

The Pictorial Mathematics of Klak-Adbmal

In the distant land of Klak-Adbmal, cut off from the rest of civilization, mathematicians have developed their own distinctive form of math. It is entirely pictorial; every mathematical concept is translated as a diagram consisting of criss-crossing horizontal and vertical lines, and calculations are manipulations of these diagrams.

Translators have so far been able to figure out the symbols for a few basic mathematical concepts, but there are many remaining gaps in our ability to translate Klak-Adbmalist mathematics. This image shows the symbols we’ve figured out so far, as well as a step-by-step calculation of the successor of 1 equaling 2.

The Klak-Adbmalians are very secretive and slow to give up their secrets. Can you figure out the answers to the two questions above on the right?

It’s worth noting that just as there are multiple ways that ordinary mathematicians can write a given number (2 ⋅ 5 = 8 + 2 = 10), there are many different pictures for any given concept. If you need more assistance, you can take a look at this recently discovered drawing of a Klak-Adbmalian calculation proving that True Or False is True.

And one final hint: Recently we’ve received verification that the following two calculations are always valid, no matter what Klak-Adbmalist pictures replace the red and green boxes:

The Rise of Anti-Set Theory

This report details the discovery and proliferation of a radical new form of set theory that has come to be known in math departments across the world as Anti-Set Theory. The earliest appearance of Anti-Set Theory is shrouded in mystery, but my best efforts have traced it to the work of two little-known logicians named Narl Cowman and Bishi Ranger. The germinal notion that would eventually grow into Anti-Set Theory was the idea of a theory of sets obtained by taking all of our familiar intuitions about sets and requiring their exact opposites to hold. Any property that would hold of all sets would of necessity not hold of any of the objects in this theory. An attractive idea by any standard, and one self-evidently worthy of pursuit. And thus Anti-Set Theory was born.

The development of Anti-Set Theory progressed in three phases: the Era of Simple Negation (which was plagued with unattractively permissive axioms and a proliferation of models), the Age of Negation after the Universal Quantifier (where the previous problems were resolved, but new problems of consistency and paradox arose), and finally the Modern Age of Axiomatic Anti-Set Theory (where there arose the split between “revolutionary” and “traditionalist” Anti-Settists, as they came to be known). Each of these phases will be explained in more depth in what follows.

The Era of Simple Negation

The first formulations of Anti-Set Theory suffered from practical problems that were obvious in retrospect but took embarrassingly long to remedy. The early anti-settists naively took their project to be to simply take the axioms of existing set theories and negate each of them, then to collect them into a new axiomatic theory and explore the consequences. While the idea to build off of existing set theories was a worthy one, there was a serious problem with this approach. Namely, the axioms of existing set theories were designed to be quite strong and restrictive, which made their negations overly weak and permissive. Many axioms had the form “All sets have this property”, the negation of which becomes “At least one set doesn’t have this property”, not “NO sets have this property.” The latter was the ambition of the original anti-settists, but it would take them until the Age of Negation After the Universal Quantifier about a half hour later to realize the crucial error they were making.

For instance, in the axioms of ZF (the orthodox axiomatization of set theory until Anti-Set Theory took the world by storm), there is an axiom known as “the Axiom of Pairing.” This axiom says, in plain English, that for any two sets x and y, there exists a set containing just x and y and nothing else (which we typically write as {x, y}). In the language of first-order logic, it says “∀x∀y∃z∀w (w∈z ↔ (w=x ∨ w=y))” (or in a more readable short-hand, “∀x∀y∃z (z = {x, y})”). Anti-Settists saw clearly the obvious benefit of having a set theory with the opposite property, namely that for any two sets x and y, the pair set {x, y} must NOT exist. But their original formulation of the Axiom of Anti-Pairing used simple negation: “¬∀x∀y∃z (z = {x, y})”, which is equivalent to “∃x∃y∀z (z ≠ {x, y})”, or “there are two sets x and y such that no set contains just x and y.” This was obviously too weak for what was desired.

The same problem cropped up for anti-union, anti-comprehension, anti-foundation, and virtually every other axiom of ZF. A new idea was needed. And soon enough, it came: we should place our negations AFTER the initial block of universal quantifiers, not before. The brilliance of this breakthrough is hard to overestimate. Fields Medals were awarded and backs were patted. This brings us to the second phase of the history of Anti-Set Theory.

The Age of Negation After the Universal Quantifier

Now that the Anti-Settists knew how they were going to proceed, they got busy axiomatizing their new theory. Every set has a power set? No more, now no set has a power set. The union of each set exists? Nope! Unions are banned. Infinite sets? Absolutely not.

The first few axioms to be proposed were anti-pairing, anti-union, anti-powerset, anti-foundation, and anti-infinity, and they took the following form:

Anti-Pairing: ∀x∀y¬∃z∀w (w ∈ z ↔ (w = x ∨ w = y))
Anti-Union: ∀x¬∃y∀z (z ∈ y ↔ ∃w (w ∈ x ∧ z ∈ w))
Anti-Powerset: ∀x¬∃y∀z (z ∈ y ↔ z ⊆ x)
Anti-Foundation: ∀x∀y (y ∈ x → ∃z (z ∈ x ∧ z ∈ y))
Anti-Infinity: ¬∃x∀y (y ∈ x → ∀z (z = y⋃{y} → z ∈ x))

But soon they began running into trouble. The first hint that something was amiss appeared with the Axiom of Anti-Comprehension. The Axiom of Comprehension says that for any property Φ definable as a sentence of first-order ZF, and for any set X, there exists the subset of X consisting of those elements that satisfy Φ. Anti-Comprehension could thus be written:

Anti-Comprehension: ∀x∀y (y ≠ {z∈x | φ(z)})

Perhaps you can see the problem. What if φ is a tautology? Or in other words, what if φ(z) is a property satisfied by ANY and EVERY set, like, say, z=z? Then our axiom tells us that for any set x, the subset consisting of those elements that are equal to themselves cannot exist. But that’s just x itself! In other words, for any set x, x does not exist. The first Anti-Settists had unwittingly destroyed the entire universe of sets!

Another problem arose with Anti-Replacement. Recall that Replacement says that for any function F(x) definable in a sentence of first-order ZF, and for any set X, the image of X under F exists. So Anti-Replacement states that this image does not exist. But what if F is the identity function? Then Anti-Replacement tells us that the image of X under the identity map (namely, X itself) does not exist. And again, we’ve proved that no sets exist.

Quite simply, the problem the Anti-Settists were running into was that while their original axioms were too weak, their new axioms were too strong! So strong that they DESTROYED THE UNIVERSE. That’s strong.

The solution? Anti-Comprehension and Anti-Replacement were discarded. But there was another problem in the Axiom of Anti-Extensionality. Extensionality says that any two sets are the same if they share all the same elements. So Anti-Extensionality says that if two sets share all the same elements, then they are not the same. But if we compare any set X with itself using this standard, we find that X cannot equal X! This was even worse than before, because it violates the first-order tautology ∀x (x = x). Not only did this second wave of Anti-Settists destroy the universe, they also broke the rules of logic!

So Anti-Extensionality had to go. That much everybody agreed on. And the removal of this axiom settled the last of the paradoxes of Anti-Set Theory. But no sooner had the dust settled than a new controversy arose as to the nature of Extensionality…

The Great Schism: The Modern Age of Anti-Set Theory

So far, the Anti-Settists had compiled the following list of axioms:

Anti-Pairing: ∀x∀y¬∃z∀w (w ∈ z ↔ (w=x ∨ w=y))
Anti-Union: ∀x¬∃y∀z (z∈y ↔ ∃w (w∈x ∧ z∈w))
Anti-Powerset: ∀x¬∃y∀z (z ∈ y ↔ z ⊆ x)
Anti-Foundation: ∀x∀y (y∈x → ∃z (z∈x ∧ z∈y))
Anti-Infinity: ¬∃x∀y (y ∈ x → ∀z (z = y⋃{y} → z ∈ x))

You might have noticed that the Axiom of Anti-Infinity looks a little strange. The Axiom of Infinity tells us that there’s a set X that contains all empty sets, and such that for any set Y contained in X, if Y has a successor then that successor is also in X. So the Axiom of Anti-Infinity will say that there is no such set. We’ll prove shortly that in fact, the other axioms entail that there are no empty sets, so it’s vacuously true of every set that it “contains all empty sets.” Thus the only restriction placed on our universe by the Axiom of Anti-Infinity is that there’s no set that contains the successors of all its members with successors.

For a while, everybody agreed on this list of axioms and all was well. But after a couple of hours, new rumbles arose about Extensionality. A new contingent of Anti-Settists began arguing in favor of including the Axiom of Extensionality. It’s hard to describe exactly what was going on in the minds of these individuals, who appeared to be turning their backs on everything that Anti-Set Theory was all about by accepting an ORDINARY axiom alongside all of their beautiful Anti-Axioms. Some of them expressed a concern that they had gone too far by turning their back on extensionality, and worried that Anti-Set Theory was so far removed from our intuitions that it no longer deserved to be called a theory of sets. Others pointed out that Anti-Set Theory as it was currently formulated ruled out certain models that they felt deserved to belong to the pantheon of Anti-Set Universes. The rest of the Anti-Settists called them crazy, but they persisted. The fighting reached fever pitch one afternoon in a meeting of the leading Anti-Settists, where one individual who shan’t be named accused another of “selling out” to Traditional Set Theory, and a fist-fight nearly broke out. This led to the Great Schism.

Anti-Settists fractured into two contingents that became known as the traditionalists, who advocated an extensional Anti-Set Theory, and the revolutionaries, who wanted an intensional Anti-Set Theory where two sets could share all the same elements but still be different. In recent history the fighting has cooled off, probably because both sides noticed that there actually didn’t seem to be all that huge of a difference between extensional and intensional anti-set theory. The primary realization was that the other axioms banned any objects based on the elements they contained (so that anti-pairing, for instance, really says that there can be no set with the same elements as the pair of x and y, not just that the pair of x and y doesn’t exist).

Extensional Anti-Set Theory
Extensionality: ∀x∀y (∀z (z ∈ x ↔ z ∈ y) → x = y)
Anti-Pairing: ∀x∀y¬∃z∀w (w ∈ z ↔ (w = x ∨ w = y))
Anti-Union: ∀x¬∃y∀z (z ∈ y ↔ ∃w (w ∈ x ∧ z ∈ w))
Anti-Powerset: ∀x¬∃y∀z (z ∈ y ↔ z ⊆ x)
Anti-Foundation: ∀x∀y (y ∈ x → ∃z (z ∈ x ∧ z ∈ y))

Intensional Anti-Set Theory
Anti-Pairing: ∀x∀y¬∃z∀w (w ∈ z ↔ (w = x ∨ w = y))
Anti-Union: ∀x¬∃y∀z (z ∈ y ↔ ∃w (w ∈ x ∧ z ∈ w))
Anti-Powerset: ∀x¬∃y∀z (z ∈ y ↔ z ⊆ x)
Anti-Foundation: ∀x∀y (y ∈ x → ∃z (z ∈ x ∧ z ∈ y))

That covers the history of Anti-Set Theory up to modern times. Let’s now take a look at some of the peculiar details of the theory.

Theorem 1: No anti-sets contain exactly one element.

Proof: Suppose that there was an anti-set X such that X contained Y and nothing else. Then the sentence “∀w (w ∈ X ↔ (w = Y ∨ w = Y))” would be true. But this would be a violation of anti-pairing, as there can be no set whose elements are the same as the pair {Y, Y}. So no such anti-set can exist.

Theorem 2: No anti-sets contain exactly two elements.

Proof: Suppose there was an anti-set X such that X contained Y, Z, and nothing else. Then the sentence “∀w (w ∈ X ↔ (w = Y ∨ w = Z))” would be true. But this would be a violation of anti-pairing. So no such anti-set can exist.

Theorem 3: No anti-sets contain an empty anti-set.

Proof: Suppose that some set X contained a set Y, where Y is empty. By Anti-Foundation, every element of X must share an element with X. So Y must share some element with X. But Y contains nothing, so it can’t share any element with X. Contradiction. So no such anti-set exists.

Theorem 4: No anti-set can be its own union.

Proof: Follows trivially from the Axiom of Anti-Union: ⋃X doesn’t exist for any X, so if X = ⋃X, then X doesn’t exist. (This rules out anti-sets with the same structure as limit ordinals.)

Theorem 5: No empty anti-sets exist.

Proof: Suppose there exists an empty anti-set X. Then X has exactly the same elements as its own union ⋃X (namely, none at all). But the Axiom of Anti-Union says that no anti-set can have the same elements as the union of an anti-set. Contradiction, so no empty anti-set exists.

These theorems tell us a lot about the structure of anti-sets. In particular, every anti-set contains at least three elements, and each of those elements in turn contains at least three elements, and so on forever. So we only have infinite descending membership-chains of anti-sets.

Also, we’ve ruled out anti-sets with zero, one, and two elements, so you might think that no three-element anti-sets can exist either. But the same argument we used for one- and two-element anti-sets doesn’t work any longer, since pairing never produces three-element sets. In fact, the first model of Anti-Set Theory discovered contained exactly five anti-sets, each of which had three elements. This model is an intensional model, as it’s crucial that two of the sets (C and X) contain the same elements. We’ll call this the Primordial Anti-Set Model.

Universe: P, A, B, C, X
P = {A, B, C}
A = {B, C, X}
B = {A, C, X}
C = {A, B, X}
X = {A, B, X}

Here are two attempts to visualize this model:

Beautiful! Let’s check each of the axioms to convince ourselves that it really is a valid model.

Anti-Union: ⋃P = A ⋃ B ⋃ C = {A, B, C, X}. Similarly, you can show that ⋃A, ⋃B, ⋃C, and ⋃X all contain A, B, C, and X. And {A, B, C, X} is not an anti-set in the universe, so no violations of Anti-Union occur.

Anti-Pairing: This one’s easy, since each of our sets contains three elements, and no three-element set can be formed by pairing.

Anti-Powerset: Note that the axiom of power set only says “there’s a set whose elements are exactly those subsets of X that exist, for each X”; it doesn’t bring into existence any new subsets. So for instance, in this model, the power set of P is just {P}, as P is the only subset of P that exists. Similarly, for each of the other elements, its power set would have only one or two elements. But no set with one or two elements exists, by anti-pairing!

Anti-Foundation: Does any set contain an element that it doesn’t have anything in common with? You can just verify this by inspection.

Anti-Infinity: (EDIT: this model and the following one actually violate Anti-Infinity! Realized this after originally posting. A future post will provide more details.)

Now, this was a model only of Intensional Anti-Set Theory, not Extensional Anti-Set Theory. Here’s a way to modify it to get a model that works in both theories:

Universe: T, A, B, C
T = {A, B, C}
A = {T, B, C}
B = {A, T, C}
C = {A, B, T}

Again, convince yourself that this universe satisfies the axioms.

Using a similar construction, we can show that there are models of anti-set theory with exactly N sets, for every N ≥ 4. Our universe: T, A_1, A_2, …, A_(N−1), where T = {A_n | n ∈ {1, 2, …, N−1}} and A_k = {T} ⋃ {A_j | j ∈ {1, 2, …, N−1}, j ≠ k}.
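These finite universes are small enough to check by machine. Here's a minimal sketch of a model checker in Python; the dictionary encoding and the axiom-by-axiom translations are my own, and Anti-Infinity is left out, matching the two axiom lists above.

```python
from itertools import combinations_with_replacement

def check_model(model):
    """model maps each anti-set's name to the set of names of its elements.
    Returns which axioms hold: (Extensionality, Anti-Pairing, Anti-Union,
    Anti-Powerset, Anti-Foundation)."""
    names = list(model)
    realized = {frozenset(v) for v in model.values()}  # element-sets that exist
    # Extensionality: no two distinct sets share all their elements
    extensionality = len(realized) == len(names)
    # Anti-Pairing: no set has exactly the elements {x, y}, for any x, y
    anti_pairing = all(frozenset({x, y}) not in realized
                       for x, y in combinations_with_replacement(names, 2))
    # Anti-Union: no set has exactly the elements of ⋃x, for any x
    anti_union = all(frozenset().union(*(model[w] for w in model[x])) not in realized
                     for x in names)
    # Anti-Powerset: no set has as elements exactly the existing subsets of x
    subsets = {x: frozenset(z for z in names if model[z] <= model[x]) for x in names}
    anti_powerset = all(subsets[x] not in realized for x in names)
    # Anti-Foundation: every element of x shares an element with x
    anti_foundation = all(model[x] & model[y] for x in names for y in model[x])
    return extensionality, anti_pairing, anti_union, anti_powerset, anti_foundation

# The N = 4 instance of the construction above (the T, A, B, C model):
model = {'T': {'A', 'B', 'C'}, 'A': {'T', 'B', 'C'},
         'B': {'T', 'A', 'C'}, 'C': {'T', 'A', 'B'}}
print(check_model(model))  # (True, True, True, True, True)
```

Running it on the five-set Primordial model instead returns False for Extensionality (C and X share their elements) and True for the four anti-axioms, which is exactly the intensional/extensional split described above.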

Many lines of research remain open in the field of Anti-Set Theory. Can Anti-Sets serve as a new foundation for mathematics? Are there models of Anti-Set Theory that contain structures isomorphic to the natural numbers? Is the theory still consistent if we include an Axiom of Anti-Choice? (Adding Anti-Choice rules out the models we’ve discussed so far. Perhaps only infinite universes are allowed with Anti-Choice.) Other proposed additions to the axioms include the simple negation of extensionality, and an “Axiom of Undefinability”, which says that no set exists whose elements satisfy a definable property. How would these additions affect the universe of sets?

I want to close by noting that set theory sometimes gets a bad rap among mathematicians and the wider community for being too abstract or not “useful enough.” We hope that the advent of Anti-Set Theory will make it plain to the public that there is a future world in which set theory can be of GREAT practical use to everybody. The possible applications of Anti-Set Theory are too numerous to count, so numerous in fact that I regard as unnecessary naming any specific examples.

The Transfinite Factorial

Chapter 1: When Circularity is not Paradoxical

Circular definitions can be quite bad. Take the definition “a is the natural number that is equal to a”, or worse, “a is the natural number that is NOT equal to a”. Or, for an even more perverse one, “a+1, where a is the smallest number that can be defined in fewer than 50 words.” But just as not all self-reference is paradoxical, similarly not all circularity is vicious. Circular definitions form a crucial component of every area of math, where they go under the more polite term “recursion.”

The key component of a benign circular definition is that it avoids infinite regress. So “a = a” is unhelpful, because to evaluate a you must first know what a evaluates to. But what about “aₙ = n ⋅ aₙ₋₁, and a₀ = 1”? Now to know a₅₀₀, you only need to know a₄₉₉, which in turn only requires you know a₄₉₈, which requires… and so on, each step moving a little closer to the base case “a₀ = 1”, at which point you can travel all the way back up the chain to discover that a₅₀₀ = 500!. (That’s 500 factorial, not an excitable 500.)
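In code, this non-vicious circularity is literally just a recursive function; here's the factorial as a minimal Python sketch:

```python
def fact(n):
    """a₀ = 1 is the base case; aₙ = n ⋅ aₙ₋₁ is the recursive step."""
    return 1 if n == 0 else n * fact(n - 1)

print(fact(5))  # 120
```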

Douglas Hofstadter has a parable about the “porpuquine”, a rare cousin of the porcupine that grows smaller porpuquines on its back rather than quills. The market value of such a beast is determined by the total number of porpuquine noses you can find on it. And if you’re worried about an infinite regress, don’t fear! There’s a smallest porpuquine, which is entirely bald.

Recursive definitions are extremely nice when we want to define a function whose outputs are related to each other in a more simple way than they are to the inputs. Some examples (in which S stands for the successor function, which takes 0 to 1 to 2 to …):

Addition
∀x (x + 0 = x)
∀x∀y (x + S(y) = S(x+y))

Multiplication
∀x (x ⋅ 0 = 0)
∀x∀y (x ⋅ S(y) = x⋅y + x)

Notice the similarity between these two definitions. In each case we first say what the function does when the second argument is 0, and then say what it does to a successor in terms of what it does to the number before. This covers all the natural numbers! Let’s show this in action by proving that 9 + 3 = 12.

9 + 3
9 + S(2)
S(9 + 2)
S(9 + S(1))
S(S(9 + 1))
S(S(9 + S(0)))
S(S(S(9 + 0)))
S(S(S(9)))
S(S(10))
S(11)
12

The step where we evaluate 9 + 0 directly is where we hit our base case, at which point we need merely to pass our answers up the levels to get back to the solution of our original problem.

Another: Let’s show that 3⋅2 = 6.

3⋅2
3⋅S(1)
3⋅1 + 3
3⋅S(0) + 3
(3⋅0 + 3) + 3
(0 + 3) + 3
(0 + S(2)) + 3
S(0 + 2) + 3
S(0 + S(1)) + 3
S(S(0 + 1)) + 3
S(S(0 + S(0))) + 3
S(S(S(0 + 0))) + 3
S(S(S(0))) + 3
S(S(1)) + 3
S(2) + 3
3 + 3
3 + S(2)
S(3 + 2)
S(3 + S(1))
S(S(3 + 1))
S(S(3 + S(0)))
S(S(S(3 + 0)))
S(S(S(3)))
S(S(4))
S(5)
6

Three of the lines above apply a “base case rule” – the first is the base case for multiplication, the second and third are for addition. I like that you can see the complexity growing in the length of the line until you hit a base case, at which point the lines start shortening until we next need to apply the recursive step.
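Here's a sketch of the two definitions as Python functions, with n + 1 standing in for the successor S(n) and n − 1 for "the number n is the successor of" (my encoding; it only makes sense for finite numbers, which is exactly the limitation the next chapter runs into):

```python
def add(x, y):
    # x + 0 = x; x + S(y) = S(x + y)
    return x if y == 0 else add(x, y - 1) + 1

def mul(x, y):
    # x ⋅ 0 = 0; x ⋅ S(y) = x⋅y + x
    return 0 if y == 0 else add(mul(x, y - 1), x)

print(add(9, 3))  # 12
print(mul(3, 2))  # 6
```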

Chapter 2: Infinite Recursion

Now, what happens if we want to add x + y but y is neither 0 nor a successor? “Huh? Isn’t every non-zero number a successor of some number?” I hear you ask. Not infinity! Or more precisely, not ω. Our recursive definitional scheme is all well and good for finite numbers, but it fails when we enter the realm of transfinite ordinals.

No problem! There’s an easy patch. The problem is that we don’t know how to add or multiply x and y when y is a limit ordinal. So we just add on an extra rule to cover this case!

Addition
α + 0 = α
α + S(β) = S(α + β)
α + λ = ⋃{α + β | β ∈ λ}

Multiplication
α ⋅ 0 = 0
α ⋅ S(β) = α⋅β + α
α ⋅ λ = ⋃{α ⋅ β | β ∈ λ}

If you’re scratching your head right now at the notion of “the union of a set of numbers” or “a number being an element of another number”, go read this for a thorough summary. In short, in ZFC we encode each number as the set of all smaller numbers (e.g. 3 = {0, 1, 2}), and ω (the first infinite number) is defined to be the set of all natural numbers (ω = {0, 1, 2, 3, …}). So 3 ⋃ 5 = {0,1,2} ⋃ {0,1,2,3,4} = {0,1,2,3,4} = 5. In general, the union of a set of numbers is just the maximum number among them (or if there is no maximum number, then it’s the supremum of all the numbers).
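Here's a quick sketch of this encoding using Python's frozensets (frozen so they're hashable and can therefore be elements of other sets):

```python
def ordinal(n):
    """The von Neumann encoding: n = {0, 1, …, n−1}, starting from 0 = ∅."""
    k = frozenset()
    for _ in range(n):
        k = k | frozenset([k])   # successor of k is k ⋃ {k}
    return k

three, five = ordinal(3), ordinal(5)
print(three | five == five)   # True: 3 ⋃ 5 = 5, the maximum
print(three in five)          # True: 3 ∈ 5, i.e. 3 < 5
```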

Outside the context of ZFC, we can equally write α + λ = lim (α + β) as β → λ. We can also write α + λ = sup {α + β | β < λ}. And for those more familiar with ZFC, we don’t need to take the union over all β ∈ λ, we can make do with any cofinal subset (we’ll do this to simplify calculations greatly later in the post).

Now we’ve extended + and × into the infinite, and what we get is a little unusual. The first major sign that things have changed is that addition and multiplication are no longer commutative.

1 + ω = ⋃{1 + n | n ∈ ω} = ⋃{1, 2, 3, 4, …} = ω
ω + 1 = ω + S(0) = S(ω + 0) = S(ω)

ω certainly doesn’t equal S(ω), so ω + 1 ≠ 1 + ω.

Distributivity also fails! For instance (ω + 3) ⋅ 2 = ω⋅2 + 3, not ω⋅2 + 6. But we don’t lose every nice algebraic feature. Associativity still holds for both addition and multiplication! I.e. α + (β + γ) = (α + β) + γ and α ⋅ (β ⋅ γ) = (α ⋅ β) ⋅ γ. This result is really important.

Earlier we showed that 1 + ω = ω. One can easily show that in general, n + ω = ω for any finite n, and more generally that α + β = β whenever α is “infinitely smaller” than β. This notion of “infinitely smaller” is a little subtle; ω is not infinitely smaller than ω⋅2, but it is infinitely smaller than ω^2. (It should be noted that everything up to here has been standard textbook stuff, but from here on I will be introducing original notions that I haven’t seen elsewhere. So treat what follows this with skepticism.)

A useful definition for us will be α << β (read as “α is infinitely less than β”) if α + β = β. We’ll also say that α and β are “comparable” if neither is infinitely smaller than the other: α ~ β if ¬(α << β) and ¬(β << α). Some conjectures I have about the comparability relation:

Conjecture 1: Commutativity holds between comparable ordinals.

Conjecture 2: α ~ β if and only if there’s some finite n such that α < β⋅n and β < α⋅n.

I’m actually fairly confident that Conjecture 1 is false, and would love to see a counterexample.

Here are some more useful identities for transfinite addition and multiplication:

Theorem 1: For finite n and limit ordinal λ, α⋅(λ + n) = α⋅λ + α⋅n

Proof sketch: α⋅(λ + n) = α⋅(λ + (n-1)) + α = α⋅(λ + (n-2)) + α + α = … = α⋅(λ + 0) + α + α + … + α = α⋅λ + α⋅n.

Theorem 2: For finite n and β << α, (α + β)⋅n = α⋅n + β.

Proof sketch: (α + β)⋅n = (α + β)⋅(n-1) + (α + β) = (α + β)⋅(n-2) + (α + β) + (α + β) = … = (α + β)⋅0 + (α + β) + … + (α + β) = (α + β) + (α + β) + … + (α + β) + (α + β) = α + (β + α) + (β + … + α) + β = α + α + α + … + α + β = α⋅n + β.

In case it’s not clear what’s going on with this proof, the idea is that we expand (α + β)⋅n out to (α + β) + … + (α + β) (n times). Then we shift parentheses to enclose all the middle βs, making them look like (β + α). But then by assumption, since β << α, the β vanishes in each of these middle terms, leaving just the final one: α + α + α + … + α + β = α⋅n + β.

Infinite exponentiation, tetration and so on can be defined in the exact same way as addition and multiplication: first cover the 0 case, then the successor case, and finally the limit case. And the limit case is defined by just taking the union of all smaller cases.

Chapter 3: Transfinite Factorials

Now, what’s the first function you think of when you think of recursively defined functions? That’s right, the factorial! I mentioned it at the beginning of this post, long ago: n! = n⋅(n-1)! and 0! = 1. So let’s extend this definition to obtain a transfinite factorial!

Factorial
0! = 1
S(α)! = α! ⋅ S(α)
λ! = ⋃{β! | β ∈ λ}

Now we can ask what ω! is.

Theorem: ω! = ω

Proof: ω! = ⋃{n! | n ∈ ω} = ⋃{1, 2, 6, 24, 120, …} = ω (since {1, 2, 6, 24, 120, …} is a cofinal subset of ω).

Next, (ω+1)! = ω! ⋅ (ω + 1) = ω ⋅ (ω + 1) = ω⋅ω + ω⋅1 = ω^2 + ω. Note that to prove this, we’ve used Theorem 1 from before.

And we can keep going from there.

Theorem: (ω + n)! = ω^(n+1) + ω^n⋅n + ω^(n−1)⋅(n − 1) + … + ω^2⋅2 + ω.

Try to prove this for yourself!
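As a hint, here's the n = 2 case worked out in the style of the earlier derivations, using Theorems 1 and 2 from the last chapter (plus the fact that ω << ω^2):

(ω + 2)!
(ω + 1)! ⋅ (ω + 2)
(ω^2 + ω) ⋅ (ω + 2)
(ω^2 + ω)⋅ω + (ω^2 + ω)⋅2 [Theorem 1]
ω^3 + (ω^2 + ω)⋅2 [by Theorem 2, (ω^2 + ω)⋅n = ω^2⋅n + ω, and these are cofinal in ω^3]
ω^3 + ω^2⋅2 + ω [Theorem 2]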

Now that we know what (ω + n)! is for each finite n, we are able to calculate what (ω⋅2)! is (using the fact that {ω + n | n ∈ ω} is a cofinal subset of ω⋅2). I’ll write its value, as well as the next few interesting ordinals:

(ω⋅2)! = ⋃{ω!, (ω+1)!, (ω+2)!, …} = ω^ω
(ω⋅3)! = ω^(ω⋅2)
(ω⋅n)! = ω^(ω⋅(n−1))

Using the same trick as before, this enables us to solve (ω^2)!

2)! = U{ω!, (ω⋅2)!, (ω⋅3)!, …} = U{ω, ωω, ωω⋅2, ωω⋅3, …} = ωω2

From here, things suddenly take on a remarkable regularity (for a while at least). Looking at limit ordinals above ω^2, they appear to have the following structure, at least up until we arrive at the first ordinal where standard notation fails us: ω^ω^ω^… = ε₀.

Conjecture: For λ a limit ordinal such that ω^2 ≤ λ < ε₀, λ! = ω^λ

If this conjecture is right, then (ω^ω)! = ω^(ω^ω), and (ω^(ω^ω))! = ω^(ω^(ω^ω)), and so on. We can also write this using tetration notation by saying that (ⁿω)! = ⁿ⁺¹ω.

But something different happens once we get to ε₀!

Conjecture: ε₀! = ⋃{(ⁿω)! | n ∈ ω} = ⋃{ω, ²ω, ³ω, ⁴ω, …} = ε₀

If this is right, then this is pretty interesting behavior! The fixed points of the factorial function are 1, ω, and ε₀, at the very least. What happens after ε₀? The factorial function “loops” in a certain sense! Just as (ω+1)! was ω^2 + ω, so (ε₀+1)! ends up being ε₀^2 + ε₀. And (ε₀⋅2)! = ε₀^ε₀, paralleling (ω⋅2)! = ω^ω. And so on.

Conjecture: εₙ is a fixed point of the factorial function for all n.

There are plenty more questions to be answered. Is ω₁ another fixed point of the factorial function? What exactly does the list of all the fixed points look like? Are there any cardinals that aren’t equal to their own factorial? What exactly is the general pattern for factorials of successor ordinals? How does α! compare to α^α in general?

On Self-Hating Theories of Arithmetic

Gödel’s second incompleteness theorem tells us that no (sufficiently powerful) consistent theory can prove the statement of its own consistency. But did you know that such a theory can prove the statement of its own inconsistency? A consistent theory that claims to be inconsistent is what I call a self-hating theory.

My convention in what follows: ℕ refers to the real, true natural numbers, the set consisting of {0, 1, 2, 3, …} and nothing else. ω refers to the formal object that exists in a theory of arithmetic that is “supposed to be” ℕ, but inevitably (in first-order logic) fails to be.

When I write ℕ ⊨ ψ, I am saying that the sentence ψ is true of the natural numbers. When I write T ⊢ ψ (resp. T ⊬ ψ), I am saying that the sentence ψ can be (resp. can’t be) proven from the axioms of the theory T. And when I write T ⊨ ψ, I am saying that the axioms of T semantically entail the truth of ψ (or in other words, that ψ comes out true in all models of T). The next two paragraphs will give some necessary background on Gödel’s encoding, and then we’ll explore the tantalizing claims I started with.

Gödel’s breathtakingly awesome insight was that within any language that is expressive enough to talk about natural number arithmetic, one can encode sentences as numbers and talk about syntactic features of these sentences as properties of numbers. When a number n encodes a sentence ψ, we write n = ⟦ψ⟧. Then Gödel showed that you can have sentences talking about the provability of other sentences. (The next step, of course, was showing that you can have sentences talking about their own provability – sneaking in self-reference through the back door of any structure attempting to describe arithmetic.)
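To make "sentences as numbers" concrete, here's a toy encoding in Python – not Gödel's actual prime-power scheme, just a reminder that any computable, invertible mapping from strings to numbers will do:

```python
def godel_number(sentence):
    """A toy Gödel numbering: read the sentence's bytes as the digits
    of a single big integer (Gödel's original used prime factorizations)."""
    return int.from_bytes(sentence.encode('utf-8'), 'big')

def decode(n):
    """Recover the sentence from its code."""
    return n.to_bytes((n.bit_length() + 7) // 8, 'big').decode('utf-8')

n = godel_number('0=1')
print(n)          # 3161393 – the sentence '0=1', viewed as a number
print(decode(n))  # 0=1
```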

In particular, in any theory of natural number arithmetic T, one can write a sentence that on its surface appears to just be a statement about the properties of natural numbers, but when looked at through the lens of Gödel’s encoding, ends up actually encoding the sentence “T ⊢ ψ”. And this sentence is itself encoded as some natural number! So there’s a natural number n such that n = ⟦T ⊢ ψ⟧. It’s a short step from here to generating a sentence that encodes the statement of T’s own consistency. We merely need to encode the sentence “¬∃n (n = ⟦T ⊢ 0=1⟧)”, or in English, there’s no number n such that n encodes a proof of “0=1” from the axioms of T. In even plainer English, no number encodes a proof of contradiction from T (from which it follows that there IS no proof of contradiction from T, as any proof of contradiction would be represented by some number). We write this sentence as Con(T).

Okay, now we’re in a position to write the original claim of this post more formally. If a theory T is consistent, then ℕ ⊨ Con(T). And Gödel’s second incompleteness theorem tells us that if ℕ ⊨ Con(T), then T ⊬ Con(T). But if T doesn’t prove the sentence Con(T), then no contradiction can be derived by adding ¬Con(T) as an axiom! So (T + ¬Con(T)) is itself a consistent theory, i.e. ℕ ⊨ Con(T + ¬Con(T)). But hold on! (T + ¬Con(T)) can prove its own inconsistency! Why? Because (T + ¬Con(T)) ⊢ ¬Con(T), i.e. it proves that a contradiction can be derived from the axioms of T, and it also has as axioms every one of the axioms of T! So the same number that encodes a proof of the inconsistency of T, also counts as a proof of the inconsistency of (T + ¬Con(T))!

Summarizing this all:

ℕ ⊨ Con(T)

T ⊬ Con(T)

ℕ ⊨ Con(T + ¬Con(T)),
but
(T + ¬Con(T)) ⊢ ¬Con(T + ¬Con(T))

There we have it, a theory that is consistent but proves its own inconsistency!

Expressed another way:

T ⊢ ∃n (n = ⟦T ⊢ 0=1⟧),
but
T ⊬ 0=1

Ok, so believe it or not, a lot of the strangeness of this can be explained away by thinking about the implications of nonstandard models of arithmetic. One easy way to see this is to reflect on the fact that, as we saw above, “T is consistent” becomes in Gödel’s encoding, “There is no natural number n such that n encodes a proof of T’s inconsistency.” Or more precisely, “T is consistent” becomes “There is no natural number n such that n = ⟦T ⊢ 0=1⟧.”

Now, no first-order theory can pin down the natural numbers.

(I’ve written about this here and here.) I.e. no first order theory can express a quantification like “there is no natural number N such that …”. You can try, for sure, by defining some object ω and adding axioms to restrict its structure to look more and more like ℕ, but no matter how hard you try, no matter how many axioms you add, there will always be models of the theory in which ω ≠ ℕ. In particular, ω will be a strict superset of ℕ in all of these nonstandard models (ℕ ⊂ ω), so that ω contains all the naturals but also additional nonstandard numbers.

So now consider what happens when we try to quantify over the naturals by saying “∀x ∈ ω”. This quantifier inevitably ranges over ALL of the elements of ω in each model, so it also touches the nonstandard numbers in the nonstandard models. This means that the theory only semantically entails quantified statements that are true of all possible nonstandard numbers! (Remember, T ⊨ ψ means that ψ is true in ALL models of T.)

One nice consequence of this is that if T has a model in which ω = ℕ then in this model “∀x∈ω Φ(x)” is true only if Φ(x) is true of all natural numbers. By the soundness of first-order logic, this means that T can’t prove “∀x∈ω Φ(x)” unless it’s true of ℕ. This is reassuring; if T ⊢ ∀x∈ω Φ(x) and T has a model in which ω = ℕ, then ℕ ⊨ ∀x∈ω Φ(x).

But the implication doesn’t go the other way! ℕ ⊨ ∀x∈ω Φ(x) does not guarantee us that T ⊢ ∀x∈ω Φ(x), because T can only prove that which is true in EVERY model. So T can only prove “∀x∈ω Φ(x)” if Φ(x) is true of all the naturals and every nonstandard number in every model of T!

This is the reason that we don’t know for sure that if Goldbach’s conjecture is true of ℕ, then it’s provable in Peano arithmetic. On the face of it, this looks quite puzzling; Goldbach’s conjecture can be written as a first-order sentence and first-order logic is complete, so if it’s true then how could we possibly not prove it? The answer is hopefully clear enough now: Goldbach’s conjecture might be true of all of ℕ but false of some nonstandard models of Peano arithmetic (henceforth PA).

You might be thinking “Well if so, then we can just add Goldbach’s conjecture as an axiom to PA and get rid of those nonstandard models!” And you’re right, you will get rid of those nonstandard models. But you won’t get rid of all the nonstandard models in which Goldbach’s conjecture is true! You can keep adding as axioms statements that are true of ℕ but false of some nonstandard model, and as you do this you rule out more and more nonstandard models. At the end of this process (once your theory consists of literally all the first-order sentences that are true of ℕ), you will have created what is known as “True Arithmetic”: {ψ | ℕ ⊨ ψ}.

But guess what! At this point, have you finally ruled out all the nonstandard models? No! There are still many many more (infinitely many, in fact! Nonstandard models of every infinite cardinality! So many models that no cardinality is large enough to describe how many!) Pretty depressing, right? There are all these models that agree with ℕ on every first order sentence! But they are still not ℕ (most obviously because they contain numbers larger than 0, 1, 2, and all the rest of ℕ).

The nonstandard models of True Arithmetic are the models that are truly irremovable in any first-order theory of arithmetic. Any axiom you add to try to remove them will also remove ℕ as a model. And when you remove ℕ as a model, some pretty wacky stuff begins to happen.

Fully armed now with new knowledge of nonstandard numbers, let’s return to the statement I started with at the top of this post: there are consistent theories that prove their own inconsistency. The crucial point, the thing that explains this apparent paradox, is that all such theories lack ℕ as a model.

If you think about this for a minute, it should make sense why this must be the case. If a theory T is consistent, then the sentence “∀x∈ω (x ≠ ⟦T ⊢ 0 = 1⟧)” is true in a model where ω = ℕ. So if T has such a model, then T simply can’t prove its own inconsistency, as it’s actually not inconsistent and the model where ω = ℕ will be able to see that! And once more, T can only prove what’s true in all of its models.

Okay, so now supposing T is consistent (i.e. ℕ ⊨ Con(T)), by Gödel’s second incompleteness theorem, T cannot prove its own consistency. This means that (T + ¬Con(T)) is a consistent theory! But (T + ¬Con(T)) no longer has ℕ as a model. Why? Because ℕ ⊨ Con(T) and (T + ¬Con(T)) ⊨ ¬Con(T). So for any consistent theory T, (T + ¬Con(T)) only has nonstandard models. What does this mean about the things that T + ¬Con(T) proves? It means that they no longer have to be true of ℕ. So for instance, even though ℕ ⊨ Con(T + ¬Con(T)), (T + ¬Con(T)) might end up proving ¬Con(T + ¬Con(T)). And in fact, it does prove this! As we saw up at the top of this post, a moment’s examination will show that (T + ¬Con(T)) asserts as an axiom that a contradiction can be derived from the axioms of T, but also contains all the axioms of T! So by monotonicity, (T + ¬Con(T)) proves ¬Con(T + ¬Con(T)).

What do we say of this purported proof of contradiction from (T + ¬Con(T))? Well, we know for sure that it’s not a standard proof, one that would be accepted by a mathematician. I.e., it asserts that there’s some n in ω that encodes a proof of contradiction from (T + ¬Con(T)). But this n is not actually a natural number, it’s a nonstandard number. And nonstandard numbers encode proofs only in a formal sense; a nonstandard “proof” is a proof according to Gödel’s arithmetic encoding, but that encoding only makes real sense for genuine natural numbers. So if we attempted to translate n, we’d find that the “proof” it encoded was actually nonsense all along: a fake proof that passes as acceptable by wearing the arithmetic guise of a real proof, but in actuality proves nothing whatsoever.

Summarizing:

In first order logic, every theory of arithmetic has nonstandard models that foil our attempts to prove all the truths of ℕ. Theories of arithmetic with ONLY nonstandard models and no standard model can prove things that don’t actually hold true of ℕ. In particular, since theories of arithmetic can encode statements about their own consistency, theories that don’t have ℕ as a model can prove their own inconsistency, even if they really are consistent.

So much for first order logic. What about

Second Order Logic?

As you might already know, second order logic is capable of ruling out all nonstandard models. There are second order theories that are categorical for ℕ. But there’s a large price tag for this achievement: second order logic has no sound and complete proof system!

Sigh. People sometimes talk about nature being tricky, trying to hide aspects of itself from us. Often you hear this in the context of discussions about quantum mechanics and black holes. But I think that the ultimate trickster is logic itself! Want a logic that’s sound and complete? Ok, but you’ll have to give up the expressive power to allow yourself to talk categorically about ℕ! Want to have a logic with the expressive power to talk about ℕ? Ok, but you’ll have to give up the possibility of a sound and complete proof system! The ultimate structure of ℕ remains shifty, slipping from our view as soon as we try to look closely enough at it.

Suppose that T is a second order theory that is categorical for ℕ. Then for every second-order sentence ψ that is true of ℕ, T ⊨ ψ. But we can’t make the leap from T ⊨ ψ to T ⊢ ψ without a complete proof system! So there will be semantic implications of T that cannot actually be proven from T.

In particular, suppose T is consistent. Then T ⊨ Con(T), but T ⊬ Con(T), by Gödel’s second. And since T ⊬ Con(T), (T + ¬Con(T)) is consistent. But since T ⊨ Con(T), (T + ¬Con(T)) ⊨ Con(T). So (T + ¬Con(T)) ⊨ Con(T) ∧ ¬Con(T)!

In other words, T + ¬Con(T) actually has no model! But it’s consistent! There are consistent second-order theories that are actually not logically possible – that semantically entail a contradiction and have no models. How’s that for trickiness?

Updated Introduction to Mathematical Logic

I’ve noticed that some of my posts that have gotten the most hits are my introductions to mathematical logic. Overall, I’m pretty happy with that series of posts, though I think since then I’ve come up with clearer and more precise ways of talking about the primary distinctions: syntax, semantics, and proof systems.

A little while ago, I gave a talk on the subject of “true but unprovable” statements. The goal of the talk was just to explore this notion and explain what exactly it means to a logician. In the course of developing the talk, I ended up re-summarizing the basics of mathematical logic. I think this summary makes a good “companion piece” to my earlier introductory series, as well as being more standard in its presentation. The most important slide is this one:

Here’s the link to the full slideshow: True but Unprovable

And here you can look through it on this blog (but some of the formatting got a little wacky in the embedding process).

Finiteness can’t be captured in a sound, complete, finitary proof system

Consider the sentence “This blog has finitely many posts.” Do you understand what this sentence means? Then no set of rules (even an infinite set, even an uncomputable one!) can fully capture your reasoning in finite-length proofs. This claim may sound shocking, but it can be justified on solid metamathematical grounds.

Another example: the sentence “There are finitely many planets in the universe.” You don’t have to think it’s true, you just have to think you understand what it means. What’s the common theme? It’s the notion of there being ‘finitely many’ of some class of objects. Let’s imagine building a language that has the expressive resources of first-order logic (which are quite modest), plus an additional quantifier F, whose semantics are given by the following rule: Fx φ(x) is satisfied by a model iff there are only finitely many objects in that model that satisfy φ(x).
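In symbols, the satisfaction clause for F looks like this (the formalization is mine, but it’s the standard shape of a generalized-quantifier semantics):

$$\mathcal{M} \models Fx\,\varphi(x) \quad\Longleftrightarrow\quad \big|\{\,a \in M : \mathcal{M} \models \varphi[a]\,\}\big| < \aleph_0$$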

It turns out that any language consisting of first order logic plus the quantifier F can’t be axiomatized in any sound, complete, and finitary proof system. Notice that effective enumerability of the rules of the proof system is not a requirement here! So long as the language is strong enough to express the semantics of {∧, ¬, ∀, F, variables x₁, x₂, …, and relation symbols R₁, R₂, …}, no set of sentences and sentence-manipulation rules in that language will suffice to capture these semantics.

Here’s a proof: consider the first-order theory of Peano arithmetic. This theory has nonstandard models (as any theory of arithmetic must have in a logic that is compact). All of these nonstandard models share the following feature: they contain numbers that are larger than infinitely many other numbers. Think about what this means: this is a common feature of all nonstandards, so if we could write a sentence to express this feature then we could rule them all out! This is where the quantifier F steps in. With F, we can write the following simple sentence and add it to PA as an axiom:

∀x Fy (y < x)

In English: every number has finitely many numbers less than it. And with that, we’ve ruled out all nonstandard models! So now we have a theory that is categorical for ℕ. And that’s a big deal, metamathematically speaking!

Why? Well, as I’ve talked about in a previous post, you can prove some pretty strong limitative results about any logic that can produce a theory that’s categorical for ℕ. In particular, if we can produce such a theory then its logic cannot be compact. Quick proof: suppose ℕ is a model of a set of sentences Σ. Add a new constant c to the language, add to Σ the axioms “c ≠ 0”, “c ≠ 1”, “c ≠ 2”, and so on, and call this new set Σ’. Every finite subset of Σ’ contains only finitely many of these new axioms, so ℕ can be made a model of it by interpreting c as a sufficiently large natural number. So by compactness, Σ’ has a model. But this model is nonstandard – c must be interpreted as something that isn’t a natural number. And since Σ is a subset of Σ’, any model of Σ’ is also a model of Σ. So Σ has a nonstandard model, and is not categorical for ℕ.
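In symbols, the expanded theory is (writing n̄ for the numeral of n):

$$\Sigma' \;=\; \Sigma \;\cup\; \{\, c \neq \bar{n} \,:\, n \in \mathbb{N} \,\}$$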

So compactness implies that no theory is categorical for ℕ. But compactness follows from the following three properties: soundness (if Σ ⊢ α then Σ ⊨ α), completeness (if Σ ⊨ α then Σ ⊢ α), and the finiteness of proofs – every proof is only finitely long (try expressing this property without F!). Quick proof: If a set of sentences is finitely satisfiable, then every finite subset of it has a model (by definition), so no finite subset of it can be refuted (by soundness), so the entire set can’t be refuted (since any refutation is finite and so uses only a finite subset), so the entire set is satisfiable (by completeness).

So soundness + completeness + finite proofs ⇒ compactness ⇒ the existence of nonstandard models of arithmetic for any theory that has ℕ as a model. Which means that the semantics of F cannot be captured in any sound, complete, and finitary proof system!

Take your pick: either you don’t really understand the semantics of the “finitely many” quantifier F, or no set of rules (not even requiring this set to be finite or computable) can fully capture your reasoning in finite-length proofs.

More information about related extensions of first-order logic and their limitations can be found here. The result I describe here is a rephrasing of results discussed there.

The Mindblowing Goodstein Sequences

Goodstein sequences are amazingly strange and unintuitive. I want to tell you all their unusual properties up-front, but I also don’t want to spoil the surprise. So let me start by defining the Goodstein sequence of a number.

We start by defining the hereditary base-n representation of a number m. We first write m as a sum of powers of n (e.g. for m = 266 and n = 2, we write 266 = 2^8 + 2^3 + 2^1). Now we write each exponent as a sum of powers of n. (So now we have 266 = 2^(2^3) + 2^(2+1) + 2^1). And we continue this until all numbers in the representation are ≤ n. For 266, our representation stabilizes at 266 = 2^(2^(2+1)) + 2^(2+1) + 2^1.

Now we define Gₙ(m) as follows: if m = 0, then Gₙ(m) = 0. Otherwise, Gₙ(m) is the number produced by replacing every n in the hereditary base-n representation of m by n+1 and subtracting 1. So G₂(266) = G₂(2^(2^(2+1)) + 2^(2+1) + 2^1) = 3^(3^(3+1)) + 3^(3+1) + 3^1 − 1 ≈ 4.4 × 10^38.

Finally, we define the Goodstein sequence of a number n as follows:

a₀ = n
a₁ = G₂(a₀)
a₂ = G₃(a₁)
a₃ = G₄(a₂)
And so on.
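To make the definition concrete, here’s a minimal Python sketch of the base-bumping operation and the sequence it generates (my own illustration – the function names are made up); its output matches the hand computation below.

```python
def hereditary_bump(m, b):
    """Write m in hereditary base-b notation, then replace every b with b+1.

    We peel off the base-b digits of m; each digit multiplies (b+1) raised
    to the hereditarily-bumped exponent.
    """
    total, exponent = 0, 0
    while m > 0:
        m, digit = divmod(m, b)
        if digit:
            total += digit * (b + 1) ** hereditary_bump(exponent, b)
        exponent += 1
    return total


def goodstein(n, steps):
    """The first steps+1 values of the Goodstein sequence of n."""
    seq, base = [n], 2
    for _ in range(steps):
        if seq[-1] == 0:
            break
        seq.append(hereditary_bump(seq[-1], base) - 1)  # the G_base step
        base += 1
    return seq


print(goodstein(4, 5))                     # [4, 26, 41, 60, 83, 109]
print(float(hereditary_bump(266, 2) - 1))  # ~4.4e38, i.e. G_2(266)
```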

Let’s take a small starting number and look at its Goodstein sequence. Suppose we start with 4.

a₀ = 4 = 2^2
a₁ = 3^3 − 1 = 2·3^2 + 2·3 + 2 = 26
a₂ = 2·4^2 + 2·4 + 1 = 41
a₃ = 2·5^2 + 2·5 = 60
a₄ = 2·6^2 + 2·6 − 1 = 2·6^2 + 6 + 5 = 83
a₅ = 2·7^2 + 7 + 4 = 109

Now, notice that the sequence always seems to increase. And this makes sense! After all, at each step we replace the base everywhere in the representation by a bigger one (a big increase in the value) and only subtract 1. Here’s a plot of the first 100 values of the Goodstein sequence of 4:

And for further verification, here are the first million values of the Goodstein sequence of 4:

Looks like exponential growth to me! Which, again, is exactly what you’d expect. In addition, the higher the starting value, the more impressive the exponential growth we see. Here we have the first 100 values of the Goodstein sequence of 6:

Notice how after only 100 steps, we’re already at a value of 5 × 10^10! This type of growth only gets more preposterous for larger starting values. Let’s look at the Goodstein sequence starting with 19.

a₀ = 19
a₁ ≈ 7 × 10^12
a₂ ≈ 1.3 × 10^154
a₃ ≈ 1.8 × 10^2184
And so on.

We can’t even plot a sequence like this, as it’ll just look like two perpendicular lines:

Alright, so here’s a conjecture that hopefully you’re convinced of: Goodstein sequences are always growing. In fact, let’s conjecture something even weaker: Goodstein sequences never terminate (i.e. never go to zero).

We’re finally ready for our first big reveal! Both of these conjectures are wrong! Not only do some Goodstein sequences terminate, it turns out that EVERY Goodstein sequence eventually terminates!

This is pretty mind-blowing. What about all those graphs I showed? Well, the sequences do increase for a very very long time, but they eventually turn around. How long is eventually? The Goodstein sequence of 4 takes 3 × 2^402,653,210 − 1 steps to turn around!! This number is too large for me to be able to actually show you a plot of the sequence turning around. But we do know that once the Goodstein sequence stops increasing, it actually stays fixed for 3 × 2^402,653,209 steps before finally beginning its descent to zero. Heuristically, it would look something like this:

(Why does it stay constant, incidentally? The reason is that when we get to a number whose hereditary base-B notation just looks like B + n for some n < B, each step increases B by 1 and decreases n by 1, so the value (B+1) + (n−1) = B + n is unchanged! The sequence therefore stays constant until we get to n = 0. At that point the value drops below the base, and from then on we just decrease by 1 each step, all the way down to zero.)
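You can watch the plateau-then-descent happen with the sketch from earlier: the Goodstein sequence of 3 is small enough to run to completion.

```python
print(goodstein(3, 10))  # [3, 3, 3, 2, 1, 0]
```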

So, how do we prove this? We do it using infinite ordinals. (Now you see why this topic caught my eye recently!) The plan is to associate with each Goodstein sequence a “matching sequence” of ordinals. The way to do this is pretty simple: for a number written in hereditary base-n notation, just replace every n in that representation with ω. Now you have an infinite ordinal!

So, for example, here are the first few numbers in the Goodstein sequence of 4, along with their translations to ordinals:

4 = 2^2 becomes ω^ω
26 = 3^3 − 1 = 2·3^2 + 2·3 + 2 becomes 2ω^2 + 2ω + 2
41 = 2·4^2 + 2·4 + 1 becomes 2ω^2 + 2ω + 1
60 = 2·5^2 + 2·5 becomes 2ω^2 + 2ω
83 = 2·6^2 + 2·6 − 1 = 2·6^2 + 6 + 5 becomes 2ω^2 + ω + 5
109 = 2·7^2 + 7 + 4 becomes 2ω^2 + ω + 4

Examine this sequence of ordinals for a couple of minutes. You might notice something interesting: at each step, the ordinal is decreasing! This turns out to always hold true. Let’s see why.

Notice that there are two cases: either the ordinal is a successor ordinal (like ω^2 + 4), or it’s a limit ordinal (like ω^2 + ω). If it’s a successor ordinal, then the next ordinal in the sequence is just its predecessor. (So ω^2 + 4 becomes ω^2 + 3). This is obviously a smaller ordinal.

What if it’s a limit ordinal? Then the smallest limit term in its representation drops to a smaller ordinal, and some finite number is added on. So for instance, in the ordinal ω^2 + 2ω, the trailing 2ω will drop to ω and some finite number will be added on, depending on which step in the sequence you’re at. So we’ll end up with ω^2 + ω + n, for some finite n. This ordinal is smaller than the original one, because we’ve made an infinitely large jump downwards (from one limit ordinal to a lower limit ordinal) and only increased by a finite amount.

So in either case, we go to a smaller ordinal. And now the key realization is that every strictly decreasing sequence of ordinals MUST terminate! (This is one of my favorite facts about ordinals, by the way; it’s just what it means for the ordinals to be well-ordered. Even though we’re looking at arbitrarily large infinite ordinals, it’s still the case that there is no strictly decreasing sequence that goes on forever. Whenever we hit a limit ordinal, we have to make an infinitely large jump downwards to get to the next element in the sequence, as a limit ordinal has no predecessor.)

So we’ve paired every Goodstein sequence with a strictly decreasing sequence of ordinals. And every strictly decreasing sequence of ordinals eventually terminates. So the Goodstein sequence must terminate as well! (One easy way to see this is to notice that every value in a Goodstein sequence is ≤ the ordinal assigned to it. Another way is to notice that the only number that could possibly be assigned to the ordinal 0 is 0 itself. So when the decreasing sequence of ordinals hits 0, as it must eventually, so must the Goodstein sequence.) And that’s our proof!
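To make the pairing concrete, here’s a continuation of the earlier Python sketch (again my own illustration): ordinals below ε₀ are encoded in Cantor normal form as nested tuples, and we check that the ordinals matched to a Goodstein sequence really do strictly decrease.

```python
def to_ordinal(m, b):
    """Replace the base b with omega in the hereditary base-b form of m.

    An ordinal below epsilon_0 is encoded as a tuple of
    (exponent, coefficient) pairs in Cantor normal form, largest exponent
    first, where each exponent is itself such a tuple; () encodes 0.
    """
    terms, exponent = [], 0
    while m > 0:
        m, digit = divmod(m, b)
        if digit:
            terms.append((to_ordinal(exponent, b), digit))
        exponent += 1
    return tuple(reversed(terms))


def ord_less(a, b):
    """Is encoded ordinal a strictly less than b? Compare Cantor normal
    forms term by term: by exponent (recursively), then by coefficient;
    if one form is a prefix of the other, the shorter one is smaller."""
    for (ea, ca), (eb, cb) in zip(a, b):
        if ea != eb:
            return ord_less(ea, eb)
        if ca != cb:
            return ca < cb
    return len(a) < len(b)


seq = goodstein(4, 50)
ords = [to_ordinal(v, base) for base, v in enumerate(seq, start=2)]
assert all(ord_less(nxt, cur) for cur, nxt in zip(ords, ords[1:]))
```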

That Goodstein sequences always terminate is the big flashy result for this post. But there’s much more that’s cool about them. For instance, the proof we just gave did not rely on the fact that we increased the base by only 1 at each step. In fact, even if we increase the base by an arbitrarily large amount at each step, the sequences must still terminate!

Even cooler, the proof that all Goodstein sequences terminate cannot be formalized in Peano arithmetic – in fact, no PA proof of this theorem exists at all. Laurie Kirby and Jeff Paris proved in 1982 that the statement that Goodstein sequences always terminate can be used to derive a theorem of Gentzen from which the consistency of PA can be deduced. So if PA could prove that Goodstein sequences always terminate, then it could prove its own consistency! Gödel’s second incompleteness theorem tells us that this cannot be, and thus, assuming that PA really is consistent, it cannot prove that these sequences all terminate.

This was historically the third result to be found independent of PA, and it was the first that was purely number-theoretic in character (as opposed to meta-mathematical, like the Gödel sentences). It gives us a very salient way of expressing the limitations of Peano arithmetic: simply write a program that computes the Goodstein sequence of any given input, terminating only when it reaches 0 and outputting the length of the sequence. (Code Golf Stack Exchange shows that this can be done using just 77 characters of Haskell! And here’s a nice way to do it in Python). Now, it happens to be true that this program will always terminate for any natural number input (if run on sufficiently powerful hardware). But Peano arithmetic cannot prove this!
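Here’s what such a program might look like, building on the earlier sketch (the golfed Haskell and Python solutions linked above do the same job):

```python
def goodstein_length(n):
    """Number of steps for the Goodstein sequence of n to reach 0.

    This loop halts on every input, but PA cannot prove that it does.
    Only n <= 3 is feasible to run; n = 4 already needs roughly
    3 * 2**402_653_210 steps.
    """
    value, base, steps = n, 2, 0
    while value > 0:
        value = hereditary_bump(value, base) - 1
        base += 1
        steps += 1
    return steps


print([goodstein_length(n) for n in range(4)])  # [0, 1, 3, 5]
```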

As a final fun surprise, let’s define G(n) to be the length of the Goodstein sequence starting at n. As we’ve already seen, these lengths get really big really quickly. But exactly how quickly does G grow? Turns out that it grows much, much faster than other commonly known fast-growing computable functions. For instance, G(n) grows much faster than Ackermann’s function, and G(12) is already greater than Graham’s number!

Wacky Non-Standard Models of PA

Peano arithmetic is unable to pin down the natural numbers, despite its countable infinity of axioms. In fact, assuming its consistency, Peano arithmetic has models of every infinite cardinality, meaning that as far as PA is aware, there might even be uncountably many natural numbers. (If PA is not consistent, then it has no models at all.) I want to take a look at these non-standard models of PA, especially the countable ones. A natural question is: how many countable non-standard models are there alongside the standard model?

Assuming the consistency of PA (which I will leave implicit from now on, as it’s assumed in all that follows), there are continuum many non-isomorphic countable models. That’s a lot! That means that there’s a distinct countable non-standard model of arithmetic for every real number. This is our first result: the number of nonstandard models of PA of cardinality ℵ₀ is 2^ℵ₀. Interestingly, this result generalizes! For every infinite cardinal κ, there are 2^κ non-isomorphic nonstandard models of cardinality κ. That’s a lot of nonstandard models! In fact, since any model of cardinality κ is specified by interpreting finitely many constants, relations over κ, and functions from κ to κ, the maximum possible number of models of cardinality κ is 2^κ. So in this sense, Peano arithmetic has as many nonstandard models of each cardinality as it could possibly have!

Let’s take a closer look at the countable non-standard models. It turns out that we can say a lot about their order type. Namely, all countable non-standard models of PA have order type ω + (ω* + ω)·η. What does this notation mean? Let me break down each of the order types involved in that formula:

  • ω: order type of the naturals
  • ω*: order type of the negative integers
  • η: order type of the rationals

So ω*+ω is the order type of the integer line, and (ω*+ω)·η is the order type of a structure that resembles an integer line for each rational number. Thus, every countable non-standard model of PA looks like a copy of natural numbers followed by as many copies of integers as there are rational numbers. The order on this structure is lexicographic: two nonstandards on the same integer line are judged according to their position on the integer line, and two nonstandards on different integer lines are judged according to the position of these integer lines on the rational line. It’s not the easiest thing to visualize, but here’s my attempt:

Visualization of the order type of countable nonstandard models of Peano arithmetic
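To get a feel for the lexicographic order, here’s a toy Python encoding (entirely my own illustration – a finite list can only demonstrate the comparison rule, not the full order type):

```python
from fractions import Fraction

# A point is either ('std', n) for a standard natural n, or ('gal', q, k):
# the k-th element (k any integer) of the integer-line "galaxy" indexed by
# the rational q. The sort key puts all standard points first in omega
# order, then orders galaxies by rational index and, within a galaxy, by
# integer offset -- i.e., order type omega + (omega* + omega) * eta.
def position(p):
    return (0, p[1], 0) if p[0] == 'std' else (1, p[1], p[2])

points = [('gal', Fraction(1, 2), -3), ('std', 7),
          ('gal', Fraction(1, 3), 100), ('gal', Fraction(1, 2), 0)]
print(sorted(points, key=position))
# [('std', 7), ('gal', Fraction(1, 3), 100),
#  ('gal', Fraction(1, 2), -3), ('gal', Fraction(1, 2), 0)]
```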

So if all countable non-standard models have the same order type, then where do they differ? It turns out that they differ in the details of how addition and multiplication work on the non-standards. (After all, a model of PA is defined by the size of its universe and its interpretation of ≤, +, and ×. Same order type means same ≤, so what’s left to vary is + and ×.) In each of the models, + and × work exactly like normal on the naturals. And + and × must operate on the non-standards in such a way as to maintain the truth of all the axioms of PA. So, for instance, since PA can prove that 2x = x + x, the same must be true for nonstandard x. And so on. But even given these restrictions, there are still uncountably many ways to define + and × on the non-standards.

What’s more, we have a theorem known as Tennenbaum’s theorem, which tells us that in a countable non-standard model, it’s impossible for + and × to be recursive. Said more simply, addition and multiplication are uncomputable in every countable non-standard model of arithmetic!

One thing I remain unsure of is how the nonstandard models of Peano arithmetic compare to the structure of ω in nonstandard models of ZFC. We know that there must be models of ZFC where ω is nonstandard, by the compactness theorem (define ZFC* to be ZFC with an extra constant c, with the extra axioms “c ∈ ω”, “c ≠ 0”, “c ≠ 1”, “c ≠ 2”, and so on. Every finite subset of ZFC* has a model – take a model of ZFC and interpret c as a large enough natural number – so ZFC* has a model by compactness, and this model is also a model of ZFC by monotonicity.) But is ZFC “better” at ruling out nonstandard models of the naturals than PA is? Or is there a nonstandard ω for every nonstandard model of Peano arithmetic?

The Continuum Hypothesis is False

Before I read this paper, I thought that the continuum hypothesis probably did not have a truth value, and that ZFC + CH and ZFC + ¬CH were systems that equally well capture what it means to be a set. Now I think that the continuum hypothesis is actually false, and that ZFC + CH would not be an adequate representation of our intuitive notion of sets.

The argument in the paper is not a formal proof of ¬CH. Instead, it’s a refutation of the following form: “The continuum hypothesis is equivalent in ZFC to this statement which is obviously false. So the continuum hypothesis must be false.”

So what is this obviously false statement? It is a claim that an event that happens with 100% probability is impossible. More exactly: the continuum hypothesis turns out to be equivalent to the claim that there is some function assigning to each real number a countable set of real numbers, such that it’s impossible for there to exist two real numbers x and y with x not in the countable set assigned to y and y not in the countable set assigned to x.

Why is this obviously false? Well, imagine throwing darts at the real line, so as to randomly select a real each time. If we throw one dart, there is a probability of zero that we would hit a natural number. Why? Because the naturals are a countable subset of the uncountable reals, and any countable set of reals has measure zero. By the same token, there is a probability of zero that our dart hits a rational number (as the rationals are also only countably large).

Now, if we throw a second dart, the probability that we hit a real number that’s some rational distance away from the first real number is also zero. Again, this is just because we’re considering a countable (and hence measure-zero) set of candidates out of an uncountable space of possibilities. So by the same token, if we assign to the first dart ANY countable set of real numbers, then there is probability zero that the second dart lands within that countable set. And if we also assign a countable set of real numbers to the second dart, there is also probability zero that the first dart happened to fall within the countable set assigned to the second.

So, to summarize in more general terms: for any function f that maps each real number to a countable set of real numbers, a random choice of x and y will give us that x ∉ f(y) and y ∉ f(x) with 100% probability. Now, the statement that is equivalent to the falsity of the continuum hypothesis (i.e. to the existence of a cardinality strictly between |ω| and |𝒫(ω)|) is that for every such function f, it’s possible to find two real numbers x and y that are not in each other’s countable sets.
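Stated symbolically, this principle is Freiling’s axiom of symmetry (writing $[\mathbb{R}]^{\leq\aleph_0}$ for the collection of countable subsets of ℝ); Freiling’s theorem is that ZFC proves it equivalent to ¬CH:

$$\mathsf{AX}:\quad \forall f : \mathbb{R} \to [\mathbb{R}]^{\leq\aleph_0}\;\; \exists x \,\exists y\; \big( x \notin f(y) \,\wedge\, y \notin f(x) \big)$$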

Notice how weak this is! By randomly choosing real numbers, we will end up with two that are not in each other’s countable sets with probability 100%. But we’re not even saying that the probability-100% event will always happen! We’re just saying that it’s POSSIBLE that it happens! I.e. imagine that by some miraculous coincidence the first two real numbers we choose are within each other’s countable sets (technically, only one of them needs to be in the other’s countable set). After all, probability-zero events do happen. Not a problem! Just pick two new real numbers! And if this fails, pick again!

Pretty cool, right? Well, it gets even cooler. We just saw that from the statement “for any function f that maps reals to countable sets of reals, there exist x and y such that x ∉ f(y) and y ∉ f(x)”, you can prove that |𝒫(ω)| > |ω₁| (where the cardinality of ω₁ is by definition the next cardinality after ℵ₀). We can now consider THREE reals x, y, and z, and mappings that assign a countable set of reals to each PAIR of real numbers. The analogous statement says that for any such mapping, there are three reals such that none of them is in the countable set assigned to the other two. And this entails that we can prove that |𝒫(ω)| > |ω₂|! In other words, we can prove that there are at least TWO cardinalities in between the reals and the naturals!

And for ANY n, if we take the statement that for any such mapping there are n reals such that none is inside the countable set assigned to the others, then we can use this to prove that |𝒫(ω)| > |ωₙ₋₁|. Notice: whatever n we choose, there’s still probability 100% that the n reals don’t “overlap” in this way! So for any n, we’re again only claiming that it’s possible for an event with 100% probability to happen. And if we accept this for every n, then it follows that there’s not just a single cardinality between |ω| and |𝒫(ω)|. There are definitely at least two. And at least three. And so on, for ANY finite value of n. Which means that there are infinitely many cardinalities between the naturals and the reals!

Ok, this is a good point to stop reading if you don’t want to know the technical details of how EXACTLY to prove the result and are satisfied to just know what the result says. But if not, then read on! (The proof is actually really short and simple, though it relies on some background knowledge about ordinals.)

I’ll prove the result for n = 2. Suppose that |𝒫(ω)| = ℵ₁ = |ω₁|. Then by the axiom of choice, there’s some well-ordering ≼ of the reals of order type ω₁. We define f(x) to be {y | y ≼ x}. Notice that since ω₁ is the FIRST uncountable ordinal, every proper initial segment of this well-ordering is countable, so f(x) is always a countable set, no matter what x is! And since a well-ordering is total – for any two reals, x ≼ y or y ≼ x – this entails that for any two reals, x ∈ f(y) or y ∈ f(x).

And we’re done! We’ve shown that if the continuum hypothesis is true, then there exists a function f such that for any two reals, x ∈ f(y) or y ∈ f(x). So if we think that there is no such function, then we must deny the continuum hypothesis.
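The reals can’t actually be listed out, but the combinatorial skeleton of this construction is easy to play with on a finite toy “line” (my own illustration – what the toy can’t show is the cardinality punchline, namely that each initial segment stays countable while the whole line is uncountable):

```python
import random
from itertools import combinations

# Toy version of the CH construction: 100 stand-in "reals", an arbitrary
# well-ordering given by a random ranking, and f(x) = everything ranked
# at or below x. Any two points then satisfy x in f(y) or y in f(x).
points = list(range(100))
rank = {x: i for i, x in enumerate(random.sample(points, len(points)))}

def f(x):
    return {y for y in points if rank[y] <= rank[x]}

assert all(x in f(y) or y in f(x) for x, y in combinations(points, 2))
```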