A Coloring Problem Equivalent to the Continuum Hypothesis

Summary
Is there an infinity in between ℵ0 and 2^ℵ0? Is there a way to color the plane with countably many colors so that there are no monochromatic triangles lined up with the x and y axes? If these questions seem unrelated to you, then we had the same initial reaction. But it turns out that an intermediate infinity exists if and only if no such coloring exists.

CH and Coloring

Imagine taking the plane ℝ² and coloring it entirely. You are allowed infinitely many colors, but only a countable infinity (so no using the whole continuous spectrum). Some example colorings:

The colorings don’t have to be contiguous on the plane like in the ones you just saw. They can have individual points of one color such that neighborhoods of arbitrarily small radius around them are colored differently. So we can get wacky colorings like:

Think about right triangles on the plane that line up with the x and y axes. For instance:

For each of these triangles, of the three sides one must be parallel to the x-axis and another to the y-axis. Now, for a given coloring, we can ask if there exist any triangles whose corners are all the same color. Importantly, we’re only interested in the three corner points, not the entire side length. For the earlier colorings we looked at, we can find such triangles fairly easily:

Even for our wacky coloring, it shouldn’t be too hard to find three points of the same color that meet the criterion.

Now here’s a question for you to ponder: is there any way to color the plane so that no such monochromatic right triangles exist?

I encourage you to pause here and try your best to come up with such a coloring. Use this as a chance to develop some intuitions on whether you think that a coloring like this should exist or not. It’s important to keep in mind that the number of possible right triangles on the plane is 2^ℵ0, while the number of colors you have available to you is just ℵ0.

(…)

(pause for thought)

(…)

Now, I shall prove to you that the existence of such a coloring is equivalent to the Continuum Hypothesis! The rest of this post will be more involved, and even though I’ve tried to make it less scary with pictures aplenty, it will most likely take a slow and attentive read-through to grasp. But the proof is really cool, so it’s worth it. Oh and also, I have assumed some things as background knowledge for the sake of space. In particular, understanding the proof requires that you understand what ordinals are and why the set of all countable ordinals is the first uncountable ordinal.

If the Continuum Hypothesis is true, then a coloring with no monochromatic right triangles exists.

The proof outline: Consider the first uncountable ordinal, ω1, which is the set of all countable ordinals. We construct a coloring on ω1 × ω1 that satisfies the no-monochromatic-right-triangles property. If the Continuum Hypothesis is true, then |ω1| = |ℝ|, so we form a bijection from ω1 to ℝ. Finally, we use this bijection to map our coloring on ω1 × ω1 to a coloring on ℝ × ℝ, and show that coloring also satisfies the same property.

Alright, so consider ω1 × ω1. If we were to visualize it, it would look something like this:

This “plane” is discrete, so in a certain sense it much more closely resembles ℕ² than ℝ². The set of colors we’ll be using will be labeled with integers. We split the colors into two categories, the negatives N and the positives P.

N = {-1, -2, -3, …}
P = {0, 1, 2, 3,…}

For each α in ω1, we construct two injections f: α → N and g: α+1 → P (so g(β) is defined for every β ≤ α). These injections exist because every element of ω1 is countable.

We use f and g to construct our coloring. For each α in ω1, we assign the color f(β) to (α, β) for each β < α, and we assign the color g(β) to (β, α), for each β ≤ α.

Since f and g are both injective, no colors are repeated along the path from (α, 0) to (α, α) to (0, α). And we can easily show that no monochromatic right triangles exist under this coloring!

If a triangle crosses y = x, then it has at least one point colored from N and another from P. N and P are disjoint, so the triangle cannot be monochromatic.

And if a triangle doesn’t cross y = x, then it has two points lined up so as to have different colors:

So we have no monochromatic triangles! Now, assuming the Continuum Hypothesis, |ω1| = |ℝ|, so we can form a bijection M from ω1 to ℝ. We use M to map our original coloring to a new coloring on ℝ × ℝ: if (a, b) ∈ ω1 × ω1 is colored C, then we color (M(a), M(b)) ∈ ℝ × ℝ with C. Since M is a bijection, any monochromatic right triangle in the new coloring would pull back to a monochromatic right triangle in the original coloring, which we know doesn’t exist. So we’ve constructed a coloring on ℝ × ℝ with no monochromatic triangles!
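As a sanity check, here is a small Python sketch (my own illustration, not part of the original argument) of the same construction restricted to a finite grid {0, …, n−1}², using one concrete choice of the injections, f(β) = −(β+1) and g(β) = β. A brute-force search confirms that no monochromatic axis-aligned right triangles appear:

```python
# A hedged finite-grid illustration: points on or above the diagonal get positive
# colors via g, points below get negative colors via f, mirroring the coloring of
# ω1 × ω1 described above (with these particular injections chosen for concreteness).
from itertools import combinations

def color(x, y):
    return x if x <= y else -(y + 1)

def has_monochromatic_right_triangle(n):
    points = [(x, y) for x in range(n) for y in range(n)]
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 != x2 or y1 == y2:
            continue                      # we want a pair on a common vertical line
        for x3 in range(n):
            if x3 == x1:
                continue
            for y3 in (y1, y2):           # third corner shares a y-coordinate
                if color(x1, y1) == color(x2, y2) == color(x3, y3):
                    return True
    return False

print(any(has_monochromatic_right_triangle(n) for n in range(1, 15)))   # False
```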

If a coloring with no monochromatic right triangles exists, then the Continuum Hypothesis is true.

The proof outline here is to show that any infinite subset A of ℝ either has cardinality ℵ0 or 2^ℵ0. This shows that there are no intermediate cardinalities between these two.

Consider any infinite subset A ⊆ ℝ. For any real number a, define Ya to be the set of y-coordinates of points along the vertical line through (a, 0) whose colors appear only once on that line. So Ya = {y ∈ ℝ | the color of (a, y) is different from the color of (a, y’) for all y’ ≠ y}.

Notice that Ya must be countable, because the points (a, y) for y ∈ Ya all have different colors and there are only countably many different colors to have. So |Ya| ≤ ℵ0.

Define YA to be the union of Ya, for each a in A. Since each Ya is countable and A is at least ℵ0, |YA| ≤ |A|⋅ℵ0 = |A|. There are two cases: either YA = ℝ, or YA ≠ ℝ.

Case 1: YA = ℝ. In this case |ℝ| = |YA| ≤ |A|, so |A| ≥ |ℝ|. Since A is a subset of ℝ, this implies that |A| = |ℝ|.

Case 2: YA ≠ ℝ. In this case, there’s some real number y* in ℝ – YA. By definition y* is outside every Ya with a ∈ A, so for each a ∈ A the vertical line {a} × ℝ contains at least one point besides (a, y*) with the same color as (a, y*).

Each (x, y*) with x ∈ A must have a different color. Proof: Suppose not. Then we have two elements of A, x1 and x2, such that the color of (x1, y*) is the same as the color of (x2, y*). By the last paragraph’s result, we can pick a third point (x1, y’) with y’ ≠ y* that has the same color as (x1, y*). These three points form a monochromatic right triangle, but we’ve assumed that’s impossible! Contradiction.

So each point in A × {y*} must have a different color. Thus, A × {y*} must be at most countable. A is an infinite set, so |A| = ℵ0.

So in case one |A| = |ℝ| = 2^ℵ0, and in case two |A| = ℵ0. And this is true for every infinite subset A of ℝ. So every subset of ℝ is either finite, countable, or has the same cardinality as ℝ. And this is the continuum hypothesis!

No known unknowns in Peano arithmetic

Reports that say that something hasn’t happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don’t know we don’t know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones.

Donald Rumsfeld

Rumsfeld’s statement may have been correct about American politics, but he was wrong in the context of Peano arithmetic. In Peano arithmetic, every known is a known known, and every unknown is an unknown unknown. Let me explain.

A paragraph for background: PA expresses enough arithmetic to allow it to encode sentences like “φ is provable from the axiom set T” as arithmetic properties of the natural numbers. But Peano arithmetic also has nonstandard models containing objects that aren’t natural numbers. It’s possible for a sentence about arithmetic to be true of these nonstandard models and false of the standard model. And in particular, the sentences that Gödel-encode proofs of other sentences can take on a different meaning in nonstandard models. For instance, the Gödel encoding of a sentence like “PA is consistent” looks like “There is no number with the property that it encodes a proof of 0=1 from the axioms of PA”. This might be true of the standard natural numbers, but false in some nonstandard model. There might be some nonstandard number that satisfies Gödel’s arithmetic formula for a proof of 0=1 from the axioms of PA, but which doesn’t actually encode any such proof (as only standard natural numbers Gödel-encode valid proofs). Gödel encodings are only logically equivalent to the statements they encode in the standard model of the natural numbers.

Of all the sentences of the form “φ is provable from the axiom set T”, which are provable from the axioms of Peano arithmetic? How much does Peano arithmetic know about what arbitrary formal theories prove? For all of the following results, I will assume that PA is consistent.

Our first result: If PA proves “φ is provable from T”, then φ really is provable from T.

This follows from the soundness of first-order logic. If PA proves “φ is provable from T”, then this sentence must be true in all models of PA. In particular, it’s true in the standard model. And if it’s true in the standard model, then it must actually be true that φ is provable from T.

Second result: If φ is provable from T, then PA proves “φ is provable from T.”

This follows from three facts: (1) that the standard natural numbers are a subset of every nonstandard model, (2) that the sentence “φ is provable from T” is a statement of the form “There exists a number with some property”, and (3) that first-order logic is complete.

The Gödel encoding of “φ is provable from T” is something like “there’s some number n with the property that n encodes a proof of φ from T.” Now, suppose that φ is provable from T. Then there’s a standard natural number that encodes this proof. And since every nonstandard model contains every standard natural number, this number exists in all these models as well, and it still satisfies the encoding property there (checking that a particular standard number encodes a proof is a finite, mechanical verification). So the statement is true in all models of PA. So by completeness, it’s provable from PA.

So far we have that PA proves “φ is provable from T” if and only if φ is actually provable from T. Loosely, if a theory can prove something, then Peano arithmetic knows this. Even though ZFC is a stronger theory than Peano arithmetic, there’s nothing that ZFC can prove that PA doesn’t know it can prove. In particular, ZFC proves the consistency of PA, so PA proves that ZFC proves the consistency of PA! This of course doesn’t mean that PA proves its own consistency, because PA doesn’t trust ZFC to be true (it doesn’t accept the axioms of ZFC).

Furthermore, for every sentence that PA can prove, PA can prove that PA can prove it. In this sense, everything that PA knows is a known known. What we’ll see next is that for PA, every unknown is an unknown unknown. For Peano arithmetic, and in fact for ANY consistent theory that expresses enough arithmetic to be subject to Gödel’s theorems, there are no known unknowns.

Third result: If φ is not provable from PA, then “φ is not provable from PA” is not provable from PA.

This follows from Gödel’s second incompleteness theorem and the principle of explosion.

Now, the proof of our third result. Suppose “φ is not provable from PA” is provable from PA. By the principle of explosion, if PA were inconsistent then EVERYTHING would be provable from it. So by proving that something isn’t provable from its axioms, PA proves its own consistency! But then, by Gödel’s second incompleteness theorem, PA must be inconsistent. This contradicts our background assumption that PA is consistent.

Note that the sentence “φ is not provable from PA” isn’t the same type of sentence as “φ is provable from PA”. As we saw a moment ago, the second is a sentence of the form “there’s some number n with a particular property,” and all of these sentences are provable from PA if and only if they’re true of the standard naturals. But the first sentence is of the form “there’s no number n with a particular property”, which is not the same! It’s possible for this sentence to be true in the standard model but false in some nonstandard model. All we need is for no standard number, but at least one nonstandard number, to have that property.

So if PA is consistent, then it can never prove that it doesn’t prove something. Now, notice that all that I’ve said applies more generally than just Peano arithmetic. For any consistent mathematical theory that expresses sufficient arithmetic for Gödel’s incompleteness theorems to apply, every known is a known known and every unknown is an unknown unknown.

The subtlety of Gödel’s second incompleteness theorem

Gödel’s second incompleteness theorem is an order of magnitude more subtle than his first. It’s commonly summarized as “no consistent theory strong enough to do arithmetic can prove its own consistency.” But there’s a lot of subtlety in both the “strong enough to do arithmetic” and the “prove its own consistency.” First of all, what exactly counts as strong enough to do arithmetic? Peano arithmetic certainly does, but it has an infinite axiom schema. Do any finite theories meet the criterion of “strong enough to do arithmetic”? It turns out that the answer to this is yes! Robinson arithmetic, which is what you get if you remove the infinite axiom schema of induction from PA, meets the requirement.

There are weaker theories of arithmetic, like Presburger arithmetic (which has only addition) and Skolem arithmetic (only multiplication), that don’t meet the criterion, and are therefore not subject to the incompleteness theorems. And it turns out that both of these theories are actually decidable! This is even stronger than being sound and complete; soundness and completeness tell us that there’s an algorithm that determines after a finite amount of time that a given sentence is a tautology, but not necessarily that such an algorithm exists to determine that a sentence is NOT a tautology. Decidability gives us both: not only can we classify any tautology as such in a finite time, we can also classify any non-tautology as such in a finite time.

The amount of arithmetic we need is exactly the amount that Gödel uses to prove the incompleteness theorems. This has two parts: one, the theory must express enough arithmetic to do Gödel encoding (and therefore to express the notion of “provability”), and two, the theory must be able to formalize diagonalization. Dan Willard came up with theories that formalize enough arithmetic to do the first but not the second: these are theories that can talk about their own provability via Gödel coding, but are still too weak to be subject to the incompleteness theorems. Thus, these theories can actually prove their own consistency! These fascinating theories are called self-verifying theories.

Everything I’ve said so far has been about the subtlety of the notion of “strong enough to do arithmetic”. Now I want to talk about the subtlety of the notion of “proving one’s own consistency.” I’ll do this by first taking a brief interlude to talk about an argument I recently saw against the Consistency Thesis in philosophy of math that uses this notion. A friend of mine writes:

Consistency is a weak soundness property.

Here’s what I’ll call the Consistency Thesis: (Mathematical sentences are justified and/or true when they are part of a consistent theory.) What I want to do is to raise a problem for that thesis. Maintaining the Consistency Thesis leads to bizarre mathematical beliefs which don’t lend themselves to being true or justified. A merely ‘consistent’ theory can claim P(0), P(1), and so on, and yet claim that there’s some n for which ¬P(n). Such a theory is omega-inconsistent.

There are mathematical theories which despite their syntactic consistency, claim to be inconsistent. They will claim to be able to derive that 0=1 and yet never derive 0=1. One example of a consistent but unsound theory would be this: [suppose we add to ZFC the new axiom “ZFC is inconsistent”. If ZFC is consistent, then the new theory ZFC’ is consistent (by Godel’s second incompleteness theorem), even though it falsely proves that ZFC is inconsistent.]

So far, I’ve only mentioned omega-inconsistent theories. I should also mention that there are omega-consistent theories which are arithmetically unsound. A much stronger soundness property would be soundness in omega-logic, which entails consistency, omega-consistency, and arithmetical soundness.

Here’s the response I gave:

You say that an adherent to the Consistency Thesis believes in mathematical theories that are in fact consistent (no contradictions can be proved from them) but claim to be inconsistent; for instance ZFC + ¬Con(ZFC). This bit about “claiming to be inconsistent” is subtler than it might initially seem. There’s a very important difference between Con(ZFC) and “ZFC is consistent”. A first order theory can’t talk directly about its own consistency; it can only talk about properties of the objects in its models. We are allowed an indirect method to talk about consistency of theories, Gödel encoding. But this method has problems.

Gödel encoding allows us to write down statements that, if understood to be about the natural numbers, are equivalent to the assertion that a theory proves a contradiction. But this “if understood to be about the natural numbers” is a very important qualification, because no first order theory categorically defines the natural numbers (i.e. no first order theory has as its only model the natural numbers). More generally, no theory within a logic that has a sound, complete, and finitary proof system categorically describes the natural numbers (these are the only logics that a Formalist will see as well-defined, by the way).

What this means is that when we write “Con(ZFC)”, we’re actually using a short-hand for a complicated sentence about the objects in our models, and this complicated sentence is NOT equivalent to the claim that no contradiction can be proven from ZFC. Con(ZFC) could be false in a model even if ZFC is consistent, and Con(ZFC) could be true in a model even if ZFC is inconsistent, so long as that model is not the standard natural numbers.

So the adherent of the Consistency Thesis is not actually committed to weird beliefs about a theory being consistent and claiming its own inconsistency; they are just committed to the belief that the natural numbers are not well-defined.

The same objection applies to the claim that they have to accept as valid theories that claim P(0), P(1), and so on, but also that there’s some n for which ¬P(n). That’s true, and that’s fine! One can just say that such theories are not theories of the standard natural numbers; the n for which ¬P(n) is some other type of mathematical object that is not a natural number.

A TL;DR for my response: “Con(T)” only means “T is consistent” if T is about the natural numbers. Furthermore, the theories that assert their own inconsistency never have the natural numbers as a model. So it’s ultimately not very weird that these theories assert “¬Con(T)”… this statement doesn’t actually mean “T is inconsistent” in any of the models of the theory!

Undecidability results on lambda diagrams

For background on how lambda diagrams work, look HERE.

The equivalence of lambda calculus with Turing machine computations, and the logical limitations on computation, combine to give us interesting limitative results on lambda diagrams.

Reducibility, Absolute and Contingent

We begin by defining a notion of reducibility. A lambda diagram is reduced if there is no beta reduction that can be applied to the diagram. Here are some examples of reduced diagrams:

And here are some examples of non-reduced diagrams:

For each of these diagrams, if you keep applying beta reductions in any order you will eventually get to a point where the diagram is reduced. But this is not the case for all lambda diagrams. Check out the following diagram for M M (the Mockingbird function applied to itself):

Applying beta reduction gives you the following:

And we’re back where we started! Since the only applicable beta reduction hands us back the very same diagram, no sequence of reductions will ever terminate: the diagram is irreducible.
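If it helps to see this concretely, here is a hedged Python sketch (terms rather than diagrams; the tuple representation and function names are my own) in which a single beta-reduction step applied to M M gives back M M itself:

```python
# Lambda terms as nested tuples: ('var', name), ('lam', name, body), ('app', fun, arg),
# with one leftmost-outermost beta-reduction step.

def substitute(term, name, value):
    """Substitute value for the free variable `name` in term (value is closed here,
    so no variable capture can occur)."""
    kind = term[0]
    if kind == 'var':
        return value if term[1] == name else term
    if kind == 'lam':
        if term[1] == name:
            return term                   # the binder shadows `name`
        return ('lam', term[1], substitute(term[2], name, value))
    return ('app', substitute(term[1], name, value), substitute(term[2], name, value))

def beta_step(term):
    """Perform one leftmost-outermost beta reduction; return None if term is reduced."""
    kind = term[0]
    if kind == 'app':
        fun, arg = term[1], term[2]
        if fun[0] == 'lam':
            return substitute(fun[2], fun[1], arg)
        stepped = beta_step(fun)
        if stepped is not None:
            return ('app', stepped, arg)
        stepped = beta_step(arg)
        if stepped is not None:
            return ('app', fun, stepped)
        return None
    if kind == 'lam':
        stepped = beta_step(term[2])
        return ('lam', term[1], stepped) if stepped is not None else None
    return None                           # a lone variable is already reduced

# The Mockingbird M = λx.x x, applied to itself.
M = ('lam', 'x', ('app', ('var', 'x'), ('var', 'x')))
MM = ('app', M, M)
print(beta_step(MM) == MM)                # True: one step gives back the same term
```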

Sometimes there are multiple beta reductions that can be applied. In these cases, it might be that one sequence of beta reductions reduces the diagram, but another sequence goes forever without reducing it. For instance, here is the diagram for False (M M):

If we first feed (M M) as input to False, then we get a reduced diagram:

But if we first feed M to M, then what we get back is the same thing we started with:

If we keep feeding M to M, we keep getting back the same diagram and never fully reduce. When a diagram reduces on some series of beta reductions and never reduces on others, we call the diagram contingently reducible. If every strategy of beta reduction eventually reduces the diagram, we call it absolutely reducible. And if no strategy reduces the diagram, we call it absolutely irreducible.

Some interesting observations:

It’s possible for two diagrams A and B to both be absolutely reducible, but for A B to be absolutely irreducible.

It’s also possible for a diagram A to be absolutely irreducible, but the diagram A B to be contingently reducible (though never absolutely reducible).

It’s not, however, possible for A to be absolutely irreducible but A B to be absolutely reducible. Why? Simply because you can choose your reduction strategy for A B to be any of the reduction strategies for A, each of which always fails. In general, if any sub-diagram within a lambda diagram is contingently reducible, then the whole diagram cannot be absolutely reducible. Why? Simply because you can choose your reduction strategy for the whole diagram to be a reduction strategy that fails for the sub-diagram (at least one such strategy must exist, by our assumption that the sub-diagram is only contingently reducible). This same reduction strategy that fails for the sub-diagram, also fails for the whole diagram. So the whole diagram cannot be absolutely reducible.

Here’s a question that I don’t know the answer to: Is it possible for an application of an irreducible diagram to an irreducible diagram to be reducible? I suspect not.

The distinction between contingent and absolute reducibility goes away the moment you fix a general reduction strategy that for any diagram gives a unique prescription for which reduction should be applied (if any are possible). Henceforth, I will assume that a particular reduction strategy has been fixed, and so will only speak of reducibility and irreducibility, no absolutes and contingents.

Reducibility Oracles

It might be convenient if we could always know whether a given diagram reduces or not. We might even wonder if there’s a lambda diagram that “computes” the answer to this question. Let’s suppose there is such a diagram, which we’ll call a “reducibility oracle.” It’s a black box, so we’ll denote its diagram as a box with an R:

We’ll define the behavior of the reducibility oracle as follows:

We’ll now design a larger diagram R’ that uses R:

Let’s see how R’ behaves when given an input (call it F).

Notice that if F F is reducible, then R’ F is irreducible. And if F F is irreducible, then R’ F is reducible.

So what happens if we feed R’ to itself? Well, by the logic of the above paragraph, if R’ R’ is reducible, then R’ R’ is irreducible. And if R’ R’ is irreducible, then R’ R’ is reducible. We have obtained a contradiction! And so no such reducibility oracle diagram can exist.

But wait, there’s more!

Let’s define S(N) to be the size of the smallest lambda diagram that reduces to the Church numeral N, where size means the number of lines required to build the diagram.

There’s a tight correspondence between lambda calculus and programming languages. In fact, lambda calculus can be thought of as just one specific highly abstract functional programming language (along the lines of Haskell and Lisp). Recall now that the Kolmogorov complexity K of a string s (given a particular programming language) is defined as the length of the shortest program that outputs that string. If we choose our programming language to be lambda calculus, then we get a correspondence between K(N) and S(N). This correspondence gives us the following results:

There is no lambda diagram that serves as a “size oracle” – i.e. a lambda diagram that when fed a number N, returns S(N). (Analogous to the uncomputability of Kolmogorov complexity)

For any sound proof system F, there’s a number L such that F cannot prove, of any particular number, that the smallest lambda diagram reducing to that number has more than L lines. (Analogous to Chaitin’s incompleteness theorem.)

Sizes of lambda diagrams

Let me make some final notes on the function S(N). Every number has a “standard form”: λfx.f (f (… (f x))), with N copies of f. The lambda diagram for this looks like an upside down ladder with N steps:

This has 2N + 3 lines, so we know that for all N, S(N) ≤ 2N + 3. We can define a number as “irreducibly complex” if S(N) = 2N + 3. For instance, 0 and 1 are irreducibly complex, and I think that 2 and 3 are as well:

This raises an interesting question: What is the smallest number that isn’t irreducibly complex? Another question is whether there are arbitrarily large irreducibly complex numbers (I strongly suspect not, but am not positive).

We can find some upper bounds for S(N), where N is a product or a power, as follows:

In general, for any computable function f whatsoever, there is a lambda diagram for f, to which we can append the diagram for any number N to represent f(N). The size of this diagram will just be the size of the diagram for f, plus the size of the diagram for N, plus 1 (for the line connecting the two). This means that if a number M can be written as f(N) for some computable f, then we can produce a diagram for M whose size is some constant + 2N. Thus S(f(N)) grows at most linearly with respect to N, no matter how fast-growing f is.

The Anti-Set Program

In ZFC set theory, we specify a collection of sentences within a first-order language to count as our axioms. The models of this collection of sentences are the set-theoretic universes (and many of these models are “unintended” – pesky perversions in which the set of naturals ω is uncountably large, as one example – but we’ll put this aside for this post). Most of the axioms act as constraints that all sets must follow. For instance, the axiom of pairing says that “For any sets x and y, there must exist another set containing as elements just x and y and nothing else”. This is an axiom that begins with universal quantification over the sets of the universe, and then states some requirement that must hold of all these sets.

The anti-set program is what you get when you take each of these restriction axioms, and negate the restriction it imposes on all sets. So, for instance, the axiom of anti-pairing says that for any sets x and y, there must NOT exist any set {x, y}. Contrast this with the simple negation of pairing, which would tell us only that there exist two sets x and y such that their pair doesn’t exist. The anti-axiom is much stronger than the negated axiom, in that it requires NO pairs to exist.

Not all axioms begin with universal quantifiers, in particular the axiom of infinity, which simply asserts that a set exists that satisfies a certain property. To form the axiom of anti-infinity, we simply negate the original axiom (so that no sets with that property exist).

As it turns out, the anti-set program, if applied to ALL the axioms of ZFC, ends in disaster and paradox. In particular, a contradiction can be derived from anti-comprehension, from anti-replacement, and from anti-extensionality. We don’t handle these cases the same way. Anti-comprehension and anti-replacement are simply discarded, being too difficult to patch. By contrast, anti-extensionality is replaced by ordinary extensionality. What’s up with that? The philosophical justification is simply that extensionality, being the most a priori of the bunch, is needed to justify us calling the objects in our universe sets at all.

There’s one last consideration we must address, which regards the axiom of anti-choice. I am currently uncertain as to whether adding this axiom makes the theory inconsistent. One thing that is currently known about the axiom of anti-choice is that with its addition, out go all finite models (there’s a really pretty proof of this that I won’t include here). In the rest of this post, I will be excluding anti-choice from the axioms and only exploring models of anti-ZF.

With that background behind us, let me list the axioms of anti-ZF.

Anti-Pairing
∀x∀y∀z∃w (w ∈ z ∧ w ≠ x ∧ w ≠ y)
No set is the pair of two others.

Anti-Union
∀x∀y∃z (z ∈ y ∧ ¬∃w (z ∈ w ∧ w ∈ x))
No set is the union of another.

Anti-Powerset
∀x∀y∃z (z ∈ y ∧ ∃w (w ∈ z ∧ w ∉ x))
No set contains all existing subsets of another.

Anti-Foundation
∀x∀y (y ∈ x → ∃z (z ∈ x ∧ z ∈ y))
Every set’s members have at least one element in common with it.

Anti-Infinity
∀x∃y ((y is empty ∧ y ∉ x) ∨ (y ∈ x ∧ ∃z (z = S(y) ∧ z ∉ x)))
Every set either doesn’t contain all empty sets, or has an element whose successor is outside the set.

If the axioms of ZF are considered to be a maximally nice setting for mathematics, then perhaps the axioms of anti-ZF can be considered to be maximally bad for mathematics.

We need to now address some issues involving the axiom of anti-infinity. First, the abbreviations used: “y is empty” is shorthand for “∀z (z ∉ y)” and “z = S(y)” is shorthand for “∀w (w ∈ z ↔ (w ∈ y ∨ w = y))” (i.e. z = y ⋃ {y}).

Second, it turns out that from the other axioms we can prove that there are no empty sets. So the first part of the disjunction is always false, meaning that the second part must always be true. Thus we can simplify the axiom of anti-infinity to the following statement, which is logically equivalent in the context of the other axioms:

Anti-Infinity
∀x∃y (y ∈ x ∧ ∃z (z = S(y) ∧ z ∉ x))
Every set has an element whose successor is outside the set.

Let’s now prove some elementary consequences of these axioms.

Theorems

>> There are no empty sets.
Anti-Union ⊢ ¬∃x∀y (y ∉ x)
Suppose ∃x∀y (y ∉ x). Call this set ∅. So ∀y (y ∉ ∅). Then ⋃∅ = ∅, so ∅ is the union of another set (namely itself). This contradicts anti-union.

>> There are no one-element sets.
Anti-Pairing ⊢ ¬∃x∃y∀z (z ∈ x ↔ z = y)
Suppose that ∃x∃y∀z (z ∈ x ↔ z = y). Then ∃x∃y∀z (z ∈ x ↔ (z = y ∨ z = y)). But this is a violation of anti-pairing, as then x would be the pair of y and y. Contradiction.

>> There are no two-element sets.
Anti-Pairing ⊢ ¬∃x∃y∃z∀w (w ∈ x ↔ (w = y ∨ w = z))
Suppose that ∃x∃y∃z∀w (w ∈ x ↔ (w = y ∨ w = z)). Then x is the pair of y and z, so anti-pairing is violated. Contradiction.

>> No models of anti-ZF have just N-element sets (for any finite N).
Suppose that a model of anti-ZF had only N-element sets. Take any of these sets and call it X. By anti-infinity, X must contain an element with a successor that is outside the set. Call this element Y and its successor S(Y). Y cannot be its own successor, as then its successor would be inside the set. This means that Y ∉ Y. Also, Y is an N-element set by assumption. But since Y ≠ S(Y), S(Y) must contain all the elements of Y in addition to Y itself. So S(Y) contains N+1 elements. Contradiction.

>> There is no set of all sets.
Suppose that there is such a set, and call it X. By Anti-Infinity, X must contain an element Y with a successor that’s outside of X. But no set is outside of X, by assumption! Contradiction.

>> No N-element model of anti-ZF can have N-1 sets that each contain N-1 elements.
Suppose that this were true. Consider any of these sets that contain N-1 elements, and call it X. By the same argument made two above, this set must contain an element whose successor contains N elements. But there are only N sets in the model, so this is a set of all sets! But we already know that no such set can exist. Contradiction.

>> Every finite set of disjoint sets must be its own choice function.
By a choice function for a set X, I mean a set that contains exactly one element in common with each element of X. Suppose that X is a finite set of disjoint sets. Let’s give the elements of X names: A1, A2, A3, …, AN. By anti-foundation, each An must contain at least one element of X. Since the Ans are disjoint, these elements cannot be the same. Also, since there are only N elements of X, no An can contain more than one element of X. So each element of X contains exactly one element of X. Thus X is a choice function for X!

The next results come from a program I wrote that finds models of anti-ZF of a given size.
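That program isn’t reproduced here, but here is a hedged sketch of what such a brute-force search might look like, checking extensionality plus the five anti-axioms exactly as written above. It counts labeled models rather than isomorphism classes, and is only feasible for very small sizes:

```python
# A hedged sketch (not the author's actual program): search for models of anti-ZF
# of a given finite size. The universe is {0, ..., n-1}; a candidate model assigns
# each element x a frozenset of members mem[x].
from itertools import product

def powerset(xs):
    out = [frozenset()]
    for x in xs:
        out += [s | {x} for s in out]
    return out

def is_model(mem, univ):
    # Extensionality: distinct elements must have distinct member-sets.
    if len({mem[x] for x in univ}) < len(univ):
        return False
    # Anti-Pairing: for all x, y, z there is some w ∈ z with w ≠ x and w ≠ y.
    for x, y, z in product(univ, repeat=3):
        if not any(w != x and w != y for w in mem[z]):
            return False
    # Anti-Union: for all x, y there is some z ∈ y belonging to no member of x.
    for x, y in product(univ, repeat=2):
        if not any(all(z not in mem[w] for w in mem[x]) for z in mem[y]):
            return False
    # Anti-Powerset: for all x, y there is some z ∈ y that is not a subset of x.
    for x, y in product(univ, repeat=2):
        if not any(not mem[z] <= mem[x] for z in mem[y]):
            return False
    # Anti-Foundation: every member y of x shares an element with x.
    for x in univ:
        if any(not (mem[x] & mem[y]) for y in mem[x]):
            return False
    # Anti-Infinity (simplified form): every x has a member y whose successor
    # y ∪ {y} is realized by some element z of the universe with z ∉ x.
    for x in univ:
        if not any(any(mem[z] == mem[y] | {y} and z not in mem[x] for z in univ)
                   for y in mem[x]):
            return False
    return True

def count_models(n):
    univ = list(range(n))
    subsets = powerset(univ)
    # (2^n)^n candidate membership relations: only feasible for very small n.
    return sum(is_model(dict(zip(univ, choice)), univ)
               for choice in product(subsets, repeat=n))

for n in range(1, 5):
    print(n, count_models(n))   # expect 0 models for sizes 1 through 4
```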

>> There are no models of size one, two, three or four.

>> There are exactly two non-isomorphic models of size five.

Here are pictures of the two:

Now, some conjectures! I’m pretty sure of each of these, especially Conjecture 2, but haven’t been able to prove them.

Conjecture 1: There’s always a set that contains itself.

Conjecture 2: There can be no sets of disjoint sets.

Conjecture 3: In an N-element model, there are never fewer than three sets with fewer than N-1 elements.

Pictures of some extremely large numbers

(Note: in here are spoilers for my puzzle a couple days ago. Also, familiarity with both lambda calculus and lambda diagrams is assumed; here’s a link for the latter.)

In lambda calculus, numbers are sort of like adverbs; they tell you how many times to do something. So “2” is like “twice”; it’s a function that takes in an f and an x and returns f of f of x, written in lambda calculus as f (f x). “3” is like “thrice”; 3 f x = f (f (f x)).

One of the reasons that this turns out to be a really nice definition of numbers is that it makes arithmetic very simple, especially with large numbers. For instance, the operation of taking a number to the power of another number can be defined extremely simply as follows:

↑ = λnm.m n

In other words, we do an m-fold application of the operation “do an n-fold application.”

Once we’ve defined exponentiation, we can pretty easily get tetration too. n ↑↑ m is defined to be n ↑ (n ↑ (… ↑ n)), with m copies of n. But this is just m-fold application of the operation (n ↑) to 1. So we can write:

↑↑ = λnm.m (↑ n) 1

Similarly, for pentation (or hyper-5), we can write:

↑↑↑ = λnm.m (↑↑ n) 1

And in general, for any N, we can write:

↑N+1 = λnm.m (↑N n) 1

Together with our definition of ↑, this gives us a nice construction of all the hyperoperations:

↑ = λnm.m n
↑N+1 = λnm.m (↑N n) 1

There’s a problem though. If we actually tried to write out something like ↑100 just in terms of the basic syntax of the lambda calculus, our expression would end up extremely long; we’d have to write something like ↑99, ↑98, ↑97, and so on. We can do better than this by “automating” the process of going from ↑N to ↑N+1 (after all, we’re doing the same thing each time!). So we define a function that implements this process, arbitrarily choosing to name it g:

g = λfnm.m (f n) 1

Some examples of how g behaves:

↑↑ = g ↑
↑↑↑ = g ↑↑ = g (g ↑)
↑↑↑↑ = g ↑↑↑ = g (g (g ↑))

You might already see where this is going. Now to get to ↑N+1, we just need to do an N-fold application of g to ↑! So we can write:

↑N+1 = N g ↑

Now writing out something massive like (3 ↑257 3) merely requires us writing out the functions g, ↑, 3, and 256, each of which is pretty simple. And the end result is much more concise than what we would have gotten using our previous construction.
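To check the construction on tiny inputs, here is a hedged Python sketch using Church numerals (the helper names church, to_int, UP, and arrow are my own); the genuinely large values in this post are of course far beyond anything that will actually evaluate:

```python
# Church numerals as Python lambdas, with the hyperoperation construction above.

def church(n):
    """The Church numeral for n: a function applying f to x, n times."""
    return lambda f: (lambda x: x if n == 0 else f(church(n - 1)(f)(x)))

def to_int(c):
    return c(lambda k: k + 1)(0)

ONE = church(1)
UP = lambda n: lambda m: m(n)                     # ↑ = λnm.m n   (n ↑ m = n^m)
g = lambda f: lambda n: lambda m: m(f(n))(ONE)    # g = λfnm.m (f n) 1

def arrow(N):
    """↑N, built as (N-1)-fold application of g to ↑ (from ↑N+1 = N g ↑)."""
    return church(N - 1)(g)(UP)

two, three = church(2), church(3)
print(to_int(arrow(1)(two)(three)))   # 2 ↑ 3   = 8
print(to_int(arrow(2)(two)(three)))   # 2 ↑↑ 3  = 16
print(to_int(arrow(3)(two)(two)))     # 2 ↑↑↑ 2 = 4
```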

Here are the lambda diagrams for what we have so far:

There’s a bit of redundancy here that can be removed, shrinking the diagrams:

But suppose now we want to go a step further. Let’s define the following extremely fast-growing sequence of numbers:

a0 = 1
a1 = 3 ↑(a0 + 1) 3
a2 = 3 ↑(a1 + 1) 3

Some of you may recognize this sequence as closely resembling the usual construction of Graham’s number. It’s not quite the same as this construction, but Graham’s number is approximately a66. This sequence can be encoded into lambda calculus using the same trick as before: define a new function which takes us from aN to aN+1, and then apply it n-fold to a0. Let’s call this function h.

h = λa.a g ↑ 3 3

Verify for yourself that h takes us from one number in our sequence to the next!

a1 = h a0
a2 = h a1 = h (h a0) = 2 h a0
a3 = h a2 = h (h a1) = h (h (h a0)) = 3 h a0

We can construct a general function now that takes in a number n and returns an:

a = λn.n h a0

We can do a little better than this by removing some redundancy. Notice the repeated 3s in the definition of h; we can build this repetition into our definition of g and get an even more concise representation of massive numbers.

g’ = λfn.n (f n) n
h’ = λa.a g’ ↑ 3
a’ = λn.n h’ 1

Here’s what these new and improved functions look like:

And finally, here’s an image of a256, much much larger than Graham’s number, to hang on your wall:

Keep in mind that Graham’s number is really really big. From wiki: “it is so large that the observable universe is far too small to contain an ordinary digital representation of Graham’s number, assuming that each digit occupies one Planck volume, possibly the smallest measurable space. But even the number of digits in this digital representation of Graham’s number would itself be a number so large that its digital representation cannot be represented in the observable universe. Nor even can the number of digits of that number—and so forth, for a number of times far exceeding the total number of Planck volumes in the observable universe.”

It’s pretty wild to think that if you start applying beta reductions to this diagram, the lambda diagram you’ll build up (the normal form of the expression) will be so large that it could not be represented within the entire observable universe!

How to draw lambda diagrams

If you don’t want spoilers for my puzzle a few days ago, don’t read ahead!

I think lambda diagrams are extremely cool, and haven’t seen any detailed description on how they work online. I’ll start by showing some very simple examples of lambda diagrams, and then build up to more complicated ones.

First of all, what are lambda diagrams? They are pictorial representations of lambda expressions, and hence count as a pictorial system for a large portion of mathematics. I will assume that you understand how lambda calculus works for this post, and if you aren’t familiar then you can do no better than this video for a basic introduction.

The Basics

Let’s start simple: the lambda expression for “True”: λx.λy.x

Each time we see a new bound variable, we draw a horizontal line to represent that variable. So parsing from left to right, we start with a horizontal line for x.

Next is “λy.“, so we insert another horizontal line for y (below the first horizontal line).

Next we have an x in the function body, which we represent by a vertical line leaving the x line:

And there we have it, the lambda diagram for True!

Now let’s look at False: λx.λy.y

We start the same way as before, with two horizontal lines for our variables x and y:

But now we have a y in our function body, not an x. So we draw our vertical line leaving the y line instead of the x line:

And that’s the diagram for False! The actual diagram needn’t have the labels “x” and “y” in it, because our convention that we draw lines for new variables below existing horizontal lines uniquely specifies which lines stand for which variables.

Let’s now do a slightly more complicated function: λx.λy.y x

As before, we start with our two variables, x and y.

Now we have a y in the function body, so we add a vertical line leaving the y bar:

Next is an x, which is being fed as input to the y. We draw this as follows:

Two things have happened here: First I’ve moved the original vertical line for y over to the left to make space. Next, I’ve represented “feeding x to y as input” as a line leaving the x bar and terminating at the vertical line for y.

Take a guess what λx.λy.x y would look like!

Here it is:

Next, let’s look at λx.λy.x x y (which is incidentally the “or” function).

We start with the introduction of the variables x and y.

Next, an x:

And another x:

And finally a y:

Notice that this y connects below the first connection, to the original branch. What would it mean if it were connected to the second branch?

As you can see, this would indicate that y is first fed to x, and then the result of THAT is fed to x.

With these principles, you should now be able to draw out diagrams for much more complicated lambda expressions. Try this one: λx.λy.λz.z (y x) (y z x) x

Here it is!

Make sure that this makes sense before moving on!

Lambda Expressions within Lambda Expressions

Next, we’ll look at how to deal with lambda expressions that contain new lambda expressions within their function body. Here’s a simple example: λx.λy.x (λz.z) y

Everything up until the third λ is the same as always:

Now, to deal with our new variable z, we draw a new horizontal line. But importantly, this horizontal line comes after the vertical line for the x that has already been used!

After the “λz.” we have a “z“, so we draw a line from the z bar to our original vertical line leaving x.

And finally, after all of this we feed y to the original vertical line:

Try this one: λx.λy.x (λz.z z) (λw.λv.v (w w)) y

And here’s another one, where the inside function uses outside variables: λx.λy.x (λz.y z)

Alright, the final category you need to know to understand how to draw any lambda diagram whatsoever is…

Function Application

Suppose we have the lambda expression (λx.λy.x) (λz.z z). We first draw the lambda diagrams for each of the two component expressions side by side:

And now we simply attach the second to the first, indicating that the entire second lambda expression is fed as input to the first!

Here’s another example for you to try: (λx.x) (λy.λz.z z y) (λw.λv.v (v w))

And one more example, this one using all the tricks we’ve seen so far: (λx.x (λy.λz.z y y) (λw.λv.w (v x))) (λu.u u)

Beta Reduction

The final thing to know about lambda diagrams is how to actually do computations with them. The basic idea is that when we have one lambda diagram fed as input to another, we can substitute the “input” diagram for each vertical line leaving the topmost horizontal bar, and then erase this bar. Let’s look at some examples:

You can also do function application within larger lambda diagrams. Take a look at the following example, which is a calculation that shows that the successor of zero is one:

The first beta reduction here is just like the previous ones we’ve seen. But the second one does a substitution within the main lambda expression, as does the third. This works in much the same way as the earlier reductions we saw, the primary difference being that references to variables outside the scope of the function being applied must be maintained. You can see this in the final step above, where we remove the line representing the variable y, and attach the line connecting to it to the line representing the variable x.

Now for the fun part! I’ve found some great animations of calculations using lambda diagrams on Youtube. Each of them is using just the rules I’ve described above, nothing more! And I must say that the music is delightful. Take a look!

Beautiful!

Fundamentals of Logic: Syntax, Semantics, and Proof

A while back, I wrote a several-part introduction to mathematical logic. I wrote this when I was first wading into the subject and noticed that a lot of the stuff I was learning was not easy to find online in a concise beginner-level essay. Since then, I’ve learned an enormous amount more and become a lot more comfortable with the basics of the field. Looking back on my introduction series, I’m pretty happy with it. There’s nothing in there that’s incorrect. But I do notice that when I read it over, I wish that I had framed certain things slightly differently. In particular, one of the distinctions that I think is absolutely central to mathematical logic – the distinction between syntax, semantics, and proof – is muddled a bit in the write-up, and proof is bundled in with syntax in a way that I’m not totally satisfied with.

Given that, and given that this series is for whatever reason now one of the most popular reads on my blog, I want to provide a brief update post. I sort of did this a week or so ago, with this post, which linked to the slides for a presentation I gave for a discussion group. But I want to do it for REAL now, as an actual post of its own. And honestly, I also just love this subject and am happy for any excuse to write about it more.

So, let me begin by laying out what seems to me to be the primary trichotomy that you’ll see pop up over and over again in logic: syntax, semantics, and proof. One at a time, then.

Syntax

The syntax of a logic is a detailing of the allowed symbols of a language and the conditions for their grammatical use. It’s a set of characters, along with a function that defines the grammar of the language by taking in (finite, in most logics) strings of these characters and returning true or false. This function is usually inductively defined. So for instance in propositional logic we have some set of characters designated to be our propositional variables (we’ll denote them p, q, r, …), and another set of characters designated to be our logical symbols (∧, ∨, ¬, →, and parentheses). We inductively define our grammar function G as follows: Every propositional variable is grammatical. If X and Y are grammatical strings, then so are (¬X), (X ∧ Y), (X ∨ Y), and (X → Y). And no other string is grammatical. This definition is what allows us to say that (p ∧ (¬(q ∨ r))) is grammatical but (p ∧ (¬(q ∨ ¬))) is not. The grammar for propositional logic is especially nice and simple, and first-order logic is only mildly more complicated. The important thing is that the syntax for these logics as well as higher order logics is algorithmically checkable; it’s possible to write a simple program that verifies whether any given input string counts as grammatical or not.
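To make the “algorithmically checkable” point concrete, here is a hedged Python sketch of the grammar function G for this propositional language, assuming single-letter propositional variables and fully parenthesized formulas (spaces around connectives are tolerated; the variable stock is my own choice):

```python
# A sketch of the grammar function G for the propositional syntax described above.
VARIABLES = set("pqrstuvw")        # assumed stock of propositional variables
BINARY = {"∧", "∨", "→"}

def grammatical(s):
    """Return True iff s is a grammatical string of the language."""
    if s in VARIABLES:
        return True
    if len(s) >= 4 and s[0] == "(" and s[-1] == ")":
        inner = s[1:-1]
        if inner.startswith("¬"):          # the (¬X) case
            return grammatical(inner[1:])
        depth = 0                          # find the binary connective at depth 0
        for i, ch in enumerate(inner):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch in BINARY and depth == 0:
                return (grammatical(inner[:i].strip())
                        and grammatical(inner[i + 1:].strip()))
    return False

print(grammatical("(p ∧ (¬(q ∨ r)))"))    # True
print(grammatical("(p ∧ (¬(q ∨ ¬)))"))    # False
```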

Semantics

The semantics of a logic is a system for assigning some sort of meaning to the grammatical strings of the language. There are different types of meanings that might be assigned, but the primary one for classical logic is truth and falsity. For propositional logic, our semantics is a set of truth functions, which are functions that take in grammatical strings and return either true or false (you’ve encountered these before if you’ve seen truth tables; each row in a truth table corresponds to a particular truth function). Not just any such function will do; these functions will have to satisfy certain constraints, such as that whatever a truth function assigns to a string X, it must assign the opposite value to the string (¬X), and that (A ∧ B) is assigned true if and only if both A and B are assigned true. These constraints on our functions are really what endow our strings with meaning; they induce a sort of structure on the truth values of strings that lines up with our intended interpretation of them.

For propositional logic, we give an inductive definition for our collection of valid truth functions. We construct a truth function by first doing any assignment of truth values to the propositional variables (p, q, r, and so on), and then defining what the function does to the rest of the strings in terms of these truth values. So for any strings X and Y, the truth function assigns true to (¬X) if and only if it assigns false to X. It assigns true to (X ∧ Y) if and only if it assigns true to X and true to Y. And so on. By our construction of the grammar of propositional logic, we’ve guaranteed that our function is defined on all grammatical strings. Of course, there are many such truth functions (one for every way of assigning truth values to the propositional variables) – and this collection of truth functions is what defines our semantics for propositional logic.

First order logic is again more complicated than propositional logic in its semantics, but not hugely so. When talking about the semantics of first-order logic, we call its truth functions models (or structures), and each model comes with a universe of objects that the quantifiers of the logic range over. An important difference between the semantics of propositional logic and the semantics of first order logic is that in the former, we can construct an algorithm that directly searches through the possible truth functions. In the latter, getting a similar sort of direct access to the models is much more difficult. Consider, for instance, that some models will have universes with infinitely many elements, meaning that to verify the truth of a universally quantified statement requires verifying it for infinitely many elements.

Last thing I need to say about semantics before moving on to proof: there is the all-important concept of semantic entailment. A set of sentences (call it A) is said to semantically entail another sentence (call it X), if every truth function that assigns everything in A true also assigns X true. When this is the case, we write: A ⊨ X.

Examples
{(p ∧ q)} ⊨ p
{(p ∨ q), (¬q)} ⊨ p
{p, (p→q), (q→r)} ⊨ r
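Here is a hedged Python sketch of this semantics in action (formulas as nested tuples and truth functions as assignments to the variables, a representation of my own choosing), brute-forcing the semantic entailment check for the examples above:

```python
# Semantic entailment for propositional logic by exhaustive search over truth functions.
from itertools import product

def evaluate(formula, assignment):
    if isinstance(formula, str):                      # propositional variable
        return assignment[formula]
    op = formula[0]
    if op == "¬":
        return not evaluate(formula[1], assignment)
    left, right = evaluate(formula[1], assignment), evaluate(formula[2], assignment)
    if op == "∧":
        return left and right
    if op == "∨":
        return left or right
    if op == "→":
        return (not left) or right

def variables(formula):
    if isinstance(formula, str):
        return {formula}
    return set().union(*(variables(sub) for sub in formula[1:]))

def entails(premises, conclusion):
    """Return True iff premises ⊨ conclusion."""
    vs = sorted(set().union(*(variables(f) for f in premises + [conclusion])))
    for values in product([True, False], repeat=len(vs)):
        assignment = dict(zip(vs, values))
        if all(evaluate(p, assignment) for p in premises):
            if not evaluate(conclusion, assignment):
                return False
    return True

print(entails([("∨", "p", "q"), ("¬", "q")], "p"))             # True
print(entails(["p", ("→", "p", "q"), ("→", "q", "r")], "r"))   # True
```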

Proof

Notice that we defined our semantics using complicated words like “functions” and “inductive definition”. More troubling, perhaps, we seem to have engaged in a little bit of circularity when we said in our definition of the semantics that (A ∧ B) is assigned true only when A and B are both assigned true. Suppose that some pesky philosopher were to come up to us and ask us what we meant by “only when A and B are both assigned true”. What would we have to say? We couldn’t refer to our formal definition of the semantics of ∧ to ground our meaning; as this definition uses the concept of “and” in it explicitly! The circularity is even more embarrassingly vivid when in first order logic we’re asked to define the semantics of “∀”, and we say “∀x φ(x) is true whenever φ(x) is true of every object in our model”. In other words, there’s a certain sense in which our semantics is a little hand-wavy and non-rigorous. It relies on the reader already having an understanding of mathematical concepts that are usually much more advanced than the ones being defined, and indeed implicitly relies on the reader’s understanding of the ones being defined in the first place!

With this criticism of semantics fresh in our heads, we might want to ask for a purely mechanical system that would take in some starting strings and churn out new strings that follow from them according to the semantics. Such a system wouldn’t rely on the user’s understanding of any higher-order concepts; instead a mere understanding of the concept of string manipulations would suffice. That is, we want to rigorously define such a procedure so that the results of the procedure mirror the structure that we know the semantics has, but without relying on any high-level concepts or circularity. We can instead just point to the mechanical process by which strings of these symbols are manipulated and be content with that. A proof system is just such a mechanical system.

There are multiple styles of proof systems, varying primarily with respect to how many inference rules and axioms they have. On the side of many axioms and few inference rules we have Hilbert-style systems. And on the side of many inference rules and few (sometimes zero) axioms we have Gentzen-style natural deduction systems. But hold on, what are these “axioms” and “inference rules” I’m suddenly bringing up? Axioms are simply strings that can be used in a proof at any time whatsoever. Inference rules are functions that take in some set of strings and produce new ones. A proof is just an ordered list of strings, where each is either an axiom or a result of applying some inference rule to earlier strings.

For propositional logic, one common proof system involves just three axioms and one inference rule. The three: X → (Y → X), (X → (Y → Z)) → ((X → Y) → (X → Z)), and ((¬Y) → (¬X)) → (X → Y). The inference rule is the famous modus ponens, which takes the form of a function that takes as input X and X → Y, and produces as output Y.

(The discerning amongst you might notice that the “three axioms” are actually three infinite axiom schemas; for each you can populate X, Y, and Z with any three grammatical strings you like and you get a valid instance of the axioms. The even more discerning might notice that these axioms only involve the → and ¬ symbols, but we seem to be missing ∧ and ∨. That’s correct; we neglect them for economy of our proof system and are allowed to do so because all sentences with ∧ and ∨ can be translated into sentences using just → and ¬. To expand our proof system to deal with ∧ and ∨, we simply add a few more axioms to incorporate these translations.)
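And here is a hedged Python sketch of a checker for this proof system (the formula representation and helper names are mine): a proof is a list of formulas, each of which must be an assumption, an instance of one of the three axiom schemas, or follow from earlier lines by modus ponens. The demo verifies the classic five-line derivation of p → p:

```python
# A Hilbert-style proof checker for the three schemas and modus ponens described above.
# Formulas are nested tuples like ("→", "p", ("¬", "q")); schema variables are "X", "Y", "Z".

AXIOM_SCHEMAS = [
    ("→", "X", ("→", "Y", "X")),
    ("→", ("→", "X", ("→", "Y", "Z")),
          ("→", ("→", "X", "Y"), ("→", "X", "Z"))),
    ("→", ("→", ("¬", "Y"), ("¬", "X")), ("→", "X", "Y")),
]

def match(schema, formula, bindings):
    """Match formula against schema, extending bindings; return None on failure."""
    if isinstance(schema, str) and schema.isupper():       # schema variable
        if schema in bindings and bindings[schema] != formula:
            return None
        return {**bindings, schema: formula}
    if isinstance(schema, str) or isinstance(formula, str):
        return bindings if schema == formula else None
    if len(schema) != len(formula) or schema[0] != formula[0]:
        return None
    for s, f in zip(schema[1:], formula[1:]):
        bindings = match(s, f, bindings)
        if bindings is None:
            return None
    return bindings

def is_axiom(formula):
    return any(match(schema, formula, {}) is not None for schema in AXIOM_SCHEMAS)

def check_proof(assumptions, lines):
    """True iff every line is an assumption, an axiom instance, or follows by modus ponens."""
    proved = []
    for line in lines:
        by_mp = any(("→", x, line) in proved for x in proved)
        if not (line in assumptions or is_axiom(line) or by_mp):
            return False
        proved.append(line)
    return True

p = "p"
s1 = ("→", p, ("→", ("→", p, p), p))                           # axiom 1
s2 = ("→", s1, ("→", ("→", p, ("→", p, p)), ("→", p, p)))      # axiom 2
s3 = ("→", ("→", p, ("→", p, p)), ("→", p, p))                 # modus ponens on s1, s2
s4 = ("→", p, ("→", p, p))                                      # axiom 1
s5 = ("→", p, p)                                                # modus ponens on s4, s3
print(check_proof([], [s1, s2, s3, s4, s5]))                    # True
```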

Soundness and Completeness

I’ve said that the purpose of a proof system is to mimic the semantics of a logic. I will now say more precisely what this mimicry amounts to.

There is a close cousin to the semantic-entailment relation (⊨). This is known as the syntactic entailment relation, and is written ⊢. We say that a set of sentences A syntactically entails a sentence X if there exists some proof that uses the sentences in A as assumptions and concludes with the sentence X. So A ⊢ X can be thought of as “X can be proven, provided that we assume A.”

To say that a proof system for a logic mimics the semantics of the logic is to say that the syntactic entailments and semantic entailments line up perfectly. This has two components:

Soundness: If A ⊢ X, then A ⊨ X.
Completeness: If A ⊨ X, then A ⊢ X.

There’s one other feature of a proof system that’s totally essential if it’s to do the job we want it to do; namely, that it should be actually computable. We should be able to write an algorithm that verifies whether any given sequence of strings counts as a valid proof according to the proof system. This is known as effective enumerability.

To have a sound, complete, and effectively enumerable proof system for a logic is to know that our semantics is on solid ground; that we’re not just waving our hands around and saying things that cannot be cashed out in any concrete way, but that in fact everything we are saying can be grounded in some algorithm. There’s a general philosophy in play here, something like “if I can compute it then it’s real.” We can rest a lot more comfortably if we know that our logic is computable in this sense; that our deductive reasoning can be reduced to a set of clear and precise rules that fully capture its structure.

Perhaps predictably, our next question is whether we really do have a sound, complete, and effectively enumerable proof system for a logic. It turns out that the answer depends on which logic is being discussed. This is already perhaps a little disconcerting; we might have hoped that every logic of any use can be nailed down concretely with a proof system. But alas, it’s not the case.

With that said, there do exist sound, complete, and effectively enumerable proof systems for propositional logic. Maybe that’s not so exciting, since propositional logic is pretty limited in terms of what sort of math you can do with it. But it turns out that sound, complete, and effective proof systems ALSO exist for first-order logic! And first-order logic is where we form theories like Peano arithmetic (for natural number arithmetic) and ZFC (for set theory).

In other words, there’s a logic within which one can talk about things like numbers and sets, and the semantics of this logic can be made precise with a proof system. That’s great news! Math is on sound grounds, right? Well…

You might be skeptical right now, the phrase “incompleteness theorems” bubbling into your mind, and you’re right to be. While we can form languages in first order logic whose sentences look to be describing natural numbers (in PA) or sets (in ZFC), it turns out that none of these languages have the expressive power to actually pin down these structures uniquely. Said more carefully, there is no first-order collection of sentences which has as a unique model the natural numbers. There’s a theorem called the Löwenheim-Skolem Theorem, which is one of the most shocking results in the field of mathematical logic. It says that if a first-order theory has any infinite model, then it has infinite models of every cardinality. So if we want to describe the natural numbers, we have to also be simultaneously describing structures of every cardinality. At this point, this is getting into well-covered territory for this blog, so I’ll just direct you to these posts for more detail about how this works: here, here, and here.

So, in first order logic you have a proof system that perfectly mirrors your semantics but no theory whose semantics line up with the natural numbers. There are many ways we can strengthen our semantics to give us the strength to talk about the natural numbers categorically, the most popular of which is second-order logic. In second order logic, we are allowed to quantify not only over objects in our universe, but over sets of objects as well. This turns out to be a really big deal. In second-order logic, we can actually write down a set of sentences that uniquely pick out the natural numbers as their only model. But guess what? Second-order logic has no sound and complete proof system!

This is a pattern that holds very generally. Either we have a logic with the expressive power to talk categorically about the natural numbers, in which case we have no sound and complete proof system, OR we have a logic with a sound and complete proof system, in which case we lack the ability to talk categorically about the natural numbers. No matter how we go, we end up at a loss as to how to ground the semantics of ℕ in an algorithmic procedure. And as a result, we also end up unable to have any algorithmic procedure that enumerates all the truths about natural numbers. This is what has become known as the incompleteness phenomenon.

I’ll stop there for now! Here’s a brief summary of the main concepts I’ve discussed. A logic is defined by its syntax, semantics, and proof system. The syntax details the allowed symbols of the language and which combinations are grammatical. The semantics details the possible truth assignments to these sentences, consistent with our intended interpretation of the symbols. And the proof system provides a mechanical procedure for deriving new strings from old strings. A set of strings X semantically entails a string Y if every truth assignment that satisfies all the strings in X also satisfies Y. A set of strings X syntactically entails Y if there’s some proof of Y from the assumptions of X. A proof system is said to be sound and complete if its syntactic entailment relation perfectly mirrors the semantic entailment relation. And it’s said to be effectively enumerable if there’s an algorithm for determining whether a given sequence of strings counts as a proof or not. First-order logic has a sound, complete, and effectively enumerable proof system. But it lacks the expressive power to talk categorically about the natural numbers. No set of sentences in a first-order language has the natural numbers as its unique model (up to isomorphism). Second-order logic has the expressive power to talk categorically about the natural numbers, but it has no sound and complete proof system. In general, no logic has both (1) a sound, complete, and effectively enumerable proof system, and (2) theories that are categorical for the natural numbers.
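To make the notion of semantic entailment concrete in the tamest setting: for propositional logic with finitely many sentence letters, you can check entailment directly by enumerating every truth assignment. Here’s a minimal Python sketch of that brute-force check; the tuple encoding of formulas and the function names are my own, purely for illustration.

from itertools import product

# Formulas are nested tuples: ("var", "p"), ("not", f), ("and", f, g),
# ("or", f, g), ("implies", f, g). This encoding is made up for the example.

def variables(formula):
    """Collect the sentence letters appearing in a formula."""
    if formula[0] == "var":
        return {formula[1]}
    return set().union(*(variables(sub) for sub in formula[1:]))

def evaluate(formula, assignment):
    """Truth value of a formula under an assignment of letters to booleans."""
    op = formula[0]
    if op == "var":
        return assignment[formula[1]]
    if op == "not":
        return not evaluate(formula[1], assignment)
    if op == "and":
        return evaluate(formula[1], assignment) and evaluate(formula[2], assignment)
    if op == "or":
        return evaluate(formula[1], assignment) or evaluate(formula[2], assignment)
    if op == "implies":
        return (not evaluate(formula[1], assignment)) or evaluate(formula[2], assignment)
    raise ValueError("unknown connective: " + op)

def entails(premises, conclusion):
    """True iff every assignment satisfying all premises also satisfies the conclusion."""
    letters = sorted(set().union(variables(conclusion), *(variables(p) for p in premises)))
    for values in product([True, False], repeat=len(letters)):
        assignment = dict(zip(letters, values))
        if all(evaluate(p, assignment) for p in premises) and not evaluate(conclusion, assignment):
            return False
    return True

p, q, r = ("var", "p"), ("var", "q"), ("var", "r")
print(entails([p, ("implies", p, q)], q))                  # True: modus ponens is semantically valid
print(entails([p, ("implies", p, q)], ("implies", q, r)))  # False: q -> r doesn't follow

Soundness and completeness for a propositional proof system amount to saying that derivability coincides with what this exhaustive check reports. For first-order logic there is no analogous finite search over models, which is why a first-order proof system is only guaranteed to be effectively enumerable rather than an outright decision procedure.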

The Pictorial Mathematics of Klak-Adbmal

In the distant land of Klak-Adbmal, cut off from the rest of civilization, mathematicians have developed their own distinctive form of math. It is entirely pictorial; every mathematical concept is translated as a diagram consisting of criss-crossing horizontal and vertical lines, and calculations are manipulations of these diagrams.

Translators have so far been able to figure out the symbols for a few basic mathematical concepts, but there are many remaining gaps in our ability to translate Klak-Adbmalist mathematics. This image shows the symbols we’ve figured out so far, as well as a step-by-step calculation of the successor of 1 equaling 2.

The Klak-Adbmalians are very secretive and slow to give up their knowledge. Can you figure out the answers to the two questions above on the right?

It’s worth noting that just as there are multiple ways that ordinary mathematicians can write a given number (2 ⋅ 5 = 8 + 2 = 10),  there are many different pictures for any given concept. If you need more assistance, you can take a look at this recently discovered drawing of a Klak-Adbmalian calculation proving that True Or False is True.

And one final hint: Recently we’ve received verification that the following two calculations are always valid, no matter what Klak-Adbmalist pictures replace the red and green boxes:

The Rise of Anti-Set Theory

This report details the discovery and proliferation of a radical new form of set theory that has come to be known in math departments across the world as Anti-Set Theory. The earliest appearance of Anti-Set Theory is shrouded in mystery, but my best efforts have traced it to the work of two little-known logicians named Narl Cowman and Bishi Ranger. The germinal notion that would eventually grow into Anti-Set Theory was the idea of a theory of sets obtained by taking all of our familiar intuitions about sets and requiring their exact opposites to hold. Any property that would hold of all sets would, of necessity, not hold of any of the objects in this theory. An attractive idea by any standard, and one self-evidently worthy of pursuit. And thus Anti-Set Theory was born.

The development of Anti-Set Theory progressed in three phases: the Era of Simple Negation (which was plagued with unattractively permissive axioms and a proliferation of models), the Age of Negation after the Universal Quantifier (where the previous problems were resolved, but new problems of consistency and paradox arose), and finally the Modern Age of Axiomatic Anti-Set Theory (where there arose the split between “revolutionary” and “traditionalist” Anti-Settists, as they came to be known). Each of these phases will be explained in more depth in what follows.

The Era of Simple Negation

The first formulations of Anti-Set Theory suffered from problems that were obvious in retrospect, but took embarrassingly long to remedy. The early Anti-Settists naively took their project to be to simply take the axioms of existing set theories and negate each of them, then to collect them into a new axiomatic theory and explore the consequences. While the idea to build off of existing set theories was a worthy one, there was a serious problem with this approach. Namely, the axioms of existing set theories were designed to be quite strong and restrictive, which made their negations overly weak and permissive. Many axioms had the form “All sets have this property”, the negation of which becomes “At least one set doesn’t have this property”, not “NO sets have this property.” The latter was the ambition of the original Anti-Settists, but it would take them until the Age of Negation After the Universal Quantifier about a half hour later to realize the crucial error they were making.

For instance, in the axioms of ZF (the orthodox axiomatization of set theory until Anti-Set Theory took the world by storm), there is an axiom known as “the Axiom of Pairing.” This axiom says, in plain English, that for any two sets x and y, there exists a set containing just x and y and nothing else (which we typically write as {x, y}). In the language of first-order logic, it said “∀x∀y∃z∀w (w∈z ↔ (w=x ∨ w=y))” (or in a more readable short-hand, “∀x∀y∃z (z = {x, y})”). Anti-Settists saw clearly the obvious benefit of having a set theory with the opposite property, namely that for any two sets x and y, the pair set {x, y} must NOT exist. But their original formulation of the Axiom of Anti-Pairing used simple negation: “¬∀x∀y∃z (z = {x, y})”, which is equivalent to “∃x∃y∀z (z ≠ {x, y})”, or “there are two sets x and y such that no set contains just x and y.” This was obviously too weak for what was desired.

The same problem cropped up for anti-union, anti-comprehension, anti-foundation, and virtually every other axiom of ZF. A new idea was needed. And soon enough, it came: we should place our negations AFTER the initial block of universal quantifiers, not before. The brilliance of this breakthrough is hard to overstate. Fields Medals were awarded and backs were patted. This brings us to the second phase of the history of Anti-Set Theory.

The Age of Negation After the Universal Quantifier

Now that the Anti-Settists knew how they were going to proceed, they got busy axiomatizing their new theory. Every set has a power set? No more, now no set has a power set. The union of each set exists? Nope! Unions are banned. Infinite sets? Absolutely not.

The first few axioms to be proposed were anti-pairing, anti-union, anti-powerset, anti-foundation, and anti-infinity, and they took the following form:

Anti-Pairing: ∀x∀y¬∃z∀w (w ∈ z ↔ (w = x ∨ w = y))
Anti-Union: ∀x¬∃y∀z (z ∈ y ↔ ∃w (w ∈ x ∧ z ∈ w))
Anti-Powerset: ∀x∀y∃z (z ⊆ y ∧ z ∉ x)
Anti-Foundation: ∀x∀y (y ∈ x → ∃z (z ∈ x ∧ z ∈ y))
Anti-Infinity: ¬∃x∀y (y ∈ x → ∀z (z = y⋃{y} → z ∈ x))

But soon they began running into trouble. The first hint that something was amiss appeared with the Axiom of Anti-Comprehension. The Axiom of Comprehension says that for any property φ definable by a formula of first-order ZF, and for any set X, there exists the subset of X consisting of those elements that satisfy φ. Anti-Comprehension could thus be written:

Anti-Comprehension: ∀x∀y (y ≠ {z∈x | φ(z)})

Perhaps you can see the problem. What if φ is a tautology? Or in other words, what if φ(z) is a property satisfied by ANY and EVERY set, like, say, z=z? Then our axiom tells us that for any set x, the subset consisting of those elements that are equal to themselves cannot exist. But that’s just x itself! In other words, for any set x, x does not exist. The Anti-Settists had unwittingly destroyed the entire universe of sets!

Another problem arose with Anti-Replacement. Recall that Replacement says that for any function F(x) definable by a formula of first-order ZF, and for any set X, the image of X under F exists. So Anti-Replacement states that this image does not exist. But what if F is the identity function? Then Anti-Replacement tells us that the image of X under the identity map (namely, X itself) does not exist. And again, we’ve proved that no sets exist.

Quite simply, the problem the Anti-Settists were running into was that while their original axioms were too weak, their new axioms were too strong! So strong that they DESTROYED THE UNIVERSE. That’s strong.

The solution? Anti-Comprehension and Anti-Replacement were discarded. But there was another problem with the Axiom of Anti-Extensionality. Extensionality says that any two sets are the same if they share all the same elements. So Anti-Extensionality says that if two sets share all the same elements, then they are not the same. But if we compare any set X with itself using this standard, we find that X cannot equal X! This was even worse than before, because it violates the first-order tautology ∀x (x = x). Not only did this second wave of Anti-Settists destroy the universe, they also broke the rules of logic!

So Anti-Extensionality had to go. That much everybody agreed on. And the removal of this axiom settled the last of the paradoxes of Anti-Set Theory. But no sooner had the dust settled than a new controversy arose as to the nature of Extensionality…

The Great Schism: The Modern Age of Anti-Set Theory

So far, the Anti-Settists had compiled the following list of axioms:

Anti-Pairing: ∀x∀y¬∃z∀w (w ∈ z ↔ (w = x ∨ w = y))
Anti-Union: ∀x¬∃y∀z (z ∈ y ↔ ∃w (w ∈ x ∧ z ∈ w))
Anti-Powerset: ∀x∀y∃z (z ⊆ y ∧ z ∉ x)
Anti-Foundation: ∀x∀y (y ∈ x → ∃z (z ∈ x ∧ z ∈ y))
Anti-Infinity: ¬∃x∀y (y ∈ x → ∀z (z = y⋃{y} → z ∈ x))

You might have noticed that the Axiom of Anti-Infinity looks a little strange. The Axiom of Infinity tells us that there’s a set X that contains all empty sets, and such that for any set Y that’s an element of X, if Y has a successor then that successor is also in X. So the Axiom of Anti-Infinity will say that there is no such set. We’ll prove shortly that in fact, the other axioms entail that there are no empty sets, so it’s vacuously true of every set that it “contains all empty sets.” Thus the only restriction placed on our universe by the Axiom of Anti-Infinity is that there’s no set that contains the successors of all its members with successors.

For a while, everybody agreed on this list of axioms and all was well. But after a couple of hours, new rumbles arose about Extensionality. A new contingent of Anti-Settists began arguing in favor of including the Axiom of Extensionality. It’s hard to describe exactly what was going on in the minds of these individuals, who appeared to be turning their backs on everything that Anti-Set Theory was all about by accepting an ORDINARY axiom alongside all of their beautiful Anti-Axioms. Some of them expressed a concern that they had gone too far by turning their back on extensionality, and worried that Anti-Set Theory was so far removed from our intuitions that it no longer deserved to be called a theory of sets. Others pointed out that Anti-Set Theory as it was currently formulated ruled out certain models that they felt deserved to belong to the pantheon of Anti-Set Universes. The rest of the Anti-Settists called them crazy, but they persisted. The fighting reached fever pitch one afternoon in a meeting of the leading Anti-Settists, where one individual who shan’t be named accused another of “selling out” to Traditional Set Theory, and a fist-fight nearly broke out. This led to the Great Schism.

Anti-Settists fractured into two contingents that became known as the traditionalists, who advocated an extensional Anti-Set Theory, and the revolutionaries, who wanted an intensional Anti-Set Theory where two sets could share all the same elements but still be different. In recent history the fighting has cooled off, probably because both sides noticed that there actually didn’t seem to be all that huge of a difference between extensional and intensional anti-set theory. The primary realization was that the other axioms banned objects based on the elements they contained (so that anti-pairing, for instance, really says that there can be no set with the same elements as the pair of x and y, not just that the pair of x and y doesn’t exist).

Extensional Anti-Set Theory
Extensionality: ∀x∀y (∀z (z ∈ x ↔ z ∈ y) → x = y)
Anti-Pairing: ∀x∀y¬∃z∀w (w ∈ z ↔ (w = x ∨ w = y))
Anti-Union: ∀x¬∃y∀z (z ∈ y ↔ ∃w (w ∈ x ∧ z ∈ w))
Anti-Powerset: ∀x∀y∃z (z⊆y ∧ z∉x)
Anti-Foundation: ∀x∀y (y ∈ x → ∃z (z ∈ x ∧ z ∈ y))

Intensional Anti-Set Theory
Anti-Pairing: ∀x∀y¬∃z∀w (w ∈ z ↔ (w = x ∨ w = y))
Anti-Union: ∀x¬∃y∀z (z ∈ y ↔ ∃w (w ∈ x ∧ z ∈ w))
Anti-Powerset: ∀x∀y∃z (z⊆y ∧ z∉x)
Anti-Foundation: ∀x∀y (y ∈ x → ∃z (z ∈ x ∧ z ∈ y))

That covers the history of Anti-Set Theory up to modern times. Let’s now take a look at some of the peculiar details of the theory.

Theorem 1: No anti-sets contain exactly one element.

Proof: Suppose that there was an anti-set X such that X contained Y and nothing else. Then the sentence “∀w (w ∈ X ↔ (w = Y ∨ w = Y))” would be true. But this would be a violation of anti-pairing, as there can be no set whose elements are the same as the pair {Y, Y}. So no such anti-set can exist.

Theorem 2: No anti-sets contain exactly two elements.

Proof: Suppose there was an anti-set X such that X contained Y, Z, and nothing else. Then the sentence “∀w (w ∈ X ↔ (w = Y ∨ w = Z))” would be true. But this would be a violation of anti-pairing. So no such anti-set can exist.

Theorem 3: No anti-sets contain an empty anti-set.

Proof: Suppose that some set X contained a set Y, where Y is empty. By Anti-Foundation, every element of X must share an element with X. So Y must share some element with X. But Y contains nothing, so it can’t share any element with X. Contradiction. So no such anti-set exists.

Theorem 4: No anti-set can be its own union.

Proof: Follows trivially from the Axiom of Anti-Union: ⋃X doesn’t exist for any X, so if X = ⋃X, then X doesn’t exist. (This rules out anti-sets with the same structure as limit ordinals.)

Theorem 5: No empty anti-sets exist.

Proof: Suppose there exists an empty anti-set X. Then the union of X is also empty, so X has exactly the same elements as ⋃X (namely, none at all). But the Axiom of Anti-Union says that there can be no anti-set with the same elements as the union of an anti-set. Contradiction, so no empty anti-set exists.

These theorems tell us a lot about the structure of anti-sets. In particular, every anti-set contains at least three elements, and each of those elements in turn contains at least three elements, and so on forever. So we only have infinite descending membership-chains of anti-sets.

Also, we’ve ruled out anti-sets with zero, one, and two elements, so you might think that no three-element anti-sets can exist either. But the same argument we used for one- and two-element anti-sets doesn’t work any longer, since pairing never produces three-element sets. In fact, the first model of Anti-Set Theory discovered contained exactly five anti-sets, each of which had three elements. This model is an intensional model, as it’s crucial that two of the sets (C and X) contain the same elements. We’ll call this the Primordial Anti-Set Model.

Universe: P, A, B, C, X
P = {A, B, C}
A = {B, C, X}
B = {A, C, X}
C = {A, B, X}
X = {A, B, X}

Here are two attempts to visualize this model:

Beautiful! Let’s check each of the axioms to convince ourselves that it really is a valid model.

Anti-Union: ⋃P = A ⋃ B ⋃ C = {A, B, C, X}. Similarly, you can check that ⋃A, ⋃B, ⋃C, and ⋃X all equal {A, B, C, X}. And {A, B, C, X} is not an anti-set in the universe, so no violations of Anti-Union occur.

Anti-Pairing: This one’s easy, since each of our sets contains three elements, and no three-element set can be formed by pairing.

Anti-Powerset: Note that the axiom of power set only says “there’s a set containing all the existing sets that are subsets of X, for each X”, but it doesn’t bring into existence any new subsets. So for instance, in this model, the power set of P is just {P}, as P is the only subset of P that exists. But {P} doesn’t exist! Similarly, for each element, its power set is just the set of itself, which doesn’t exist by anti-pairing.

Anti-Foundation: Does any set contain an element with which it shares no elements? You can verify by inspection that the answer is no: every element of each set shares at least one element with that set.

Anti-Infinity: (EDIT: this model and the following one actually violate Anti-Infinity! Realized this after originally posting. A future post will provide more details.)

Now, this was a model only of Intensional Anti-Set Theory, not Extensional Anti-Set Theory. Here’s a way to modify it to get a model that works in both theories:

Universe: T, A, B, C
T = {A, B, C}
A = {T, B, C}
B = {A, T, C}
C = {A, B, T}

Again, convince yourself that this universe satisfies the axioms.
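If you’d rather outsource the convincing, here’s a small Python sketch that brute-forces the axioms over a finite universe. The encoding is my own: Anti-Pairing, Anti-Union, and Anti-Powerset are read the way the verification above reads them (no set in the universe may have exactly the relevant elements), Anti-Foundation is as stated, and Anti-Infinity is left out per the edit above.

def has_extension(universe, elements):
    """Is there a set in the universe whose elements are exactly `elements`?"""
    return any(members == frozenset(elements) for members in universe.values())

def anti_pairing(universe):
    # No set may have exactly the elements {x, y}, for any x and y.
    return all(not has_extension(universe, {x, y})
               for x in universe for y in universe)

def anti_union(universe):
    # No set may have exactly the elements of the union of some set.
    return all(not has_extension(universe, set().union(*(universe[m] for m in members)))
               for members in universe.values())

def anti_powerset(universe):
    # No set may have exactly the existing subsets of some set.
    return all(not has_extension(universe, {z for z in universe if universe[z] <= members})
               for members in universe.values())

def anti_foundation(universe):
    # Every element of a set must share at least one element with that set.
    return all(universe[y] & members
               for members in universe.values() for y in members)

def extensionality(universe):
    # Distinct sets must have distinct elements.
    return len(set(universe.values())) == len(universe)

def check(universe):
    axioms = (anti_pairing, anti_union, anti_powerset, anti_foundation, extensionality)
    return {axiom.__name__: axiom(universe) for axiom in axioms}

primordial = {
    "P": frozenset("ABC"),
    "A": frozenset("BCX"),
    "B": frozenset("ACX"),
    "C": frozenset("ABX"),
    "X": frozenset("ABX"),
}

extensional = {
    "T": frozenset("ABC"),
    "A": frozenset("TBC"),
    "B": frozenset("ATC"),
    "C": frozenset("ABT"),
}

print(check(primordial))
print(check(extensional))

Running it should report the Primordial model passing everything except Extensionality (since C and X have the same elements), and the second model passing everything.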

Using a similar construction, we can show that there are models of anti-set theory with exactly N sets, for every N ≥ 4. Our universe is T, A1, A2, …, AN-1, where T = {A1, A2, …, AN-1} and Ak = {T} ⋃ {Aj | j ∈ {1, 2, …, N-1} \ {k}}.
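Here’s a quick sketch of that construction (names again my own); anti_set_family(4) reproduces the four-element model above with A1, A2, A3 in place of A, B, C, and larger universes can be fed into the checker sketched a moment ago.

def anti_set_family(n):
    """The n-set universe described above: T together with A1, ..., A(n-1)."""
    assert n >= 4
    names = ["A%d" % i for i in range(1, n)]
    universe = {"T": frozenset(names)}
    for k, name in enumerate(names, start=1):
        universe[name] = frozenset(["T"] + [other for j, other in enumerate(names, start=1) if j != k])
    return universe

print(anti_set_family(5))
# With the checker above, check(anti_set_family(n)) should report every axiom
# satisfied, Extensionality included, for any n >= 4.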

Many lines of research remain open in the field of Anti-Set Theory. Can Anti-Sets serve as a new foundation for mathematics? Are there models of Anti-Set Theory that contain structures isomorphic to the natural numbers? Is the theory still consistent if we include an Axiom of Anti-Choice? (Adding Anti-Choice rules out the models we’ve discussed so far. Perhaps only infinite universes are allowed with Anti-Choice.) Other proposed additions to the axioms include the simple negation of extensionality, and an “Axiom of Undefinability”, which says that no set exists whose elements satisfy a definable property. How would these additions affect the universe of sets?

I want to close by noting that set theory sometimes gets a bad rap among mathematicians and the wider community for being too abstract or not “useful enough.” We hope that the advent of Anti-Set Theory will make it plain to the public that there is a future world in which set theory can be of GREAT practical use to everybody. The possible applications of Anti-Set Theory are too numerous to count, so numerous in fact that I regard it as unnecessary to name any specific examples.