Hypernaturals Simplified (Ultra Series 2)

Previous: Introduction

What is a hypernatural number? It is a collection of infinitely long sequences of natural numbers. More precisely, it is an equivalence class of these infinite sequences.

An equivalence class under what equivalence relation? This is a little tricky to describe.

I’ll start with a slight lie to simplify the story. When we see the trouble that results from our simple definition, I will reveal the true nature of the equivalence relation that gives the hypernaturals. In the process you’ll see how the notion of an ultrafilter naturally arises.

So, hypernaturals are all about infinite sequences of natural numbers. Some examples of these:

(0,1,2,3,4,5,6,7,8,9,…)
(0,1,0,2,0,3,0,4,0,5,…)
(1,2,1,2,1,2,1,2,1,2,…)
(0,2,4,6,8,10,12,14,…)
(3,1,4,1,5,9,2,6,5,3,…)

We’ll define an equivalence relation ~ between sequences as follows:

Let x and y be infinite sequences of natural numbers.
Then x ~ y iff x and y agree in all but finitely many places.

For example, (0,1,2,3,4,5,6,…) ~ (19,1,2,3,4,5,6,…), because these two sequences only disagree at one spot (the zero index).

(1,1,2,2,4,4,…) and (1,2,4,8,…) are not equivalent, because these sequences disagree at infinitely many indices (every index besides the zeroth index).

Same with (0,1,2,3,4,5,6,…) and (1,2,3,4,5,6,7,…); even though they look similar, these sequences disagree everywhere.

(2,4,6,8,10,12,14,…) and (2,0,6,0,10,0,14,…) are not equivalent, because these sequences disagree at infinitely many indices (every odd index).

One can easily check that ~ is an equivalence relation, and thus it partitions the set of sequences of naturals into equivalence classes. We’ll denote the equivalence class of the sequence (a1, a2, a3, …) as [a1, a2, a3, …]. These equivalence classes are (our first stab at the definition of) the hypernaturals!

For instance, the equivalence class of (0,0,0,0,0,…) contains (1,4,2,0,0,0,0,0,…), as well as (0,2,4,19,0,0,0,0,…), and every other sequence that eventually agrees with (0,0,0,0,…) forever. So all of these correspond to the same hypernatural number: [0,0,0,0,…]. This object is our first hypernatural number! It is in fact the hypernatural number that corresponds exactly to the ordinary natural number 0. In other words 0 = [0,0,0,0,…].

[1,1,1,1,…] is a distinct equivalence class from [0,0,0,0,…]. After all, the sequences (0,0,0,0,…) and (1,1,1,1,…) disagree everywhere. You might guess that [1,1,1,1,…] is the hypernatural analogue to the natural number 1, and you’d be right!

For any standard natural number N, the corresponding hypernatural number is [N,N,N,N,N,…], the equivalence class of the sequence consisting entirely of Ns.

Now consider the hypernatural [0,1,2,3,4,5,6,…]. Let’s call it K. Does K = N for any standard natural number N? In other words, is (0,1,2,3,4,5,6,…) ~ (N,N,N,N,N,…) true for any finite N? No! Whatever N you choose, it will only agree with (0,1,2,3,4,5,6,…) at one location. We need cofinite agreement, and here we have merely finite agreement. Not good enough! This means that K is our first nonstandard natural number!

How does K relate to the standard naturals in terms of order? We haven’t talked about how to define < on the hypernaturals yet, but it’s much the same as our definition of =.

[a1, a2, a3, …] = [b1, b2, b3, …]
iff
{ k∈ℕ | ak = bk } is cofinite

[a1, a2, a3, …] < [b1, b2, b3, …]
iff
{ k∈ℕ | ak < bk } is cofinite

Exercise: Verify that this is in fact a well-defined relation. Every equivalence class has many different possible representatives; why does the choice of representatives not matter for the purpose of describing order?

Now we can see that K > N for every standard N. Look at (0,1,2,3,4,5,…) and (N,N,N,N,…). The elements of the first sequence are only less than the elements of the second sequence at the first N indices. Then the elements of K are greater than the elements of N forever. So elements of K’s representative sequence are greater than elements of N’s representative sequence in a cofinite set of indices. Thus, K > N for every standard N. So K is an infinitely large number!

Here’s another one: K’ = [0,2,4,6,8,…]. You can see that K’ > K, because the elements of K’ are greater than those of K at all but one index (the first one). So we have another, bigger, infinite number.

Addition and multiplication are defined elementwise, so

K + K
= [0,1,2,3,4,…] + [0,1,2,3,4,…]
= [0+0, 1+1, 2+2, 3+3, 4+4, …]
= [0,2,4,6,8,…]
= K’

K’
= [0,2,4,6,8,…]
= [2⋅0, 2⋅1, 2⋅2, 2⋅3, 2⋅4, …]
= 2⋅[0,1,2,3,4,…]
= 2⋅K

Predictably, we get many many infinities. In fact, there are continuum many nonstandard hypernatural numbers!

Proof: we construct an injection f from ℝ to *ℕ. If x is a real number, then f(x) := [floor(x), floor(10x), floor(100x), floor(1000x), …]. For example, f(35.23957…) = [35,352,3523,35239,352395, …]. For any two distinct reals x and y, the sequences x and y will eventually disagree forever. So each real is mapped to a distinct hypernatural, meaning that there are no more reals than hypernaturals. At the same time, there are no more hypernaturals than reals, because there are only continuum many countable sequences of natural numbers. So |*ℕ| = |ℝ|.

It turns out that every nonstandard hypernatural number is also larger than every standard natural number. We’ll see why in a bit, but it’ll take a bit of subtlety that I’ve yet to introduce.

Now, > is transitive in ℕ. Is it also transitive in *ℕ? Yes! Suppose A > B and B > C. Choose any representative sequences (a1, a2, a3, …), (b1, b2, b3, …), and (c1, c2, c3, …) for A, B, and C. Then X = { k∈ℕ | ak > bk } and Y = { k∈ℕ | bk > ck } are both cofinite. The intersection of cofinite sets is also cofinite, meaning that X⋂Y = { k∈ℕ | ak > bk and bk > ck } ⊆ { k∈ℕ | ak > ck } is cofinite. So A > C!

It’s a good sign that > is transitive. But unfortunately, the story I’ve told you thus far starts to break down here. The greater-than relation is a total order on the natural numbers. For any naturals a and b, exactly one of the following is true: a = b, a > b, b > a. But this is not true of the hypernaturals!

Consider the two hypernatural numbers n = [0,1,0,1,0,1,…] and m = [1,0,1,0,1,0,…]. Are n and m equal? Clearly not; they disagree everywhere. So n ≠ m.

Is n > m? No. The set of indices where n’s sequence is greater than m’s sequence is {1, 3, 5, 7, …}, which is not cofinite.

So is m > n? No! The set of indices where m’s sequence is greater than n’s sequence is {0, 2, 4, 6, …}, which is also not cofinite!

So as we’ve currently defined the hypernatural numbers, the > relation is not a total relation on them. This might be fine for some purposes, but we’ll be interested in defining the hypernaturals to mirror the properties of the naturals as closely as possible. So we’ll have to tweak our definition of the hypernaturals. The tweak will occur way back at the start where we defined our equivalence relation on sequences of naturals.

Recall: we said that two sequences (a1, a2, a3, …) and (b1, b2, b3, …), are equivalent if they agree in all but finitely many places. Said another way: a ~ b if { k∈ℕ | ak = bk } is cofinite. We defined > similarly: a > b if the agreement set for > is cofinite.

The problem with this definition was that it wasn’t definitive enough. There are cases where the agreement set is neither cofinite nor finite. (Like in our previous example, where the agreement set was the evens.) In such cases, our system gives us no direction as to whether a > b or b > a. We need a criterion that still deals with all the cofinite and finite cases appropriately, but also gives us a definitive answer in every other case. In other words, for ANY possible set X of indices of agreement, either X or X’s complement must be considered “large enough” to decide in its favor.

For example, maybe we say that if { k∈ℕ | ak = bk } = {0,2,4,6,8,…}, then a > b. Now our criterion for whether a > b is: the set of indices for which ak = bk is either cofinite OR it’s the evens. This implies that [1,0,1,0,1,0,…] > [0,1,0,1,0,1,…].

Once we’ve made this choice, consistency forces us to also accept other sets besides the evens as decisive. For instance, now compare (0,0,1,0,1,0,…) and (0,1,0,1,0,1,…). The set of indices where the first is greater than the second is {2,4,6,8,…}. But notice that the first differs only cofinitely from (1,0,1,0,…), meaning that [0,0,1,0,1,0,…] = [1,0,1,0,1,0,…]. The conclusion is that [0,0,1,0,1,0,…] > [0,1,0,1,0,1,…], which says that the set of indices {2,4,6,8,…} must also be decisive. And in general, once we’ve accepted the evens as a decisive set of indices, we must also accept the evens minus any finite set.

The three criterion we’ve seen as desirable for what sets of indices will count as decisive are (1) includes all cofinite sets, (2) for any set X, either X or X’s complement is decisive, and (3) consistency. These requirements turn out to correspond perfectly to a type of mathematical object called a free ultra-filter!

In the next post, we will define ultrafilters and finalize our definition of the hypernatural numbers!

Even Numbers are Tautologies

To each atomic proposition P assign a natural number p. If that natural number is even, then its corresponding proposition is true. If the number is odd then the proposition is false.

If you have two numbers n and m and you multiply them, the result is even so long as either n or m is even. So multiplication is like or; P∨Q corresponds to pq.

Negation is like adding 1: even becomes odd and odd becomes even. So ¬P corresponds to p+1.

Consider addition: p+q is even if p and q are both even or both odd. So p+q is like the biconditional ↔.

Other connectives can be formed out of these. Take P∧Q: P∧Q is equivalent to ¬(¬P∨¬Q), which is (p+1)(q+1) + 1. So P∧Q corresponds to pq+p+q+2. P→Q is ¬P∨Q, which is (p+1)q.

The logical constants ⊤ and ⊥ correspond to numerical constants: ⊤ can be assigned 0 and ⊥ assigned 1, for instance.

Tautologies translate to algebraic expressions which are always even. For instance, P→P translates to (p+1)p, which is always even. P→(Q→P) translates to (p+1)(q+1)p, which is also always even. P∨¬P translates to p(p+1), always even.

Contradictions translate to algebraic expressions which are always odd. P∧¬P translates to (p+1)(p+2) + 1, which is always odd. And so on.

Inference rules can be understood as saying: if all the premises are even, then the conclusion will be even. Take conjunction elimination: P∧Q ⊨ P. This says that if (p+1)(q+1) + 1 is even then p is even, which ends up being right if you work it out. Modus ponens: P, P→Q ⊨ Q. This says that if p and (p+1)q are both even, then q is even. Again works out!

You can also work the other way: For any number p, p(p+1) is even. Translating into propositional logic, this says that P∨¬P is a tautology. We’ve proved the law of the excluded middle! It’s interesting to note that earlier we saw that P→P translates to (p+1)p. So in some sense the law of the excluded middle is just self-implication but with the two products reversed!

A Compact Proof of the Compactness Theorem

I’ve written up a proof of the compactness theorem before, but I recently looked it over and think that the proof can be expressed more, heh heh, compactly, than before.

So, what is the compactness theorem? It is the statement that if a set of statements is finitely satisfiable (each of its finite subsets has a model), then it’s satisfiable. The converse of this (a satisfiable set of statements is also finitely satisfiable) is trivial: a model of a set of sentences is also a model of every subset of that set. The following proof will be for propositional logic, but can be easily extended to first order logic.

The short version

Suppose that a set of sentences A is finitely satisfiable. We’ll extend A to a larger set B by giving A an “opinion” on every sentence. We can build up this extension in a series of stages: B0 is just A. Now, take any sentence a. If B0 ⋃ {a} is finitely satisfiable, define B1 to be B0 ⋃ {a}. Otherwise, define B1 to be B0 ⋃ {¬a}. Either way, B1 will be finitely satisfiable, because B0 cannot be inconsistent with both a and ¬a. When we’ve gone through every sentence, we take the union of all these extensions to form B. B is finitely satisfiable, since every finite subset of B is also a finite subset of one of the extensions that make it up. Now, we define a truth assignment V that assigns True to each propositional variable p if and only if p is in B. V satisfies B (which you can show by induction on sentences), and since A is a subset of B, V also satisfies A. So A is satisfiable.

The long(er) version

Suppose that a set of sentences A is finitely satisfiable. If a set of sentences A’ is finitely satisfiable, then for any sentence a, at least one of A’ ⋃ {a} and A’ ⋃ {¬a} is finitely satisfiable. If neither were, then we’d have two finite sets, one that entails ¬a and the other that entails a, and the union of these would be an unsatisfiable finite subset of A’.) So given any well-ordering of the sentences of the language, we can extend A one sentence at a time, maintaining finite satisfiability at each step. The union of all these extensions, call it B, is still finitely satisfiable, because any finite subset of B is also a finite subset of one of the extensions. B is also complete: for any sentence b either b ∈ B or ¬b ∈ B.

Now, define the truth assignment V as follows: for any atomic sentence b, V(b) = True iff b ∈ B. V satisfies B, which we prove by induction on sentences:

  • If b is an atomic sentence, then V(b) = True iff b ∈ B, by construction.
  • If b = ¬c for some c (for which V(c) = True iff c ∈ B), then V(b) = True iff V(c) = False iff c ∉ B iff b ∈ B.
  • If b = c ∧ d for some c and d that satisfy the induction hypothesis, then V(b) = V(c ∧ d) = True iff V(c) = V(d) = True iff c ∈ B and d ∈ B. If both c and d are in B, then ¬(c ∧ d) can’t be in B by finite satisfiability, so c ∧ d ∈ B by completeness of B. And if c ∧ d ∈ B, then neither ¬c nor ¬d can be in B by finite satisfiability. So c and d are both in B by completeness of B.

This proof by induction covers the connectives ¬ and ∧, with which you can express all other connectives, so this shows that V satisfies B. And since B is a superset of A, V satisfies A as well. This shows that any finitely satisfiable set of sentences is also satisfiable. Note that we used the axiom of choice implicitly with the instruction to well-order the language. As the language can be any size, this is equivalent to the well-ordering principle, which is equivalent in ZF to the axiom of choice. The same sort of assumption arises in the proof of the completeness theorems for first-order and propositional logics. If you reject choice, then you should also be skeptical that propositional logic and first-order logics are complete!

Defining uncountability

Most of the shocking results in mathematical logic have to do with how limited we are in terms of producing good deductive systems for mathematical concepts. (Good here means sound, complete, finitary, and computable.) No logic with a good deductive system can categorically define the natural numbers. Adding a “for finitely many” quantifier to first-order logic makes it impossible to have a good deductive system. Ditto for a “for all subsets of” quantifier. And so on.

Once you’ve accustomed yourself to the extreme logical limitations imposed on us, it comes as a bit of a shock when you realize that we still have the ability to describe ANY complicated mathematical concept. What I learned recently was that you can extend first-order logic by adding a “for uncountably many” quantifier, and still have a good deductive system for the logic! It says something that I was so impressed by this fairly moderate ability to distinguish between the countable and the uncountable. But it certainly does feel puzzling that while you can distinguish between the countable and the uncountable, you can’t distinguish between the finite and the infinite, which seems like a much more basic and fundamental division!

What’s even cooler is that the deductive system for this extended logic is extremely simple. It amounts to taking the axioms and inference rules of first-order logic, and then tacking on four additional axiom schemas. We’ll write “for uncountably many x, Φ(x)” as Ux Φ(x).

  1. ∀y∀z¬Ux (x = y ∨ x = z)
  2. ∀x (Φ(x) → Ψ(x)) → (Ux Φ(x) → Ux Ψ(x))
  3. Ux Φ(x) ↔ Uy Φ(y), where y isn’t free in Φ(x)
  4. Uy∃x Φ(x,y) → (∃xUy Φ(x,y) ∨ Ux∃y Φ(x,y))

Each of these axioms schemas is conceptually quite easy to make sense of. The first says that for any two elements y and z, there’s not uncountably many things that are each equal to one or the other. In other words, uncountable is bigger than two.

The second says that if the set of things that satisfy Φ is a subset of the set of things that satisfy Ψ, and uncountably many things satisfy Φ, then uncountably many things satisfy Ψ. In other words, if a set has an uncountable subset, then it must be uncountable itself.

The third is obvious enough that it doesn’t need explanation. The fourth, on the other hand, does. I find it helpful to think of Φ(x,y) as saying “x points at y.” Then Uy∃x Φ(x,y) becomes “uncountably many things are pointed at”, ∃xUy Φ(x,y) becomes “some person points at uncountably many things”, and Ux∃y Φ(x,y) becomes “there are uncountably many people pointing.”

So the fourth axiom just says that if uncountably many things are pointed at, then either somebody is pointing at uncountably many things, or there are uncountably many people pointing at things. Otherwise you’d only have countably many people pointing at countably many things each, and that’s not enough to get uncountably many things pointed at! So this fourth axiom can really be understood as saying that the union of countably many countable sets is countable.

✯✯✯

There’s a caveat to the above claims. This is that first-order logic extended by the “for uncountably many” quantifier only has a good deductive system IF the language is restricted to only allow countably many constant symbols. Here’s the reasoning:

First of all, if a logic has a sound, complete and finitary deductive system, then it is compact. Discussed here.

Second, if a logic L is compact and a set of sentences Σ in L has a countably infinite model, then Σ also has an uncountable model. Why? Simply append to L an uncountable set A of constant symbols, and add to Σ an uncountable set of sentences declaring that each of these constants refer to a distinct object. Now we have an uncountable model of Σ’s extension by compactness (the countably infinite model is a model of every finite subset of Σ’s extension), and this uncountable model must also satisfy Σ. (This is the only place where we make use of uncountably many constant symbols.)

Third, if a logic L is compact, and a set of sentences Σ in L has models of arbitrarily large finite size, then Σ has infinite models. The proof of this is virtually identical to the last paragraph; add infinitely many constant symbols and declare them all to refer to distinct objects. Every finite subset of this extension of Σ has a model, so by compactness the extension of Σ also has a model. This model is infinite by construction, and as a model of an extension of Σ must also be a model of Σ.

That sets up all the background model theory we need. Now, suppose that FOL+U had a good deductive system that applied even with uncountably many constant symbols. By soundness, completeness, and finiteness, FOL + U would be compact. Consider now the FOL+U theory T consisting of the single axiom ¬Ux (x = x). T cannot have uncountable models, by assumption that the quantifier Ux is really capturing the semantics of “for uncountably many x”.

But if T can’t have uncountable models, then it can’t have a countably infinite models either, by the second result above! So T doesn’t have any infinite models. And then by the third result, T cannot have models of arbitrarily large finite size! This means that asserting “there aren’t uncountably many objects”, we actually end up ruling out all countably infinite models, as well as all models larger than some finite size! This is inconsistent with the claim that the U quantifier is correctly capturing the semantics of “uncountably many”; after all, the negation of “for uncountably many” should be “for countably many”, not “for small enough finite”!

Summing up the argument:

  1. If a logic has a sound, complete, and finitary deductive system, then it is compact.
  2. If a logic L is compact, and a set of sentences Σ in L has a countable model, then Σ has an uncountable model.
  3. If a logic L is compact, and a set of sentences Σ in L has models of arbitrary finite size, then Σ has infinite models.
  4. FOL+U = First order logic + “for uncountably many” is sound, complete, and finitary.
  5. The FOL+U theory T = {¬Ux (x = x)} doesn’t have uncountable models.
  6. By 1 and 4, FOL+U is compact.
  7. By 2 and 6, if a set of sentences Σ in FOL+U has a countable model, then Σ has an uncountable model.
  8. Suppose T had countably infinite models. Then by 7, it would also have to have uncountable models, contradicting 5. So T doesn’t have countably infinite models.
  9. Suppose T had arbitrarily large finite models. Then by 3 and 6, T would have infinite models, contradicting 5 and 8. So T doesn’t have arbitrarily large finite models.
  10. Rephrasing 9, there’s some finite n such that T has no models of cardinality larger than n.

✯✯✯

What’s great about the logic FOL+U is that it allows you to talk about the property of uncountability, so long as your theories are all only countably large (which isn’t too bad a restriction). So for instance, consider the FOL+U theory consisting of the axioms of first-order Peano arithmetic plus one additional axiom: ¬Ux (x = x). This theory successfully rules out all uncountable models! This is something that you couldn’t do in ordinary first order logic, since it obeys the upward Lowenheim-Skolem property (a model of one infinite cardinality implies models of every larger infinite cardinality). The order type of all countable nonstandard models of PA is known (each is ω + ℤ⋅ℚ, a natural number line followed by countably many integer lines arranged like the rationals), and much of the difficulty of the model theory of PA involves the uncountable models. So this extension of PA is a big improvement over first-order PA!

Remember that ZFC has those pesky countable models, in which “the set of all subsets” turns out to be missing some subsets? We can get rid of all of those countable models with the FOL+U theory consisting of the axioms of ZFC + the additional axiom Ux (x = x). I’m pretty sure that this does not fix all the issues with correctly pinning down the notion of the power set — that would be too good to be true — but it is certainly an improvement!

A Coloring Problem Equivalent to the Continuum Hypothesis

Summary
Is there an infinity in between ℵ0 and 20? Is there a way to color the plane with countably many colors so that there are no monochromatic triangles lined up with the x and y axes? If these questions seem unrelated to you, then we had the same initial reaction. But it turns out that an intermediate infinity exists if and only if no such coloring exists.

CH and Coloring

Imagine taking the plane ℝ2 and coloring it entirely. You are allowed infinitely many colors, but only a countable infinity (so no using the whole continuous spectrum). Some example colorings:

The colorings don’t have to be contiguous on the plane like in the ones you just saw. They can have individual points of one color such that neighborhoods of arbitrarily small radius around them are colored differently. So we can get wacky colorings like:

Think about right triangles on the plane that line up with the x and y axes. For instance:

For each of these triangles, of the three sides one must be parallel to the x-axis and another to the y-axis. Now, for a given coloring, we can ask if there exist any triangles whose corners are all the same color. Importantly, we’re only interested in the three corner points, not the entire side length. For the earlier colorings we looked at, we can find such triangles fairly easily:

Even for our wacky coloring, it shouldn’t be too hard to find three points of the same color that meet the criterion.

Now here’s a question for you to ponder: is there any way to color the plane so that no such monochromatic right triangles exist?

I encourage you to pause here and try your best to come up with such a coloring. Use this as a chance to develop some intuitions on whether you think that a coloring like this should exist or not. It’s important to keep in mind that the amount of possible right triangles on the plane is 20, while the number of colors you have available to you is just ℵ0.

(…)

(pause for thought)

(…)

Now, I shall prove to you that the existence of such a coloring is equivalent to the Continuum Hypothesis! The rest of this post will be more involved, and even though I’ve tried to make it less scary with pictures aplenty, it will most likely take a slow and attentive read-through to grasp. But the proof is really cool, so it’s worth it. Oh and also, I have assumed some things as background knowledge for the sake of space. In particular, understanding the proof requires that you understand what ordinals are and why the set of all countable ordinals is the first uncountable ordinal.

If the Continuum Hypothesis is true, then a coloring with no monochromatic right triangles exists.

The proof outline: Consider the first uncountable ordinal, ω1, which is the set of all countable ordinals. We construct a coloring on ω1 × ω1 that satisfies the no-monochromatic-right-triangles property. If the Continuum Hypothesis is true, then |ω1| = |ℝ|, so we form a bijection from ω1 to ℝ. Finally, we use this bijection to map our coloring on ω1 × ω1 to a coloring on ℝ × ℝ, and show that coloring also satisfies the same property.

Alright, so consider ω1 × ω1. If we were to visualize it, it would look something this:

This “plane” is discrete, so in a certain sense it much more closely resembles ℕ2 than ℝ2. The set of colors we’ll be using will be labeled with integers. We split the colors into two categories, the negatives N and the positives P.

N = {-1, -2, -3, …}
P = {0, 1, 2, 3,…}

For each α in ω1, we construct two bijections f: α → N and g: α → P. These bijections exist because every element of ω1 is countable.

We use f and g to construct our coloring. For each α in ω1, we assign the color f(β) to (α, β) for each β < α, and we assign the color g(β) to (β, α), for each β ≤ α.

Since f and g are both bijections, no colors are repeated along the path from (α, 0) to (α, α) to (0, α). And we can easily show that no monochromatic right triangles exist under this coloring!

If a triangle crosses y = x, then it has at least one point colored from N and another from P. N and P are disjoint, so the triangle cannot be monochromatic.

And if a triangle doesn’t cross y = x, then it has two points lined up so as to have different colors:

So we have no monochromatic triangles! Now, assuming the Continuum Hypothesis, |ω1| = |ℝ|, so we can form a bijection M from ω1 to ℝ. We use M to map our original coloring to a new coloring on ℝ × ℝ. If (a, b) ∈ ω1 × ω1 is colored C, then we also color (M(a), M(b)) ∈ ℝ × ℝ. Under this map, non-monochromatic triangles stay non-monochromatic. So we’ve constructed a coloring on ℝ × ℝ with no monochromatic triangles!

If a coloring with no monochromatic right triangles exists, then the Continuum Hypothesis is true.

The proof outline here is to show that any infinite subset A of ℝ either has cardinality ℵ0 or 20. This shows that there are no intermediate cardinalities between these two.

Consider any infinite subset A ⊆ ℝ. For any a, define Ya to be the y-coordinates of points along the vertical line through (a, 0) that have no color repeats on that line. So Ya = {y ∈ ℝ | the color of (a, y) is different from (a, y’) for all y’ ≠ y}.

Notice that Ya must be countable, because all its elements have different colors and there are only countably many different colors to have. So |Ya| = ℵ0.

Define YA to be the union of Ya, for each a in A. Since each Ya is countable and A is at least ℵ0, |YA| ≤ |A|⋅ℵ0 = |A|. There are two cases: either YA = ℝ, or YA ≠ ℝ.

Case 1: YA = ℝ. In this case |ℝ| = |YA| ≤ |A|, so |A| ≥ |ℝ|. Since A is a subset of ℝ, this implies that |A| = |ℝ|.

Case 2: YA ≠ ℝ. In this case, there’s some real number y* in ℝ – YA. By definition y* is outside YA, so for each vertical line {x} × ℝ there is at least one point with the same color as (x, y*) besides itself.

Each (x, y*) with x ∈ A must have a different color. Proof: Suppose not. Then we have two elements of A, x1 and x2, such that the color of (x1, y*) is the same as the color of (x2, y*). By the last paragraph’s result, we can pick a third point (x1, y’) that has the same color as (x1, y*). These three points form a monochromatic right triangle, but we’ve assumed that’s impossible! Contradiction.

So each point in A × {y*} must have a different color. Thus, the cardinality of A × {y*} must be countable. And A is an infinite set, so |A| = ℵ0.

So in case one |A| = |ℝ| = 20, and in case two |A| = ℵ0. And this is true for every infinite subset A of ℝ. So every subset of ℝ is either finite, countable, or has the same cardinality as ℝ. And this is the continuum hypothesis!

No known unknowns in Peano arithmetic

Reports that say that something hasn’t happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don’t know we don’t know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones.

Donald Rumsfeld

Rumsfeld’s statement may have been correct about American politics, but he was wrong in the context of Peano arithmetic. In Peano arithmetic, every known is a known known, and every unknown is an unknown unknown. Let me explain.

A paragraph for background: PA expresses enough arithmetic to allow it to encode sentences like “φ is provable from the axiom set T” as arithmetic properties of the natural numbers. But Peano arithmetic also has nonstandard models containing objects that aren’t natural numbers. It’s possible for a sentence about arithmetic to be true of these nonstandard models and false of the standard models. And in particular, the sentences that Gödel-encode proofs of other sentences can take on a different meaning in nonstandard models. For instance, the Gödel encoding of a sentence like “PA is consistent” looks like “There is no number with the property that it encodes a proof of 0=1 from the axioms of PA”. This might be true of the standard natural numbers, but false in some nonstandard model. There might be some nonstandard number that satisfies Gödel’s arithmetic formula for a proof of 0=1 from the axioms of PA, but which doesn’t actually encode any such proof (as only standard natural numbers Gödel-encode valid proofs). Gödel encodings are only logically equivalent to the statements they encode in the standard model of the natural numbers.

Of all the sentences of the form “φ is provable from the axiom set T”, which are provable from the axioms of Peano arithmetic? How much does Peano arithmetic know about what arbitrary formal theories prove? For all of the following results, I will assume that PA is consistent.

Our first result: If PA proves “φ is provable from T”, then φ really is provable from T.

This follows from the soundness of first-order logic. If PA proves “φ is provable from T”, then this sentence must be true in all models. In particular, it’s true in the standard model. And if it’s true in the standard model, then it must actually be true that φ is provable from T.

Second result: If φ is provable from T, then PA proves “φ is provable from T.”

This follows from three facts: (1) that the standard natural numbers are a subset of every nonstandard model, (2) that the sentence “φ is provable from T” is a statement of the form “There exists a number with some property”, and (3) that first-order logic is complete.

The Gödel encoding of “φ is provable from T” is something like “there’s some number n with the property that n encodes a proof of φ from T.” Now, suppose that φ is provable from T. Then there’s a standard natural number that encodes this proof. And since every nonstandard model contains every natural number, this number exists in all these models as well! So the statement is true in all models. So by completeness, it’s provable from PA.

So far we have that PA proves “φ is provable from T” if and only if φ is actually provable from T. Loosely, if a theory can prove something, then Peano arithmetic knows this. Even though ZFC is a stronger theory than Peano arithmetic, there’s nothing that ZFC can prove that PA doesn’t know it can prove. In particular, ZFC proves the consistency of PA, so PA proves that ZFC proves the consistency of PA! This of course doesn’t mean that PA proves its own consistency, because PA doesn’t trust ZFC to be true (it doesn’t accept the axioms of ZFC).

Furthermore, for every sentence that PA can prove, PA can prove that PA can prove it. In this sense, everything that PA knows is a known known. What we’ll see next is that for PA, every unknown is an unknown unknown. For Peano arithmetic, and in fact for ANY consistent theory that expresses enough arithmetic to be subject to Gödel’s theorems, there are no known unknowns.

Third result: If φ is not provable from PA, then “φ is not provable from PA” is not provable from PA.

This follows from Gödel’s second incompleteness theorem and the principle of explosion.

Now, the proof of our third result. Suppose “φ is not provable from PA” is provable from PA. By the principle of explosion, if PA was inconsistent then EVERYTHING would be provable from it. So by finding something that isn’t provable from its axioms, PA is able to prove its own consistency! But then, by Gödel’s second incompleteness theorem, PA must be inconsistent. This contradicts our background assumption that PA is consistent.

Note that the sentence “φ is not provable from PA” isn’t the same type of sentence as “φ is provable from PA”. As we saw a moment ago, the second is a sentence of the form “there’s some number n with a particular property,” and all of these sentences are provable from PA if and only if they’re true of the standard naturals. But the first sentence is of the form “there’s no number n with a particular property”, which is not the same! It’s possible for this sentence to be true of the standards but false of the nonstandards. All we need is for there to be no standard numbers and at least one nonstandard number with that property.

So if PA is consistent, then it can never prove that it doesn’t prove something. Now, notice that all that I’ve said applies more generally than just Peano arithmetic. For any consistent mathematical theory that express sufficient arithmetic for Gödel’s incompleteness theorems to apply, every known is a known known and every unknown is an unknown unknown.

The subtlety of Gödel’s second incompleteness theorem

Gödel’s second incompleteness theorem is an order of magnitude more subtle than his first. It’s commonly summarized as “no consistent theory strong enough to do arithmetic can prove its own consistency.” But there’s a lot of subtlety in both the “strong enough to do arithmetic” and the “prove its own consistency.” First of all, what exactly counts as strong enough to do arithmetic? Peano arithmetic certainly does, but it has an infinite axiom schema. Do any finite theories meet the criterion of “strong enough to do arithmetic”? It urns out that the answer to this is yes! Robinson arithmetic, which is what you get if you remove the infinite axiom schema of induction from PA, meets the requirement.

There are weaker theories of arithmetic, like Presburger arithmetic (which has only addition) and Skolem arithmetic (only multiplication), that don’t meet the criterion, and are therefore not subject to the incompleteness theorems. And it turns out that both of these theories are actually decidable! This is even stronger than being sound and complete; soundness and completeness tell us that there’s an algorithm that determines after a finite amount of time that a given sentence is a tautology, but not necessarily that such an algorithm exists to determine that a sentence is NOT a tautology. Decidability gives us both: not only can we classify any tautology as such in a finite time, we can also classify any non-tautology as such in a finite time.

The amount of arithmetic we need is exactly the amount that Gödel uses to prove the incompleteness theorems. This has two parts: one, the theory must express enough arithmetic to do Gödel encoding (and therefore to express the notion of “provability”), and two, the theory must be able to formalize diagonalization. Dan Willard came up with theories that formalize enough arithmetic to do the first but not the second: these are theories that can talk about their own provability via Gödel coding, but are still too weak to be subject to the incompleteness theorems. Thus, these theories can actually prove their own consistency! These fascinating theories are called self-verifying theories.

Everything I’ve said so far has been about the subtlety of the notion of “strong enough to do arithmetic”. Now I want to talk about the subtlety of the notion of “proving one’s own consistency.” I’ll do this by first taking a brief interlude to talk about an argument I recently saw against the Consistency Thesis in philosophy of math that uses this notion. A friend of mine writes:

Consistency is a weak soundness property.

Here’s what I’ll call the Consistency Thesis: (Mathematical sentences are justified and/or true when they are part of a consistent theory.) What I want to do is to raise a problem for that thesis. Maintaining the Consistency Thesis leads to bizarre mathematical beliefs which don’t lend themselves to being true or justified. A merely ‘consistent’ theory can claim P(0), P(1), and so on, and yet claim that there’s some n for which ¬P(n). Such a theory is omega-inconsistent.

There are mathematical theories which despite their syntactic consistency, claim to be inconsistent. They will claim to be able to derive that 0=1 and yet never derive 0=1. One example of a consistent but unsound theory would be this: [suppose we add to ZFC the new axiom “ZFC is inconsistent”. If ZFC is consistent, then the new theory ZFC’ is consistent (by Godel’s second incompleteness theorem), even though it falsely proves that ZFC is inconsistent.]

So far, I’ve only mentioned omega-inconsistent theories. I should also mention that there are omega-consistent theories which are arithmetically unsound. A much stronger soundness property would be soundness in omega-logic, which entails consistency, omega-consistency, and arithmetical soundness.

Here’s the response I gave:

You say that an adherent to the Consistency Thesis believes in mathematical theories that are in fact consistent (no contradictions can be proved from them) but claim to be inconsistent; for instance ZFC + ¬Con(ZFC). This bit about “claiming to be inconsistent” is subtler than it might initially seem. There’s a very important difference between Con(ZFC) and “ZFC is consistent”. A first order theory can’t talk directly about its own consistency, it can only talk about properties of the objects in its models. We are allowed an indirect method to talk about consistency of theories, Gödel encoding. But this method has problems.

Gödel encoding allows us to write down statements that, if understood to be about the natural numbers, are equivalent to the assertion that a theory proves a contradiction. But this “if understood to be about the natural numbers” is a very important qualification, because no first order theory categorically defines the natural numbers (i.e. no first order theory has as its only model the natural numbers). More generally, no theory within a logic that has a sound, complete, and finitary proof system categorically describes the natural numbers (these are the only logics that a Formalist will see as well-defined, by the way).

What this means is that when we write “Con(ZFC)”, we’re actually using a short-hand for a complicated sentence about the objects in our models, and this complicated sentence is NOT equivalent to the claim that no contradiction can be proven from ZFC. Con(ZFC) could be false in a model even if ZFC is consistent, and Con(ZFC) could be true in a model even if ZFC is inconsistent, so long as that model is not the standard natural numbers.

So the adherent of the Consistency Thesis is not actually committed to weird beliefs about a theory being consistent and claiming its own consistency; they are just committed to the belief that the natural numbers are not well-defined.

The same objection applies to the claim that they have to accept as valid theories that claim P(0), P(1), and so on, but also that there’s some n for which ¬P(n). That’s true, and that’s fine! One can just say that such theories are not theories of the standard natural numbers; the n for which ¬P(n) is some other type of mathematical object that is not a natural number.

A TL;DR for my response: “Con(T)” only means “T is consistent” if T is about the natural numbers. Furthermore, the theories that assert their own inconsistency never have the natural numbers as a model. So it’s ultimately not very weird that these theories assert “¬Con(T)”… this statement doesn’t actually mean “T is inconsistent” in any of the models of the theory!

Undecidability results on lambda diagrams

For background on how lambda diagrams work, look HERE.

The equivalence of lambda calculus with Turing machine computations, and the logical limitations on computation, combine to give us interesting limitative results on lambda diagrams.

Reducibility, Absolute and Contingent

We begin by defining a notion of reducibility. A lambda diagram is reduced if there is no beta reduction that can be applied to the diagram. Here are some examples of reduced diagrams:

And here are some examples of non-reduced diagrams:

For each of these diagrams, if you keep applying beta reductions in any order you will eventually get to a point where the diagram is reduced. But this is not the case for all lambda diagrams. Check out the following diagram for M M (the Mockingbird function applied to itself):

Applying beta reduction gives you the following:

And we’re back where we started! This tells us that our diagram is irreducible.

Sometimes there are multiple beta reductions that can be applied. In these cases, it might be that one sequence of beta reductions reduces the diagram, but another sequence goes forever without reducing it. For instance, here is the diagram for False (M M):

If we first feed (M M) as input to False, then we get a reduced diagram:

But if we first feed M to M, then what we get back is the same thing we started with:

If we keep feeding M to M, we keep getting back the same diagram and never fully reduce. When a diagram reduces on some series of beta reductions and never reduces on others, we call the diagram contingently reducible. If every strategy of beta reduction eventually reduces the diagram, we call it absolutely reducible. And if no strategy reduces the diagram, we call it absolutely irreducible.

Some interesting observations:

It’s possible for two diagrams A and B to both be absolutely reducible, but for A B to be absolutely irreducible.

It’s also possible for a diagram A to be absolutely irreducible, but the diagram A B to be contingently reducible (though never absolutely reducible).

It’s not, however, possible for A to be absolutely irreducible but A B to be absolutely reducible. Why? Simply because you can choose your reduction strategy for A B to be any of the reduction strategies for A, each of which always fails. In general, if any sub-diagram within a lambda diagram is contingently reducible, then the whole diagram cannot be absolutely reducible. Why? Simply because you can choose your reduction strategy for the whole diagram to be a reduction strategy that fails for the sub-diagram (at least one such strategy must exist, by our assumption that the sub-diagram is only contingently reducible). This same reduction strategy that fails for the sub-diagram, also fails for the whole diagram. So the whole diagram cannot be absolutely reducible.

Here’s a question that I don’t know the answer to: Is it possible for an application of of an irreducible diagram to an irreducible diagram to be reducible? I suspect not.

The distinction between contingent and absolute reducibility goes away the moment you fix a general reduction strategy that for any diagram gives a unique prescription for which reduction should be applied (if any are possible). Henceforth, I will assume that a particular reduction strategy has been fixed, and so will only speak of reducibility and irreducibility, no absolutes and contingents.

Reducibility Oracles

It might be convenient if we could always know whether a given diagram reduces or not. We might even wonder if there’s a lambda diagram that “computes” the answer to this question. Let’s suppose there is such a diagram, which we’ll call a “reducibility oracle.” It’s a black box, so we’ll denote its diagram as a box with an R:

We’ll define the behavior of the reducibility behavior as follows:

We’ll now design a larger diagram R’ that uses R:

Let’s see how R’ behaves when given an input (call it F).

Notice that if F F is reducible, then R’ F is irreducible. And if F F is irreducible, then R’ F is reducible.

So what happens if we feed R’ to itself? Well, by the logic of the above paragraph, if R’ R’ is reducible, then R’ R’ is irreducible. And if R’ R’ is irreducible, then R’ R’ is reducible. We have obtained a contradiction! And so no such reducibility oracle diagram can exist.

But wait, there’s more!

Let’s define S(N) to be the size of the smallest lambda diagram that reduces to the Church numeral N, where size means the number of lines required to build the diagram.

There’s a tight correspondence between lambda calculus and programming languages. In fact, lambda calculus can be thought of as just one specific highly abstract functional programming language (along the lines of Haskell and Lisp). Recall now that the Kolmogorov complexity K of a string s (given a particular programming language) is defined as the shortest program that outputs that string. If we choose our programming language to be lambda calculus, then we get a correspondence between K(N) and S(N). This correspondence gives us the following results:

There is no lambda diagram that serves as a “size oracle” – i.e. a lambda diagram that when fed a number N, returns S(N). (Analogous to the uncomputability of Kolmogorov complexity)

For any sound proof system F, there’s a number L such that F cannot prove that the smallest lambda diagram that reduces to any number has more than L lines. (Analogous to Chaitin’s incompleteness theorem).

Sizes of lambda diagrams

Let me make some final notes on the function S(N). Every number has a “standard form”: λfx.f (f … (f x))), with N copies of f. The lambda diagram for this looks like an upside down ladder with N steps:

This has 2N + 3 lines, so we know that for all N, S(N) ≤ 2N + 3. We can define a number as “irreducibly complex” if S(N) = 2N + 3. For instance, 0 and 1 are irreducibly complex, and I think that 2 and 3 are as well:

This raises an interesting question: What is the smallest number that isn’t irreducibly complex? Another question is whether there are arbitrarily large irreducibly complex numbers (I strongly suspect not, but am not positive).

We can find some upper bounds for S(N), where N is a product or a power, as follows:

In general, for any computable function f whatsoever, there is a lambda diagram for f, to which we can append the diagram for any number N to represent f(N). The size of this diagram will just be the size of the diagram for f, plus the size of the diagram for N, plus 1 (for the line connecting the two). This means that if a number M can be written as f(N) for some computable f, then we can produce a diagram for M whose size is some constant + 2N. Thus S(f(N)) grows at most linearly with respect to N, no matter how fast-growing f is.

The Anti-Set Program

In ZFC set theory, we specify a collection of sentences within a first-order language to count as our axioms. The models of this collection of sentences are the set-theoretic universes (and many of these models are “unintended” – pesky perversions in which the set of naturals ω is uncountably large, as one example – but we’ll put this aside for this post). Most of the axioms act as constraints that all sets must follow. For instance, the axiom of pairing says that “For any sets x and y, there must exist another set containing as elements just x and y and nothing else”. This is an axiom that begins with universal quantification over the sets of the universe, and then states some requirement that must hold of all these sets.

The anti-set program is what you get when you take each of these restriction axioms, and negate the restriction it imposes on all sets. So, for instance, the axiom of anti-pairing says that for any sets x and y, there must NOT exist any set {x, y}. Contrast this with the simple negation of pairing, which would tell us only that there exist two sets x and y such that their pair doesn’t exist. The anti-axiom is much stronger than the negated axiom, in that it requires NO pairs to exist.

Not all axioms begin with universal quantifiers, in particular the axiom of infinity, which simply asserts that a set exists that satisfies a certain property. To form the axiom of anti-infinity, we simply negate the original axiom (so that no sets with that property exist).

As it turns out, the anti-set program, if applied to ALL the axioms of ZFC, ends in disaster and paradox. In particular, a contradiction can be derived from anti-comprehension, from anti-replacement, and from anti-extensionality. We don’t handle these cases the same way. Anti-comprehension and anti-replacement are simply discarded, being too difficult to patch. By contrast, anti-extensionality is replaced by ordinary extensionality. What’s up with that? The philosophical justification is simply that extensionality, being the most a priori of the bunch, is needed to justify us calling the objects in our universe sets at all.

There’s one last consideration we must address, which regards the axiom of anti-choice. I am currently uncertain as to whether adding this axiom makes the theory inconsistent. One thing that is currently known about the axiom of anti-choice is that with its addition, out go all finite models (there’s a really pretty proof of this that I won’t include here). In the rest of this post, I will be excluding anti-choice from the axioms and only exploring models of anti-ZF.

With that background behind us, let me list the axioms of anti-ZF.

Anti-Pairing
∀x∀y∀z∃w (w ∈ z ∧ w ≠ x ∧ w ≠ y)
No set is the pair of two others.

Anti-Union
∀x∀y∃z (z ∈ y ∧ ¬∃w (z ∈ w ∧ w ∈ x))
No set is the union of another.

Anti-Powerset
∀x∀y∃z (z ∈ y ∧ ∃w (w ∈ z ∧ w ∉ x))
No set contains all existing subsets of another.

Anti-Foundation
∀x∀y (y ∈ x → ∃z (z ∈ x ∧ z ∈ y))
Every set’s members have at least one element in common with it.

Anti-Infinity
∀x∃y ((y is empty ∧ y ∉ x) ∨ (y ∈ x ∧ ∃z (z = S(y) ∧ z ∉ x)))
Every set either doesn’t contain all empty sets, or has an element whose successor is outside the set.

If the axioms of ZF are considered to be a maximally nice setting for mathematics, then perhaps the axioms of anti-ZF can be considered to be maximally bad for mathematics.

We need to now address some issues involving the axiom of anti-infinity. First, the abbreviations used: “y is empty” is shorthand for “∀z (z ∉ y)” and “z = S(y)” is shorthand for “∀w (w ∈ z ↔ (w ∈ y ∨ w = y))” (i.e. z = y ⋃ {y}).

Second, it turns out that from the other axioms we can prove that there are no empty sets. So the first part of the disjunction is always false, meaning that the second part must always be true. Thus we can simplify the axiom of anti-infinity to the following statement, which is logically equivalent in the context of the other axioms:

Anti-Infinity
∀x∃y (y ∈ x ∧ ∃z (z = S(y) ∧ z ∉ x))
Every set has an element whose successor is outside the set.

Let’s now prove some elementary consequences of these axioms.

Theorems

>> There are no empty sets.
Anti-Union ⊢ ¬∃x∀y (y ∉ x)
Suppose ∃x∀y (y ∉ x). Call this set ∅. So ∀y (y ∉ ∅) Then ⋃∅ = ∅. But this implies that the union of ∅ exists. This contradicts anti-union.

>> There are no one-element sets.
Anti-Pairing ⊢ ¬∃x∃y∀z (z ∈ x ↔ z = y)
Suppose that ∃x∃y∀z (z ∈ x ↔ z = y). Then ∃x∃y∀z (z ∈ x ↔ (z = y ∨ z = y)). But this is a violation of anti-pairing, as then x would be the pair of y and y. Contradiction.

>> There are no two-element sets.
Anti-Pairing ⊢ ¬∃x∃y∃z∀w (w ∈ x ↔ (w = y ∨ w = z))
Suppose that ∃x∃y∃z∀w (w ∈ x ↔ (w = y ∨ w = z)). Then w is the pair of y and z, so anti-pairing is violated. Contradiction.

>> No models of anti-ZF have just N-element sets (for any finite N).
Suppose that a model of anti-ZF had only N-element sets. Take any of these sets and call it X. By anti-infinity, X must contain an element with a successor that is outside the set. Call this element Y and its successor S(Y). Y cannot be its own successor, as then its successor would be inside the set. This means that Y ∉ Y. Also, Y is an N-element set by assumption. But since Y ≠ S(Y), S(Y) must contain all the elements of Y in addition to Y itself. So S(Y) contains N+1 elements. Contradiction.

>> There is no set of all sets.
Suppose that there is such a set, and call it X. By Anti-Infinity, X must contain an element Y with a successor that’s outside of X. But no set is outside of X, by assumption! Contradiction.

>> No N-element model of anti-ZF can have N-1 sets that each contain N-1 elements.
Suppose that this were true. Consider any of these sets that contain N-1 elements, and call it X. By the same argument made two above, this set must contain an element whose successor contains N elements. But there are only N sets in the model, so this is a set of all sets! But we already know that no such set can exist. Contradiction.

>> Every finite set of disjoint sets must be its own choice function.
By a choice function for a set X, I mean a set that contains exactly one element in common with each element of X. Suppose that X is a finite set of disjoint sets. Let’s give the elements of X names: A1, A2, A3, …, AN. By anti-foundation, each An must contain at least one element of X. Since the Ans are disjoint, these elements cannot be the same. Also, since there are only N elements of X, no An can contain more than one element of X. So each element of X contains exactly one element of X. Thus X is a choice function for X!

The next results come from a program I wrote that finds models of anti-ZF of a given size.

>> There are no models of size one, two, three or four.

>> There are exactly two non-isomorphic models of size five.

Here are pictures of the two:

Now, some conjectures! I’m pretty sure of each of these, especially Conjecture 2, but haven’t been able to prove them.

Conjecture 1: There’s always a set that contains itself.

Conjecture 2: There can be no sets of disjoint sets.

Conjecture 3: In an N-element model, there are never less than three sets with fewer than N-1 elements.

How to draw lambda diagrams

If you don’t want spoilers for my puzzle a few days ago, don’t read ahead!

I think lambda diagrams are extremely cool, and haven’t seen any detailed description on how they work online. I’ll start by showing some very simple examples of lambda diagrams, and then build up to more complicated ones.

First of all, what are lambda diagrams? They are pictorial representations of lambda expressions, and hence count as a pictorial system for a large portion of mathematics. I will assume that you understand how lambda calculus works for this post, and if you aren’t familiar then you can do no better than this video for a basic introduction.

The Basics

Let’s start simple: the lambda expression for “True”: λx.λy.x

Each time we see a new bound variable, we draw a horizontal line to represent that variable. So parsing from left to right, we start with a horizontal line for x.

Next is “λy.“, so we insert another horizontal line for y (below the first horizontal line).

Next we have an x in the function body, which we represent by a vertical line leaving the x line:

And there we have it, the lambda diagram for True!

Now let’s look at False: λx.λy.y

We start the same way as before, with two horizontal lines for our variables x and y:

But now we have a y in our function body, not an x. So we draw our vertical line leaving the y line instead of the x line:

And that’s the diagram for False! The actual diagram needn’t have the labels “x” and “y” in it, because our convention that we draw lines for new variables below existing horizontal lines uniquely specifies which lines stand for which variables.

Let’s now do a slightly more complicated function: λx.λy.y x

As before, we start with our two variables, x and y.

Now we have a y in the function body, so we add a vertical line leaving the y bar:

Next is an x, which is being fed as input to the y. We draw this as follows:

Two things have happened here: First I’ve moved the original vertical line for y over to the left to make space. Next, I’ve represented “feeding x to y as input” as a line leaving the x bar and terminating at the vertical line for y.

Take a guess what λx.λy.x y would look like!

Here it is:

Next, let’s look at λx.λy.x x y (which is incidentally the “or” function).

We start with the introduction of the variables x and y.

Next, an x:

And another x:

And finally a y:

Notice that this y connects below the first connection, to the original branch. What would it mean if it were connected to the second branch?

As you can see, this would indicate that y is first fed to x, and then the result of THAT is fed to x.

With these principles, you should now be able to draw out diagrams for much more complicated lambda expressions. Try this one: λx.λy.λz.z (y x) (y z x) x

Here it is!

Make sure that this makes sense before moving on!

Lambda Expressions within Lambda Expressions

Next, we’ll look at how to deal with lambda expressions that contain new lambda expressions within their function body. Here’s a simple example: λx.λy.x (λz.z) y

Everything up until the third λ is the same as always:

Now, to deal with our new variable z, we draw a new horizontal line. But importantly, this horizontal line comes after the vertical line for the x that has already been used!

After the “λz.” we have a “z“, so we draw a line from the z bar to our original vertical line leaving x.

And finally, after all of this we feed y to the original vertical line:

Try this one: λx.λy.x (λz.z z) (λw.λv.v (w w)) y

And here’s another one, where the inside function uses outside variables: λx.λy.x (λz.y z)

Alright, the final category you need to know to understand how to draw any lambda diagram whatsoever is…

Function Application

Suppose we have the lambda expression (λx.λy.x) (λz.z z). We first draw the lambda diagrams for each of the two component expressions side by side:

And now we simply attach the second to the first, indicating that the entire second lambda expression is fed as input to the first!

Here’s another example for you to try: (λx.x) (λy.λz.z z y) (λw.λv.v (v w))

And one more example, this one using all the tricks we’ve seen so far: (λx.x (λy.λz.z y y) (λw.λv.w (v x))) (λu.u u)

Beta Reduction

The final thing to know about lambda diagrams is how to actually do computations with them. The basic idea is that when we have one lambda diagram fed as input to another, we can substitute the “input” diagram for each vertical line leaving the topmost horizontal bar, and then erase this bar. Let’s look at some examples:

You can also do function application within larger lambda diagrams. Take a look at the following example, which is a calculation that shows that the successor of zero is one:

The first beta reduction here is just like the previous ones we’ve seen. But the second one does a substitution within the main lambda expression, as does the third. This works in much the same way as the earlier reductions we saw, the primary difference being that references to variables outside the scope of the function being applied must be maintained. You can see this in the final step above, where we remove the line representing the variable y, and attach the line connecting to it to the line representing the variable x.

Now for the fun part! I’ve found some great animations of calculations using lambda diagrams on Youtube. Each of them is using just the rules I’ve described above, nothing more! And I must say that the music is delightful. Take a look!

Beautiful!