Nonstandard integers, rationals, and reals

In discussions of first-order logic and set theory, one often hears about nonstandard natural numbers and their unusual properties. But rarely do you hear about nonstandard integers, nonstandard rationals, or nonstandard reals. This post will be devoted to the ways in which these other number systems pick up unintended interpretations in first-order logic, and in particular first-order ZFC.

Nonstandard ℕ

Nonstandard interpretations of the natural numbers are the most well-known, so I won’t spend much time on them. In ZFC, the set of natural numbers is called ω, and is defined to be the intersection of all inductive sets. An inductive set is one that includes zero and is closed under successorship (if n is in the set, then so is S(n)).

This definition feels so right when you think about it for a little while that it’s hard to see how it could go wrong. How could there be anything OTHER than the standard natural numbers in the intersection of all inductive sets?

Well, one can prove from the compactness of first order logic that there must be such nonstandard numbers, and the proof is so simple and pretty that it’s probably already appeared on this blog six or seven times. Add to ZFC a constant K and the axioms “K ∈ ω”, “K > 0”, “K > 1”, “K > 2”, and so on for all the standard naturals (this is a decidable set of sentences). Every finite subset of this new theory has a model, so by compactness the theory as a whole also has a model. This model is nonstandard by construction, and is also a model of ZFC by monotonicity (removing axioms never removes models).

So how do nonstandard naturals appear in the intersection of all inductive sets? The key term to focus on is the “all” in “all inductive sets”. While ZFC guarantees the existence of an inductive set (by the axiom of infinity) and of every definable subset of this set (by comprehension/separation), it does not actually guarantee the existence of ALL subsets of this set. (This is a general feature of first-order theories: they can guarantee the existence of all definable subsets of a set, but not of all subsets.) There are models where the inductive set given to us by the axiom of infinity is enormous (uncountably large, seen from outside the model), and every inductive subset of it that exists in the model contains, in addition to the standard naturals, elements that sit atop infinitely descending membership chains and that contain every standard natural. In these models, ω is not just the standard naturals, but also includes these infinite elements, these numbers with infinitely many predecessors. And the proof from compactness shows us that we can’t eliminate these nonstandard natural numbers.

Nonstandard ℤ

The usual way of defining the integers in ZFC is as equivalence classes of pairs of natural numbers. In particular, we define the equivalence relation ~ on ω×ω as follows: (a, b) ~ (c, d) if and only if a + d = b + c. Two ordered pairs of naturals are equivalent under this relation so long as their difference is the same. So (1, 0) is in the same class as (2, 1), which is in the same class as (15, 14).

The idea here is that the integer represented by the equivalence class of (a, b) is supposed to be what we think of as a – b. So the integer 2 is the equivalence class {(2, 0), (3, 1), (4, 2), (5, 3), …}. And the integer -2 is the equivalence class {(0, 2), (1, 3), (2, 4), (3, 5), …}. For shorthand, we’ll write the equivalence class of (a, b) as [a, b]. Thus for each natural number n, the integer n is [n, 0] and the integer -n is [0, n].

Addition and multiplication can be defined on the integers in terms of addition and multiplication on ω.

[a, b] + [c, d] = [a + c, b + d]
[a, b] ⋅ [c, d] = [a⋅c + b⋅d, a⋅d + b⋅c]

Check to make sure that this makes sense in light of our interpretation of [a, b] as a – b. We can also define new operations on the integers that didn’t apply to the naturals, like negation.

-[a, b] = [b, a]

And we can define an order on the integers as follows:

[a, b] < [c, d] if and only if a + d < b + c

Check to make sure that this makes sense.
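
If it helps to see these definitions in action, here is a small sketch in Python (my own illustration, not part of the ZFC construction), with integers represented by pairs of naturals:

```python
# A small sketch: integers as pairs of naturals, where (a, b) stands for a - b.

def equivalent(p, q):
    (a, b), (c, d) = p, q
    return a + d == b + c          # (a, b) ~ (c, d)  iff  a + d = b + c

def add(p, q):
    (a, b), (c, d) = p, q
    return (a + c, b + d)          # [a, b] + [c, d] = [a + c, b + d]

def mul(p, q):
    (a, b), (c, d) = p, q
    return (a*c + b*d, a*d + b*c)  # [a, b] * [c, d] = [ac + bd, ad + bc]

def neg(p):
    a, b = p
    return (b, a)                  # -[a, b] = [b, a]

def less_than(p, q):
    (a, b), (c, d) = p, q
    return a + d < b + c           # [a, b] < [c, d]  iff  a + d < b + c

two, minus_two = (2, 0), (0, 2)
print(equivalent(add(two, minus_two), (0, 0)))        # True: 2 + (-2) ~ 0
print(equivalent(mul(minus_two, minus_two), (4, 0)))  # True: (-2)*(-2) ~ 4
print(less_than(minus_two, two))                      # True: -2 < 2
```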

Now, the integers are built out of ω, so it makes sense that the nonstandard interpretations of ω carry over to nonstandard interpretations of ℤ. But there are in fact two ways in which integers can be nonstandard.

(1) The natural number 0 refers to the same object (the empty set) in every model of ZFC. But the set of natural numbers ω refers to different objects in different models, as we saw a moment ago. In this sense, ω is not categorically defined, while 0 is. But since integers are infinite sets of pairs of elements of ω, individual integers aren’t categorically defined either!

For instance, the integer 0 is the set {(n, m) ∈ ω×ω | n = m}. The number of elements in this set is the same as the number of elements of ω. So the integer 0 has as many elements as ω! This means that in a nonstandard model of ω that has nonstandard numbers like K, 0 will also contain nonstandard ordered pairs like (K, K) and (K+1, K+1).

(2) Consider the nonstandard element of ω that we’re calling K, which is larger than every standard natural number. There’s an integer [K, 0], which is the equivalence class of the ordered pair (K, 0). [K, 0] > [1, 0], because K + 0 > 0 + 1. And [K, 0] > [55, 0], because K + 0 > 0 + 55. And so on for every finite integer. Just like ω, there are models in which ℤ has integers larger than all the standard integers.

But unlike ω, in nonstandard models ℤ also has nonstandard integers less than all standard integers. Consider -[K, 0] = [0, K]. Again, you can prove that this integer is less than the integer 0, the integer -55, and so on for every standard integer you can think of.

From now on I’ll write the integer [n, 0] as n and the integer [0, n] as -n.

Nonstandard ℚ

Just as the integers were equivalence classes of pairs of naturals, the rationals are equivalence classes of pairs of integers, where the second integer of each pair is required to be nonzero. We define a new equivalence relation (which I’ll use the same symbol for) as follows:

For integers a, b, c, and d, (a, b) ~ (c, d) if and only if a⋅d = b⋅c.

This equates any two pairs of integers that have the same ratio. So (1, 2) ~ (2, 4) ~ (15, 30), and (-33, 17) ~ (66, -34) ~ (-99, 51). The equivalence classes of this relationship are the rational numbers. Like before, we’ll write [a, b] to represent the equivalence class that (a, b) belongs to. You can think of [a, b] as representing the rational number a/b.

Defining addition, multiplication, subtraction, and division on the rationals is pretty easy:

[a, b] + [c, d] = [a⋅d + b⋅c, b⋅d]
[a, b] ⋅ [c, d] = [a⋅c, b⋅d]
[a, b] – [c, d] = [a⋅d – b⋅c, b⋅d]
[a, b] / [c, d] = [a⋅d, b⋅c]

And the order is similarly easy to define in terms of the order on integers:

[a, b] < [c, d] if and only if a⋅d < b⋅c (taking representatives with b and d positive)
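
And a similar sketch for the rationals (again just my own illustration), with rationals represented by pairs of integers:

```python
# Rationals as pairs of integers (a, b) with b nonzero, where (a, b) stands for a/b.

def equivalent(p, q):
    (a, b), (c, d) = p, q
    return a * d == b * c                        # (a, b) ~ (c, d)  iff  ad = bc

def add(p, q):
    (a, b), (c, d) = p, q
    return (a*d + b*c, b*d)

def mul(p, q):
    (a, b), (c, d) = p, q
    return (a*c, b*d)

def less_than(p, q):
    (a, b), (c, d) = p, q
    if b < 0: a, b = -a, -b                      # normalize to positive denominators
    if d < 0: c, d = -c, -d                      # so the comparison below is valid
    return a*d < b*c                             # [a, b] < [c, d]  iff  ad < bc

half, third = (1, 2), (-1, -3)
print(equivalent(add(half, third), (5, 6)))      # True: 1/2 + 1/3 = 5/6
print(less_than(third, half))                    # True: 1/3 < 1/2
```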

Once we’ve moved to ℚ, there’s an explosion of nonstandard elements. For instance, we still have nonstandard rationals larger than every ordinary rational (like [K, 1]). But now we also have rationals like [1, K]. This rational is smaller than every ordinary positive rational [1, 10], [1, 100], [1, 1000], and so on. And it is greater than the rational number 0! So in other words, we now have infinitesimal rational numbers!

What’s more, we can construct a rational like [K+1, K]. This guy is infinitesimally larger than the rational number 1. So is [2⋅K+1, 2⋅K], but this one is even closer to 1 than the previous. And we also have [K²+1, K²], which is even closer!

The nonstandard rational number line is crowded with these infinitesimal and infinite elements. The rationals are dense, and the nonstandard rationals are too: between any two distinct nonstandard rationals q and q′, there’s another nonstandard rational (q + q′) / 2. But as we’ve just seen, there are pairs of nonstandard rationals that are “too close together” for any standard rational to fit between them.

If you’ve heard of the hyperreals, you might be wondering whether the nonstandard rationals have the same structure. The simple reason why they don’t is that the hyperreals fill in gaps that the rationals (standard or not) leave open. For instance, ZFC can prove that there is no rational [a, b] such that [a, b] ⋅ [a, b] = 2, but there is a hyperreal whose square is 2. To fill in these gaps in the number line, we have to move on to the next step, the reals.

Nonstandard ℝ

The construction of the reals from the rationals is slightly different from the previous constructions. Instead of considering ordered pairs of rationals, we’ll consider sets of rationals. In particular, we’ll look at nonempty sets of rationals that are closed downwards (if n is in the set and m < n, then m is in the set also), and have no greatest element, but don’t include all rationals.

Each real number is one of these subsets, and ℝ is the set of all such subsets. So for instance, the real number √2 is the set {x ∈ ℚ : x ≤ 0 ∨ x² < 2}. The order on the reals is simple to describe:

x ≤ y if and only if x ⊆ y

Addition is also easily defined:

x + y = {r + s : (r ∈ x) ∧ (s ∈ y)}

Multiplication of reals has a messier definition, but it’s nothing too crazy. And importantly, ZFC can prove that ℝ has the following feature: any non-empty subset of ℝ which is bounded above has a least upper bound.
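
In case you’re curious, here is one standard way to spell out that messier definition, at least for cuts x and y that represent non-negative reals (i.e. cuts containing every negative rational). I’m filling this in purely for illustration; the remaining cases are handled via negation and the usual sign rules:

x ⋅ y = {q ∈ ℚ : q < 0} ∪ {r⋅s : r ∈ x, s ∈ y, r ≥ 0, s ≥ 0}
-x = {q ∈ ℚ : there is some r > q with -r ∉ x}

You can check, for instance, that multiplying the cuts for 2 and 3 this way gives exactly the cut for 6, and that multiplying any non-negative cut by the cut for 0 gives back the cut for 0.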

Now, in nonstandard models, there are real numbers like K, 1/K, 1 + 1/K², and so on. But now there are also real numbers like √K, 2^K, log(K), and any other function you can define on the reals, applied to infinite reals and infinitesimals. So in one sense, the nonstandard models have many more reals than the “ordinary reals” we learn about in calculus.

But the way we constructed the reals as subsets of the rationals opens up a new type of nonstandard phenomenon, in which there end up being many fewer reals than we expect there to be. Remember that we identified reals with subsets of the rationals, and that earlier we said that there is in general no way to guarantee the existence of all subsets of an infinite set. The same applies here: when we say that ℝ = {A ⊆ ℚ | A is a Dedekind cut}, we are actually only guaranteeing the existence of those reals that are definable as Dedekind cuts. So for instance, ZFC can prove the existence of √2 as a real number, because ZFC can define the Dedekind cut {x ∈ ℚ : x ≤ 0 ∨ x² < 2}. Same with most real numbers you’re probably familiar with, like π and e. But considering that there are only countably many definitions in ZFC, and uncountably many reals, there must be uncountably many reals that are undefinable! These undefinable reals cannot be proven to exist, and thus there are models in which they don’t exist.

In fact, the set ℝ is the first of the sets we’ve discussed that not only has nonstandard interpretations in which it’s too large, but also nonstandard interpretations in which it’s too small. There are models of ZFC where ℝ has only countably many items! There’s a subtlety here, which is that ZFC can prove that |ℝ| > |ω|. How is this possible if there are models where ℝ is countable?

It’s possible because these models, which are missing many real numbers, are ALSO missing lots of functions, in particular the functions that would put ℝ and ω in bijective correspondence! So even though ℝ is “in reality” countable in these models, the model itself doesn’t know that, because it’s missing the functions that prove its countability.

Transfinite Nim: uncomputable games and games whose winner depends on the Continuum Hypothesis

In the game of Nim, you start with piles of various (whole number) heights. Each step, a player chooses one pile and shrinks it by some non-zero amount. Once a pile’s height has been shrunk to zero, it can no longer be selected by a player for shrinking. The winner of the game is the one that takes the last pile to zero.

Here’s a sample game of Nim:

Starting state
3, 2
After Frank’s move
2, 2
After Marie’s move
2, 1
After Frank’s move
0, 1
After Marie’s move
0, 0

Marie takes the last pile to zero, so she is the winner. Frank’s second-to-last move was a big mistake; by reducing the first pile from 2 to 0, he left the only remaining pile free to be taken by Marie. In a game of Nim, you should never leave only one pile remaining at the end of your turn. If Frank had instead shrunk the first pile from 2 to 1, then the state of the piles would be (1, 1). Marie would be forced to shrink one of the two piles to zero, leaving Frank to take the final pile and win.

The strategy of Nim with two piles is extremely simple: in your turn you should always even out the two piles if possible. This is only possible if the heights are different at the start of your turn. See if you can figure out why this strategy guarantees a win!
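
Here’s a quick sketch of that strategy in code (my own illustration, pitting the “even out the piles” player against a random opponent):

```python
import random

def evening_strategy(piles):
    """If the piles differ, shrink the larger one to match the smaller one."""
    big, small = max(piles), min(piles)
    if big == small:                   # can't even them out; just empty a pile
        return piles.index(big), big
    return piles.index(big), big - small

def random_strategy(piles):
    """Shrink some nonempty pile by a random nonzero amount."""
    i = random.choice([j for j, p in enumerate(piles) if p > 0])
    return i, random.randint(1, piles[i])

def play(piles, strategies, names):
    turn = 0
    while any(piles):
        i, amount = strategies[turn](piles)
        piles[i] -= amount
        if not any(piles):
            return names[turn]         # this player took the last pile to zero
        turn = 1 - turn

# Marie plays the evening strategy and moves first on unequal piles, so she always wins.
print(play([3, 2], [evening_strategy, random_strategy], ["Marie", "Frank"]))
```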

Transfinite Nim is a version of Nim where the piles are allowed to take infinite ordinal values. So for instance, a game might have the starting position:

Starting state
ω² + ω, ω1 + ε0

If Marie is moving first, then can she guarantee a win? What move should she make?

It turns out that the strategy for two-pile Transfinite Nim is exactly the same as for Finite Nim. Marie has a guaranteed win, because the two piles are different values. Each move she’ll just even the piles out. So for her first move, she should do the following:

Starting state
ω² + ω, ω1 + ε0
After Marie’s move
ω² + ω, ω² + ω

No matter what Frank does next, Marie can just “copy” that move on the other pile, guaranteeing that Marie always has a move as long as Frank does. This proves that Marie must have the last move, and therefore win.

One important feature of Transfinite Nim is that even though we’re dealing with infinitely large piles, every game can only last finitely long. In other words, Frank has no strategy for delaying his loss infinitely long, and thus forcing a sort of “stalemate by exhaustion.” This is because the ordinals are well-ordered, and any strictly decreasing sequence of ordinals must terminate. (Why? Just consider the definition of a well-ordered set: every subset has a least element. If the game were to continue infinitely long, then some pile would be shrunk infinitely many times, and the successive values of that pile would form a set of ordinals with no least element!)

Although the strategy of Transfinite Nim is no more interesting than Finite Nim, the game does have some interesting features that it inherits from the ordinals. There are sets of ordinal numbers such that the ordering between them is uncomputable (i.e. no Turing machine can take as input representations of any two ordinals from that set and correctly assess which one is larger). For such sets, the ability to compute a winning strategy is called into question.

For instance, the set of all countable ordinals is uncomputable. The quick proof is that there are uncountably many countable ordinals – otherwise in ZFC the set of countable ordinals would itself be a countable ordinal and would thus contain itself – and any Turing machine can only compare countably many things.

The smallest uncomputable ordinal (which, in ZFC, is exactly the set of all computable ordinals) is called the Church-Kleene ordinal and written ω1CK. Imagine the starting state of the game is two different ordinals that are both larger than ω1CK. If you’re moving first, then you have to determine which of the two ordinals is larger, in order to even them out. But this is not in general possible! So even if you go first and the two piles are different sizes, you might not be able to guarantee a win.

Suppose Marie is allowed uncomputable strategies, and Frank is only allowed computable strategies. Suppose further that the starting state involves two ordinals A and B, both larger than the Church-Kleene, and that the ordinals are expressed in some standard notation (so that you can’t write the same ordinal two different ways). There are a few cases.

Case 1: A = B, Marie goes first.
Marie decreases one of the two ordinals. Despite not being able to compute the order on the ordinals, Frank can just mimic her move. This will continue until Frank wins.

Case 2: A = B, Frank goes first.
Frank decreases one of the two ordinals, and Marie mimics. Marie eventually wins.

Case 3: A ≠ B, Marie goes first.
Marie can tell which of the ordinals is larger, and decreases that one to even out the two piles. Marie wins.

Case 4: A ≠ B, Frank goes first.
Frank can’t tell which of the ordinals is larger and can’t try to even them out, as doing so might result in an invalid move (trying to increase the smaller pile to the height of the larger one). So Frank does some random move, after which Marie is able to even out the two piles. Marie wins.

Finally, here’s a starting state for a game of Transfinite Nim:

ω1, ℶ1

ω1 is the first uncountable ordinal, and ℶ1 is the first ordinal with continuum cardinality. Frank goes first. Does he have a winning strategy?

It depends on whether ω1 = ℶ1. If the two are equal, then Frank can’t win, because he’s starting with two even piles. And if ω1 < ℶ1, then Marie can’t win, because Frank can decrease the ℶ1 pile to ω1.

If we suppose that the players must be able to prove a move’s validity in ZFC before playing that move, then the first player couldn’t decrease the ℶ1 pile to ω1. The first player still has to do something, and whatever he does will change the state to two ordinals that are comparable by ZFC.

What about larger starting ordinals whose size comparison is independent of ZFC, like ω15 and ℶ15? If the new state after the first player’s move also involves two ordinals whose size comparison is independent of ZFC, then the second player will also be unable to even them out. This continues until one of them eventually decreases a pile to an ordinal whose size is comparable by ZFC to the other pile. So the winner will depend on who knows more pairs of ordinals less than the starting values with values that ZFC can’t compare. In fact, each player wants to force the other player to make the values ZFC-comparable, so they’ll be able to even the piles out on their turn.

A Coloring Problem Equivalent to the Continuum Hypothesis

Summary
Is there an infinity in between ℵ0 and 2^ℵ0? Is there a way to color the plane with countably many colors so that there are no monochromatic right triangles lined up with the x and y axes? If these questions seem unrelated to you, then we had the same initial reaction. But it turns out that an intermediate infinity exists if and only if no such coloring exists.

CH and Coloring

Imagine taking the plane ℝ² and coloring it entirely. You are allowed infinitely many colors, but only a countable infinity (so no using the whole continuous spectrum). Some example colorings:

The colorings don’t have to be contiguous on the plane like in the ones you just saw. They can have individual points of one color such that neighborhoods of arbitrarily small radius around them are colored differently. So we can get wacky colorings like:

Think about right triangles on the plane that line up with the x and y axes. For instance:

For each of these triangles, of the three sides one must be parallel to the x-axis and another to the y-axis. Now, for a given coloring, we can ask if there exist any triangles whose corners are all the same color. Importantly, we’re only interested in the three corner points, not the entire side length. For the earlier colorings we looked at, we can find such triangles fairly easily:

Even for our wacky coloring, it shouldn’t be too hard to find three points of the same color that meet the criterion.

Now here’s a question for you to ponder: is there any way to color the plane so that no such monochromatic right triangles exist?

I encourage you to pause here and try your best to come up with such a coloring. Use this as a chance to develop some intuitions on whether you think that a coloring like this should exist or not. It’s important to keep in mind that the number of possible right triangles on the plane is 2^ℵ0, while the number of colors you have available to you is just ℵ0.

(…)

(pause for thought)

(…)

Now, I shall prove to you that the existence of such a coloring is equivalent to the Continuum Hypothesis! The rest of this post will be more involved, and even though I’ve tried to make it less scary with pictures aplenty, it will most likely take a slow and attentive read-through to grasp. But the proof is really cool, so it’s worth it. Oh and also, I have assumed some things as background knowledge for the sake of space. In particular, understanding the proof requires that you understand what ordinals are and why the set of all countable ordinals is the first uncountable ordinal.

If the Continuum Hypothesis is true, then a coloring with no monochromatic right triangles exists.

The proof outline: Consider the first uncountable ordinal, ω1, which is the set of all countable ordinals. We construct a coloring on ω1 × ω1 that satisfies the no-monochromatic-right-triangles property. If the Continuum Hypothesis is true, then |ω1| = |ℝ|, so we form a bijection from ω1 to ℝ. Finally, we use this bijection to map our coloring on ω1 × ω1 to a coloring on ℝ × ℝ, and show that coloring also satisfies the same property.

Alright, so consider ω1 × ω1. If we were to visualize it, it would look something like this:

This “plane” is discrete, so in a certain sense it much more closely resembles ℕ² than ℝ². The set of colors we’ll be using will be labeled with integers. We split the colors into two categories, the negatives N and the non-negatives P.

N = {-1, -2, -3, …}
P = {0, 1, 2, 3,…}

For each α in ω1, we fix two injections fα: α → N and gα: α+1 → P. Such injections exist because every element of ω1 is countable.

We use these injections to construct our coloring. For each α in ω1, we assign the color fα(β) to (α, β) for each β < α, and we assign the color gα(β) to (β, α) for each β ≤ α.

Since each fα and gα is an injection, no colors are repeated along the path from (α, 0) to (α, α) to (0, α). And we can easily show that no monochromatic right triangles exist under this coloring!

If a triangle crosses y = x, then it has at least one point colored from N and another from P. N and P are disjoint, so the triangle cannot be monochromatic.

And if a triangle doesn’t cross y = x, then it has two points lined up so as to have different colors:

So we have no monochromatic triangles! Now, assuming the Continuum Hypothesis, |ω1| = |ℝ|, so we can form a bijection M from ω1 to ℝ. We use M to map our original coloring to a new coloring on ℝ × ℝ: if (a, b) ∈ ω1 × ω1 is colored C, then we color (M(a), M(b)) ∈ ℝ × ℝ with C. Since M is a bijection, every axis-aligned right triangle in ℝ × ℝ is the image of an axis-aligned right triangle in ω1 × ω1 with exactly the same corner colors, so it cannot be monochromatic. So we’ve constructed a coloring on ℝ × ℝ with no monochromatic triangles!
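
Before moving on, here’s a small sketch of the same coloring idea carried out on ℕ × ℕ rather than ω1 × ω1 (my own finite illustration, using m ↦ −(m+1) and m ↦ m as the injections); a brute-force search confirms that no axis-aligned monochromatic right triangles show up:

```python
def color(x, y):
    # Below the diagonal: color -(y + 1), an injection into the negatives.
    # On or above the diagonal: color x, an injection into the non-negatives.
    return -(y + 1) if y < x else x

def has_monochromatic_right_triangle(n):
    """Brute-force check over an n-by-n grid: the right-angle corner sits at (x, y),
    with one corner above/below it at (x, y2) and one beside it at (x2, y)."""
    for x in range(n):
        for y in range(n):
            for y2 in range(n):
                if y2 == y:
                    continue
                for x2 in range(n):
                    if x2 == x:
                        continue
                    corners = [(x, y), (x, y2), (x2, y)]
                    if len({color(a, b) for a, b in corners}) == 1:
                        return True
    return False

print(has_monochromatic_right_triangle(30))   # False
```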

If a coloring with no monochromatic right triangles exists, then the Continuum Hypothesis is true.

The proof outline here is to show that any infinite subset A of ℝ either has cardinality ℵ0 or 2^ℵ0. This shows that there are no intermediate cardinalities between these two.

Consider any infinite subset A ⊆ ℝ. For any a, define Ya to be the y-coordinates of points along the vertical line through (a, 0) that have no color repeats on that line. So Ya = {y ∈ ℝ | the color of (a, y) is different from (a, y’) for all y’ ≠ y}.

Notice that Ya must be countable, because all its elements have different colors and there are only countably many different colors to have. So |Ya| ≤ ℵ0.

Define YA to be the union of Ya, for each a in A. Since each Ya is countable and A is at least ℵ0, |YA| ≤ |A|⋅ℵ0 = |A|. There are two cases: either YA = ℝ, or YA ≠ ℝ.

Case 1: YA = ℝ. In this case |ℝ| = |YA| ≤ |A|, so |A| ≥ |ℝ|. Since A is a subset of ℝ, this implies that |A| = |ℝ|.

Case 2: YA ≠ ℝ. In this case, there’s some real number y* in ℝ – YA. Since y* is outside YA, it’s outside Ya for every a in A; so for each x ∈ A, the vertical line {x} × ℝ contains at least one point besides (x, y*) with the same color as (x, y*).

Each (x, y*) with x ∈ A must have a different color. Proof: Suppose not. Then we have two elements of A, x1 and x2, such that the color of (x1, y*) is the same as the color of (x2, y*). By the last paragraph’s result, we can pick a third point (x1, y’) that has the same color as (x1, y*). These three points form a monochromatic right triangle, but we’ve assumed that’s impossible! Contradiction.

So the points of A × {y*} must all have different colors. Thus A × {y*} is at most countable. A is an infinite set, so |A| = ℵ0.

So in case one |A| = |ℝ| = 2^ℵ0, and in case two |A| = ℵ0. And this is true for every infinite subset A of ℝ. So every subset of ℝ is either finite, countable, or has the same cardinality as ℝ. And this is the continuum hypothesis!

No known unknowns in Peano arithmetic

Reports that say that something hasn’t happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don’t know we don’t know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones.

Donald Rumsfeld

Rumsfeld’s statement may have been correct about American politics, but he was wrong in the context of Peano arithmetic. In Peano arithmetic, every known is a known known, and every unknown is an unknown unknown. Let me explain.

A paragraph for background: PA expresses enough arithmetic to allow it to encode sentences like “φ is provable from the axiom set T” as arithmetic properties of the natural numbers. But Peano arithmetic also has nonstandard models containing objects that aren’t natural numbers. It’s possible for a sentence about arithmetic to be true of these nonstandard models and false of the standard model. And in particular, the sentences that Gödel-encode proofs of other sentences can take on a different meaning in nonstandard models. For instance, the Gödel encoding of a sentence like “PA is consistent” looks like “There is no number with the property that it encodes a proof of 0=1 from the axioms of PA”. This might be true of the standard natural numbers, but false in some nonstandard model. There might be some nonstandard number that satisfies Gödel’s arithmetic formula for a proof of 0=1 from the axioms of PA, but which doesn’t actually encode any such proof (as only standard natural numbers Gödel-encode valid proofs). Gödel encodings are only logically equivalent to the statements they encode in the standard model of the natural numbers.
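
To make “sentences as arithmetic properties of numbers” a bit more concrete, here’s a toy version of one classic encoding trick (prime-power coding). This is just an illustration of the flavor of Gödel numbering, not the specific encoding Gödel used:

```python
def first_primes(k):
    """The first k primes, by trial division (fine for toy inputs)."""
    primes, n = [], 2
    while len(primes) < k:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes

def encode(symbol_codes):
    """Encode a sequence of symbol codes as 2^(s1+1) * 3^(s2+1) * 5^(s3+1) * ..."""
    n = 1
    for p, s in zip(first_primes(len(symbol_codes)), symbol_codes):
        n *= p ** (s + 1)
    return n

def decode(n):
    """Recover the sequence of symbol codes by reading off prime exponents."""
    codes, p = [], 2
    while n > 1:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        if exponent:
            codes.append(exponent - 1)
        p += 1
    return codes

# e.g. assign "0" -> 0, "=" -> 1, "S" -> 2, and encode the string "S0=0"
print(encode([2, 0, 1, 0]))    # 2^3 * 3^1 * 5^2 * 7^1 = 4200
print(decode(4200))            # [2, 0, 1, 0]
```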

Of all the sentences of the form “φ is provable from the axiom set T”, which are provable from the axioms of Peano arithmetic? How much does Peano arithmetic know about what arbitrary formal theories prove? For all of the following results, I will assume that PA is consistent.

Our first result: If PA proves “φ is provable from T”, then φ really is provable from T.

This follows from the soundness of first-order logic. If PA proves “φ is provable from T”, then this sentence must be true in all models. In particular, it’s true in the standard model. And if it’s true in the standard model, then it must actually be true that φ is provable from T.

Second result: If φ is provable from T, then PA proves “φ is provable from T.”

This follows from three facts: (1) that the standard natural numbers are a subset of every nonstandard model, (2) that the sentence “φ is provable from T” is a statement of the form “There exists a number with some property”, and (3) that first-order logic is complete.

The Gödel encoding of “φ is provable from T” is something like “there’s some number n with the property that n encodes a proof of φ from T.” Now, suppose that φ is provable from T. Then there’s a standard natural number that encodes this proof. And since every nonstandard model contains every natural number, this number exists in all these models as well! So the statement is true in all models. So by completeness, it’s provable from PA.

So far we have that PA proves “φ is provable from T” if and only if φ is actually provable from T. Loosely, if a theory can prove something, then Peano arithmetic knows this. Even though ZFC is a stronger theory than Peano arithmetic, there’s nothing that ZFC can prove that PA doesn’t know it can prove. In particular, ZFC proves the consistency of PA, so PA proves that ZFC proves the consistency of PA! This of course doesn’t mean that PA proves its own consistency, because PA doesn’t trust ZFC to be true (it doesn’t accept the axioms of ZFC).

Furthermore, for every sentence that PA can prove, PA can prove that PA can prove it. In this sense, everything that PA knows is a known known. What we’ll see next is that for PA, every unknown is an unknown unknown. For Peano arithmetic, and in fact for ANY consistent theory that expresses enough arithmetic to be subject to Gödel’s theorems, there are no known unknowns.

Third result: If φ is not provable from PA, then “φ is not provable from PA” is not provable from PA.

This follows from Gödel’s second incompleteness theorem and the principle of explosion.

Now, the proof of our third result. Suppose “φ is not provable from PA” is provable from PA. By the principle of explosion, if PA were inconsistent then EVERYTHING would be provable from it. So by proving that some particular sentence isn’t provable from its axioms, PA is able to prove its own consistency! But then, by Gödel’s second incompleteness theorem, PA must be inconsistent. This contradicts our background assumption that PA is consistent.

Note that the sentence “φ is not provable from PA” isn’t the same type of sentence as “φ is provable from PA”. As we saw a moment ago, the second is a sentence of the form “there’s some number n with a particular property,” and all of these sentences are provable from PA if and only if they’re true of the standard naturals. But the first sentence is of the form “there’s no number n with a particular property”, which is not the same! It’s possible for this sentence to be true of the standard naturals but false in some nonstandard model. All we need is for there to be no standard number, but at least one nonstandard number, with that property.

So if PA is consistent, then it can never prove that it doesn’t prove something. Now, notice that all that I’ve said applies more generally than just Peano arithmetic. For any consistent mathematical theory that express sufficient arithmetic for Gödel’s incompleteness theorems to apply, every known is a known known and every unknown is an unknown unknown.

The subtlety of Gödel’s second incompleteness theorem

Gödel’s second incompleteness theorem is an order of magnitude more subtle than his first. It’s commonly summarized as “no consistent theory strong enough to do arithmetic can prove its own consistency.” But there’s a lot of subtlety in both the “strong enough to do arithmetic” and the “prove its own consistency.” First of all, what exactly counts as strong enough to do arithmetic? Peano arithmetic certainly does, but it has an infinite axiom schema. Do any finite theories meet the criterion of “strong enough to do arithmetic”? It turns out that the answer to this is yes! Robinson arithmetic, which is what you get if you remove the infinite axiom schema of induction from PA, meets the requirement.

There are weaker theories of arithmetic, like Presburger arithmetic (which has only addition) and Skolem arithmetic (only multiplication), that don’t meet the criterion, and are therefore not subject to the incompleteness theorems. And it turns out that both of these theories are actually decidable! This is even stronger than being sound and complete; soundness and completeness tell us that there’s an algorithm that determines after a finite amount of time that a given sentence is a theorem, but not necessarily that there’s an algorithm to determine that a sentence is NOT a theorem. Decidability gives us both: not only can we classify any theorem as such in a finite time, we can also classify any non-theorem as such in a finite time.

The amount of arithmetic we need is exactly the amount that Gödel uses to prove the incompleteness theorems. This has two parts: one, the theory must express enough arithmetic to do Gödel encoding (and therefore to express the notion of “provability”), and two, the theory must be able to formalize diagonalization. Dan Willard came up with theories that formalize enough arithmetic to do the first but not the second: these are theories that can talk about their own provability via Gödel coding, but are still too weak to be subject to the incompleteness theorems. Thus, these theories can actually prove their own consistency! These fascinating theories are called self-verifying theories.

Everything I’ve said so far has been about the subtlety of the notion of “strong enough to do arithmetic”. Now I want to talk about the subtlety of the notion of “proving one’s own consistency.” I’ll do this by first taking a brief interlude to talk about an argument I recently saw against the Consistency Thesis in philosophy of math that uses this notion. A friend of mine writes:

Consistency is a weak soundness property.

Here’s what I’ll call the Consistency Thesis: (Mathematical sentences are justified and/or true when they are part of a consistent theory.) What I want to do is to raise a problem for that thesis. Maintaining the Consistency Thesis leads to bizarre mathematical beliefs which don’t lend themselves to being true or justified. A merely ‘consistent’ theory can claim P(0), P(1), and so on, and yet claim that there’s some n for which ¬P(n). Such a theory is omega-inconsistent.

There are mathematical theories which despite their syntactic consistency, claim to be inconsistent. They will claim to be able to derive that 0=1 and yet never derive 0=1. One example of a consistent but unsound theory would be this: [suppose we add to ZFC the new axiom “ZFC is inconsistent”. If ZFC is consistent, then the new theory ZFC’ is consistent (by Godel’s second incompleteness theorem), even though it falsely proves that ZFC is inconsistent.]

So far, I’ve only mentioned omega-inconsistent theories. I should also mention that there are omega-consistent theories which are arithmetically unsound. A much stronger soundness property would be soundness in omega-logic, which entails consistency, omega-consistency, and arithmetical soundness.

Here’s the response I gave:

You say that an adherent to the Consistency Thesis believes in mathematical theories that are in fact consistent (no contradictions can be proved from them) but claim to be inconsistent; for instance ZFC + ¬Con(ZFC). This bit about “claiming to be inconsistent” is subtler than it might initially seem. There’s a very important difference between Con(ZFC) and “ZFC is consistent”. A first order theory can’t talk directly about its own consistency, it can only talk about properties of the objects in its models. We are allowed an indirect method to talk about consistency of theories, Gödel encoding. But this method has problems.

Gödel encoding allows us to write down statements that, if understood to be about the natural numbers, are equivalent to the assertion that a theory proves a contradiction. But this “if understood to be about the natural numbers” is a very important qualification, because no first order theory categorically defines the natural numbers (i.e. no first order theory has as its only model the natural numbers). More generally, no theory within a logic that has a sound, complete, and finitary proof system categorically describes the natural numbers (these are the only logics that a Formalist will see as well-defined, by the way).

What this means is that when we write “Con(ZFC)”, we’re actually using a short-hand for a complicated sentence about the objects in our models, and this complicated sentence is NOT equivalent to the claim that no contradiction can be proven from ZFC. Con(ZFC) could be false in a model even if ZFC is consistent, and Con(ZFC) could be true in a model even if ZFC is inconsistent, so long as that model is not the standard natural numbers.

So the adherent of the Consistency Thesis is not actually committed to weird beliefs about a theory being consistent while claiming its own inconsistency; they are just committed to the belief that the natural numbers are not well-defined.

The same objection applies to the claim that they have to accept as valid theories that claim P(0), P(1), and so on, but also that there’s some n for which ¬P(n). That’s true, and that’s fine! One can just say that such theories are not theories of the standard natural numbers; the n for which ¬P(n) is some other type of mathematical object that is not a natural number.

A TL;DR for my response: “Con(T)” only means “T is consistent” if T is about the natural numbers. Furthermore, the theories that assert their own inconsistency never have the natural numbers as a model. So it’s ultimately not very weird that these theories assert “¬Con(T)”… this statement doesn’t actually mean “T is inconsistent” in any of the models of the theory!

Undecidability results on lambda diagrams

For background on how lambda diagrams work, look HERE.

The equivalence of lambda calculus with Turing machine computations, and the logical limitations on computation, combine to give us interesting limitative results on lambda diagrams.

Reducibility, Absolute and Contingent

We begin by defining a notion of reducibility. A lambda diagram is reduced if there is no beta reduction that can be applied to the diagram. Here are some examples of reduced diagrams:

And here are some examples of non-reduced diagrams:

For each of these diagrams, if you keep applying beta reductions in any order you will eventually get to a point where the diagram is reduced. But this is not the case for all lambda diagrams. Check out the following diagram for M M (the Mockingbird function applied to itself):

Applying beta reduction gives you the following:

And we’re back where we started! This tells us that our diagram is irreducible.

Sometimes there are multiple beta reductions that can be applied. In these cases, it might be that one sequence of beta reductions reduces the diagram, but another sequence goes forever without reducing it. For instance, here is the diagram for False (M M):

If we first feed (M M) as input to False, then we get a reduced diagram:

But if we first feed M to M, then what we get back is the same thing we started with:

If we keep feeding M to M, we keep getting back the same diagram and never fully reduce. When a diagram reduces on some series of beta reductions and never reduces on others, we call the diagram contingently reducible. If every strategy of beta reduction eventually reduces the diagram, we call it absolutely reducible. And if no strategy reduces the diagram, we call it absolutely irreducible.

Some interesting observations:

It’s possible for two diagrams A and B to both be absolutely reducible, but for A B to be absolutely irreducible.

It’s also possible for a diagram A to be absolutely irreducible, but the diagram A B to be contingently reducible (though never absolutely reducible).

It’s not, however, possible for A to be absolutely irreducible but A B to be absolutely reducible. Why? Simply because you can choose your reduction strategy for A B to be any of the reduction strategies for A, each of which always fails. In general, if any sub-diagram within a lambda diagram is contingently reducible, then the whole diagram cannot be absolutely reducible. Why? Simply because you can choose your reduction strategy for the whole diagram to be a reduction strategy that fails for the sub-diagram (at least one such strategy must exist, by our assumption that the sub-diagram is only contingently reducible). This same reduction strategy that fails for the sub-diagram, also fails for the whole diagram. So the whole diagram cannot be absolutely reducible.

Here’s a question that I don’t know the answer to: Is it possible for an application of an irreducible diagram to an irreducible diagram to be reducible? I suspect not.

The distinction between contingent and absolute reducibility goes away the moment you fix a general reduction strategy that for any diagram gives a unique prescription for which reduction should be applied (if any are possible). Henceforth, I will assume that a particular reduction strategy has been fixed, and so will only speak of reducibility and irreducibility, no absolutes and contingents.

Reducibility Oracles

It might be convenient if we could always know whether a given diagram reduces or not. We might even wonder if there’s a lambda diagram that “computes” the answer to this question. Let’s suppose there is such a diagram, which we’ll call a “reducibility oracle.” It’s a black box, so we’ll denote its diagram as a box with an R:

We’ll define the behavior of the reducibility oracle as follows:

We’ll now design a larger diagram R’ that uses R:

Let’s see how R’ behaves when given an input (call it F).

Notice that if F F is reducible, then R’ F is irreducible. And if F F is irreducible, then R’ F is reducible.
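
The picture for R’ isn’t reproduced here, but one way to realize a diagram with this behavior in ordinary lambda notation (my reconstruction, assuming R returns the Church booleans True and False) is:

R’ = λf.R (f f) (M M) (λx.x)

If R judges f f reducible, this hands back M M, which never reduces; if R judges f f irreducible, it hands back the identity function, which reduces immediately.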

So what happens if we feed R’ to itself? Well, by the logic of the above paragraph, if R’ R’ is reducible, then R’ R’ is irreducible. And if R’ R’ is irreducible, then R’ R’ is reducible. We have obtained a contradiction! And so no such reducibility oracle diagram can exist.

But wait, there’s more!

Let’s define S(N) to be the size of the smallest lambda diagram that reduces to the Church numeral N, where size means the number of lines required to build the diagram.

There’s a tight correspondence between lambda calculus and programming languages. In fact, lambda calculus can be thought of as just one specific highly abstract functional programming language (along the lines of Haskell and Lisp). Recall now that the Kolmogorov complexity K of a string s (given a particular programming language) is defined as the length of the shortest program that outputs that string. If we choose our programming language to be lambda calculus, then we get a correspondence between K(N) and S(N). This correspondence gives us the following results:

There is no lambda diagram that serves as a “size oracle” – i.e. a lambda diagram that when fed a number N, returns S(N). (Analogous to the uncomputability of Kolmogorov complexity)

For any sound proof system F, there’s a number L such that F cannot prove, of any particular number N, that S(N) > L (i.e. that the smallest lambda diagram reducing to N has more than L lines). (Analogous to Chaitin’s incompleteness theorem.)

Sizes of lambda diagrams

Let me make some final notes on the function S(N). Every number has a “standard form”: λfx.f (f (… (f x))), with N copies of f. The lambda diagram for this looks like an upside-down ladder with N steps:

This has 2N + 3 lines, so we know that for all N, S(N) ≤ 2N + 3. We can define a number as “irreducibly complex” if S(N) = 2N + 3. For instance, 0 and 1 are irreducibly complex, and I think that 2 and 3 are as well:

This raises an interesting question: What is the smallest number that isn’t irreducibly complex? Another question is whether there are arbitrarily large irreducibly complex numbers (I strongly suspect not, but am not positive).

We can find some upper bounds for S(N), where N is a product or a power, as follows:

In general, for any computable function f whatsoever, there is a lambda diagram for f, to which we can append the diagram for any number N to represent f(N). The size of this diagram will just be the size of the diagram for f, plus the size of the diagram for N, plus 1 (for the line connecting the two). This means that if a number M can be written as f(N) for some computable f, then we can produce a diagram for M whose size is some constant + 2N. Thus S(f(N)) grows at most linearly with respect to N, no matter how fast-growing f is.

The Anti-Set Program

In ZFC set theory, we specify a collection of sentences within a first-order language to count as our axioms. The models of this collection of sentences are the set-theoretic universes (and many of these models are “unintended” – pesky perversions in which the set of naturals ω is uncountably large, as one example – but we’ll put this aside for this post). Most of the axioms act as constraints that all sets must follow. For instance, the axiom of pairing says that “For any sets x and y, there must exist another set containing as elements just x and y and nothing else”. This is an axiom that begins with universal quantification over the sets of the universe, and then states some requirement that must hold of all these sets.

The anti-set program is what you get when you take each of these restriction axioms, and negate the restriction it imposes on all sets. So, for instance, the axiom of anti-pairing says that for any sets x and y, there must NOT exist any set {x, y}. Contrast this with the simple negation of pairing, which would tell us only that there exist two sets x and y such that their pair doesn’t exist. The anti-axiom is much stronger than the negated axiom, in that it requires NO pairs to exist.

Not all axioms begin with universal quantifiers, in particular the axiom of infinity, which simply asserts that a set exists that satisfies a certain property. To form the axiom of anti-infinity, we simply negate the original axiom (so that no sets with that property exist).

As it turns out, the anti-set program, if applied to ALL the axioms of ZFC, ends in disaster and paradox. In particular, a contradiction can be derived from anti-comprehension, from anti-replacement, and from anti-extensionality. We don’t handle these cases the same way. Anti-comprehension and anti-replacement are simply discarded, being too difficult to patch. By contrast, anti-extensionality is replaced by ordinary extensionality. What’s up with that? The philosophical justification is simply that extensionality, being the most a priori of the bunch, is needed to justify us calling the objects in our universe sets at all.

There’s one last consideration we must address, which regards the axiom of anti-choice. I am currently uncertain as to whether adding this axiom makes the theory inconsistent. One thing that is currently known about the axiom of anti-choice is that with its addition, out go all finite models (there’s a really pretty proof of this that I won’t include here). In the rest of this post, I will be excluding anti-choice from the axioms and only exploring models of anti-ZF.

With that background behind us, let me list the axioms of anti-ZF.

Anti-Pairing
∀x∀y∀z∃w (w ∈ z ∧ w ≠ x ∧ w ≠ y)
No set is the pair of two others.

Anti-Union
∀x∀y∃z (z ∈ y ∧ ¬∃w (z ∈ w ∧ w ∈ x))
No set is the union of another.

Anti-Powerset
∀x∀y∃z (z ∈ y ∧ ∃w (w ∈ z ∧ w ∉ x))
No set contains all existing subsets of another.

Anti-Foundation
∀x∀y (y ∈ x → ∃z (z ∈ x ∧ z ∈ y))
Every set’s members have at least one element in common with it.

Anti-Infinity
∀x∃y ((y is empty ∧ y ∉ x) ∨ (y ∈ x ∧ ∃z (z = S(y) ∧ z ∉ x)))
Every set either doesn’t contain all empty sets, or has an element whose successor is outside the set.

If the axioms of ZF are considered to be a maximally nice setting for mathematics, then perhaps the axioms of anti-ZF can be considered to be maximally bad for mathematics.

We need to now address some issues involving the axiom of anti-infinity. First, the abbreviations used: “y is empty” is shorthand for “∀z (z ∉ y)” and “z = S(y)” is shorthand for “∀w (w ∈ z ↔ (w ∈ y ∨ w = y))” (i.e. z = y ⋃ {y}).

Second, it turns out that from the other axioms we can prove that there are no empty sets. So the first part of the disjunction is always false, meaning that the second part must always be true. Thus we can simplify the axiom of anti-infinity to the following statement, which is logically equivalent in the context of the other axioms:

Anti-Infinity
∀x∃y (y ∈ x ∧ ∃z (z = S(y) ∧ z ∉ x))
Every set has an element whose successor is outside the set.

Let’s now prove some elementary consequences of these axioms.

Theorems

>> There are no empty sets.
Anti-Union ⊢ ¬∃x∀y (y ∉ x)
Suppose ∃x∀y (y ∉ x). Call this set ∅, so ∀y (y ∉ ∅). Then ⋃∅ = ∅, which makes ∅ the union of a set (namely itself). This contradicts anti-union.

>> There are no one-element sets.
Anti-Pairing ⊢ ¬∃x∃y∀z (z ∈ x ↔ z = y)
Suppose that ∃x∃y∀z (z ∈ x ↔ z = y). Then ∃x∃y∀z (z ∈ x ↔ (z = y ∨ z = y)). But this is a violation of anti-pairing, as then x would be the pair of y and y. Contradiction.

>> There are no two-element sets.
Anti-Pairing ⊢ ¬∃x∃y∃z∀w (w ∈ x ↔ (w = y ∨ w = z))
Suppose that ∃x∃y∃z∀w (w ∈ x ↔ (w = y ∨ w = z)). Then x is the pair of y and z, so anti-pairing is violated. Contradiction.

>> No models of anti-ZF have just N-element sets (for any finite N).
Suppose that a model of anti-ZF had only N-element sets. Take any of these sets and call it X. By anti-infinity, X must contain an element with a successor that is outside the set. Call this element Y and its successor S(Y). Y cannot be its own successor, as then its successor would be inside the set. This means that Y ∉ Y. Also, Y is an N-element set by assumption. But since Y ≠ S(Y), S(Y) must contain all the elements of Y in addition to Y itself. So S(Y) contains N+1 elements. Contradiction.

>> There is no set of all sets.
Suppose that there is such a set, and call it X. By Anti-Infinity, X must contain an element Y with a successor that’s outside of X. But no set is outside of X, by assumption! Contradiction.

>> No N-element model of anti-ZF can have N-1 sets that each contain N-1 elements.
Suppose that this were true. Consider any of these sets that contain N-1 elements, and call it X. By the same argument made two above, this set must contain an element whose successor contains N elements. But there are only N sets in the model, so this is a set of all sets! But we already know that no such set can exist. Contradiction.

>> Every finite set of disjoint sets must be its own choice function.
By a choice function for a set X, I mean a set that contains exactly one element in common with each element of X. Suppose that X is a finite set of disjoint sets. Let’s give the elements of X names: A1, A2, A3, …, AN. By anti-foundation, each An must contain at least one element of X. Since the Ans are disjoint, these elements cannot be the same. Also, since there are only N elements of X, no An can contain more than one element of X. So each element of X contains exactly one element of X. Thus X is a choice function for X!

The next results come from a program I wrote that finds models of anti-ZF of a given size.
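
I don’t have the original program, but here’s a minimal sketch of how such a search could go: represent a candidate universe of size n by listing each set’s members, and test extensionality together with the anti-axioms directly. (It enumerates labeled structures, so isomorphic models show up more than once, and it gets slow around n = 5.)

```python
from itertools import product

def is_anti_zf_model(elems):
    """elems[x] is the set of y with y ∈ x; check extensionality plus the anti-axioms."""
    n = len(elems)
    univ = range(n)
    def successor(y):                        # the set z with z = y ∪ {y}, if it exists
        target = elems[y] | {y}
        return next((z for z in univ if elems[z] == target), None)
    # Extensionality: distinct sets have distinct collections of members.
    if len({frozenset(e) for e in elems}) != n:
        return False
    # Anti-Pairing: for all x, y, z there is some w ∈ z with w ≠ x and w ≠ y.
    if not all(any(w not in (x, y) for w in elems[z])
               for x, y, z in product(univ, repeat=3)):
        return False
    # Anti-Union: for all x, y there is some z ∈ y belonging to no element of x.
    if not all(any(all(z not in elems[w] for w in elems[x]) for z in elems[y])
               for x, y in product(univ, repeat=2)):
        return False
    # Anti-Powerset: for all x, y there is some z ∈ y that is not a subset of x.
    if not all(any(not elems[z] <= elems[x] for z in elems[y])
               for x, y in product(univ, repeat=2)):
        return False
    # Anti-Foundation: every y ∈ x shares at least one element with x.
    if not all(elems[y] & elems[x] for x in univ for y in elems[x]):
        return False
    # Anti-Infinity (simplified form): every x has some y ∈ x whose successor
    # exists in the model and lies outside x.
    return all(any(successor(y) is not None and successor(y) not in elems[x]
                   for y in elems[x])
               for x in univ)

def anti_zf_models(n):
    """All membership structures on n labeled sets (2^(n*n) candidates, so keep n small)."""
    for masks in product(range(2 ** n), repeat=n):
        elems = [{i for i in range(n) if mask >> i & 1} for mask in masks]
        if is_anti_zf_model(elems):
            yield elems

print(sum(1 for _ in anti_zf_models(4)))   # 0: consistent with "no models of size four"
```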

>> There are no models of size one, two, three or four.

>> There are exactly two non-isomorphic models of size five.

Here are pictures of the two:

Now, some conjectures! I’m pretty sure of each of these, especially Conjecture 2, but haven’t been able to prove them.

Conjecture 1: There’s always a set that contains itself.

Conjecture 2: There can be no sets of disjoint sets.

Conjecture 3: In an N-element model, there are never fewer than three sets with fewer than N-1 elements.

Pictures of some extremely large numbers

(Note: in here are spoilers for my puzzle a couple days ago. Also, familiarity with both lambda calculus and lambda diagrams is assumed; here’s a link for the latter.)

In lambda calculus, numbers are sort of like adverbs; they tell you how many times to do something. So “2” is like “twice”; it’s a function that takes in an f and an x and returns f of f of x, written in lambda calculus as f (f x). “3” is like “thrice”; 3 f x = f (f (f x)).

One of the reasons that this turns out to be a really nice definition of numbers is that it makes arithmetic very simple, especially with large numbers. For instance, the operation of taking a number to the power of another number can be defined extremely simply as follows:

↑ = λnm.m n

In other words, we do an m-fold application of the operation “do an n-fold application.”

Once we’ve defined exponentiation, we can pretty easily get tetration too. n ↑↑ m is defined to be n ↑ (n ↑ (… ↑ n)), with m copies of n. But this is just m-fold application of the operation (n ↑) to 1. So we can write:

↑↑ = λnm.m (↑ n) 1

Similarly, for pentation (or hyper-5), we can write:

↑↑↑ = λnm.m (↑↑ n) 1

And in general, for any N, we can write:

↑N+1 = λnm.m (↑N n) 1

Together with our definition of ↑, this gives us a nice construction of all the hyperoperations:

↑ = λnm.m n
↑N+1 = λnm.m (↑N n) 1

There’s a problem though. If we actually tried to write out something like ↑100 just in terms of the basic syntax of the lambda calculus, our expression would end up extremely long; we’d have to write something like ↑99, ↑98, ↑97, and so on. We can do better than this by “automating” the process of going from ↑N to ↑N+1 (after all, we’re doing the same thing each time!). So we define a function that implements this process, arbitrarily choosing to name it g:

g = λfnm.m (f n) 1

Some examples of how g behaves:

↑↑ = g ↑
↑↑↑ = g (↑↑) = g (g ↑)
↑↑↑↑ = g (↑↑↑) = g (g (g ↑))

You might already see where this is going. Now to get to ↑N+1, we just need to do an N-fold application of g to ↑! So we can write:

↑N+1 = N g ↑

Now writing out something massive like (3 ↑257 3) merely requires writing out the functions g, ↑, 3, and 256, each of which is pretty simple. And the end result is much more concise than what we would have gotten using our previous construction.
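
Since lambda calculus is just a (very abstract) programming language, we can mirror these definitions directly in Python to sanity-check them on tiny inputs (my own illustration; anything beyond the smallest arguments will blow up immediately):

```python
def church(k):
    """The Church numeral k: takes f and returns the k-fold composition of f."""
    def numeral(f):
        def apply_k_times(x):
            for _ in range(k):
                x = f(x)
            return x
        return apply_k_times
    return numeral

def unchurch(n):
    return n(lambda x: x + 1)(0)

one = church(1)

up = lambda n: lambda m: m(n)                       # ↑ = λnm.m n      (n ↑ m = n^m)
g  = lambda f: lambda n: lambda m: m(f(n))(one)     # g = λfnm.m (f n) 1

three = church(3)
print(unchurch(up(three)(church(4))))               # 3 ↑ 4  = 81
up2 = g(up)                                         # ↑↑ = g ↑         (tetration)
print(unchurch(up2(three)(church(2))))              # 3 ↑↑ 2 = 27
up3 = g(g(up))                                      # ↑↑↑ = g (g ↑)
print(unchurch(up3(three)(one)))                    # 3 ↑↑↑ 1 = 3
```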

Here are the lambda diagrams for what we have so far:

There’s a bit of redundancy here that can be removed, shrinking the diagrams:

But suppose now we want to go a step further. Let’s define the following extremely fast-growing sequence of numbers:

a0 = 1
a1 = 3 ↑(a0 + 1) 3
a2 = 3 ↑(a1 + 1) 3

Some of you may recognize this sequence as closely resembling the usual construction of Graham’s number. It’s not quite the same as this construction, but Graham’s number is approximately a66. This sequence can be encoded into lambda calculus using the same trick as before: define a new function which takes us from aN to aN+1, and then apply it n-fold to a0. Let’s call this function h.

h = λa.a g ↑ 3 3

Verify for yourself that h takes us from one number in our sequence to the next!

a1 = h a0
a2 = h a1 = h (h a0) = 2 h a0
a3 = h a2 = h (h a1) = h (h (h a0)) = 3 h a0

We can construct a general function now that takes in a number n and returns an:

a = λn.n h a0

We can do a little better than this by removing some redundancy. Notice the repeated 3s in the definition of h; we can build this repetition into our definition of g and get an even more concise representation of massive numbers.

g’ = λfn.n (f n) n
h’ = λa.a g’ ↑ 3
a’ = λn.n h’ 1

Here’s what these new and improved functions look like:

And finally, here’s an image of a256, much much larger than Graham’s number, to hang on your wall:

Keep in mind that Graham’s number is really really big. From wiki: “it is so large that the observable universe is far too small to contain an ordinary digital representation of Graham’s number, assuming that each digit occupies one Planck volume, possibly the smallest measurable space. But even the number of digits in this digital representation of Graham’s number would itself be a number so large that its digital representation cannot be represented in the observable universe. Nor even can the number of digits of that number—and so forth, for a number of times far exceeding the total number of Planck volumes in the observable universe.”

It’s pretty wild to think that if you start applying beta reductions to this diagram, the lambda diagram you’ll build up (the normal form of the expression) will be so large that it could not be represented within the entire observable universe!

How to draw lambda diagrams

If you don’t want spoilers for my puzzle a few days ago, don’t read ahead!

I think lambda diagrams are extremely cool, and haven’t seen any detailed description on how they work online. I’ll start by showing some very simple examples of lambda diagrams, and then build up to more complicated ones.

First of all, what are lambda diagrams? They are pictorial representations of lambda expressions, and hence count as a pictorial system for a large portion of mathematics. I will assume that you understand how lambda calculus works for this post, and if you aren’t familiar then you can do no better than this video for a basic introduction.

The Basics

Let’s start simple: the lambda expression for “True”: λx.λy.x

Each time we see a new bound variable, we draw a horizontal line to represent that variable. So parsing from left to right, we start with a horizontal line for x.

Next is “λy.“, so we insert another horizontal line for y (below the first horizontal line).

Next we have an x in the function body, which we represent by a vertical line leaving the x line:

And there we have it, the lambda diagram for True!

Now let’s look at False: λx.λy.y

We start the same way as before, with two horizontal lines for our variables x and y:

But now we have a y in our function body, not an x. So we draw our vertical line leaving the y line instead of the x line:

And that’s the diagram for False! The actual diagram needn’t have the labels “x” and “y” in it, because our convention that we draw lines for new variables below existing horizontal lines uniquely specifies which lines stand for which variables.

Let’s now do a slightly more complicated function: λx.λy.y x

As before, we start with our two variables, x and y.

Now we have a y in the function body, so we add a vertical line leaving the y bar:

Next is an x, which is being fed as input to the y. We draw this as follows:

Two things have happened here: First I’ve moved the original vertical line for y over to the left to make space. Next, I’ve represented “feeding x to y as input” as a line leaving the x bar and terminating at the vertical line for y.

Take a guess what λx.λy.x y would look like!

Here it is:

Next, let’s look at λx.λy.x x y (which is incidentally the “or” function).

We start with the introduction of the variables x and y.

Next, an x:

And another x:

And finally a y:

Notice that this y connects below the first connection, to the original branch. What would it mean if it were connected to the second branch?

As you can see, this would indicate that y is first fed to x, and then the result of THAT is fed to x.
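
By the way, if you want to double-check the claim that λx.λy.x x y really is “or”, you can evaluate the same terms as Python lambdas. The names TRUE, FALSE, OR, and decode below are mine, purely for the check:

TRUE  = lambda x: lambda y: x        # λx.λy.x
FALSE = lambda x: lambda y: y        # λx.λy.y
OR    = lambda x: lambda y: x(x)(y)  # λx.λy.x x y

# a Church boolean selects one of its two arguments, so this decodes it
decode = lambda b: b(True)(False)

for p in (TRUE, FALSE):
    for q in (TRUE, FALSE):
        print(decode(p), "or", decode(q), "=", decode(OR(p)(q)))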

With these principles, you should now be able to draw out diagrams for much more complicated lambda expressions. Try this one: λx.λy.λz.z (y x) (y z x) x

Here it is!

Make sure that this makes sense before moving on!

Lambda Expressions within Lambda Expressions

Next, we’ll look at how to deal with lambda expressions that contain new lambda expressions within their function body. Here’s a simple example: λx.λy.x (λz.z) y

Everything up until the third λ is the same as always:

Now, to deal with our new variable z, we draw a new horizontal line. But importantly, this horizontal line comes after the vertical line for the x that has already been used!

After the “λz.” we have a “z“, so we draw a line from the z bar to our original vertical line leaving x.

And finally, after all of this we feed y to the original vertical line:

Try this one: λx.λy.x (λz.z z) (λw.λv.v (w w)) y

And here’s another one, where the inside function uses outside variables: λx.λy.x (λz.y z)

Alright, the final category you need to know to understand how to draw any lambda diagram whatsoever is…

Function Application

Suppose we have the lambda expression (λx.λy.x) (λz.z z). We first draw the lambda diagrams for each of the two component expressions side by side:

And now we simply attach the second to the first, indicating that the entire second lambda expression is fed as input to the first!

Here’s another example for you to try: (λx.x) (λy.λz.z z y) (λw.λv.v (v w))

And one more example, this one using all the tricks we’ve seen so far: (λx.x (λy.λz.z y y) (λw.λv.w (v x))) (λu.u u)

Beta Reduction

The final thing to know about lambda diagrams is how to actually do computations with them. The basic idea is that when we have one lambda diagram fed as input to another, we can substitute the “input” diagram for each vertical line leaving the topmost horizontal bar, and then erase this bar. Let’s look at some examples:

You can also do function application within larger lambda diagrams. Take a look at the following example, which is a calculation that shows that the successor of zero is one:

The first beta reduction here is just like the previous ones we’ve seen. But the second one does a substitution within the main lambda expression, as does the third. This works in much the same way as the earlier reductions we saw, the primary difference being that references to variables outside the scope of the function being applied must be maintained. You can see this in the final step above, where we remove the line representing the variable y, and attach the line connecting to it to the line representing the variable x.
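
If you want to confirm the endpoint of that computation without drawing anything, here’s a quick sketch using Python lambdas for the usual Church encodings of zero, one, and successor. The names ZERO, ONE, SUCC, and to_int are mine:

ZERO = lambda f: lambda x: x                      # λf.λx.x
ONE  = lambda f: lambda x: f(x)                   # λf.λx.f x
SUCC = lambda n: lambda f: lambda x: f(n(f)(x))   # λn.λf.λx.f (n f x)

# a Church numeral applies f that many times, so counting applications decodes it
to_int = lambda n: n(lambda k: k + 1)(0)

print(to_int(SUCC(ZERO)), to_int(ONE))   # 1 1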

Now for the fun part! I’ve found some great animations of calculations using lambda diagrams on YouTube. Each of them uses just the rules I’ve described above, nothing more! And I must say that the music is delightful. Take a look!

Beautiful!

Fundamentals of Logic: Syntax, Semantics, and Proof

A while back, I wrote a several-part introduction to mathematical logic. I wrote this when I was first wading into the subject and noticed that a lot of the stuff I was learning was not easy to find online in a concise beginner-level essay. Since then, I’ve learned an enormous amount more and become a lot more comfortable with the basics of the field. Looking back on my introduction series, I’m pretty happy with it. There’s nothing in there that’s incorrect. But I do notice that when I read it over, I wish that I had framed certain things slightly differently. In particular, one of the distinctions that I think is absolutely central to mathematical logic – the distinction between syntax, semantics, and proof – is muddled a bit in the write-up, and proof is bundled in with syntax in a way that I’m not totally satisfied with.

Given that, and given that this series is for whatever reason now one of the most popular reads on my blog, I want to provide a brief update post. I sort of did this a week or so ago, with this post, which linked to the slides for a presentation I gave for a discussion group. But I want to do it for REAL now, as an actual post of its own. And honestly, I also just love this subject and am happy for any excuse to write about it more.

So, let me begin by laying out what seems to me to be the primary trichotomy that you’ll see pop up over and over again in logic: syntax, semantics, and proof. One at a time, then.

Syntax

The syntax of a logic is a detailing of the allowed symbols of a language and the conditions for their grammatical use. It’s a set of characters, along with a function that defines the grammar of the language by taking in (finite, in most logics) strings of these characters and returning true or false. This function is usually inductively defined. So for instance in propositional logic we have some set of characters designated to be our propositional variables (we’ll denote them p, q, r, …), and another set of characters designated to be our logical symbols (∧, ∨, ¬, →, and parentheses). We inductively define our grammar function G as follows: Every propositional variable is grammatical. If X and Y are grammatical strings, then so are (¬X), (X ∧ Y), (X ∨ Y), and (X → Y). And no other string is grammatical. This definition is what allows us to say that (p ∧ (¬(q ∨ r))) is grammatical but (p ∧ (¬(q ∨ ¬))) is not. The grammar for propositional logic is especially nice and simple, and first-order logic is only mildly more complicated. The important thing is that the syntax for these logics as well as higher order logics is algorithmically checkable; it’s possible to write a simple program that verifies whether any given input string counts as grammatical or not.
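
To make the “algorithmically checkable” claim concrete, here’s a minimal sketch of such a checker in Python, written straight from the inductive definition above. The particular variable set and the function name are my own choices:

VARIABLES = {"p", "q", "r", "s"}

def grammatical(s):
    s = s.replace(" ", "")             # ignore spacing
    if s in VARIABLES:
        return True
    if len(s) < 3 or s[0] != "(" or s[-1] != ")":
        return False                   # every compound string is parenthesized
    inner = s[1:-1]
    if inner.startswith("¬"):
        return grammatical(inner[1:])
    depth = 0                          # find the main connective at depth 0
    for i, ch in enumerate(inner):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch in "∧∨→" and depth == 0:
            return grammatical(inner[:i]) and grammatical(inner[i+1:])
    return False

print(grammatical("(p ∧ (¬(q ∨ r)))"))   # True
print(grammatical("(p ∧ (¬(q ∨ ¬)))"))   # False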

Semantics

The semantics of a logic is a system for assigning some sort of meaning to the grammatical strings of the language. There are different types of meanings that might be assigned, but the primary one for classical logic is truth and falsity. For propositional logic, our semantics is a set of truth functions, which are functions that take in grammatical strings and return either true or false (you’ve encountered these before if you’ve seen truth tables; each row in a truth table corresponds to a particular truth function). Not just any such function will do; these functions will have to satisfy certain constraints, such as that whatever a truth function assigns to a string X, it must assign the opposite value to the string (¬X), and that (A ∧ B) is assigned true if and only if both A and B are assigned true. These constraints on our functions are really what endow our strings with meaning; they induce a sort of structure on the truth values of strings that lines up with our intended interpretation of them.

For propositional logic, we give an inductive definition for our collection of valid truth functions. We construct a truth function by first doing any assignment of truth values to the propositional variables (p, q, r, and so on), and then defining what the function does to the rest of the strings in terms of these truth values. So for any strings X and Y, the truth function assigns true to (¬X) if and only if it assigns false to X. It assigns true to (X ∧ Y) if and only if it assigns true to X and true to Y. And so on. By our construction of the grammar of propositional logic, we’ve guaranteed that our function is defined on all grammatical strings. Of course, there are many such truth functions (one for every way of assigning truth values to the propositional variables) – and this collection of truth functions is what defines our semantics for propositional logic.
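
Here’s a small illustrative sketch of one such truth function in code. To keep it short, formulas are encoded as nested tuples like ("∧", X, Y) rather than as strings, and the assignment to the propositional variables is a dictionary; the encoding and names are mine.

def truth(formula, assignment):
    # extend an assignment of truth values on the variables to every formula
    if isinstance(formula, str):            # a propositional variable
        return assignment[formula]
    if formula[0] == "¬":
        return not truth(formula[1], assignment)
    a, b = truth(formula[1], assignment), truth(formula[2], assignment)
    return {"∧": a and b, "∨": a or b, "→": (not a) or b}[formula[0]]

# (p ∧ (¬(q ∨ r))) under the assignment p=True, q=False, r=False:
print(truth(("∧", "p", ("¬", ("∨", "q", "r"))),
            {"p": True, "q": False, "r": False}))   # True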

First order logic is again more complicated than propositional logic in its semantics, but not hugely so. When talking about the semantics of first-order logic, we call its truth functions models (or structures), and each model comes with a universe of objects that the quantifiers of the logic range over. An important difference between the semantics of propositional logic and the semantics of first order logic is that in the former, we can construct an algorithm that directly searches through the possible truth functions. In the latter, getting a similar sort of direct access to the models is much more difficult. Consider, for instance, that some models will have universes with infinitely many elements, meaning that to verify the truth of a universally quantified statement requires verifying it for infinitely many elements.

Last thing I need to say about semantics before moving on to proof: there is the all-important concept of semantic entailment. A set of sentences (call it A) is said to semantically entail another sentence (call it X), if every truth function that assigns everything in A true also assigns X true. When this is the case, we write: A ⊨ X.

Examples
{(p ∧ q)} ⊨ p
{(p ∨ q), (¬q)} ⊨ p
{p, (p→q), (q→r)} ⊨ r
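
Each of these can be verified by brute force: enumerate every assignment to the variables involved and check that no assignment makes all the premises true while making the conclusion false. Here’s a sketch, reusing the same tuple encoding and evaluator as above (again, all names are mine):

from itertools import product

def truth(f, v):
    if isinstance(f, str): return v[f]
    if f[0] == "¬": return not truth(f[1], v)
    a, b = truth(f[1], v), truth(f[2], v)
    return {"∧": a and b, "∨": a or b, "→": (not a) or b}[f[0]]

def entails(premises, conclusion, variables):
    # A ⊨ X: every assignment making everything in A true also makes X true
    for values in product([True, False], repeat=len(variables)):
        v = dict(zip(variables, values))
        if all(truth(p, v) for p in premises) and not truth(conclusion, v):
            return False
    return True

print(entails([("∧", "p", "q")], "p", ["p", "q"]))                              # True
print(entails([("∨", "p", "q"), ("¬", "q")], "p", ["p", "q"]))                  # True
print(entails(["p", ("→", "p", "q"), ("→", "q", "r")], "r", ["p", "q", "r"]))   # True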

Proof

Notice that we defined our semantics using complicated words like “functions” and “inductive definition”. More troubling, perhaps, we seem to have engaged in a little bit of circularity when we said in our definition of the semantics that (A ∧ B) is assigned true only when A and B are both assigned true. Suppose that some pesky philosopher were to come up to us and ask us what we meant by “only when A and B are both assigned true”. What would we have to say? We couldn’t refer to our formal definition of the semantics of ∧ to ground our meaning, as this definition uses the concept of “and” explicitly! The circularity is even more embarrassingly vivid when in first order logic we’re asked to define the semantics of “∀”, and we say “∀x φ(x) is true whenever φ(x) is true of every object in our model”. In other words, there’s a certain sense in which our semantics is a little hand-wavy and non-rigorous. It relies on the reader already having an understanding of mathematical concepts that are usually much more advanced than the ones being defined, and indeed implicitly relies on the reader’s understanding of the ones being defined in the first place!

With this criticism of semantics fresh in our heads, we might want to ask for a purely mechanical system that would take in some starting strings and churn out new strings that follow from them according to the semantics. Such a system wouldn’t rely on the user’s understanding of any higher-order concepts; instead a mere understanding of the concept of string manipulations would suffice. That is, we want to rigorously define such a procedure so that the results of the procedure mirror the structure that we know the semantics has, but without relying on any high-level concepts or circularity. We can instead just point to the mechanical process by which strings of these symbols are manipulated and be content with that. A proof system is just such a mechanical system.

There are multiple styles of proof systems, varying primarily with respect to how many inference rules and axioms they have. On the side of many axioms and few inference rules we have Hilbert-style systems. And on the side of many inference rules and few (sometimes zero) axioms we have Gentzen-style natural deduction systems. But hold on, what are these “axioms” and “inference rules” I’m suddenly bringing up? Axioms are simply strings that can be used in a proof at any time whatsoever. Inference rules are functions that take in some set of strings and produce new ones. A proof is just an ordered list of strings, where each is either an axiom or a result of applying some inference rule to earlier strings.

For propositional logic, one common proof system involves just three axioms and one inference rule. The three: X → (Y → X), (X → (Y → Z)) → ((X → Y) → (X → Z)), and ((¬Y) → (¬X)) → (X → Y). The inference rule is the famous modus ponens, which takes the form of a function that takes as input X and X → Y, and produces as output Y.

(The discerning amongst you might notice that the “three axioms” are actually three infinite axiom schemas; for each you can populate X, Y, and Z with any three grammatical strings you like and you get a valid instance of the axioms. The even more discerning might notice that these axioms only involve the → and ¬ symbols, but we seem to be missing ∧ and ∨. That’s correct; we neglect them for economy of our proof system and are allowed to do so because all sentences with ∧ and ∨ can be translated into sentences using just → and ¬. To expand our proof system to deal with ∧ and ∨, we simply add a few more axioms to incorporate these translations.)
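
To see how mechanical this all is, here’s a sketch of a checker for this style of proof system, with formulas encoded as nested tuples and each proof line recording whether it’s an axiom instance or a use of modus ponens on earlier lines. The encoding and function names are mine; the example run is the classic five-line proof of p → p.

def axiom1(X, Y):
    return ("→", X, ("→", Y, X))

def axiom2(X, Y, Z):
    return ("→", ("→", X, ("→", Y, Z)), ("→", ("→", X, Y), ("→", X, Z)))

def axiom3(X, Y):                         # not needed for this particular proof
    return ("→", ("→", ("¬", Y), ("¬", X)), ("→", X, Y))

def check_proof(lines):
    # each line is ("axiom", formula-built-by-a-schema) or ("mp", i, j),
    # i.e. modus ponens applied to line i (the antecedent) and line j (the implication)
    proved = []
    for line in lines:
        if line[0] == "axiom":
            proved.append(line[1])
        else:
            _, i, j = line
            antecedent, implication = proved[i], proved[j]
            assert implication[0] == "→" and implication[1] == antecedent
            proved.append(implication[2])
    return proved[-1]

p = "p"
p_implies_p = ("→", p, p)
proof = [
    ("axiom", axiom1(p, p_implies_p)),     # p → ((p → p) → p)
    ("axiom", axiom2(p, p_implies_p, p)),  # axiom 2 with X=p, Y=(p → p), Z=p
    ("mp", 0, 1),                          # (p → (p → p)) → (p → p)
    ("axiom", axiom1(p, p)),               # p → (p → p)
    ("mp", 3, 2),                          # p → p
]
print(check_proof(proof) == p_implies_p)   # True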

Soundness and Completeness

I’ve said that the purpose of a proof system is to mimic the semantics of a logic. I will now say more precisely what this mimicry amounts to.

There is a close cousin to the semantic-entailment relation (⊨). This is known as the syntactic entailment relation, and is written ⊢. We say that a set of sentences A syntactically entails a sentence X if there exists some proof that uses the sentences in A as assumptions and concludes with the sentence X. So A ⊢ X can be thought of as “X can be proven, provided that we assume A.”

To say that a proof system for a logic mimics the semantics of the logic is to say that the syntactic entailments and semantic entailments line up perfectly. This has two components:

Soundness: If A ⊢ X, then A ⊨ X.
Completeness: If A ⊨ X, then A ⊢ X.

There’s one other feature of a proof system that’s totally essential if it’s to do the job we want it to do; namely, that it should be actually computable. We should be able to write an algorithm that verifies whether any given sequence of strings counts as a valid proof according to the proof system. Since proofs can then be checked mechanically, the provable sentences can be churned out one by one by an algorithm, which is why this property is known as effective enumerability.

To have a sound, complete, and effectively enumerable proof system for a logic is to know that our semantics is on solid ground; that we’re not just waving our hands around and saying things that cannot be cashed out in any concrete way, but that in fact everything we are saying can be grounded in some algorithm. There’s a general philosophy in play here, something like “if I can compute it then it’s real.” We can rest a lot more comfortably if we know that our logic is computable in this sense; that our deductive reasoning can be reduced to a set of clear and precise rules that fully capture its structure.

Perhaps predictably, our next question is whether we really do have a sound, complete, and effectively enumerable proof system for a logic. It turns out that the answer depends on which logic is being discussed. This is already perhaps a little disconcerting; we might have hoped that every logic of any use can be nailed down concretely with a proof system. But alas, it’s not the case.

With that said, there do exist sound, complete, and effectively enumerable proof systems for propositional logic. Maybe that’s not so exciting, since propositional logic is pretty limited in terms of what sort of math you can do with it. But it turns out that sound, complete, and effectively enumerable proof systems ALSO exist for first-order logic! And first-order logic is where we form theories like Peano arithmetic (for natural number arithmetic) and ZFC (for set theory).

In other words, there’s a logic within which one can talk about things like numbers and sets, and the semantics of this logic can be made precise with a proof system. That’s great news! Math is on sound grounds, right? Well…

You might be skeptical right now, the phrase “incompleteness theorems” bubbling into your mind, and you’re right to be. While we can form languages in first order logic whose sentences look to be describing natural numbers (in PA) or sets (in ZFC), it turns out that none of these languages have the expressive power to actually pin down these structures uniquely. Said more carefully, there is no collection of first-order sentences whose only model, up to isomorphism, is the natural numbers. There’s a theorem called the Löwenheim-Skolem Theorem, which is one of the most shocking results in the field of mathematical logic. It says that if a first-order theory has any infinite model, then it has models of every infinite cardinality. So if we want to describe the natural numbers, we have to also be simultaneously describing structures of every infinite cardinality. At this point, this is getting into well-covered territory for this blog, so I’ll just direct you to these posts for more detail about how this works: here, here, and here.

So, in first order logic you have a proof system that perfectly mirrors your semantics but no theory whose semantics line up with the natural numbers. There are many ways we can strengthen our semantics to give us the strength to talk about the natural numbers categorically, the most popular of which is second-order logic. In second order logic, we are allowed to quantify not only over objects in our universe, but over sets of objects as well. This turns out to be a really big deal. In second-order logic, we can actually write down a set of sentences that uniquely pick out the natural numbers as their only model. But guess what? Second-order logic, with its standard semantics, has no sound and complete proof system!

This is a pattern that holds very generally. Either we have a logic with the expressive power to talk categorically about the natural numbers, in which case we have no sound and complete proof system, OR we have a logic with a sound and complete proof system, in which case we lack the ability to talk categorically about the natural numbers. No matter how we go, we end up at a loss as to how to ground the semantics of ℕ in an algorithmic procedure. And as a result, we also end up unable to have any algorithmic procedure that enumerates all the truths about natural numbers. This is what has become known as the incompleteness phenomenon.

I’ll stop there for now! Here’s a brief summary of the main concepts I’ve discussed. A logic is defined by its syntax, semantics, and proof system. The syntax details the allowed symbols of the language and which combinations are grammatical. The semantics details the possible truth assignments to these sentences, consistent with our intended interpretation of the symbols. And the proof system provides a mechanical procedure for deriving new strings from old strings. A set of strings X semantically entails a string Y if every truth assignment that satisfies all the strings in X also satisfies Y. A set of strings X syntactically entails Y if there’s some proof of Y from the assumptions of X. A proof system is said to be sound and complete if its syntactic entailment relation perfectly mirrors the semantic entailment relation. And it’s said to be effectively enumerable if there’s an algorithm for determining whether a given sequence of strings counts as a proof or not. First-order logic has a sound, complete, and effectively enumerable proof system. But it lacks the expressive power to talk categorically about the natural numbers. No set of sentences in a first-order language has the natural numbers as its unique model, even up to isomorphism. Second-order logic has the expressive power to talk categorically about the natural numbers, but it has no sound and complete proof system. In general, no logic has both (1) a sound, complete, and effectively enumerable proof system, and (2) theories that are categorical for the natural numbers.