# Undecidability results on lambda diagrams

For background on how lambda diagrams work, look HERE.

The equivalence of lambda calculus with Turing machine computations, and the logical limitations on computation, combine to give us interesting limitative results on lambda diagrams.

Reducibility, Absolute and Contingent

We begin by defining a notion of reducibility. A lambda diagram is reduced if there is no beta reduction that can be applied to the diagram. Here are some examples of reduced diagrams:

And here are some examples of non-reduced diagrams:

For each of these diagrams, if you keep applying beta reductions in any order you will eventually get to a point where the diagram is reduced. But this is not the case for all lambda diagrams. Check out the following diagram for M M (the Mockingbird function applied to itself):

Applying beta reduction gives you the following:

And we’re back where we started! This tells us that our diagram is irreducible.

Sometimes there are multiple beta reductions that can be applied. In these cases, it might be that one sequence of beta reductions reduces the diagram, but another sequence goes forever without reducing it. For instance, here is the diagram for False (M M):

If we first feed (M M) as input to False, then we get a reduced diagram:

But if we first feed M to M, then what we get back is the same thing we started with:

If we keep feeding M to M, we keep getting back the same diagram and never fully reduce. When a diagram reduces on some series of beta reductions and never reduces on others, we call the diagram contingently reducible. If every strategy of beta reduction eventually reduces the diagram, we call it absolutely reducible. And if no strategy reduces the diagram, we call it absolutely irreducible.

Some interesting observations:

It’s possible for two diagrams A and B to both be absolutely reducible, but for A B to be absolutely irreducible.

It’s also possible for a diagram A to be absolutely irreducible, but the diagram A B to be contingently reducible (though never absolutely reducible).

It’s not, however, possible for A to be absolutely irreducible but A B to be absolutely reducible. Why? Simply because you can choose your reduction strategy for A B to be any of the reduction strategies for A, each of which always fails. In general, if any sub-diagram within a lambda diagram is contingently reducible, then the whole diagram cannot be absolutely reducible. Why? Simply because you can choose your reduction strategy for the whole diagram to be a reduction strategy that fails for the sub-diagram (at least one such strategy must exist, by our assumption that the sub-diagram is only contingently reducible). This same reduction strategy that fails for the sub-diagram, also fails for the whole diagram. So the whole diagram cannot be absolutely reducible.

Here’s a question that I don’t know the answer to: Is it possible for an application of of an irreducible diagram to an irreducible diagram to be reducible? I suspect not.

The distinction between contingent and absolute reducibility goes away the moment you fix a general reduction strategy that for any diagram gives a unique prescription for which reduction should be applied (if any are possible). Henceforth, I will assume that a particular reduction strategy has been fixed, and so will only speak of reducibility and irreducibility, no absolutes and contingents.

Reducibility Oracles

It might be convenient if we could always know whether a given diagram reduces or not. We might even wonder if there’s a lambda diagram that “computes” the answer to this question. Let’s suppose there is such a diagram, which we’ll call a “reducibility oracle.” It’s a black box, so we’ll denote its diagram as a box with an R:

We’ll define the behavior of the reducibility behavior as follows:

We’ll now design a larger diagram R’ that uses R:

Let’s see how R’ behaves when given an input (call it F).

Notice that if F F is reducible, then R’ F is irreducible. And if F F is irreducible, then R’ F is reducible.

So what happens if we feed R’ to itself? Well, by the logic of the above paragraph, if R’ R’ is reducible, then R’ R’ is irreducible. And if R’ R’ is irreducible, then R’ R’ is reducible. We have obtained a contradiction! And so no such reducibility oracle diagram can exist.

But wait, there’s more!

Let’s define S(N) to be the size of the smallest lambda diagram that reduces to the Church numeral N, where size means the number of lines required to build the diagram.

There’s a tight correspondence between lambda calculus and programming languages. In fact, lambda calculus can be thought of as just one specific highly abstract functional programming language (along the lines of Haskell and Lisp). Recall now that the Kolmogorov complexity K of a string s (given a particular programming language) is defined as the shortest program that outputs that string. If we choose our programming language to be lambda calculus, then we get a correspondence between K(N) and S(N). This correspondence gives us the following results:

There is no lambda diagram that serves as a “size oracle” – i.e. a lambda diagram that when fed a number N, returns S(N). (Analogous to the uncomputability of Kolmogorov complexity)

For any sound proof system F, there’s a number L such that F cannot prove that the smallest lambda diagram that reduces to any number has more than L lines. (Analogous to Chaitin’s incompleteness theorem).

Sizes of lambda diagrams

Let me make some final notes on the function S(N). Every number has a “standard form”: λfx.f (f … (f x))), with N copies of f. The lambda diagram for this looks like an upside down ladder with N steps:

This has 2N + 3 lines, so we know that for all N, S(N) ≤ 2N + 3. We can define a number as “irreducibly complex” if S(N) = 2N + 3. For instance, 0 and 1 are irreducibly complex, and I think that 2 and 3 are as well:

This raises an interesting question: What is the smallest number that isn’t irreducibly complex? Another question is whether there are arbitrarily large irreducibly complex numbers (I strongly suspect not, but am not positive).

We can find some upper bounds for S(N), where N is a product or a power, as follows:

In general, for any computable function f whatsoever, there is a lambda diagram for f, to which we can append the diagram for any number N to represent f(N). The size of this diagram will just be the size of the diagram for f, plus the size of the diagram for N, plus 1 (for the line connecting the two). This means that if a number M can be written as f(N) for some computable f, then we can produce a diagram for M whose size is some constant + 2N. Thus S(f(N)) grows at most linearly with respect to N, no matter how fast-growing f is.

# Fractal Trees

Consider the following simple algorithm:

We start with a single painted particle moving upwards at a decreasing velocity, with a trajectory that is slowly rotating at a constant velocity. At each step, there is some small probability for the particle to “split” into two particles, one of which begins rotating in the opposite direction. And if a particle’s trajectory is ever about to start turning downwards, it instead reverses its angular velocity and begins turning upwards.

Those are all the rules. So what sort of shapes does this incredibly simple algorithm generate? Take a look!    It might be unbelievable that such a simple algorithm could generate such complex and organic shapes, but you can try it for yourself: the images above are produced by only 25 lines of code! And of course, it makes one very curious about whether there is a real connection between this algorithm and the way that plants actually grow.

# Can chess be solved?

There are some chess positions in which one player can force a win. Here’s an extremely simple example: White just moves their queen up to a8 and checkmates Black. Some winning positions are harder to see than this. Take a look at the following position. Can you find a guaranteed two-move checkmate for White? And for fun, here’s a harder one, again a guaranteed two-move checkmate, this time by Black: Notice that in this last one, the opponent had multiple possible moves to choose from. A forced mate does not necessarily mean restricting your opponent to exactly one move on each of their turns. It just means that no matter what they do, you can still guarantee a win. Forced wins can become arbitrarily complicated and difficult to see if you’re looking many moves down the line, as you have to consider all the possible responses your opponent has at each turn. The world record for the longest forced win is the following position: It’s White’s move, and White does have a strategy for a forced win. It just takes 549 turns to actually do this! (This strategy does violate the 50-move rule, which says that after 50 turns with no pawn moves or capture the game is drawn.) At this link you can watch the entire 549 move game. Most of it is totally incomprehensible to human players, and apparently top chess players that look at this game have reported that the reasoning behind the first 400 moves is opaque to them. Interestingly, White gets a pawn promotion after six moves, and it promotes it to a knight instead of a queen! It turns out that promoting to a queen actually loses for White, and their only way to victory is the knight promotion!

This position is the longest forced win with 7 pieces on the board. There are a few others that are similarly long. All of them represent a glimpse at the perfect play we might expect to see if a hypercomputer could calculate the whole game tree for chess and select the optimal move.

A grandmaster wouldn’t be better at these endgames than someone who had learned chess yesterday. It’s a sort of chess that has nothing to do with chess, a chess that we could never have imagined without computers. The Stiller moves are awesome, almost scary, because you know they are the truth, God’s Algorithm – it’s like being revealed the Meaning of Life, but you don’t understand one word.

Tim Krabbe

With six pieces on the board, the longest mate takes 262 moves (you can play out this position here). For five pieces, it’s 127 moves, for four it’s 43 moves, and the longest 3-man mate takes 28 moves. But now a natural question arises. We know that a win can be forced in some positions. But how about the opening position? That is, is there a guaranteed win for White (or for Black) starting in this position? Said more prosaically: Can chess be solved?

Zermelo’s theorem, published in “On an Application of Set Theory to the Theory of the Game of Chess” (1913), was the first formal theorem in game theory. It predated the work of von Neumann (the so-called “father of game theory”) by 15 years. It proves that yes, it is in fact possible to solve chess. We don’t know what the solution is, but we know that either White can force a win, or Black can force a win, or the result will be a draw if both play perfectly.

Of course, the guarantee that in principle there is a solution to chess doesn’t tell us much in practice. The exponential blowup in the number of possible games is so enormous that humans will never find this solution. Nonetheless, I still find it fascinating to think that the mystery of chess is ultimately a product of computational limitations, and that in principle, if we had a hypercomputer, we could just find the unique best chess game and watch it play out, either to a win by one side or to a draw. That would be a game that I would love to see.

✯✯✯

Here’s another fun thing. There’s an extremely bizarre variant on chess called suicide chess (or anti-chess). The goal of suicide chess is to lose all your pieces. Of course, if all the rules of play were the same, it would be practically impossible to win (since your opponent could always just keep refusing to take a piece that you are offering them). To remedy this, in suicide chess, capturing is mandatory! And if you have multiple possible captures, then you can choose among them.

Suicide chess gameplay is extremely complicated and unusual looking, and evaluating who is winning at any given moment tends to be really difficult, as sudden turnarounds are commonplace compared to ordinary chess. But one simplifying factor is that it tends to be easier to restrict your opponents’ moves. In ordinary chess, you can only restrict your opponents’ moves by blocking off their pieces or threatening their king. But in suicide chess, your opponents’ moves are restricted ANY time you put one of your pieces in their line of fire! This feature of the gameplay makes the exponential blow up in possible games more manageable.

Given this, it probably won’t be much of a surprise that suicide chess is, just like ordinary chess, in principle solvable. But here’s the crazy part. Suicide chess is solved!!

That’s right: it was proven a triple of years ago that White can force a win by moving first with e3! Here’s the paper. The proof amounts to basically running a program that looks at all possible responses to e3 and expands out the game tree, ultimately showing that all branches can be terminated with White losing all pieces and winning the game.

Not only do we know that by starting with e3, White is guaranteed a win, we also know that Black can force a win if White starts with any of the following moves: a3, b4, c3, d3, d4, e4, f3, f4, h3, h4, Nc3, Nf3. As far as I was able to tell, there are only six opening moves remaining for which we don’t know if White wins, Black wins, or they draw: a4, b3, c4, e3, g3, and g4.

✯✯✯

Alright, final chess variant trivia. Infinite chess is just chess, but played on an infinite board. There’s a mind-blowing connection between infinite chess and mathematical logic. As a refresher, a little while back I discussed the first-order theory of Peano arithmetic. This is the theory of natural numbers with addition and multiplication. If you recall, we found that Peano arithmetic was incomplete (in that not all first-order sentences about the natural numbers can be proven from its axioms). First order PA is also undecidable, in that there exists no algorithm that takes in a first order sentence and returns whether it is provable from the axioms. (In fact, first order logic in general is undecidable! To get decidability, you have to go to a weaker fragment of first order logic known as monadic predicate calculus, in which predicates take only one argument and there are no functions. As soon as you introduce a single binary predicate, you lose decidability.)

Okay, so first order PA (the theory of natural numbers with addition and multiplication) is incomplete and undecidable. But there are weaker fragments of first order PA that are decidable! Take away multiplication, and you have Presburger arithmetic, the theory of natural numbers with addition. Take away addition, and you have Skolem arithmetic, the theory of natural numbers with multiplication. Both of these fragments are significantly weaker than Peano arithmetic (each is unable to prove general statements about the missing operation, like that multiplication is commutative for Presburger arithmetic). But in exchange for this weakness, you get both decidability and completeness!

How does all this relate to infinite chess? Well, consider the problem of determining whether there exists a checkmate in n turns from a given starting position. This seems like a really hard problem, because unlike in ordinary chess, now it’s possible for there to be literally infinite possible moves for a given player from a position. (For instance, a queen on an empty diagonal can move to any of the infinite locations on this diagonal.) So apparently, the game tree for infinite chess, in general, branches infinitely. Given this, we might expect that this problem is not decidable.

Well, it turns out that any instance of this problem (any particular board setup, with the question of whether there’s a mate-in-n for one of the players) can be translated into a sentence in Presburger arithmetic. You do this by translating a position into a fixed length sequence of natural numbers, where each piece is given a sequence of numbers indicating its type and location. The possibility of attacks can be represented as equations about these numbers. And since the distance pieces (bishops, rooks, and queens – those that have in general an infinite number of available moves) all move in straight lines, there are simple equations expressible in Presburger arithmetic that describe whether these pieces can attack other pieces! From the attack relations, you can build up more complicated relations, including the mate-in-n relation.

So we have a translation from the mate-in-n problem to a sentence in Presburger arithmetic. But Presburger arithmetic is decidable! So there must also be a decision procedure for the mate-in-n problem in infinite chess. And not only is there a decision procedure for the mate-in-n problem, but there’s an algorithm that gives the precise strategy that achieves the win in the fewest number of moves!

Here’s the paper in which all of this is proven. It’s pretty wild. Many other infinite chess problems can be proven to be decidable by the same method (demonstrating interpretability of the problem in Presburger arithmetic). But interestingly, not all of them! This has a lot to do with the limitations of first-order logic. The question of whether, in general, there is a forced win from a given position can not be shown to be decidable in this way. (This relates to the general impossibility in first-order logic of expressing infinitely long statements. Determining whether a given position is a winning position for a given player requires looking at the mate-in-n problem, but without any upper bound on what this n is – on how many moves the win may take.) It’s not even clear whether the winning-position problem can be phrased in first-order arithmetic, or whether it requires going to second-order!

The paper takes this one step further. This proof of the decidability of the mate-in-n problem for infinite chess doesn’t crucially rest upon the two-dimensionality of the chess board. We could easily translate the proof to a three-dimensional board, just by changing the way we code positions! So in fact, we have a proof that the mate-in-n problem for k-dimensional infinite chess is decidable! I’ll leave you with this infinite chess puzzle: It’s White’s turn. Can they guarantee a checkmate in 12 moves or less?

# For Loops and Bounded Quantifiers in Lambda Calculus

Lambda calculus is an extremely minimal language that is powerful enough to express every computation that a Turing machine can do. Here is a fantastic video explaining the basic “rules of the game”:

Figuring out how to translate programs into lambda calculus can be a challenging puzzle. I recently discovered a nice way to use lambda calculus to generate for loops and bounded quantifiers, thus allowing recursion without requiring the Y Combinator. I’m sure similar things have been discovered by others, but things feel cooler than they are when you’ve found them, so I’ll present this function here in this post. 🙂

First, let me lay out some of the most important basic functions in lambda calculus. (A lot of this will be rehashing bits of the video lecture.)

Useful Tools Propositional Logic To give you a sense of how proving things works in lambda calculus, here’s a quick practice proof that ¬¬T is the same thing as T. We’ll prove this by reducing ¬¬T to T, which will show us that the two functions are extensionally equivalent (they return the same values on all inputs). Each step, we will either be substituting in the definition of some symbol, or evaluating a function by substituting its inputs in. Another more convoluted approach is to show that the function (¬¬T ↔ T) reduces to the function T, which is to say that the two are equivalent. Notice that we’re not saying that the expression (¬¬T ↔ T) “has the truth value True”. There is no such thing as expressions with truth values in lambda calculus, there are just functions, and some functions are equivalent to the True function.

To shorten this proof, I’ll start out by proving the equivalence of ¬T and F, as well as the equivalence of ¬F and T. This will allow me to translate between these functions in one step. Got the hang of it? Ok, on to the natural numbers!

Natural Numbers In lambda calculus, numbers are adverbs. 1 is not “one”, it’s “once”. 1 is a function that takes in a function f and tells you to apply it one time. 2 is “twice”; it’s a function that tells you to apply f twice. And so on.

Defining things this way allows us to have beautifully simple definitions of addition, multiplication, and exponentiation. For instance, the successor of n is just what you get by one more application of f on top of n applications of f. And n plus m is just n S m, because this means that S (the successor function) should be applied n times to m. See if you can see for yourself why the definitions of multiplication and exponentiation make sense (and how they’re different! It’s not just the order of the n and m!).

Let’s prove that 1 + 1 = 2 using lambda calculus! One more: Here’s a proof that 0 + k = k. Interestingly, formally proving that k + 0 = k is significantly more convoluted, and there’s no obvious way to do it in general for all k (as opposed to producing a separate proof for each possible value of k). In addition, the proof length will end up being the size of k.

Okay, let’s dive deeper by looking at our first non-trivial lambda calculus data structure, the pair.

Pair Data Structure The pair function can be fed two values (a and b) which can then be referenced at a later time by feeding it one more function. Feeding the function either of T or F will just select either the first or the second element in the pair. It might not be immediately obvious, but having this ability to, as it were, store functions “in memory” for later reference gives us enormous power. In fact, at this point we have everything we need to introduce the magical function which opens the door to quantification and recursion:

Magic Function All that Φ does is take a pair of functions (a, b) to a new pair (f(a, b), g(a, b)). Here it’s written in more familiar notation:

Φ: (a, b)  (f(a, b), g(a, b))

Different choices of f and g give us very different types of behavior when we repeatedly apply Φ. First of all, let’s use Φ to subtract and compare numbers.

Subtraction and Comparison φ1 simply chooses f and g to get the following behavior:

φ1: (a, b)  (b, b+1)

Here’s what happens with repeated application of φ1 to the pair (0, 0):

(0, 0) → (0, 1) → (1, 2) → (2, 3) → …

Looking at this, you can see how n applications of φ1 to (0, 0) gives the pair (n – 1, n), which justifies our definition of the predecessor function.

Our power is enhancing every minute; now, we create for loops!

For Loops and Quantifiers B S F is the composition of successor and True, so it takes two inputs, fetches the value of the first input, and then adds one to it. Since B S T is the first input to Φ, it will be the function that determines the value of the first member of the output pair. In other words, φ2 gives the following mapping:

φ2: (a, b)  (a + 1, f(a, b))

To get a for loop, we now just iterate this function the appropriate number of times!

The for function I’ve written takes in four parameters (n, m, f, a), and is equivalent to the following program: The  function is equivalent to the following:

𝑥 ∈ [n, m-1] θ(𝑥)

In code, this is: And the  function is equivalent to:

𝑥 ∈ [n, m-1] θ(𝑥) With these powerful tools in hand, we can now define more interesting functions, like one that detects if an input number is prime! Let’s use lambda calculus to see if 2 is prime! That got a little messy, but it’s nothing compared to the computation of whether 3 is prime! This might sound strange, but I actually find it amazing how short and simple this is. To be fair, I skipped steps if all that they did was substitute in the definition of a function, instead opting to just immediately apply the definition and cut the number of steps in half. The full proof would actually be twice as long!

But nonetheless, try writing a full proof of the primality of 3 using only ZFC set theory in anywhere near as few lines! It’s interesting to me that a language as minimal and bare-bones as that of lambda calculus somehow manages to produce such concise proofs of interesting and complicated statements.

# How to sort a list of numbers

It’s funny how some of the simplest questions you can think of actually have interesting and nontrivial answers.

Take this question, for instance. If I hand you a list of N numbers, how can you quickly sort this list from smallest to largest? Suppose that you are only capable of sorting two numbers at a time.

You might immediately think of some obvious ways to solve this problem. For instance, you could just start at the beginning of the list and repeatedly traverse it, swapping numbers if they’re in the wrong order. This will guarantee that each pass-through, the largest number out of those not yet sorted will “bubble up” to the top. This algorithm is called Bubble Sort.

The problem is that for a sizable list, this will take a long long time. On average, for a list of size N, Bubble Sort takes O(N^2) steps to finish sorting. Can you think of a faster algorithm?

It turns out that we can go much faster with some clever algorithms – from O(N^2) to O(N logN). If you have a little time to burn, here are some great basic videos describing and visualizing different sorting techniques:

## Visualizations

Beautiful visualization of Bubble, Shell, and Quick Sort

(I’d be interested to know why the visualization of Bubble Sort in the final frame gave a curve of unsorted values that looked roughly logarithmic)

Visualization of 9 sorting algorithms

Auditory presentation of 15 sorting algorithms

## Bonus clip…

Cool proof of the equivalence of Selection Sort and Insertion Sort

## Searching a list

By the way, how about if you have a sorted list and want to find a particular value in it? If we’re being dumb, we could just ignore the fact that our list came pre-sorted and search through every element on the list in order. We’d find our value in O(N). We can go much faster by just starting in the middle of our list, looking at the size of the middle value to determine which half of the list to look at next, and then repeating with the appropriate half of the list. This is called binary search and takes O(logN), a massive speedup.

Now, let’s look back at the problem of finding a particular value in an unsorted list. Can we think of any techniques to accomplish this more quickly than O(N)?

No. Try as you might, as long as the list has no structure for you to exploit, your optimal performance will be limited to O(N).

Or so everybody thought until quantum mechanics showed up and blew our minds.

It turns out that a quantum computer would be able to solve this exact problem – searching through an unsorted list – in only O(√N) steps!

This should seem impossible to you. If a friend hands you a random jumbled list of 10,000 names and asks you to locate one particular name on the list, you’re going to end up looking through 5,000 names on average. There’s no clever trick you can use to speed up the process. Except that quantum mechanics says you’re wrong! In fact, if you were a quantum computer, you’d only have to search through ~100 names to find your name.

This quantum algorithm is called Grover’s algorithm, and I want to write up a future post about it. But that’s not even the coolest part! It turns out that if a non-local hidden variable interpretation of quantum mechanics is correct, then a quantum computer could search an N-item database in O(∛N)! You can imagine the triumphal feeling of the hidden variables people if one day they managed to build a quantum computer that simultaneously proved them right and broke the record for search algorithm speed!