Can chess be solved?

There are some chess positions in which one player can force a win. Here’s an extremely simple example:

[Diagram: a simple position where White has a forced win]

White just moves their queen up to a8 and checkmates Black. Some winning positions are harder to see than this. Take a look at the following position. Can you find a guaranteed two-move checkmate for White?

[Diagram: puzzle position – White to mate in two]

And for fun, here’s a harder one, again a guaranteed two-move checkmate, this time by Black:

[Diagram: puzzle position – Black to mate in two]

Notice that in this last one, the opponent had multiple possible moves to choose from. A forced mate does not necessarily mean restricting your opponent to exactly one move on each of their turns. It just means that no matter what they do, you can still guarantee a win. Forced wins can become arbitrarily complicated and difficult to see if you’re looking many moves down the line, as you have to consider all the possible responses your opponent has at each turn. The world record for the longest forced win is the following position:

[Diagram: the record 549-move endgame position, White to move]

It’s White’s move, and White does have a forced win. It just takes 549 moves to carry out! (The winning line does violate the 50-move rule, which says that the game is drawn after 50 consecutive moves by each player with no pawn move or capture.) At this link you can watch the entire 549-move game. Most of it is totally incomprehensible to human players; apparently even top players who have studied the game report that the reasoning behind the first 400 moves is opaque to them. Interestingly, White promotes a pawn after six moves, and promotes it to a knight instead of a queen! It turns out that promoting to a queen actually loses for White, and the knight promotion is their only path to victory!

This position is the longest forced win with 7 pieces on the board. There are a few others that are similarly long. All of them offer a glimpse of the perfect play we might expect to see if a hypercomputer could calculate the whole game tree for chess and select the optimal move.

A grandmaster wouldn’t be better at these endgames than someone who had learned chess yesterday. It’s a sort of chess that has nothing to do with chess, a chess that we could never have imagined without computers. The Stiller moves are awesome, almost scary, because you know they are the truth, God’s Algorithm – it’s like being revealed the Meaning of Life, but you don’t understand one word.

Tim Krabbé

With six pieces on the board, the longest mate takes 262 moves (you can play out this position here). For five pieces, it’s 127 moves, for four it’s 43 moves, and the longest 3-man mate takes 28 moves.

[Table: the longest forced mates by number of pieces on the board]

But now a natural question arises. We know that a win can be forced in some positions. But how about the opening position? That is, is there a guaranteed win for White (or for Black) starting in this position?

[Diagram: the standard chess starting position]

Said more prosaically: Can chess be solved?

Zermelo’s theorem, published in “On an Application of Set Theory to the Theory of the Game of Chess” (1913), was the first formal theorem in game theory, predating the work of von Neumann (the so-called “father of game theory”) by 15 years. It proves that yes, it is in fact possible to solve chess. We don’t know what the solution is, but we know that exactly one of the following holds: White can force a win, Black can force a win, or both sides can force at least a draw.

Of course, the guarantee that in principle there is a solution to chess doesn’t tell us much in practice. The exponential blowup in the number of possible games is so enormous that humans (and our computers) will never find this solution by brute force. Nonetheless, I still find it fascinating to think that the mystery of chess is ultimately a product of computational limitations, and that in principle, if we had a hypercomputer, we could just compute an optimal chess game and watch it play out, either to a win by one side or to a draw. That would be a game that I would love to see.
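To make the content of this theorem concrete, here’s a tiny Python sketch of the backward-induction argument behind it, run on a made-up toy game (the tree, the outcome values, and all names are invented for illustration; chess’s actual game tree is unimaginably larger):

    # Backward induction ("solving" a game) on a tiny two-player game tree.
    # A position is either a terminal outcome or a list of child positions.
    # Outcomes are scored from White's perspective:
    #   +1 = White wins, 0 = draw, -1 = Black wins.

    def solve(position, white_to_move=True):
        """Return the outcome of `position` under perfect play by both sides."""
        if isinstance(position, int):      # terminal position: outcome is known
            return position
        values = [solve(child, not white_to_move) for child in position]
        # Each player picks the child best for them: White maximizes, Black minimizes.
        return max(values) if white_to_move else min(values)

    # Toy tree: White chooses between a branch where Black can force a win
    # and a branch where the worst Black can achieve is a draw.
    toy_game = [[-1, +1], [0, +1]]
    print(solve(toy_game))                 # 0 -- this toy game is a draw under perfect play

Zermelo’s insight is that this recursion is well-defined for any finite game, chess included; the only obstacle is the size of the tree.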

✯✯✯

Here’s another fun thing. There’s an extremely bizarre variant of chess called suicide chess (or anti-chess). The goal of suicide chess is to lose all your pieces. Of course, if all the other rules of play were the same, it would be practically impossible to win (since your opponent could always just keep refusing to take the pieces you offer them). To remedy this, in suicide chess capturing is mandatory! And if you have multiple possible captures, you can choose among them.

Suicide chess gameplay is extremely complicated and unusual-looking, and evaluating who is winning at any given moment tends to be really difficult, as sudden turnarounds are commonplace compared to ordinary chess. But one simplifying factor is that it tends to be easier to restrict your opponent’s moves. In ordinary chess, you can only restrict your opponent’s moves by blocking off their pieces or threatening their king. But in suicide chess, your opponent’s moves are restricted ANY time you put one of your pieces in their line of fire! This feature of the gameplay makes the exponential blowup in possible games more manageable.

Given this, it probably won’t be much of a surprise that suicide chess is, just like ordinary chess, in principle solvable. But here’s the crazy part. Suicide chess is solved!!

That’s right: it was proven a few years ago that White can force a win by opening with e3!

[Diagram: the position after 1. e3]

Here’s the paper. The proof basically amounts to running a program that looks at all possible responses to e3 and expands out the game tree, ultimately showing that every branch can be terminated with White losing all their pieces and winning the game.

Not only do we know that by starting with e3 White is guaranteed a win, we also know that Black can force a win if White starts with any of the following moves: a3, b4, c3, d3, d4, e4, f3, f4, h3, h4, Nc3, Nf3. As far as I was able to tell, that leaves seven opening moves for which we don’t know whether White wins, Black wins, or they draw: a4, b3, c4, g3, g4, Na3, and Nh3.

✯✯✯

Alright, final chess variant trivia. Infinite chess is just chess, but played on an infinite board.

[Diagram: a position on the infinite chess board]

There’s a mind-blowing connection between infinite chess and mathematical logic. As a refresher, a little while back I discussed the first-order theory of Peano arithmetic. This is the theory of the natural numbers with addition and multiplication. If you recall, we found that Peano arithmetic is incomplete (there are first-order sentences about the natural numbers that it can neither prove nor disprove). First-order PA is also undecidable, in that there exists no algorithm that takes in a first-order sentence and returns whether it is provable from the axioms. (In fact, first-order logic in general is undecidable! To get decidability, you have to go to a weaker fragment of first-order logic known as the monadic predicate calculus, in which predicates take only one argument and there are no function symbols. As soon as you introduce a single binary predicate, you lose decidability.)

Okay, so first-order PA (the theory of the natural numbers with addition and multiplication) is incomplete and undecidable. But there are weaker fragments of first-order PA that are decidable! Take away multiplication, and you have Presburger arithmetic, the theory of the natural numbers with addition. Take away addition instead, and you have Skolem arithmetic, the theory of the natural numbers with multiplication. Both of these fragments are significantly weaker than Peano arithmetic (neither can even state general facts about the missing operation – Presburger arithmetic, for instance, cannot express that multiplication is commutative). But in exchange for this weakness, you get both decidability and completeness!

How does all this relate to infinite chess? Well, consider the problem of determining whether there exists a checkmate in n turns from a given starting position. This seems like a really hard problem, because unlike in ordinary chess, a player may now have literally infinitely many possible moves from a given position. (For instance, a queen on an empty diagonal can move to any of the infinitely many squares along that diagonal.) So apparently, the game tree for infinite chess, in general, branches infinitely. Given this, we might expect this problem to be undecidable.

Well, it turns out that any instance of this problem (any particular board setup, together with the question of whether there’s a mate-in-n for one of the players) can be translated into a sentence of Presburger arithmetic. You do this by translating a position into a fixed-length sequence of natural numbers, where each piece is given a sequence of numbers indicating its type and location. The possibility of attacks can be represented as equations about these numbers. And since the distance pieces (bishops, rooks, and queens – those that in general have infinitely many available moves) all move in straight lines, there are simple equations expressible in Presburger arithmetic that describe whether these pieces can attack other pieces! From the attack relations, you can build up more complicated relations, including the mate-in-n relation.
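To give a flavor of how this works (this is my own illustrative formula, not the paper’s exact encoding): a bishop at coordinates (a, b) can attack the square (c, d) on an otherwise empty board just in case

    ∃k (k > 0 ∧ ((c = a + k ∧ d = b + k) ∨ (c = a + k ∧ b = d + k)
              ∨ (a = c + k ∧ d = b + k) ∨ (a = c + k ∧ b = d + k)))

Only addition, equality, and ordering appear here – exactly the resources Presburger arithmetic provides. Accounting for blocking pieces adds further conjuncts of the same shape.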

So we have a translation from the mate-in-n problem to a sentence in Presburger arithmetic. But Presburger arithmetic is decidable! So there must also be a decision procedure for the mate-in-n problem in infinite chess. And not only is there a decision procedure for the mate-in-n problem, but there’s an algorithm that gives the precise strategy that achieves the win in the fewest number of moves!

Here’s the paper in which all of this is proven. It’s pretty wild. Many other infinite chess problems can be proven to be decidable by the same method (demonstrating interpretability of the problem in Presburger arithmetic). But interestingly, not all of them! This has a lot to do with the limitations of first-order logic. The question of whether, in general, there is a forced win from a given position cannot be shown to be decidable in this way. (This relates to the general impossibility of expressing infinitely long statements in first-order logic. Determining whether a given position is a winning position for a given player requires looking at the mate-in-n problem without any upper bound on what this n is – on how many moves the win may take.) It’s not even clear whether the winning-position problem can be phrased in first-order arithmetic, or whether it requires going to second order!

The paper takes this one step further. This proof of the decidability of the mate-in-n problem for infinite chess doesn’t crucially rest upon the two-dimensionality of the chess board. We could easily translate the proof to a three-dimensional board, just by changing the way we code positions! So in fact, we have a proof that the mate-in-n problem for k-dimensional infinite chess is decidable!

I’ll leave you with this infinite chess puzzle:

[Diagram: an infinite chess puzzle position, White to move]

It’s White’s turn. Can they guarantee a checkmate in 12 moves or less?

 

For Loops and Bounded Quantifiers in Lambda Calculus

Lambda calculus is an extremely minimal language that is powerful enough to express every computation that a Turing machine can do. Here is a fantastic video explaining the basic “rules of the game”:

Figuring out how to translate programs into lambda calculus can be a challenging puzzle. I recently discovered a nice way to use lambda calculus to generate for loops and bounded quantifiers, thus allowing recursion without requiring the Y Combinator. I’m sure similar things have been discovered by others, but things feel cooler than they are when you’ve found them, so I’ll present this function here in this post. 🙂

First, let me lay out some of the most important basic functions in lambda calculus. (A lot of this will be rehashing bits of the video lecture.)

Useful Tools

[Image: definitions of basic helper combinators, including the composition function B]

Propositional Logic

[Image: lambda calculus definitions of the propositional logic functions T, F, ¬, ∧, ∨, and ↔]
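If it helps to see these encodings in a more familiar setting, here’s a Python sketch of the standard Church booleans (the names and the print-helper are mine; in pure lambda calculus everything would be an anonymous function):

    # Church booleans: a boolean is a function that selects one of two arguments.
    T = lambda x: lambda y: x          # True selects the first argument
    F = lambda x: lambda y: y          # False selects the second argument

    NOT = lambda p: p(F)(T)            # swap the roles of the two arguments
    AND = lambda p: lambda q: p(q)(F)  # if p then q else False
    OR  = lambda p: lambda q: p(T)(q)  # if p then True else q
    IFF = lambda p: lambda q: p(q)(NOT(q))  # ↔: true exactly when p and q agree

    # "Read off" a Church boolean by handing it two ordinary Python values.
    show = lambda p: p("T")("F")

    print(show(NOT(NOT(T))))           # T -- ¬¬T behaves exactly like T
    print(show(IFF(NOT(NOT(T)))(T)))   # T -- and (¬¬T ↔ T) behaves like T too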

To give you a sense of how proving things works in lambda calculus, here’s a quick practice proof that ¬¬T is the same thing as T. We’ll prove this by reducing ¬¬T to T, which shows that the two functions are extensionally equivalent (they return the same values on all inputs). At each step, we will either substitute in the definition of some symbol, or evaluate a function by substituting in its inputs.

[Image: step-by-step reduction of ¬¬T to T]

Another, more convoluted approach is to show that the function (¬¬T ↔ T) reduces to the function T, which is to say that the two are equivalent. Notice that we’re not saying that the expression (¬¬T ↔ T) “has the truth value True”. There is no such thing as an expression with a truth value in lambda calculus; there are just functions, and some functions are equivalent to the True function.

To shorten this proof, I’ll start out by proving the equivalence of ¬T and F, as well as the equivalence of ¬F and T. This will allow me to translate between these functions in one step.

[Image: proofs that ¬T reduces to F, that ¬F reduces to T, and that (¬¬T ↔ T) reduces to T]

Got the hang of it? Ok, on to the natural numbers!

Natural Numbers

[Image: lambda calculus definitions of the numerals, successor, addition, multiplication, and exponentiation]

In lambda calculus, numbers are adverbs. 1 is not “one”, it’s “once”. 1 is a function that takes in a function f and tells you to apply it one time. 2 is “twice”; it’s a function that tells you to apply f twice. And so on.

Defining things this way allows us to have beautifully simple definitions of addition, multiplication, and exponentiation. For instance, the successor of n is just what you get by one more application of f on top of n applications of f. And n plus m is just n S m, because this means that S (the successor function) should be applied n times to m. See if you can see for yourself why the definitions of multiplication and exponentiation make sense (and how they’re different! It’s not just the order of the n and m!).
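Here’s a Python sketch of the numerals and their arithmetic, with a helper that converts a Church numeral back into an ordinary integer so we can sanity-check the definitions (all names are mine):

    # Church numerals: n is the function that applies f to x a total of n times.
    ZERO = lambda f: lambda x: x
    ONE  = lambda f: lambda x: f(x)
    TWO  = lambda f: lambda x: f(f(x))

    S   = lambda n: lambda f: lambda x: f(n(f)(x))  # successor: one more application of f
    ADD = lambda n: lambda m: n(S)(m)               # n S m: apply S to m, n times
    MUL = lambda n: lambda m: lambda f: n(m(f))     # n repetitions of "m repetitions of f"
    EXP = lambda n: lambda m: m(n)                  # n^m: m repetitions of "times n"

    to_int = lambda n: n(lambda k: k + 1)(0)        # count the applications of f

    print(to_int(ADD(TWO)(ONE)))   # 3
    print(to_int(MUL(TWO)(TWO)))   # 4
    print(to_int(EXP(TWO)(TWO)))   # 4, i.e. 2^2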

Let’s prove that 1 + 1 = 2 using lambda calculus!

[Image: step-by-step proof that 1 + 1 = 2]

One more: Here’s a proof that 0 + k = k.

[Image: step-by-step proof that 0 + k = k]

Interestingly, formally proving that k + 0 = k is significantly more convoluted, and there’s no obvious way to do it in general for all k (as opposed to producing a separate proof for each possible value of k). What’s more, the length of each such proof grows with the size of k.

Okay, let’s dive deeper by looking at our first non-trivial lambda calculus data structure, the pair.

Pair Data Structure

[Image: lambda calculus definition of the pair function]
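In Python, the pair encoding looks something like this (a sketch; the “selector” you later feed the pair is just a Church boolean):

    # Church booleans double as selectors for the pair.
    T = lambda x: lambda y: x
    F = lambda x: lambda y: y

    # A pair stores a and b in a closure and waits for a selector function.
    PAIR = lambda a: lambda b: lambda selector: selector(a)(b)
    FST  = lambda p: p(T)              # feeding T selects the first element
    SND  = lambda p: p(F)              # feeding F selects the second element

    p = PAIR("left")("right")
    print(FST(p), SND(p))              # left right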

The pair function can be fed two values (a and b), which can then be referenced at a later time by feeding the pair one more function. Feeding it either T or F will just select either the first or the second element of the pair. It might not be immediately obvious, but having this ability to, as it were, store functions “in memory” for later reference gives us enormous power. In fact, at this point we have everything we need to introduce the magical function which opens the door to quantification and recursion:

Magic Function

[Image: lambda calculus definition of Φ]

All that Φ does is take a pair of functions (a, b) to a new pair (f(a, b), g(a, b)). Here it’s written in more familiar notation:

    Φ: (a, b) ↦ (f(a, b), g(a, b))

Different choices of f and g give us very different types of behavior when we repeatedly apply Φ. First of all, let’s use Φ to subtract and compare numbers.

Subtraction and Comparison

[Image: definitions of φ1, the predecessor function, subtraction, and the comparison functions]

φ1 simply chooses f and g to get the following behavior:

    φ1: (a, b) ↦ (b, b + 1)

Here’s what happens with repeated application of φ1 to the pair (0, 0):

(0, 0) → (0, 1) → (1, 2) → (2, 3) → …

Looking at this, you can see how n applications of φ1 to (0, 0) give the pair (n – 1, n) (for n ≥ 1), which justifies our definition of the predecessor function.
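Here’s the whole construction as a Python sketch (to keep it readable, I cheat and use ordinary Python integers and a Python loop; in the lambda calculus version, “apply φ1 n times” is exactly what the Church numeral n does):

    # Church pair, as before.
    T = lambda x: lambda y: x
    F = lambda x: lambda y: y
    PAIR = lambda a: lambda b: lambda s: s(a)(b)
    FST = lambda p: p(T)
    SND = lambda p: p(F)

    # φ1 maps the pair (a, b) to (b, b + 1).
    phi1 = lambda p: PAIR(SND(p))(SND(p) + 1)

    # n applications of φ1 to (0, 0) yield (n - 1, n), so the predecessor
    # of n is the first slot of the result.
    def pred(n):
        p = PAIR(0)(0)
        for _ in range(n):
            p = phi1(p)
        return FST(p)

    print(pred(5), pred(1), pred(0))   # 4 0 0 -- the predecessor of 0 bottoms out at 0
    # Subtraction is then just iterated predecessor: n - m applies pred m times.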

Our power grows by the minute; now, we create for loops!

For Loops and Quantifiers

[Image: definitions of φ2 and the for function]

B S T is the composition of successor and True, so it takes two inputs, fetches the value of the first input, and then adds one to it. Since B S T is the first input to Φ, it will be the function that determines the value of the first member of the output pair. In other words, φ2 gives the following mapping:

    φ2: (a, b) ↦ (a + 1, f(a, b))

To get a for loop, we now just iterate this function the appropriate number of times!

The for function I’ve written takes in four parameters (n, m, f, a), and is equivalent to the following program:

[Image: the equivalent program in conventional pseudocode]
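In Python, that program would look something like this (a sketch; the exact parameter conventions are my guess, chosen to match the quantifiers below):

    def for_loop(n, m, f, a):
        # Thread an accumulator a through f as x runs over [n, m - 1].
        for x in range(n, m):
            a = f(x, a)
        return a

    # Example: sum the numbers 0 through 4.
    print(for_loop(0, 5, lambda x, acc: acc + x, 0))   # 10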

The ∃ function is equivalent to the following:

    ∃𝑥 ∈ [n, m-1] θ(𝑥)

In code, this is:

[Image: lambda calculus definition of the ∃ function]

And the ∀ function is equivalent to:

    ∀𝑥 ∈ [n, m-1] θ(𝑥)

[Image: lambda calculus definition of the ∀ function]
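Sketched in Python, both quantifiers fall straight out of the for loop (ordinary Python booleans stand in for the Church booleans):

    def for_loop(n, m, f, a):          # as in the previous sketch
        for x in range(n, m):
            a = f(x, a)
        return a

    def exists(n, m, theta):
        # ∃𝑥 ∈ [n, m-1] θ(𝑥): OR together θ(n), ..., θ(m-1), starting from False.
        return for_loop(n, m, lambda x, acc: acc or theta(x), False)

    def forall(n, m, theta):
        # ∀𝑥 ∈ [n, m-1] θ(𝑥): AND together θ(n), ..., θ(m-1), starting from True.
        return for_loop(n, m, lambda x, acc: acc and theta(x), True)

    print(exists(0, 10, lambda x: x * x == 49))   # True  (witness: x = 7)
    print(forall(0, 10, lambda x: x < 9))         # False (fails at x = 9)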

With these powerful tools in hand, we can now define more interesting functions, like one that detects if an input number is prime!

[Image: lambda calculus definition of the is-prime function]
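In the same Python shorthand, the construction might look like this (my own sketch of the idea, not necessarily the pictured definition):

    def exists(n, m, theta):           # equivalent to the OR-fold from the previous sketch
        return any(theta(x) for x in range(n, m))

    def is_prime(n):
        # "x divides n" is itself a bounded quantifier: ∃y ∈ [0, n] (x · y = n).
        divides = lambda x: exists(0, n + 1, lambda y: x * y == n)
        # n is prime iff n > 1 and no x in [2, n - 1] divides it.
        return n > 1 and not exists(2, n, divides)

    print([k for k in range(20) if is_prime(k)])   # [2, 3, 5, 7, 11, 13, 17, 19]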

Let’s use lambda calculus to see if 2 is prime!

[Image: lambda calculus computation verifying that 2 is prime]

That got a little messy, but it’s nothing compared to the computation of whether 3 is prime!

[Image: lambda calculus computation verifying that 3 is prime]

This might sound strange, but I actually find it amazing how short and simple this is. To be fair, I skipped steps whose only purpose was to substitute in the definition of a function, opting instead to just immediately apply the definition, which cuts the number of steps roughly in half. The full proof would actually be about twice as long!

Nonetheless, try writing a full proof of the primality of 3 using only ZFC set theory in anywhere near as few lines! It’s interesting to me that a language as minimal and bare-bones as lambda calculus somehow manages to produce such concise proofs of interesting and complicated statements.

How to sort a list of numbers

It’s funny how some of the simplest questions you can think of actually have interesting and nontrivial answers.

Take this question, for instance. If I hand you a list of N numbers, how can you quickly sort this list from smallest to largest? Suppose that you are only capable of comparing two numbers at a time.

You might immediately think of some obvious ways to solve this problem. For instance, you could just start at the beginning of the list and repeatedly traverse it, swapping adjacent numbers if they’re in the wrong order. This guarantees that on each pass-through, the largest of the numbers not yet sorted will “bubble up” to the top. This algorithm is called Bubble Sort.
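For concreteness, here’s Bubble Sort in Python:

    def bubble_sort(xs):
        """Sort xs in place by repeatedly swapping adjacent out-of-order pairs."""
        n = len(xs)
        for i in range(n):
            # After pass i, the last i elements are already in their final places.
            for j in range(n - 1 - i):
                if xs[j] > xs[j + 1]:
                    xs[j], xs[j + 1] = xs[j + 1], xs[j]
        return xs

    print(bubble_sort([5, 2, 4, 1, 3]))   # [1, 2, 3, 4, 5]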

The problem is that for a sizable list, this will take a long, long time. On average, for a list of size N, Bubble Sort takes O(N²) steps to finish sorting. Can you think of a faster algorithm?

It turns out that we can go much faster with some clever algorithms – from O(N²) down to O(N log N); a short Merge Sort sketch follows the links below. If you have a little time to burn, here are some great basic videos describing and visualizing different sorting techniques:

Comparisons of Merge, Quick, Heap, Bubble and Insertion Sort

Visualizations

Beautiful visualization of Bubble, Shell, and Quick Sort

(I’d be interested to know why the visualization of Bubble Sort in the final frame gave a curve of unsorted values that looked roughly logarithmic)

Visualization of 9 sorting algorithms

Auditory presentation of 15 sorting algorithms

Bonus clip…

Cool proof of the equivalence of Selection Sort and Insertion Sort
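And here’s the promised Merge Sort sketch: split the list in half, sort each half recursively, and merge the two sorted halves in linear time. The recursion is about log N levels deep and each level does O(N) total work, which is where the O(N log N) bound comes from.

    def merge_sort(xs):
        """Sort xs in O(N log N): split, recursively sort, then merge."""
        if len(xs) <= 1:
            return xs
        mid = len(xs) // 2
        left = merge_sort(xs[:mid])
        right = merge_sort(xs[mid:])
        # Merge: repeatedly take the smaller of the two front elements.
        merged = []
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        return merged + left[i:] + right[j:]

    print(merge_sort([5, 2, 4, 6, 1, 3]))   # [1, 2, 3, 4, 5, 6]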

Searching a list

By the way, what about when you have a sorted list and want to find a particular value in it? If we’re being dumb, we could just ignore the fact that our list came pre-sorted and search through every element of the list in order, finding our value in O(N). We can go much faster by starting in the middle of the list, comparing that middle value with our target to determine which half of the list to look in next, and then repeating the process with the appropriate half. This is called binary search, and it takes O(log N) steps, a massive speedup.
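In Python, binary search looks like this (each comparison halves the remaining search window, hence O(log N)):

    def binary_search(xs, target):
        """Return an index of target in the sorted list xs, or -1 if absent."""
        lo, hi = 0, len(xs) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if xs[mid] == target:
                return mid
            elif xs[mid] < target:
                lo = mid + 1           # target can only be in the upper half
            else:
                hi = mid - 1           # target can only be in the lower half
        return -1

    print(binary_search([1, 3, 5, 7, 9, 11], 7))   # 3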

Now, let’s look back at the problem of finding a particular value in an unsorted list. Can we think of any techniques to accomplish this more quickly than O(N)?

No. Try as you might, as long as the list has no structure for you to exploit, your optimal performance will be limited to O(N).

Or so everybody thought until quantum mechanics showed up and blew our minds.

It turns out that a quantum computer would be able to solve this exact problem – searching through an unsorted list – in only O(√N) steps! 

This should seem impossible to you. If a friend hands you a random jumbled list of 10,000 names and asks you to locate one particular name on the list, you’re going to end up looking through 5,000 names on average. There’s no clever trick you can use to speed up the process. Except that quantum mechanics says you’re wrong! In fact, if you were a quantum computer, you’d only have to search through ~100 names to find your name.

This quantum algorithm is called Grover’s algorithm, and I want to write up a future post about it. But that’s not even the coolest part! It turns out that if a non-local hidden-variable interpretation of quantum mechanics is correct, then a quantum computer could search an N-item database in O(∛N) steps! You can imagine the triumphant feeling of the hidden-variable theorists if one day they managed to build a quantum computer that simultaneously proved them right and broke the record for search-algorithm speed!