Can chess be solved?

There are some chess positions in which one player can force a win. Here’s an extremely simple example:

[Diagram: a simple winning position for White]

White just moves their queen up to a8 and checkmates Black. Some winning positions are harder to see than this. Take a look at the following position. Can you find a guaranteed two-move checkmate for White?

[Diagram: mate-in-two puzzle, White to move]

And for fun, here’s a harder one, again a guaranteed two-move checkmate, this time by Black:

[Diagram: mate-in-two puzzle, Black to move]

Notice that in this last one, the opponent had multiple possible moves to choose from. A forced mate does not necessarily mean restricting your opponent to exactly one move on each of their turns. It just means that no matter what they do, you can still guarantee a win. Forced wins can become arbitrarily complicated and difficult to see if you’re looking many moves down the line, as you have to consider all the possible responses your opponent has at each turn. The world record for the longest forced win is the following position:

[Diagram: the record 549-move forced-win endgame position]

It’s White’s move, and White does have a strategy for a forced win. It just takes 549 moves to carry out! (This strategy does violate the 50-move rule, which says that the game is drawn after 50 consecutive moves with no pawn move or capture.) At this link you can watch the entire 549-move game. Most of it is totally incomprehensible to human players; top chess players who have looked at the game report that the reasoning behind the first 400 moves is opaque to them. Interestingly, White gets a pawn promotion after six moves and promotes to a knight instead of a queen! It turns out that promoting to a queen actually loses for White; their only path to victory is the knight promotion!

This position is the longest forced win with 7 pieces on the board. There are a few others that are similarly long. All of them represent a glimpse at the perfect play we might expect to see if a hypercomputer could calculate the whole game tree for chess and select the optimal move.

A grandmaster wouldn’t be better at these endgames than someone who had learned chess yesterday. It’s a sort of chess that has nothing to do with chess, a chess that we could never have imagined without computers. The Stiller moves are awesome, almost scary, because you know they are the truth, God’s Algorithm – it’s like being revealed the Meaning of Life, but you don’t understand one word.

— Tim Krabbé

With six pieces on the board, the longest mate takes 262 moves (you can play out this position here). For five pieces, it’s 127 moves, for four it’s 43 moves, and the longest 3-man mate takes 28 moves.

[Chart: longest forced mates by number of pieces on the board]

But now a natural question arises. We know that a win can be forced in some positions. But how about the opening position? That is, is there a guaranteed win for White (or for Black) starting in this position?

[Diagram: the standard chess starting position]

Said more prosaically: Can chess be solved?

Zermelo’s theorem, published in “On an Application of Set Theory to the Theory of the Game of Chess” (1913), was the first formal theorem in game theory. It predated the work of von Neumann (the so-called “father of game theory”) by 15 years. It proves that yes, it is in fact possible to solve chess. We don’t know what the solution is, but we know that either White can force a win, or Black can force a win, or the result will be a draw if both play perfectly.
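Zermelo’s argument is essentially backward induction over the (finite) game tree. Here’s a minimal sketch in Python on a toy abstract game, not real chess: terminal outcomes are +1 (White wins), 0 (draw), and −1 (Black wins), and each player picks the branch that’s best for them.

```python
# A minimal sketch of Zermelo-style backward induction on a toy game
# tree (NOT real chess). Terminal outcomes are from White's point of
# view: +1 = White wins, 0 = draw, -1 = Black wins.

def solve(node, white_to_move):
    """Value of `node` under perfect play by both sides."""
    if isinstance(node, int):      # terminal position
        return node
    values = [solve(child, not white_to_move) for child in node]
    # The player to move picks whichever continuation is best for them.
    return max(values) if white_to_move else min(values)

# A toy tree: White chooses between two lines, then Black replies.
tree = [
    [1, -1],   # line A: Black can steer this into a Black win
    [0, 0],    # line B: every reply is a draw
]
print(solve(tree, white_to_move=True))  # -> 0: perfect play is a draw
```

Running this recursion on full chess is exactly the exponentially infeasible part; the theorem only guarantees that the recursion is well-defined and has one of the three answers.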

Of course, the guarantee that in principle there is a solution to chess doesn’t tell us much in practice. The exponential blowup in the number of possible games is so enormous that humans will never find this solution. Nonetheless, I still find it fascinating to think that the mystery of chess is ultimately a product of computational limitations, and that in principle, if we had a hypercomputer, we could just find the unique best chess game and watch it play out, either to a win by one side or to a draw. That would be a game that I would love to see.


Here’s another fun thing. There’s an extremely bizarre variant on chess called suicide chess (or anti-chess). The goal of suicide chess is to lose all your pieces. Of course, if all the rules of play were the same, it would be practically impossible to win (since your opponent could always just keep refusing to take a piece that you are offering them). To remedy this, in suicide chess, capturing is mandatory! And if you have multiple possible captures, then you can choose among them.

Suicide chess gameplay is extremely complicated and unusual looking, and evaluating who is winning at any given moment tends to be really difficult, as sudden turnarounds are commonplace compared to ordinary chess. But one simplifying factor is that it tends to be easier to restrict your opponents’ moves. In ordinary chess, you can only restrict your opponents’ moves by blocking off their pieces or threatening their king. But in suicide chess, your opponents’ moves are restricted ANY time you put one of your pieces in their line of fire! This feature of the gameplay makes the exponential blow up in possible games more manageable.
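The mandatory-capture rule can be sketched in a few lines. The move format here, a hypothetical (from-square, to-square, is-capture) tuple, is my own invention, just enough structure to express the rule itself.

```python
# Hedged sketch of the mandatory-capture rule in suicide chess. The
# move format is a hypothetical (from_square, to_square, is_capture)
# tuple -- just enough to show the filtering logic.

def legal_moves(pseudo_moves):
    """If any capture is available, only captures are legal."""
    captures = [m for m in pseudo_moves if m[2]]
    return captures if captures else pseudo_moves

moves = [("e2", "e4", False), ("d4", "e5", True), ("d4", "d5", False)]
print(legal_moves(moves))  # -> [('d4', 'e5', True)]: capturing is forced
```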

Given this, it probably won’t be much of a surprise that suicide chess is, just like ordinary chess, in principle solvable. But here’s the crazy part. Suicide chess is solved!!

That’s right: it was proven a few years ago that White can force a win by opening with e3!

[Diagram: the position after 1. e3]

Here’s the paper. The proof amounts to basically running a program that looks at all possible responses to e3 and expands out the game tree, ultimately showing that all branches can be terminated with White losing all pieces and winning the game.

Not only do we know that by starting with e3, White is guaranteed a win, we also know that Black can force a win if White starts with any of the following moves: a3, b4, c3, d3, d4, e4, f3, f4, h3, h4, Nc3, Nf3. As far as I was able to tell, that leaves only a handful of opening moves for which we don’t know whether White wins, Black wins, or they draw: among them a4, b3, c4, g3, and g4.


Alright, final chess variant trivia. Infinite chess is just chess, but played on an infinite board.


There’s a mind-blowing connection between infinite chess and mathematical logic. As a refresher, a little while back I discussed the first-order theory of Peano arithmetic. This is the theory of natural numbers with addition and multiplication. If you recall, we found that Peano arithmetic was incomplete (in that not all first-order sentences about the natural numbers can be proven from its axioms). First order PA is also undecidable, in that there exists no algorithm that takes in a first order sentence and returns whether it is provable from the axioms. (In fact, first order logic in general is undecidable! To get decidability, you have to go to a weaker fragment of first order logic known as monadic predicate calculus, in which predicates take only one argument and there are no functions. As soon as you introduce a single binary predicate, you lose decidability.)

Okay, so first order PA (the theory of natural numbers with addition and multiplication) is incomplete and undecidable. But there are weaker fragments of first order PA that are decidable! Take away multiplication, and you have Presburger arithmetic, the theory of natural numbers with addition. Take away addition, and you have Skolem arithmetic, the theory of natural numbers with multiplication. Both of these fragments are significantly weaker than Peano arithmetic (each is unable to prove general statements about the missing operation, like that multiplication is commutative for Presburger arithmetic). But in exchange for this weakness, you get both decidability and completeness!

How does all this relate to infinite chess? Well, consider the problem of determining whether there exists a checkmate in n turns from a given starting position. This seems like a really hard problem, because unlike in ordinary chess, now it’s possible for there to be literally infinite possible moves for a given player from a position. (For instance, a queen on an empty diagonal can move to any of the infinite locations on this diagonal.) So apparently, the game tree for infinite chess, in general, branches infinitely. Given this, we might expect that this problem is not decidable.

Well, it turns out that any instance of this problem (any particular board setup, with the question of whether there’s a mate-in-n for one of the players) can be translated into a sentence in Presburger arithmetic. You do this by translating a position into a fixed length sequence of natural numbers, where each piece is given a sequence of numbers indicating its type and location. The possibility of attacks can be represented as equations about these numbers. And since the distance pieces (bishops, rooks, and queens – those that have in general an infinite number of available moves) all move in straight lines, there are simple equations expressible in Presburger arithmetic that describe whether these pieces can attack other pieces! From the attack relations, you can build up more complicated relations, including the mate-in-n relation.
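The paper’s actual encoding isn’t reproduced here, but here is a hedged illustration of the key point: on an empty board, whether a rook or bishop attacks a square comes down to equalities among sums and differences of coordinates, with no multiplication of variables, which is exactly what keeps the statements inside Presburger arithmetic.

```python
# Illustration (not the paper's actual encoding): the attack relations
# of the distance pieces on an empty board reduce to equalities among
# sums and differences of coordinates -- purely additive conditions,
# expressible in Presburger arithmetic.

def rook_attacks(x1, y1, x2, y2):
    # same file or same rank, but not the same square
    return (x1 == x2) != (y1 == y2)

def bishop_attacks(x1, y1, x2, y2):
    # same diagonal: x1 - x2 = y1 - y2, or anti-diagonal: x1 + y1 = x2 + y2
    return (x1, y1) != (x2, y2) and (x1 - x2 == y1 - y2 or x1 + y1 == x2 + y2)

print(rook_attacks(0, 0, 0, 7))    # -> True (same file)
print(bishop_attacks(2, 3, 5, 6))  # -> True (same diagonal)
```

With other pieces on the board, the condition "no blocker strictly between the two squares" becomes a bounded quantified statement about coordinates, which is still additive.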

So we have a translation from the mate-in-n problem to a sentence in Presburger arithmetic. But Presburger arithmetic is decidable! So there must also be a decision procedure for the mate-in-n problem in infinite chess. And not only is there a decision procedure for the mate-in-n problem, but there’s an algorithm that gives the precise strategy that achieves the win in the fewest number of moves!

Here’s the paper in which all of this is proven. It’s pretty wild. Many other infinite chess problems can be proven to be decidable by the same method (demonstrating interpretability of the problem in Presburger arithmetic). But interestingly, not all of them! This has a lot to do with the limitations of first-order logic. The question of whether, in general, there is a forced win from a given position can not be shown to be decidable in this way. (This relates to the general impossibility in first-order logic of expressing infinitely long statements. Determining whether a given position is a winning position for a given player requires looking at the mate-in-n problem, but without any upper bound on what this n is – on how many moves the win may take.) It’s not even clear whether the winning-position problem can be phrased in first-order arithmetic, or whether it requires going to second-order!

The paper takes this one step further. This proof of the decidability of the mate-in-n problem for infinite chess doesn’t crucially rest upon the two-dimensionality of the chess board. We could easily translate the proof to a three-dimensional board, just by changing the way we code positions! So in fact, we have a proof that the mate-in-n problem for k-dimensional infinite chess is decidable!

I’ll leave you with this infinite chess puzzle:

[Diagram: an infinite chess puzzle position, White to move]

It’s White’s turn. Can they guarantee a checkmate in 12 moves or fewer?


There’s a problem with infinity

Last post I described the Ross-Littlewood paradox, in which an ever-expanding quantity of numbered billiard balls are placed into a cardboard box in such a way that after an infinite number of steps the box ends up empty. Here’s a version of this paradox:

Process 1
Step 1: Put 1 through 9 into the box.
Step 2: Take out 1, then put 10 through 19 into the box.
Step 3: Take out 2, then put 20 through 29 into the box.
Step 4: Take out 3, then put 30 through 39 into the box.
And so on.

Box contents after each step
Step 1: 1 through 9
Step 2: 2 through 19
Step 3: 3 through 29
Step 4: 4 through 39
And so on.

Now take a look at a similar process, where instead of removing balls from the box, we just change the number that labels them (so, for example, we paint a 0 after the 1 to turn “Ball 1” into “Ball 10”).

Process 2
Step 1: Put 1 through 9 into the box
Step 2: Change 1 to 10, then put 11 through 19 into the box.
Step 3: Change 2 to 20, then put 21 through 29 in.
Step 4: Change 3 to 30, then put 31 through 39 in.
And so on.

Box contents after each step
Step 1: 1 through 9
Step 2: 2 through 19
Step 3: 3 through 29
Step 4: 4 through 39
And so on.

Notice that the box contents are identical after each step. If that’s all that you are looking at (and you are not looking at what the person is doing during the step), then the two processes are indistinguishable. And yet, Process 1 ends with an empty box, and Process 2 ends with infinitely many balls in the box!

Why does Process 2 end with an infinite number of balls in it, you ask?

Process 2 ends with infinitely many balls in the box, because no balls are ever taken out. 1 becomes 10, which later becomes 100, then 1000, and so on forever. At infinity you have all the natural numbers, but with each one appended an infinite number of zeros.

So apparently the method you use matters, even when two methods provably get you identical results! There’s some sort of epistemic independence principle being violated here. The outputs of an agent’s actions should be all that matters, not the specific way in which the agent obtains those outputs! Something like that.
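A finite simulation makes the per-step equivalence concrete, with the obvious caveat that no simulation can reach the limit where the two processes diverge. The encoding (the box as a set of labels) is my own, not from the original presentation.

```python
# Finite simulation of Processes 1 and 2, with the box encoded as a
# set of labels. At every finite step the contents agree exactly; the
# divergence -- empty box versus full box -- lives only in the limit.

def process1(steps):
    box = set(range(1, 10))                     # Step 1
    for n in range(1, steps):                   # Step n+1
        box.remove(n)                           # take out ball n
        box.update(range(10 * n, 10 * n + 10))  # put in 10n through 10n+9
    return box

def process2(steps):
    box = set(range(1, 10))
    for n in range(1, steps):
        box.remove(n)                           # relabel ball n ...
        box.add(10 * n)                         # ... as ball 10n
        box.update(range(10 * n + 1, 10 * n + 10))
    return box

assert process1(100) == process2(100)  # indistinguishable at every step
```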

Somebody might respond to this: “But the outputs of the actions aren’t the same! In Process 1, each step ten are added and one removed, whereas in Process 2, each step nine are added. This is the same with respect to the box, but not with respect to the rest of the universe! After all, those balls being removed in Process 1 have to go somewhere. So somewhere in the universe there’s going to be a big pile of discarded balls, which will not be there in Process 2.”

This response holds water as long as our fictional universe conserves information; if it doesn’t, the discarded balls can just vanish into thin air, leaving no trace of their existence. But that rebuttal feels cheap. Instead, let’s consider another variant that gets at the same underlying problem of “relevance of things that should be irrelevant”, but avoids this problem.

Process 1 (same as before)
Step 1: Put 1 through 9 into the box.
Step 2: Take out 1, then put 10 through 19 into the box.
Step 3: Take out 2, then put 20 through 29 into the box.
Step 4: Take out 3, then put 30 through 39 into the box.
And so on.

Box contents after each step
Step 1: 1 through 9
Step 2: 2 through 19
Step 3: 3 through 29
Step 4: 4 through 39
And so on.


Process 3
Step 1: Put 1 through 9 into the box.
Step 2: Take out 9, then put 10 through 19 into the box.
Step 3: Take out 19, then put 20 through 29 into the box.
Step 4: Take out 29, then put 30 through 39 into the box.
And so on.

Box contents after each step
Step 1: 1 through 9
Step 2: 1 to 8, 10 to 19
Step 3: 1 to 8, 10 to 18, 20 to 29
Step 4: 1 to 8, 10 to 18, 20 to 28, 30 to 39
And so on

Okay, so as I’ve written it, the contents of each box after each step are different in Processes 1 and 3. Just one last thing we need to do: erase the labels on the balls. The labels will now just be stored safely inside our minds as we look over the balls, which will be indistinguishable from one another except in their positions.

Now we have two processes that look identical at each step with respect to the box, AND with respect to the external world. And yet, the second process ends with an infinite number of balls in the box, and the first with none! (Every number that’s not one less than a multiple of ten will be in there.) It appears that you have to admit that the means used to obtain an end really do matter.

But it’s worse than this. You can arrange things so that you can’t tell any difference between the two processes, even when observing exactly what happens in each step. How? Well, if the labelling is all in your heads, then you can switch around the labels you’ve applied without doing any harm to the logic of the thought experiment. So let’s rewrite Process 3, but fill in both the order of the balls in the box and the mental labelling being used:

Process 3
Start with:
1 2 3 4 5 6 7 8 9
Mentally rotate labels to the right:
9 1 2 3 4 5 6 7 8
Remove the furthest left ball:
1 2 3 4 5 6 7 8
Add the next ten balls to the right in increasing order:
1 2 3 4 5 6 7 8 10 11 12 13 14 15 16 17 18 19

Compare this to Process 1, supposing that it’s done without any relabelling:

Process 1
Start with:
1 2 3 4 5 6 7 8 9
Remove the furthest left ball:
2 3 4 5 6 7 8 9
Add the next ten balls to the right in increasing order:
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

If the labels are all in your head, then these two processes are literally identical except for how a human being is thinking about them.

But looking at Process 3, you can prove that after Step 1 there will always be a ball labelled 1 in the box. Same with 2, 3, 4, and every other number that is not one less than a multiple of ten. Even though we remove an infinity of balls, there are ball numbers that are never removed. And if we look at the pile of discarded balls, we’ll see that it consists of 9, 19, 29, 39, and so on, but none of the others. Unless some ball numbers vanish in the process (which they never do!), all the rest must still be sitting in the box!

So we have two identical-in-every-relevant-way processes, one of which ends with an infinite number of balls in the box and the other with zero. Do you find this troubling? I find this very troubling. If we add some basic assumption that an objective reality exists independent of our thoughts about it, then we’ve obtained a straightforward contradiction.


Notice that it’s not enough to say “Well, in our universe this process could never be completed.” This is for two reasons:

First of all, it’s actually not obvious that supertasks (tasks involving the completion of an infinite number of steps in a finite amount of time) cannot be performed in our universe. In fact, if space and time are continuous, then every time you wave your hand you are completing a sort of supertask.

You can even construct fairly physically plausible versions of some of the famous paradoxical supertasks. Take the light bulb that blinks on and off at intervals that get shorter and shorter, such that after some finite duration it has blinked an infinity of times. We can’t say that the bulb is on at the end (as that would seem to imply that the sequence 0101010… had a last number) or that it is off (for much the same reason). But these are the only two allowed states of the bulb! (Assume the bulb is robust against bursting and all the other clever ways you can distract from the point of the thought experiment.)

Now, here’s a variant that seems fairly physically reasonable:

A ball is dropped onto a conductive plate that is attached by wire to a light bulb. The ball is also wired to the bulb, so that when the ball contacts the plate, a circuit is completed that switches the light bulb on. With each bounce, the ball loses some energy to the impact, cutting its velocity exactly in half. This means that after each bounce, the ball hangs in the air for half as long as it did the previous bounce.

Suppose the time between the first and second bounce was 1 second. Then the time between the second and third will be .5 seconds. And next will be .25 seconds. And so on. At 2 seconds, the ball will have bounced an infinite number of times. So at 2 seconds, the light bulb will have switched on and off an infinite number of times.

And of course, at 2 seconds the ball is at rest on the plate, completing the circuit. So at 2 seconds, upon the completion of the supertask, the light will be on.

Notice that there are no infinite velocities here, or infinite quantities of energy. Just ordinary classical mechanics applied to a bouncing ball and a light bulb. What about infinite accelerations? Well even that is not strictly speaking necessary; we just imagine that each velocity reversal takes some amount of time, which shrinks to zero as the velocity shrinks to zero in such a way as to keep all accelerations finite and sum to a finite total duration.
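The timing arithmetic is easy to verify: the bounce intervals 1, 1/2, 1/4, … form a geometric series summing to 2 seconds, so infinitely many bounces fit in a finite window. A quick numerical check (truncated once the intervals fall below floating-point resolution):

```python
# Numerical check of the bounce timing: intervals 1, 1/2, 1/4, ...
# sum to 2, so the infinite sequence of bounces completes by the
# 2-second mark. We truncate at a small threshold.

t, interval, bounces = 0.0, 1.0, 0
while interval > 1e-12:
    t += interval
    interval /= 2
    bounces += 1
print(bounces, t)  # 40 bounces simulated, elapsed time within 1e-9 of 2
```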

All this is just to say that we shouldn’t be too hasty in dismissing the real-world possibility of apparently paradoxical supertasks.

But secondly, and more importantly, physical possibility is not the appropriate barometer of whether we should take a thought experiment seriously. Don’t be the person that argues that the fat man wouldn’t be sufficient to stop a trolley’s momentum. When we find that some intuitive conceptual assumptions lead us into trouble, the takeaway is that we need to closely examine and potentially revise our concepts!

Think about Russell’s paradox, which showed that some of our most central intuitions about the concept of a set lead us to contradiction. Whether or not the sets that Bertie was discussing can be pointed to in the physical world is completely immaterial to the argument. Thinking otherwise would have slowed down progress in axiomatic set theory immensely!

These thought experiments are a problem if you believe that it is logically possible for there to be a physical universe in which these setups are instantiated. That’s apparently all that’s required to get a paradox, not that the universe we live in happens to be that one.

So it appears that we have to conclude some limited step in the direction of finitism, in which we rule out a priori the possibility of a universe that allows these types of supertasks. I’m quite uncomfortable with this conclusion, for what it’s worth, but I don’t currently see a better option.

A Supertask Puzzle

The Puzzle

You have in front of you an empty box. You also have on hand an infinite source of billiard balls, numbered 0, 1, 2, 3, 4, and so on forever.

At time zero, you place balls 0 and 1 in the box.

In thirty minutes, you remove ball 0 from the box, and place in two new balls (2 and 3).

Fifteen minutes after that, you remove ball 1 from the box, and place in two new balls (4 and 5).

7.5 minutes after that, you remove ball 2 and place in balls 6 and 7.

And so on.


After an hour, you will have taken an infinite number of steps. How many billiard balls will be in the box?


At time zero, the box contains two balls (0 and 1). After thirty minutes, it contains three (1, 2, and 3). After 45 minutes, it contains four (2, 3, 4, and 5). You can see where this is going…

Naively taking the limit of this process, we arrive at the conclusion that the box will contain an infinity of balls.

But hold on. Ask yourself the following question: If you think that the box contains an infinity of balls, name one ball that’s in there. Go ahead! Give me a single number such that at the end of this process, the ball with that number is sitting in the box.

The problem is that you cannot do this. Every single ball that is put in at some step is removed at some later step. So for any number you tell me, I can point you to the exact time at which that ball was removed from the box, never to be returned to it!

But if any ball that you can name can be proven to not be in the box.. and every ball you put in there was named… then there must be zero balls in the box at the end!

In other words, as time passes and you get closer and closer to the one-hour mark, the number of balls in the box appears to be growing, more and more quickly each moment, until you hit the one-hour mark. At that exact moment, the box suddenly becomes completely empty. Spooky, right??
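One way to make the argument vivid: every ball has a definite insertion step and a definite removal step. Numbering the steps 0, 1, 2, … (a convention of mine: step 0 at time zero, step k at the k-th swap), the schedule can be written down explicitly.

```python
# Explicit schedule for the puzzle: step 0 puts in balls 0 and 1;
# step k >= 1 removes ball k-1 and puts in balls 2k and 2k+1.
# Every ball has a definite removal step, so the box ends up empty.

def added_at(ball):
    return ball // 2       # balls 2k and 2k+1 enter at step k

def removed_at(ball):
    return ball + 1        # ball k leaves at step k+1

# Name any ball you like: it is added, and strictly later it is removed.
for ball in [0, 7, 1_000_000]:
    assert added_at(ball) < removed_at(ball)
print(removed_at(1_000_000))  # -> 1000001
```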

Let’s make it weirder.

What if at each step, you didn’t just put in two new balls, but one MILLION? So you start out at time zero by putting balls 0, 1, 2, 3, and so on up to 1 million into the empty box. After thirty minutes, you take out ball 0, but replace it with the next 1 million numbered balls. And at the 45-minute mark, you take out ball 1 and add the next 1 million.

What’ll happen now?

Well, the exact same argument we gave initially applies here! Any ball that is put in the box at any point, is also removed at a later point. So you literally cannot name any ball that will still be in the box after the hour is up, because there are no balls left in the box! The magic of infinity doesn’t care about how many more balls you’ve put in than removed at any given time, it still delivers you an empty box at the end!

Now, here’s a final variant. What if, instead of removing the smallest numbered ball each step, you removed the largest numbered ball?

So, for instance, at the beginning you put in balls 0 and 1. Then at thirty minutes you take out ball 1, and put in balls 2 and 3. At 45 minutes, you take out ball 3, and put in balls 4 and 5. And so on, until you hit the one hour mark. Now how many balls are there in the box?


Infinity! Why not zero like before? Well, because now I can name you an infinity of numbers whose billiard balls are still guaranteed to be in the box when the hour’s up. Namely, 0, 2, 4, 6, and all the other even numbered balls are still going to be in there.

Take a moment to reflect on how bizarre this is. We removed the exact same number of balls each step as we did last time. All that changed is the label on the balls we removed! We could even imagine taking off all the labels so that all we have are identical plain billiard balls, and just labeling them purely in our minds. Now apparently the choice of whether to mentally label the balls in increasing or decreasing order will determine whether at the end of the hour the box is empty or packed infinitely full. What?!? It’s stuff like this that makes me sympathize with ultrafinitists.
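Here’s a finite simulation of the two variants (my own encoding: the box as a set of integers, with the removal rule passed in as `min` or `max`). It can’t reach the limit, but it shows the structural difference: under remove-smallest every specific ball is eventually doomed, while under remove-largest the even balls never leave.

```python
# Finite simulation of the remove-smallest vs. remove-largest variants.
# Each step removes one ball (by the given rule) and adds the next two.

def simulate(steps, pick):
    box, next_ball = set(), 0
    for _ in range(steps):
        if box:
            box.remove(pick(box))               # remove one ball ...
        box.update({next_ball, next_ball + 1})  # ... then add two more
        next_ball += 2
    return box

smallest_removed = simulate(1000, min)  # every specific ball eventually leaves
largest_removed = simulate(1000, max)   # the even balls persist forever

assert 0 not in smallest_removed
assert 0 in largest_removed
assert all(b % 2 == 0 for b in largest_removed if b != max(largest_removed))
```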

One final twist: what happens if the ball that we remove each step is determined randomly? Then how many balls will there be once the hour is up? I’ll leave it to you all to puzzle over!

Can an irrational number raised to an irrational power be rational?

There’s a wonderful proof that yes, indeed, this is possible, and it goes as follows:

Let’s consider the number \sqrt 2 ^ {\sqrt 2} . This number is either rational or irrational. Let’s examine each case.

Case 1: \sqrt 2 ^ {\sqrt 2}  is rational.

Recall that \sqrt 2  is irrational. So if \sqrt 2 ^ {\sqrt 2}  is rational, then we have proven that it’s possible to raise an irrational number to an irrational power and get a rational value. Done!

Case 2: \sqrt 2 ^ {\sqrt 2}  is irrational.

In this case, \sqrt 2 ^ {\sqrt 2}  and \sqrt 2  are both irrational numbers. So what if we raise the \sqrt 2 ^ {\sqrt 2}  to the power of \sqrt 2 ?

\left( \sqrt 2 ^ {\sqrt 2} \right) ^{\sqrt 2} =  \sqrt 2 ^ {\sqrt 2 \cdot \sqrt 2} = \sqrt 2 ^ 2 = 2

So in this case, we have again found a pair of irrational numbers such that one raised to the power of the other is a rational number! Proof complete!
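A quick floating-point sanity check of the identity used in Case 2 (numerics verify the arithmetic only, of course, not any of the irrationality claims):

```python
# Floating-point check that (sqrt(2)^sqrt(2))^sqrt(2) = 2.
a = 2 ** 0.5             # sqrt(2)
x = a ** a               # sqrt(2)^sqrt(2)
print(round(x ** a, 9))  # -> 2.0
```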


One thing that’s fun about this proof is that the result is pretty surprising. I would not have guessed a priori that you could get a rational by raising one irrational to another; it just seems like irrationality is the type of thing that would be closed under ordinary arithmetic operations.

But an even cooler thing is that it’s a non-constructive proof. By the end of the proof, we know for sure that there is a pair of irrational numbers such that one raised to the other gives us a rational number, but we have no idea whether it’s (\sqrt 2 , \sqrt 2 ) or (\sqrt 2 ^ {\sqrt 2} ,\sqrt 2 ).

(It turns out that it’s the second. The Gelfond–Schneider theorem tells us that for any algebraic number a other than 0 or 1 and any irrational algebraic number b, the number a^b is transcendental. So \sqrt 2 ^ {\sqrt 2}  is in fact transcendental, and in particular irrational.)

Now, most mathematicians are totally fine with non-constructive proofs, as long as they follow all the usual rules of proofs. But there is a branch of mathematics known as constructive mathematics that only accepts constructive proofs of existence. Within constructive mathematics, this proof is not valid! 

Now, it so happens that you can prove the irrationality of \sqrt 2 ^ {\sqrt 2}  by purely constructive means, but that’s beside the point. To my eyes, the refusal to accept such an elegant and simple proof because it asserts a number’s existence without telling us exactly what it is just looks a little silly!

Along similar lines, here’s one more fun problem.

Are \pi + e and \pi - e transcendental?

We know that \pi and e are both transcendental numbers (i.e. they cannot be expressed as the roots of any polynomial with rational coefficients). But are \pi + e and \pi - e both transcendental?

It turns out that this amazingly simple-sounding problem is unsolved to this day! But one thing that we do know is that it can’t be that neither of them is transcendental. Because if both were algebraic, then their sum (\pi + e) + (\pi - e) = 2 \pi would also be algebraic, which we know is false! So at least one of the two must be transcendental, via a proof that doesn’t tell us which one! Cool, right?

The Weirdest Consequence of the Axiom of Choice

This post is about the most startling puzzle I’ve ever heard. First, two warm-up hat puzzles.

Four prisoners, two black hats, two white hats

Four prisoners are in a line, with the front-most one hidden from the rest behind a wall. Each is wearing a hat, but they can’t see its color. There are two black hats and two white hats, and all four know this.


As soon as any prisoner figures out the color of their own hat, they must announce it. If they’re right, everybody goes free. They cannot talk to each other or look any direction but forwards.

Are they guaranteed freedom?


Yes, they are! If D sees the same color hat on B and C’s heads, then he can conclude his own hat’s color is the other, so everybody goes free. If he sees differently colored hats, then he cannot conclude his hat color. But C knows this, so if D doesn’t announce his hat color, then C knows that his hat color is different from B’s and they all go free. Done!


Ten prisoners, unknown number of black and white hats


Each prisoner is randomly assigned a hat. The number of black and white hats is unknown. Starting from the back, each must guess their hat color. If it matches they’re released, and if not then they’re killed on the spot. They can coordinate beforehand, but cannot exchange information once the process has started.

There is a strategy that gives nine of the prisoners 100% chance of survival, and the other a 50% chance. What is it?


The rearmost prisoner (call him A10) counts the number of white hats in front of him. If it’s odd, he says ‘white’; otherwise he says ‘black’. Each prisoner in turn can then deduce their own hat color from the announced parity and the guesses they hear behind them.
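Here’s a simulation of the parity strategy, using my own encoding (0 for a black hat, 1 for a white hat). Each prisoner combines the parity they can see in front of them with the parity implied by the guesses behind them:

```python
# Simulation of the parity strategy (0 = black hat, 1 = white hat).
# The rearmost prisoner announces the parity of the white hats he sees;
# everyone in front of him then deduces their own hat exactly.
import random

def play(hats):
    guesses = [sum(hats[1:]) % 2]        # rear prisoner's 50/50 guess
    parity = guesses[0]                  # known parity of hats[1:]
    for i in range(1, len(hats)):
        seen = sum(hats[i + 1:]) % 2     # parity of white hats in front
        guesses.append((parity - seen) % 2)
        parity = seen                    # known parity for the next prisoner
    return guesses

hats = [random.randint(0, 1) for _ in range(10)]
assert play(hats)[1:] == hats[1:]        # nine guaranteed correct guesses
```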

— — —

Alright, now that you’re all warmed up, let’s make things a bit harder.

Countably infinite prisoners, unknown number of black and white hats, no hearing

There are a countable infinity of prisoners lined up with randomly assigned hats. They each know their position in line.


The hat-guessing starts from A1 at the back of the line. Same consequences as before: the reward for a right guess is freedom and the punishment for a wrong guess is death.

The prisoners did have a chance to meet and discuss a plan beforehand, but now are all deaf. Not only can they not coordinate once the guessing begins, but they also have no idea what happened behind them in the line.

The puzzle for you is: Can you find a strategy that ensures that only finitely many prisoners are killed?

Oh boy, the prisoners are in a pretty tough spot here. We’ll give them a hand; let’s allow them logical omniscience and the ability to memorize an uncountable infinity of information. Heck, let’s also give them an oracle to compute any function they like.

Give it a try!




Amazingly, the answer is yes. All but a finite number of prisoners can be saved.

Here’s the strategy. First start out by identifying white hats with the number 0, and black hats with the number 1. Now the set of all possible sequences of hats in the line is the set of all binary sequences. We define an equivalence relation on such sequences as follows: 𝑥 ~ 𝑦 if and only if 𝑥 and 𝑦 are identical after a finite number of digits in the sequence. This will partition all possible sequences into equivalence classes.


For example, the equivalence class of 000000… consists of exactly the sequences that are eventually all 0s. If we read each sequence as the binary expansion of a number, these correspond to the rationals whose binary expansion terminates (i.e. the rationals that can be written as an integer over a power of 2). Why? Well, if a binary sequence 𝑥 is equivalent to .000000…, then after a finite number of digits, 𝑥 has to be all 0s forever. And this means that it can be written as some integer over a power of 2.

When the prisoners meet up beforehand, they use the axiom of choice to select one representative from each equivalence class. (Quick reminder: the axiom of choice says that for any set 𝑥 of nonempty disjoint sets, you can form a new set that shares exactly one element with each of the sets in 𝑥.) Now each prisoner is holding in their head an uncountable set of sequences, each one of which represents an equivalence class.


Once they’re in the room, every prisoner can see all but a finite number of hats, and therefore they know exactly which equivalence class the sequence of hats belongs to. So each prisoner guesses their hat color as if they were in the representative sequence from the appropriate equivalence class. Since the actual sequence and the representative sequence differ in only finitely many places (all within some finite initial stretch of the line), all entries are going to be the same after some finite number of prisoners. So every single prisoner after this first finite batch will be saved!
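The axiom of choice is essentially non-constructive, so no program can carry out the full strategy. But we can peek at one computable slice of it: restrict attention to sequences that are all white (all 0s) past some point. These all live in the equivalence class of 000…, and we may as well declare the all-zero sequence to be that class’s chosen representative. A sketch (the function name is mine):

```python
def deaths(prefix):
    """Sequences that are all 0s after `prefix` lie in the class of 000...;
    take the all-zero sequence as the chosen representative.
    Every prisoner guesses the representative's value at their position (0),
    so a prisoner dies exactly where the true sequence has a 1."""
    return sum(prefix)  # all deaths happen inside the finite prefix

print(deaths([1, 0, 1, 1, 0, 0]))  # 3 deaths, then everyone survives forever
```

The magic of the full strategy is that the axiom of choice hands the prisoners a representative like this for *every* equivalence class at once, with no rule for computing any of them.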


This result is so ridiculous that it actually makes me crack up thinking about it. There is surely some black magic going on here. Remember, each prisoner can see all the hats in front of them, but they know nothing about the total number of hats of each color, so there is no correlation between the hats they see and the hat on their head. And furthermore, they are deaf! So they can’t learn any new information from what’s going on behind them! They literally have no information about the color of their own hat. So the best that each individual could do must be 50/50. Surely, surely, this means that there will be an infinite number of deaths.

But nope! Not if you accept the axiom of choice! You are guaranteed only a finite number of deaths, though that finite number can be arbitrarily large. How is this not a contradiction? Well, for it to be a contradiction, there would have to be some probability distribution over the possible outcomes which says that Pr(finite deaths) = 0. And it turns out that the set of representative sequences forms a non-measurable set (a set which cannot have a well-defined size using the ordinary Lebesgue measure). So no probability can be assigned to it (not zero probability, literally undefined)! Now remember that zero deaths occur exactly when the real sequence is one of the representative sequences. This means that no probability can be assigned to this state of affairs. The same thing applies to the state of affairs in which one prisoner dies, or two prisoners, and so on. You literally cannot define a probability distribution over the number of prisoners who die.

By the way, what happens if you have an uncountable infinity of prisoners? Say we make them infinitely thin and then squeeze them along the real line so as to populate every point. Each prisoner can see all but a finite number of the other prisoners’ hats. Let’s even give them hats that have an uncountable number of different colors. Maybe we pick each hat color by just randomly selecting any frequency of visible light.

Turns out that we can still use the axiom of choice to guarantee the survival of all but finitely many prisoners!

Colbert’s reaction when he first learned about the uncountable version of the hat problem

One last one.

Countably infinite prisoners, unknown number of black and white hats, with hearing

We have a countable infinity of prisoners again, each with either a black or white hat, but this time they can hear the colors called out by previous prisoners. Now how well can they do?

The answer? (Assuming the axiom of choice) Every single prisoner can be guaranteed to survive except for the first one, who survives with 50% probability. I really want this to sink in. When we had ten prisoners with ten hats, they could pull this off by using their knowledge of the total number of black and white hats amongst them. Our prisoners don’t have this anymore! They start off knowing nothing about the number of white and black hats besides what they can see in front of them. And yet they still get out of it with all but one prisoner guaranteed freedom.

How do they do this? They start off the same as before, defining the equivalence relation and selecting a representative sequence from each equivalence class. Now they label every single sequence with either a 0 or a 1. A sequence gets a 0 if it differs from the representative sequence in its equivalence class in an even number of places, and otherwise it gets a 1. This labeling has the result that any two sequences that differ by exactly one digit have opposite labels.


Now the first person (A) just says the label of the sequence he sees. For the next person up (B), this is the sequence that starts with their hat. And remember, they know which equivalence class they’re in, since they can see everybody in front of them! So all they need to do is consider what the label of the sequence starting with them would be if they had a white hat on. If it would be different from the label they just heard, then they know that their hat is black. And if it would be the same, then their hat is white!

The person in front of B knows the equivalence class they’re in, and now also knows what color B’s hat was. So they can do the exact same reasoning to figure out their hat color! And so on, saving every last countable prisoner besides A.

Let’s see an example of this, for the sequence 100000… This sequence lies in the equivalence class of 000000…, and we may as well suppose that the representative the prisoners chose for this class is 000000… itself. The full sequence differs from the representative in exactly one place (the first), so its label is 1.

Person A sees the sequence 00000… in front of him. That sequence differs from the representative in zero places, so he announces its label: 0. His own hat is a 1, so this time A is out of luck.

Person B sees 0000… ahead of him. If his hat were a 0, the sequence starting with him would differ from the representative in zero places and have label 0, matching what A announced. So B guesses 0, correctly. The person in front of B now knows B’s hat, runs the same reasoning, and also correctly guesses 0.

And so on forever, until every last prisoner besides A is free.
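For sequences in the class of 000… (all white past some point, with the all-zero sequence as representative), the label is just the parity of black hats, and the whole protocol collapses to the finite parity trick from the ten-prisoner puzzle. A sketch, with hypothetical names:

```python
def run_label_strategy(prefix):
    """prefix holds the hats of the first len(prefix) prisoners; the line
    continues with 0s (white) forever. The class representative is all
    zeros, so a sequence's label is just the parity of its 1s."""
    n = len(prefix)
    announced = sum(prefix[1:]) % 2   # A announces the label of what he sees
    deduced = [None]                  # A's own hat is a 50/50 gamble
    for i in range(1, n):
        behind = sum(deduced[1:i]) % 2    # hats already called out, all correct
        ahead = sum(prefix[i + 1:]) % 2   # hats prisoner i can see
        # The announced label covers everything from position 1 on, so the
        # leftover parity is prisoner i's own hat:
        deduced.append((announced - behind - ahead) % 2)
    return all(d == h for d, h in zip(deduced[1:], prefix[1:]))

print(run_label_strategy([1, 0, 0, 0, 0, 0]))  # True: everyone after A survives
```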

This set of results is collectively the most compelling argument I know for rejecting AC, and I love it.

Introduction to Mathematical Logic (Part 1)

Mathematical logic is the study of the type of reasoning we perform when we do mathematics, and the attempt to formulate a general language as the setting in which all mathematics is done. In essence, it is an attempt to form a branch of mathematics, of which all other branches of mathematics will emerge as special cases.

You might sort of think that when speaking at this level of abstraction, there will be nothing general and interesting to say. After all, we’re trying to prove statements not within a particular domain of mathematics, but theorems that are true across a wide swath of mathematics, potentially encompassing all of it.

The surprising and amazing thing is that this is not the case. It turns out that there are VERY general and VERY surprising things you can discover by looking at the logical language of mathematics, a host of results going by names like the Completeness Theorem, the Incompleteness Theorem, the Compactness Theorem, the Löwenheim-Skolem Theorem, and so on. These results inevitably have a great deal of import to our attitudes towards the foundations of mathematics, being that they generally establish limitations or demonstrate eccentricities in the types of things that we can say in the language of mathematics.

My goal in this post is to provide a soft introduction to the art of dealing in mathematics at this level of ultimate abstraction, and then to present some of the most strange things that we know to be true. I think that this is a subject that’s sorely missing this type of soft introduction, and hope I can convey some of the subject’s awesomeness!

— — —

To start out with, why think that there is any subject matter to be explored here? Different branches of mathematics sometimes appear to be studying completely different types of structures. I remember an anecdote from an old math professor of mine, who worked within one very narrow and precisely defined area of number theory, and who told me that when she goes to talks that step even slightly outside her area of specialty, the content of the lectures quickly becomes incomprehensible to her. Why think that there is such a common language of mathematics, if specialists in mathematics can’t even understand each other when talking between fields?

The key thing to notice here is that although different fields of mathematics are certainly wildly different in many ways, there nevertheless remain certain fundamental features that are shared in all fields. Group theorists, geometers, and number theorists will all accept the logical inference rule of modus ponens (if P is true and P implies Q, then Q is true), but none of them will accept the converse inference (if Q is true and P implies Q, then P is true). No matter what area of mathematics you study, you will accept that if P(x) is true for all x, then it is true for any particular x you choose. And so on. These similarities may seem obvious and trivial, but they HAVE to be obvious and trivial to be things that every mathematician agrees on. The goal, then, is to formalize a language that has these fundamental inference rules and concepts built in, and that has many special cases to account for the differences between domains of math, specified by some parameters that are freely chosen by any user of the system.

There are actually several distinct systems that attempt to accomplish this task. Generally speaking, there are three main branches, in order of increasing expressive power: propositional logic, first order (predicate) logic, and second order logic.

Reasoning In Zeroth Order

Let’s start with propositional logic, sometimes called “zeroth order logic.” Propositional logic is the framework developed to deal with the validity of the following types of arguments:

Argument 1

  1. 2+2=4.
  2. If 2+2=4, then 1+3=4.
  3. So 1+3=4.

Argument 2

  1. The Riemann hypothesis is false and P = NP.
  2. So P = NP.

Notice that it doesn’t matter if our premises are true or not. Logical validity doesn’t care about this, it just cares that the conclusions really do follow from the premises. This is a sign of the great generality at which we’re speaking. We’re perfectly fine with talking about a mathematical system in which the Riemann hypothesis is false, or in which 2+2 is not 4, just so long as we accept the logical implications of our assumptions.

Propositional logic can express the validity of these arguments by formalizing rules about valid uses of the concepts ‘and’, ‘if…then…’, ‘or’, and so on. It remains agnostic to the subject matter being discussed by not fully specifying the types of sentences that are allowed to be used in the language. Instead, any particular user of the language can choose their set of propositions that they want to speak about.

To flesh this out more, propositional logic fulfills the following three roles:

  1. Defines an alphabet of symbols.
  2. Specifies a set of rules for which strings are grammatical and which are not.
  3. Details rules for how to infer new strings from existing strings.

In more detail:

  1. The set of symbols in propositional logic is split into two categories: logical symbols and what I’ll call “fill-in-the-blank” symbols. The logical symbols are (, ), ∧, ∨, ¬, and →. The fill-in-the-blank symbols represent specific propositions, which are specified by any particular user of the logic.
  2. Some strings are sensible and others not. For example, the string “P∧∧” will be considered nonsensical, while “P∧Q” will not. Some synonyms for sensible strings are well-formed formulas (WFFs), grammatical sentences, and truth-apt sentences. There is a nice way to inductively generate the set of all WFFs: Any proposition is a WFF, and for any two WFFs F and F’, the following are also WFFs: (F∧F’), (F∨F’), ¬F, (F→F’).
  3. These include rules like modus ponens (from P and P→Q, derive Q), conjunction elimination (from P∧Q, derive P), double negation elimination (from ¬¬P, derive P), and several more. They are mechanical rules that tell you how to start with one set of strings and generate new ones in a logically valid way, such that if the starting strings are true then the derived ones must also be true. There are several different but equivalent formulations of the rules of inference in propositional logic.
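The inductive definition of WFFs translates directly into a small recursive recognizer. Here’s a sketch in Python for a toy language with propositions P and Q; following the inductive rules above, binary connectives carry their surrounding parentheses (the function names are my own):

```python
PROPS = {"P", "Q"}        # the fill-in-the-blank propositions of this language
OPS = {"∧", "∨", "→"}     # binary connectives

def parse(s):
    """Try to consume one WFF from the front of s.
    Returns (succeeded, remaining_string)."""
    if s and s[0] in PROPS:              # a bare proposition is a WFF
        return True, s[1:]
    if s and s[0] == "¬":                # ¬F is a WFF when F is
        return parse(s[1:])
    if s and s[0] == "(":                # (F op F') is a WFF when F, F' are
        ok, rest = parse(s[1:])
        if ok and rest and rest[0] in OPS:
            ok2, rest2 = parse(rest[1:])
            if ok2 and rest2 and rest2[0] == ")":
                return True, rest2[1:]
    return False, s

def is_wff(s):
    ok, rest = parse(s)
    return ok and rest == ""             # must consume the entire string

print(is_wff("((P→Q)∧¬P)"), is_wff("P∧∧"))  # True False
```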

A propositional language fills in the blanks in the logic. Say that I want to talk about two sentences using propositional logic: “The alarm is going off”, “A robber has broken in.” For conciseness, we’ll abbreviate these two propositions as A for alarm and R for robber. All I’ll do to specify my language is to say “I have two propositions: {A, R}”.

The next step is to fill in some of the details about the relationships between the propositions in my language. This is done by supplementing the language with a set of axioms, and we call the resulting constrained structure a propositional theory. For instance, in my example above we might add the following axioms:

  1. A→R
  2. A∨R

In plain English, these axioms tell us that (1) if the alarm is going off, then a robber has broken in, and (2) an alarm is going off or a robber has broken in.

Finally, we talk about the models of our theory. Notice that up until now, we haven’t talked at all about whether any statements are true or false, just about syntactic properties like “The string P¬ is not grammatical” and about what strings follow from each other. Now we interpret our theory by seeing what possible assignments of truth values to the WFFs in our language are consistent with our axioms and logical inference rules. In our above example, there are exactly two interpretations:

Model 1: A is true, R is true
Model 2: A is false, R is true

These models can be thought of as the possible worlds that are consistent with our axioms. In one of them, the alarm has gone off and a robber has broken in, and in the other, the alarm hasn’t gone off and the robber has broken in.

Notice that R turns out true in both models. When a formula F is true in all models of a theory T, we say that the theory semantically entails F, and write this as T ⊨ F. When a formula can be proven from the axioms of the theory using the rules of inference given by the logic, then we say that the theory syntactically entails F, and write this as T ⊢ F.
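Since a propositional model is just a truth assignment consistent with the axioms, this tiny theory can be checked by brute force. A sketch:

```python
from itertools import product

def implies(p, q):
    return (not p) or q

# Enumerate all truth assignments to (A, R) and keep those
# satisfying the two axioms A→R and A∨R:
models = [(A, R) for A, R in product([True, False], repeat=2)
          if implies(A, R) and (A or R)]

print(models)  # [(True, True), (False, True)]: exactly the two models above
```

R comes out true in both surviving assignments, which is the brute-force version of T ⊨ R.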

This distinction between syntax and semantics is really important, and will come back to us in later discussion of several important theorems (notably the completeness and incompleteness theorems). To give a sneak peek: above we found that R was semantically entailed by our theory. If you’re a little familiar with propositional logic, you might have also realized that R can be proven from the axioms. In general, syntactic truths will always be semantic truths (if you can prove something, then it must be true in all models, or else the models would be inconsistent. But a model is by definition a consistent assignment of truth values to all WFFs). But a natural question is: are all semantic consequences of a theory also syntactic consequences? That is, are all universal truths of the theory (things that are true in every model of the theory) provable from the theory?


If the answer is yes, then we say that our logic is complete. And it turns out that the answer is yes, for propositional logic. Whether more complex logics are complete turns out to be a more interesting question. More on this later.

This four-step process I just laid out (logic to language to theory to model) is a general pattern we’ll see over and over again. In general, we have the following division of labor between the four concepts:

  1. Logic: The logic tells us the symbols we may use (including some fill-in-the-blank categories of symbols), the rules of grammar, and a set of inference rules for deriving new strings from an existing set.
  2. Language: The language fills in the blanks in our logic, fully specifying the set of symbols we will be using.
  3. Theory: The theory adds axioms to the language.
  4. Model: A model is an assignment of truth values to all WFFs in the language, consistent with the axioms and the inference rules of our logic.

It’s about time that we apply this four-step division to a more powerful logic. Propositional logic is pretty weak. Not much interesting math can be done in a purely propositional language, and it’s wildly insufficient to capture our notion of logically valid reasoning. Consider, for example, the following argument:

  1. Socrates is a man.
  2. All men are mortal.
  3. So, Socrates is mortal.

This is definitely a valid argument. No rational agent could agree that 1 and 2 are true, and yet deny the truth of 3. But can we represent the validity of this argument in propositional logic? No!

Consider that the three sentences “Socrates is a man”, “All men are mortal”, and “Socrates is mortal” are distinct propositions, and the relationships between them are too subtle for propositional logic to capture. Propositional logic can’t see that the first proposition is asserting the membership of Socrates to a general class of things (“men”), and that the second proposition is then making a statement about a universal property of things in this class. It just sees two distinct propositions. To propositional logic, this argument just looks like

  1. P
  2. Q
  3. Therefore, R

But this is not logically valid! We could make it valid by adding as a premise the sentence (P∧Q)→R, which corresponds to the English sentence “If Socrates is a man and all men are mortal, then Socrates is mortal.” But this should be seen as a tautology, something that is provable in any first order theory that contains the propositions P, Q, and R, not a required additional assumption. Worse, if somebody came along and stated the proposition A = “Aristotle is a man”, then we would need a whole ‘nother assumption to assert that Aristotle is also mortal! And in general, for any individual instance of this argument, we’d need an independent explanation for its validity. This is not parsimonious, and indicative that propositional logic is missing something big.

Missing what? To understand why this argument is valid, you must be able to reason about objects, properties, and quantification. This is why we must move on to an enormously more powerful and interesting logic: first order logic.

Reasoning In First Order

First order logic is a logic, so it must fill the same three roles as we saw propositional logic did above. Namely, it must define the alphabet, the grammar, and the inference rules.

Logical: ¬ ∧ ∨ → ( ) = ∀ ∃
Variables: x y z w …
Constants: ______
Predicates: ______
Functions: ______

That’s right, in first order we have three distinct categories of fill-in-the-blank symbols. Intuitively, constants will be names that refer to objects, predicates will be functions from objects to truth values, and functions will take objects to objects. To take an everyday example, if the objects in consideration are people, then we might take ‘a’ to be a constant referring to a person named Alex, ’T’ to be a predicate representing ‘is tall’, and ‘f’ to be a function representing “the father of”. So T(a) is either true or false, while f(a) doesn’t have a truth value (it just refers to another object). But T(f(a)) does have a truth value, because it represents the sentence “Alex’s father is tall.”

Next, we define well-formed formulas. This process is more complicated than it was for propositional logic, because we have more types of things than we did before, but it’s not too bad. We start by defining a “term”. The set of terms is inductively generated by the following scheme: All constants and variables are terms, and any function of terms is itself a term. Intuitively, the set of terms is the set of objects that our language is able to “point at”.

With the concept of terms in hand, we can define WFFs through a similar inductive scheme: Any predicate of terms is a WFF. And for any WFFs F and F’, (F∧F’), (F∨F’), (¬F), (F→F’), ∀x F, and ∃x F are all WFFs. The details of this construction are not actually that important, I just think it’s nice how you can generate all valid first order formulas from fairly simple rules.

Good! We have an alphabet, a grammar, and now all we need from our logic is a set of inference rules. It turns out that this set is just going to be the inference rules from propositional logic, plus some new ones:

Quantifier elimination: From ∀x P(x), derive P(t) (for any term t and predicate P)
Quantifier introduction: From P(t), derive ∃x P(x) (for any term t and predicate P)

That’s it, we’ve defined first order logic! Now let’s talk about a first order language. Just like before, the language is just obtained by filling in the blanks left open by our logic. So for instance, we might choose the following language:

Constants: a
Functions: f
Predicates: none

In specifying our function, we have to say exactly what type of function it is. Functions can take as inputs a single object (“the father of”) or multiple objects (“the nearest common ancestor of”), and this will make a difference to how they are treated. So for simplicity, let’s say that our function f just takes in a single object.

A first order theory will simply be a language equipped with some set of axioms. Using our language above, we might have as axioms:

  1. ∀x (f(x) ≠ x)
  2. ∀x (f(x) ≠ a)

In plain English, we’re saying that f never takes any object to itself or to a.

And lastly, we get to the models of a first order theory. There’s an interesting difference between models here and models in propositional logic, which is that to specify a first order model, you need to first decide on the size of your set of objects (the size of the ‘universe’, as it’s usually called), and then find a consistent assignment of truth values to all propositions about objects in this universe.

So, for instance, we can start searching for models of our theory above by starting with models with one object, then two objects, then three, and so on. We’ll draw little diagrams below in which points represent objects, and the arrow represents the action of the function f on an object.

  1. No 1-element universe.
  2. No 2-element universe.
  3. 3-element universe
  4. 4-element universes
    [diagrams of the 4-element models]
  5. And so on…

It’s a fun little exercise to go through these cases and see if you can figure out why there are no models of size 1 or 2, or why the one above is the only model of size 3. Notice, by the way, that in the last two images, we have an object for which there is no explicit term! We can’t get there just using our constant and our functions. Of course, in this case we can still be clever with our quantifiers to talk about the Nameless One indirectly (for instance, in both of these models we can refer to the object that is not equal to a, f(a), or f(f(a))). But in general, the set of all things in the universe is not going to be the same as the set of things in the universe that we can name.
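Finite models of a first order theory can also be hunted by brute force: fix a universe {0, …, n−1}, let the constant a be 0, and enumerate every possible function f, keeping those satisfying the axioms. (A caveat: this counts "labeled" models, whereas the diagrams identify models up to isomorphism; the two labeled 3-element models are mirror images of a single picture.) A sketch:

```python
from itertools import product

def models_of_size(n):
    """All functions f on the universe {0, ..., n-1} (with the constant
    a interpreted as 0) satisfying the axioms
    ∀x (f(x) ≠ x) and ∀x (f(x) ≠ a).
    A function is encoded as a tuple: f[x] is the image of x."""
    return [f for f in product(range(n), repeat=n)
            if all(f[x] != x and f[x] != 0 for x in range(n))]

print([len(models_of_size(n)) for n in range(1, 5)])  # [0, 0, 2, 24]
```

The counts confirm there are no models of size 1 or 2: f(0) must avoid both 0 (as a) and 0 (as itself), and in a 2-element universe f(1) is then left with nowhere legal to go.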

Here’s a puzzle for you: Can you make a first order sentence that says “I have exactly four objects”? That is, can you craft a sentence that, if added as an axiom to our theory, will rule out all models besides the ones that have exactly four elements?


(Think about it for a moment before moving on…)


Here’s how to do it for a universe with two objects (as well as a few other statements of interest). The case of four objects follows pretty quickly from this.

  • “I have at most two objects” = ∃x∃y∀z (z=x ∨ z=y)
  • “I have exactly two objects” = ∃x∃y (x≠y ∧ ∀z (z=x ∨ z=y))
  • “I have at least two objects” = ∃x∃y (x≠y)
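You can sanity-check sentences like these by evaluating them directly over finite universes, reading ∃ as `any` and ∀ as `all`. A sketch for the “exactly two” sentence (the function name is mine):

```python
from itertools import product

def exactly_two(universe):
    """Evaluate ∃x∃y (x≠y ∧ ∀z (z=x ∨ z=y)) over a finite universe."""
    return any(x != y and all(z == x or z == y for z in universe)
               for x, y in product(universe, repeat=2))

print([exactly_two(range(n)) for n in range(1, 5)])  # [False, True, False, False]
```

As advertised, the sentence holds in exactly the 2-element universes.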

Another puzzle: How would you say “I have an infinity of objects”, ruling out all finite models?


(Think about it for a moment before moving on…)


This one is trickier. One way to do it is to introduce an infinite axiom schema: “I have at least n objects” for each n.

This brings up an interesting point: theories with infinite axioms are perfectly permissible for us. This is a choice that we make that we could perfectly well deny, and end up with a different and weaker system of logic. How about infinitely long sentences? Are those allowed? No, not in any of the logics we’re talking about here. A logic in which infinite sentences are allowed is called an infinitary logic, and I don’t know too much about such systems (besides that propositional, first, and second order logics are not infinitary).

Okay… so how about infinitely long derivations? Are those allowed? No, we won’t allow those either. This one is easier to justify, because if infinite derivations were allowed, then we could prove any statement P simply by the following argument “P, because P, because P, because P, because …”. Each step logically follows from the previous, and in any finite system we’d eventually bottom out and realize that the proof has no basis, but in a naive infinite-proof system, we couldn’t see this.

Alright, one last puzzle. How would you form the sentence “I have a finite number of objects”? I.e., suppose you want to rule out all infinite models but keep the finite ones. How can you do it?


(Think about it for a moment before moving on…)


This one is especially tricky, because it turns out to be impossible! (Sorry about that.) You can prove, and we will prove in a little bit, that no first order axiom (or even infinite axiom schema) is in general capable of restricting us to finite models. We have run up against our first interesting expressive limitation of first-order logic!

Okay, let’s now revisit the question of completeness that I brought up earlier. Remember, a logic is complete if for any theory in that logic, the theory’s semantic implications are the same as its syntactic implications (all necessary truths are provable). Do you think that first order logic is complete?

The answer: Yes! Kurt Gödel proved that it is in his 1929 doctoral dissertation. Anything that is true in all models of a first order theory, can be proven from the axioms of that theory (and vice versa). This is a really nice feature to have in a logic. It’s exactly the type of thing that David Hilbert was hoping would be true of mathematics in general. (“We must know. We will know!”) But this hope was dashed by the same Kurt Gödel as above, in his infamous incompleteness theorems.

There will be a lot more to say about that in the future, but I’ll stop this post for now. Next time, we’ll harness the power of first order logic to create a first order model of number theory! This will give us a chance to apply some powerful results in mathematical logic, and to discover surprising truths about the logic’s limitations.

A Talmudic Probability Puzzle

Today we’ll take a little break from the more intense abstract math stuff I’ve been doing, and do a quick dive into a fun probabilistic puzzle I found on the internet.

Background for the puzzle: In Ancient Israel, there was a court of 23 wise men that tried important cases, known as the Sanhedrin. If you were being tried for a crime by the Sanhedrin and a majority of them found you guilty, you were convicted. But there was an interesting twist on this! According to the Talmud (Tractate Sanhedrin: Folio 17a), if the Sanhedrin unanimously found you guilty, you were to be acquitted.

If the Sanhedrin unanimously find [the accused] guilty, he is acquitted. Why? — Because we have learned by tradition that sentence must be postponed till the morrow in hope of finding new points in favour of the defence. But this cannot be anticipated in this case.

Putting aside the dubious logic of this rule, it gives rise to an interesting probability puzzle with a counterintuitive answer. Imagine that an accused murderer has been brought before the Sanhedrin, and that the evidence is strong enough that no judge has any doubt in their mind about his guilt. Each judge obviously wants for the murderer to be convicted, and would ordinarily vote to convict. But under this Talmudic rule, they need to be worried about the prospect of them all voting guilty and therefore letting him off scot-free!

So: If a probability p can be chosen such that each and every judge votes to convict with probability p, and to acquit with probability 1 – p, which p will give them the highest probability of ultimately convicting the guilty man?

Furthermore, imagine that the number of judges is not 23, but some arbitrarily high number. As the number of judges goes to infinity, what does p converge to?

I want you to think about this for a minute and test your intuitions before moving on.










So, it turns out that the optimal p for 23 judges is actually ≈ 77.3%. And as the number of judges goes to infinity? The optimal value of p converges to… 80%!


This was a big shock to me. I think the natural first thought is that when you have thousands and thousands of judges, you only need a minuscule chance for any one judge to vote ‘acquit’ in order to ensure a majority and prevent him from getting off free. So I initially guessed that p would be something like 99%, and would converge to 100% in the limit of infinite judges.

But this is wrong! And most of the small sample of mathematically gifted friends I posed this question to guessed the same way.

There’s clearly a balance going on between the risk of a minority voting to convict and the risk of a unanimous vote to convict. For small p, the first of these is ~1 and the second is ~0, and for p ~ 1, the first is ~0 and the second ~1. It seems that we naturally overemphasize the danger of a minority vote to convict, and underemphasize the danger of the unanimous vote.

Here are some plots of the various relevant values, for different numbers of judges:

[plots of Pr(conviction) as a function of p, for 10, 23, 100, and 150 judges]

One thing to notice is that as the number of judges gets larger, the graph’s peak becomes more and more of a plateau. And in the limit of infinite judges, you can show that the graph is actually just a simple step function: Pr(conviction) = 0 if p < .5, and 1 if p > .5. This means that while yes, technically, 80% is the optimal value, you can do pretty much equally well by choosing any value of p greater than 50%.
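All of this is easy to check numerically. The sketch below (function names are my own) computes Pr(conviction) = Pr(majority guilty) − Pr(unanimous guilty) for n judges, grid-searches for the best p, and exhibits the step-function behavior for large n:

```python
from math import comb

def conviction_prob(p, n):
    """Convicted iff a strict majority votes guilty, but not all n judges."""
    majority = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(majority, n))   # k = n (unanimity) is excluded

def best_p(n, steps=1000):
    """Crude grid search for the p maximizing the conviction probability."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: conviction_prob(p, n))

print(best_p(23))     # the optimum for the Sanhedrin
print(best_p(101))    # creeping toward 4/5 as the court grows
# The near-step-function for a large court: almost-sure conviction for
# p above 1/2, almost-sure acquittal below it.
print(conviction_prob(0.6, 501), conviction_prob(0.4, 501))
```

Note how flat the objective is near the top: for a large court, essentially any p comfortably above 1/2 does nearly as well as the true optimum.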

My challenge to you is to come up with some justification for the value 80%. Good luck!

Some Curious Group Isomorphisms

Curiosity Number One

The additive group of polynomials with integer coefficients is isomorphic to the multiplicative group of positive rational numbers: (ℤ[x], +) ≅ (ℚ⁺, ×).

Any element of ℤ[x] can be written as a₀ + a₁x + a₂x² + … + aₙxⁿ, where all coefficients a₀, a₁, …, aₙ are integers. Consider the following mapping: φ(a₀ + a₁x + a₂x² + … + aₙxⁿ) = p₀^a₀ · p₁^a₁ · … · pₙ^aₙ, where pₖ refers to the kth prime number. Now, this is a homomorphism from ℤ[x] to ℚ⁺ because φ(p(x) + q(x)) = φ(p(x))·φ(q(x)). You can also easily show that it is onto and one-to-one, using the uniqueness of prime factorization. Therefore φ is an isomorphism!
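We can watch the homomorphism property in action with exact rational arithmetic. A sketch (encoding polynomials as coefficient lists is my own convention):

```python
from fractions import Fraction

PRIMES = [2, 3, 5, 7, 11, 13]

def phi(coeffs):
    """Map a polynomial a0 + a1*x + ... (integer coefficients, given as
    [a0, a1, ...]) to p0^a0 * p1^a1 * ... as a positive rational."""
    out = Fraction(1)
    for p, a in zip(PRIMES, coeffs):
        out *= Fraction(p) ** a        # negative exponents land in ℚ⁺ too
    return out

def add(p, q):
    """Add two polynomials given as coefficient lists."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

p = [2, -1, 3]   # 2 - x + 3x^2
q = [0, 4, -3]   # 4x - 3x^2
# The homomorphism property: addition upstairs becomes multiplication downstairs.
print(phi(p) * phi(q) == phi(add(p, q)))  # True
```

Negative coefficients are what make the target ℚ⁺ rather than just the positive integers: φ([0, −1]) = 1/3, for instance.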

I think this is a good example to test the intuitive notion that once one “knows” a group, one also knows all groups that it is isomorphic to. It’s often useful to think of isomorphic groups as being like one book written in two different languages, with the isomorphism being the translation dictionary. But the non-obviousness of the isomorphism here makes it feel like the two groups in fact have a considerably different internal structure.

Curiosity Number Two

Your friend comes up to you and tells you, “I’m thinking of some finite group. All I’ll tell you is that it has at least two order-2 elements. I’ll give you $20 if you can tell me what the group that’s generated by these two elements is (up to isomorphism)!”

What do you think your chances are of getting this $20 reward? It seems intuitively like your chances are probably pretty dismal… there are probably hundreds or maybe even an infinity of groups that match her description.

But it turns out that you can say exactly what group your friend is thinking of, just from this information!

Call the group G. G is finite and has two order-2 elements, which we’ll call a and b. So we want to find out what <a, b> is. Define r = ab and n = |r|. Then ara = a(ab)a = (aa)ba = ba = (ab)⁻¹ = r⁻¹ = rⁿ⁻¹. So ara = rⁿ⁻¹, or in other words ar = rⁿ⁻¹a. Also, since b = ar, we can write the group we’re looking for either as <a, b> or as <a, r> (these two are equal).

Thus the group we’re looking for can be described as follows:

<a, r | a² = rⁿ = e, ar = rⁿ⁻¹a>.

Look familiar? This is just the dihedral group of size 2n; it is the group of symmetries of a regular n-gon, where a is a flip and r is a rotation!

So if G is a finite group with at least two order-2 elements a and b, then <a, b> ≅ Dih_|ab|.

This serves to explain the ubiquity and importance of the dihedral groups! Any time a group contains more than one order-two element, it must have a dihedral group of some order as a subgroup.
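You can watch this theorem in action computationally: pick any two order-2 permutations, generate the group they span by brute-force closure, and compare its size with 2|ab|. A sketch, with permutations written as tuples:

```python
def compose(p, q):
    """Compose permutations written as tuples: (p following q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def order(p):
    identity = tuple(range(len(p)))
    q, n = p, 1
    while q != identity:
        q, n = compose(q, p), n + 1
    return n

def generated(gens):
    """Brute-force closure: the subgroup generated by gens."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for h in gens:
            k = compose(g, h)
            if k not in group:
                group.add(k)
                frontier.append(k)
    return group

# Two order-2 elements of S_5: the swaps (0 1) and (1 2)
a = (1, 0, 2, 3, 4)
b = (0, 2, 1, 3, 4)
assert order(a) == order(b) == 2

r = compose(a, b)              # r = ab, here a 3-cycle
G = generated([a, b])
assert len(G) == 2 * order(r)  # |<a, b>| = 2|ab|, the dihedral group
```

Swapping in any other pair of order-2 permutations changes |ab| but never breaks the final assertion, which is exactly the content of the theorem.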

Curiosity Number Three

If K ⊴ G, H ⊴ G, and K ≤ H, then H/K ⊴ G/K and (G/K) / (H/K) ≅ G/H.

(G/K) / (H/K) is a pretty wild object; it is a group of cosets, whose representatives are themselves cosets. But this theorem allows us to treat it as a much less exotic object, an ordinary quotient group with representatives as elements from G.

I don’t have much else to say about this one, besides that I love how the operation of forming a quotient group behaves so much like ordinary division!

Group Theory Toolkit

The Toolkit

  1. The Group Axioms
    1. Closure: ∀ a, b ∈ G, a•b ∈ G
    2. Associativity: ∀ a, b, c ∈ G, (a•b)•c = a•(b•c)
    3. Identity: ∃ e ∈ G s.t. ∀ a ∈ G, e•a = a•e = a
    4. Inverses: ∀ a ∈ G, ∃ a’ ∈ G s.t. a’•a = e
  2. Cancellation Rule
    (a•b = a•c) ⇒ (b = c)
  3. Order Doesn’t Matter
    You can write a Cayley table with the elements in any order that’s convenient.
  4. Sudoku Principle
    Each row and column in the Cayley table for a group G contains every element in G exactly once.
  5. Lagrange’s Theorem
    H ≤ G ⇒ |G| is divisible by |H|
  6. Elements Generate Subgroups
    ∀ g ∈ G, <g> = {gᵏ for all integer k} ≤ G.
    (|<g>| = n) ⇒ (gⁿ = e).
  7. Cauchy’s Theorem
    (|G| = pk for prime p and integer k) ⇒ (∃ g ∈ G, g ≠ e, s.t. gᵖ = e)
  8. Classification of Finite Abelian Groups
    Every finite abelian group is isomorphic to Zₙ or to a direct product of groups Zₙ
  9. Orbit-Stabilizer Theorem
    |orb(x)| |stab(x)| = |G|
  10. Partition Equation
    ∑|orb(x)| = |X|. The sum is taken over one representative for each orbit.
  11. Center is a Normal Subgroup
    Z(G), the center of G, is a normal subgroup. So G/Z is a group.
  12. Class Equation
    |G| = |Z| + ∑|Cl(x)|, where all terms divide |G| and each |Cl(x)| is neither 1 nor |G|. The sum is taken over non-central representatives of conjugacy classes.
  13. G/Z is Cyclic Implies G is Abelian
    (G/Z is cyclic) ⇒ (G is abelian)
  14. Size of Product of Subgroups
    |AB| = |A| |B| / |A ∩ B|
  15. Sylow’s First Theorem
    If |G| = pⁿk for prime p and k not divisible by p, then G has a subgroup of size pⁿ.
  16. Sylow’s Third Theorem
    If |G| = pⁿk for prime p and k not divisible by p, then the number of subgroups of size pⁿ must be 1 mod p, and must divide k.
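Several of these items can be checked mechanically for any concrete Cayley table. Here's a minimal sketch of such a checker, representing a table as a list of lists over elements 0…n−1, with 0 assumed to be the identity:

```python
def is_group_table(t):
    """Check the group axioms for an n x n Cayley table t (t[i][j] = i•j),
    where element 0 is assumed to be the identity."""
    n = len(t)
    elems = range(n)
    closure = all(t[i][j] in elems for i in elems for j in elems)
    assoc = all(t[t[i][j]][k] == t[i][t[j][k]]
                for i in elems for j in elems for k in elems)
    identity = all(t[0][i] == i and t[i][0] == i for i in elems)
    inverses = all(any(t[i][j] == 0 for j in elems) for i in elems)
    return closure and assoc and identity and inverses

def sudoku_property(t):
    """Each row and column contains every element exactly once."""
    n = len(t)
    full = set(range(n))
    return (all(set(row) == full for row in t)
            and all({t[i][j] for i in range(n)} == full for j in range(n)))

# The size-3 table derived in Example 1 below (e=0, a=1, b=2):
z3 = [[0, 1, 2],
      [1, 2, 0],
      [2, 0, 1]]
assert is_group_table(z3) and sudoku_property(z3)
```

The Sudoku Principle is a theorem, not an axiom: any table passing is_group_table will automatically pass sudoku_property, which is a nice thing to test against random tables.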


Example 1: Solving the group of size 3

Suppose |G| = 3. By the Identity Axiom (1.3), G has an identity element e. Call the other two elements a and b. So G = {e, a, b}. Let’s write out the Cayley table for the group:

e a b
e e a b
a a – –
b b – –

By the Cancellation Rule (2), a•a ≠ a. By Lagrange’s Theorem (5), a•a ≠ e, as then the set {e, a} would be a subgroup of size 2, which doesn’t divide 3. Therefore by Closure (1.1), a•a = b.

e a b
e e a b
a a b –
b b – –

We can fill out the rest of the Cayley table by the Sudoku principle (4).

e a b
e e a b
a a b e
b b e a

Example 2: Solving the group of size 5

e a b c d
e e a b c d
a a – – – –
b b – – – –
c c – – – –
d d – – – –

If a•a = e, then {e,a} would be a subgroup of size 2. By Lagrange’s theorem, then, a•a ≠ e. Also, by the Sudoku principle, a•a ≠ a. Thus we can choose the order of the rows/columns such that a•a = b (3).

e a b c d
e e a b c d
a a b – – –
b b – – – –
c c – – – –
d d – – – –

Now look at a•b = a•(a•a) = a³. If a³ = e, then <a> would be a subgroup of size 3 (by 6). 3 doesn’t divide 5, so a•b ≠ e (Lagrange’s theorem again). By the Sudoku principle, a•b also isn’t a or b. So we can arrange the Cayley table so that a•b = c. Since a•b = a³ = b•a, b•a = c as well.

e a b c d
e e a b c d
a a b c – –
b b c – – –
c c – – – –
d d – – – –

Now we can use the Sudoku principle to fill out the rest!

e a b c d
e e a b c d
a a b c d e
b b c d e a
c c d e a b
d d e a b c

Example 3: Solving all groups of prime order

Suppose |G| = p, where p is some prime number. By Lagrange’s theorem every subgroup of G has either size 1 (the trivial subgroup {e}) or size p (G itself). Since the set generated by any element is a subgroup, any element g in G must have the property that |<g>| = 1 or p.

Choose g to be a non-identity element. <g> includes e and g, so |<g>| ≥ 2. So |<g>| = p. So <g> = G. So G is a cyclic group (generated by one of its elements). This fully specifies every entry in the Cayley table (gⁿ•gᵐ = gⁿ⁺ᵐ), so there is only one group of size p up to isomorphism.

Example 4: Solving size 6 groups

Suppose |G| = 6.

By Cauchy’s theorem (7), there exists at least one element of order 2 and at least one element of order 3. Name the order-2 element a and the order-3 element b. Fill in the Cayley table accordingly:

e a b ? ? ?
e e a b – – –
a a e – – – –
b b – – – – –

Suppose b² = a. Then b² would have order 2. But b has order 3, so b² = b⁻¹ also has order 3. So b² ≠ a. Also, b² ≠ e, as that would contradict b having order 3, and b² ≠ b by the Cancellation Rule. So b² is something new, which we’ll put in our next column:

e a b b² ? ?
e e a b b² – –
a a e – – – –
b b – b² – – –
b² b² – – – – –

b³ = e, so we can fill out b•b² and b²•b, as well as b²•b² = b³•b = b:

e a b b² ? ?
e e a b b² – –
a a e – – – –
b b – b² e – –
b² b² – e b – –

By the Sudoku principle, we can clearly see that a•b is none of e, a, b, or b². So whatever it is, we’ll put it in the next column as “ab”:

e a b b² ab ?
e e a b b² ab –
a a e ab – – –
b b – b² e – –
b² b² – e b – –
ab ab – – – – –

Looking at a•b², we can again see that it is none of the elements we’ve identified so far. So whatever it is, we’ll put it in the next column as “ab²”.

e a b b² ab ab²
e e a b b² ab ab²
a a e ab ab² – –
b b – b² e – –
b² b² – e b – –
ab ab – – – – –
ab² ab² – – – – –

Associativity (1.2) allows us to use the algebraic rules a² = e and b³ = e to fill in a few more spots (like a•ab = a²b = b).

e a b b² ab ab²
e e a b b² ab ab²
a a e ab ab² b b²
b b – b² e – –
b² b² – e b – –
ab ab – ab² a – –
ab² ab² – a ab – –

Now everything else in the table is determined by a single choice: should we assign b•a to ab, or to ab²? Each leads to a Cayley table consistent with the axioms, so we write them both out, using the Sudoku principle to finish each one entirely.

Group 1: (ba = ab)

e a b b² ab ab²
e e a b b² ab ab²
a a e ab ab² b b²
b b ab b² e ab² a
b² b² ab² e b a ab
ab ab b ab² a b² e
ab² ab² b² a ab e b

Group 2: (ba = ab2)

e a b b² ab ab²
e e a b b² ab ab²
a a e ab ab² b b²
b b ab² b² e a ab
b² b² ab e b ab² a
ab ab b² ab² a e b
ab² ab² b a ab b² e

These two groups are clearly not isomorphic, as one is commutative and the other is not. This proves that there are two groups of size 6 up to isomorphism! (Group 1 is the cyclic group Z₆; Group 2 is the dihedral group Dih₃, the symmetries of a triangle.)
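As a sanity check, the two finished tables can be transcribed and tested by machine, encoding the elements as e=0, a=1, b=2, b²=3, ab=4, ab²=5:

```python
def is_associative(t):
    n = len(t)
    return all(t[t[i][j]][k] == t[i][t[j][k]]
               for i in range(n) for j in range(n) for k in range(n))

def is_abelian(t):
    n = len(t)
    return all(t[i][j] == t[j][i] for i in range(n) for j in range(n))

# Elements encoded as e=0, a=1, b=2, b2=3, ab=4, ab2=5.
group1 = [[0, 1, 2, 3, 4, 5],   # ba = ab  (the commutative one)
          [1, 0, 4, 5, 2, 3],
          [2, 4, 3, 0, 5, 1],
          [3, 5, 0, 2, 1, 4],
          [4, 2, 5, 1, 3, 0],
          [5, 3, 1, 4, 0, 2]]
group2 = [[0, 1, 2, 3, 4, 5],   # ba = ab2 (the non-commutative one)
          [1, 0, 4, 5, 2, 3],
          [2, 5, 3, 0, 1, 4],
          [3, 4, 0, 2, 5, 1],
          [4, 3, 5, 1, 0, 2],
          [5, 2, 1, 4, 3, 0]]

assert is_associative(group1) and is_associative(group2)
assert is_abelian(group1) and not is_abelian(group2)
```

Both tables really are groups, and commutativity is exactly what separates them.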

Example 5: Solving groups of prime order, again

Suppose |G| = p, where p is some prime number. The center of G, Z(G), is a subgroup of G (by 11). So by Lagrange’s theorem, |Z| = 1 or p.

By the Class Equation (12), |G| = |Z| + ∑|Cl(x)|. All terms divide p and each |Cl(x)| ≠ 1, so we have p = |Z| + ∑p. This is only possible if |Z| = 0 or there are no non-central elements. Z contains e, so |Z| ≠ 0. So there are no non-central elements. Thus G = Z, which is to say G is abelian.

By the Classification of Finite Abelian Groups (8), G is isomorphic to Zp.

Example 6: Prime power groups have non-trivial centers

Suppose |G| = pⁿ, where p is some prime number and n is some positive integer. The Class Equation tells us that |G| = |Z| + ∑|Cl(x)|, where |Z| = 1, p, p², …, or pⁿ and each |Cl(x)| = p, p², …, or pⁿ⁻¹. Taking the Class Equation modulo p, we find that |Z| ≡ 0 (mod p), so |Z| cannot be 1. So G has a non-trivial center!

Example 7: Solving groups of prime-squared order

Suppose |G| = p², where p is some prime number. The Class Equation tells us that |G| = |Z| + ∑|Cl(x)|, where |Z| = 1, p, or p² and each |Cl(x)| = p. So we have p² = |Z| + ∑p. Taking this equation mod p, we find that |Z| ≡ 0 (mod p), so |Z| is non-trivial. (This is just a special case of Example 6 above.) So |Z| = p or p².

Suppose |Z| = p. Then |G/Z| = p²/p = p. So G/Z is cyclic. But then G is abelian (by 13). So G = Z, which implies that |Z| = p². Contradiction. So |Z| = p², i.e. G is abelian.

By the classification of finite abelian groups we see that there are two possibilities: G is isomorphic to the integers mod p², or G is isomorphic to Zₚ × Zₚ. These are the only two groups (up to isomorphism) of size p².
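The two possibilities really are different groups, which you can see by comparing element orders. A sketch for p = 3, using the fact that the order of k in Zₙ is n/gcd(n, k):

```python
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

p = 3

# Orders of elements in Z_{p^2}: the order of k in Z_n is n / gcd(n, k)
orders_cyclic = sorted(p * p // gcd(p * p, k) for k in range(p * p))

# Orders of elements in Z_p x Z_p: the lcm of the two component orders
orders_product = sorted(lcm(p // gcd(p, i), p // gcd(p, j))
                        for i in range(p) for j in range(p))

assert max(orders_cyclic) == p * p  # Z_9 has an element of order 9 (it's cyclic)
assert max(orders_product) == p     # Z_3 x Z_3 has none, so they aren't isomorphic
```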

Example 8: Proving Cauchy’s Theorem

Suppose |G| = n, where n is divisible by some prime p.

Let X be the set of p-tuples of elements from G whose product is e. That is, X = {(g₁, g₂, …, gₚ) ∈ Gᵖ s.t. g₁g₂…gₚ = e}. Notice that the number of elements in X is |G|ᵖ⁻¹ (there are |G| choices for each of the first p−1 elements, and then the final one is fully determined by the constraint). Let the integers mod p (Zₚ) act on X by the following rule: the action of 1 on an element of X is 1 • (g₁, g₂, …, gₚ) = (gₚ, g₁, g₂, …, gₚ₋₁). (From this we can deduce the action of all the other integers mod p.)

By the Orbit-Stabilizer Theorem (9), the possible orbit sizes for this action must divide the size of the cyclic group. That is, for every x ∈ X, |orb(x)| = 1 or p. Let r be the number of orbits of size 1 and s be the number of orbits of size p. Then by the Partition Equation (10), we have r + sp = |X| = |G|ᵖ⁻¹. By our starting assumption, the right-hand side is divisible by p. And since sp is divisible by p, r must also be divisible by p. That is, r = 0, or p, or 2p, and so on.

Notice that the p-tuple (e, e, …, e) is in X, and that its orbit size is 1. So r is at least 1. So r = p, or 2p, and so on. That means that there is at least one more element of X with orbit size 1. This element must look like (a, a, …, a) for some a ∈ G. And since it’s in X, it must satisfy our constraint, namely: aᵖ = e! So there is at least one non-identity element of order p.
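The counting argument can be verified directly for a small case, say G = S₃ and p = 3, with permutations written as tuples:

```python
from itertools import permutations, product

def compose(p, q):
    """Compose permutations written as tuples: (p following q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

G = list(permutations(range(3)))  # S_3, a group of size 6
e = (0, 1, 2)
p = 3                             # 3 divides |G| = 6

# X = all p-tuples of group elements whose product is the identity
X = [t for t in product(G, repeat=p)
     if compose(compose(t[0], t[1]), t[2]) == e]
assert len(X) == len(G) ** (p - 1)   # |X| = |G|^(p-1) = 36

# For p prime, the rotation action fixes exactly the constant tuples (a, a, a)
fixed = [t for t in X if t[0] == t[1] == t[2]]
assert len(fixed) % p == 0           # r is divisible by p, so r >= p > 1
```

Here the fixed tuples are (e, e, e) plus one triple for each of the two 3-cycles, giving r = 3, exactly the "at least one non-identity element of order p" the proof promises.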

Example 9: Subgroups of Group of Size 20

Suppose |G| = 20 = 2²·5.

By Cauchy’s Theorem and Sylow’s First Theorem (15), we know there must be subgroups of size 2, 4, and 5. We can learn more about the number of subgroups of sizes 4 and 5 using Sylow’s Third Theorem (16).

If N is the number of subgroups of size 4, then N divides 5 (so N is 1 or 5), and N ≡ 1 (mod 2). So N = 1 or 5.

If N is the number of subgroups of size 5, then N divides 4 (so N is 1, 2, or 4), and N ≡ 1 (mod 5). So N = 1.
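The arithmetic in Sylow's Third Theorem is easy to automate. A small sketch (the helper name sylow_counts is mine, not standard):

```python
def sylow_counts(n, p):
    """Candidate numbers of Sylow p-subgroups of a group of size n:
    divisors of the p-free part of n that are congruent to 1 mod p."""
    k = n
    while k % p == 0:
        k //= p
    return [d for d in range(1, k + 1) if k % d == 0 and d % p == 1]

assert sylow_counts(20, 2) == [1, 5]  # subgroups of size 4: one or five
assert sylow_counts(20, 5) == [1]     # subgroup of size 5: exactly one
```

Whenever the list collapses to [1], the Sylow subgroup is unique, which is the situation exploited in the next two examples.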

Example 10: Subgroups of Group of Size p²q

Suppose |G| = p²q, for primes p and q where 2 < p < q and q − 1 is not a multiple of p.

Then by (15) we have subgroups of size p² and q. Again, we use (16) to learn more about these subgroups.

If N is the number of subgroups of size p², then N divides q (so N is 1 or q), and N ≡ 1 (mod p). Since q ≢ 1 (mod p), N cannot be q. So N = 1.

If N is the number of subgroups of size q, then N divides p² (so N is 1, p, or p²), and N ≡ 1 (mod q) (so N is 1, or q+1, or 2q+1, and so on). But p is smaller than q, so N can’t be p. And if N = p², then p² − 1 must be a multiple of q. But p² − 1 = (p+1)(p−1). Both of these factors are too small to contain factors of q, so their product cannot be a multiple of q. So N ≠ p² either. So N = 1!

Example 11: Subgroups of Group of Size pqⁿ

Suppose |G| = pqⁿ, where p and q are primes and 2 < p < q.

Using Sylow III (16): if N is the number of subgroups of size qⁿ, then N divides p (so N is 1 or p) and N ≡ 1 (mod q). Since p < q, p is too small to be one more than a positive multiple of q, so N = 1.

Example 12: Classifying Groups of Size 2p

Suppose |G| = 2p for some prime p > 2. Let’s use everything in our toolkit to fully classify the possibilities!

By Cauchy, we have elements of order 2 and p. Let a be an element of order 2 and b an element of order p. Let A = <a> = {e, a} and B = <b> = {e, b, b², …, bᵖ⁻¹}. A and B are both cyclic of different prime orders, so their intersection is trivial. So by (14), |AB| = |A| |B| / |A ∩ B| = 2p. So AB = G. This means we can write all the elements of G as follows:

e b b² … bᵖ⁻¹
e e b b² … bᵖ⁻¹
a a ab ab² … abᵖ⁻¹

So G = {e, b, b², …, bᵖ⁻¹, a, ab, ab², …, abᵖ⁻¹}. Call N₂ the number of subgroups of size 2 and Nₚ the number of subgroups of size p. Applying Sylow III, it is easy to see that there are two possibilities: (N₂ = 1 and Nₚ = 1), or (N₂ = p and Nₚ = 1).

Case 1: N₂ = p and Nₚ = 1.

Since N₂ = p and Nₚ = 1, there are p elements of order 2 (one for each subgroup) and p−1 elements of order p. Adding in e, this accounts for all of G.

Order 1: e
Order p: b, b², …, bᵖ⁻¹
Order 2: a, ab, ab², …, abᵖ⁻¹ (the remaining members of G)

Now, since a and b are in G, so must be ba. And since G = AB, ba ∈ AB. It’s not e or a power of b (apply the Cancellation Rule), so ba must be abⁿ for some n. This means that ba has order 2, so (ba)² = e. Expanding and substituting, we get (ba)² = (ba)(ba) = (ba)(abⁿ) = ba²bⁿ = bⁿ⁺¹ = e = bᵖ.

So n = p − 1, or in other words, ba = abᵖ⁻¹. This actually allows us to fully fill out the rest of the Cayley table, as we can take any expression and move the ‘b’s all the way to the right, ending up with something that looks like aⁿbᵐ. This is the dihedral group of degree p (the group of symmetries of a regular p-gon)!

Case 2: N₂ = 1 and Nₚ = 1.

Order 1: e
Order p: b, b², …, bᵖ⁻¹
Order 2: a

No other elements can be of order 1, 2, or p, so all other elements must be of order 2p (think Lagrange’s theorem). For instance, ab has order 2p: (ab)²ᵖ = e and (ab)ᵖ ≠ e. You can show that this only works if ab = ba, which allows you to freely move ‘a’s and ‘b’s around in any expression. (Once a and b commute, (ab)²ᵖ = a²ᵖb²ᵖ = (a²)ᵖ(bᵖ)² = e, and (ab)ᵖ = aᵖbᵖ = aᵖ = a ≠ e.)

Again, we can use this to fill out the entire Cayley table. The group we get is isomorphic to Z2p (the integers mod 2p)!

So if |G| = 2p for some prime p > 2, then G is either the group of symmetries of a regular p-gon, or it’s the integers mod 2p. Fantastic! If you follow this whole line of reasoning, then I think that you have a good grasp of how to use the toolkit above.
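As a check on the classification, the two candidate groups really do realize the two cases: the integers mod 2p have a unique order-2 element (N₂ = 1), while the dihedral group has p of them (N₂ = p). A sketch for p = 5:

```python
from math import gcd

def count_order2_in_z(n):
    """Number of order-2 elements of Z_n (the order of k in Z_n is n/gcd(n, k))."""
    return sum(1 for k in range(n) if n // gcd(n, k) == 2)

def count_order2_in_dih(p):
    """Number of order-2 elements of Dih_p: every one of the p reflections,
    plus any rotation of order 2 (there are none when p is odd)."""
    rotations = sum(1 for i in range(1, p) if p // gcd(p, i) == 2)
    return rotations + p

p = 5
assert count_order2_in_z(2 * p) == 1   # Case 2: N2 = 1, the cyclic group
assert count_order2_in_dih(p) == p     # Case 1: N2 = p, the dihedral group
```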

Producing all numbers using just four fours

How many of the natural numbers do you think you can produce just by combining four 4s with standard mathematical operations?

The answer might blow your mind a little bit. It’s all of them!

In particular, you can get any natural number using only the symbols ‘4’, ‘√’, ‘log’, and ‘/’. Better yet, you can do it with just four 4s!

Try it for yourself before moving on!



Continue reading “Producing all numbers using just four fours”