Put down three points on a piece of paper. Choose one of them as your “starting point”. Now, randomly choose one of the three points and hop from your starting point, halfway over to the chosen point. Mark down where you’ve landed. Then repeat: randomly choose one of the three starting points, and move halfway from your newly marked point to this new chosen point. Mark where you land. And on, and on, to infinity.

What pattern will arise? Watch and see!

Controls:
E to increase points/second.
Q to decrease points/second.
Click and drag the red points to move them around.
Pressing a number key will make a polygon with that number of sides.

Here’s a natural follow-up to my last post on the Mandelbrot set – an interactive Julia set explorer!

The Julia set corresponding to a particular point c = x + iy in the complex plane is defined as the set of complex numbers z that stay finite upon arbitrary iterations of the following function: f_{c}(z) = z^{2} + c. The Mandelbrot set, by comparison, is defined as the set of complex numbers c such that the value obtained by starting with 0 and iterating the function f_{c} arbitrarily many times converges.

What’s remarkable is now beautiful and complex the patterns that arise from this simple equation are. Take a look for yourself: just hover over a point to see its corresponding Julia set!

Resolution is preset at a value good for seeing lots of details and loading at a reasonable speed, but should you want to change it, controls are ‘E’ to increase it and ‘Q’ to decrease it. To reset to default, press ‘SPACE’.

Let me start this off by saying that if you’re reading this blog and haven’t ever checked out the Youtube channel Numberphile, you need to go there right away and start bingeing their videos. That’s what I’ve been doing for the last few days, and it’s given me tons of cool new puzzles to consider.

Here’s one:

Naturally, after watching this video I wanted to try this out for myself. Here you see the pattern arising beautifully from the randomness:

I urge you to think hard about why the Sierpinski triangle would arise from something as simple as randomly hopping between midpoints. It’s very non-obvious, and although I have a few ideas, I’m still missing a clear intuition.

I also made some visualizations for other shapes. I’ll show some of them, but encourage you to make predictions about what pattern you’d expect to see before scrolling down to see the actual result.

First:

Square

Instead of three points arranged as above, we will start out with four points arranged in a perfect square. Then, as before, we’ll jump from our starting point halfway to one of these four, and will continue this procedure ad infinitum.

What pattern will arise? Do you think that we’ll have “missing regions” where no points can land, like with the triangle?

Scroll down to see the answer…

(…)

(…)

(…)

Okay! So it looks like the whole square gets filled out, with no missing regions. This was pretty surprising to me; given that three points gave rise to a intricate fractal pattern, why wouldn’t four points do the same? What’s special about “3” .

Well, perhaps things will be different if we tweak the positions of the corners slightly? Will any quadrilateral have the same behavior of filling out all the points, or will the blank regions re-arise? Again, make a prediction!

Quadrilaterals

Let’s see:

Okay, now we see that apparently the square was actually a very special case! Pretty much any quadrilateral we can construct will give us a nested infinity of blank regions, as long as at least one angle is not equal to 90º. Again, this is fascinating and puzzling to me. Why do 90º angles invariably cause the whole region to fill out? I’m not sure.

Let’s move on to a pentagon! Do you think that a regular pentagon will behave more like a triangle or a square?

Pentagon

Take a look…

And naturally, the next question is what about a hexagon?

Hexagon

Notice the difference between the hexagon and all the previous ones! Rather than having small areas of points that are never reached, it appears that suddenly we get lines! Again, I encourage you to try to think about why this might be (what’s so special about 6?) and leave a comment if you have any ideas.

Now, I because curious about what other types of patterns we can generate with simple rules like these. I wondered what would happen if instead of simply jumping to the average of the current point and a randomly chosen point, we built a pattern with some “memory”. For instance, what if we didn’t just look at the current point and the randomly chosen point, but also at the last chosen point? We could then take the middle of the triangle formed by these three points as our new point.

It turns out that the patterns that arise from this are even more beautiful than the previous ones! (In my opinion, of course)

Take a look:

Triangle

Square

Pentagon

Hexagon

Heptagon

I’ll stop here, but this is a great example of how beautiful and surprising math can be. I would have never guessed that such intricate fractal patterns would arise from such simple random rules.

Consider two players: Alice and Bob. Alice moves first. At the start of the game, Alice has two piles of coins in front of her: one pile contains 4 coins and the other pile contains 1 coin. Each player has two moves available: either “take” the larger pile of coins and give the smaller pile to the other player or “push” both piles across the table to the other player. Each time the piles of coins pass across the table, the quantity of coins in each pile doubles. For example, assume that Alice chooses to “push” the piles on her first move, handing the piles of 1 and 4 coins over to Bob, doubling them to 2 and 8. Bob could now use his first move to either “take” the pile of 8 coins and give 2 coins to Alice, or he can “push” the two piles back across the table again to Alice, again increasing the size of the piles to 4 and 16 coins. The game continues for a fixed number of rounds or until a player decides to end the game by pocketing a pile of coins.

(Assume that if the game gets to the final round and the last player decides to “push”, the pot is doubled and they get the smaller pile.)

Assuming that they are self-interested, what do you think is the rational strategy for each of Alice and Bob to adopt? What is the rational strategy if they each know that the other reasons about decision-making in the same way that they themselves do? And what happens if two updateless decision theorists are pitted against each other?

If you have some prior familiarity with game theory, you might have seen the backwards induction proof right away. It turns out that standard game theory teaches us that the Nash equilibrium is to defect as soon as you can, thus never exploiting the “doubling” feature of the setup.

Why? Supposing that you have made it to the final round of the game, you stand to get a larger payout by “defecting” and taking the larger pile rather than the doubled smaller pile. But your opponent knows that you’ll reason this way, so they reason that they are better off defecting the round before… and so on all the way to the first round.

This sucks. The game ends right away, and none of that exponential goodness gets taken advantage of. If only Alice and Bob weren’t so rational!

We can show that this conclusion follows as long as the three things are true of Alice and Bob:

Given a choice between a definite value A and a smaller value B, both Alice and Bob will choose the larger value (A).

Both Alice and Bob can accurately perform deductive reasoning.

Both (1.) and (2.) are common knowledge to Alice and Bob.

It’s pretty hard to deny the reasonableness of any of these three assumptions!

Here’s a related problem:

An airline loses two suitcases belonging to two different travelers. Both suitcases happen to be identical and contain identical antiques. An airline manager tasked to settle the claims of both travelers explains that the airline is liable for a maximum of $100 per suitcase—he is unable to find out directly the price of the antiques.

To determine an honest appraised value of the antiques, the manager separates both travelers so they can’t confer, and asks them to write down the amount of their value at no less than $2 and no larger than $100. He also tells them that if both write down the same number, he will treat that number as the true dollar value of both suitcases and reimburse both travelers that amount. However, if one writes down a smaller number than the other, this smaller number will be taken as the true dollar value, and both travelers will receive that amount along with a bonus/malus: $2 extra will be paid to the traveler who wrote down the lower value and a $2 deduction will be taken from the person who wrote down the higher amount. The challenge is: what strategy should both travelers follow to decide the value they should write down?

Suppose you put no value on honesty, and only care about getting the most money possible. Further, suppose that both travelers reason the same way about decision problems, and that they both know this fact (and that they both know that they both know this fact, and so on).

The first intuition you might have is that both should just write down $100. But if you know that your partner is going to write down $100, then you stand to gain one whole dollar by defecting and writing $99 (thus collecting the $2 bonus for a total of $101). But if they know that you’re going to write $99, then they stand to gain one whole dollar by defecting and writing $98 (thus netting $100). And so on.

In the end both of these unfortunate “rational” individuals end up writing down $2. Once again, we see the tragedy of being a rational individual.

Of course, we could take these thought experiments to be an indication not of the inherent tragedy of rationality, but instead of the need for a better theory of rationality.

For instance, you might have noticed that the arguments we used in both cases relied on a type of reasoning where each agent assumes that they can change their decision, holding fixed the decision of the other agent. This is not a valid move in general, as it assumes independence! It might very well be that the information about what decision you make is relevant to your knowledge about what the other agent’s decision will be. In fact, when we stipulated that you reason similarly to the other agent, we are in essence stipulating an evidential relationship between your decision and theirs! So the arguments we gave above need to be looked at more closely.

If the agents do end up taking into account their similarity, then their behavior is radically different. For example, we can look at the behavior of updateless decision theory: two UDTs playing each other in the Centipede game “push” every single round (including the final one!), thus ending up with exponentially higher rewards (on the order of $2^{N}, where N is the number of rounds). And two UDTs in the Traveller’s Dilemma would write down $100, thus both ending up roughly $98 better off than otherwise. So perhaps we aren’t doomed to a gloomy view of rationality as a burden eternally holding us back!

One final problem.

Two players, this time with just one pile of coins in front of them. Initially this pile contains just 1 coin. The players take turns, and each turn they can either take the whole pile or push it to the other side, in which case the size of the pile will double. This will continue for a fixed number of rounds or until a player ends the game by taking the pile.

On the final round, the last player has a choice of either taking all the coins or pushing them over, thus giving the entire doubled pile to their opponent. Both players are perfectly self-interested, and this fact is common knowledge. And finally, suppose that who goes first is determined by a coin flip.

Standard decision theory obviously says that the first person should just take the 1 coin and the game ends there. What would UDT do here? What do you think is the rational policy for each player?

To be totally clear: the question is not assuming that there is ONLY one student whose neighbors both flipped heads, just that there is AT LEAST one such student. You can imagine that the teacher first asks for all students whose neighbors both flipped heads to step forward, then randomly selected one of the students that had stepped forward.

Now, take a minute to think about this before reading on…

It seemed initially obvious to me that the teacher was correct. There are exactly as many possible worlds in which the three students are HTH as there worlds in which they are HHH, right? Knowing how your neighbors’ coins landed shouldn’t give you any information about how your own coin landed, and to think otherwise seems akin to the Gambler’s fallacy.

But in fact, the teacher is wrong! It is in fact more likely that the student flipped tails than heads! Why? Let’s simplify the problem.

Suppose there are just three students standing in a circle (/triangle). There are eight possible ways that their coins might have landed, namely:

HHH
HHT
HTH
HTT
THH
THT
TTH
TTT

Now, the teacher asks all those students whose neighbors both have “H” to step forward, and AT LEAST ONE steps forward. What does this tell us about the possible world we’re in? Well, it rules out all of the worlds in which no student could be surrounded by both ‘H’, namely… TTT, TTH, THT, and HTT. We’re left with the following…

HHH
HHT
HTH
THH

One thing to notice is that we’re left with mostly worlds with lots of heads. The expected total of heads is 2.25, while the expected total of tails is just 0.75. So maybe we should expect that the student is actually more likely to have heads than tails!

But this is wrong. What we want to see is what proportion of those surrounded by heads are heads in each possible world.

HHH: 3/3 have H (100%)
HHT: 0/1 have H (0%)
HTH: 0/1 have H (0%)
THH: 0/1 have H (0%)

Since each of these worlds is equally likely, what we end up with is a 25% chance of 100% heads, and a 75% chance of 0% heads. In other words, our credence in the student having heads should be just 25%!

Now, what about for N students? I wrote a program that does a brute-force calculation of the final answer for any N, and here’s what you get:

N

cr(heads)

~

3

1/4

0.25

4

3/7

0.4286

5

4/9

0.4444

6

13/32

0.4063

7

1213/2970

0.4084

8

6479/15260

0.4209

9

10763/25284

0.4246

10

998993/2329740

0.4257

11

24461/56580

0.4323

12

11567641/26580015

0.4352

13

1122812/2564595

0.4378

14

20767139/47153106

0.4404

15

114861079/259324065

0.4430

16

2557308958/5743282545

0.4453

17

70667521/157922688

0.4475

These numbers are not very pretty, though they appear to be gradually converging (I’d guess to 50%).

Can anybody see any patterns here? Or some simple intuitive way to arrive at these numbers?

Read this post and give it a try yourself before reading on! Spoilers ahead.

I’ve written up the way that I solved the puzzle, step by step:

1. A looks at B and C and says “I don’t know what my number is.”

We assess what information this gives us by considering in which scenario A would have known what their number was, and ruling it out.

Well, for starters, if A saw two different numbers, then A would consider there to be two logically possible worlds, one in which they are the sum of the two numbers, and another in which they are one of the two that are added together. But if A saw the SAME two numbers, they A would know that they must be the sum of those two (since zero is not a possible value for themselves). What this means is that the fact that A doesn’t know their number tells us with certainty that B ≠ C! Furthermore, since B and C will go through this same line of reasoning themselves, they will also know that B ≠ C. And since all of them know that B and C will go through the same line of reasoning, it becomes common knowledge that B ≠ C.

Good! So after A’s statement, we have added one piece of common knowledge, namely that B ≠ C.

2. B thinks a moment and then says “Me neither.”

Ok, one thing this tells us is that A ≠ C, for the exact same reason as before.

But it also tells us more than this, because B knows that B ≠ C and still doesn’t know. So we just need to think of the scenario in which knowing that B ≠ C (and knowing the values of A and C, of course) would tell B the value that they have. Try to figure it out for yourselves!

The answer is, the scenario is that A = 2C! Imagine that B sees that A is 10 and C is 5. This tells them that they are either 5 or 15. But they know they can’t be 5, because C is five and B ≠ C. So they must be 15! In other words, since they would know their value if A equaled 2C and they don’t know their value, this tells us that A ≠ 2C!

So now we have two more pieces of common knowledge: A ≠ C, and A ≠ 2C. Putting this together with what we knew before, we have a total of three pieces of information:

B ≠ C
A ≠ C
A ≠ 2C

3. C thinks a moment and then says “Me neither.”

By exact analogy with the previous arguments, this tells us that A ≠ B, as well as that A ≠ 2B and B ≠ 2A. (We saw previously that B could conclude from B ≠ C that A ≠ 2C. By the same arguments, C can conclude from C ≠ A that B ≠ 2A. And from C ≠ B, C can conclude that A ≠ 2B.)

There’s one more piece of information that we haven’t taken into account, which is that A ≠ 2C. In which situation does the fact that A ≠ 2C tell C their value? Well, if A = 10 and B = 15, then C is either 5 or 20. But C can’t be 5, because then A would be twice C. So C could conclude that they are 20. Since C doesn’t conclude this, we know that 3A ≠ 2B.

Putting it all together, we know the following:

B ≠ C
A ≠ C
A ≠ 2C
A ≠ 2B
B ≠ 2A
3A ≠ 2B

4. A thinks, and says: “Now I know! My number is 25.”

The question we need to ask ourselves is what the values of B and C must be in order that the above six conditions to allow A to figure out their own value. Pause to think about this before reading on…

…

…

Let’s work through how A processes each piece of information:

A could figure out their own value by seeing B = C. But we already know that this isn’t the case.

Since A knows that A ≠ C, A could figure out their value by seeing B = 2C. So that’s a possibility… Except that if B = 2C, then A = B + 2C = 3C. But 25 is not divisible by 3. So this can’t be what they saw.

Since A knows that A ≠ B, A could figure out their value by seeing C = 2B. But again, this doesn’t work, since it would imply that A was divisible by 3, which it is not.

Since A knows that A ≠ 2C, A could figure out their value by seeing B = 3C (e.g. B = 15, C = 5). They would rule out themselves being one component of the sum, and conclude that they are 4C. But 25 is not divisible by 4. So this is not the case.

Since A knows that A ≠ 2B, A could figure out their value by seeing C = 3B (e.g. B = 5, C = 15). By the same reasoning as before, this cannot be the case.

Since B ≠ 2A, A could figure out their value by seeing 3B = 2C (e.g. B = 10, C = 15). They would know that they cannot be just one component of the sum, so they would conclude that they must be B + C, or 2.5 B. Now, is there an integer B such that 25 = 2.5 B? You betcha! B = 10, and C = 15!

We can stop here, since we’ve found a logically consistent world in which A figures out that their own value is 25. Since there can only be one such world (as the problem statement implies that this information is enough to solve the puzzle), we know that this must be what they saw. So we’ve found the answer! (A,B,C) = (25,10,15). But if you’re curious, I’d suggest you go through the rest of the cases and show that no other values of B and C would be consistent with them knowing that their own value is 25.

One thing that’s interesting here is the big role that the number 25 played in this. The fact that 25 was not divisible by 3 but was divisible by 2.5, was crucial. For the same puzzle but a different value that 25, we would have come to a totally different answer!

My challenge to anybody that’s made it this far: Consider the set of all integers that A could have truthfully declared that they knew themselves to be. For some such integers, it won’t be the case that A’s declaration is sufficient for us to conclude what B and C are. Which integers are these?

Common knowledge puzzles are my favorite. Here’s one I just came across. I challenge you to try to figure it out in less than 5 minutes. 🙂

Three perfect logicians with positive (non-zero) integers taped to their foreheads, A, B, and C, sit in a triangle. Each doesn’t know their own number but can see the numbers for the other two. It is common knowledge amongst all three that one of the numbers is the sum of the other two, but it is not known which is which.

A looks at B and C, and says “I don’t know what my number is.”

B thinks a moment and then says “Me neither.”

C thinks a moment and then says “Me neither.”

A thinks, then says “Now I know! My number is 25.”

This puzzle has been popping up on my Facebook feed recently. Try to see how high you can get before reading on!

(…)

Now, would you be surprised if I told you that with a little interpretive freedom and creativity, you can get numbers larger than the number of atoms in the observable universe? How about if I told you that you can get to numbers large enough that they break set theory? Let me demonstrate for you.

First, we’ll start with the boring solutions.

You can move the bottom match in the 5 up to make it a 9. Then you can rotate the bottom left vertical match in the 0 to make it a nine. This gives 998.

Can we do better? Sure! Take the top and bottom matches in the central zero and move them to squeeze in another one, giving you 51,118.

So far we’ve been working purely in base 10. Let’s try to do better by moving to a more exotic number system.

Hexadecimal!

Hexadecimal is a base-16 number system, the digits of which are written as 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F. For instance, we can construct F88:

F88 is only 3976, which is not better than our best so far (51,118). If we’re allowed to interpret a square of matches as a D instead of a zero, we can do slightly better (though to do so we have to be allowed to throw out a toothpick, or place it directly on top of another one):

FDB is 4059, which is still not better than our current best. To get better, we need to get more clever.

Exponentiation!

Our first strategy is just to shift two matches in the first digit down so as to make it into a 9:

We can interpret this as 9^{8} = 43,046,721. This is our best so far! But we can do better by applying Knuth’s up-arrow notation.

This is 5^{18}, which is almost 3.8 trillion, 100 thousand times better than 9^{8}! But we’re not done yet! If we use a caret (^) to represent exponentiation, we can get even higher!

51^{18} is 5.4 nonillion, a number with 31 decimal digits long. We could try to do better than this by squeezing in a caret between the 5 and the 1, making 5^{1}^{18} (a number with 83 decimal digits) but this becomes pretty messy and gross.

Alright, so we got up to 83 digit long numbers. Can we get any better? Yep!

Tetration

Tetration is the level above exponentiation. The nth tetration of a is defined as follows:

Just as multiplication is repeated addition and exponentiation is repeated multiplication, tetration is repeated exponentiation! Now we get into numbers whose size is truly impossible to grasp. Let’s shift the top and bottom matches on the middle 0:

We can write this equivalently as:

How big is this number? There’s really no meaningful way to describe it. We’ve gone far beyond any quantities that can be made physical sense of, like the number of cubic nanometers in the universe, which is a measly 10^{107}. But we’re not done yet!

Busy Beavers!

The busy beaver numbers are a sequence of numbers that arise from the properties of Turing machines. Here’s a brief description: The nth busy beaver number B(N) is the greatest number of ones that a finitely-running N state Turing machine can print on a tape which starts all zero.

Did you think tetration is big? Well, busy beaver numbers are unimaginably larger. In fact, the busy beaver sequence grows larger than any computable function! There’s a neat proof of this that involves the uncomputability of the halting problem. To skim over it, it can be shown that were we to have an upper bound on the busy beaver sequence, then we could find a deterministic algorithm for solving the halting problem in a finite amount of time, which we know is impossible. And if any computable function F(N) grew faster than B(N), then we could find an upper bound on the busy beaver sequence. Thus, it must be the case that no computable function grows as fast as B(N)!

We can exploit the absurd growth rate of the busy beaver sequence if we are allowed the interpretative freedom of assuming parentheses, so that Bn = B(n).

Let’s think for a minute about how large B(118) must be. So far, the only values of n for which B(n) is known are B(1), B(2), B(3), and B(4). After this the values grow out of control. A lower bound on B(6) is . For B(7), we have as a lower bound . B(8) almost certainly beats our current record. And B(118) is unthinkably large.

We can get even higher than the busy beaver numbers with the maximum shifts function S(N), defined as the number of steps that the longest-finitely-running N state Turing machine takes before halting. This function is guaranteed to be larger than B(N) for all N. Using S(N), and taking the same liberties as above with respect to parentheses, we can get an insanely high value:

This is S(1118), and while it’s undoubtedly larger than B(118), there’s no way to get any intuitive grasp on how much larger. But wait, there’s more! We can get even larger by recursively nesting a busy beaver function within a maximum shifts function:

We interpret this as S(B(9)). Why is this larger than S(1118)? Well, B(9) is some enormous number, certainly larger than 1118, so S(B(9)) is certainly greater than S(1118).

Now, are we finally done? Have we reached the peak yet? No! It’s time for the largest solution of them all.

The reason that the Busy Beaver numbers and Maximum Shift function are so big is because of the uncomputability of the halting problem. But if we consider Turing machines that have an oracle for the halting problem (call these meta Turing machines), we get a new meta-halting problem: when do these meta Turing machines halt? From the meta-halting problem comes an associated new sequence of Busy Beaver numbers, which grows uncomputable faster than the original Busy Beaver sequence. Then we can equip Turing machines with an oracle for the meta-halting problem, generating a meta-meta-Busy Beaver sequence.

Thus we get a hierarchy of Busy Beaver functions, which, following the notation used by Scott Aaronson here, can be described with B_{n}(x). Each B_{n} grows uncomputably faster than the previous B_{n-1}. There’s a similar hierarchy for the maximum shifts function, and each S_n is going to be an upper bound on each S_{n-1}.

So we can exploit this hierarchy to create an unimaginably large number (whose actual value is almost certainly independent of the axioms of set theory): Move around the top and bottom matches on the 0 to give S a subscript of 11. Then we get the 11th-up-in-the-hierarchy maximum shifts function S_{1}_{1} applied to 118: S_{1}_{1}(118).

It’s a little gross-looking, but I think it works! I challenge anybody to try to come up with a better solution. 🙂

I’m in the mood for puzzles, so here’s another one. This one is so good that it deserves its own post.

The setup (from wiki):

The director of a prison offers 100 death row prisoners, who are numbered from 1 to 100, a last chance. A room contains a cupboard with 100 drawers. The director randomly puts one prisoner’s number in each closed drawer. The prisoners enter the room, one after another. Each prisoner may open and look into 50 drawers in any order. The drawers are closed again afterwards.

If, during this search, every prisoner finds his number in one of the drawers, all prisoners are pardoned. If just one prisoner does not find his number, all prisoners die. Before the first prisoner enters the room, the prisoners may discuss strategy—but may not communicate once the first prisoner enters to look in the drawers. What is the prisoners’ best strategy?

Suppose that each prisoner selects 50 at random, and don’t coordinate with one another. Then the chance that any particular prisoner gets their own number is 50%. This means that the chance that all 100 get their own number is 1/2¹⁰⁰.

Let me emphasize how crazily small this is. 1/2¹⁰⁰ is 1/1,267,650,600,228,229,401,496,703,205,376; less than one in a decillion. If there were 100 prisoners trying exactly this setup every millisecond, it would take them 40 billion billion years to get out alive once. This is 3 billion times longer than the age of the universe.

Okay, so that’s a bad strategy. Can we do better?

It’s hard to imagine how… While the prisoners can coordinate beforehand, they cannot share any information. So every time a prisoner comes in for their turn at the drawers, they are in exactly the same state of knowledge as if they hadn’t coordinated with the others.

Given this, how could we possibly increase the survival chance beyond 1/2¹⁰⁰?

(…)

(Try to answer for yourself before continuing)

(…)

Let’s consider a much simpler case. Imagine we have just two prisoners, two drawers, and each one can only open one of them. Now if both prisoners choose randomly, there’s only a 1 in 4 chance that they both survive.

What if they agree to open the same drawer? Then they have reduced their survival chance from 25% to 0%! Why? Because by choosing the same drawer, they either both get the number 1, or they both get the number 2. In either case, they are guaranteed that only one of them gets their own number.

So clearly the prisoners can decrease the survival probability by coordinating beforehand. Can they increase it?

Yes! Suppose that they agree to open different drawers. Then this doubles their survival chance from 25% to 50%. Either they both get their own number, or they both get the wrong number.

The key here is to minimize the overlap between the choices of the prisoners. Unfortunately, this sort of strategy doesn’t scale well. If we have four prisoners, each allowed to open two drawers, then random drawing gives a 1/16 survival chance.

Let’s say they open according to the following scheme: 12, 34, 13, 24 (first prisoner opens drawers 1 and 2, second opens 3 and 4, and so on). Then out of the 24 possible drawer layouts, the only layouts that work are 1432 and 3124:

Scaling this up to 100 prisoners, the odds of survival look pretty measly. Can they do better than this?

(…)

(Try to answer for yourself before continuing)

(…)

It turns out that yes, there is a strategy that does better at ensuring survival. In fact, it does so much better that the survival chance is over 30 percent!

Take a moment to boggle at this. Somehow we can leverage the dependency induced by the prisoners’ coordination to increase the chance of survival by a factor of one decillion, even though none of their states of knowledge are any different. It’s pretty shocking to me that this is possible.

Here’s the strategy: Each time a prisoner opens a drawer, they consult the number in that drawer to determine which drawer they will open next. Thus each prisoner only has to decide on the first drawer to open, and all the rest of the drawers follow from this. Importantly, the prisoner only knows the first drawer they’ll pick; the other 49 are determined by thedistribution of numbers in the drawers.

We can think about each drawer as starting a chain through the other drawers. These chains always cycle back into the starting number, the longest possible cycle being 100 numbers and the shortest being 1. Now, each prisoner can guarantee that they are in a cycle that contains their own number by choosing the drawer corresponding to their own number!

So, the strategy is that Prisoner N starts by choosing Drawer N, looking at the number within, then choosing the drawer labeled with that number. Repeat 50 times per each prisoner.

The wiki page has a good description of how to calculate the survival probability with this strategy:

The prison director’s assignment of prisoner numbers to drawers can mathematically be described as a permutation of the numbers 1 to 100. Such a permutation is a one-to-one mapping of the set of natural numbers from 1 to 100 to itself. A sequence of numbers which after repeated application of the permutation returns to the first number is called a cycle of the permutation. Every permutation can be decomposed into disjoint cycles, that is, cycles which have no common elements.

In the initial problem, the 100 prisoners are successful if the longest cycle of the permutation has a length of at most 50. Their survival probability is therefore equal to the probability that a random permutation of the numbers 1 to 100 contains no cycle of length greater than 50. This probability is determined in the following.

A permutation of the numbers 1 to 100 can contain at most one cycle of length . There are exactly ways to select the numbers of such a cycle. Within this cycle, these numbers can be arranged in ways since there are permutations to represent distinct cycles of length because of cyclic symmetry. The remaining numbers can be arranged in ways. Therefore, the number of permutations of the numbers 1 to 100 with a cycle of length is equal to

The probability, that a (uniformly distributed) random permutation contains no cycle of length greater than 50 is calculated with the formula for single events and the formula for complementary events thus given by

This is about 31.1828%.

This formula easily generalizes to other numbers of prisoners. We can plot the survival chance using this strategy as a function of the number of prisoners:

Amazingly, we can see that this value is asymptoting at a value greater than 30%. The precise limit is 1 – ln2 ≈ 30.685%.

In other words, no matter how many prisoners there are, we can always ensure that the survival probability is greater than 30%! This is pretty remarkable, and I think there are some deep lessons to be drawn from it, but I’m not sure what they are.