Put down three points on a piece of paper. Choose one of them as your “starting point”. Now, randomly choose one of the three points and hop from your starting point halfway over to the chosen point. Mark down where you’ve landed. Then repeat: randomly choose one of the three original points, and move halfway from your newly marked point to this new chosen point. Mark where you land. And on, and on, to infinity.
What pattern will arise? Watch and see!
E to increase points/second.
Q to decrease points/second.
Click and drag the red points to move them around.
Pressing a number key will make a polygon with that number of sides.
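The procedure above is easy to simulate. Here is a minimal Python sketch. The function names `regular_polygon` and `chaos_game` are my own and not taken from the interactive version. It starts at one of the vertices and repeatedly jumps toward a randomly chosen vertex, recording each landing point. The `ratio` parameter is an assumption added for experimentation; 0.5 corresponds to the “halfway” rule.

```python
import math
import random


def regular_polygon(n, radius=1.0):
    """Vertices of a regular n-gon centred on the origin, first vertex at the top."""
    return [
        (radius * math.cos(math.pi / 2 + 2 * math.pi * k / n),
         radius * math.sin(math.pi / 2 + 2 * math.pi * k / n))
        for k in range(n)
    ]


def chaos_game(vertices, steps, ratio=0.5, seed=None):
    """Repeatedly jump `ratio` of the way toward a randomly chosen vertex.

    Starts at the first vertex and returns every point landed on.
    """
    rng = random.Random(seed)
    x, y = vertices[0]
    points = []
    for _ in range(steps):
        vx, vy = rng.choice(vertices)  # pick one of the original points at random
        x += ratio * (vx - x)          # hop part of the way (half by default) toward it
        y += ratio * (vy - y)
        points.append((x, y))
    return points


triangle = regular_polygon(3)
points = chaos_game(triangle, 50_000, seed=1)
```

Scatter-plotting `points` with any plotting library shows the pattern that emerges. Passing `regular_polygon(4)`, `regular_polygon(5)` or `regular_polygon(6)` instead lets you repeat the square, pentagon and hexagon experiments described below.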
Here’s a natural follow-up to my last post on the Mandelbrot set – an interactive Julia set explorer!
The Julia set corresponding to a particular point $c = x + iy$ in the complex plane is defined as the set of complex numbers $z$ whose orbits stay bounded under arbitrarily many iterations of the function $f_c(z) = z^2 + c$. (Strictly speaking, this is the filled Julia set; the Julia set proper is its boundary.) The Mandelbrot set, by comparison, is defined as the set of complex numbers $c$ such that the sequence obtained by starting with 0 and iterating $f_c$ arbitrarily many times stays bounded.
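The membership tests behind pictures like these can be sketched with the standard escape-time method. The following is a minimal Python sketch, not the code behind the explorer itself. The function names are my own, and `max_iter` is an arbitrary cutoff, so a `True` result only means the orbit has not escaped yet. The test relies on the fact that once |z| exceeds max(2, |c|), the orbit is guaranteed to diverge.

```python
def stays_bounded(z, c, max_iter=200):
    """Escape-time test: iterate f_c(z) = z**2 + c and report whether the orbit
    stays within the escape radius for max_iter steps.

    Once |z| exceeds max(2, |c|) the orbit is guaranteed to blow up, so this
    radius is a safe cut-off. A True result only means "did not escape yet".
    """
    radius = max(2.0, abs(c))
    for _ in range(max_iter):
        if abs(z) > radius:
            return False
        z = z * z + c
    return True


def in_filled_julia(z, c, max_iter=200):
    """Is z (approximately) in the filled Julia set of c?"""
    return stays_bounded(z, c, max_iter)


def in_mandelbrot(c, max_iter=200):
    """Is c (approximately) in the Mandelbrot set? The orbit starts at 0."""
    return stays_bounded(0j, c, max_iter)
```

Coloring each pixel of a grid by whether `in_filled_julia` returns `True` for a fixed `c` gives a rough picture of that set. Note that `in_mandelbrot(-1)` returns `True` even though the orbit 0, -1, 0, -1, … never converges; it just stays bounded.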
What’s remarkable is how beautiful and complex the patterns that arise from this simple equation are. Take a look for yourself: just hover over a point to see its corresponding Julia set!
Resolution is preset at a value good for seeing lots of details and loading at a reasonable speed, but should you want to change it, controls are ‘E’ to increase it and ‘Q’ to decrease it. To reset to default, press ‘SPACE’.
Let me start this off by saying that if you’re reading this blog and haven’t ever checked out the YouTube channel Numberphile, you need to go there right away and start bingeing their videos. That’s what I’ve been doing for the last few days, and it’s given me tons of cool new puzzles to consider.
Naturally, after watching this video I wanted to try this out for myself. Here you see the pattern arising beautifully from the randomness:
I urge you to think hard about why the Sierpinski triangle would arise from something as simple as randomly hopping between midpoints. It’s very non-obvious, and although I have a few ideas, I’m still missing a clear intuition.
I also made some visualizations for other shapes. I’ll show some of them, but encourage you to make predictions about what pattern you’d expect to see before scrolling down to see the actual result.
Instead of three points arranged as above, we will start out with four points arranged in a perfect square. Then, as before, we’ll jump from our starting point halfway to one of these four, and will continue this procedure ad infinitum.
What pattern will arise? Do you think that we’ll have “missing regions” where no points can land, like with the triangle?
Scroll down to see the answer…
Okay! So it looks like the whole square gets filled out, with no missing regions. This was pretty surprising to me; given that three points gave rise to an intricate fractal pattern, why wouldn’t four points do the same? What’s special about 3?
Well, perhaps things will be different if we tweak the positions of the corners slightly? Will any quadrilateral have the same behavior of filling out all the points, or will the blank regions reappear? Again, make a prediction!
Okay, now we see that apparently the square was actually a very special case! Pretty much any quadrilateral we can construct will give us a nested infinity of blank regions, as long as at least one angle is not equal to 90°. Again, this is fascinating and puzzling to me. Why do 90° angles invariably cause the whole region to fill out? I’m not sure.
Let’s move on to a pentagon! Do you think that a regular pentagon will behave more like a triangle or a square?
Take a look…
And naturally, the next question is what about a hexagon?
Notice the difference between the hexagon and all the previous ones! Rather than having small areas of points that are never reached, it appears that suddenly we get lines! Again, I encourage you to try to think about why this might be (what’s so special about 6?) and leave a comment if you have any ideas.
Now, I became curious about what other types of patterns we can generate with simple rules like these. I wondered what would happen if, instead of simply jumping to the average of the current point and a randomly chosen point, we built a pattern with some “memory”. For instance, what if we didn’t just look at the current point and the randomly chosen point, but also at the last chosen point? We could then take the middle of the triangle formed by these three points as our new point.
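Here is a minimal Python sketch of this “memory” rule, under two assumptions of mine. First, the “middle of the triangle” is taken to be its centroid (the average of the three points). Second, on the very first step, when no vertex has been chosen yet, the starting vertex stands in for the last chosen point. The function name is hypothetical.

```python
import random


def chaos_game_with_memory(vertices, steps, seed=None):
    """Variant with one step of memory: each new point is the centroid of the
    current point, the newly chosen vertex and the previously chosen vertex.

    Assumption: before any vertex has been chosen, the starting vertex plays
    the role of the "last chosen" point.
    """
    rng = random.Random(seed)
    x, y = vertices[0]
    last = vertices[0]
    points = []
    for _ in range(steps):
        chosen = rng.choice(vertices)
        # Centroid of (current point, chosen vertex, last chosen vertex)
        x = (x + chosen[0] + last[0]) / 3
        y = (y + chosen[1] + last[1]) / 3
        points.append((x, y))
        last = chosen  # remember this choice for the next step
    return points
```

Because every new point is an average of points inside the polygon, the walk never leaves it; all of the structure comes from which regions it avoids.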
It turns out that the patterns that arise from this are even more beautiful than the previous ones! (In my opinion, of course)
Take a look:
I’ll stop here, but this is a great example of how beautiful and surprising math can be. I would never have guessed that such intricate fractal patterns would arise from such simple random rules.
Consider two players: Alice and Bob. Alice moves first. At the start of the game, Alice has two piles of coins in front of her: one pile contains 4 coins and the other pile contains 1 coin. Each player has two moves available: either “take” the larger pile of coins and give the smaller pile to the other player or “push” both piles across the table to the other player. Each time the piles of coins pass across the table, the quantity of coins in each pile doubles. For example, assume that Alice chooses to “push” the piles on her first move, handing the piles of 1 and 4 coins over to Bob, doubling them to 2 and 8. Bob could now use his first move to either “take” the pile of 8 coins and give 2 coins to Alice, or he can “push” the two piles back across the table again to Alice, again increasing the size of the piles to 4 and 16 coins. The game continues for a fixed number of rounds or until a player decides to end the game by pocketing a pile of coins.
(Assume that if the game gets to the final round and the last player decides to “push”, the pot is doubled and they get the smaller pile.)
Assuming that they are self-interested, what do you think is the rational strategy for each of Alice and Bob to adopt? What is the rational strategy if they each know that the other reasons about decision-making in the same way that they themselves do? And what happens if two updateless decision theorists are pitted against each other?
If you have some prior familiarity with game theory, you might have seen the backwards induction proof right away. It turns out that standard game theory teaches us that the Nash equilibrium is to defect as soon as you can, thus never exploiting the “doubling” feature of the setup.
Why? Supposing that you have made it to the final round of the game, you stand to get a larger payout by “defecting” and taking the larger pile rather than the doubled smaller pile. But your opponent knows that you’ll reason this way, so they reason that they are better off defecting the round before… and so on all the way to the first round.
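The backward induction argument can be checked mechanically. The sketch below uses a hypothetical `solve_centipede` helper written for this game’s rules: piles of 1 and 4, doubling on every push, and a final-round push that leaves the pusher with the smaller doubled pile. It works from the last round back to the first, letting each player pick whichever move pays them more given what happens afterward.

```python
def solve_centipede(rounds, small=1, large=4):
    """Backward induction for the two-pile game described above.

    Returns the subgame-perfect action in each round ("take" or "push") and
    the resulting payoffs (first mover, second mover). In round r (counting
    from 0) the piles hold small * 2**r and large * 2**r coins.
    """
    value = None  # (payoff to the player moving in this round, payoff to the other)
    plan = []
    for r in reversed(range(rounds)):
        s, l = small * 2 ** r, large * 2 ** r
        take = (l, s)  # keep the larger pile, hand over the smaller one
        if r == rounds - 1:
            push = (2 * s, 2 * l)  # final push: piles double, mover gets the smaller
        else:
            push = (value[1], value[0])  # next round's mover is the other player
        if take[0] >= push[0]:
            value, action = take, "take"
        else:
            value, action = push, "push"
        plan.append(action)
    plan.reverse()
    return plan, value


print(solve_centipede(6))
```

For six rounds, every entry of the plan comes out as "take" and the payoffs are (4, 1): Alice pockets the 4 coins immediately, exactly as the argument above predicts.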
This sucks. The game ends right away, and none of that exponential goodness gets taken advantage of. If only Alice and Bob weren’t so rational!
We can show that this conclusion follows as long as the following three things are true of Alice and Bob:
1. Given a choice between a definite value A and a smaller value B, both Alice and Bob will choose the larger value (A).
2. Both Alice and Bob can accurately perform deductive reasoning.
3. Both (1) and (2) are common knowledge to Alice and Bob.
It’s pretty hard to deny the reasonableness of any of these three assumptions!
Here’s a related problem:
An airline loses two suitcases belonging to two different travelers. Both suitcases happen to be identical and contain identical antiques. An airline manager tasked to settle the claims of both travelers explains that the airline is liable for a maximum of $100 per suitcase—he is unable to find out directly the price of the antiques.
To determine an honest appraised value of the antiques, the manager separates both travelers so they can’t confer, and asks them to write down the amount of their value at no less than $2 and no larger than $100. He also tells them that if both write down the same number, he will treat that number as the true dollar value of both suitcases and reimburse both travelers that amount. However, if one writes down a smaller number than the other, this smaller number will be taken as the true dollar value, and both travelers will receive that amount along with a bonus/malus: $2 extra will be paid to the traveler who wrote down the lower value and a $2 deduction will be taken from the person who wrote down the higher amount. The challenge is: what strategy should both travelers follow to decide the value they should write down?
Suppose you put no value on honesty, and only care about getting the most money possible. Further, suppose that both travelers reason the same way about decision problems, and that they both know this fact (and that they both know that they both know this fact, and so on).
The first intuition you might have is that both should just write down $100. But if you know that your partner is going to write down $100, then you stand to gain one whole dollar by defecting and writing $99 (thus collecting the $2 bonus for a total of $101). But if they know that you’re going to write $99, then they stand to gain one whole dollar by defecting and writing $98 (thus netting $100). And so on.
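The chain of undercutting can also be traced in code. This Python sketch (the function names are mine) encodes the reimbursement rule described above, computes the best claim against any fixed claim by the other traveler, and follows the “they’ll write x, so I should write x - 1” reasoning from $100 until it stops changing.

```python
def payoff(mine, theirs, bonus=2):
    """My reimbursement in the Traveler's Dilemma as described above."""
    if mine == theirs:
        return mine
    if mine < theirs:
        return mine + bonus   # the lower claim earns the bonus
    return theirs - bonus     # the higher claim pays the penalty


def best_response(theirs, low=2, high=100):
    """The claim that maximizes my payoff if I knew the other traveler's claim."""
    return max(range(low, high + 1), key=lambda mine: payoff(mine, theirs))


def undercutting_chain(start=100):
    """Follow the undercutting argument until the best response stops changing."""
    chain = [start]
    while best_response(chain[-1]) != chain[-1]:
        chain.append(best_response(chain[-1]))
    return chain


chain = undercutting_chain()
print(chain[:3], "...", chain[-1])
```

The chain runs 100, 99, 98, … all the way down to 2, where undercutting no longer pays.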
In the end both of these unfortunate “rational” individuals end up writing down $2. Once again, we see the tragedy of being a rational individual.
Of course, we could take these thought experiments to be an indication not of the inherent tragedy of rationality, but instead of the need for a better theory of rationality.
For instance, you might have noticed that the arguments we used in both cases relied on a type of reasoning where each agent assumes that they can change their decision, holding fixed the decision of the other agent. This is not a valid move in general, as it assumes independence! It might very well be that the information about what decision you make is relevant to your knowledge about what the other agent’s decision will be. In fact, when we stipulated that you reason similarly to the other agent, we were in essence stipulating an evidential relationship between your decision and theirs! So the arguments we gave above need to be looked at more closely.
If the agents do end up taking into account their similarity, then their behavior is radically different. For example, we can look at the behavior of updateless decision theory: two UDT agents playing each other in the Centipede game “push” every single round (including the final one!), thus ending up with exponentially higher rewards (on the order of 2^N coins, where N is the number of rounds). And two UDT agents in the Traveller’s Dilemma would write down $100, thus both ending up roughly $98 better off than otherwise. So perhaps we aren’t doomed to a gloomy view of rationality as a burden eternally holding us back!
One final problem.
Two players, this time with just one pile of coins in front of them. Initially this pile contains just 1 coin. The players take turns, and each turn they can either take the whole pile or push it to the other side, in which case the size of the pile will double. This will continue for a fixed number of rounds or until a player ends the game by taking the pile.
On the final round, the last player has a choice of either taking all the coins or pushing them over, thus giving the entire doubled pile to their opponent. Both players are perfectly self-interested, and this fact is common knowledge. And finally, suppose that who goes first is determined by a coin flip.
Standard decision theory obviously says that the first person should just take the 1 coin and the game ends there. What would UDT do here? What do you think is the rational policy for each player?
To be totally clear: the question is not assuming that there is ONLY one student whose neighbors both flipped heads, just that there is AT LEAST one such student. You can imagine that the teacher first asks all students whose neighbors both flipped heads to step forward, then randomly selects one of the students who stepped forward.
Now, take a minute to think about this before reading on…
It seemed initially obvious to me that the teacher was correct. There are exactly as many possible worlds in which the three students are HTH as there are worlds in which they are HHH, right? Knowing how your neighbors’ coins landed shouldn’t give you any information about how your own coin landed, and to think otherwise seems akin to the Gambler’s fallacy.
But in fact, the teacher is wrong! It is in fact more likely that the student flipped tails than heads! Why? Let’s simplify the problem.
Suppose there are just three students standing in a circle (/triangle). There are eight possible ways that their coins might have landed, namely: HHH, HHT, HTH, THH, HTT, THT, TTH, and TTT.
Now, the teacher asks all those students whose neighbors both have “H” to step forward, and AT LEAST ONE steps forward. What does this tell us about the possible world we’re in? Well, it rules out all of the worlds in which no student is surrounded by two ‘H’s, namely TTT, TTH, THT, and HTT. We’re left with the following four worlds: HHH, HHT, HTH, and THH.
One thing to notice is that we’re left with mostly worlds with lots of heads. The expected total of heads is 2.25, while the expected total of tails is just 0.75. So maybe we should expect that the student is actually more likely to have heads than tails!
But this is wrong. What we want to see is, in each possible world, what proportion of the students surrounded by heads have heads themselves.
HHH: 3/3 have H (100%)
HHT: 0/1 have H (0%)
HTH: 0/1 have H (0%)
THH: 0/1 have H (0%)
Since each of these worlds is equally likely, what we end up with is a 25% chance of 100% heads, and a 75% chance of 0% heads. In other words, our credence in the student having heads should be just 25%!
Now, what about for N students? I wrote a program that does a brute-force calculation of the final answer for any N, and here’s what you get:
These numbers are not very pretty, though they appear to be gradually converging (I’d guess to 50%).
Can anybody see any patterns here? Or some simple intuitive way to arrive at these numbers?
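For anyone who wants to hunt for patterns, here is a minimal brute-force sketch of the calculation, under the same assumptions as the three-student case: all 2^N worlds are equally likely, worlds where nobody steps forward are ruled out, and the teacher picks uniformly among the students who step forward. This is my own reconstruction, not necessarily how the original program worked, and it uses exact fractions to avoid rounding.

```python
from fractions import Fraction
from itertools import product


def prob_heads(n):
    """Credence that the selected student flipped heads, for n students in a circle.

    Every one of the 2**n worlds is equally likely; we keep the worlds where at
    least one student has heads on both sides, and in each such world the
    teacher picks uniformly among the students who step forward.
    """
    total = Fraction(0)
    qualifying_worlds = 0
    for world in product("HT", repeat=n):
        # Student i's neighbours are i - 1 and i + 1, wrapping around the circle
        forward = [i for i in range(n)
                   if world[i - 1] == "H" and world[(i + 1) % n] == "H"]
        if not forward:
            continue  # nobody steps forward, so this world is ruled out
        qualifying_worlds += 1
        heads = sum(1 for i in forward if world[i] == "H")
        total += Fraction(heads, len(forward))
    return total / qualifying_worlds


for n in range(3, 11):
    print(n, prob_heads(n), float(prob_heads(n)))
```

For n = 3 it returns 1/4, matching the hand calculation above. Since it enumerates every world, the running time doubles with each extra student.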
Read this post and give it a try yourself before reading on! Spoilers ahead.
I’ve written up the way that I solved the puzzle, step by step:
1. A looks at B and C and says “I don’t know what my number is.”
We assess what information this gives us by considering in which scenario A would have known what their number was, and ruling it out.
Well, for starters, if A saw two different numbers, then A would consider there to be two logically possible worlds: one in which they are the sum of the two numbers, and another in which they are one of the two that are added together (that is, the difference). But if A saw the SAME two numbers, then A would know that they must be the sum of those two (since zero is not a possible value for themselves). What this means is that the fact that A doesn’t know their number tells us with certainty that B ≠ C! Furthermore, since B and C will go through this same line of reasoning themselves, they will also know that B ≠ C. And since all of them know that B and C will go through the same line of reasoning, it becomes common knowledge that B ≠ C.
Good! So after A’s statement, we have added one piece of common knowledge, namely that B ≠ C.
2. B thinks a moment and then says “Me neither.”
Ok, one thing this tells us is that A ≠ C, for the exact same reason as before.
But it also tells us more than this, because B knows that B ≠ C and still doesn’t know. So we just need to think of the scenario in which knowing that B ≠ C (and knowing the values of A and C, of course) would tell B the value that they have. Try to figure it out for yourselves!
The answer is that the scenario is A = 2C! Imagine that B sees that A is 10 and C is 5. This tells them that they are either 5 or 15. But they know they can’t be 5, because C is 5 and B ≠ C. So they must be 15! In other words, since they would know their value if A equaled 2C and they don’t know their value, this tells us that A ≠ 2C!
So now we have two more pieces of common knowledge: A ≠ C, and A ≠ 2C. Putting this together with what we knew before, we have a total of three pieces of information:
B ≠ C
A ≠ C
A ≠ 2C
3. C thinks a moment and then says “Me neither.”
By exact analogy with the previous arguments, this tells us that A ≠ B, as well as that A ≠ 2B and B ≠ 2A. (We saw previously that B could conclude from B ≠ C that A ≠ 2C. By the same arguments, C can conclude from C ≠ A that B ≠ 2A. And from C ≠ B, C can conclude that A ≠ 2B.)
There’s one more piece of information that we haven’t taken into account, which is that A ≠ 2C. In which situation does the fact that A ≠ 2C tell C their value? Well, if A = 10 and B = 15, then C is either 5 or 20. But C can’t be 5, because then A would be twice C. So C could conclude that they are 20. Since C doesn’t conclude this, we know that 3A ≠ 2B.
Putting it all together, we know the following:
B ≠ C
A ≠ C
A ≠ 2C
A ≠ 2B
B ≠ 2A
3A ≠ 2B
4. A thinks, and says: “Now I know! My number is 25.”
The question we need to ask ourselves is what the values of B and C must be for the above six conditions to allow A to figure out their own value. Pause to think about this before reading on…
Let’s work through how A processes each piece of information:
A could figure out their own value by seeing B = C. But we already know that this isn’t the case.
Since A knows that A ≠ C, A could figure out their value by seeing B = 2C: the two candidates would be C and 3C, and A ≠ C rules out the first. So that’s a possibility… Except that if B = 2C, then A = B + C = 3C. But 25 is not divisible by 3. So this can’t be what they saw.
Since A knows that A ≠ B, A could figure out their value by seeing C = 2B. But again, this doesn’t work, since it would imply that A was divisible by 3, which it is not.
Since A knows that A ≠ 2C, A could figure out their value by seeing B = 3C (e.g. B = 15, C = 5). They would rule out themselves being one component of the sum, and conclude that they are 4C. But 25 is not divisible by 4. So this is not the case.
Since A knows that A ≠ 2B, A could figure out their value by seeing C = 3B (e.g. B = 5, C = 15). By the same reasoning as before, this cannot be the case.
Since B ≠ 2A, A could figure out their value by seeing 3B = 2C (e.g. B = 10, C = 15). They would know that they cannot be just one component of the sum, so they would conclude that they must be B + C, or 2.5 B. Now, is there an integer B such that 25 = 2.5 B? You betcha! B = 10, and C = 15!
We can stop here, since we’ve found a logically consistent world in which A figures out that their own value is 25. Since there can only be one such world (as the problem statement implies that this information is enough to solve the puzzle), we know that this must be what they saw. So we’ve found the answer! (A,B,C) = (25,10,15). But if you’re curious, I’d suggest you go through the rest of the cases and show that no other values of B and C would be consistent with them knowing that their own value is 25.
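The whole chain of reasoning can also be checked by brute force. The sketch below is a minimal model of common knowledge, under my own framing: a “world” is a triple (A, B, C), each logician considers possible exactly the worlds that differ from the actual one only in their own number, and each public announcement removes the worlds in which the speaker would have said something different. The helper names are hypothetical, and the search over B and C is capped at an arbitrary bound.

```python
from functools import lru_cache

# The three public statements, in order: (speaker index, "I know my number"?).
# Index 0 is A, 1 is B, 2 is C.
ANNOUNCEMENTS = [(0, False), (1, False), (2, False)]


def is_valid(world):
    """Positive integers where one number is the sum of the other two."""
    a, b, c = world
    return min(world) > 0 and (a == b + c or b == a + c or c == a + b)


def alternatives(world, agent):
    """Worlds the agent cannot tell apart from `world` (only their own number changes)."""
    x, y = (v for i, v in enumerate(world) if i != agent)
    result = set()
    for guess in (x + y, abs(x - y)):
        candidate = list(world)
        candidate[agent] = guess
        result.add(tuple(candidate))
    return result


@lru_cache(maxsize=None)
def survives(world, k):
    """Is `world` consistent with the first k announcements?"""
    if k == 0:
        return is_valid(world)
    if not survives(world, k - 1):
        return False
    speaker, claims_to_know = ANNOUNCEMENTS[k - 1]
    return knows(world, speaker, k - 1) == claims_to_know


def knows(world, agent, k):
    """After k announcements, has the agent narrowed their number down to one value?"""
    return len([w for w in alternatives(world, agent) if survives(w, k)]) == 1


def worlds_where_a_knows(a_value, bound):
    """All (a_value, B, C) with B, C <= bound in which A can now announce a_value."""
    return [
        (a_value, b, c)
        for b in range(1, bound + 1)
        for c in range(1, bound + 1)
        if survives((a_value, b, c), 3) and knows((a_value, b, c), 0, 3)
    ]


print(worlds_where_a_knows(25, 200))
```

With a bound of 200 the search returns only (25, 10, 15), matching the answer found above. Changing the first argument of `worlds_where_a_knows` is one way to explore the challenge below.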
One thing that’s interesting here is the big role that the number 25 played. The fact that 25 is not divisible by 3 or 4, but is a multiple of 2.5, was crucial. For the same puzzle with a different value than 25, we would have come to a totally different answer!
My challenge to anybody that’s made it this far: Consider the set of all integers that A could have truthfully declared that they knew themselves to be. For some such integers, it won’t be the case that A’s declaration is sufficient for us to conclude what B and C are. Which integers are these?
Common knowledge puzzles are my favorite. Here’s one I just came across. I challenge you to try to figure it out in less than 5 minutes. 🙂
Three perfect logicians with positive (non-zero) integers taped to their foreheads, A, B, and C, sit in a triangle. Each doesn’t know their own number but can see the numbers for the other two. It is common knowledge amongst all three that one of the numbers is the sum of the other two, but it is not known which is which.
A looks at B and C, and says “I don’t know what my number is.”
B thinks a moment and then says “Me neither.”
C thinks a moment and then says “Me neither.”
A thinks, then says “Now I know! My number is 25.”
This puzzle has been popping up on my Facebook feed recently. Try to see how high you can get before reading on!
Now, would you be surprised if I told you that with a little interpretive freedom and creativity, you can get numbers larger than the number of atoms in the observable universe? How about if I told you that you can get to numbers large enough that they break set theory? Let me demonstrate for you.
First, we’ll start with the boring solutions.
You can move the bottom match in the 5 up to make it a 9. Then you can rotate the bottom left vertical match in the 0 to make it a nine. This gives 998.
Can we do better? Sure! Take the top and bottom matches in the central zero and move them to squeeze in another 1, giving you 51,118.
So far we’ve been working purely in base 10. Let’s try to do better by moving to a more exotic number system.
Hexadecimal is a base-16 number system, the digits of which are written as 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F. For instance, we can construct F88:
F88 is only 3976, which is not better than our best so far (51,118). If we’re allowed to interpret a square of matches as a D instead of a zero, we can do slightly better (though to do so we have to be allowed to throw out a match, or place it directly on top of another one):
FDB is 4059, which is still not better than our current best. To get better, we need to get more clever.
Our first strategy is just to shift two matches in the first digit down so as to make it into a 9:
We can interpret this as $9^8$ = 43,046,721. This is our best so far! But we can do better by applying Knuth’s up-arrow notation.
This is 5↑18 = $5^{18}$, about 3.8 trillion, or nearly 100 thousand times bigger than $9^8$! But we’re not done yet! If we use a caret (^) to represent exponentiation, we can get even higher!
$51^{18}$ is about 5.4 nonillion, a number 31 decimal digits long. We could try to do better than this by squeezing in a caret between the 5 and the 1, making $5^{118}$ (a number with 83 decimal digits), but this becomes pretty messy and gross.
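The arithmetic in this section is easy to double-check with Python’s arbitrary-precision integers. The snippet below evaluates each reading of the matchsticks (the variable names and labels are mine) and prints its value and its number of decimal digits.

```python
# Hexadecimal readings of the matchstick arrangements
f88 = int("F88", 16)
fdb = int("FDB", 16)

# Exponent readings: 9^8, 5↑18 (which is just 5^18), 51^18 and 5^118
nine_to_eight = 9 ** 8
five_up_eighteen = 5 ** 18
fifty_one_to_eighteen = 51 ** 18
five_to_118 = 5 ** 118

for label, value in [("F88", f88), ("FDB", fdb), ("9^8", nine_to_eight),
                     ("5^18", five_up_eighteen), ("51^18", fifty_one_to_eighteen),
                     ("5^118", five_to_118)]:
    print(f"{label}: {value} ({len(str(value))} digits)")
```

The ratio `five_up_eighteen / nine_to_eight` comes out just under 90 thousand.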
Alright, so we got up to 83-digit numbers. Can we get any better? Yep!
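Tetration is the level above exponentiation. The nth tetration of a, written $^{n}a$, is a power tower of n copies of a, defined recursively by $^{1}a = a$ and $^{n}a = a^{\left(^{n-1}a\right)}$, so that, for example, $^{3}a = a^{a^{a}}$.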
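Tetration is simple to write down as code, even if it is impossible to evaluate beyond tiny inputs. Here is a minimal sketch (the name `tetrate` is mine) using the recursion ¹a = a and ⁿa = a^(ⁿ⁻¹a), with the tower evaluated from the top down:

```python
def tetrate(a, n):
    """n-th tetration of a: a power tower of n copies of a, evaluated top-down.

    Uses the recursion ^1 a = a and ^n a = a ** (^(n-1) a). Only tiny inputs
    are feasible; the results explode almost immediately.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    result = a
    for _ in range(n - 1):
        result = a ** result  # towers associate to the right
    return result


print([tetrate(2, n) for n in range(1, 5)])
```

The list printed is 2, 4, 16 and 65536. Already `tetrate(2, 5)` is 2^65536, a number with nearly 20,000 decimal digits, and `tetrate(3, 4)` is far too large to compute.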
Just as multiplication is repeated addition and exponentiation is repeated multiplication, tetration is repeated exponentiation! Now we get into numbers whose size is truly impossible to grasp. Let’s shift the top and bottom matches on the middle 0:
We can write this equivalently as:
How big is this number? There’s really no meaningful way to describe it. We’ve gone far beyond any quantity that makes physical sense, like the number of cubic nanometers in the observable universe, which is a measly $10^{107}$ or so. But we’re not done yet!
The busy beaver numbers are a sequence of numbers that arise from the properties of Turing machines. Here’s a brief description: the Nth busy beaver number, B(N), is the greatest number of ones that an N-state Turing machine (with tape symbols 0 and 1) can print before halting, starting from a tape of all zeros.
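To make the definition concrete, here is a tiny Turing machine simulator in Python. The machine `BB2` is a well-known two-state busy beaver champion; the simulator and its names are my own. It returns both the number of ones left on the tape and the number of steps taken, which correspond to B(N) and to the maximum shifts function S(N) discussed below. Of course, no program can compute B(N) in general; this only runs one given machine.

```python
from collections import defaultdict

# Transition table: (state, symbol read) -> (symbol to write, move, next state).
# A two-state, two-symbol busy beaver champion; "H" means halt.
BB2 = {
    ("A", 0): (1, +1, "B"),
    ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"),
    ("B", 1): (1, +1, "H"),
}


def run(machine, max_steps=10_000):
    """Run a Turing machine on an all-zero tape.

    Returns (number of ones on the tape, number of steps taken) once it halts.
    """
    tape = defaultdict(int)  # unvisited cells read as 0
    head, state, steps = 0, "A", 0
    while state != "H":
        if steps >= max_steps:
            raise RuntimeError("no halt within max_steps")
        write, move, state = machine[(state, tape[head])]
        tape[head] = write
        head += move
        steps += 1
    return sum(tape.values()), steps


print(run(BB2))
```

It halts after 6 steps with 4 ones on the tape, which is B(2) = 4 (and the 6 steps equal S(2)).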
Did you think tetration is big? Well, busy beaver numbers are unimaginably larger. In fact, the busy beaver sequence eventually grows faster than any computable function! There’s a neat proof of this that involves the uncomputability of the halting problem. In brief: it can be shown that if we had a computable upper bound on the busy beaver sequence, then we could find a deterministic algorithm for solving the halting problem in a finite amount of time, which we know is impossible. And if any computable function F(N) grew at least as fast as B(N), then it would give us exactly such an upper bound. Thus, it must be the case that no computable function grows as fast as B(N)!
We can exploit the absurd growth rate of the busy beaver sequence if we are allowed the interpretative freedom of assuming parentheses, so that Bn = B(n).
Let’s think for a minute about how large B(118) must be. So far, the only values of n for which B(n) is known are B(1), B(2), B(3), and B(4). After this the values grow out of control. A lower bound on B(6) is . For B(7), we have as a lower bound . B(8) almost certainly beats our current record. And B(118) is unthinkably large.
We can get even higher than the busy beaver numbers with the maximum shifts function S(N), defined as the number of steps that the longest-running halting N-state Turing machine takes before halting. Since a machine has to take at least one step for every 1 it prints, S(N) is always at least as large as B(N). Using S(N), and taking the same liberties as above with respect to parentheses, we can get an insanely high value:
This is S(1118), and while it’s undoubtedly larger than B(118), there’s no way to get any intuitive grasp on how much larger. But wait, there’s more! We can get even larger by recursively nesting a busy beaver function within a maximum shifts function:
We interpret this as S(B(9)). Why is this larger than S(1118)? Well, B(9) is some enormous number, certainly larger than 1118, and S is an increasing function, so S(B(9)) is certainly greater than S(1118).
Now, are we finally done? Have we reached the peak yet? No! It’s time for the largest solution of them all.
The reason that the Busy Beaver numbers and Maximum Shift function are so big is because of the uncomputability of the halting problem. But if we consider Turing machines that have an oracle for the halting problem (call these meta Turing machines), we get a new meta-halting problem: when do these meta Turing machines halt? From the meta-halting problem comes an associated new sequence of Busy Beaver numbers, which grows uncomputably faster than the original Busy Beaver sequence. Then we can equip Turing machines with an oracle for the meta-halting problem, generating a meta-meta-Busy Beaver sequence.
Thus we get a hierarchy of Busy Beaver functions, which, following the notation used by Scott Aaronson here, can be described with $B_n(x)$. Each $B_n$ grows uncomputably faster than the previous $B_{n-1}$. There’s a similar hierarchy for the maximum shifts function, and each $S_n$ is going to be an upper bound on $S_{n-1}$.
So we can exploit this hierarchy to create an unimaginably large number (whose actual value is almost certainly independent of the axioms of set theory): Move around the top and bottom matches on the 0 to give S a subscript of 11. Then we get the 11th-up-in-the-hierarchy maximum shifts function $S_{11}$ applied to 118: $S_{11}(118)$.
It’s a little gross-looking, but I think it works! I challenge anybody to try to come up with a better solution. 🙂