A probability puzzle


To be totally clear: the question is not assuming that there is ONLY one student whose neighbors both flipped heads, just that there is AT LEAST one such student. You can imagine that the teacher first asks for all students whose neighbors both flipped heads to step forward, then randomly selected one of the students that had stepped forward.

Now, take a minute to think about this before reading on…

It seemed initially obvious to me that the teacher was correct. There are exactly as many possible worlds in which the three students are HTH as there worlds in which they are HHH, right? Knowing how your neighbors’ coins landed shouldn’t give you any information about how your own coin landed, and to think otherwise seems akin to the Gambler’s fallacy.

But in fact, the teacher is wrong! It is in fact more likely that the student flipped tails than heads! Why? Let’s simplify the problem.

Suppose there are just three students standing in a circle (/triangle). There are eight possible ways that their coins might have landed, namely:


Now, the teacher asks all those students whose neighbors both have “H” to step forward, and AT LEAST ONE steps forward. What does this tell us about the possible world we’re in? Well, it rules out all of the worlds in which no student could be surrounded by both ‘H’, namely… TTT, TTH, THT, and HTT. We’re left with the following…


One thing to notice is that we’re left with mostly worlds with lots of heads. The expected total of heads is 2.25, while the expected total of tails is just 0.75. So maybe we should expect that the student is actually more likely to have heads than tails!

But this is wrong. What we want to see is what proportion of those surrounded by heads are heads in each possible world.

HHH: 3/3 have H (100%)
HHT: 0/1 have H (0%)
HTH: 0/1 have H (0%)
THH: 0/1 have H (0%)

Since each of these worlds is equally likely, what we end up with is a 25% chance of 100% heads, and a 75% chance of 0% heads. In other words, our credence in the student having heads should be just 25%!

Now, what about for N students? I wrote a program that does a brute-force calculation of the final answer for any N, and here’s what you get:

















































These numbers are not very pretty, though they appear to be gradually converging (I’d guess to 50%).

Can anybody see any patterns here? Or some simple intuitive way to arrive at these numbers?



Solving the common knowledge puzzle

Read this post and give it a try yourself before reading on! Spoilers ahead.

I’ve written up the way that I solved the puzzle, step by step:

1. A looks at B and C and says “I don’t know what my number is.”

We assess what information this gives us by considering in which scenario A would have known what their number was, and ruling it out.

Well, for starters, if A saw two different numbers, then A would consider there to be two logically possible worlds, one in which they are the sum of the two numbers, and another in which they are one of the two that are added together. But if A saw the SAME two numbers, they A would know that they must be the sum of those two (since zero is not a possible value for themselves). What this means is that the fact that A doesn’t know their number tells us with certainty that B ≠ C! Furthermore, since B and C will go through this same line of reasoning themselves, they will also know that B ≠ C. And since all of them know that B and C will go through the same line of reasoning, it becomes common knowledge that B ≠ C.

Good! So after A’s statement, we have added one piece of common knowledge, namely that B ≠ C.

2. B thinks a moment and then says “Me neither.”

Ok, one thing this tells us is that A ≠ C, for the exact same reason as before.

But it also tells us more than this, because B knows that B ≠ C and still doesn’t know. So we just need to think of the scenario in which knowing that B ≠ C (and knowing the values of A and C, of course) would tell B the value that they have. Try to figure it out for yourselves!

The answer is, the scenario is that A = 2C! Imagine that B sees that A is 10 and C is 5. This tells them that they are either 5 or 15. But they know they can’t be 5, because C is five and B ≠ C. So they must be 15! In other words, since they would know their value if A equaled 2C and they don’t know their value, this tells us that A ≠ 2C!

So now we have two more pieces of common knowledge: A ≠ C, and A ≠ 2C. Putting this together with what we knew before, we have a total of three pieces of information:

B ≠ C
A ≠ C
A ≠ 2C

3. C thinks a moment and then says “Me neither.”

By exact analogy with the previous arguments, this tells us that A ≠ B, as well as that A ≠ 2B and B ≠ 2A. (We saw previously that B could conclude from B ≠ C that A ≠ 2C. By the same arguments, C can conclude from C ≠ A that B ≠ 2A. And from C ≠ B, C can conclude that A ≠ 2B.)

There’s one more piece of information that we haven’t taken into account, which is that A ≠ 2C. In which situation does the fact that A ≠ 2C tell C their value? Well, if A = 10 and B = 15, then C is either 5 or 20. But C can’t be 5, because then A would be twice C. So C could conclude that they are 20. Since C doesn’t conclude this, we know that 3A ≠ 2B.

Putting it all together, we know the following:

B ≠ C
A ≠ C
A ≠ 2C
A ≠ 2B
B ≠ 2A
3A ≠ 2B

4. A thinks, and says: “Now I know! My number is 25.”

The question we need to ask ourselves is what the values of B and C must be in order that the above six conditions to allow A to figure out their own value. Pause to think about this before reading on…




Let’s work through how A processes each piece of information:

A could figure out their own value by seeing B = C. But we already know that this isn’t the case.

Since A knows that A ≠ C, A could figure out their value by seeing B = 2C. So that’s a possibility… Except that if B = 2C, then A = B + 2C = 3C. But 25 is not divisible by 3. So this can’t be what they saw.

Since A knows that A ≠ B, A could figure out their value by seeing C = 2B. But again, this doesn’t work, since it would imply that A was divisible by 3, which it is not.

Since A knows that A ≠ 2C, A could figure out their value by seeing B = 3C (e.g. B = 15, C = 5). They would rule out themselves being one component of the sum, and conclude that they are 4C. But 25 is not divisible by 4. So this is not the case.

Since A knows that A ≠ 2B, A could figure out their value by seeing C = 3B (e.g. B = 5, C = 15). By the same reasoning as before, this cannot be the case.

Since B ≠ 2A, A could figure out their value by seeing 3B = 2C (e.g. B = 10, C = 15). They would know that they cannot be just one component of the sum, so they would conclude that they must be B + C, or 2.5 B. Now, is there an integer B such that 25 = 2.5 B? You betcha! B = 10, and C = 15!

We can stop here, since we’ve found a logically consistent world in which A figures out that their own value is 25. Since there can only be one such world (as the problem statement implies that this information is enough to solve the puzzle), we know that this must be what they saw. So we’ve found the answer! (A,B,C) = (25,10,15). But if you’re curious, I’d suggest you go through the rest of the cases and show that no other values of B and C would be consistent with them knowing that their own value is 25.

One thing that’s interesting here is the big role that the number 25 played in this. The fact that 25 was not divisible by 3 but was divisible by 2.5, was crucial. For the same puzzle but a different value that 25, we would have come to a totally different answer!

My challenge to anybody that’s made it this far: Consider the set of all integers that A could have truthfully declared that they knew themselves to be. For some such integers, it won’t be the case that A’s declaration is sufficient for us to conclude what B and C are. Which integers are these?

A common knowledge puzzle

Common knowledge puzzles are my favorite. Here’s one I just came across. I challenge you to try to figure it out in less than 5 minutes. 🙂

Three perfect logicians with positive (non-zero) integers taped to their foreheads, A, B, and C, sit in a triangle. Each doesn’t know their own number but can see the numbers for the other two. It is common knowledge amongst all three that one of the numbers is the sum of the other two, but it is not known which is which.

A looks at B and C, and says “I don’t know what my number is.”

B thinks a moment and then says “Me neither.”

C thinks a moment and then says “Me neither.”

A thinks, then says “Now I know! My number is 25.”

What are the numbers on B and C’s foreheads?

Kant’s attempt to save metaphysics and causality from Hume


  • Hume sort of wrecked metaphysics. This inspired Kant to try and save it.
  • Hume thought that terms were only meaningful insofar as they were derived from experience.
  • We never actually experience necessary connections between events, we just see correlations. So Hume thought that the idea of causality as necessary connection is empty and confused, and that all our idea of causality really amounts to is correlation.
  • Kant didn’t like this. He wanted to PROTECT causality. But how??
  • Kant said that metaphysical knowledge was both a priori and substantive, and justified this by describing these things called pure intuitions and pure concepts.
  • Intuitions are representations of things (like sense perceptions). Pure intuitions are the necessary preconditions for us to represent things at all.
  • Concepts are classifications of representations (like “red”). Pure concepts are the necessary preconditions underlying all classifications of representations.
  • There are two pure intuitions (space and time) and twelve pure concepts (one of which is causality).
  • We get substantive a priori knowledge by referring to pure intuitions (mathematics) or pure concepts (laws of nature, metaphysics).
  • Yay! We saved metaphysics!


(Okay, now on to the actual essay. This was not originally written for this blog, which is why it’s significantly more formal than my usual fare.)




David Hume’s Enquiry Into Human Understanding stands out as a profound and original challenge to the validity of metaphysical knowledge. Part of the historical legacy of this work is its effect on Kant, who describes Hume as being responsible for [interrupting] my dogmatic slumber and [giving] my investigations in the field of speculative philosophy a completely different direction.” Despite the great inspiration that Kant took from Hume’s writing, their thinking on many matters is diametrically opposed. A prime example of this is their views on causality.

Hume’s take on causation is famously unintuitive. He gives a deflationary account of the concept, arguing that the traditional conception lacks a solid epistemic foundation and must be replaced by mere correlation. To understand this conclusion, we need to back up and consider the goal and methodology of the Enquiry.

He starts with an appeal to the importance of careful and accurate reasoning in all areas of human life, and especially in philosophy. In a beautiful bit of prose, he warns against the danger of being overwhelmed by popular superstition and religious prejudice when casting one’s mind towards the especially difficult and abstruse questions of metaphysics.

But this obscurity in the profound and abstract philosophy is objected to, not only as painful and fatiguing, but as the inevitable source of uncertainty and error. Here indeed lies the most just and most plausible objection against a considerable part of metaphysics, that they are not properly a science, but arise either from the fruitless efforts of human vanity, which would penetrate into subjects utterly inaccessible to the understanding, or from the craft of popular superstitions, which, being unable to defend themselves on fair ground, raise these entangling brambles to cover and protect their weakness. Chased from the open country, these robbers fly into the forest, and lie in wait to break in upon every unguarded avenue of the mind, and overwhelm it with religious fears and prejudices. The stoutest antagonist, if he remit his watch a moment, is oppressed. And many, through cowardice and folly, open the gates to the enemies, and willingly receive them with reverence and submission, as their legal sovereigns.

In less poetic terms, Hume’s worry about metaphysics is that its difficulty and abstruseness makes its practitioners vulnerable to flawed reasoning. Even worse, the difficulty serves to make the subject all the more tempting for “each adventurous genius[, who] will still leap at the arduous prize and find himself stimulated, rather than discouraged by the failures of his predecessors, while he hopes that the glory of achieving so hard an adventure is reserved for him alone.”

Thus, says Hume, the only solution is “to free learning at once from these abstruse questions [by inquiring] seriously into the nature of human understanding and [showing], from an exact analysis of its powers and capacity, that it is by no means fitted for such remote and abstruse questions.”

Here we get the first major divergence between Kant and Hume. Kant doesn’t share Hume’s eagerness to banish metaphysics. His Prolegomena To Any Future Metaphysics and Critique of Pure Reason are attempts to find it a safe haven from Hume’s attacks. However, while Kant might not be similarly constituted to Hume in this way, he does take Hume’s methodology very seriously. He states in the preface to the Prolegomena that “since the origin of metaphysics as far as history reaches, nothing has ever happened which could have been more decisive to its fate than the attack made upon it by David Hume.” Many of the principles which Hume derives, Kant agrees with wholeheartedly, making the task of shielding metaphysics even harder for him.

With that understanding of Hume’s methodology in mind, let’s look at how he argues for his view of causality. We’ll start with a distinction that is central to Hume’s philosophy: that between ideas and impressions. The difference between the memory of a sensation and the sensation itself is a good example. While the memory may mimic or copy the sensation, it can never reach its full force and vivacity. In general, Hume suggests that our experiences fall into two distinct categories, separated by a qualitative gap in liveliness. The more lively category he calls impressions, which includes sensory perceptions like the smell of a rose or the taste of wine, as well as internal experiences like the feeling of love or anger. The less lively category he refers to as thoughts or ideas. These include memories of impressions as well as imagined scenes, concepts, and abstract thoughts. 

With this distinction in hand, Hume proposes his first limit on the human mind. He claims that no matter how creative or original you are, all of your thoughts are the product of “compounding, transposing, augmenting, or diminishing the materials afforded us by the senses and experiences.” This is the copy principle: all ideas are copies of impressions, or compositions of simpler ideas that are in turn copies of impressions.

Hume turns this observation of the nature of our mind into a powerful criterion of meaning. “When we entertain any suspicion that a philosophical term is employed without any meaning or idea (as is but too frequent), we need but enquire, From what impression is that supposed idea derived? And if it be impossible to assign any, this will serve to confirm our suspicion.

This criterion turns out to be just the tool Hume needs in order to establish his conclusion. He examines the traditional conception of causation as a necessary connection between events, searches for the impressions that might correspond to this idea, and, failing to find anything satisfactory, declares that “we have no idea of connection or power at all and that these words are absolutely without any meaning when employed in either philosophical reasonings or common life.” His primary argument here is that all of our observations are of mere correlation, and that we can never actually observe a necessary connection.

Interestingly, at this point he refrains from recommending that we throw out the term causation. Instead he proposes a redefinition of the term, suggesting a more subtle interpretation of his criterion of meaning. Rather than eliminating the concept altogether upon discovering it to have no satisfactory basis in experience, he reconceives it in terms of the impressions from which it is actually formed. In particular, he argues that our idea of causation is really based on “the connection which we feel in the mind, this customary transition of the imagination from one object to its usual attendant.”

Here Hume is saying that humans have a rationally unjustifiable habit of thought where, when we repeatedly observe one type of event followed by another, we begin to call the first a cause and the second its effect, and we expect that future instances of the cause will be followed by future instances of the effect. Causation, then, is just this constant conjunction between events, and our mind’s habit of projecting the conjunction into the future. We can summarize all of this in a few lines:

Hume’s denial of the traditional concept of causation

  1. Ideas are always either copies of impressions or composites of simpler ideas that are copies of impressions.
  2. The traditional conception of causation is neither of these.
  3. So we have no idea of the traditional conception of causation.

Hume’s reconceptualization of causation

  1. An idea is the idea of the impression that it is a copy of.
  2. The idea of causation is copied from the impression of constant conjunction.
  3. So the idea of causation is just the idea of constant conjunction.

There we have Hume’s line of reasoning, which provoked Kant to examine the foundations of metaphysics anew. Kant wanted to resist Hume’s dismissal of the traditional conception of causation, while accepting that our sense perceptions reveal no necessary connections to us. Thus his strategy was to deny the Copy Principle and give an account of how we can have substantive knowledge that is not ultimately traceable to impressions. He does this by introducing the analytic/synthetic distinction and the notion of a priori synthetic knowledge.

Kant’s original definition of analytic judgments is that they “express nothing in the predicate but what has already been actually thought in the concept of the subject.” This suggests that the truth value of an analytic judgment is determined by purely the meanings of the concepts in use. A standard example of this is “All bachelors are unmarried.” The truth of this statement follows immediately just by understanding what it means, as the concept of bachelor already contains the predicate unmarried.  Synthetic judgments, on the other hand, are not fixed in truth value by merely the meanings of the concepts in use. These judgments amplify our knowledge and bring us to genuinely new conclusions about our concepts. An example: “The President is ornery.” This certainly doesn’t follow by definition; you’d have to go out and watch the news to realize its truth.

We can now put the challenge to metaphysics slightly differently. Metaphysics purports to be discovering truths that are both necessary (and therefore a priori) as well as substantive (adding to our concepts and thus synthetic). But this category of synthetic a priori judgments seems a bit mysterious. Evidently, the truth values of such judgments can be determined without referring to experience, but can’t be determined by merely the meanings of the relevant concepts. So apparently something further is required besides the meanings of concepts in order to make a synthetic a priori judgment. What is this thing?

Kant’s answer is that the further requirement is pure intuition and pure concepts. These terms need explanation.

Pure Intuitions

For Kant, an intuition is a direct, immediate representation of an object. An obvious example of this is sense perception; looking at a cup gives you a direct and immediate representation of an object, namely, the cup. But pure intuitions must be independent of experience, or else judgments based on them would not be a priori. In other words, the only type of intuition that could possibly be a priori is one that is present in all possible perceptions, so that its existence is not contingent upon what perceptions are being had. Kant claims that this is only possible if pure intuitions represent the necessary preconditions for the possibility of perception.

What are these necessary preconditions? Kant famously claimed that the only two are space and time. This implies that all of our perceptions have spatiotemporal features, and indeed that perception is only possible in virtue of the existence of space and time. It also implies, according to Kant, that space and time don’t exist outside of our minds!  Consider that pure intuitions exist equally in all possible perceptions and thus are independent of the actual properties of external objects. This independence suggests that rather than being objective features of the external world, space and time are structural features of our minds that frame all our experiences.

This is why Kant’s philosophy is a species of idealism. Space and time get turned into features of the mind, and correspondingly appearances in space and time become internal as well. Kant forcefully argues that this view does not make space and time into illusions, saying that without his doctrine “it would be absolutely impossible to determine whether the intuitions of space and time, which we borrow from no experience, but which still lie in our representation a priori, are not mere phantasms of our brain.”

The pure intuitions of space and time play an important role in Kant’s philosophy of mathematics: they serve to justify the synthetic a priori status of geometry and arithmetic. When we judge that the sum of the interior angles of a triangle is 180º, for example, we do so not purely by examining the concepts triangle, sum, and angle. We also need to consult the pure intuition of space! And similarly, our affirmations of arithmetic truths rely upon the pure intuition of time for their validity.

Pure Concepts

Pure intuition is only one part of the story. We don’t just perceive the world, we also think about our perceptions. In Kant’s words, “Thoughts without content are empty; intuitions without concepts are blind. […] The understanding cannot intuit anything, and the senses cannot think anything. Only from their union can cognition arise.” As pure intuitions are to perceptions, pure concepts are to thought. Pure concepts are necessary for our empirical judgments, and without them we could not make sense of perception. It is this category in which causality falls.

Kant’s argument for this is that causality is a necessary condition for the judgment that events occur in a temporal order. He starts by observing that we don’t directly perceive time. For instance, we never have a perception of one event being before another, we just perceive one and, separately, the other. So to conclude that the first preceded the second requires something beyond perception, that is, a concept connecting them.

He argues that this connection must be necessary: “For this objective relation to be cognized as determinate, the relation between the two states must be thought as being such that it determines as necessary which of the states must be placed before and which after.” And as we’ve seen, the only way to get a necessary connection between perceptions is through a pure concept. The required pure concept is the relation of cause and effect: “the cause is what determines the effect in time, and determines it as the consequence.” So starting from the observation that we judge events to occur in a temporal order, Kant concludes that we must have a pure concept of cause and effect.

What about particular causal rules, like that striking a match produces a flame? Such rules are not derived solely from experience, but also from the pure concept of causality, on which their existence depends. It is the presence of the pure concept that allows the inference of these particular rules from experience, even though they postulate a necessary connection.

Now we can see how different Kant and Hume’s conceptions of causality are. While Hume thought that the traditional concept of causality as a necessary connection was unrescuable and extraneous to our perceptions, Kant sees it as a bedrock principle of experience that is necessary for us to be able to make sense of our perceptions at all. Kant rejects Hume’s definition of cause in terms of constant conjunction on the grounds that it “cannot be reconciled with the scientific a priori cognitions that we actually have.”

Despite this great gulf between the two philosophers’ conceptions of causality, there are some similarities. As we saw above, Kant agrees wholeheartedly with Hume that perception alone is insufficient for concluding that there is a necessary connection between events. He also agrees that a purely analytic approach is insufficient. Since Kant sees pure intuitions and pure concepts as features of the mind, not the external world, both philosophers deny that causation is an objective relationship between things in themselves (as opposed to perceptions of things). Of course, Kant would deny that this makes causality an illusion, just as he denied that space and time are made illusory by his philosophy.

Of course, it’s impossible to know to what extent the two philosophers would have actually agreed, had Hume been able to read Kant’s responses to his works. Would he have been convinced that synthetic a priori judgments really exist? If so, would he accept Kant’s pure intuitions and pure concepts? I suspect that at the crux of their disagreement would be Kant’s claim that math is synthetic a priori. While Hume never explicitly proclaims math’s analyticity (he didn’t have the term, after all), it seems more in line with his views on algebra and arithmetic as purely concerning the way that ideas relate to one another. It is also more in line with the axiomatic approach to mathematics familiar to Hume, in which one defines a set of axioms from which all truths about the mathematical concepts involved necessarily follow.

If Hume did maintain math’s analyticity, then Kant’s arguments about the importance of synthetic a priori knowledge would probably hold much less sway for him, and would largely amount to an appeal to the validity of metaphysical knowledge, which Hume already doubted. Hume also would likely want to resist Kant’s idealism; in Section XII of the Enquiry he mocks philosophers that doubt the connection between the objects of our senses and external objects, saying that if you “deprive matter of all its intelligible qualities, both primary and secondary, you in a manner annihilate it and leave only a certain unknown, inexplicable something as the cause of our perceptions – a notion so imperfect that no skeptic will think it worthwhile to contend against it.”

Fractals and Epicycles

There is no bilaterally-symmetrical, nor eccentrically-periodic curve used in any branch of astrophysics or observational astronomy which could not be smoothly plotted as the resultant motion of a point turning within a constellation of epicycles, finite in number, revolving around a fixed deferent.

Norwood Russell Hanson, “The Mathematical Power of Epicyclical Astronomy”


A friend recently showed me this image…


…and thus I was drawn into the world of epicycles and fractals.

Epicycles were first used by the Greeks to reconcile observational data of the motions of the planets with the theory that all bodies orbit the Earth in perfect circles. It was found that epicycles allowed astronomers to retain their belief in perfectly circular orbits, as well as the centrality of Earth. The cost of this, however, was a system with many adjustable parameters (as many parameters as there were epicycles).

There’s a somewhat common trope about adding on endless epicycles to a theory, the idea being that by being overly flexible and accommodating of data you lose epistemic credibility. This happens to fit perfectly with my most recent posts on model selection and overfitting! The epicycle view of the solar system is one that is able to explain virtually any observational data. (There’s a pretty cool reason for this that has to do with the properties of Fourier series, but I won’t go into it.) The cost of this is a massive model with many parameters. The heliocentric model of the solar system, coupled with the Newtonian theory of gravity, turns out to be able to match all the same data with far fewer adjustable parameters. So by all of the model selection criteria we went over, it makes sense to switch over from one to the other.

Of course, it is not the case that we should have been able to tell a priori that an epicycle model of the planets’ motions was a bad idea. “Every planet orbits Earth on at most one epicycle”, for instance, is a perfectly reasonable scientific hypothesis… it just so happened that it didn’t fit the data. And adding epicycles to improve the fit to data is also not bad scientific practice, so long as you aren’t ignoring other equally good models with fewer parameters.)

Okay, enough blabbing. On to the pretty pictures! I was fascinated by the Hilbert curve drawn above, so I decided to write up a program of my own that generates custom fractal images from epicycles. Here are some gifs I created for your enjoyment:

Negative doubling of angular velocity

(Each arm rotates in the opposite direction of the previous arm, and at twice its angular velocity. The length of each arm is half that of the previous.)


Trebling of angular velocity


Negative trebling


Here’s a still frame of the final product for N = 20 epicycles:

Screen Shot 2018-11-27 at 7.23.55 AM



ωn ~ (n+1) 2n

(or, the Fractal Frog)


ωn ~ n, rn ~ 1/n

radius ~ 1:n, frequency ~ n.gif

ωn ~ n, constant rn


ωn ~ 2n, rn ~ 1/n2


And here’s a still frame of N = 20:

high res pincers

(All animations were built using Processing.py, which I highly recommend for quick and easy construction of visualizations.)

The Match Problem


This puzzle has been popping up on my Facebook feed recently. Try to see how high you can get before reading on!






Now, would you be surprised if I told you that with a little interpretive freedom and creativity, you can get numbers larger than the number of atoms in the observable universe? How about if I told you that you can get to numbers large enough that they break set theory? Let me demonstrate for you.

First, we’ll start with the boring solutions.

You can move the bottom match in the 5 up to make it a 9. Then you can rotate the bottom left vertical match in the 0 to make it a nine. This gives 998.


Can we do better? Sure! Take the top and bottom matches in the central zero and move them to squeeze in another one, giving you 51,118.


So far we’ve been working purely in base 10. Let’s try to do better by moving to a more exotic number system.


Hexadecimal is a base-16 number system, the digits of which are written as 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F. For instance, we can construct F88:


F88 is only 3976, which is not better than our best so far (51,118). If we’re allowed to interpret a square of matches as a D instead of a zero, we can do slightly better (though to do so we have to be allowed to throw out a toothpick, or place it directly on top of another one):


FDB is 4059, which is still not better than our current best. To get better, we need to get more clever.


Our first strategy is just to shift two matches in the first digit down so as to make it into a 9:


We can interpret this as 98 = 43,046,721. This is our best so far! But we can do better by applying Knuth’s up-arrow notation.


This is 518, which is almost 3.8 trillion, 100 thousand times better than 98! But we’re not done yet! If we use a caret (^) to represent exponentiation, we can get even higher!


5118 is 5.4 nonillion, a number with 31 decimal digits long. We could try to do better than this by squeezing in a caret between the 5 and the 1, making 5118 (a number with 83 decimal digits) but this becomes pretty messy and gross.

Alright, so we got up to 83 digit long numbers. Can we get any better? Yep!


Tetration is the level above exponentiation. The nth tetration of a is defined as follows:

Just as multiplication is repeated addition and exponentiation is repeated multiplication, tetration is repeated exponentiation! Now we get into numbers whose size is truly impossible to grasp. Let’s shift the top and bottom matches on the middle 0:


We can write this equivalently as: Screen Shot 2018-11-12 at 11.06.29 PM.png

How big is this number? There’s really no meaningful way to describe it. We’ve gone far beyond any quantities that can be made physical sense of, like the number of cubic nanometers in the universe, which is a measly 10107. But we’re not done yet!

Busy Beavers!

The busy beaver numbers are a sequence of numbers that arise from the properties of Turing machines. Here’s a brief description: The nth busy beaver number B(N) is the greatest number of ones that a finitely-running N state Turing machine can print on a tape which starts all zero.

Did you think tetration is big? Well, busy beaver numbers are unimaginably larger. In fact, the busy beaver sequence grows larger than any computable function! There’s a neat proof of this that involves the uncomputability of the halting problem. To skim over it, it can be shown that were we to have an upper bound on the busy beaver sequence, then we could find a deterministic algorithm for solving the halting problem in a finite amount of time, which we know is impossible. And if any computable function F(N) grew faster than B(N), then we could find an upper bound on the busy beaver sequence. Thus, it must be the case that no computable function grows as fast as B(N)!

We can exploit the absurd growth rate of the busy beaver sequence if we are allowed the interpretative freedom of assuming parentheses, so that Bn = B(n).


Let’s think for a minute about how large B(118) must be. So far, the only values of n for which B(n) is known are B(1), B(2), B(3), and B(4). After this the values grow out of control. A lower bound on B(6) is Screen Shot 2018-11-12 at 11.07.48 PM. For B(7), we have as a lower bound Screen Shot 2018-11-12 at 11.09.22 PM.png. B(8) almost certainly beats our current record. And B(118) is unthinkably large.

We can get even higher than the busy beaver numbers with the maximum shifts function S(N), defined as the number of steps that the longest-finitely-running N state Turing machine takes before halting. This function is guaranteed to be larger than B(N) for all N. Using S(N), and taking the same liberties as above with respect to parentheses, we can get an insanely high value:


This is S(1118), and while it’s undoubtedly larger than B(118), there’s no way to get any intuitive grasp on how much larger. But wait, there’s more! We can get even larger by recursively nesting a busy beaver function within a maximum shifts function:


We interpret this as S(B(9)). Why is this larger than S(1118)? Well, B(9) is some enormous number, certainly larger than 1118, so S(B(9)) is certainly greater than S(1118).

Now, are we finally done? Have we reached the peak yet? No! It’s time for the largest solution of them all.

The reason that the Busy Beaver numbers and Maximum Shift function are so big is because of the uncomputability of the halting problem. But if we consider Turing machines that have an oracle for the halting problem (call these meta Turing machines), we get a new meta-halting problem: when do these meta Turing machines halt? From the meta-halting problem comes an associated new sequence of Busy Beaver numbers, which grows uncomputable faster than the original Busy Beaver sequence. Then we can equip Turing machines with an oracle for the meta-halting problem, generating a meta-meta-Busy Beaver sequence.

Thus we get a hierarchy of Busy Beaver functions, which, following the notation used by Scott Aaronson here, can be described with Bn(x). Each Bn grows uncomputably faster than the previous Bn-1. There’s a similar hierarchy for the maximum shifts function, and each S_n is going to be an upper bound on each Sn-1.

So we can exploit this hierarchy to create an unimaginably large number (whose actual value is almost certainly independent of the axioms of set theory): Move around the top and bottom matches on the 0 to give S a subscript of 11. Then we get the 11th-up-in-the-hierarchy maximum shifts function S11 applied to 118: S11(118).


It’s a little gross-looking, but I think it works! I challenge anybody to try to come up with a better solution. 🙂

Gödel’s Second Incompleteness Theorem: Explained in Words of Only One Syllable

Somebody recently referred me to a 1994 paper by George Boolos in which he writes out a description of Gödel’s Second Incompleteness Theorem, using only words of one syllable. I love it so much that I’m going to copy the whole thing here in this post. Enjoy!

First of all, when I say “proved”, what I will mean is “proved with the aid of the whole of math”. Now then: two plus two is four, as you well know. And, of course, it can be proved that two plus two is four (proved, that is, with the aid of the whole of math, as I said, though in the case of two plus two, of course we do not need the whole of math to prove that it is four). And, as may not be quite so clear, it can be proved that it can be proved that two plus two is four, as well. And it can be proved that it can be proved that it can be proved that two plus two is four. And so on. In fact, if a claim can be proved, then it can be proved that the claim can be proved. And that too can be proved.

Now, two plus two is not five. And it can be proved that two plus two is not five. And it can be proved that it can be proved that two plus two is not five, and so on.

Thus: it can be proved that two plus two is not five. Can it be proved as well that two plus two is five? It would be a real blow to math, to say the least, if it could. If it could be proved that two plus two is five, then it could be proved that five is not five, and then there would be no claim that could not be proved, and math would be a lot of bunk.

So, we now want to ask, can it be proved that it can’t be proved that two plus two is five? Here’s the shock: no, it can’t. Or, to hedge a bit: if it can be proved that it can’t be proved that two plus two is five, then it can be proved as well that two plus two is five, and math is a lot of bunk. In fact, if math is not a lot of bunk, then no claim of the form “claim X can’t be proved” can be proved.

So, if math is not a lot of bunk, then, though it can’t be proved that two plus two is five, it can’t be proved that it can’t be proved that two plus two is five.

By the way, in case you’d like to know: yes, it can be proved that if it can be proved that it can’t be proved that two plus two is five, then it can be proved that two plus two is five.

George Boolos, Mind, Vol. 103, January 1994, pp. 1 – 3