# Sleeping Beauty Problem

I’ve been talking a lot about anthropic reasoning, so it’s only fair that I present what’s probably the most well-known thought experiment in this area: the Sleeping Beauty problem. Here’s a description of the problem from Wikipedia:

Sleeping Beauty volunteers to undergo the following experiment and is told all of the following details: On Sunday she will be put to sleep. Once or twice, during the experiment, Beauty will be awakened, interviewed, and put back to sleep with an amnesia-inducing drug that makes her forget that awakening. A fair coin will be tossed to determine which experimental procedure to undertake:

• If the coin comes up heads, Beauty will be awakened and interviewed on Monday only.
• If the coin comes up tails, she will be awakened and interviewed on Monday and Tuesday.

In either case, she will be awakened on Wednesday without interview and the experiment ends.

Any time Sleeping Beauty is awakened and interviewed she will not be able to tell which day it is or whether she has been awakened before. During the interview Beauty is asked: “What is your credence now for the proposition that the coin landed heads?”

There are two popular positions: the thirder position and the halfer position.

Thirders say: “Sleeping Beauty knows that she is in one of three situations: {Monday & Heads}, {Monday & Tails}, or {Tuesday & Tails}. All three of these situations are equally compatible with her experience (she can’t distinguish between them from the inside), so she should be indifferent about which one she is in. Thus there is a 1/3 chance of each, implying that there is a 1/3 chance of Heads and a 2/3 chance of Tails.”

Halfers say: “The coin is fair, so there is a 1/2 chance of Heads and Tails. When Sleeping Beauty wakes up, she gets no information that she didn’t have before (she would be woken up in either scenario). Since she has no new information, there is no reason to update her credences. So there is still a 1/2 chance of Heads and a 1/2 chance of Tails.”
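
The two positions correspond to two different ways of counting over repeated runs of the experiment. The sketch below is only an illustration: the function name, the seed, and the number of runs are arbitrary choices, not part of the problem statement. It counts how often the coin is Heads per experiment and per awakening.

```python
import random

def simulate_sleeping_beauty(n_experiments, seed=0):
    """Repeat the experiment and count Heads in two different ways."""
    rng = random.Random(seed)
    heads_experiments = 0   # experiments in which the coin landed Heads
    heads_awakenings = 0    # awakenings that occur in a Heads experiment
    total_awakenings = 0
    for _ in range(n_experiments):
        heads = rng.random() < 0.5
        # Heads: awakened on Monday only. Tails: awakened on Monday and Tuesday.
        awakenings = 1 if heads else 2
        total_awakenings += awakenings
        if heads:
            heads_experiments += 1
            heads_awakenings += awakenings
    per_experiment = heads_experiments / n_experiments
    per_awakening = heads_awakenings / total_awakenings
    return per_experiment, per_awakening

per_experiment, per_awakening = simulate_sleeping_beauty(100_000)
print(f"Fraction of experiments with Heads: {per_experiment:.3f}")
print(f"Fraction of awakenings with Heads:  {per_awakening:.3f}")
```

The first number sits near 1/2 and the second near 1/3. The simulation does not settle which of the two frequencies Beauty's credence should track; it only shows where each number comes from.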

I think that the Halfers are right. The anthropic information she could update on is the fact W = “I have been awakened.” We want to see what happens when we update our prior odds with respect to W. Using Bayes’ rule we get…

$\frac {P(H)} {P(T)} = \frac {1/2} {1/2} = 1 \\~\\ \frac {P(H | W)} {P(T | W)} = \frac {P(W | H)} {P(W | T)} \cdot \frac {P(H)} {P(T)} = \frac {1} {1} \cdot \frac {1/2} {1/2} = 1 \ \ \\~\\ So \ P(H | W) = \frac{1}{2}, \ P(T | W) = \frac{1}{2}$

The important feature of this calculation is that the likelihood ratio is 1. This is because both the theory that the coin landed Heads, and the theory that the coin landed Tails, predict with 100% confidence that Sleeping Beauty will be woken up. The fact that Sleeping Beauty is woken up twice if the coin comes up Tails and only once if the coin comes up Heads is, apparently, irrelevant to Bayes’ theorem.
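
The odds-form update above can be checked with a few lines of exact arithmetic. This is a minimal sketch; the helper names `posterior_odds` and `odds_to_probability` are made up for illustration.

```python
from fractions import Fraction

def posterior_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes' rule: posterior odds = likelihood ratio * prior odds."""
    return likelihood_ratio * prior_odds

def odds_to_probability(odds):
    """Convert odds of H against T into the probability of H."""
    return odds / (1 + odds)

prior = Fraction(1, 2) / Fraction(1, 2)        # P(H) / P(T)
likelihood_ratio = Fraction(1) / Fraction(1)   # P(W | H) / P(W | T), both equal to 1
p_heads_given_w = odds_to_probability(posterior_odds(prior, likelihood_ratio))
print(p_heads_given_w)
```

Because the likelihood ratio is exactly 1, the posterior equals the prior whatever the prior was. The Halfer answer therefore rests entirely on treating W as the evidence.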

However, Thirders also have a very strong response up their sleeves: “Let’s imagine that every time Sleeping Beauty is right, she gets $1. Now, suppose that Sleeping Beauty always says that the coin landed Tails. Now if she is right, she gets $2… one dollar for each day that she is woken up. What if she always says that the coin landed Heads? Then if she is right, she only gets $1. In other words, if the setup is rerun a large number of times, the Sleeping Beauty that always says Tails gets twice as much money as the Sleeping Beauty that says Heads. If Sleeping Beauty is indifferent between Heads and Tails, as you Halfers suggest, then she would not have any preference about which one to say. But she would be wrong! She is better off by thinking Tails is more likely… in particular, she should think that Tails is two times more likely than Heads!”

This is a response along the lines of “rationality should not function as a handicap.” I am generally very fond of these arguments, but am uncomfortable with what this one implies here. If the above reasoning is correct, then Bayes’ theorem tells us to take a position that leaves us worse off. And if this is true, then it seems we’ve found a flaw in using Bayes’ theorem as a guide to rational belief-formation!

But maybe this is too hasty. Is it really true that an expected value calculation using 1/2 probabilities will result in being indifferent between saying that the coin landed Heads and saying that it landed Tails?

Plausibly not. If the coin lands Tails, then you have twice as many opportunities to make money. In addition, since your qualitative experience is identical on both of these opportunities, you should expect that whatever decision process you perform on Monday will be identical to the decision process on Tuesday.

Thus if Sleeping Beauty is a timeless decision theorist, she will see her decision on both days as a single decision. What will she calculate?

Expected value of saying Tails = 50% chance of Tails $\cdot$ $2 gain for saying Tails on both days + 50% chance of Heads $\cdot$ $0 = $1

Expected value of saying Heads = 50% chance of Tails $\cdot$ $0 + 50% chance of Heads $\cdot$ $1 gain for saying Heads on Monday = $0.50

So the expected value of saying Tails is still higher even if you think that the probabilities of Heads and Tails are equal, provided that you know about subjunctive dependence and timeless decision theory!

# Infinities in the anthropic dice killer thought experiment

It’s time for yet another post about the anthropic dice killer thought experiment. 😛

In this post, I’ll point out some features of the thought experiment that have gone unmentioned in this blog thus far. Perhaps it is in these features that we can figure out how to think about this the right way.

First of all, there are lots of hidden infinities in the thought experiment. And as we’ve seen before, where there are infinities, things start getting really wacky. Perhaps some of the strangeness of the puzzle can be chalked up to these infinities.

For instance, we stipulated that the population from which people are being kidnapped is infinite. This was to allow for the game to go on for arbitrarily many rounds, but it leads to some trickiness. As we saw in the last post, it becomes important to calculate the probability of a particular individual being kidnapped if randomly drawn from the population. But… what is this probability if the population is infinite?

The probability of selecting a particular person from an infinite population is just like the probability of picking 5 if randomly selecting from all natural numbers: zero!

Things get a little conceptually tricky here. Imagine that you’re randomly selecting a real number between 0 and 1. The probability that you select any particular number is zero. But at the same time, you will end up selecting some number. Whichever number you end up selecting is a number that you would have said had a 0% chance of being selected!

For situations like these, the term “almost never” is used.
Rather than saying that any particular number is impossible, you say that it will “almost never” be picked. While this linguistic trick might make you feel less uneasy about the situation, there still seems to be some remaining confusion to be resolved here.

So in the case of our thought experiment, no matter how many rounds the game ends up going on, you have a 0% chance of being kidnapped. At the same time, by stipulation you have been kidnapped. Making sense of this is only the first puzzle. The second one is to figure out if it makes sense to talk about some theories making it more likely than others that you’ll be kidnapped (is $\frac{11}{\infty}$ smaller than $\frac{111}{\infty}$?).

An even more troubling infinity is in the expected number of people that are kidnapped. No matter how many rounds end up being played, there are always only a finite number of people that are ever kidnapped. But let’s calculate the expected number of people that play the game.

$Number \ by \ n^{th} \ round = \frac{10^n - 1}{9} \\~\\ \sum\limits_{n=1}^{\infty} { \frac{35^{n-1}}{36^n} \cdot \frac{10^n - 1}{9} }$

But wait, this sum diverges! To see this, let’s just for a moment consider the expected number of people in the last round:

$Number \ in \ n^{th} \ round = 10^{n-1} \\~\\ \sum\limits_{n=1}^{\infty} { \frac{35^{n-1}}{36^n} \cdot 10^{n-1} } = \frac{1}{36} \sum\limits_{n=1}^{\infty} { (\frac{350}{36})^{n-1} }$

Since $\frac{350}{36} > 1$, this sum diverges. So on average there are an infinite number of people in the last round (even though the last round always contains a finite number of people). Correspondingly, the expected number of people kidnapped is infinite.

Why might these infinities matter? Well, one reason is that there is a well known problem with playing betting games against sources with infinite resources.

Consider the Martingale betting system: A gambler makes a bet of $1 at some odds. If they win, then good for them! Otherwise, if they lose, they bet $2 on the same odds. If they lose this time, they double down again, betting $4. And so on until eventually they win. The outcome of this is that by the time they win, they have lost $\$(1 + 2 + 4 + ... + 2^n)$ and gained $\$2^{n+1}$. This is a net gain of $1. In other words, no matter what the odds they are betting on, this betting system guarantees a gain of $1 with probability 100%.

However, this guaranteed $1 only applies if the gambler can continue doubling down arbitrarily long. If they have a finite amount of money, then at some point they can no longer double down, and they suffer an enormous loss. A gambler with finite resources stands a very good chance of gaining $1 and a very tiny chance of losing massively. If you calculate the expected gain, it turns out to be no better than what you expect from any ordinary betting system.
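
The finite-bankroll claim can be made concrete with an exact calculation. The sketch below assumes an even-money payout, a first stake of $1, and a bankroll that covers at most `max_bets` doublings; those assumptions, and the names `p_win` and `max_bets`, are mine rather than part of the description above.

```python
from fractions import Fraction

def martingale_expected_gain(p_win, max_bets):
    """Exact expected gain of the doubling strategy with a capped number of bets.

    The gambler either wins once (net +1) or loses all max_bets bets,
    which costs 1 + 2 + ... + 2**(max_bets - 1) = 2**max_bets - 1.
    """
    q = 1 - p_win
    p_bust = q ** max_bets             # probability of losing every bet
    loss_if_bust = 2 ** max_bets - 1   # total stake lost in that case
    return (1 - p_bust) * 1 - p_bust * loss_if_bust

fair_game = martingale_expected_gain(Fraction(1, 2), 10)
house_edge = martingale_expected_gain(Fraction(18, 38), 10)  # illustrative win probability below 1/2
print(fair_game, float(house_edge))
```

Algebraically the result is 1 - (2q)^k for k allowed bets. That is zero for a fair game and negative whenever the house has an edge, however large the cap is.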

Summing up: With finite resources, continually doubling down gives no advantage on average. But with infinite resources, continually doubling down gives a guaranteed profit. Hopefully you see the similarity to the dice killer thought experiment. With an infinite population to draw from, the killer can keep “doubling down” (actually “decupling” down) until they finally get their “payout”: killing all of their current captives. On the other hand, with a finite population, the killer eventually loses the ability to get a new group of 10x the population of the previous one and lets everybody free. In this case, exactly like the Martingale system, the odds for a kidnappee end up coming out to the prior odds of 1/36.

What this indicates is that at least some of the weirdness of the dice killer scenario can be chalked up to the exploitability of infinities by systems like the Martingale system. If you have been kidnapped by the dice killer, you should think that your odds are 90% only if you know you are drawn from an infinite population. Otherwise, your odds should come out to 1/36.

But now consider the following: If you are a casino owner, should you allow into your casino a person with infinite money? Clearly not! It doesn’t matter how much of a bias the games in the casino give in favor of the house. An infinitely wealthy person can always exploit this infinity to give themselves an advantage.

But what about allowing a person with infinite money into your casino to place a single bet? In this case, I think that the answer is yes, you should allow them. After all, with only a finite number of bets, the odds still come out in favor of the house. This is actually analogous to the original dice killer puzzle! You are only selected in one round, and know that you will not be selected at any other time. So perhaps the infinity does not save us here.

One final point. It looks like a lot of the weirdness of this thought experiment is the same type of weirdness as you get from infinitely wealthy people using the Martingale betting system. But now we can ask: Is it possible to construct a variant of the dice killer thought experiment in which the anthropic calculation differs from the non-anthropic calculation, AND the expected number of people kidnapped is finite? It doesn’t seem obvious to me that this is impossible. Since the expected number of captives takes the form of an infinite sum with the number of people by the Nth round multiplied by roughly $(\frac{35}{36})^N$, all that is required is that the number of people by the Nth round grow strictly more slowly than $(\frac{36}{35})^N$. Then the anthropic calculation should give a different answer from the non-anthropic calculation, and we can place the chance of escape in between these two. Now we have a finite expected number of captives, but a reversal in decision depending on whether you update on anthropic evidence or not. Perhaps I’ll explore this more in future posts.
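
The growth condition can be explored numerically. The sketch below computes partial sums of the expected number of captives when each group is `growth` times the size of the previous one. It allows non-integer group sizes purely for illustration, and the function name and parameters are hypothetical.

```python
def expected_total_captives(growth, max_rounds):
    """Partial sum of sum_n P(game ends on round n) * (people kidnapped by round n)."""
    kidnapped_so_far = 0.0
    group_size = 1.0
    expectation = 0.0
    for n in range(1, max_rounds + 1):
        kidnapped_so_far += group_size
        p_end = (35 / 36) ** (n - 1) / 36   # no snake eyes n-1 times, then snake eyes
        expectation += p_end * kidnapped_so_far
        group_size *= growth
    return expectation

tenfold_20 = expected_total_captives(10, 20)
tenfold_40 = expected_total_captives(10, 40)
slow_growth = expected_total_captives(1.01, 5000)
print(tenfold_20, tenfold_40, slow_growth)
```

With tenfold growth the partial sums keep exploding as more rounds are included. With 1% growth per round, which is below the 36/35 threshold, they settle at a finite value, 36/0.65 or about 55.4.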

# Not a solution to the anthropic dice killer puzzle

I recently came up with what I thought was a solution to the dice killer puzzle. It turns out that I was wrong, but in the process of figuring this out I discovered a few subtleties in the puzzle that I had missed the first time around.

First I’ll repost the puzzle here:

One piece of information that you have is that you are aware of the maniacal schemes of your captor. His plans began by capturing one random person. He then rolled a pair of dice to determine their fate. If the dice landed snake eyes (both 1), then the captive would be killed. If not, then they would be let free.

But if they are let free, the killer will search for new victims, and this time bring back ten new people and lock them alone in rooms. He will then determine their fate just as before, with a pair of dice. Snake eyes means they die, otherwise they will be let free and he will search for new victims.

His murder spree will continue until the first time he rolls snake eyes. Then he will kill the group that he currently has imprisoned and retire from the serial-killer life.

Now. You become aware of a risky way out of the room you are locked in and to freedom. The chances of surviving this escape route are only 50%. Your choices are thus either (1) to traverse the escape route with a 50% chance of survival or (2) to just wait for the killer to roll his dice, and hope that it doesn’t land snake eyes.

What should you do?

As you’ll recall, there are two possible estimates of the probability of the dice landing snake eyes: 1/36 and 90%. Briefly, the arguments for each are…

Argument 1: The probability of the dice landing snake eyes is 1/36. If the dice land snake eyes, you die. So the probability that you die is 1/36.

Argument 2: The probability that you are in the last round is above 90%. Everybody in the last round dies. So the probability that you die is above 90%.

The puzzle is trying to explain what is wrong with the second argument, given its unintuitive consequences. So, here’s an attempt at a resolution!

Imagine that you find out that you’re in the fourth round with 999 other people. The probability that you’re interested in is the probability that the fourth round is the last round (which is equivalent to the fourth round being the round in which you get snake-eyes and thus die). To calculate this, we want to consider all possible worlds (i.e. all possible number of rounds that the game might go for) and calculate the probability weight for each.

In other words, we want to be able to calculate P(Game ends on the Nth round) for every N. We can calculate this a priori by just considering the conditions for the game ending on the Nth round. This happens if the dice roll something other than snake eyes N-1 times and then snake eyes once, on the final round. Thus the probability should be:

$P(Game \ ends \ on \ the \ N^{th} \ round) = (\frac{35}{36})^{N-1} \cdot \frac{1}{36}$

Now, to calculate the probability that the game ends on the fourth round, we just plug in N = 4 and we’re done!
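
As a quick check of the prior described above (N-1 rolls without snake eyes, then one snake eyes), here is a small exact calculation. The function name is just for illustration.

```python
from fractions import Fraction

def p_game_ends_on(n):
    """Prior probability that the first snake eyes comes on roll n."""
    return Fraction(35, 36) ** (n - 1) * Fraction(1, 36)

p4 = p_game_ends_on(4)
p_first_four = sum(p_game_ends_on(n) for n in range(1, 5))
print(p4, float(p4))
print(p_first_four, float(p_first_four))
```

The value for N = 4 is 35^3 / 36^4, a little over 2.5%. The probabilities for N = 1 to 4 add up to 1 - (35/36)^4, so most of the prior weight lies on longer games.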

But hold on. There’s an obvious problem with this approach. If you know that you’ve been kidnapped on the fourth round, then you should have zero credence that the game ended on the third, second, or first rounds. But the probability calculation above gives a non-zero credence to each of these scenarios! What’s gone wrong?

Answer: While the probability above is the right prior probability for the game progressing to the Nth round, what we actually want is the posterior probability, conditioned on the information that you have about your own kidnapping.

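In other words, we’re not interested in the prior probability P(Game ends on the Nth round). We’re interested in the conditional probability P(Game ends on the Nth round | I was kidnapped in the fourth round). To calculate this requires Bayes’ rule.

$P(N \ total \ rounds \ | \ You \ are \ in \ the \ fourth \ round) = \frac{P(You \ are \ in \ the \ fourth \ round \ | \ N \ total \ rounds) \cdot P(N \ total \ rounds)}{P(You \ are \ in \ the \ fourth \ round)}$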

The top term P(You are in the fourth round | N total rounds) is zero whenever N is less than four, which is a good sign. But what happens when N is ≥ 4? Does the probability grow with N or shrink?

Intuitively, we might think that if there are a very large number of rounds, then it is very unlikely that we are in the fourth one. Taking into account the 10x growth in number of people each round, it looks like for any N > 4, the theory that there are N rounds strongly predicts that you are in the Nth round. The larger N is, the more strongly it predicts that you are not in the fourth round. In other words, the update on your being in the fourth round strongly favors possible worlds in which the fourth round is the last one.

But this is not the whole story! There’s another update to be considered. Remember that in this setup, you exist as a member of a boundless population and are at some point kidnapped. We can ask the question: How likely is it that you would have been kidnapped if there were N rounds? Clearly, the more rounds there are before the game ends, the more people are kidnapped, and so the higher chance you have of being kidnapped in the first place! This means that we should expect it to be very likely that the fourth round is not the last round, because worlds in which the fourth round is not the last one contain many more people, thus making it more likely that you would have been kidnapped at all.

In other words, we can break our update into two components: (1) that you were kidnapped, and (2) that it was in the fourth round that you were kidnapped. The first of these updates strongly favors theories in which you are not in the last round. The second strongly favors theories in which you are in the last round. Perhaps, if we’re lucky, these two updates cancel out, leaving us with only the prior probability based on the objective chance of the dice rolling snake eyes (1/36)!

Recapping: If we know which round we are in, then when we update on this information, the probability that this round is the last one is just equal to the objective chance that the dice roll lands snake eyes (1/36). Since this should be true no matter what particular round we happen to be in, we should be able to preemptively update on being in the Nth round (for some N) and bring our credence to 1/36.
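
The hoped-for cancellation can be written out explicitly. What follows is only a sketch under an assumption that is not part of the puzzle as stated: the population has some very large but finite size $M$ (a hypothetical symbol), and we ignore the possibility of the killer running out of people.

$P(kidnapped \ | \ N \ rounds) = \frac{(10^N - 1)/9}{M}$

$P(in \ round \ 4 \ | \ kidnapped, \ N \ rounds) = \frac{10^3}{(10^N - 1)/9} \ \ for \ N \geq 4$

The product of these two factors is $\frac{10^3}{M}$ for every $N \geq 4$ (and zero for $N < 4$), so it does not depend on N. The posterior is then just the prior restricted to $N \geq 4$ and renormalized:

$P(N \ rounds \ | \ kidnapped \ in \ round \ 4) = \frac{(\frac{35}{36})^{N-1} \cdot \frac{1}{36}}{\sum\limits_{k=4}^{\infty} { (\frac{35}{36})^{k-1} \cdot \frac{1}{36} }} = (\frac{35}{36})^{N-4} \cdot \frac{1}{36}$

Setting N = 4 gives $\frac{1}{36}$. So under this assumption, once the round is known, the chance that it is the last one is just the chance of snake eyes.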

This is the line of thought that I had a couple of days ago, which I thought pointed the way to a solution to the anthropic dice killer puzzle. But unfortunately… I was wrong. It turns out that even when we consider both of these updates, we still end up with a probability > 90% of being in the last round.

Here’s an intuitive way to think about why this is the case.

In the solution I wrote up in my initial post on the anthropic dice killer thought experiment, I gave the following calculation:

$\sum\limits_{N=1}^{\infty} { \frac{35^{N-1}}{36^N} \cdot \frac{10^{N-1}}{(10^N - 1)/9} }$

Basically, we look at the fraction of people that die if the game ends on the Nth round, calculate the probability of the game ending on the Nth round, and then average the fraction over all possible N. This gives us the average fraction of people that die in the last round.

We now know that this calculation was wrong. The place where I went wrong was in calculating the chance of getting snake eyes in the nth round. The probability I wrote was the prior probability, where what we want instead is the posterior probability after performing an anthropic update on the fact of your own kidnapping.

So maybe if we plug in the correct values for these probabilities, we’ll end up getting a saner answer!

Unfortunately, no. The fraction of people that die starts at 100% and then gradually decreases, converging to 90% as the number of rounds goes to infinity (the limit of $\frac{1000...}{1111...}$ is .9). This means that no matter what probabilities we plug in there, the average fraction of people that die will be greater than 90%. (If the possible values of a quantity are all greater than 90%, then the average value of this quantity cannot possibly be less than 90%.)
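
The bound can be checked directly. The sketch below computes the exact fraction of kidnapped people who are in the last round, then averages it under two different sets of weights. The helper names are hypothetical, and truncating the prior at 60 rounds is an arbitrary choice to keep the sum finite.

```python
from fractions import Fraction

def fraction_in_last_round(n):
    """Fraction of all kidnapped people who are in round n, if the game ends on round n."""
    last_group = 10 ** (n - 1)
    total_kidnapped = (10 ** n - 1) // 9   # 1 + 10 + ... + 10**(n-1)
    return Fraction(last_group, total_kidnapped)

def average_fraction(weights):
    """Weighted average of the last-round fraction; weights[0] is the weight on N = 1."""
    total_weight = sum(weights)
    weighted = sum(w * fraction_in_last_round(n) for n, w in enumerate(weights, start=1))
    return weighted / total_weight

fractions_by_round = [fraction_in_last_round(n) for n in range(1, 8)]
prior_weights = [Fraction(35, 36) ** (n - 1) * Fraction(1, 36) for n in range(1, 61)]
avg_under_prior = average_fraction(prior_weights)
avg_all_weight_on_round_4 = average_fraction([0, 0, 0, 1])
print([float(f) for f in fractions_by_round])
print(float(avg_under_prior), float(avg_all_weight_on_round_4))
```

Every individual fraction exceeds 9/10, so any weighting over N, whether prior, posterior, or arbitrary, gives an average above 9/10.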

This means that without even calculating the precise posterior probabilities, we can confidently say that the average probability of death must be greater than 90%. And therefore our proposed solution fails, and the mystery remains.

It’s worth noting that even if our calculation had come out with the conclusion that 1/36 was the actual average chance of death, we would still have a little explaining to do. Namely, it actually is the case that the average person does better by trying to escape (i.e. acting as if the probability of their death is greater than 90%) than by staying around (i.e. acting as if the probability of their death is 1/36).

This is something that we can say with really high confidence: accepting the apparent anthropic calculation of 90% leaves you better off on average than rejecting it. On its own, this is a very powerful argument for accepting 90% as the answer. The rational course of action should not be one that causes us to lose where winning is an option.

# Pushing anti-anthropic intuitions

A stranger comes up to you and offers to play the following game with you: “I will roll a pair of dice. If they land snake eyes (i.e. they both land 1), you give me one dollar. Otherwise, if they land anything else, I give you a dollar.”

Do you play this game?

Here’s an intuitive response: Yes, of course you should! You have a 35/36 chance of gaining $1, and only a 1/36 chance of losing $1. You’d have to be quite risk averse to refuse those odds.

What if the stranger tells you that they are giving this same bet to many other people? Should that change your calculation?

Intuitively: No, of course not! It doesn’t matter what else the stranger is doing with other people.

What if they tell you that they’ve given this offer to people in the past, and might give the offer to others in the future? Should that change anything?

Once again, it seems intuitively not to matter. The offers given to others simply have nothing to do with you. What matters are your possible outcomes and the probabilities of each of these outcomes. And what other people are doing has nothing to do with either of these.

… Right?

Now imagine that the stranger is playing the game in the following way: First they find one person and offer to play the game with them. If the dice land snake eyes, then they collect a dollar and stop playing the game. Otherwise, they find ten new people and offer to play the game with them. Same as before: snake eyes, the stranger collects $1 from each and stops playing, otherwise they move on to 100 new people. Et cetera forever.

We now ask the question: How does the average person given the offer do if they take the offer?

Well, no matter how many rounds of offers the stranger gives, at least 90% of people end up in the last round. That means that at least 90% of people end up handing over $1 and at most 10% gain $1. This is clearly net negative for those that take the offer!

Think about it this way: Imagine a population of individuals who all take the offer, and compare them to a population that all reject the offer. Which population does better on average?

For the population that takes the offer, the average person loses money. An upper bound on their average payoff is 10% ($1) + 90% (-$1) = -$.80. For the population that rejects the offer, nobody gains or loses money: the average case is exactly $0.

$0 is better than -$.80, so the strategy of rejecting the offer is better, on average!

This thought experiment is very closely related to the dice killer thought experiment. I think of it as a variant that pushes our anti-anthropic-reasoning intuitions. It just seems really wrong to me that if somebody comes up to you and offers you this deal that has a 35/36 chance of paying out you should reject it. The details of who else is being offered the deal seem totally irrelevant.

But of course, all of the previous arguments I’ve made for anthropic reasoning apply here as well. And it is just true that the average person that rejects the offer does better than the average person that accepts it.
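
The staged game described above is easy to simulate. The sketch below assumes that everyone who is offered the bet accepts it. It plays the stranger's game many times and records the average payoff per player within each game. The function names, the seed, and the number of games are illustrative choices.

```python
import random
from fractions import Fraction

def play_one_game(rng):
    """Play until snake eyes; return (players who won $1, players who lost $1)."""
    winners = 0
    group = 1
    while True:
        die_1, die_2 = rng.randint(1, 6), rng.randint(1, 6)
        if die_1 == 1 and die_2 == 1:
            return winners, group   # the whole current group pays up
        winners += group            # the whole current group collects $1 each
        group *= 10

def per_game_average_payoffs(n_games, seed=0):
    rng = random.Random(seed)
    payoffs = []
    for _ in range(n_games):
        winners, losers = play_one_game(rng)
        payoffs.append(Fraction(winners - losers, winners + losers))
    return payoffs

payoffs = per_game_average_payoffs(1000)
print(float(max(payoffs)), float(min(payoffs)))
```

In every simulated game the average payoff per player is below -$0.80, even though each individual roll favors the players 35 to 1.
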

Perhaps this is just another bullet that we have to bite in our attempt to formalize rationality!

# A closer look at anthropic tests for consciousness

(This post is the culmination of my last week of posts on anthropics and conservation of expected evidence.)

In an earlier post, I described how anthropic reasoning can apparently give you a way to update on theories of consciousness. This is already weird enough, but I want to make things a little weirder. I want to present an argument that in fact anthropic reasoning implies that we should be functionalists about consciousness.

But first, a brief recap (for more details see that earlier post):

Thus…

Whenever this experiment is run, roughly 90% of experimental subjects observe snake eyes, and roughly 10% observe not snake eyes. What this means is that 90% of the people update in favor of functionalism (by a factor of 9), and only 10% of people update in favor of substrate dependence theory (also by a factor of 9).

Now suppose that we have a large population that starts out completely agnostic on the question of functionalism vs. substrate dependence. That is, the prior ratio for each individual is 1:

$\frac{P(functionalism)}{P(substrate \ dependence)} = 1$

Now imagine that we run arbitrarily many dice-killer experimental setups on the population. We would see an upwards drift in the average beliefs of the population towards functionalism. And in the limit of infinite experiments, we would see complete convergence towards functionalism as the correct theory of consciousness.

Now, the only remaining ingredient is what I’ve been going on about the past two days: if you can predict beforehand that a piece of evidence is going to make you on average more functionalist, then you should preemptively update in favor of functionalism.

What we end up with is the conclusion that considering the counterfactual infinity of experimental results we could receive, we should conclude with arbitrarily high confidence that functionalism is correct.
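
The size of the drift can be estimated from the figures quoted above: roughly 90% of subjects update towards functionalism by a factor of 9, and roughly 10% update away from it by the same factor. The sketch below works in log-odds, where successive updates add. Its function and parameter names are hypothetical.

```python
import math

def expected_log_odds_shift(fraction_snake_eyes=0.9, bayes_factor=9.0):
    """Average change in log-odds (functionalism vs. substrate dependence) per experiment."""
    towards_functionalism = fraction_snake_eyes * math.log(bayes_factor)
    towards_substrate = (1 - fraction_snake_eyes) * math.log(bayes_factor)
    return towards_functionalism - towards_substrate

shift_per_experiment = expected_log_odds_shift()
average_log_odds_after_10 = 10 * shift_per_experiment   # starting from log-odds 0
print(shift_per_experiment, average_log_odds_after_10)
```

The population-average log-odds moves by 0.8 ln 9 per experiment, about 1.76, so it grows without bound as experiments are repeated.
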

To be clear, the argument is the following:

1. If we were to be members of a population that underwent arbitrarily many dice-killer trials, we would converge towards functionalism.
2. Conservation of expected evidence: if you can predict beforehand which direction some observation would move you, then you should preemptively adjust your beliefs in that direction.
3. Thus, we should preemptively converge towards functionalism.

Premise 1 follows from a basic application of anthropic reasoning. We could deny it, but doing so amounts to denying the self-sampling assumption and ensuring that you will lose in anthropic games.

Premise 2 follows from the axioms of probability theory. It is more or less the statement that you should update your beliefs with evidence, even if this evidence is counterfactual information about the possible results of future experiments.

(If this sounds unintuitive to you at all, consider the following thought experiment: We have two theories of cosmology, one in which 99% of people live in Region A and 1% in Region B, and the other in which 1% live in Region A and 99% in Region B. We now ask where we expect to find ourselves. If we expect to find ourselves in Region A, then we must have higher credence in the first theory than the second. And if we initially did not have this higher credence, then considering the counterfactual question “Where would I find myself if I were to look at which region I am in?” should cause us to update in favor of the first theory.)

Altogether, this argument looks really bulletproof to me. And yet its conclusion seems very wrong. Can we really conclude with arbitrarily high certainty that functionalism is correct by just going through this sort of armchair reasoning from possible experimental results that we will never do? Should we now be hardcore functionalists?

I’m not quite sure yet what the right way to think about this is. But here is one objection I’ve thought of.

We have only considered one possible version of the dice killer thought experiment (in which the experimenter starts off with 1 human, then chooses 1 human and 9 androids, then 1 human and 99 androids, and so on). In this version, observing snake eyes was evidence for functionalism over substrate dependence theory, which is what causes the population-wide drift towards functionalism.

We can ask, however, if we can construct a variant of the dice killer thought experiment in which snake eyes counts as evidence for substrate dependence theory over functionalism. If so, then we could construct an experimental setup that we can predict beforehand will end up with us converging with arbitrary certainty to substrate dependence theory!

Let’s see how this might be done. We’ll imagine the set of all variants on the thought experiment (that is, the set of all choices the dice killer could make about how many humans and androids to kidnap in each round).

For ease of notation, we’ll abbreviate functionalism and substrate dependence theory as F and S respectively. And we’ll also introduce a convenient notation for calculating the total number of humans and the total number of androids ever kidnapped by round N.

Now, we want to calculate the probability of snake eyes given functionalism in this general setup, and compare it to the probability of snake eyes given substrate dependence theory. The first step will be to consider the probability of snake eyes if the experiment happens to end on the nth round, for some n. This is just the number of individuals in the last round divided by the total number of kidnapped individuals.

Now, we calculate the average probability of snake eyes (the average fraction of individuals in the last round).

The question is thus if we can find a pair of sequences such that the first term is larger than the second.

It seems hard to imagine that there are no such pairs of sequences that satisfy this inequality, but thus far I haven’t been able to find an example. For now, I’ll leave it as an exercise for the reader!

If there are no such pairs of sequences, then it is tempting to take this as extremely strong evidence for functionalism. But I am concerned about this whole line of reasoning. What if there are a few such pairs of sequences? What if there are far more in which functionalism is favored than those in which substrate dependence is favored? What if there are an infinity of each?

While I buy each step of the argument, it seems wrong to say that the right thing to do is to consider the infinite set of all possible anthropic experiments you could do, and then somehow average over the results of each to determine the direction in which we should update our theories of consciousness. Indeed, I suspect that any such averaging procedure would be vulnerable to arbitrariness in the way that the experiments are framed, such that different framings give different results.

At this point, I’m pretty convinced that I’m making some fundamental mistake here, but I’m not sure exactly where this mistake is. Any help from readers would be greatly appreciated. 🙂

# Two principles of Bayesian reasoning

Bayes’ rule is a pretty simple piece of mathematics, and it’s extraordinary to me the amount of deep insight that can be plumbed by looking closely at it and considering its implications.

## Principle 1: The surprisingness of an observation is proportional to the amount of evidence it provides.

Evidence that you expect to observe is weak evidence, while evidence that is unexpected is strong evidence.

This follows directly from Bayes’ theorem:

$P(H | E) = \frac{P(E | H)}{P(E)} \cdot P(H)$

If E is very unexpected, then P(E) is very small. This puts an upwards pressure on the posterior probability, entailing a large belief update. If E is thoroughly unsurprising, then P(E) is near 1, which means that this upward pressure is not there.
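
A small numerical illustration may help here. The sketch below applies Bayes' theorem with the same prior and the same P(E | H), once when E is surprising overall and once when it is not. All the numbers and names are made up for illustration.

```python
from fractions import Fraction

def posterior(prior, p_e_given_h, p_e):
    """Bayes' theorem: P(H | E) = P(E | H) / P(E) * P(H)."""
    return p_e_given_h / p_e * prior

prior = Fraction(1, 10)
p_e_given_h = Fraction(9, 10)

# E is unexpected overall (P(E) = 0.1) but strongly predicted by H.
after_surprising_evidence = posterior(prior, p_e_given_h, Fraction(1, 10))

# E was expected anyway (P(E) = 0.9), so observing it tells us little.
after_unsurprising_evidence = posterior(prior, p_e_given_h, Fraction(9, 10))

print(after_surprising_evidence, after_unsurprising_evidence)
```

The surprising observation lifts the credence in H from 1/10 to 9/10, while the unsurprising one leaves it at 1/10.
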
A more precise way to say this is to talk about how surprising evidence is given a particular theory:

$$\frac{P(H \mid E)}{P(H)} = \frac{P(E \mid H)}{P(E)}$$

On the left is a term that (1) is large when E provides strong evidence for H, (2) is near zero when it provides strong evidence against H, and (3) is near 1 when it provides weak evidence regarding H.

On the right is a term that (1) is large if E is very unsurprising given H, (2) is near zero when E is very surprising given H, and (3) is near 1 when E is not made much more surprising or unsurprising by H.

What we get is that (1) E provides strong evidence for H when E is very unsurprising given H, (2) E provides strong evidence against H when it is very surprising given H, and (3) E provides weak evidence regarding H when it is not much more surprising or unsurprising given H.

This makes a lot of sense when you think through it. Theories that make strong and surprising predictions that turn out to be right are given stronger evidential weight than theories that make weak and unsurprising predictions.

## Principle 2: Conservation of expected evidence

I stole the name of this principle from Eliezer Yudkowsky, who wrote about this here.

The idea here is that for any expectation you have of receiving evidence for a belief, you should have an equal and opposite expectation of receiving evidence against that belief. It cannot be the case that all possible observations support a theory. If some observations support a theory, then there must be some other observations that undermine it. And the precise amount that these observations undermine this theory balances the expected evidential support of the theory.

Proof of this:

$$P(H) = P(H \mid E)\, P(E) + P(H \mid {-}E)\, P({-}E)$$

$$\big(P(H \mid E) - P(H)\big)\, P(E) + \big(P(H \mid {-}E) - P(H)\big)\, P({-}E) = 0$$

The first term is the expected change in credence in H after observing E, and the second is the expected change in credence in H after observing -E. Thus, the average expected change in credence is exactly zero.
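A small numerical check of this balance, as a sketch with invented numbers: E is strongly expected (so it is weak evidence), and -E is rare (so it is strong evidence). The helper name and probabilities are assumptions chosen for illustration.

```python
def expected_shift(prior_h, p_e_given_h, p_e_given_not_h):
    """Return P(E), the credence shifts after E and after -E, and their weighted sum."""
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    post_e = p_e_given_h * prior_h / p_e                  # P(H | E)
    post_not_e = (1 - p_e_given_h) * prior_h / (1 - p_e)  # P(H | -E)
    shift_e = post_e - prior_h
    shift_not_e = post_not_e - prior_h
    weighted_sum = p_e * shift_e + (1 - p_e) * shift_not_e
    return p_e, shift_e, shift_not_e, weighted_sum


# Hypothetical case: E is very likely under either hypothesis.
p_e, shift_e, shift_not_e, weighted_sum = expected_shift(0.5, 0.99, 0.9)

print(f"P(E)  = {p_e:.3f}, shift if E  = {shift_e:+.4f}")
print(f"P(-E) = {1 - p_e:.3f}, shift if -E = {shift_not_e:+.4f}")
print(f"probability-weighted sum of shifts = {weighted_sum:.2e}")
```

The likely observation nudges the credence up a little, the unlikely one knocks it down a lot, and the probability-weighted sum of the two shifts is zero up to floating-point error.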
Putting these together, we see that a strong expectation corresponds to weak evidence, and this strong expectation of weak evidence also corresponds to a weak expectation of strong evidence!

# Explaining anthropic reasoning

I realize that I’ve been a little unclear in my last few posts. I presupposed a degree of familiarity with anthropic reasoning that most people don’t have. I want to remedy that by providing a short explanation of what anthropic reasoning is and why it is useful.

First of all, one thing that may be confusing is that the term ‘anthropic reasoning’ is used in multiple very different ways. In particular, its most common usage is probably in arguments about the existence of God, where it is sometimes presented as an argument against the evidential force of fine tuning. I have no interest in this, so please don’t take me to be using the term this way. My usage is identical with that of Nick Bostrom, who wrote a fantastic book about anthropic reasoning. You’ll see precisely what this usage entails shortly, but I just want to plant a flag now in case I use the word ‘anthropic’ in a way that you are unfamiliar with.

Good! Now, let’s start with a few thought experiments.

1. Suppose that the universe consists of one enormous galaxy, divided into a central region and an outer region. The outer region is densely populated with intelligent life, containing many trillions of planetary civilizations at any given moment. The inner region is hostile to biology, and at any given time only has a few hundred planetary civilizations. It is impossible for life to develop beyond the outer region of the galaxy. Now, you are a member of a planetary civilization that knows all of this, but doesn’t know its location in the galaxy. You reason that it is:

   (a) As likely that you are in the central region as it is that you are in the outer region

   (b) More likely

   (c) Less likely

2. Suppose that the universe consists of one galaxy that goes through life phases. In its early phase, life is very rare and the galaxy is typically populated by only a few hundred planetary civilizations. In its middle phase, life is plentiful and the galaxy is typically populated by billions of planetary civilizations. And in its final phase, which lasts for the rest of the history of the universe, it is impossible for life to evolve. You are born into a planetary civilization that knows all of this, but doesn’t know what life phase the galaxy is in. You reason that it is:

   (a) As likely that you are in the early phase as the middle phase

   (b) More likely

   (c) Less likely

3. You are considering two competing theories of cosmology. In Cosmology X, 1% of life exists in Region A and 99% in Region B. In Cosmology Y, 99% of life is in Region A and 1% in Region B. You currently don’t know which region you are in, and have equal credence in Cosmology X and Cosmology Y. Now you perform an experiment that locates yourself in the universe. You find that you are in Region A. How should your beliefs change?

   (a) They should stay the same

   (b) Cosmology X becomes more likely than Cosmology Y

   (c) Cosmology Y becomes more likely than Cosmology X

If you answered (c) for all three, then congratulations, you’re already an expert anthropic reasoner!

What we want to do is explain why (c) was the right answer in all three cases, and see if we can unearth any common principles. You might think that this is unnecessary; after all, aren’t we just using a standard application of Bayes’ theorem? Sort of, but there’s a little more going on here. Consider, for instance, the following argument:

1. Most people have property X.
2. Therefore, I probably have property X.

Ignoring the base rate fallacy here, there is an implicit assumption involved in the jump from 1 to 2. This assumption can be phrased as follows:

I should reason about myself as if I am randomly sampled from the set of all people.
A similar principle turns out to be implicit in the reasoning behind our answers to the three starting questions. For question 1, it was something like: I should reason about myself as if I am randomly sampled from the set of all intelligent organisms in the universe at this moment. For 2, it might be: I should reason about myself as if I am randomly sampled from the set of all intelligent organisms in the history of the universe. And for 3, it is pretty much the same as 1: I should reason about myself as if I am randomly sampled from all intelligent organisms in the universe.

These various sampling assumptions really amount to the notion that we should reason about ourselves the same way we reason about anything else. If somebody hands us a marble from an urn that contains 99% black marbles (and we have no other information), we should think this marble has a 99% chance of being black. If we learn that 99% of individuals like us exist in Region A rather than Region B (and we have no other information), then we should think that we have a 99% chance of being in Region A.

In general, we can assert the Self-Sampling Assumption (SSA):

SSA: In the absence of more information, I should reason about myself as if I am randomly sampled from the set of all individuals like me.

The “individuals like me” is what gives this principle the versatility to handle all the various cases we’ve discussed so far. It’s slightly vague, but will do for now.

And now we have our first anthropic principle! We’ve seen how eminently reasonable this principle is in the way that it handles the cases we started with. But at the same time, accepting this basic principle pretty quickly leads to some unintuitive conclusions. For instance:

1. It’s probably not the case that there are other intelligent civilizations that have populations many times larger than ours (for instance, galactic societies).
2. It’s probably not the case that we exist in the first part of a long and glorious history of humanity in which we expand across space and populate the galaxy (this is called the Doomsday argument).
3. On average, you are probably pretty average in most ways. (Though there might be a selection effect to be considered in who ends up regularly reading this blog.)

These are pretty dramatic conclusions for a little bit of armchair reasoning! Can it really be that we can assert the extreme improbability of a glorious future and the greater likelihood of doomsday from simply observing our birth order in the history of humanity? Can we really draw these types of conclusions about the probable distributions of intelligent life in our universe from simply looking at facts about the size of our species?

It is tempting to just deny that this reasoning is valid. But to do so is to reject the simple and fairly obvious-seeming principle that justified our initial conclusions. Perhaps we can find some way to accept (c) as the answer for the three questions we started with while still denying the three conclusions I’ve just listed, but it’s not at all obvious how.

Just to drive the point a little further, let’s look at (2) – the Doomsday argument – again. The argument is essentially this:

Consider two theories of human history. In Theory 1, humans have a brief flash of exponential growth and planetary domination, but then go extinct not much later. In this view, we (you and me) are living in a fairly typical point in the history of humanity, existing near its last few years when its population is greatest.

In Theory 2, humans continue to expand and expand, spreading civilization across the solar system and eventually the galaxy. In this view, the future of humanity is immense and glorious, and involves many trillions of humans spread across hundreds or thousands of planets for many hundreds of thousands of years.

We’d all like Theory 2 to be the right one.
But when we consider our place in history, we must admit that it seems incredibly less likely for us to be in the very tiny period of human history in which we still exist on one planet, than it is for us to be in the height of human history where most people live.

By analogy, imagine a bowl filled with numbered marbles. We have two theories about the number of marbles in the bowl. Theory 1 says that there are 10 marbles in the bowl. Theory 2 says that there are 10,000,000. Now we draw a marble and see that it is numbered 7. How should this update our credences in these two theories?

Well, on Theory 2, getting a 7 is one million times less likely than it is on Theory 1. So Theory 1 gets a massive evidential boost from the observation. In fact, if we consider the set of all possible theories of how many marbles there are in the bowl, the greatest update goes to the theory that says that there are exactly 7 marbles. Theories that say any fewer than 7 are made impossible by the observation, and theories that say more than 7 are progressively less likely as the number goes up.

This is exactly analogous to our birth order in the history of humanity. The self-sampling assumption says that given that you are a human, you should treat yourself as if you are randomly sampled from the set of all humans there will ever be. If you are, say, the one trillionth human, then the most likely theory is that there are not many more than a trillion humans that will ever exist. And theories that say there will be fewer than a trillion humans are ruled out definitively by the observation. Comparing the theory that says there will be a trillion trillion humans throughout history to the theory that says there will be a trillion humans throughout history, the first is a trillion times less likely!

In other words, applying the self-sampling assumption to your birth order in the history of humanity, we update in favor of a shortly upcoming doomsday.
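The marble analogy can be checked directly. The sketch below assumes the marbles are numbered 1 through N and that each is equally likely to be drawn; the candidate range of 1 to 20 and the uniform prior over it are arbitrary choices made for illustration.

```python
from fractions import Fraction


def likelihood(observed, total):
    """P(drawing marble number `observed` | the bowl holds marbles 1..total)."""
    if 1 <= observed <= total:
        return Fraction(1, total)
    return Fraction(0)  # the observed number cannot exist in a smaller bowl


observed = 7

# How much more likely is the draw on Theory 1 (10 marbles) than on Theory 2?
ratio = likelihood(observed, 10) / likelihood(observed, 10_000_000)

# Posterior over a range of candidate bowl sizes, starting from a uniform prior
# (the uniform prior cancels, so only the likelihoods matter).
candidates = range(1, 21)
weights = {n: likelihood(observed, n) for n in candidates}
norm = sum(weights.values())
posterior = {n: w / norm for n, w in weights.items()}
best = max(posterior, key=posterior.get)

print("likelihood ratio, Theory 1 vs Theory 2:", ratio)
print("most favored bowl size:", best)
```

Sizes below 7 get posterior zero, 7 gets the largest share, and every larger size gets progressively less.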
To be clear, this is not the same as saying that doomsday soon is inevitable and that all other sources of evidence for doomsday or not-doomsday are irrelevant. This is just another piece of evidence to be added to the set of all evidence we have when drawing inferences about the future of humanity, albeit a very powerful one.

Okay, great! So far we’ve just waded into anthropic reasoning. The self-sampling assumption is just one of a few anthropic principles that Nick Bostrom discusses, and there are many other mind-boggling implications of this style of reasoning. But hopefully I have whetted your appetite for more, as well as given you a sense that this style of reasoning is both nontrivial to refute and deeply significant to our reasoning about our circumstance.

# Predicting changes in belief

One interesting fact about Bayesianism is that you should never be able to predict beforehand how a piece of evidence will change your credences.

For example, sometimes I think that the more I study economics, the more I will become impressed by the power of free markets, and that therefore I will become more of a capitalist. Another example is that I often think that over time I will become more and more moderate in many ways (this is partially just because of induction on the direction of my change – it seems like when I’ve held extreme beliefs in the past, these beliefs tend to have become more moderate over time).

Now, in these two cases I might be right as a matter of psychology. It might be the case that if I study economics, I’ll be more likely to end up a supporter of capitalism than if I don’t. It might be the case that as time passes, I’ll become increasingly moderate until I die out of boredom with my infinitely bland beliefs.

But even if I’m right about these things as a matter of psychology, I’m wrong about them as a matter of probability theory. It should not be the case that I can predict beforehand how my beliefs will change, for if it were, then why wait to change them? If I am really convinced that studying economics will make me a hardcore capitalist, then I should update in favor of capitalism in advance.

I think that this is a pretty common human mistake to make. Part of an explanation for this might be based off of the argumentative theory of reason – the idea that human rationality evolved as a method of winning arguments, not necessarily being well-calibrated in our pursuit of truth. If this is right, then it would make sense that we would only want to hold strong beliefs when we can confidently display strong evidence for them. It’s not that easy to convincingly argue for your position when part of your justification for it is something like “Well I don’t have the evidence yet, but I’ve got a pretty strong hunch that it’ll come out in this direction.”

Another factor might be the way that personality changes influence belief changes. It might be that when we say “I will become more moderate in beliefs as I age”, we’re really saying something like “My personality will become less contrarian as I age.” There’s still some type of mistake here, but it has to do with an overly strong influence of personality on beliefs, not the epistemic principle in question here.

Suppose that you have in front of you a coin with some unknown bias X. Your beliefs about the bias of the coin are captured in a probability distribution over possible values of X, P(X). Now you flip the coin and observe whether it lands heads or tails. If it lands H, your new state of belief is P(X | H). If it lands T, your new state of belief is P(X | T).
So before you observe the coin flip, your new expected distribution over the possible biases is just the weighted sum of these two:

Posterior is P(X | H) with probability P(H)

Posterior is P(X | T) with probability P(T)

Thus, expected posterior is P(X | H) P(H) + P(X | T) P(T) = P(X)

Our expected posterior distribution is exactly the same as our prior distribution! In other words, you can’t anticipate any particular change in your prior distribution. This makes some intuitive sense… if you knew beforehand that your prior would change in a particular direction, then you should have already changed in that direction!

In general: Take any hypothesis H and some piece of relevant evidence that will either turn out E or -E. Suppose you have some prior credence P(H) in this hypothesis before observing either E or -E. Your expected final credence is just:

P(H | E) P(E) + P(H | -E) P(-E)

Which of course is just another way of writing the prior credence P(H).

P(H) = P(H | E) P(E) + P(H | -E) P(-E)

Eliezer Yudkowsky has named this idea “conservation of expected evidence” – for any possible expectation of evidence you might receive, you should have an equal and opposite expectation of evidence in the other direction. It should never be the case that a hypothesis is confirmed no matter what evidence you receive. If E counts as evidence for H, then -E should count against it. And if you have a strong expectation of weak evidence E, then you should have a weak expectation of strong evidence -E. (If you strongly expect to see H, then it is weak evidence, and correspondingly you should have a weak expectation of seeing T, which would be strong evidence.)

This is a pretty powerful idea, and I find myself applying it as a thinking tool pretty regularly. (And Yudkowsky’s writings on LessWrong are chock-full of this type of probability theoretic wisdom; I highly recommend them.)
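The coin version of this identity is easy to verify numerically. This is a sketch on a discrete grid of candidate biases; the grid spacing and the lopsided prior shape are arbitrary assumptions, picked only so that the prior is not symmetric.

```python
# Candidate biases X (probability of heads), on a coarse grid.
xs = [i / 100 for i in range(1, 100)]

# An arbitrary, made-up prior over the biases, normalized to sum to 1.
raw = [(x + 0.2) ** 2 for x in xs]
total = sum(raw)
prior = [w / total for w in raw]

# Probability of each outcome before flipping.
p_heads = sum(x * p for x, p in zip(xs, prior))
p_tails = 1 - p_heads

# Posterior distributions after each possible outcome (Bayes' rule per grid point).
post_h = [x * p / p_heads for x, p in zip(xs, prior)]
post_t = [(1 - x) * p / p_tails for x, p in zip(xs, prior)]

# Expected posterior: weight each posterior by how likely its outcome is.
expected_post = [p_heads * h + p_tails * t for h, t in zip(post_h, post_t)]

max_gap = max(abs(e - p) for e, p in zip(expected_post, prior))
print(f"largest gap between expected posterior and prior: {max_gap:.2e}")
```

Each posterior differs from the prior, but their probability-weighted mixture reproduces the prior at every grid point, up to floating-point error.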
Today I was thinking about how we could determine, if not the average change in beliefs, the average amount that beliefs change. That is, while you can’t say beforehand that you will become more confident that a hypothesis is true, you can still say something about how much you expect your confidence to change as an absolute value.

Let’s stick with our simple coin-tossing example for illustration. Our prior distribution over possible biases is P(X). This distribution has some characteristic mean µ and standard deviation σ. We can also describe the mean of each possible posterior distribution:

$$\mu_H = E[X \mid H] = \frac{E[X^2]}{\mu} = \mu + \frac{\sigma^2}{\mu}, \qquad \mu_T = E[X \mid T] = \frac{E[X(1-X)]}{1-\mu} = \mu - \frac{\sigma^2}{1-\mu}$$

Now we can look at how far away each of these updated means is from the original mean:

$$\mu_H - \mu = \frac{\sigma^2}{\mu}, \qquad \mu_T - \mu = -\frac{\sigma^2}{1-\mu}$$

We want to average these differences, weighted by how likely we think we are to observe them. But if we did this, we’d just get zero. Why? Conservation of expected evidence rearing its head again! The average of the differences in means is the average amount that you think that your expectation of heads will move, and this cannot be nonzero.

What we want is a quantity that captures the absolute distance between the new means and the original mean. The standard way of doing this is the following:

$$\Delta\mu = \sqrt{P(H)\,(\mu_H - \mu)^2 + P(T)\,(\mu_T - \mu)^2}$$

This gives us:

$$\Delta\mu = \sqrt{\mu \cdot \frac{\sigma^4}{\mu^2} + (1-\mu) \cdot \frac{\sigma^4}{(1-\mu)^2}} = \frac{\sigma^2}{\sqrt{\mu(1-\mu)}}$$

This is a measure of how strong a belief update we should expect to receive. I haven’t heard very much about this quantity (the square root of the weighted sum of the squares of the changes in means for all possible evidential updates), but it seems pretty important.

Notice also that this scales with the variance on our prior distribution. This makes a lot of sense, because a small variance on your prior implies a high degree of confidence, which entails a weak belief update. Similarly, a large variance on your prior implies a weakly held belief, and thus a strong belief update.

Let’s see what this measure gives for some simple distributions. First, the uniform distribution:

$$\mu = \frac{1}{2}, \qquad \sigma^2 = \frac{1}{12}, \qquad \Delta\mu = \frac{1/12}{\sqrt{1/4}} = \frac{1}{6}$$

In this case, the predicted change in mean is exactly right!
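Here is a sketch that computes this quantity straight from its verbal definition (the square root of the probability-weighted sum of squared shifts in the mean). It assumes the prior is a Beta(a, b) distribution, which is my choice for convenience: the uniform prior is Beta(1, 1), and a flat prior updated on n heads and m tails is Beta(n + 1, m + 1). The function name is hypothetical.

```python
import math


def rms_mean_shift(a, b):
    """For a Beta(a, b) prior over a coin's bias, return
    (probability-weighted average shift in the mean after one flip,
     root-mean-square shift in the mean after one flip)."""
    mean = a / (a + b)              # prior mean, which is also P(H)
    mean_h = (a + 1) / (a + b + 1)  # posterior mean after observing heads
    mean_t = a / (a + b + 1)        # posterior mean after observing tails
    p_h, p_t = mean, 1 - mean
    avg_shift = p_h * (mean_h - mean) + p_t * (mean_t - mean)
    rms_shift = math.sqrt(p_h * (mean_h - mean) ** 2 + p_t * (mean_t - mean) ** 2)
    return avg_shift, rms_shift


flat_avg, flat_rms = rms_mean_shift(1, 1)              # uniform prior
after_head_avg, after_head_rms = rms_mean_shift(2, 1)  # uniform prior plus one head

print(f"uniform prior:  average shift {flat_avg:+.3f}, RMS shift {flat_rms:.4f}")
print(f"after one head: average shift {after_head_avg:+.3f}, RMS shift {after_head_rms:.4f}")
```

The average shift comes out to zero in both cases, as conservation of expected evidence requires, while the root-mean-square shift is 1/6 for the uniform prior and smaller once one flip has already been seen.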
If we get H, our new mean becomes 2/3 (a change of +1/6) and if we get T, our new mean becomes 1/3 (a change of -1/6). Either way, the mean value of our distribution shifts by 1/6, just as our calculation predicted!

Let’s imagine you start maximally uncertain about the bias (the uniform prior) and then observe an H. Now with this new distribution, what is your expected magnitude of belief change?

$$\mu = \frac{2}{3}, \qquad \sigma^2 = \frac{1}{18}, \qquad \Delta\mu = \frac{1/18}{\sqrt{2/9}} = \frac{\sqrt{2}}{12} \approx 0.118$$

Notice that this magnitude of belief change is smaller than for the original distribution. Once again, this makes a lot of sense: after getting more information, your beliefs become more ossified and less mutable.

In general, we can calculate ∆µ for the distribution that results from observing n heads and m tails, starting with a flat prior:

$$\Delta\mu = \frac{\sqrt{(n+1)(m+1)}}{(n+m+2)(n+m+3)}$$

Asymptotically as n and m go to infinity (as you flip the coin arbitrarily many times), this relation becomes

$$\Delta\mu \approx \frac{\sqrt{nm}}{(n+m)^2}$$

which clearly goes to zero, and pretty quickly as well. This amounts to the statement that as you get lots and lots of information, each subsequent piece of information matters comparatively less.

# What do I find conceptually puzzling?

There are lots of things that I don’t know, like, say, what the birth rate in Sweden is or what the effect of poverty on IQ is. There are also lots of things that I find really confusing and hard to understand, like quantum field theory and monetary policy. There’s also a special category of things that I find conceptually puzzling. These things aren’t difficult to grasp because the facts about them are difficult to understand or require learning complicated jargon. Instead, they’re difficult to grasp because I suspect that I’m confused about the concepts in use.

This is a much deeper level of confusion. It can’t be adjudicated by just reading lots of facts about the subject matter. It requires philosophical reflection on the nature of these concepts, which can sometimes leave me totally confused about everything and grasping for the solid ground of mere factual ignorance.
As such, it feels like a big deal when something I’ve been conceptually puzzled about becomes clear. I want to compile a list for future reference of things that I’m currently conceptually puzzled about and things that I’ve become un-puzzled about. (This is not a complete list, but I believe it touches on the major themes.)

# Things I’m conceptually puzzled about

### What is the relationship between consciousness and physics?

I’ve written about this here.

Essentially, at this point every available viewpoint on consciousness seems wrong to me.

Eliminativism amounts to a denial of pretty much the only thing that we can be sure can’t be denied – that we are having conscious experiences. Physicalism entails the claim that facts about conscious experience can be derived from laws of physics, which is wrong as a matter of logic.

Dualism entails that the laws of physics by themselves cannot account for the behavior of the matter in our brains, which is wrong. And epiphenomenalism entails that our beliefs about our own conscious experience are almost certainly wrong, and are no better representations of our actual conscious experiences than random chance.

### How do we make sense of decision theory if we deny libertarian free will?

I’ve written about this here and here.

Decision theory is ultimately about finding the decision D that maximizes expected utility EU(D). But to do this calculation, we have to decide what the set of possible decisions we are searching is.

Make this set too large, and you end up getting fantastical and impossible results (like that the optimal decision is to snap your fingers and make the world into a utopia). Make it too small, and you end up getting underwhelming results (in the extreme case, you just get that the optimal decision is to do exactly what you are going to do, since this is the only thing you can do in a strictly deterministic world).

We want to find a nice middle ground between these two – a boundary where we can say “inside here are the things that are actually possible for us to do, and outside are those that are not.” But any principled distinction between what’s in the set and what’s not must be based on some conception of some actions being “truly possible” to us, and others being truly impossible. I don’t know how to make this distinction in the absence of a robust conception of libertarian free will.

### Are there objectively right choices of priors?

I’ve written about this here.

If you say no, then there are no objectively right answers to questions like “What should I believe given the evidence I have?” And if you say yes, then you have to deal with thought experiments like the cube problem, where any choice of priors looks arbitrary and unjustifiable.

(If you are going to be handed a cube, and all you know is that it has a volume less than 1 cm³, then setting maximum entropy priors over volumes gives different answers than setting maximum entropy priors over side areas or side lengths. This means that what qualifies as “maximally uncertain” depends on whether we frame our reasoning in terms of side length, areas, or cube volume. Other approaches besides MaxEnt have similar problems of concept dependence.)

### How should we deal with infinities in decision theory?

I wrote about this here, here, here, and here.

The basic problem is that expected utility theory does great at delivering reasonable answers when the rewards are finite, but becomes wacky when the rewards become infinite. There are a huge number of examples of this. For instance, in the St. Petersburg paradox, you are given the option to play a game with an infinite expected payout, suggesting that you should buy into the game no matter how high the cost. You end up making obviously irrational choices, such as spending $1,000,000 on the hope that a fair coin will land heads 20 times in a row.
Variants of this involve the inability of EU theory to distinguish between obviously better and worse bets that have infinite expected value.
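As a rough sketch of why the St. Petersburg game misbehaves, assume the usual payoff convention (not spelled out above): the game pays 2^k dollars if the first tails arrives on flip k. Each possible stopping point then contributes exactly one dollar of expected value, so the total grows without bound as more flips are allowed.

```python
from fractions import Fraction


def truncated_expected_payout(max_flips):
    """Expected payout in dollars if the game is cut off after `max_flips` flips.

    Assumed convention: payout 2**k with probability 2**-k (first tails on flip k).
    """
    return sum(Fraction(1, 2**k) * 2**k for k in range(1, max_flips + 1))


def prob_heads_run(n):
    """Probability that a fair coin lands heads n times in a row."""
    return Fraction(1, 2**n)


print("expected payout with 20 flips allowed:", truncated_expected_payout(20))
print("expected payout with 200 flips allowed:", truncated_expected_payout(200))
print("chance of 20 heads in a row:", float(prob_heads_run(20)))
```

Under this convention a payout above $1,000,000 needs a run of about 20 heads, which happens less than once in a million games, yet expected utility theory still counts every one of those remote branches at full weight.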

And Pascal’s mugging is an even worse case. Roughly speaking, a person comes up to you and threatens you with infinite torture if you don’t submit to them and give them 20 dollars. Now, the probability that this threat is credible is surely tiny. But it is non-zero (as long as you don’t think it is literally logically impossible for this threat to come true)!

An infinite penalty times any non-zero probability is still an infinite expected penalty. So we stand to gain an infinite expected utility by just handing over the 20 dollars. This seems ridiculous, but I don’t know any reasonable formalization of decision theory that allows me to refute it.

### Is causality fundamental?

Causality has been nicely formalized by Pearl’s probabilistic graphical models. This is a simple extension of probability theory, out of which causality and counterfactuals naturally fall.

One can use this framework to represent the states of fundamental particles and how they change over time and interact with one another. What I’m confused about is that in some ways of looking at it, the causal relations appear to be useful but un-fundamental constructs for the sake of easing calculations. In other ways of looking at it, causal relations are necessarily built into the structure of the world, and we can go out and empirically discover them. I don’t know which is right. (Sorry for the vagueness in this one – it’s confusing enough to me that I have trouble even precisely phrasing the dilemma).

### How should we deal with the apparent dependence of inductive reasoning upon our choices of concepts?

I’ve written about this here. Beyond just the problem of concept-dependence in our choices of priors, there’s also the problem presented by the grue/bleen thought experiment.

This thought experiment proposes two new concepts: grue (= the set of things that are either green before 2100 or blue after 2100) and bleen (the inverse of grue). It then shows that if we reasoned in terms of grue and bleen, standard induction would have us concluding that all emeralds will suddenly turn blue after 2100. (We repeatedly observed them being grue before 2100, so we should conclude that they will be grue after 2100.)

In other words, choose the wrong concepts and induction breaks down. This is really disturbing – choices of concepts should be merely pragmatic matters! They shouldn’t function as fatal epistemic handicaps. And given that they appear to, we need to develop some criterion we can use to determine what concepts are good and what concepts are bad.

The trouble with this is that the only proposals I’ve seen for such a criterion reference the idea of concepts that “carve reality at its joints”; in other words, the world is composed of green and blue things, not grue and bleen things, so we should use the former rather than the latter. But this relies on the outcome of our inductive process to draw conclusions about the starting step on which this outcome depends!

I don’t know how to cash out “good choices of concepts” without ultimately reasoning circularly. I also don’t even know how to make sense of the idea of concepts being better or worse for more than merely pragmatic reasons.

### How should we reason about self-defeating beliefs?

The classic self-defeating belief is “This statement is a lie.” If you believe it, then you are compelled to disbelieve it, eliminating the need to believe it in the first place. Broadly speaking, self-defeating beliefs are those that undermine the justifications for belief in them.

Here’s an example that might actually apply in the real world: Black holes glow. The process of emission is known as Hawking radiation. In principle, any configuration of particles with a mass less than that of the black hole can be emitted from it. Larger configurations are less likely to be emitted, but even configurations such as a human brain have a non-zero probability of being emitted. Henceforth, we will call such configurations black hole brains.

Now, imagine discovering some cosmological evidence that the era in which life can naturally arise on planets circling stars is finite, and that after this era there will be an infinite stretch of time during which all that exists are black holes and their radiation. In such a universe, the expected number of black hole brains produced is infinite (a tiny finite probability multiplied by an infinite stretch of time), while the expected number of “ordinary” brains produced is finite (assuming a finite spatial extent as well).

What this means is that discovering this cosmological evidence should give you an extremely strong boost in credence that you are a black hole brain. (Simply because most brains in your exact situation are black hole brains.) But most black hole brains have completely unreliable beliefs about their environment! They are produced by a stochastic process which cares nothing for producing brains with reliable beliefs. So if you believe that you are a black hole brain, then you should suddenly doubt all of your experiences and beliefs. In particular, you have no reason to think that the cosmological evidence you received was veridical at all!

I don’t know how to deal with this. It seems perfectly possible to find evidence for a scenario that suggests that we are black hole brains (I’d say that we have already found such evidence, multiple times). But then it seems we have no way to rationally respond to this evidence! In fact, if we do a naive application of Bayes’ theorem here, we find the probability of receiving any evidence in support of black hole brains to be 0!

So we have a few options. First, we could rule out any possible skeptical scenarios like black hole brains, as well as anything that could provide any amount of evidence for them (no matter how tiny). Or we could accept the possibility of such scenarios but face paralysis upon actually encountering evidence for them! Both of these seem clearly wrong, but I don’t know what else to do.

### How should we reason about our own existence and indexical statements in general?

This is called anthropic reasoning. I haven’t written about it on this blog, but expect future posts on it.

A thought experiment: imagine a murderous psychopath who has decided to go on an unusual rampage. He will start by abducting one random person. He rolls a pair of dice, and kills the person if they land snake eyes (1, 1). If not, he sets them free and hunts down ten new people. Once again, he rolls his pair of dice. If he gets snake eyes he kills all ten. Otherwise he frees them and kidnaps 100 new people. On and on until he eventually gets snake eyes, at which point his murder spree ends.

Now, you wake up and find that you have been abducted. You don’t know how many others have been abducted alongside you. The murderer is about to roll the dice. What is your chance of survival?

Your first thought might be that your chance of death is just the chance of both dice landing 1: 1/36. But think instead about the proportion of all people that are ever abducted by him that end up dying. This value ends up being roughly 90%! So once you condition upon the information that you have been captured, you end up being much more worried about your survival chance.
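The “roughly 90%” figure can be checked with a short calculation. This sketch assumes the group sizes continue as 1, 10, 100, 1000, and so on, and it truncates the (in principle unbounded) experiment at 500 rounds, which leaves out well under a millionth of the probability. The function names are mine.

```python
P_SNAKE_EYES = 1 / 36


def fraction_killed(final_round):
    """Fraction of everyone ever abducted who dies, if snake eyes first comes up
    on round `final_round` (round n holds 10**(n-1) captives)."""
    killed = 10 ** (final_round - 1)
    abducted = (10 ** final_round - 1) // 9  # 1 + 10 + ... + 10**(final_round - 1)
    return killed / abducted


def expected_fraction_killed(max_rounds=500):
    """Average of fraction_killed over when the spree ends, truncated at max_rounds."""
    total = 0.0
    p_reach = 1.0  # probability that the spree reaches this round at all
    for n in range(1, max_rounds + 1):
        total += p_reach * P_SNAKE_EYES * fraction_killed(n)
        p_reach *= 1 - P_SNAKE_EYES
    return total


for n in (1, 2, 3, 6):
    print(f"ends on round {n}: fraction killed = {fraction_killed(n):.6f}")
print(f"expected fraction killed = {expected_fraction_killed():.4f}")
```

Whichever round the spree ends on, at least 90% of all the people ever abducted are in the final, doomed group, even though each individual roll comes up snake eyes only 1 time in 36.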

But at the same time, it seems really wrong to be watching the two dice tumble and internally thinking that there is a 90% chance that they land snake eyes. It’s as if you’re imagining that there’s some weird anthropic “force” pushing the dice towards snake eyes. There’s way more to say about this, but I’ll leave it for future posts.

# Things I’ve become un-puzzled about

### Newcomb’s problem – one box or two box?

To almost everyone, it is perfectly clear and obvious what should be done. The difficulty is that these people seem to divide almost evenly on the problem, with large numbers thinking that the opposing half is just being silly.

– Nozick, 1969

I’ve spent months and months being hopelessly puzzled about Newcomb’s problem. I am now convinced that there’s an unambiguous right answer, which is to take the one box. I wrote up a dialogue here explaining the justification for this choice.

In a few words, you should one-box because one-boxing makes it nearly certain that the simulation of you run by the predictor also one-boxed, thus making it nearly certain that you will get 1 million dollars. The dependence between your action and the simulation is not an ordinary causal dependence, nor even a spurious correlation – it is a logical dependence arising from the shared input-output structure. It is the same type of dependence that exists in the clone prisoner dilemma, where you can defect or cooperate with an individual you are assured is identical to you in every single way. When you take into account this logical dependence (also called subjunctive dependence), the answer is unambiguous: one-boxing is the way to go.
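One rough way to put numbers on this is sketched below. It assumes the standard payoffs for the problem (1,000 dollars in the transparent box, 1 million in the opaque one) and a predictor that matches your actual choice with probability `accuracy`. Both the 1,000 dollar figure and the `accuracy` parameter are my assumptions for illustration; they are not stated above.

```python
from fractions import Fraction

def expected_payoffs(accuracy, opaque=1_000_000, transparent=1_000):
    """Expected winnings of (one-boxing, two-boxing).

    `accuracy` is the assumed probability that the predictor's simulation
    made the same choice you actually make.
    """
    one_box = accuracy * opaque                      # opaque box is full iff predicted one-boxing
    two_box = (1 - accuracy) * opaque + transparent  # full only if the predictor got it wrong
    return one_box, two_box

print(expected_payoffs(Fraction(99, 100)))
```

Under these assumptions, one-boxing comes out ahead whenever the predictor is right more than 50.05% of the time.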

# Summing up:

Things I remain conceptually confused about:

• Consciousness
• Decision theory & free will
• Objective priors
• Infinities in decision theory
• Fundamentality of causality
• Dependence of induction on concept choice
• Self-defeating beliefs
• Anthropic reasoning

# 100 prisoners problem

I’m in the mood for puzzles, so here’s another one. This one is so good that it deserves its own post.

The setup (from wiki):

The director of a prison offers 100 death row prisoners, who are numbered from 1 to 100, a last chance. A room contains a cupboard with 100 drawers. The director randomly puts one prisoner’s number in each closed drawer. The prisoners enter the room, one after another. Each prisoner may open and look into 50 drawers in any order. The drawers are closed again afterwards.

If, during this search, every prisoner finds his number in one of the drawers, all prisoners are pardoned. If just one prisoner does not find his number, all prisoners die. Before the first prisoner enters the room, the prisoners may discuss strategy—but may not communicate once the first prisoner enters to look in the drawers. What is the prisoners’ best strategy?

Suppose that each prisoner selects 50 drawers at random and doesn’t coordinate with the others. Then the chance that any particular prisoner gets their own number is 50%. This means that the chance that all 100 get their own number is 1/2¹⁰⁰.

Let me emphasize how crazily small this is. 1/2¹⁰⁰ is 1/1,267,650,600,228,229,401,496,703,205,376; less than one in a nonillion (10³⁰). If there were 100 prisoners trying exactly this setup every millisecond, it would take them 40 billion billion years to get out alive once. This is 3 billion times longer than the age of the universe.
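The arithmetic is easy to check. Here is a quick sketch, assuming a 365.25-day year and an age of the universe of about 13.8 billion years (both values are my assumptions, not taken from the problem statement):

```python
attempts_needed = 2 ** 100                    # expected number of tries for one success
ms_per_year = 1000 * 60 * 60 * 24 * 365.25    # assumes a 365.25-day year
years = attempts_needed / ms_per_year         # one attempt per millisecond
age_of_universe_years = 13.8e9                # assumed value

print(attempts_needed)
print(f"{years:.2e} years")
print(f"{years / age_of_universe_years:.2e} times the age of the universe")
```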

Okay, so that’s a bad strategy. Can we do better?

It’s hard to imagine how… While the prisoners can coordinate beforehand, they cannot share anything they learn once the search begins. So every time a prisoner comes in for their turn at the drawers, they know exactly as much about the contents of the drawers as if they hadn’t coordinated with the others.

Given this, how could we possibly increase the survival chance beyond 1/2¹⁰⁰?

(…)

(Try to answer for yourself before continuing)

(…)

Let’s consider a much simpler case. Imagine we have just two prisoners, two drawers, and each one can only open one of them. Now if both prisoners choose randomly, there’s only a 1 in 4 chance that they both survive.

What if they agree to open the same drawer? Then they have reduced their survival chance from 25% to 0%! Why? Because by choosing the same drawer, they either both get the number 1, or they both get the number 2. In either case, they are guaranteed that only one of them gets their own number.

So clearly the prisoners can decrease the survival probability by coordinating beforehand. Can they increase it?

Yes! Suppose that they agree to open different drawers. Then this doubles their survival chance from 25% to 50%. Either they both get their own number, or they both get the wrong number.

The key here is to minimize the overlap between the choices of the prisoners. Unfortunately, this sort of strategy doesn’t scale well. If we have four prisoners, each allowed to open two drawers, then random drawing gives a 1/16 survival chance.

Let’s say they open according to the following scheme: 12, 34, 13, 24 (first prisoner opens drawers 1 and 2, second opens 3 and 4, and so on). Then out of the 24 possible drawer layouts, the only layouts that work are 1432 and 3124:

1234 1243 1324 1342 1423 1432
2134 2143 2314 2341 2413 2431
3124 3142 3214 3241 3412 3421
4123 4132 4213 4231 4312 4321

This gives a 1/12 chance of survival, which is better but not by much.

What if instead they open according to the following scheme: (12, 23, 34, 14)? Now the only layouts that work are 1234 and 4123:

1234 1243 1324 1342 1423 1432
2134 2143 2314 2341 2413 2431
3124 3142 3214 3241 3412 3421
4123 4132 4213 4231 4312 4321

Same thing: a 1/12 chance of survival.
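These small cases are easy to check by brute force. The sketch below (the function name and data layout are my own choices) tries every drawer layout against a fixed opening scheme and keeps the layouts where every prisoner finds their own number. A layout is read the same way as in the grids above: the k-th digit is the number sitting in drawer k.

```python
from itertools import permutations

def winning_layouts(scheme):
    """Return every drawer layout in which all prisoners find their own number.

    `scheme[i]` lists the drawers (1-based) that prisoner i+1 opens.
    A layout is a tuple whose k-th entry is the number inside drawer k+1.
    """
    n = len(scheme)
    winners = []
    for layout in permutations(range(1, n + 1)):
        if all(
            any(layout[drawer - 1] == prisoner for drawer in drawers)
            for prisoner, drawers in enumerate(scheme, start=1)
        ):
            winners.append(layout)
    return winners

scheme_a = [(1, 2), (3, 4), (1, 3), (2, 4)]
scheme_b = [(1, 2), (2, 3), (3, 4), (1, 4)]
print(winning_layouts(scheme_a))  # the layouts 1432 and 3124
print(winning_layouts(scheme_b))  # the layouts 1234 and 4123
```

Each scheme wins on exactly 2 of the 24 layouts, which is the 1/12 quoted above. The same function reproduces the two-prisoner cases: `winning_layouts([(1,), (1,)])` is empty, while `winning_layouts([(1,), (2,)])` wins on one of the two layouts.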

Scaling this up to 100 prisoners, the odds of survival look pretty measly. Can they do better than this?

(…)

(Try to answer for yourself before continuing)

(…)

It turns out that yes, there is a strategy that does better at ensuring survival. In fact, it does so much better that the survival chance is over 30 percent!

Take a moment to boggle at this. Somehow we can leverage the dependency induced by the prisoners’ coordination to increase the chance of survival by a factor of nearly a nonillion (about 4 × 10²⁹), even though none of their states of knowledge are any different. It’s pretty shocking to me that this is possible.

Here’s the strategy: Each time a prisoner opens a drawer, they consult the number in that drawer to determine which drawer they will open next. Thus each prisoner only has to decide on the first drawer to open, and all the rest of the drawers follow from this. Importantly, the prisoner only knows the first drawer they’ll pick; the other 49 are determined by the distribution of numbers in the drawers.

We can think about each drawer as starting a chain through the other drawers. These chains always cycle back into the starting number, the longest possible cycle being 100 numbers and the shortest being 1. Now, each prisoner can guarantee that they are in a cycle that contains their own number by choosing the drawer corresponding to their own number!

So, the strategy is that Prisoner N starts by choosing Drawer N, looking at the number within, then choosing the drawer labeled with that number. Each prisoner repeats this until they find their own number or have opened 50 drawers.
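Here is a minimal sketch of the strategy in Python. The helper names (`finds_own_number`, `all_survive`, `cycle_layout`, `estimate_survival`) are hypothetical, made up for this illustration. Drawers are stored in a list with an unused slot at index 0 so that drawer and prisoner numbers stay 1-based.

```python
import random

def finds_own_number(drawers, prisoner, max_opens):
    """Follow the chain that starts at the prisoner's own drawer."""
    current = prisoner
    for _ in range(max_opens):
        found = drawers[current]      # open the drawer, read the number inside
        if found == prisoner:
            return True
        current = found               # that number names the next drawer
    return False

def all_survive(drawers):
    """True if every prisoner succeeds while opening at most half the drawers."""
    n = len(drawers) - 1
    return all(finds_own_number(drawers, p, n // 2) for p in range(1, n + 1))

def cycle_layout(cycle_lengths):
    """Build a layout made of consecutive cycles with the given lengths."""
    drawers = [None]                  # index 0 is padding
    start = 1
    for length in cycle_lengths:
        for k in range(start, start + length):
            # drawer k holds the next number in its cycle, wrapping to the start
            drawers.append(k + 1 if k + 1 < start + length else start)
        start += length
    return drawers

def estimate_survival(n=100, trials=2000, seed=0):
    """Monte Carlo estimate of the survival chance over random layouts."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        numbers = list(range(1, n + 1))
        rng.shuffle(numbers)
        wins += all_survive([None] + numbers)
    return wins / trials

print(all_survive(cycle_layout([50, 50])))  # two cycles of length 50
print(all_survive(cycle_layout([100])))     # one cycle of length 100
```

A layout made of two 50-cycles lets everyone out, while a single 100-cycle dooms them all: the prisoners live or die together depending on the longest cycle. Calling `estimate_survival()` gives a rough empirical survival rate over random layouts.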

The wiki page has a good description of how to calculate the survival probability with this strategy:

The prison director’s assignment of prisoner numbers to drawers can mathematically be described as a permutation of the numbers 1 to 100. Such a permutation is a one-to-one mapping of the set of natural numbers from 1 to 100 to itself. A sequence of numbers which after repeated application of the permutation returns to the first number is called a cycle of the permutation. Every permutation can be decomposed into disjoint cycles, that is, cycles which have no common elements.

In the initial problem, the 100 prisoners are successful if the longest cycle of the permutation has a length of at most 50. Their survival probability is therefore equal to the probability that a random permutation of the numbers 1 to 100 contains no cycle of length greater than 50. This probability is determined in the following.

A permutation of the numbers 1 to 100 can contain at most one cycle of length $l > 50$. There are exactly $\binom{100}{l}$ ways to select the numbers of such a cycle. Within this cycle, these numbers can be arranged in $(l-1)!$ ways since there are $l$ permutations to represent distinct cycles of length $l$ because of cyclic symmetry. The remaining numbers can be arranged in $(100-l)!$ ways. Therefore, the number of permutations of the numbers 1 to 100 with a cycle of length $l > 50$ is equal to

$$\binom{100}{l} \cdot (l-1)! \cdot (100-l)! = \frac{100!}{l}.$$

The probability, that a (uniformly distributed) random permutation contains no cycle of length greater than 50 is calculated with the formula for single events and the formula for complementary events thus given by

$$1 - \frac{1}{100!}\left(\frac{100!}{51} + \cdots + \frac{100!}{100}\right) = 1 - \left(\frac{1}{51} + \cdots + \frac{1}{100}\right) \approx 0.31183.$$
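The counting argument says that a cycle of length l > 50 shows up in a fraction 1/l of all permutations, so the survival chance is 1 minus the sum of 1/l for l from 51 to 100. The sketch below (function names are mine) evaluates that sum exactly and, as a sanity check, compares the same formula against a brute-force count for small numbers of prisoners.

```python
from fractions import Fraction
from itertools import permutations

def survival_probability(n):
    """Exact chance that a random permutation of n items has no cycle longer than n // 2."""
    return 1 - sum(Fraction(1, length) for length in range(n // 2 + 1, n + 1))

def longest_cycle(perm):
    """Length of the longest cycle of a permutation given as a 0-based tuple."""
    seen, longest = set(), 0
    for start in range(len(perm)):
        length, k = 0, start
        while k not in seen:          # walk the cycle until it closes
            seen.add(k)
            k = perm[k]
            length += 1
        longest = max(longest, length)
    return longest

def brute_force_probability(n):
    """Same probability, found by checking every permutation (small n only)."""
    perms = list(permutations(range(n)))
    good = sum(longest_cycle(p) <= n // 2 for p in perms)
    return Fraction(good, len(perms))

print(float(survival_probability(100)))
print(survival_probability(4), brute_force_probability(4))
```

For four prisoners this gives 5/12, a big step up from the 1/12 that the fixed schemes managed, and for two prisoners it gives the 1/2 found earlier.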