# Backwards induction and rationality

A fun problem I recently came across:

Consider two players: Alice and Bob. Alice moves first. At the start of the game, Alice has two piles of coins in front of her: one pile contains 4 coins and the other pile contains 1 coin. Each player has two moves available: either “take” the larger pile of coins and give the smaller pile to the other player or “push” both piles across the table to the other player. Each time the piles of coins pass across the table, the quantity of coins in each pile doubles. For example, assume that Alice chooses to “push” the piles on her first move, handing the piles of 1 and 4 coins over to Bob, doubling them to 2 and 8. Bob could now use his first move to either “take” the pile of 8 coins and give 2 coins to Alice, or he can “push” the two piles back across the table again to Alice, again increasing the size of the piles to 4 and 16 coins. The game continues for a fixed number of rounds or until a player decides to end the game by pocketing a pile of coins.

(from the wiki)

(Assume that if the game gets to the final round and the last player decides to “push”, the pot is doubled and they get the smaller pile.)

Assuming that they are self-interested, what do you think is the rational strategy for each of Alice and Bob to adopt? What is the rational strategy if they each know that the other reasons about decision-making in the same way that they themselves do? And what happens if two updateless decision theorists are pitted against each other?

If you have some prior familiarity with game theory, you might have seen the backwards induction proof right away. It turns out that standard game theory teaches us that the Nash equilibrium is to defect as soon as you can, thus never exploiting the “doubling” feature of the setup.

Why? Supposing that you have made it to the final round of the game, you stand to get a larger payout by “defecting” and taking the larger pile rather than the doubled smaller pile. But your opponent knows that you’ll reason this way, so they reason that they are better off defecting the round before… and so on all the way to the first round.

This sucks. The game ends right away, and none of that exponential goodness gets taken advantage of. If only Alice and Bob weren’t so rational!

We can show that this conclusion follows as long as the three things are true of Alice and Bob:

1. Given a choice between a definite value A and a smaller value B, both Alice and Bob will choose the larger value (A).
2. Both Alice and Bob can accurately perform deductive reasoning.
3. Both (1.) and (2.) are common knowledge to Alice and Bob.

It’s pretty hard to deny the reasonableness of any of these three assumptions!

Here’s a related problem:

An airline loses two suitcases belonging to two different travelers. Both suitcases happen to be identical and contain identical antiques. An airline manager tasked to settle the claims of both travelers explains that the airline is liable for a maximum of $100 per suitcase—he is unable to find out directly the price of the antiques. To determine an honest appraised value of the antiques, the manager separates both travelers so they can’t confer, and asks them to write down the amount of their value at no less than$2 and no larger than $100. He also tells them that if both write down the same number, he will treat that number as the true dollar value of both suitcases and reimburse both travelers that amount. However, if one writes down a smaller number than the other, this smaller number will be taken as the true dollar value, and both travelers will receive that amount along with a bonus/malus:$2 extra will be paid to the traveler who wrote down the lower value and a $2 deduction will be taken from the person who wrote down the higher amount. The challenge is: what strategy should both travelers follow to decide the value they should write down? (again, from the wiki) Suppose you put no value on honesty, and only care about getting the most money possible. Further, suppose that both travelers reason the same way about decision problems, and that they both know this fact (and that they both know that they both know this fact, and so on). The first intuition you might have is that both should just write down$100. But if you know that your partner is going to write down $100, then you stand to gain one whole dollar by defecting and writing$99 (thus collecting the $2 bonus for a total of$101). But if they know that you’re going to write $99, then they stand to gain one whole dollar by defecting and writing$98 (thus netting $100). And so on. In the end both of these unfortunate “rational” individuals end up writing down$2. Once again, we see the tragedy of being a rational individual.

Of course, we could take these thought experiments to be an indication not of the inherent tragedy of rationality, but instead of the need for a better theory of rationality.

For instance, you might have noticed that the arguments we used in both cases relied on a type of reasoning where each agent assumes that they can change their decision, holding fixed the decision of the other agent. This is not a valid move in general, as it assumes independence! It might very well be that the information about what decision you make is relevant to your knowledge about what the other agent’s decision will be. In fact, when we stipulated that you reason similarly to the other agent, we are in essence stipulating an evidential relationship between your decision and theirs! So the arguments we gave above need to be looked at more closely.

If the agents do end up taking into account their similarity, then their behavior is radically different. For example, we can look at the behavior of updateless decision theory: two UDTs playing each other in the Centipede game “push” every single round (including the final one!), thus ending up with exponentially higher rewards (on the order of $2N, where N is the number of rounds). And two UDTs in the Traveller’s Dilemma would write down$100, thus both ending up roughly $98 better off than otherwise. So perhaps we aren’t doomed to a gloomy view of rationality as a burden eternally holding us back! One final problem. Two players, this time with just one pile of coins in front of them. Initially this pile contains just 1 coin. The players take turns, and each turn they can either take the whole pile or push it to the other side, in which case the size of the pile will double. This will continue for a fixed number of rounds or until a player ends the game by taking the pile. On the final round, the last player has a choice of either taking all the coins or pushing them over, thus giving the entire doubled pile to their opponent. Both players are perfectly self-interested, and this fact is common knowledge. And finally, suppose that who goes first is determined by a coin flip. Standard decision theory obviously says that the first person should just take the 1 coin and the game ends there. What would UDT do here? What do you think is the rational policy for each player? # Consistently reflecting on decision theory There are many principles of rational choice that seem highly intuitively plausible at first glance. Sometimes upon further reflection we realize that a principle we initially endorsed is not quite as appealing as we first thought, or that it clashes with other equally plausible principles, or that it requires certain exceptions and caveats that were not initially apparent. We can think of many debates in philosophy as clashes of these general principles, where the thought experiments generated by philosophers serve as the datum that put on display their relative merits and deficits. In this essay, I’ll explore a variety of different principles for rational decision making and consider the ways in which they satisfy and frustrate our intuitions. I will focus in especially on the notion of reflective consistency, and see what sort of decision theory results from treating this as our primary desideratum. I want to start out by illustrating the back-and-forth between our intuitions and the general principles we formulate. Consider the following claim, known as the dominance principle: If a rational agent believes that doing A is better than not doing A in every possible world, then that agent should do A even if uncertain about which world they are in. Upon first encountering this principle, it seems perfectly uncontroversial and clearly valid. But now consider the following application of the dominance principle: “A student is considering whether to study for an exam. He reasons that if he will pass the exam, then studying is wasted effort. Also, if he will not pass the exam, then studying is wasted effort. He concludes that because whatever will happen, studying is wasted effort, it is better not to study.” (Titelbaum 237) The fact that this is clearly a bad argument casts doubt on the dominance principle. It is worth taking a moment to ask what went wrong here. How did this example turn our intuitions on their heads so completely? Well, the flaw in the student’s line of reasoning was that he was ignoring the effect of his studying on whether or not he ends up passing the exam. This dependency between his action and the possible world he ends up in should be relevant to his decision, and it apparently invalidates his dominance reasoning. A restricted version of the dominance principle fixes this flaw: If a rational agent prefers doing A to not doing A in all possible worlds and which world they are in is independent of whether they do A or not, then that agent should do A even if they are uncertain about which world they are in. I’ll call this the simple dominance principle. This principle is much harder to disagree with than our starting principle, but the caveat about independence greatly limits its scope. It applies only when our uncertainty about the state of the world is independent of our decision, which is not the case in most interesting decision problems. We’ll see by the end of this essay that even this seemingly obvious principle can be made to conflict with another intuitively plausible principle of rational choice. The process of honing our intuitions and fine-tuning our principles like this is sometimes called seeking reflective consistency, where reflective consistency is the hypothetical state you end up in after a sufficiently long period of consideration. Reflective consistency is achieved when you have reached a balance between your starting intuitions and other meta-level desiderata like consistency and simplicity, such that your final framework is stable against further intuition-pumping. This process has parallels in other areas of philosophy such as ethics and epistemology, but I want to suggest that it is particularly potent when applied to decision theory. The reason for this is that a decision theory makes recommendations for what action to take in any given setup, and we can craft setups where the choice to be made is about what decision theory to adopt. I’ll call these setups self-reflection problems. By observing what choices a decision theory makes in self-reflection problems, we get direct evidence about whether the decision theory is reflectively consistent or not. In other words, we don’t need to do all the hard work of allowing thought experiments to bump our intuitions around; we can just take a specific decision algorithm and observe how it behaves upon self-reflection! What we end up with is the following principle: Whatever decision theory we end up endorsing should be self-recommending. We should not end up in a position where we endorse decision theory X as the final best theory of rational choice, but then decision theory X recommends that we abandon it for some other decision theory that we consider less rational. The connection between self-recommendation and reflective consistency is worth fleshing out in a little more detail. I am not saying that self-recommendation is sufficient for reflective consistency. A self-recommending decision theory might be obviously in contradiction with our notion of rational choice, such that any philosopher considering this decision theory would immediately discard it as a candidate. Consider, for instance, alphabetical decision theory, which always chooses the option which comes alphabetically first in its list of choices. When faced with a choice between alphabetical decision theory and, say, evidential decision theory, alphabetical decision theory will presumably choose itself, reasoning that ‘a’ comes before ‘e’. But we don’t want to call this a virtue of alphabetical decision theory. Even if it is uniformly self-recommending, alphabetical decision theory is sufficiently distant from any reasonable notion of rational choice that we can immediately discard it as a candidate. On the other hand, even though not all self-recommending theories are reflectively consistent, any reflectively consistent decision theory must be self-recommending. Self-recommendation is a necessary but not sufficient condition for an adequate account of rational choice. Now, it turns out that this principle is too strong as I’ve phrased it and requires a few caveats. One issue with it is what I’ll call the problem of unfair decision problems. For example, suppose that we are comparing evidential decision theory (henceforth EDT) to causal decision theory (henceforth CDT). (For the sake of time and space, I will assume as background knowledge the details of how each of these theories work.) We put each of them up against the following self reflection problem: An omniscient agent peeks into your brain. If they see that you are an evidential decision theorist, they take all your money. Otherwise they leave you alone. Before they peek into your brain, you have the ability to modify your psychology such that you become either an evidential or causal decision theorist. What should you do? EDT reasons as follows: If I stay an EDT, I lose all my money. If I self-modify to CDT, I don’t. I don’t want to lose all my money, so I’ll self-modify to CDT. So EDT is not self-recommending in this setup. But clearly this is just because the setup is unfairly biased against EDT, not because of any intrinsic flaw in EDT. In fact, it’s a virtue of a decision theory to not be self-recommending in such circumstances, as doing so indicates a basic awareness of the payoff structure of the world it faces. While this certainly seems like the right thing to say about this particular decision problem, we need to consider how exactly to formalize this intuitive notion of “being unfairly biased against a decision theory.” There are a few things we might say here. For one, the distinguishing feature of this setup seems to be that the payout is determined not based off the decision made by an agent, but by their decision theory itself. This seems to be at the root of the intuitive unfairness of the problem; EDT is being penalized not for making a bad decision, but simply for being EDT. A decision theory should be accountable for the decisions it makes, not for simply being the particular decision theory that it happens to be. In addition, by swapping “evidential decision theory” and “causal decision theory” everywhere in the setup, we end up arriving at the exact opposite conclusion (evidential decision theory looks stable, while causal decision theory does not). As long as we don’t have any a priori reason to consider one of these setups more important to take into account than the other, then there is no net advantage of one decision theory over the other. If a decision problem belongs to a set of equally a priori important problems obtained by simply swapping out the terms for different decision theories, and no decision theory comes out ahead on the set as a whole, then perhaps we can disregard the entire set for the purposes of evaluating decision theories. The upshot of all of this is that what we should care about is decision problems that don’t make any direct reference to a particular decision theory, only to decisions. We’ll call such problems decision-determined. Our principle then becomes the following: Whatever decision theory we end up endorsing should be self-recommending in all decision-determined problems. There’s certainly more to be said about this principle and if any other caveats need be applied to it, but for now let’s move on to seeing what we end up with when we apply this principle in its current state. We’ll start out with an analysis of the infamous Newcomb problem. You enter a room containing two boxes, one opaque and one transparent. The transparent box contains$1,000. The opaque box contains either $0 or$1,000,000. Your choice is to either take just the opaque box (one-box) or to take both boxes (two-box). Before you entered the room, a predictor scanned your brain and created a simulation of you to see what you would do. If the simulation one-boxed, then the predictor filled the opaque box with $1,000,000. If the simulation two-boxed, then the opaque box was left empty. What do you choose? EDT reasons as follows: If I one-box, then this gives me strong evidence that I have the type of brain that decides to one-box, which gives me strong evidence that the predictor’s simulation of me one-boxed, which in turn gives me strong evidence that the opaque box is full. So if I one-box, I expect to get$1,000,000. On the other hand, if I two-box, then this gives me strong evidence that my simulation two-boxed, in which case the opaque box is empty. So if I two-box, I expect to get only $1,000. Therefore one-boxing is better than two-boxing. CDT reasons as follows: Whether the opaque box is full or empty is already determined by the time I entered the room, so my decision has no causal effect upon the box’s contents. And regardless of the contents of the box, I always expect to leave$1,000 richer by two-boxing than by one-boxing. So I should two-box.

At this point it’s important to ask whether Newcomb’s problem is a decision determined problem. After all, the predictor decides whether to fill the transparent box by scanning your brain and stimulating you. Isn’t that suspiciously similar to our earlier example of penalizing agents based off their decision theory? No. The simulator decides what to do not by evaluating your decision theory, but by its prediction about your decision. You aren’t penalized for being a CDT, just for being the type of agent that one-boxes. To see this you only need to observe that any decision theory that one-boxes would be treated identically to CDT in this problem. The determining factor is the decision, not the decision theory.

Now, let’s make the Newcomb problem into a test of reflective consistency. Instead of your choice being about whether to one-box or to two-box while in the room, your choice will now take place before you enter the room, and will be about whether to be an evidential decision theorist or a causal decision theorist when in the room. What does each theory do?

EDT’s reasoning: If I choose to be an evidential decision theorist, then I will one-box when in the room. The predictor will simulate me as one-boxing, so I’ll end up walking out with $1,000,000. If I choose to be a causal decision theorist, then I will two-box when in the room, the predictor will predict this, and I’ll walk out with only$1,000. So I will stay an EDT.

Interestingly, CDT agrees with this line of reasoning. The decision to be an evidential or causal decision theorist has a causal effect on how the predictor’s simulation behaves, so a causal decision theorist sees that the decision to stay a causal decision theorist will end up leaving them worse off than if they had switched over. So CDT switches to EDT. Notice that in CDT’s estimation, the decision to switch ends up making them $999,000 better off. This means that CDT would pay up to$999,000 just for the privilege of becoming an evidential decision theorist!

I think that looking at an actual example like this makes it more salient why reflective consistency and self-recommendation is something that we actually care about. There’s something very obviously off about a decision theory that knows beforehand that it will reliably perform worse than its opponent, so much so that it would be willing to pay up to $999,000 just for the privilege of becoming its opponent. This is certainly not the type of behavior that we associate with a rational agent that trusts itself to make good decisions. Classically, this argument has been phrased in the literature as the “why ain’tcha rich?” objection to CDT, but I think that the objection goes much deeper than this framing would suggest. There are several plausible principles that all apply here, such as that a rational decision maker shouldn’t regret having the decision theory they have, a rational decision maker shouldn’t pay to limit their future options, and a rational decision maker shouldn’t pay to decrease the values in their payoff matrix. The first of these is fairly self-explanatory. One influential response to it has been from James Joyce, who said that the causal decision theorist does not regret their decision theory, just the situation they find themselves in. I’d suggest that this response makes little sense when the situation the find themselves in is a direct result of their decision theory. As for the second and third of these, we could imagine giving a causal decision theorist the choice to pay money to remove the future possibility of two-boxing, or to hire a gunman that would shoot them if they try to take the transparent box. In each of these cases, CDT would endorse the decision to pay. I mention these other principles just to suggest that we have sound philosophical reason to care about self-recommendation independent of the reflective consistency considerations that we started out by discussing. The takeaway from this modified Newcomb problem is that CDT is not reflectively consistent, and that it will self-modify to EDT in the right circumstances. Does this mean that EDT is more reflectively consistent than CDT? It turns out that no, this is not the case. We can construct another thought experiment in which EDT self-modifies to become CDT, called the Newcomb’s Soda problem: You were just given either Soda 1 or Soda 2, which are indistinguishable. There was a 50% chance of each being the type you were given. You are now given a choice as to whether to choose chocolate ice cream or vanilla ice cream. Those that had Soda 1 have a strong unconscious inclination to choose chocolate ice cream and will be given$1,000,000 after they choose the ice cream. Those that had Soda 2 have a strong unconscious inclination to choose vanilla ice cream and are given nothing. If you choose vanilla ice cream, you get $1,000. What do you choose? EDT reasons that choosing chocolate ice cream gives them strong evidence that they were given Soda 1, in which case they will be given$1,000,000. So they would be willing to give up the $1000 reward for choosing vanilla ice cream, in order to increase their chances of getting the million. CDT reasons that what soda they were given is a fixed fact that isn’t causally influenced by the decision they make. So they have a 50% chance of getting the million regardless of their choice, but choosing to take the vanilla guarantees them an additional$1000. Therefore they take vanilla.

This is clearly a decision-determined problem, so let’s see how EDT and CDT do upon self-reflection. Given the chance to modify themselves to EDT before being given the soda, a causal decision theorist would see no advantage to doing so. After all, being an evidential decision theorist doesn’t make you more or less likely to get the $1,000,000, it just ensures that you won’t get the$1000. And the evidential decision theorist agrees! By changing to causal decision theory, they guarantee themselves an extra $1000 while not making it any more or less likely that they’ll be given the chocolate soda (and accordingly, the million dollars). The upshot of this is that both CDT and EDT are reflective inconsistent in the class of decision determined problems. What we seek, then, is a new decision theory that behaves like EDT in the Newcomb problem and like CDT in Newcomb’s Soda. One such theory was pioneered by machine learning researcher Eliezer Yudkowsky, who named it timeless decision theory (henceforth TDT). To deliver different verdicts in the two problems, we must find some feature that allows us to distinguish between their structure. TDT does this by distinguishing between the type of correlation arising from ordinary common causes (like the soda in Newcomb’s Soda) and the type of correlation arising from faithful simulations of your behavior (as in Newcomb’s problem). This second type of correlation is called logical dependence, and is the core idea motivating TDT. The simplest example of this is the following: two twins, physically identical down to the atomic level, raised in identical environments in a deterministic universe, will have perfectly correlated behavior throughout the lengths of their lives, even if they are entirely causally separated from each other. This correlation is apparently not due to a common cause or to any direct causal influence. It simply arises from the logical fact that two faithful instantiations of the same function will return the same output when fed the same input. Considering the behavior of a human being as an instantiation of an extremely complicated function, it becomes clear why you and your parallel-universe twin behave identically: you are instantiations of the same function! We can take this a step further by noting that two functions can have a similar input-output structure, in which case the physical instantiations of each function will have correlated input-output behavior. This correlation is what’s meant by logical dependence. To spell this out a bit further, imagine that in a far away country, there are factories that sell very simple calculators. Each calculator is designed to only run only one specific computation. Some factories are multiplication-factories; they only sell calculators that compute 713*291. Others are addition-factories; they only sell calculators that compute 713+291. You buy two calculators from one of these factories, but you’re not sure which type of factory you purchased from. Your credences are 50/50 split between the factory you purchased from being a multiplication-factory and it being an addition-factory. You also have some logical uncertainty regarding what the value of 713*291 is. You are evenly split between the value being 207,481 and the value being 207,483. On the other hand, you have no uncertainty about what the value of 713+291 is; you know that it is 1004. Now, you press “ENTER” on one of the two calculators you purchased, and find that the result is 207,483. For a rational reasoner, two things should now happen: First, you should treat this result as strong evidence that the factory from which both calculators were bought was a multiplication-factory, and therefore that the other calculator is also a multiplier. And second, you should update strongly on the other calculator outputting 207,483 rather than 207,481, since two calculators running the same computation will output the same result. The point of this example is that it clearly separates out ordinary common cause correlation from a different type of dependence. The common cause dependence is what warrants you updating on the other calculator being a multiplier rather than an adder. But it doesn’t warrant you updating on the result on the other calculator being specifically 207,483; to do this, we need the notion of logical dependence, which is the type of dependence that arises whenever you encounter systems that are instantiating the same or similar computations. Connecting this back to decision theory, TDT treats our decision as the output of a formal algorithm, which is our decision-making process. The behavior of this algorithm is entirely determined by its logical structure, which is why there are no upstream causal influences such as the soda in Newcomb’s Soda. But the behavior of this algorithm is going to be correlated with the parts of the universe that instantiate a similar function (as well as the parts of the universe it has a causal influence on). In Newcomb’s problem, for example, the predictor generates a detailed simulation of your decision process based off of a brain scan. This simulation of you is highly logically correlated with you, in that it will faithfully reproduce your behavior in a variety of situations. So if you decide to one-box, you are also learning that your simulation is very likely to one-box (and therefore that the opaque box is full). Notice that the exact mechanism by which the predictor operates becomes very important for TDT. If the predictor operates by means of some ordinary common cause where no logical dependence exists, TDT will treat its prediction as independent of your choice. This translates over to why TDT behaves like CDT on Newcomb’s Soda, as well as other so-called “medical Newcomb problems” such as the smoking lesion problem. When the reason for the correlation between your behavior and the outcome is merely that both depend on a common input, TDT treats your decision as an intervention and therefore independent of the outcome. One final way to conceptualize TDT and the difference between the different types of correlation is using structural equation modeling: Direct causal dependence exists between A and B when A is a function of B or when B is a function of A. > A = f(B) or B = g(A) Common cause dependence exists between A and B when A and B are both functions of some other variable C. > A = f(C) and B = g(C) Logical dependence exists between A and B when A and B depend on their inputs in similar ways. > A = f(C) and B = f(D) TDT takes direct causal dependence and logical dependence seriously, and ignores common cause dependence. We can formally express this by saying that TDT calculates the expected utility of a decision by treating it like a causal intervention and fixing the output of all other instantiations of TDT to be identical interventions. Using Judea Pearl’s do-calculus notation for causal intervention, this looks like: Here K is the TDT agent’s background knowledge, D is chosen from a set of possible decisions, and the sum is over all possible worlds. This equation isn’t quite right, since it doesn’t indicate what to do when the computation a given system instantiates is merely similar to TDT but not logically identical, but it serves as a first approximation to the algorithm. You might notice that the notion of logical dependence depends on the idea of logical uncertainty, as without it the result of the computations would be known with certainty as soon as you learn that the calculators came out of a multiplication-factory, without ever having to observe their results. Thus any theory that incorporates logical dependence into its framework will be faced with a problem of logical omniscience, which is to say, it will have to give some account of how to place and update reasonable probability distributions over tautologies. The upshot of all of this is that TDT is reflectively consistent on a larger class of problems than both EDT and CDT. Both EDT and CDT would self-modify into TDT in Newcomb-like problems if given the choice. Correspondingly, if you throw a bunch of TDTs and EDTs and CDTs into a world full of Newcomb and Newcomb-like problems, the TDTs will come out ahead. However, it turns out that TDT is not itself reflectively consistent on the whole class of decision-determined problems. Examples like the transparent Newcomb problem, Parfit’s hitchhiker, and counterfactual mugging all expose reflective inconsistency in TDT. Let’s look at the transparent Newcomb problem. The structure is identical to a Newcomb problem (you walk into a room with two boxes,$1000 in one and either $1000000 or$0 in the other, determined based on the behavior of your simulation), except that both boxes are transparent. This means that you already know with certainty the contents of both boxes. CDT two-boxes here like always. EDT also two-boxes, since any dependence between your decision and the box’s contents is made irrelevant as soon as you see the contents. TDT agrees with this line of reasoning; even though it sees a logical dependence between your behavior and your simulation’s behavior, knowing whether the box is full or empty fully screens off this dependence.

Two-boxing feels to many like the obvious rational choice here. The choice you face is simply whether to take $1,000,000 or$1,001,000 if the box is full. If it’s empty, your choice is between taking $1,000 or walking out empty-handed. But two-boxing also has a few strange consequences. For one, imagine that you are placed, blindfolded, in a transparent Newcomb problem. You can at any moment decide to remove your blindfold. If you are an EDT, you will reason that if you don’t remove your blindfold, you are essentially in an ordinary Newcomb problem, so you will one-box and correspondingly walk away with$1,000,000. But if you do remove your blindfold, you’ll end up two-boxing and most likely walking away with only $1000. So an EDT would pay up to$999,000, just for the privilege of staying blindfolded. This seems to conflict with an intuitive principle of rational choice, which goes something like: A rational agent should never expect to be worse off by simply gaining information. Paying money to keep yourself from learning relevant information seems like a sure sign of a pathological decision theory.

Of course, there are two ways out of this. One way is to follow the causal decision theorist and two-box in both the ordinary Newcomb problem and the transparent problem. This has all the issues that we’ve already discussed, most prominently that you end up systematically and predictably worse off by doing so. If you pit a causal decision theorist against an agent that always one-boxes, even in transparent Newcomb problems, CDT ends up the poorer. And since CDT can reason this through beforehand, they would willingly self-modify to become the other type of agent.

What type of agent is this? None of the three decision theories we’ve discussed give the reflectively consistent response here, so we need to invent a new decision theory. The difficulty with any such theory is that it has to be able to justify sticking to its guns and one-boxing even after conditioning on the contents of the box.

In general, similar issues will arise whenever the recommendations made by a decision theory are not time-consistent. For instance, the decision that TDT prescribes for an agent with background knowledge K depends heavily on the information that TDT has at the time of prescription. This means that at different times, TDT will make different recommendations for what to do in the same situation (before entering the room TDT recommends one-boxing once in the room, while after entering the room TDT recommends two-boxing). This leads to suboptimal performance. Agents that can decide on one course of action and credibly precommit to it get certain benefits that aren’t available to agents that don’t have this ability. I think the clearest example of this is Parfit’s hitchhiker:

You are stranded in the desert, running out of water, and soon to die. A Predictor approaches and tells you that they will drive you to town only if they predict you will pay them $100 once you get there. All of EDT, CDT, and TDT wish that they could credibly precommit to paying the money once in town, but can’t. Once they are in town they no longer have any reason to pay the$100, since they condition on the fact that they . The fact that EDT, CDT, and TDT all have time-sensitive recommendations makes them worse off, leaving them all stranded in the desert to die. Each of these agents would willingly switch to a decision theory that doesn’t change their recommendations over time. How would such a decision theory work? It looks like we need a decision theory that acts as if it doesn’t know whether they’re in town even once in town, and acts as if it doesn’t know the contents of the box even after seeing them. One strategy for achieving this behavior is simple; you just decide on your strategy without ever conditioning on the fact that you are in town!

The decision theory that arises from this choice is appropriately named updateless decision theory (henceforth UDT). UDT is peculiar in that it never actually updates on any information when determining how to behave. That’s not to say that for UDT, the decision you make does not depend on the information you get through your lifetime. Instead, UDT tells you to choose a policy – a mapping from the possible pieces of information you might receive to possible decisions you could make – that maximizes the expected utility, calculated using your prior on possible worlds. This policy is set from time zero and never changes, and it determines how the UDT agent responds to any information they might receive at later points. So, for instance, a UDT agent reasons that adopting a policy of one-boxing in the transparent Newcomb case regardless of what you see maximizes expected utility as calculated using your prior. So once the UDT agent is in the room with the transparent box, it one-boxes. We can formalize this this by analogy with TDT:

One concern with this approach is that a UDT agent might end up making silly decisions as a result of not taking into account information that is relevant to their decisions. But once again, the UDT agent does take into account the information they learn in their lifetime. It’s merely that they decide what to do with that information before receiving it and never update this prescription. For example, suppose that a UDT agent anticipates facing exactly one decision problem in their life, regarding whether to push a button or not. They have a 50% prior credence that pushing the button will result in the loss of $10, and 50% that it will result in gaining$10. Now, at some point before they decide to push the button, they are given the information about whether pushing the button causes you to gain $10 or to lose$10. UDT deals with this by choosing a policy for how to respond to that information in either case. The expected utility maximizing policy here would be to push the button if you learn that pushing the button leads to gaining $10, and to not push the button if you learn the opposite. Since UDT chooses its preferred policy based on its prior, this recommendation never changes throughout a UDT agent’s lifetime. This seems to indicate that UDT will be self-recommending in the class of all decision-determined problems, although I’m not aware of a full proof of this. If this is correct, then we have reached our goal of finding a self-recommending decision theory. It is interesting to consider what other principles of rational choice ended up being violated along the way. The simple dominance principle that we started off by discussing appears to be an example of this. In the transparent Newcomb problem, there is only one possible world that the agent considers when in the room (the one in which the box is full, say), and in this one world, two-boxing dominates one-boxing. Given that the box is full, your decision to one-box or to two-box is completely independent of the box’s contents. So the simple dominance principle recommends two-boxing. But UDT disagrees. Another example of a deeply intuitive principle that UDT violates is the irrelevance of impossible outcomes. This principles says that impossible outcomes should not factor into your decision-making process. But UDT seems to often recommend acting as if some impossible world might come to be. For instance, suppose a predictor walks up to you and gives you a choice to either give them$10 or to give them $100. You will not face any future consequences on the basis of your decision (besides whether you’re out$100 or only $10). However, you learn that the predictor only approached you because it predicted that you would give the$10. Do you give the $10 or the$100? UDT recommends giving the $100, because agents that do so are less likely to have been approached by the predictor. But if you’ve already been approached, then you are letting considerations about an impossible world influence your decision process! Our quest for reflective consistency took us from EDT and CDT to timeless decision theory. TDT used the notion of logical dependence to get self-recommending behavior in the Newcomb problem and medical Newcomb cases. But we found that TDT was itself reflectively inconsistent in problems like the Transparent Newcomb problem. This led us to create a new theory that made its recommendations without updating on information, which we called updateless decision theory. UDT turned out to be a totally static theory, setting forth a policy determining how to respond to all possible bits of information and never altering this policy. The unchanging nature of UDT indicates the possibility that we have found a truly self-recommending decision theory, while also leading to some quite unintuitive consequences. # Decision Theory Everywhere below where a Predictor is mentioned, assume that their predictions are made by scanning your brain to create a highly accurate simulation of you and then observing what this simulation does. All the below scenarios are one-shot games. Your action now will not influence the future decision problems you end up in. Newcomb’s Problem Two boxes: A and B. B contains$1,000. A contains $1,000,000 if the Predictor thinks you will take just A, and$0 otherwise. Do you take just A or both A and B?

Transparent Newcomb, Full Box
Newcomb problem, but you can see that box A contains $1,000,000. Do you take just A or both A and B? Transparent Newcomb, Empty Box Newcomb problem, but you can see that box A contains nothing. Do you take just A or both A and B? Newcomb with Precommitment Newcomb’s problem, but you have the ability to irrevocably resolve to take just A in advance of the Predictor’s prediction (which will still be just as good if you do precommit). Should you precommit? Take Opaque First Newcomb’s problem, but you have already taken A and it has been removed from the room. Should you now also take B or leave it behind? Smoking Lesion Some people have a lesion that causes cancer as well as a strong desire to smoke. Smoking doesn’t cause cancer and you enjoy it. Do you smoke? Smoking Lesion, Unconscious Inclination Some people have a lesion that causes cancer as well as a strong unconscious inclination to smoke. Smoking doesn’t cause cancer and you enjoy it. Do you smoke? Smoking and Appletinis Drinking a third appletini is the kind of act much more typical of people with addictive personalities, who tend to become smokers. I’d like to drink a third appletini, but I really don’t want to be a smoker. Should I order the appletini? Expensive Hospital You just got into an accident which gave you amnesia. You need to choose to be treated at either a cheap hospital or an expensive one. The quality of treatment in the two is the same, but you know that billionaires, due to unconscious habit will be biased towards using the expensive one. Which do you choose? Rocket Heels and Robots The world contains robots and humans, and you don’t know which you are. Robots rescue people whenever possible and have rockets in their heels that activate whenever necessary. Your friend falls down a mine shaft and will die soon without robotic assistance. Should you jump in after them? Death in Damascus If you and Death are in the same city tomorrow, you die. Death is a perfect predictor, and will come where he predicts you will be. You can stay in Damascus or pay$1000 to ﬂee to Aleppo. Do you stay or ﬂee?

Psychopath Button
If you press a button, all psychopaths will be killed. Only a psychopath would press such a button. Do you press the button?

Parfit’s Hitchhiker
You are stranded in the desert, running out of water, and soon to die. A Predictor will drive you to town only if they predict you will pay them $1000 once you get there. You have been brought into town. Do you pay? XOR Blackmail An honest predictor sends you this letter: “I sent this letter because I predicted that you have termites iﬀ you won’t send me$100. Send me $100.” Do you send the money? Twin Prisoner’s Dilemma You are in a prisoner’s dilemma with a twin of yourself. Do you cooperate or defect? Predictor Extortion A Predictor approaches you and threatens to torture you unless you hand over$100. They only approached you because they predicted beforehand that you would hand over the $100. Do you pay up? Counterfactual Mugging Predictor ﬂips coin which lands heads, and approaches you and asks you for$100. If the coin had landed tails, it would have tortured you if it predicted you wouldn’t give the $100. Do you give? Newcomb’s Soda You have 50% credence that you were given Soda 1, and 50% that you were given Soda 2. Those that had Soda 1 have a strong unconscious inclination to choose chocolate ice cream and will be given$1,000,000. Those that had Soda 2 have a strong unconscious inclination to choose vanilla ice cream and are given nothing. If you choose vanilla ice cream, you get $1000. Do you choose chocolate or vanilla ice cream? Meta-Newcomb Problem Two boxes: A and B. A contains$1,000. Box B will contain either nothing or $1,000,000. What B will contain is (or will be) determined by a Predictor just as in the standard Newcomb problem. Half of the time, the Predictor makes his move before you by predicting what you’ll do. The other half, the Predictor makes his move after you by observing what you do. There is a Metapredictor, who has an excellent track record of predicting Predictor’s choices as well as your own. The Metapredictor informs you that either (1) you choose A and B and Predictor will make his move after you make your choice, or else (2) you choose only B, and Predictor has already made his choice. Do you take only B or both A and B? # Sapiens: How Shared Myths Change the World I recently read Yuval Noah Harari’s book Sapiens and loved it. In additional to fascinating and disturbing details about the evolutionary history of Homo sapiens and a wonderful account of human history, he has a really interesting way of talking about the cognitive abilities that make humans distinct from other species. I’ll dive right into this latter topic in this post. Imagine two people in a prisoner’s dilemma. To try to make it relevant to our ancestral environment, let’s say that they are strangers running into one another, and each see that the other has some resources. There are four possible outcomes. First, they could both cooperate and team up to catch some food that neither would be able to get on their own, and then share the food. Second, they could both defect, attacking each other and both walking away badly injured. And third and fourth, one could cooperate while the other defects, corresponding to one of them stabbing the other in the back and taking their resources. (Let’s suppose that each of the two are currently holding resources of more value than they could obtain by teaming up and hunting.) Now, the problem is that on standard accounts of rational decision making, the decision that maximizes expected reward for each individual is to defect. That’s bad! The best outcome for everybody is that the two team up and share the loot, and neither walks away injured! You might just respond “Well, who cares about what our theory of rational decision making says? Humans aren’t rational.” We’ll come back to this in a bit. But for now I’ll say that the problem is not just that our theory of rationality says that we should defect. It’s that this line of reasoning implies that cooperating is an unstable strategy. Imagine a society fully populated with cooperators. Now suppose an individual appears with a mutation that causes them to defect. This defector outperforms the cooperators, because they get to keep stabbing people in the back and stealing their loot and never have to worry about anybody doing the same to them. The result is then that the “gene for defecting” (speaking very metaphorically at this point; the behavior doesn’t necessarily have to be transmitted genetically) spreads like a virus through the population, eventually transforming our society of cooperators to a society of defectors. And everybody’s worse off. One the other hand, imagine a society full of defectors. What if a cooperator is born into this society? Well, they pretty much right away get stabbed in the back and die out. So a society of defectors stays a society of defectors, and a society of cooperators degenerates into a society of defectors. The technical way of speaking about this is to say that in prisoner’s dilemmas, cooperation is not a Nash equilibrium – a strategy that is stable against mutations when universally adopted. The only Nash equilibrium is universal defection. Okay, so this is all bad news. We have good game theoretic reasons to expect society to degenerate into a bunch of people stabbing each other in the back. But mysteriously, the record of history has humans coming together to form larger and larger cooperative institutions. What Yuval Noah Harari and many others argue is that the distinctively human force that saves us from these game theoretic traps and creates civilizations is the power of shared myths. For instance, suppose that the two strangers happened to share a belief in a powerful all-knowing God that punishes defectors in the afterlife and rewards cooperators. Think about how this shifts the reasoning. Now each person thinks “Even if I successfully defect and loot this other person’s resources, I still will have hell to pay in the afterlife. It’s just not worth it to risk incurring God’s wrath! I’ll cooperate.” And thus we get a cooperative equilibrium! Still you might object “Okay, but what if an atheist is born into this society of God-fearing cooperative people? They’ll begin defecting and successfully spread through the population, right? And then so much for your cooperative equilibrium.” The superbly powerful thing about these shared myths is the way in which they can restructure society around them. So for instance, it would make sense for a society with the cooperator-punishing God myth to develop social norms around punishing defectors. The mythical punishment becomes an actual real-world punishment by the myth’s adherents. And this is enough to tilt the game-theoretic balance even for atheists. The point being: The spreading of a powerful shared myth can shift the game theoretic structure of the world, altering the landscape of possible social structures. What’s more, such myths can increase the overall fitness of a society. And we need not rely on group selection arguments here; the presence of the shared myth increases the fitness of every individual. A deeper point is that the specific way in which the landscape is altered depends on the details of the shared myth. So if we contrast the God myth above to a God that punishes defectors but also punishes mortals who punish defectors, we lose the stability property that we sought. The suggestion being: different ideas alter the game theoretic balance of the world in different ways, and sometimes subtle differences can be hugely important. Another take-away from this simple example is that shared myths can become embodied within us, both in our behavior and in our physiology. Thus we come back to the “humans aren’t rational” point: The cooperator equilibrium becomes more stable if the God myth somehow becomes hardwired into our brains. These ideas take hold of us and shape us in their image. Let’s go further into this. In our sophisticated secular society, it’s not too controversial to refer to the belief in all-good and all-knowing gods as a myth. But Yuval Noah Harari goes further. To him, the concept of the shared myth goes much deeper than just our ideas about the supernatural. In fact, most of our native way of viewing the world consists of a network of shared myths and stories that we tell one another. After all, the universe is just physics. We’re atoms bumping into one another. There are no particles of fairness or human rights, no quantum fields for human meaning or karmic debts. These are all shared myths. Economic systems consist of mostly shared stories that we tell each other, stories about how much a dollar bill is worth and what the stock price of Amazon is. None of these things are really out there in the world. They are in our brains, and they are there for an important reason: they open up the possibility for societal structures that would otherwise be completely impossible. Imagine having a global trade network without the shared myth of the value of money. Or a group of millions of humans living packed together in a city that didn’t all on some level believe in the myths of human value and respect. Just think about this for a minute. Humans have this remarkable ability to radically change our way of interacting with one another and our environments by just changing the stories that we tell one another. We are able to do this because of two features of our brains. First, we are extraordinarily creative. We can come up with ideas like money and God and law and democracy and whole-heartedly believe in them, to the point that we are willing to sacrifice our lives for them. Second, we are able to communicate these ideas to one another. This allows the ideas to spread and become shared myths. And most remarkably, all of these ideas (capitalism and communism, democracy and fascism) are running on essentially the same hardware! In Harari’s words: While the behaviour patterns of archaic humans remained fixed for tens of thousands of years, Sapiens could transform their social structures, the nature of their interpersonal relations, their economic activities and a host of other behaviours within a decade or two. Consider a resident of Berlin, born in 1900 and living to the ripe age of one hundred. She spent her childhood in the Hohenzollern Empire of Wilhelm II; her adult years in the Weimar Republic, the Nazi Third Reich and Communist East Germany; and she died a citizen of a democratic and reunited Germany. She had managed to be a part of five very different sociopolitical systems, though her DNA remained exactly the same. # Anthropic reasoning in everyday life Thought experiment from a past post: A stranger comes up to you and offers to play the following game with you: “I will roll a pair of dice. If they land snake eyes (i.e. they both land 1), you give me one dollar. Otherwise, if they land anything else, I give you a dollar.” Do you play this game? […] Now imagine that the stranger is playing the game in the following way: First they find one person and offer to play the game with them. If the dice land snake eyes, then they collect a dollar and stop playing the game. Otherwise, they find ten new people and offer to play the game with them. Same as before: snake eyes, the stranger collects$1 from each and stops playing, otherwise he moves on to 100 new people. Et cetera forever.

When we include this additional information about the other games the stranger is playing, then the thought experiment becomes identical in form to the dice killer thought experiment. Thus updating on the anthropic information that you have been kidnapped gives a 90% chance of snake-eyes, which means you have a 90% chance of losing a dollar and only a 10% chance of gaining a dollar. Apparently you should now not take the offer!

This seems a little weird. Shouldn’t it be irrelevant if the game if being offered to other people? To an anthropic reasoner, the answer is a resounding no. It matters who else is, or might be, playing the game, because it gives us additional information about our place in the population of game-players.

Thus far this is nothing new. But now we take one more step: Just because you don’t know the spatiotemporal distribution of game offers doesn’t mean that you can ignore it!

So far the strange implications of anthropic reasoning have been mostly confined to bizarre thought experiments that don’t seem too relevant to the real world. But the implication of this line of reasoning is that anthropic calculations bleed out into ordinary scenarios. If there is some anthropically relevant information that would affect your probabilities, then you need to consider the probability that this information

In other words, if somebody comes up to you and makes you the offer described above, you can’t just calculate the expected value of the game and make your decision. Instead, you have to consider all possible distributions of game offers, calculate the probability of each, and average over the implied probabilities! This is no small order.

For instance, suppose that you have a 50% credence that the game is being offered only one time to one person: you. The other 50% is given to the “dice killer” scenario: that the game is offered in rounds to a group that decuples in size each round, and that this continues until the dice finally land snake-eyes. Presumably you then have to average over the expected value of playing the game for each scenario.

$EV_1 = - \1 \cdot \frac{35}{36} + \1 \cdot \frac{1}{36} = \ \frac{34}{36} \approx \0.94 \\~\\ EV_2 = \1 \cdot 0.1 + - \1 \cdot 0.9 = - \ 0.80 \\~\\ EV = 0.50 \cdot EV_1 + 0.50 \cdot EV_2 \approx \ .07$

In this case, the calculation wasn’t too bad. But that’s because it was highly idealized. In general, representing your knowledge of the possible distributions of games offered seems quite difficult. But the more crucial point is that it is apparently not enough to go about your daily life calculating the expected value of the decisions facing you. You have to also consider who else might be facing the same decisions, and how this influences your chances of winning.

Can anybody think of a real-life example where these considerations change the sign of the expected value calculation?

# Pushing anti-anthropic intuitions

A stranger comes up to you and offers to play the following game with you: “I will roll a pair of dice. If they land snake eyes (i.e. they both land 1), you give me one dollar. Otherwise, if they land anything else, I give you a dollar.”

Do you play this game?

Here’s an intuitive response: Yes, of course you should! You have a 35/36 chance of gaining $1, and only a 1/36 chance of losing$1. You’d have to be quite risk averse to refuse those odds.

What if the stranger tells you that they are giving this same bet to many other people? Should that change your calculation?

Intuitively: No, of course not! It doesn’t matter what else the stranger is doing with other people.

What if they tell you that they’ve given this offer to people in the past, and might give the offer to others in the future? Should that change anything?

Once again, it seems intuitively not to matter. The offers given to others simply have nothing to do with you. What matters are your possible outcomes and the probabilities of each of these outcomes. And what other people are doing has nothing to do with either of these.

… Right?

Now imagine that the stranger is playing the game in the following way: First they find one person and offer to play the game with them. If the dice land snake eyes, then they collect a dollar and stop playing the game. Otherwise, they find ten new people and offer to play the game with them. Same as before: snake eyes, the stranger collects $1 from each and stops playing, otherwise he moves on to 100 new people. Et cetera forever. We now ask the question: How does the average person given the offer do if they take the offer? Well, no matter how many rounds of offers the stranger gives, at least 90% of people end up in his last round. That means that at least 90% of people end up giving over$1 and at most 10% gain $1. This is clearly net negative for those that hand over money! Think about it this way: Imagine a population of individuals who all take the offer, and compare them to a population that all reject the offer. Which population does better on average? For the population who takes the offer, the average person loses money. An upper bound on how much they lose is 10% ($1) + 90% (-$1) = -$.80. For the population that reject the offer, nobody gains money or loses It either: the average case is exactly $0.$0 is better than -$.80, so the strategy of rejecting the offer is better, on average! This thought experiment is very closely related to the dice killer thought experiment. I think of it as a variant that pushes our anti-anthropic-reasoning intuitions. It just seems really wrong to me that if somebody comes up to you and offers you this deal that has a 35/36 chance of paying out you should reject it. The details of who else is being offered the deal seem totally irrelevant. But of course, all of the previous arguments I’ve made for anthropic reasoning apply here as well. And it is just true that the average person that rejects the offer does better than the average person that accepts it. Perhaps this is just another bullet that we have to bite in our attempt to formalize rationality! # An expected value puzzle Consider the following game setup: Each round of the game starts with you putting in all of your money. If you currently have$10, then you must put in all of it to play. Now a coin is flipped. If it lands heads, you get back 10 times what you put in ($100). If not, then you lose it all. You can keep playing this game until you have no more money. What does a perfectly rational expected value reasoner do? Supposing that this reasoner’s sole goal is to maximize the quantity of money that they own, then the expected value for putting in the money is always greater than 0. If you put in$X, then you stand a 50% chance of getting $10X back and a 50% chance of losing$X. Thus, your expected value is 5X – X/2 = 9X/2.

This means that the expected value reasoner that wants to maximize their winnings would keep putting in their money until, eventually, they lose it all.

What’s wrong with this line of reasoning (if anything)? Does it serve as a reductio ad absurdum of expected value reasoning?

# The Anthropic Dice Killer

Today we discuss anthropic reasoning.

## The Problem

Imagine the following scenario:

One piece of information that you have is that you are aware of the maniacal schemes of your captor. His plans began by capturing one random person. He then rolled a pair of dice to determine their fate. If the dice landed snake eyes (both 1), then the captive would be killed. If not, then they would be let free.

But if they are let free, the killer will search for new victims, and this time bring back ten new people and lock them alone in rooms. He will then determine their fate just as before, with a pair of dice. Snake eyes means they die, otherwise they will be let free and he will search for new victims (ten times as many as he just let free).

His murder spree will continue until the first time he rolls snake eyes. Then he will kill the group that he currently has imprisoned and retire from the serial-killer life.

Now. You become aware of a risky way out of the room you are locked in and to freedom. The chances of surviving this escape route are only 50%. Your choices are thus either (1) to traverse the escape route with a 50% chance of survival or (2) to just wait for the killer to roll his dice, and hope that it doesn’t land snake eyes.

What should you do?

Your chance of dying if you stay and wait is just the chance that the dice lands snake eyes. The probability of snake eyes is just 1/36 (1/6 for each dice landing 1).

So your chance of death is only 1/36 (≈ 3%) if you wait, and it’s 50% if you try to run for it. Clearly, you are better off waiting!

## But…

You guessed it, things aren’t that easy. You have extra information about your situation besides just how the dice works, and you should use it. In particular, the killing pattern of your captor turns out to be very useful information.

Ask the following question: Out of all of the people that have been captured or will be captured at some point by this madman, how many of them will end up dying? This is just the very last group, which, incidentally, is the largest group.

Consider: if the dice land snake eyes the first time they are rolled, then only one person is ever captured, and this person dies. So the fraction of those captured that die is 100%.

If they lands snake eyes the second time they are rolled, then 11 people total are captured, 10 of whom die. So the fraction of those captured that die is 10/11, or ≈ 91%.

If it’s the third time, then 111 people total are captured, 100 of whom die. Now the fraction is just over 90%.

In general, no matter how many times the dice rolls before landing snake eyes, it always ends up that over 90% of those captured end up being in the last round, and thus end up dying.

So! This looks like bad news for you… you’ve been captured, and over 90% of those that are captured always die. Thus, your chance of death is guaranteed to be greater than 90%.

The escape route with a 50% survival chance is looking nicer now, right?

## Wtf is this kind of reasoning??

What we just did is called anthropic reasoning. Anthropic reasoning really just means updating on all of the information available to you, including indexical information (information about your existence, age, location, and so on). In this case, the initial argument neglected the very crucial information that you are one of the people that were captured by the killer. When updating on this information, we get an answer that is very very different from what we started with. And in this life-or-death scenario, this is an important difference!

You might still feel hesitant about the answer we got. After all, if you expect a 90% chance of death, this means that you expect a 90% chance for the dice to land snake eyes. But it’s not that you think the dice are biased or anything… Isn’t this just blatantly contradictory?

This is a convincing-sounding rebuttal, but it’s subtly wrong. The key point is that even though the dice are fair, there is a selection bias in the results you are seeing. This selection bias amounts to the fact that when the dice inevitably lands snake-eyes, there are more people around to see it. The fact that you are more likely than 1/36 to see snake-eyes is kind of like the fact that if you are given the ticket of a random concert-goer, you have a higher chance of ending seeing a really popular band than if you just looked at the current proportion of shows performed by really popular bands.

It’s kind of like the fact that in your life you will spend more time waiting in long lines than short lines, and that on average your friends have more friends than you. This all seems counterintuitive and wrong until you think closely about the selection biases involved.

Anyway, I want to impress upon you that 90% really is the right answer, so I’ll throw some math at you. Let’s calculate in full detail what fraction of the group ends up surviving on average.

By the way, the discrepancy between the baseline chance of death (1/36) and the anthropic chance of death (90%) can be made as large as you like by manipulating the starting problem. Suppose that instead of 1/36, the chance of the group dying was 1/100, and instead of the group multiplying by 10 in size each round, it grew by a factor of 100. Then the baseline chance of death would be 1%, and the anthropic probability would be 99%.

We can find the general formula for any such scenario:

IF ANYBODY CAN SOLVE THIS, PLEASE TELL ME! I’ve been trying for too long now and would really like an analytic general solution. 🙂

There is a lot more to be said about this thought experiment, but I’ll leave it there for now. In the next post, I’ll present a slight variant on this thought experiment that appears to give us a way to get direct Bayesian evidence for different theories of consciousness! Stay tuned.

# What do I find conceptually puzzling?

There are lots of things that I don’t know, like, say, what the birth rate in Sweden is or what the effect of poverty on IQ is. There are also lots of things that I find really confusing and hard to understand, like quantum field theory and monetary policy. There’s also a special category of things that I find conceptually puzzling. These things aren’t difficult to grasp because the facts about them are difficult to understand or require learning complicated jargon. Instead, they’re difficult to grasp because I suspect that I’m confused about the concepts in use.

This is a much deeper level of confusion. It can’t be adjudicated by just reading lots of facts about the subject matter. It requires philosophical reflection on the nature of these concepts, which can sometimes leave me totally confused about everything and grasping for the solid ground of mere factual ignorance.

As such, it feels like a big deal when something I’ve been conceptually puzzled about becomes clear. I want to compile a list for future reference of things that I’m currently conceptually puzzled about and things that I’ve become un-puzzled about. (This is not a complete list, but I believe it touches on the major themes.)

# Things I’m conceptually puzzled about

### What is the relationship between consciousness and physics?

Essentially, at this point every available viewpoint on consciousness seems wrong to me.

Eliminativism amounts to a denial of pretty much the only thing that we can be sure can’t be denied – that we are having conscious experiences. Physicalism entails the claim that facts about conscious experience can be derived from laws of physics, which is wrong as a matter of logic.

Dualism entails that the laws of physics by themselves cannot account for the behavior of the matter in our brains, which is wrong. And epiphenomenalism entails that our beliefs about our own conscious experience are almost certainly wrong, and are no better representations of our actual conscious experiences than random chance.

### How do we make sense of decision theory if we deny libertarian free will?

Decision theory is ultimately about finding the decision D that maximizes expected utility EU(D). But to do this calculation, we have to decide what the set of possible decisions we are searching is.

Make this set too large, and you end up getting fantastical and impossible results (like that the optimal decision is to snap your fingers and make the world into a utopia). Make it too small, and you end up getting underwhelming results (in the extreme case, you just get that the optimal decision is to do exactly what you are going to do, since this is the only thing you can do in a strictly deterministic world).

We want to find a nice middle ground between these two – a boundary where we can say “inside here the things that are actually possible for us to do, and outside are those that are not.” But any principled distinction between what’s in the set and what’s not must be based on some conception of some actions being “truly possible” to us, and others being truly impossible. I don’t know how to make this distinction in the absence of a robust conception of libertarian free will.

### Are there objectively right choices of priors?

If you say no, then there are no objectively right answers to questions like “What should I believe given the evidence I have?” And if you say yes, then you have to deal with thought experiments like the cube problem, where any choice of priors looks arbitrary and unjustifiable.

(If you are going to be handed a cube, and all you know is that it has a volume less than 1 cm3, then setting maximum entropy priors over volumes gives different answers than setting maximum entropy priors over side areas or side lengths. This means that what qualifies as “maximally uncertain” depends on whether we frame our reasoning in terms of side length, areas, or cube volume. Other approaches besides MaxEnt have similar problems of concept dependence.)

### How should we deal with infinities in decision theory?

The basic problem is that expected utility theory does great at delivering reasonable answers when the rewards are finite, but becomes wacky when the rewards become infinite. There are a huge amount of examples of this. For instance, in the St. Petersburg paradox, you are given the option to play a game with an infinite expected payout, suggesting that you should buy in to the game no matter how high the cost. You end up making obviously irrational choices, such as spending \$1,000,000 on the hope that a fair coin will land heads 20 times in a row. Variants of this involve the inability of EU theory to distinguish between obviously better and worse bets that have infinite expected value.

And Pascal’s mugging is an even worse case. Roughly speaking, a person comes up to you and threatens you with infinite torture if you don’t submit to them and give them 20 dollars. Now, the probability that this threat is credible is surely tiny. But it is non-zero! (as long as you don’t think it is literally logically impossible for this threat to come true)

An infinite penalty times a finite probability is still an infinite expected penalty. So we stand to gain an infinite expected utility by just handing over the 20 dollars. This seems ridiculous, but I don’t know any reasonable formalization of decision theory that allows me to refute it.

### Is causality fundamental?

Causality has been nicely formalized by Pearl’s probabilistic graphical models. This is a simple extension of probability theory, out of which naturally falls causality and counterfactuals.

One can use this framework to represent the states of fundamental particles and how they change over time and interact with one another. What I’m confused about is that in some ways of looking at it, the causal relations appear to be useful but un-fundamental constructs for the sake of easing calculations. In other ways of looking at it, causal relations are necessarily built into the structure of the world, and we can go out and empirically discover them. I don’t know which is right. (Sorry for the vagueness in this one – it’s confusing enough to me that I have trouble even precisely phrasing the dilemma).

### How should we deal with the apparent dependence of inductive reasoning upon our choices of concepts?

I’ve written about this here. Beyond just the problem of concept-dependence in our choices of priors, there’s also the problem presented by the grue/bleen thought experiment.

This thought experiment proposes two new concepts: grue (= the set of things that are either green before 2100 or blue after 2100) and bleen (the inverse of grue). It then shows that if we reasoned in terms of grue and bleen, standard induction would have us concluding that all emeralds will suddenly turn blue after 2100. (We repeatedly observed them being grue before 2100, so we should conclude that they will be grue after 2100.)

In other words, choose the wrong concepts and induction breaks down. This is really disturbing – choices of concepts should be merely pragmatic matters! They shouldn’t function as fatal epistemic handicaps. And given that they appear to, we need to develop some criterion we can use to determine what concepts are good and what concepts are bad.

The trouble with this is that the only proposals I’ve seen for such a criterion reference the idea of concepts that “carve reality at its joints”; in other words, the world is composed of green and blue things, not grue and bleen things, so we should use the former rather than the latter. But this relies on the outcome of our inductive process to draw conclusions about the starting step on which this outcome depends!

I don’t know how to cash out “good choices of concepts” without ultimately reasoning circularly. I also don’t even know how to make sense of the idea of concepts being better or worse for more than merely pragmatic reasons.

### How should we reason about self defeating beliefs?

The classic self-defeating belief is “This statement is a lie.” If you believe it, then you are compelled to disbelieve it, eliminating the need to believe it in the first place. Broadly speaking, self-defeating beliefs are those that undermine the justifications for belief in them.

Here’s an example that might actually apply in the real world: Black holes glow. The process of emission is known as Hawking radiation. In principle, any configuration of particles with a mass less than the black hole can be emitted from it. Larger configurations are less likely to be emitted, but even configurations such as a human brain have a non-zero probability of being emitted. Henceforth, we will call such configurations black hole brains.

Now, imagine discovering some cosmological evidence that the era in which life can naturally arise on planets circling stars is finite, and that after this era there will be an infinite stretch of time during which all that exists are black holes and their radiation. In such a universe, the expected number of black hole brains produced is infinite (a tiny finite probability multiplied by an infinite stretch of time), while the expected number of “ordinary” brains produced is finite (assuming a finite spatial extent as well).

What this means is that discovering this cosmological evidence should give you an extremely strong boost in credence that you are a black hole brain. (Simply because most brains in your exact situation are black hole brains.) But most black hole brains have completely unreliable beliefs about their environment! They are produced by a stochastic process which cares nothing for producing brains with reliable beliefs. So if you believe that you are a black hole brain, then you should suddenly doubt all of your experiences and beliefs. In particular, you have no reason to think that the cosmological evidence you received was veridical at all!

I don’t know how to deal with this. It seems perfectly possible to find evidence for a scenario that suggests that we are black hole brains (I’d say that we have already found such evidence, multiple times). But then it seems we have no way to rationally respond to this evidence! In fact, if we do a naive application of Bayes’ theorem here, we find that the probability of receiving any evidence in support of black hole brains to be 0!

So we have a few options. First, we could rule out any possible skeptical scenarios like black hole brains, as well as anything that could provide any amount of evidence for them (no matter how tiny). Or we could accept the possibility of such scenarios but face paralysis upon actually encountering evidence for them! Both of these seem clearly wrong, but I don’t know what else to do.

### How should we reason about our own existence and indexical statements in general?

This is called anthropic reasoning. I haven’t written about it on this blog, but expect future posts on it.

A thought experiment: imagine a murderous psychopath who has decided to go on an unusual rampage. He will start by abducting one random person. He rolls a pair of dice, and kills the person if they land snake eyes (1, 1). If not, he lets them free and hunts down ten new people. Once again, he rolls his pair of die. If he gets snake eyes he kills all ten. Otherwise he frees them and kidnaps 100 new people. On and on until he eventually gets snake eyes, at which point his murder spree ends.

Now, you wake up and find that you have been abducted. You don’t know how many others have been abducted alongside you. The murderer is about to roll the dice. What is your chance of survival?

Your first thought might be that your chance of death is just the chance of both dice landing 1: 1/36. But think instead about the proportion of all people that are ever abducted by him that end up dying. This value ends up being roughly 90%! So once you condition upon the information that you have been captured, you end up being much more worried about your survival chance.

But at the same time, it seems really wrong to be watching the two dice tumble and internally thinking that there is a 90% chance that they land snake eyes. It’s as if you’re imagining that there’s some weird anthropic “force” pushing the dice towards snake eyes. There’s way more to say about this, but I’ll leave it for future posts.

# Things I’ve become un-puzzled about

### Newcomb’s problem – one box or two box?

To almost everyone, it is perfectly clear and obvious what should be done. The difficulty is that these people seem to divide almost evenly on the problem, with large numbers thinking that the opposing half is just being silly.

– Nozick, 1969

I’ve spent months and months being hopelessly puzzled about Newcomb’s problem. I now am convinced that there’s an unambiguous right answer, which is to take the one box. I wrote up a dialogue here explaining the justification for this choice.

In a few words, you should one-box because one-boxing makes it nearly certain that the simulation of you run by the predictor also one-boxed, thus making it nearly certain that you will get 1 million dollars. The dependence between your action and the simulation is not an ordinary causal dependence, nor even a spurious correlation – it is a logical dependence arising from the shared input-output structure. It is the same type of dependence that exists in the clone prisoner dilemma, where you can defect or cooperate with an individual you are assured is identical to you in every single way. When you take into account this logical dependence (also called subjunctive dependence), the answer is unambiguous: one-boxing is the way to go.

# Summing up:

Things I remain conceptually confused about:

• Consciousness
• Decision theory & free will
• Objective priors
• Infinities in decision theory
• Fundamentality of causality
• Dependence of induction on concept choice
• Self-defeating beliefs
• Anthropic reasoning

I recently showed the famous Monty Hall problem to a friend. This friend solved the problem right away, and we realized quickly that the standard presentation of the problem is highly misleading.

Here’s the setup as it was originally described in the magazine column that made it famous:

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?

I encourage you to think through this problem for yourself and come to an answer. Will provide some blank space so that you don’t accidentally read ahead.

Now, the writer of the column was Marilyn vos Savant, famous for having an impossible IQ of 228 according to an interpretation of a test that violated “almost every rule imaginable concerning the meaning of IQs” (psychologist Alan Kaufman). In her response to the problem, she declared that switching gives you a 2/3 chance of winning the car, as opposed to a 1/3 chance for staying. She argued by analogy:

Yes; you should switch. The first door has a 1/3 chance of winning, but the second door has a 2/3 chance. Here’s a good way to visualize what happened. Suppose there are a million doors, and you pick door #1. Then the host, who knows what’s behind the doors and will always avoid the one with the prize, opens them all except door #777,777. You’d switch to that door pretty fast, wouldn’t you?

Notice that this answer contains a crucial detail that is not contained in the statement of the problem! Namely, the answer adds the stipulation that the host “knows what’s behind the doors and will always avoid the one with the prize.”

The original statement of the problem in no way implies this general statement about the host’s behavior. All you are justified to assume in an initial reading of the problem are the observational facts that (1) the host happened to open door No. 3, and (2) this door happened to contain a goat.

When nearly a thousand PhDs wrote in to the magazine explaining that her answer was wrong, she gave further arguments that failed to reference the crucial point; that her answer was only true given additional unstated assumptions.

My original answer is correct. But first, let me explain why your answer is wrong. The winning odds of 1/3 on the first choice can’t go up to 1/2 just because the host opens a losing door. To illustrate this, let’s say we play a shell game. You look away, and I put a pea under one of three shells. Then I ask you to put your finger on a shell. The odds that your choice contains a pea are 1/3, agreed? Then I simply lift up an empty shell from the remaining other two. As I can (and will) do this regardless of what you’ve chosen, we’ve learned nothing to allow us to revise the odds on the shell under your finger.

Notice that this argument is literally just a restatement of the original problem. If one didn’t buy the conclusion initially, restating it in terms of peas and shells is unlikely to do the trick!

This problem was made even more famous by this scene in the movie “21”, in which the protagonist demonstrates his brilliance by coming to the same conclusion as vos Savant. While the problem is stated slightly better in this scene, enough ambiguity still exists that the proper response should be that the problem is underspecified, or perhaps a set of different answers for different sets of auxiliary assumptions.

The wiki page on this ‘paradox’ describes it as a veridical paradox, “because the correct choice (that one should switch doors) is so counterintuitive it can seem absurd, but is nevertheless demonstrably true.”

Later on the page, we see the following:

In her book The Power of Logical Thinking, vos Savant (1996, p. 15) quotes cognitive psychologist Massimo Piattelli-Palmarini as saying that “no other statistical puzzle comes so close to fooling all the people all the time,” and “even Nobel physicists systematically give the wrong answer, and that they insist on it, and they are ready to berate in print those who propose the right answer.”

There’s something to be said about adequacy reasoning here; when thousands of PhDs and some of the most brilliant mathematicians in the world are making the same point, perhaps we are too quick to write it off as “Wow, look at the strength of this cognitive bias! Thank goodness I’m bright enough to see past it.”

In fact, the source of all of the confusion is fairly easy to understand, and I can demonstrate it in a few lines.

Solution to the problem as presented

Initially, all three doors are equally likely to contain the car.
So Pr(1) = Pr(2) = Pr(3) = ⅓

We are interested in how these probabilities update upon the observation that 3 does not contain the car.
Pr(1 | ~3) = Pr(1)・Pr(~3 | 1) / Pr(~3)
= (⅓ ・1) / ⅔ = ½

By the same argument,
Pr(2 | ~3) = ½

Voila. There’s the simple solution to the problem as it is presented, with no additional presumptions about the host’s behavior. Accepting this argument requires only accepting three premises:

(1) Initially all doors are equally likely to be hiding the car.

(2) Bayes’ rule.

(3) There is only one car.

(3) implies that Pr(the car is not behind a door | the car is behind a different door) = 100%, which we use when we replace Pr(~3 | 1) with 1.

The answer we get is perfectly obvious; in the end all you know is that the car is either in door 1 or door 2, and that you picked door 1 initially. Since which door you initially picked has nothing to do with which door the car was behind, and the host’s decision gives you no information favoring door 1 over door 2, the probabilities should be evenly split between the two.

It is also the answer that all the PhDs gave.

Now, why does taking into account the host’s decision process change things? Simply because the host’s decision is now contingent on your decision, as well as the actual location of the car. Given that you initially opened door 1, the host is guaranteed to not open door 1 for you, and is also guaranteed to not open up a door hiding the car.

Solution with specified host behavior

Initially, all three doors are equally likely to contain the car.
So Pr(1) = Pr(2) = Pr(3) = ⅓

We update these probabilities upon the observation that 3 does not contain the car, using the likelihood formulation of Bayes’ rule.

Pr(1 | open 3) / Pr(2 | open 3)
= Pr(1) / Pr(2)・Pr(open 3 | 1) / Pr(open 3 | 2)
= ⅓ / ⅓・½ / 1 = ½

So Pr(1 | open 3) = ⅓ and Pr(2 | open 3) = ⅔

Pr(open 3 | 2) = 1, because the host has no choice of which door to open if you have selected door 1 and the car is behind door 2.

Pr(open 3 | 1) = ½, because the host has a choice of either opening 2 or 3.

In fact, it’s worth pointing out that this requires another behavioral assumption about the host that is nowhere stated in the original post, or Savant’s solution. This is that if there is a choice about which of two doors to open, the host will pick randomly.

This assumption is again not obviously correct from the outset; perhaps the host chooses the larger of the two door numbers in such cases, or the one closer to themselves, or the one or the smaller number with 25% probability. There are an infinity of possible strategies the host could be using, and this particular strategy must be explicitly stipulated to get the answer that Wiki proclaims to be correct.

It’s also worth pointing out that once these additional assumptions are made explicit, the ⅓ answer is fairly obvious and not much of a paradox. If you know that the host is guaranteed to choose a door with a goat behind it, and not one with a car, then of course their decision about which door to open gives you information. It gives you information because it would have been less likely in the world where the car was under door 1 than in the world where the car was under door 2.

In terms of causal diagrams, the second formulation of the Monty Hall problem makes your initial choice of door and the location of the car dependent upon one another. There is a path of causal dependency that goes forwards from your decision to the host’s decision, which is conditioned upon, and then backward from the host’s decision to which door the car is behind.

Any unintuitiveness in this version of the Monty Hall problem is ultimately due to the unintuitiveness of the effects of conditioning on a common effect of two variables.

In summary, there is no paradox behind the Monty Hall problem, because there is no single Monty Hall problem. There are two different problems, each containing different assumptions, and each with different answers. The answers to each problem are fairly clear after a little thought, and the only appearance of a paradox comes from apparent disagreements between individuals that are actually just talking about different problems. There is no great surprise when ambiguous wording turns out multiple plausible solutions, it’s just surprising that so many people see something deeper than mere ambiguity here.