This might be surprising to those that know the basics of the double slit experiment. For those that don’t, very briefly:
A bunch of tiny particles are thrown one by one at a barrier with two thin slits in it, with a detector sitting on the other side. The pattern on the detector formed by the particles is an interference pattern, which appears to imply that each particle went through both slits in some sense, like a wave would do. Now, if you peek really closely at each slit to see which one each particle passes through, the results seem to change! The pattern on the detector is no longer an interference pattern, but instead looks like the pattern you’d classically expect from a particle passing through only one slit!
When you first learn about this strange dependence of the experimental results on, apparently, whether you’re looking at the system or not, it appears to be good evidence that your conscious observation is significant in some very deep sense. After all, observation appears to lead to fundamentally different behavior, collapsing the wave to a particle! Right?? This animation does a good job of explaining the experiment in a way that really pumps the intuition that consciousness matters:
(Fair warning, I find some aspects of this misleading and just plain factually wrong. I’m linking to it not as an endorsement, but so that you get the intuition behind the arguments I’m responding to in this post.)
The feeling that consciousness is playing an important role here is a fine intuition to have before you dive deep into the details of quantum mechanics. But now consider that the exact same behavior would be produced by a very simple process that is very clearly not a conscious observation. Namely, just put a single spin qubit at one of the slits in such a way that if the particle passes through that slit, it flips the spin upside down. Guess what you get? The exact same results as you got by peeking at the screen. You never need to look at the particle as it travels through the slits to the detector in order to collapse the wave-like behavior. Apparently a single qubit is sufficient to do this!
It turns out that what’s really going on here has nothing to do with the collapse of the wave function and everything to do with the phenomenon of decoherence. Decoherence is what happens when a quantum superposition becomes entangled with the degrees of freedom of its environment in such a way that the branches of the superposition end up orthogonal to each other. Interference can only occur between the different branches if they are not orthogonal, which means that decoherence is sufficient to destroy interference effects. This is all stuff that all interpretations of quantum mechanics agree on.
Once you know that decoherence destroys interference effects (which all interpretations of quantum mechanics agree on), and also that a conscious observing the state of a system is a process that results in extremely rapid and total decoherence (which everybody also agrees on), then the fact that observing the position of the particle causes interference effects to vanish becomes totally independent of the question of what causes wave function collapse. Whether or not consciousness causes collapse is 100% irrelevant to the results of the experiment, because regardless of which of these is true, quantum mechanics tells us to expect observation to result in the loss of interference!
This is why whether or not consciousness causes collapse has no real impact on what pattern shows up in the wall. All interpretations of quantum mechanics agree that decoherence is a thing that can happen, and decoherence is all that is required to explain the experimental results. The double slit experiment provides no evidence for consciousness causing collapse, but it also provides no evidence against it. It’s just irrelevant to the question! That said, however, given that people often hear the experiment presented in a way that makes it seem like evidence for consciousness causing collapse, hearing that qubits do the same thing should make them update downwards on this theory.
In the double slit experiment, particles travelling through a pair of thin slits exhibit wave-like behavior, forming an interference pattern where they land that indicates that the particles in some sense travelled through both slits.
Now, suppose that you place a single spin bit at the top slit, which starts off in the state |↑⟩ and flips to |↓⟩ iff a particle travels through the top slit. We fire off a single particle at a time, and then each time swap out that spin bit for a new spin bit that also starts off in the state |↑⟩. This serves as an extremely simple measuring device which encodes the information about which slit each particle went through.
Now what will you observe on the screen? It turns out that you’ll observe the classically expected distribution, which is a simple average over the two individual possibilities without any interference.
Okay, so what happened? Remember that the first pattern we observed was the result of the particles being in a superposition over the two possible paths, and then interfering with each other on the way to the detector screen. So it looks like simply having one bit of information recording the path of the particle was sufficient to collapse the superposition! But wait! Doesn’t this mean that the “consciousness causes collapse” theory is wrong? The spin bit was apparently able to cause collapse all by itself, so assuming that it isn’t a conscious system, it looks like consciousness isn’t necessary for collapse! Theory disproved!
No. As you might be expecting, things are not this simple. For one thing, notice that this ALSO would prove as false any other theory of wave function collapse that doesn’t allow single bits to cause collapse (including anything about complex systems or macroscopic systems or complex information processing). We should be suspicious of any simple argument that claims to conclusively prove a significant proportion of experts wrong.
To see what’s going on here, let’s look at what happens if we don’t assume that the spin bit causes the wave function to collapse. Instead, we’ll just model it as becoming fully entangled with the path of the particle, so that the state evolution over time looks like the following:
Now if we observe the particle’s position on the screen, the probability distribution we’ll observe is given by the Born rule. Assuming that we don’t observe the states of the spin bits, there are now two qualitatively indistinguishable branches of the wave function for each possible position on the screen. This means that the total probability for any given landing position will be given by the sum of the probabilities of each branch:
But hold on! Our final result is identical to the classically expected result! We just get the probability of the particle getting to |j⟩ from |A⟩, multiplied by the probability of being at |A⟩ in the first place (50%), plus the probability of the particle going from |B⟩ to |j⟩ times the same 50% for the particle getting to |B⟩.
In other words, our prediction is that we’d observe the classical pattern of a bunch of individual particles, each going through exactly one slit, with 50% going through the top slit and 50% through the bottom. The interference has vanished, even though we never assumed that the wave function collapsed!
What this shows is that wave function collapse is not required to get particle-like behavior. All that’s necessary is that the different branches of the superposition end up not interfering with each other. And all that’s necessary for that is environmental decoherence, which is exactly what we had with the single spin bit!
In other words, environmental decoherence is sufficient to produce the same type of behavior that we’d expect from wave function collapse. This is because interference will only occur between non-orthogonal branches of the wave function, and the branches become orthogonal upon decoherence (by definition). A particle can be in a superposition of multiple states but still act as if it has collapsed!
Now, maybe we want to say that the particle’s wave function is collapsed when its position is measured by the screen. But this isn’t necessary either! You could just say that the detector enters into a superposition and quickly decoheres, such that the different branches of the wave function (one for each possible detector state) very suddenly become orthogonal and can no longer interact. And then you could say that the collapse only really happens once a conscious being observes the detector! Or you could be a Many-Worlder and say that the collapse never happens (although then you’d have to figure out where the probabilities are coming from in the first place).
You might be tempted to say at this point: “Well, then all the different theories of wave function collapse are empirically equivalent! At least, the set of theories that say ‘wave function collapse = total decoherence + other necessary conditions possibly’. Since total decoherence removes all interference effects, the results of all experiments will be indistinguishable from the results predicted by saying that the wave function collapsed at some point!”
But hold on! This is forgetting a crucial fact: decoherence is reversible, while wave function collapse is not!!!
Let’s say that you run the same setup before with the spin bit recording the information about which slit the particle went through, but then we destroy that information before it interacts with the environment in any way, therefore removing any traces of the measurement. Now the two branches of the wave function have “recohered,” meaning that what we’ll observe is back to the interference pattern! (There’s a VERY IMPORTANT caveat, which is that the time period during which we’re destroying the information stored in the spin bit must be before the particle hits the detector screen and the state of the screen couples to its environment, thus decohering with the record of which slit the particle went through).
If you’re a collapse purist that says that wave function collapse = total decoherence (i.e. orthogonality of the relevant branches of the wave function), then you’ll end up making the wrong prediction! Why? Well, because according to you, the wave function collapsed as soon as the information was recorded, so there was no “other branch of the wave function” to recohere with once the information was destroyed!
This has some pretty fantastic implications. Since IN PRINCIPLE even the type of decoherence that occurs when your brain registers an observation is reversible (after all, the Schrodinger equation is reversible), you could IN PRINCIPLE recohere after an observation, allowing the branches of the wave function to interfere with each other again. These are big “in principle”s, which is why I wrote them big. But if you could somehow do this, then the “Consciousness Causes Collapse” theory would give different predictions from Many-Worlds! If your final observation shows evidence of interference, then “consciousness causes collapse” is wrong, since apparently conscious observation is not sufficient to cause the other branches of the wave function to vanish. Otherwise, if you observe the classical pattern, then Many Worlds is wrong, since the observation indicates that the other branches of the wave function were gone for good and couldn’t come back to recohere.
This suggests a general way to IN PRINCIPLE test any theory of wave function collapse: Look at processes right beyond the threshold where the theory says wave functions collapse. Then implement whatever is required to reverse the physical process that you say causes collapse, thus recohering the branches of the wave function (if they still exist). Now look to see if any evidence of interference exists. If it does, then the theory is proven wrong. If it doesn’t, then it might be correct, and any theory of wave function collapse that demands a more stringent standard for collapse (including Many-Worlds, the most stringent of them all) is proven wrong.
Consider the following simple model of the double-slit experiment:
A particle starts out at |O⟩, then evolves via the Schrödinger equation into an equal superposition of being at position |A⟩ (the top slit) and being at position |B⟩ (the bottom slit).
To figure out what happens next, we need to define what would happen for a particle leaving from each individual slit. In general, we can describe each possibility as a particular superposition over the screen.
Since quantum mechanics is linear, the particle that started at |O⟩ will evolve as follows:
If we now look at any given position |j⟩ on the screen, the probability of observing the particle at this position can be calculated using the Born rule:
Notice that the first term is what you’d expect to get for the probability of a particle leaving |A⟩ being observed at position |j⟩ and the second term is the probability of a particle from |B⟩ being observed at |j⟩. The final two terms are called interference terms, and they give us the non-classical wave-like behavior that’s typical of these double-slit setups.
Now, what we just imagined was a very idealized situation in which the only parts of the universe that are relevant to our calculation are the particle, the two slits and the detector. But in reality, as the particle is traveling to the detector, it’s likely going to be interacting with the environment. This interaction is probably going to be slightly different for a particle taking the path through |A⟩ than for a particle taking the path through |B⟩, and these differences end up being immensely important.
To capture the effects of the environment in our experimental setup, let’s add an “environment” term to all of our states. At time zero, when the particle is at the origin, we’ll say that the environment is in some state |ε0⟩. Now, as the particle traverses the path to |A⟩ or to |B⟩, the environment might change slightly, so we need to give two new labels for the state of the environment in each case. |εA⟩ will be our description for the state of the environment that would result if the particle traversed the path from |O⟩ to |A⟩, and |εB⟩ will be the label for the state of the environment resulting from the particle traveling from |O⟩ to |B⟩. Now, to describe our system, we need to take the tensor product of the vector for our particle’s state and the vector for the environment’s state:
Now, what is the probability of the particle being observed at position j? Well, there are two possible worlds in which the particle is observed at position j; one in which the environment is in state |εA⟩ and the other in which it’s in state |εB⟩. So the probability will just be the sum of the probabilities for each of these possibilities.
This final equation gives us the general answer to the double slit experiment, no matter what the changes to the environment are. Notice that all that is relevant about the environment is the overlap term ⟨εA|εB⟩, which we’ll give a special name to:
This term tells us how different the two possible end states for the environment look. If the overlap is zero, then the two environment states are completely orthogonal (corresponding to perfect decoherence of the initial superposition). If the overlap is one, then the environment states are identical.
And look what we get when we express the final probability in terms of this term!
Perfect decoherence gives us classical probabilities, and perfect coherence gives us the ideal equation we found in the first part of the post! Anything in between allows the two states to interfere with each other to some limited degree, not behaving like totally separate branches of the wavefunction, nor like one single branch.
The Schrodinger equation is the formula that describes the dynamics of quantum systems – how small stuff behaves.
One fundamental feature of quantum mechanics that differentiates it from classical mechanics is the existence of something called superposition. In the same way that a particle can be in the state of “being at position A” and could also be in the state of “being at position B”, there’s a weird additional possibility that the particle is in the state of “being in a superposition of being at position A and being at position B”. It’s necessary to introduce a new word for this type of state, since it’s not quite like anything we are used to thinking about.
Now, people often talk about a particle in a superposition of states as being in both states at once, but this is not technically correct. The behavior of a particle in a superposition of positions is not the behavior you’d expect from a particle that was at both positions at once. Suppose you sent a stream of small particles towards each position and looked to see if either one was deflected by the presence of a particle at that location. You would always find that exactly one of the streams was deflected. Never would you observe the particle having been in both positions, deflecting both streams.
But it’s also just as wrong to say that the particle is in either one state or the other. Again, particles simply do not behave this way. Throw a bunch of electrons, one at a time, through a pair of thin slits in a wall and see how they spread out when they hit a screen on the other side. What you’ll get is a pattern that is totally inconsistent with the image of the electrons always being either at one location or the other. Instead, the pattern you’d get only makes sense under the assumption that the particle traveled through both slits and then interfered with itself.
If a superposition of A and B is not the same as “A and B’ and it’s not the same as ‘A or B’, then what is it? Well, it’s just that: a superposition! A superposition is something fundamentally new, with some of the features of “and” and some of the features of “or”. We can do no better than to describe the empirically observed features and then give that cluster of features a name.
Now, quantum mechanics tells us that for any two possible states that a system can be in, there is another state that corresponds to the system being in a superposition of the two. In fact, there’s an infinity of such superpositions, each corresponding to a different weighting of the two states.
The Schrödinger equation is what tells how quantum mechanical systems evolve over time. And since all of nature is just one really big quantum mechanical system, the Schrödinger equation should also tell us how we evolve over time. So what does the Schrödinger equation tell us happens when we take a particle in a superposition of A and B and make a measurement of it?
The answer is clear and unambiguous: The Schrödinger equation tells us that we ourselves enter into a superposition of states, one in which we observe the particle in state A, the other in which we observe it in B. This is a pretty bizarre and radical answer! The first response you might have may be something like “When I observe things, it certainly doesn’t seem like I’m entering into a superposition… I just look at the particle and see it in one state or the other. I never see it in this weird in-between state!”
But this is not a good argument against the conclusion, as it’s exactly what you’d expect by just applying the Schrödinger equation! When you enter into a superposition of “observing A” and “observing B”, neither branch of the superposition observes both A and B. And naturally, since neither branch of the superposition “feels” the other branch, nobody freaks out about being superposed.
But there is a problem here, and it’s a serious one. The problem is the following: Sure, it’s compatible with our experience to say that we enter into superpositions when we make observations. But what predictions does it make? How do we take what the Schrödinger equation says happens to the state of the world and turn it into a falsifiable experimental setup? The answer appears to be that we can’t. At least, not using just the Schrödinger equation on its own. To get out predictions, we need an additional postulate, known as the Born rule.
This postulate says the following: For a system in a superposition, each branch of the superposition has an associated complex number called the amplitude. The probability of observing any particular branch of the superposition upon measurement is simply the square of that branch’s amplitude.
For example: A particle is in a superposition of positions A and B. The amplitude attached to A is 0.8. The amplitude attached to B is 0.4. If we now observe the position of the particle, we will find it to be at either A with probability (.6)2 (i.e. 36%), or B with probability (.8)2 (i.e. 64%).
Simple enough, right? The problem is to figure out where the Born rule comes from and what it even means. The rule appears to be completely necessary to make quantum mechanics a testable theory at all, but it can’t be derived from the Schrödinger equation. And it’s not at all inevitable; it could easily have been that probabilities associated with the amplitude were gotten by taking absolute values rather than squares. Or why not the fourth power of the amplitude? There’s a substantive claim here, that probabilities associate with the square of the amplitudes that go into the Schrödinger equation, that needs to be made sense of. There are a lot of different ways that people have tried to do this, and I’ll list a few of the more prominent ones here.
The Copenhagen Interpretation
(Prepare to be disappointed.) The Copenhagen interpretation, which has historically been the dominant position among working physicists, is that the Born rule is just an additional rule governing the dynamics of quantum mechanical systems. Sometimes systems evolve according to the Schrödinger equation, and sometimes according to the Born rule. When they evolve according to the Schrödinger equation, they split into superpositions endlessly. When they evolve according to the Born rule, they collapse into a single determinate state. What determines when the systems evolve one way or the other? Something measurement something something observation something. There’s no real consensus here, nor even a clear set of well-defined candidate theories.
If you’re familiar with the way that physics works, this idea should send your head spinning. The claim here is that the universe operates according to two fundamentallydifferent laws, and that the dividing line between the two hinges crucially on what we mean by the words “measurement” and “observation”. Suffice it to say, if this was the right way to understand quantum mechanics, it would go entirely against the spirit of the goal of finding a fundamental theory of physics. In a fundamental theory of physics, macroscopic phenomena like measurements and observations need to be built out of the behavior of lots of tiny things like electrons and quarks, not the other way around. We shouldn’t find ourselves in the position of trying to give a precise definition to these words, debating whether frogs have the capacity to collapse superpositions or if that requires a higher “measuring capacity”, in order to make predictions about the world (as proponents of the Copenhagen interpretation have in fact done!).
The Copenhagen interpretation is not an elegant theory, it’s not a clearly defined theory, and it’s fundamentally at tension with the project of theoretical physics. So why has it been, as I said, the dominant approach over the last century to understanding quantum mechanics? This really comes down to physicists not caring enough about the philosophy behind the physics to notice that the approach they are using is fundamentally flawed. In practice, the Copenhagen interpretation works. It allows somebody working in the lab to quickly assess the results of their experiments and to make predictions about how future experiments will turn out. It gives the right empirical probabilities and is easy to implement, even if the fuzziness in the details can start to make your head hurt if you start to think about it too much. As Jean Bricmont said, “You can’t blame most physicists for following this ‘shut up and calculate’ ethos because it has led to tremendous developments in nuclear physics, atomic physics, solid state physics and particle physics.” But the Copenhagen interpretation is not good enough for us. A serious attempt to make sense of quantum mechanics requires something more substantive. So let’s move on.
Objective Collapse Theories
These approaches hinge on the notion that the Schrödinger equation really is the only law at work in the universe, it’s just that we have that equation slightly wrong. Objective collapse theories add slight nonlinearities to the Schrödinger equation so that systems sometimes spread out in superpositions and other times collapse into definite states, all according to one single equation. The most famous of these is the spontaneous collapse theory, according to which quantum systems collapse with a probability that grows with the number of particles in the system.
This approach is nice for several reasons. For one, it gives us the Born rule without requiring a new equation. It makes sense of the Born rule as a fundamental feature of physical reality, and makes precise and empirically testable predictions that can distinguish it from from other interpretations. The drawback? It makes the Schrödinger equation ugly and complicated, and it adds extra parameters that determine how often collapse happens. And as we know, whenever you start adding parameters you run the risk of overfitting your data.
Hidden Variable Theories
These approaches claim that superpositions don’t really exist, they’re just a high-level consequence of the unusual behavior of the stuff at the smallest level of reality.They deny that the Schrödinger equation is truly fundamental, and say instead that it is a higher-level approximation of an underlying deterministic reality. “Deterministic?! But hasn’t quantum mechanics been shown conclusively to be indeterministic??” Well, not entirely. For a while there was a common sentiment amongst physicists that John Von Neumann and others had proved beyond a doubt that no deterministic theory could make the predictions that quantum mechanics makes. Later subtle mistakes were found in these purported proofs that left a door open for determinism. Today there are well-known fleshed-out hidden variable theories that successfully reproduce the predictions of quantum mechanics, and do so fully deterministically.
The most famous of these is certainly Bohmian mechanics, also called pilot wave theory. Here’s a nice video on it if you’d like to know more, complete with pretty animations. Bohmian mechanics is interesting, appear to work, give us the Born rule, and is probably empirically distinguishable from other theories (at least in principle). A serious issue with it is that it requires nonlocality, which is a challenge to any attempt to make it consistent with special relativity. Locality is such an important and well-understood feature of our reality that this constitutes a major challenge to the approach.
Many-Worlds / Everettian Interpretations
Ok, finally we talk about the approach that is most interesting in my opinion, and get to the title of this post. The Many-Worlds interpretation says, in essence, that we were wrong to ever want more than the Schrödinger equation. This is the only law that governs reality, and it gives us everything we need. Many-Worlders deny that superpositions ever collapse. The result of us performing a measurement on a system in superposition is simply that we end up in superposition, and that’s the whole story!
So superpositions never collapse, they just go deeper into superposition. There’s not just one you, there’s every you, spread across the different branches of the wave function of the universe. All these yous exist beside each other, living out all your possible life histories.
But then where does Many-Worlds get the Born rule from? Well, uh, it’s kind of a mystery. The Born rule isn’t an additional law of physics, because the Schrödinger equation is supposed to be the whole story. It’s not an a priori rule of rationality, because as we said before probabilities could have easily gone as the fourth power of amplitudes, or something else entirely. But if it’s not an a posteriori fact about physics, and also not an a priori knowable principle of rationality, then what is it?
This issue has seemed to me to be more and more important and challenging for Many-Worlds the more I have thought about it. It’s hard to see what exactly the rule is even saying in this interpretation. Say I’m about to make a measurement of a system in a superposition of states A and B. Suppose that I know the amplitude of A is much smaller than the amplitude of B. I need some way to say “I have a strong expectation that I will observe B, but there’s a small chance that I’ll see A.” But according to Many-Worlds, a moment from now both observations will be made. There will be a branch of the superposition in which I observe A, and another branch in which I observe B. So what I appear to need to say is something like “I am much more likely to be the me in the branch that observes B than the me that observes A.” But this is a really strange claim that leads us straight into the thorny philosophical issue of personal identity.
In what sense are we allowed to say that one and only one of the two resulting humans is really going to be you? Don’t both of them have equal claim to being you? They each have your exact memories and life history so far, the only difference is that one observed A and the other B. Maybe we can use anthropic reasoning here? If I enter into a superposition of observing-A and observing-B, then there are now two “me”s, in some sense. But that gives the wrong prediction! Using the self-sampling assumption, we’d just say “Okay, two yous, so there’s a 50% chance of being each one” and be done with it. But obviously not all binary quantum measurements we make have a 50% chance of turning out either way!
Maybe we can say that the world actually splits into some huge number of branches, maybe even infinite, and the fraction of the total branches in which we observe A is exactly the square of the amplitude of A? But this is not what the Schrödinger equation says! The Schrödinger equation tells exactly what happens after we make the observation: we enter a superposition of two states, no more, no less. We’re importing a whole lot into our interpretive apparatus by interpreting this result as claiming the literal existence of an infinity of separate worlds, most of which are identical, and the distribution of which is governed by the amplitudes.
What we’re seeing here is that Many-Worlds, by being too insistent on the reality of the superposition, the sole sovereignty of the Schrödinger equation, and the unreality of collapse, ends up running into a lot of problems in actually doing what a good theory of physics is supposed to do: making empirical predictions. The Many-Worlders can of course use the Born Rule freely to make predictions about the outcomes of experiments, but they have little to say in answer to what, in their eyes, this rule really amounts to. I don’t know of any good way out of this mess.
Basically where this leaves me is where I find myself with all of my favorite philosophical topics; totally puzzled and unsatisfied with all of the options that I can see.
There are many principles of rational choice that seem highly intuitively plausible at first glance. Sometimes upon further reflection we realize that a principle we initially endorsed is not quite as appealing as we first thought, or that it clashes with other equally plausible principles, or that it requires certain exceptions and caveats that were not initially apparent. We can think of many debates in philosophy as clashes of these general principles, where the thought experiments generated by philosophers serve as the data that put on display their relative merits and deficits. In this essay, I’ll explore a variety of different principles for rational decision making and consider the ways in which they satisfy and frustrate our intuitions. I will focus in especially on the notion of reflective consistency, and see what sort of decision theory results from treating this as our primary desideratum.
I want to start out by illustrating the back-and-forth between our intuitions and the general principles we formulate. Consider the following claim, known as the dominance principle: If a rational agent believes that doing A is better than not doing A in every possible world, then that agent should do A even if uncertain about which world they are in. Upon first encountering this principle, it seems perfectly uncontroversial and clearly valid. But now consider the following application of the dominance principle:
“A student is considering whether to study for an exam. He reasons that if he will pass the exam, then studying is wasted effort. Also, if he will not pass the exam, then studying is wasted effort. He concludes that because whatever will happen, studying is wasted effort, it is better not to study.” (Titelbaum 237)
The fact that this is clearly a bad argument casts doubt on the dominance principle. It is worth taking a moment to ask what went wrong here. How did this example turn our intuitions on their heads so completely? Well, the flaw in the student’s line of reasoning was that he was ignoring the effect of his studying on whether or not he ends up passing the exam. This dependency between his action and the possible world he ends up in should be relevant to his decision, and it apparently invalidates his dominance reasoning.
A restricted version of the dominance principle fixes this flaw: If a rational agent prefers doing A to not doing A in all possible worlds and which world they are in is independent of whether they do A or not, then that agent should do A even if they are uncertain about which world they are in. I’ll call this the simple dominance principle. This principle is much harder to disagree with than our starting principle, but the caveat about independence greatly limits its scope. It applies only when our uncertainty about the state of the world is independent of our decision, which is not the case in most interesting decision problems. We’ll see by the end of this essay that even this seemingly obvious principle can be made to conflict with another intuitively plausible principle of rational choice.
The process of honing our intuitions and fine-tuning our principles like this is sometimes called seeking reflective consistency, where reflective consistency is the hypothetical state you end up in after a sufficiently long period of consideration. Reflective consistency is achieved when you have reached a balance between your starting intuitions and other meta-level desiderata like consistency and simplicity, such that your final framework is stable against further intuition-pumping. This process has parallels in other areas of philosophy such as ethics and epistemology, but I want to suggest that it is particularly potent when applied to decision theory. The reason for this is that a decision theory makes recommendations for what action to take in any given setup, and we can craft setups where the choice to be made is about what decision theory to adopt. I’ll call these setups self-reflection problems. By observing what choices a decision theory makes in self-reflection problems, we get direct evidence about whether the decision theory is reflectively consistent or not. In other words, we don’t need to do all the hard work of allowing thought experiments to bump our intuitions around; we can just take a specific decision algorithm and observe how it behaves upon self-reflection!
What we end up with is the following principle: Whatever decision theory we end up endorsing should be self-recommending. We should not end up in a position where we endorse decision theory X as the final best theory of rational choice, but then decision theory X recommends that we abandon it for some other decision theory that we consider less rational.
The connection between self-recommendation and reflective consistency is worth fleshing out in a little more detail. I am not saying that self-recommendation is sufficient for reflective consistency. A self-recommending decision theory might be obviously in contradiction with our notion of rational choice, such that any philosopher considering this decision theory would immediately discard it as a candidate. Consider, for instance, alphabetical decision theory, which always chooses the option which comes alphabetically first in its list of choices. When faced with a choice between alphabetical decision theory and, say, evidential decision theory, alphabetical decision theory will presumably choose itself, reasoning that ‘a’ comes before ‘e’. But we don’t want to call this a virtue of alphabetical decision theory. Even if it is uniformly self-recommending, alphabetical decision theory is sufficiently distant from any reasonable notion of rational choice that we can immediately discard it as a candidate.
On the other hand, even though not all self-recommending theories are reflectively consistent, any reflectively consistent decision theory must be self-recommending. Self-recommendation is a necessary but not sufficient condition for an adequate account of rational choice.
Now, it turns out that this principle is too strong as I’ve phrased it and requires a few caveats. One issue with it is what I’ll call the problem of unfair decision problems. For example, suppose that we are comparing evidential decision theory (henceforth EDT) to causal decision theory (henceforth CDT). (For the sake of time and space, I will assume as background knowledge the details of how each of these theories work.) We put each of them up against the following self reflection problem:
An omniscient agent peeks into your brain. If they see that you are an evidential decision theorist, they take all your money. Otherwise they leave you alone. Before they peek into your brain, you have the ability to modify your psychology such that you become either an evidential or causal decision theorist. What should you do?
EDT reasons as follows: If I stay an EDT, I lose all my money. If I self-modify to CDT, I don’t. I don’t want to lose all my money, so I’ll self-modify to CDT. So EDT is not self-recommending in this setup. But clearly this is just because the setup is unfairly biased against EDT, not because of any intrinsic flaw in EDT. In fact, it’s a virtue of a decision theory to not be self-recommending in such circumstances, as doing so indicates a basic awareness of the payoff structure of the world it faces.
While this certainly seems like the right thing to say about this particular decision problem, we need to consider how exactly to formalize this intuitive notion of “being unfairly biased against a decision theory.” There are a few things we might say here. For one, the distinguishing feature of this setup seems to be that the payout is determined not based off the decision made by an agent, but by their decision theory itself. This seems to be at the root of the intuitive unfairness of the problem; EDT is being penalized not for making a bad decision, but simply for being EDT. A decision theory should be accountable for the decisions it makes, not for simply being the particular decision theory that it happens to be.
In addition, by swapping “evidential decision theory” and “causal decision theory” everywhere in the setup, we end up arriving at the exact opposite conclusion (evidential decision theory looks stable, while causal decision theory does not). As long as we don’t have any a priori reason to consider one of these setups more important to take into account than the other, then there is no net advantage of one decision theory over the other. If a decision problem belongs to a set of equally a priori important problems obtained by simply swapping out the terms for different decision theories, and no decision theory comes out ahead on the set as a whole, then perhaps we can disregard the entire set for the purposes of evaluating decision theories.
The upshot of all of this is that what we should care about is decision problems that don’t make any direct reference to a particular decision theory, only to decisions. We’ll call such problems decision-determined. Our principle then becomes the following: Whatever decision theory we end up endorsing should be self-recommending in all decision-determined problems.
There’s certainly more to be said about this principle and if any other caveats need be applied to it, but for now let’s move on to seeing what we end up with when we apply this principle in its current state. We’ll start out with an analysis of the infamous Newcomb problem.
You enter a room containing two boxes, one opaque and one transparent. The transparent box contains $1,000. The opaque box contains either $0 or $1,000,000. Your choice is to either take just the opaque box (one-box) or to take both boxes (two-box). Before you entered the room, a predictor scanned your brain and created a simulation of you to see what you would do. If the simulation one-boxed, then the predictor filled the opaque box with $1,000,000. If the simulation two-boxed, then the opaque box was left empty. What do you choose?
EDT reasons as follows: If I one-box, then this gives me strong evidence that I have the type of brain that decides to one-box, which gives me strong evidence that the predictor’s simulation of me one-boxed, which in turn gives me strong evidence that the opaque box is full. So if I one-box, I expect to get $1,000,000. On the other hand, if I two-box, then this gives me strong evidence that my simulation two-boxed, in which case the opaque box is empty. So if I two-box, I expect to get only $1,000. Therefore one-boxing is better than two-boxing.
CDT reasons as follows: Whether the opaque box is full or empty is already determined by the time I entered the room, so my decision has no causal effect upon the box’s contents. And regardless of the contents of the box, I always expect to leave $1,000 richer by two-boxing than by one-boxing. So I should two-box.
At this point it’s important to ask whether Newcomb’s problem is a decision determined problem. After all, the predictor decides whether to fill the transparent box by scanning your brain and stimulating you. Isn’t that suspiciously similar to our earlier example of penalizing agents based off their decision theory? No. The simulator decides what to do not by evaluating your decision theory, but by its prediction about your decision. You aren’t penalized for being a CDT, just for being the type of agent that one-boxes. To see this you only need to observe that any decision theory that one-boxes would be treated identically to CDT in this problem. The determining factor is the decision, not the decision theory.
Now, let’s make the Newcomb problem into a test of reflective consistency. Instead of your choice being about whether to one-box or to two-box while in the room, your choice will now take place before you enter the room, and will be about whether to be an evidential decision theorist or a causal decision theorist when in the room. What does each theory do?
EDT’s reasoning: If I choose to be an evidential decision theorist, then I will one-box when in the room. The predictor will simulate me as one-boxing, so I’ll end up walking out with $1,000,000. If I choose to be a causal decision theorist, then I will two-box when in the room, the predictor will predict this, and I’ll walk out with only $1,000. So I will stay an EDT.
Interestingly, CDT agrees with this line of reasoning. The decision to be an evidential or causal decision theorist has a causal effect on how the predictor’s simulation behaves, so a causal decision theorist sees that the decision to stay a causal decision theorist will end up leaving them worse off than if they had switched over. So CDT switches to EDT. Notice that in CDT’s estimation, the decision to switch ends up making them $999,000 better off. This means that CDT would pay up to $999,000 just for the privilege of becoming an evidential decision theorist!
I think that looking at an actual example like this makes it more salient why reflective consistency and self-recommendation is something that we actually care about. There’s something very obviously off about a decision theory that knows beforehand that it will reliably perform worse than its opponent, so much so that it would be willing to pay up to $999,000 just for the privilege of becoming its opponent. This is certainly not the type of behavior that we associate with a rational agent that trusts itself to make good decisions.
Classically, this argument has been phrased in the literature as the “why ain’tcha rich?” objection to CDT, but I think that the objection goes much deeper than this framing would suggest. There are several plausible principles that all apply here, such as that a rational decision maker shouldn’t regret having the decision theory they have, a rational decision maker shouldn’t pay to limit their future options, and a rational decision maker shouldn’t pay to decrease the values in their payoff matrix. The first of these is fairly self-explanatory. One influential response to it has been from James Joyce, who said that the causal decision theorist does not regret their decision theory, just the situation they find themselves in. I’d suggest that this response makes little sense when the situation the find themselves in is a direct result of their decision theory. As for the second and third of these, we could imagine giving a causal decision theorist the choice to pay money to remove the future possibility of two-boxing, or to hire a gunman that would shoot them if they try to take the transparent box. In each of these cases, CDT would endorse the decision to pay. I mention these other principles just to suggest that we have sound philosophical reason to care about self-recommendation independent of the reflective consistency considerations that we started out by discussing.
The takeaway from this modified Newcomb problem is that CDT is not reflectively consistent, and that it will self-modify to EDT in the right circumstances. Does this mean that EDT is more reflectively consistent than CDT? It turns out that no, this is not the case. We can construct another thought experiment in which EDT self-modifies to become CDT, called the Newcomb’s Soda problem:
You were just given either Soda 1 or Soda 2, which are indistinguishable. There was a 50% chance of each being the type you were given. You are now given a choice as to whether to choose chocolate ice cream or vanilla ice cream. Those that had Soda 1 have a strong unconscious inclination to choose chocolate ice cream and will be given $1,000,000 after they choose the ice cream. Those that had Soda 2 have a strong unconscious inclination to choose vanilla ice cream and are given nothing. If you choose vanilla ice cream, you get $1,000. What do you choose?
EDT reasons that choosing chocolate ice cream gives them strong evidence that they were given Soda 1, in which case they will be given $1,000,000. So they would be willing to give up the $1000 reward for choosing vanilla ice cream, in order to increase their chances of getting the million. CDT reasons that what soda they were given is a fixed fact that isn’t causally influenced by the decision they make. So they have a 50% chance of getting the million regardless of their choice, but choosing to take the vanilla guarantees them an additional $1000. Therefore they take vanilla.
This is clearly a decision-determined problem, so let’s see how EDT and CDT do upon self-reflection. Given the chance to modify themselves to EDT before being given the soda, a causal decision theorist would see no advantage to doing so. After all, being an evidential decision theorist doesn’t make you more or less likely to get the $1,000,000, it just ensures that you won’t get the $1000. And the evidential decision theorist agrees! By changing to causal decision theory, they guarantee themselves an extra $1000 while not making it any more or less likely that they’ll be given the chocolate soda (and accordingly, the million dollars).
The upshot of this is that both CDT and EDT are reflective inconsistent in the class of decision determined problems. What we seek, then, is a new decision theory that behaves like EDT in the Newcomb problem and like CDT in Newcomb’s Soda. One such theory was pioneered by machine learning researcher Eliezer Yudkowsky, who named it timeless decision theory (henceforth TDT). To deliver different verdicts in the two problems, we must find some feature that allows us to distinguish between their structure. TDT does this by distinguishing between the type of correlation arising from ordinary common causes (like the soda in Newcomb’s Soda) and the type of correlation arising from faithful simulations of your behavior (as in Newcomb’s problem).
This second type of correlation is called logical dependence, and is the core idea motivating TDT. The simplest example of this is the following: two twins, physically identical down to the atomic level, raised in identical environments in a deterministic universe, will have perfectly correlated behavior throughout the lengths of their lives, even if they are entirely causally separated from each other. This correlation is apparently not due to a common cause or to any direct causal influence. It simply arises from the logical fact that two faithful instantiations of the same function will return the same output when fed the same input. Considering the behavior of a human being as an instantiation of an extremely complicated function, it becomes clear why you and your parallel-universe twin behave identically: you are instantiations of the same function! We can take this a step further by noting that two functions can have a similar input-output structure, in which case the physical instantiations of each function will have correlated input-output behavior. This correlation is what’s meant by logical dependence.
To spell this out a bit further, imagine that in a far away country, there are factories that sell very simple calculators. Each calculator is designed to only run only one specific computation. Some factories are multiplication-factories; they only sell calculators that compute 713*291. Others are addition-factories; they only sell calculators that compute 713+291. You buy two calculators from one of these factories, but you’re not sure which type of factory you purchased from. Your credences are 50/50 split between the factory you purchased from being a multiplication-factory and it being an addition-factory. You also have some logical uncertainty regarding what the value of 713*291 is. You are evenly split between the value being 207,481 and the value being 207,483. On the other hand, you have no uncertainty about what the value of 713+291 is; you know that it is 1004.
Now, you press “ENTER” on one of the two calculators you purchased, and find that the result is 207,483. For a rational reasoner, two things should now happen: First, you should treat this result as strong evidence that the factory from which both calculators were bought was a multiplication-factory, and therefore that the other calculator is also a multiplier. And second, you should update strongly on the other calculator outputting 207,483 rather than 207,481, since two calculators running the same computation will output the same result.
The point of this example is that it clearly separates out ordinary common cause correlation from a different type of dependence. The common cause dependence is what warrants you updating on the other calculator being a multiplier rather than an adder. But it doesn’t warrant you updating on the result on the other calculator being specifically 207,483; to do this, we need the notion of logical dependence, which is the type of dependence that arises whenever you encounter systems that are instantiating the same or similar computations.
Connecting this back to decision theory, TDT treats our decision as the output of a formal algorithm, which is our decision-making process. The behavior of this algorithm is entirely determined by its logical structure, which is why there are no upstream causal influences such as the soda in Newcomb’s Soda. But the behavior of this algorithm is going to be correlated with the parts of the universe that instantiate a similar function (as well as the parts of the universe it has a causal influence on). In Newcomb’s problem, for example, the predictor generates a detailed simulation of your decision process based off of a brain scan. This simulation of you is highly logically correlated with you, in that it will faithfully reproduce your behavior in a variety of situations. So if you decide to one-box, you are also learning that your simulation is very likely to one-box (and therefore that the opaque box is full).
Notice that the exact mechanism by which the predictor operates becomes very important for TDT. If the predictor operates by means of some ordinary common cause where no logical dependence exists, TDT will treat its prediction as independent of your choice. This translates over to why TDT behaves like CDT on Newcomb’s Soda, as well as other so-called “medical Newcomb problems” such as the smoking lesion problem. When the reason for the correlation between your behavior and the outcome is merely that both depend on a common input, TDT treats your decision as an intervention and therefore independent of the outcome.
One final way to conceptualize TDT and the difference between the different types of correlation is using structural equation modeling:
Direct causal dependence exists between A and B when A is a function of B or when B is a function of A.
> A = f(B) or B = g(A)
Common cause dependence exists between A and B when A and B are both functions of some other variable C.
> A = f(C) and B = g(C)
Logical dependence exists between A and B when A and B depend on their inputs in similar ways.
> A = f(C) and B = f(D)
TDT takes direct causal dependence and logical dependence seriously, and ignores common cause dependence. We can formally express this by saying that TDT calculates the expected utility of a decision by treating it like a causal intervention and fixing the output of all other instantiations of TDT to be identical interventions. Using Judea Pearl’s do-calculus notation for causal intervention, this looks like:
Here K is the TDT agent’s background knowledge, D is chosen from a set of possible decisions, and the sum is over all possible worlds. This equation isn’t quite right, since it doesn’t indicate what to do when the computation a given system instantiates is merely similar to TDT but not logically identical, but it serves as a first approximation to the algorithm.
You might notice that the notion of logical dependence depends on the idea of logical uncertainty, as without it the result of the computations would be known with certainty as soon as you learn that the calculators came out of a multiplication-factory, without ever having to observe their results. Thus any theory that incorporates logical dependence into its framework will be faced with a problem of logical omniscience, which is to say, it will have to give some account of how to place and update reasonable probability distributions over tautologies.
The upshot of all of this is that TDT is reflectively consistent on a larger class of problems than both EDT and CDT. Both EDT and CDT would self-modify into TDT in Newcomb-like problems if given the choice. Correspondingly, if you throw a bunch of TDTs and EDTs and CDTs into a world full of Newcomb and Newcomb-like problems, the TDTs will come out ahead. However, it turns out that TDT is not itself reflectively consistent on the whole class of decision-determined problems. Examples like the transparent Newcomb problem, Parfit’s hitchhiker, and counterfactual mugging all expose reflective inconsistency in TDT.
Let’s look at the transparent Newcomb problem. The structure is identical to a Newcomb problem (you walk into a room with two boxes, $1000 in one and either $1000000 or $0 in the other, determined based on the behavior of your simulation), except that both boxes are transparent. This means that you already know with certainty the contents of both boxes. CDT two-boxes here like always. EDT also two-boxes, since any dependence between your decision and the box’s contents is made irrelevant as soon as you see the contents. TDT agrees with this line of reasoning; even though it sees a logical dependence between your behavior and your simulation’s behavior, knowing whether the box is full or empty fully screens off this dependence.
Two-boxing feels to many like the obvious rational choice here. The choice you face is simply whether to take $1,000,000 or $1,001,000 if the box is full. If it’s empty, your choice is between taking $1,000 or walking out empty-handed. But two-boxing also has a few strange consequences. For one, imagine that you are placed, blindfolded, in a transparent Newcomb problem. You can at any moment decide to remove your blindfold. If you are an EDT, you will reason that if you don’t remove your blindfold, you are essentially in an ordinary Newcomb problem, so you will one-box and correspondingly walk away with $1,000,000. But if you do remove your blindfold, you’ll end up two-boxing and most likely walking away with only $1000. So an EDT would pay up to $999,000, just for the privilege of staying blindfolded. This seems to conflict with an intuitive principle of rational choice, which goes something like: A rational agent should never expect to be worse off by simply gaining information. Paying money to keep yourself from learning relevant information seems like a sure sign of a pathological decision theory.
Of course, there are two ways out of this. One way is to follow the causal decision theorist and two-box in both the ordinary Newcomb problem and the transparent problem. This has all the issues that we’ve already discussed, most prominently that you end up systematically and predictably worse off by doing so. If you pit a causal decision theorist against an agent that always one-boxes, even in transparent Newcomb problems, CDT ends up the poorer. And since CDT can reason this through beforehand, they would willingly self-modify to become the other type of agent.
What type of agent is this? None of the three decision theories we’ve discussed give the reflectively consistent response here, so we need to invent a new decision theory. The difficulty with any such theory is that it has to be able to justify sticking to its guns and one-boxing even after conditioning on the contents of the box.
In general, similar issues will arise whenever the recommendations made by a decision theory are not time-consistent. For instance, the decision that TDT prescribes for an agent with background knowledge K depends heavily on the information that TDT has at the time of prescription. This means that at different times, TDT will make different recommendations for what to do in the same situation (before entering the room TDT recommends one-boxing once in the room, while after entering the room TDT recommends two-boxing). This leads to suboptimal performance. Agents that can decide on one course of action and credibly precommit to it get certain benefits that aren’t available to agents that don’t have this ability. I think the clearest example of this is Parfit’s hitchhiker:
You are stranded in the desert, running out of water, and soon to die. A Predictor approaches and tells you that they will drive you to town only if they predict you will pay them $100 once you get there.
All of EDT, CDT, and TDT wish that they could credibly precommit to paying the money once in town, but can’t. Once they are in town they no longer have any reason to pay the $100. The fact that EDT, CDT, and TDT all have time-sensitive recommendations makes them worse off, leaving them all stranded in the desert to die. Each of these agents would willingly switch to a decision theory that doesn’t change their recommendations over time. How would such a decision theory work? It looks like we need a decision theory that acts as if it doesn’t know whether they’re in town even once in town, and acts as if it doesn’t know the contents of the box evenafter seeing them. One strategy for achieving this behavior is simple; you just decide on your strategy without ever conditioning on the fact that you are in town!
The decision theory that arises from this choice is appropriately named updateless decision theory (henceforth UDT). UDT is peculiar in that it never actually updates on any information when determining how to behave. That’s not to say that for UDT, the decision you make does not depend on the information you get through your lifetime. Instead, UDT tells you to choose a policy – a mapping from the possible pieces of information you might receive to possible decisions you could make – that maximizes the expected utility, calculated using your prior on possible worlds. This policy is set from time zero and never changes, and it determines how the UDT agent responds to any information they might receive at later points. So, for instance, a UDT agent reasons that adopting a policy of one-boxing in the transparent Newcomb case regardless of what you see maximizes expected utility as calculated using your prior. So once the UDT agent is in the room with the transparent box, it one-boxes. We can formalize this this by analogy with TDT:
One concern with this approach is that a UDT agent might end up making silly decisions as a result of not taking into account information that is relevant to their decisions. But once again, the UDT agent does take into account the information they learn in their lifetime. It’s merely that they decide what to do with that information before receiving it and never update this prescription. For example, suppose that a UDT agent anticipates facing exactly one decision problem in their life, regarding whether to push a button or not. They have a 50% prior credence that pushing the button will result in the loss of $10, and 50% that it will result in gaining $10. Now, at some point before they decide to push the button, they are given the information about whether pushing the button causes you to gain $10 or to lose $10. UDT deals with this by choosing a policy for how to respond to that information in either case. The expected utility maximizing policy here would be to push the button if you learn that pushing the button leads to gaining $10, and to not push the button if you learn the opposite.
Since UDT chooses its preferred policy based on its prior, this recommendation never changes throughout a UDT agent’s lifetime. This seems to indicate that UDT will be self-recommending in the class of all decision-determined problems, although I’m not aware of a full proof of this. If this is correct, then we have reached our goal of finding a self-recommending decision theory. It is interesting to consider what other principles of rational choice ended up being violated along the way. The simple dominance principle that we started off by discussing appears to be an example of this. In the transparent Newcomb problem, there is only one possible world that the agent considers when in the room (the one in which the box is full, say), and in this one world, two-boxing dominates one-boxing. Given that the box is full, your decision to one-box or to two-box is completely independent of the box’s contents. So the simple dominance principle recommends two-boxing. But UDT disagrees.
Another example of a deeply intuitive principle that UDT violates is the irrelevance of impossible outcomes. This principles says that impossible outcomes should not factor into your decision-making process. But UDT seems to often recommend acting as if some impossible world might come to be. For instance, suppose a predictor walks up to you and gives you a choice to either give them $10 or to give them $100. You will not face any future consequences on the basis of your decision (besides whether you’re out $100 or only $10). However, you learn that the predictor only approached you because it predicted that you would give the $10. Do you give the $10 or the $100? UDT recommends giving the $100, because agents that do so are less likely to have been approached by the predictor. But if you’ve already been approached, then you are letting considerations about an impossible world influence your decision process!
Our quest for reflective consistency took us from EDT and CDT to timeless decision theory. TDT used the notion of logical dependence to get self-recommending behavior in the Newcomb problem and medical Newcomb cases. But we found that TDT was itself reflectively inconsistent in problems like the Transparent Newcomb problem. This led us to create a new theory that made its recommendations without updating on information, which we called updateless decision theory. UDT turned out to be a totally static theory, setting forth a policy determining how to respond to all possible bits of information and never altering this policy. The unchanging nature of UDT indicates the possibility that we have found a truly self-recommending decision theory, while also leading to some quite unintuitive consequences.