Backwards induction and rationality

A fun problem I recently came across:

Consider two players: Alice and Bob. Alice moves first. At the start of the game, Alice has two piles of coins in front of her: one pile contains 4 coins and the other pile contains 1 coin. Each player has two moves available: either “take” the larger pile of coins and give the smaller pile to the other player or “push” both piles across the table to the other player. Each time the piles of coins pass across the table, the quantity of coins in each pile doubles. For example, assume that Alice chooses to “push” the piles on her first move, handing the piles of 1 and 4 coins over to Bob, doubling them to 2 and 8. Bob could now use his first move to either “take” the pile of 8 coins and give 2 coins to Alice, or he can “push” the two piles back across the table again to Alice, again increasing the size of the piles to 4 and 16 coins. The game continues for a fixed number of rounds or until a player decides to end the game by pocketing a pile of coins.

(from the wiki)

(Assume that if the game gets to the final round and the last player decides to “push”, the pot is doubled and they get the smaller pile.)

Assuming that they are self-interested, what do you think is the rational strategy for each of Alice and Bob to adopt? What is the rational strategy if they each know that the other reasons about decision-making in the same way that they themselves do? And what happens if two updateless decision theorists are pitted against each other?


If you have some prior familiarity with game theory, you might have seen the backwards induction proof right away. It turns out that standard game theory teaches us that the Nash equilibrium is to defect as soon as you can, thus never exploiting the “doubling” feature of the setup.

Why? Supposing that you have made it to the final round of the game, you stand to get a larger payout by “defecting” and taking the larger pile rather than the doubled smaller pile. But your opponent knows that you’ll reason this way, so they reason that they are better off defecting the round before… and so on all the way to the first round.

This sucks. The game ends right away, and none of that exponential goodness gets taken advantage of. If only Alice and Bob weren’t so rational! 

We can show that this conclusion follows as long as the following three things are true of Alice and Bob:

  1. Given a choice between a definite value A and a smaller value B, both Alice and Bob will choose the larger value (A).
  2. Both Alice and Bob can accurately perform deductive reasoning.
  3. Both (1.) and (2.) are common knowledge to Alice and Bob.

It’s pretty hard to deny the reasonableness of any of these three assumptions!
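
To see the backwards induction argument in action, here’s a minimal sketch in Python (a toy model of the setup above, nothing more): at each round the mover takes whenever taking is at least as good for them as pushing, assuming the other player will reason the same way at every later round.

```python
def best_play(round_num, large, small, total_rounds):
    """Backwards induction for the coin game above.
    Returns (mover_payoff, opponent_payoff, action), assuming both players
    are purely self-interested and reason identically at every round."""
    take = (large, small)                  # pocket the big pile; opponent gets the small one
    if round_num == total_rounds:
        push = (2 * small, 2 * large)      # final-round push: pot doubles, pusher gets the small pile
    else:
        # After a push the piles double and the other player moves.
        opp_payoff, my_payoff, _ = best_play(round_num + 1, 2 * large, 2 * small, total_rounds)
        push = (my_payoff, opp_payoff)
    return (*take, "take") if take[0] >= push[0] else (*push, "push")

for n in (2, 5, 10, 20):
    print(n, best_play(1, large=4, small=1, total_rounds=n))
# Every horizon prints (4, 1, 'take'): the first mover defects immediately.
```

No matter how long the game is, the recursion bottoms out at “take”, which is exactly the unraveling described above.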


Here’s a related problem:

An airline loses two suitcases belonging to two different travelers. Both suitcases happen to be identical and contain identical antiques. An airline manager tasked to settle the claims of both travelers explains that the airline is liable for a maximum of $100 per suitcase—he is unable to find out directly the price of the antiques.

To determine an honest appraised value of the antiques, the manager separates both travelers so they can’t confer, and asks them to write down the amount of their value at no less than $2 and no larger than $100. He also tells them that if both write down the same number, he will treat that number as the true dollar value of both suitcases and reimburse both travelers that amount. However, if one writes down a smaller number than the other, this smaller number will be taken as the true dollar value, and both travelers will receive that amount along with a bonus/malus: $2 extra will be paid to the traveler who wrote down the lower value and a $2 deduction will be taken from the person who wrote down the higher amount. The challenge is: what strategy should both travelers follow to decide the value they should write down?

(again, from the wiki)

Suppose you put no value on honesty, and only care about getting the most money possible. Further, suppose that both travelers reason the same way about decision problems, and that they both know this fact (and that they both know that they both know this fact, and so on).

The first intuition you might have is that both should just write down $100. But if you know that your partner is going to write down $100, then you stand to gain one whole dollar by defecting and writing $99 (thus collecting the $2 bonus for a total of $101). But if they know that you’re going to write $99, then they stand to gain one whole dollar by defecting and writing $98 (thus netting $100). And so on.

In the end both of these unfortunate “rational” individuals end up writing down $2. Once again, we see the tragedy of being a rational individual.
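
Here’s that unraveling written as a tiny best-response loop in Python (just a toy illustration of the argument): each step replaces a traveler’s claim with the best reply to the other’s current claim, starting from $100.

```python
def payoff(mine, theirs):
    """Traveler's Dilemma payoff for claiming 'mine' against 'theirs' (in dollars)."""
    if mine < theirs:
        return mine + 2      # lower claim gets the $2 bonus
    if mine > theirs:
        return theirs - 2    # higher claim pays the $2 penalty
    return mine              # equal claims are reimbursed at face value

def best_response(theirs):
    return max(range(2, 101), key=lambda mine: payoff(mine, theirs))

a = b = 100
while (best_response(b), best_response(a)) != (a, b):
    a, b = best_response(b), best_response(a)

print(a, b)  # 2 2 -- the undercutting bottoms out at the $2 floor
```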


Of course, we could take these thought experiments to be an indication not of the inherent tragedy of rationality, but instead of the need for a better theory of rationality.

For instance, you might have noticed that the arguments we used in both cases relied on a type of reasoning where each agent assumes that they can change their decision, holding fixed the decision of the other agent. This is not a valid move in general, as it assumes independence! It might very well be that the information about what decision you make is relevant to your knowledge about what the other agent’s decision will be. In fact, when we stipulated that you reason similarly to the other agent, we are in essence stipulating an evidential relationship between your decision and theirs! So the arguments we gave above need to be looked at more closely.

If the agents do end up taking into account their similarity, then their behavior is radically different. For example, we can look at the behavior of updateless decision theory: two UDTs playing each other in the Centipede game “push” every single round (including the final one!), thus ending up with exponentially higher rewards (on the order of $2^N, where N is the number of rounds). And two UDTs in the Traveler’s Dilemma would write down $100, thus both ending up roughly $98 better off than otherwise. So perhaps we aren’t doomed to a gloomy view of rationality as a burden eternally holding us back!


 

One final problem.

Two players, this time with just one pile of coins in front of them. Initially this pile contains just 1 coin. The players take turns, and each turn they can either take the whole pile or push it to the other side, in which case the size of the pile will double. This will continue for a fixed number of rounds or until a player ends the game by taking the pile.

On the final round, the last player has a choice of either taking all the coins or pushing them over, thus giving the entire doubled pile to their opponent. Both players are perfectly self-interested, and this fact is common knowledge. And finally, suppose that who goes first is determined by a coin flip.

Standard decision theory obviously says that the first person should just take the 1 coin and the game ends there. What would UDT do here? What do you think is the rational policy for each player?

Firing Squads and The Fine Tuning Argument

I’m confused about how satisfactory a multiverse is as an alternative explanation for the fine-tuning of our universe (alternative to God, that is).

My initial intuition about this is that it is a perfectly satisfactory explanation. It looks like we can justify this on Bayesian grounds by noting that the probability of the universe we’re in being fine-tuned for intelligent life, given that there is a multiverse, is nearly 1. The probability of fine-tuning given God is also presumably nearly 1, so the observation of fine-tuning shouldn’t push us much in one direction or the other.

(Obligatory photo of the theorem doing the work here)

bayes.png
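
In case the image doesn’t come through, here is the reasoning of the previous paragraph written in the odds form of the theorem, where F is “our universe is fine-tuned for life”, M is “there is a multiverse”, and G is “there is a God”:

$$
\frac{P(M \mid F)}{P(G \mid F)} \;=\; \frac{P(F \mid M)}{P(F \mid G)} \cdot \frac{P(M)}{P(G)}
$$

If both likelihoods are close to 1, the first factor on the right is close to 1, and so observing fine-tuning barely moves the prior odds.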

But here’s another argument I’m aware of: A firing squad of twenty sharpshooters aims at you and fires. They all miss. You are obviously very surprised by this. But now somebody comes up to you and tells you that in fact there is a multiverse full of “you”s in identical situations. They all faced down the firing squad, and the vast majority of them died. Now, given that you exist to ask the question, of COURSE you are in the universe in which they all missed. So should you be no longer surprised?

I take it the answer to this is “No, even though I know that I could only be alive right now asking this question if the firing squad missed, this doesn’t remove any mystery from the firing squad missing. It’s exactly as mysterious that I am alive right now as that the firing squad missed, so my existence doesn’t lessen the explanatory burden we face.”

The firing squad situation seems exactly parallel to the fine-tuning of the universe. We find ourselves in a universe that is remarkably fine tuned in a way that seems extremely a priori improbable. Now we’re told that there are in fact a massive number of universes out there, the vast majority of which are devoid of life. So of course we exist in one of the universes that is fine-tuned for our existence.

Let’s make this even more intuitive: The earth exists in a Goldilocks zone around the Sun. Too much closer or further away and life would not be possible. Maybe this was mysterious at some point when humans still thought that there was just one solar system in the universe. But now we know that galaxies contain hundreds of billions of solar systems, most of which probably don’t have any planets in their Goldilocks zones. And with this knowledge, the mystery entirely disappears. Of course we’re on a planet that can support life, where else would we be??

So my question is: Why does this argument feel satisfactory in the fine-tuning and Goldilocks examples but not the firing squad example?

A friend I asked about this responded:

if you modify the firing squad scenario so that you don’t exist prior to the shooting and are only brought into existence if they all miss, does it still feel less satisfactory than the multiverse case?

And I responded that no, it no longer feels less satisfactory than the multiverse case! Somehow this tweak “fixes” the intuitions. This suggests that the relevant difference between the two cases is something about existence prior to the time of the thought experiment. But how do we formalize this difference? And why should it be relevant? I’m perplexed.

The EPR Paradox

The Paradox

I only recently realized how philosophical the original EPR paper was. It starts out by providing a sufficient condition for something to be an “element of reality”, and proceeds from there to try to show the incompleteness of quantum mechanics. Let’s walk through this argument here:

The EPR Reality Condition: If at time t we can know the value of a measurable quantity with certainty without in any way disturbing the system, then there is an element of reality corresponding to that measurable quantity at time t. (i.e. this is a sufficient condition for a measurable property of a system at some moment to be an element of the reality of that system at that moment.)

Example 1: If you measure an electron spin to be up in the z direction, then quantum mechanics tells you that you can predict with certainty that the spin in the z direction will be up at any future measurement. Since you can predict this with certainty, there must be an aspect of reality corresponding to the electron’s z-spin after you have measured it to be up the first time.

Example 2: If you measure an electron spin to be up in the z-direction, then QM tells you that you cannot predict the result of measuring the spin in the x-direction at a later time. So the EPR reality condition does not entail that the x-spin is an element of the reality of this electron. It also doesn’t entail that the x-spin is NOT an element of the reality of this electron, because the EPR reality condition is merely a sufficient condition, not a necessary condition.

Now, what does the EPR reality condition have to say about two particles with entangled spins? Well, suppose the state of the system is initially

|Ψ⟩ = (|↑↓⟩ – |↓↑⟩) / √2

This state has the unusual property that it has the same form no matter what basis you express it in. You can show for yourself that in the x-spin basis, the state is equal to

|Ψ⟩ = (|→←⟩ – |←→⟩) / √2
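
Here’s the check written out, using |→⟩ = (|↑⟩ + |↓⟩)/√2 and |←⟩ = (|↑⟩ – |↓⟩)/√2:

$$
|{\rightarrow\leftarrow}\rangle - |{\leftarrow\rightarrow}\rangle
\;=\; \tfrac{1}{2}\Big[\big(|{\uparrow}\rangle + |{\downarrow}\rangle\big)\otimes\big(|{\uparrow}\rangle - |{\downarrow}\rangle\big) - \big(|{\uparrow}\rangle - |{\downarrow}\rangle\big)\otimes\big(|{\uparrow}\rangle + |{\downarrow}\rangle\big)\Big]
\;=\; -\big(|{\uparrow\downarrow}\rangle - |{\downarrow\uparrow}\rangle\big)
$$

which is the original state up to an overall minus sign, and an overall phase has no physical significance.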

Now, suppose that you measure the first electron in the z-basis and find it to be up. If you do this, then you know with certainty that the other electron will be measured to be down, since the spins in this state are perfectly anti-correlated. This means that after measuring electron 1 in the z-basis, the EPR reality condition says that electron 2 has z-spin down as an element of reality.

What if you instead measure the first electron in the x-basis and find it to be right? Well, then the EPR reality condition will tell you that electron 2 has x-spin left as an element of reality.

Okay, so we have two claims:

  1. That after measuring the z-spin of electron 1, electron 2 has a definite z-spin, and
  2. that after measuring the x-spin of electron 1, electron 2 has a definite x-spin.

But notice that these two claims are not necessarily inconsistent with the quantum formalism, since they refer to the state of the system after a particular measurement. What’s required to bring out a contradiction is a further assumption, namely the assumption of locality.

For our purposes here, locality just means that it’s possible to measure the spin of electron 1 in such a way as to not disturb the state of electron 2. This is a really weak assumption! It’s not saying that any time you measure the spin of electron 1, you will not have disturbed electron 2. It’s just saying that it’s possible in principle to set up a measurement of the first electron in such a way as to not disturb the second one. For instance, take electrons 1 and 2 to opposite sides of the galaxy, seal them away in totally closed off and causally isolated containers, and then measure electron 1. If you agree that this should not disturb electron 2, then you agree with the assumption of locality.

Now, with this additional assumption, Einstein Podolsky and Rosen realized that our earlier claims (1) and (2) suddenly come into conflict! Why? Because if it’s possible to measure the z-spin of electron 1 in a way that doesn’t disturb electron 2 at all, then electron 2 must have had a definite z-spin even before the measurement of electron 1!

And similarly, if it’s possible to measure the x-spin of electron 1 in a way that doesn’t disturb electron 2, then electron 2 must have had a definite x-spin before the first electron was measured!

What this amounts to is that our two claims become the following:

  1. Electron 2 has a definite z-spin at time t before the measurement.
  2. Electron 2 has a definite x-spin at time t before the measurement.

And these two claims are in direct conflict with quantum theory! Quantum mechanics refuses to assign a simultaneous x and z spin to an electron, since these are incompatible observables. This entails that if you buy into locality and the EPR reality condition, then you must believe that quantum mechanics is an incomplete description of nature, or in other words that there are elements of reality that cannot be described by quantum mechanics.

The Resolution(s)

Our argument rested on two premises: the EPR reality condition and locality. Its conclusion was that quantum mechanics was incomplete. So naturally, there are three possible paths you can take to respond: accept the conclusion, deny the second premise, or deny the first premise.

To accept the conclusion is to agree that quantum mechanics is incomplete. This is where hidden variable approaches fall, and was the path that Einstein dearly hoped would be vindicated. For complicated reasons that won’t be covered in this post, but which I talk about here, the prospects for any local realist hidden variables theory (which was what Einstein wanted) look pretty dim.

To deny the second premise is to say that in fact, measuring the spin of the first electron necessarily disturbs the state of the second electron, no matter how you set things up. This is in essence a denial of locality, since the two measurements can be space-like separated, meaning that this disturbance must propagate faster than the speed of light. This is a pretty dramatic conclusion, but it is what orthodox quantum mechanics in fact says. (It’s implied by the collapse postulate.)

To deny the first premise is to say that in fact there can be some cases in which you can predict with certainty a measurable property of a system, but where nonetheless there is no element of reality corresponding to this property. I believe that this is where Many-Worlds falls, since measurement of z-spin doesn’t result in an electron in an unambiguous z-spin state, but in a combined superposition of yourself, your measuring device, the electron, and the environment. Needless to say, in this complicated superposition there is no definite fact about the z-spin of the electron.

I’m a little unsure about the right place to put psi-epistemic approaches like Quantum Bayesianism, which resolve the paradox by treating the wave function not as a description of reality, but solely as a description of our knowledge. On this way of looking at things, it’s not surprising that learning something about an electron at one place can instantly tell you something about an electron at a distant location. This does not imply any faster-than-light communication, because all that’s being described is the way that information-processing occurs in a rational agent’s brain.

Measurement without interaction in quantum mechanics

In front of you is a sealed box, which either contains nothing OR an incredibly powerful nuclear bomb, the explosion of which threatens to wipe out humanity permanently. Even worse, this bomb is incredibly unstable and will blow up at the slightest contact with a single photon. This means that anybody that opens the box to look inside and see if there really is a bomb in there would end up certainly activating it and destroying the world. We don’t have any way to deactivate the bomb, but we could maintain it in isolation for arbitrarily long, despite the prohibitive costs of totally sealing it off from all contact.

Now, for obvious reasons, it would be extremely useful to know whether or not the bomb is actually active. If it’s not, the world can breathe a sigh of relief and not worry about spending lots of money on keeping it sealed away. And if it is, we know that the money is worth spending.

The obvious problem is that any attempt to test whether there is a bomb inside will involve in some way interacting with the box’s contents. And as we know, any such interaction will cause the bomb to detonate! So it seems that we’re stuck in this unfortunate situation where we have to act in ignorance of the full details of the situation. Right?

Well, it turns out that there’s a clever way that you can use quantum mechanics to do an “interaction-free measurement” that extracts some information from the system without causing the bomb to explode!

To explain this quantum bomb tester, we have to first start with a simpler system, a classic quantum interferometer setup:

56290881_2206928609636330_5087758915278471168_n.jpg

At the start, a photon is fired from the laser on the left. This photon then hits a beam splitter, which deflects the path of the photon with probability 50% and otherwise does nothing. It turns out that a photon that gets deflected by the beam splitter will pick up a 90º phase, which corresponds to multiplying the state vector by exp(iπ/2) = i. Each path is then redirected to another beam splitter, and then detectors are aligned across the two possible trajectories.

What do we get? Well, let’s just go through the calculation:

56414061_2282711835312851_8013422270024253440_n.jpg

We get destructive interference, which results in all photons arriving at detector B.
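
In case the image doesn’t come through, here is that calculation written out. I’m using the convention from above that each beam splitter transmits with amplitude 1/√2 and reflects with amplitude i/√2, labeling the two internal paths a and b, taking path a to transmit into detector A and path b to transmit into detector B, and dropping the common phase both paths pick up at the mirrors:

$$
|a\rangle
\;\xrightarrow{\;\text{BS}_1\;}\;
\tfrac{1}{\sqrt{2}}\big(|a\rangle + i\,|b\rangle\big)
\;\xrightarrow{\;\text{BS}_2\;}\;
\tfrac{1}{2}\Big[\big(|A\rangle + i\,|B\rangle\big) + i\,\big(|B\rangle + i\,|A\rangle\big)\Big]
\;=\; i\,|B\rangle
$$

The two contributions to detector A come with amplitudes +1/2 and –1/2 and cancel exactly, so every photon ends up at B.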

Now, what happens if you add a detector along one of the two paths? It turns out that the interference vanishes, and you find half the photons at detector A and the other half at detector B! That’s pretty weird… the observed frequencies appear to depend on whether or not you look at which path the photon went on. But that’s not quite right, because it turns out that you still get the 50/50 statistics whenever you place anything along one path whose state is changed by the passing photon!

Huh, that’s interesting… it indicates that just by looking for a photon at detector A, we can get evidence as to whether or not something could have interacted with the photon on the way to the detector! If we see a photon show up at detector A, then we know that there must have been some device along the bottom path that would have changed state had the photon taken that path. Maybe you can already see where we’re going with this…

56451973_332761710713682_6105714719035752448_n.jpg

We have to put the box in the bottom path in such a way that if the box is empty, then when the photon passes by, nothing will change about either its state or the state of the photon. And if the box contains the bomb, then it will function like a detector (where the detection corresponds to whether or not the bomb explodes)!

Now, assuming that the box is empty, we get the same result as above. Let’s calculate the result we get assuming that the box contains the bomb:

56500763_2329442300624457_6272255095300161536_n

Something really cool happens here! We find that if the bomb is active, there is a 25% chance that the photon arrives at A without the bomb exploding. And remember, the photon arriving at detector A allows us to conclude with certainty that the bomb is active! In other words, this setup gives us a 25% chance of safely extracting the information about whether the bomb is active!
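
Here is the bomb case written out in the same conventions, assuming the box sits in path a. The |a⟩ branch of the superposition either detonates the bomb or never makes it to the second beam splitter, so only the |b⟩ branch continues:

$$
\tfrac{1}{\sqrt{2}}\big(|a\rangle + i\,|b\rangle\big)
\;\longrightarrow\;
\underbrace{\tfrac{1}{\sqrt{2}}\,|\text{boom}\rangle}_{P\,=\,1/2}
\;+\;
\tfrac{i}{\sqrt{2}}\cdot\tfrac{1}{\sqrt{2}}\big(|B\rangle + i\,|A\rangle\big)
$$

So the probabilities are 1/2 for an explosion, 1/4 for a click at B (uninformative, since an empty box also sends every photon to B), and 1/4 for a click at A, which can only happen if the bomb is real.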

25% is not that good, you might object. But it sure is better than 0%! And in fact, it turns out that you can strengthen this result, using a more complicated interferometer setup to learn with certainty whether the bomb is active with an arbitrarily small chance of setting off the bomb!

There are so many weird little things about quantum mechanics that defy our classical intuitions, and this “interaction-free measurement” is one of my new favorites.

Is the double slit experiment evidence that consciousness causes collapse?

No! No no no.

This might be surprising to those that know the basics of the double slit experiment. For those that don’t, very briefly:

A bunch of tiny particles are thrown one by one at a barrier with two thin slits in it, with a detector sitting on the other side. The pattern on the detector formed by the particles is an interference pattern, which appears to imply that each particle went through both slits in some sense, like a wave would do. Now, if you peek really closely at each slit to see which one each particle passes through, the results seem to change! The pattern on the detector is no longer an interference pattern, but instead looks like the pattern you’d classically expect from a particle passing through only one slit!

When you first learn about this strange dependence of the experimental results on, apparently, whether you’re looking at the system or not, it appears to be good evidence that your conscious observation is significant in some very deep sense. After all, observation appears to lead to fundamentally different behavior, collapsing the wave to a particle! Right?? This animation does a good job of explaining the experiment in a way that really pumps the intuition that consciousness matters:

(Fair warning, I find some aspects of this misleading and just plain factually wrong. I’m linking to it not as an endorsement, but so that you get the intuition behind the arguments I’m responding to in this post.)

The feeling that consciousness is playing an important role here is a fine intuition to have before you dive deep into the details of quantum mechanics. But now consider that the exact same behavior would be produced by a very simple process that is very clearly not a conscious observation. Namely, just put a single spin qubit at one of the slits in such a way that if the particle passes through that slit, it flips the spin upside down. Guess what you get? The exact same results as you got by peeking at the screen. You never need to look at the particle as it travels through the slits to the detector in order to collapse the wave-like behavior. Apparently a single qubit is sufficient to do this!

It turns out that what’s really going on here has nothing to do with the collapse of the wave function and everything to do with the phenomenon of decoherence. Decoherence is what happens when a quantum superposition becomes entangled with the degrees of freedom of its environment in such a way that the branches of the superposition end up orthogonal to each other. Interference can only occur between the different branches if they are not orthogonal, which means that decoherence is sufficient to destroy interference effects. This is all stuff that all interpretations of quantum mechanics agree on.

Once you know that decoherence destroys interference effects (which all interpretations of quantum mechanics agree on), and also that a conscious being observing the state of a system is a process that results in extremely rapid and total decoherence (which everybody also agrees on), then the fact that observing the position of the particle causes interference effects to vanish becomes totally independent of the question of what causes wave function collapse. Whether or not consciousness causes collapse is 100% irrelevant to the results of the experiment, because regardless of which of these is true, quantum mechanics tells us to expect observation to result in the loss of interference!

This is why whether or not consciousness causes collapse has no real impact on what pattern shows up on the wall. All interpretations of quantum mechanics agree that decoherence is a thing that can happen, and decoherence is all that is required to explain the experimental results. The double slit experiment provides no evidence for consciousness causing collapse, but it also provides no evidence against it. It’s just irrelevant to the question! That said, however, given that people often hear the experiment presented in a way that makes it seem like evidence for consciousness causing collapse, hearing that qubits do the same thing should make them update downwards on this theory.

Decoherence is not wave function collapse

In the double slit experiment, particles travelling through a pair of thin slits exhibit wave-like behavior, forming an interference pattern where they land that indicates that the particles in some sense travelled through both slits.

54518134_416516682243379_7426065872885645312_n.jpg

Now, suppose that you place a single spin bit at the top slit, which starts off in the state |↑⟩ and flips to |↓⟩ iff a particle travels through the top slit. We fire off a single particle at a time, and then each time swap out that spin bit for a new spin bit that also starts off in the state |↑⟩. This serves as an extremely simple measuring device which encodes the information about which slit each particle went through.

Now what will you observe on the screen? It turns out that you’ll observe the classically expected distribution, which is a simple average over the two individual possibilities without any interference.

53857812_1014378252080175_1814900750900264960_n.jpg

Okay, so what happened? Remember that the first pattern we observed was the result of the particles being in a superposition over the two possible paths, and then interfering with each other on the way to the detector screen. So it looks like simply having one bit of information recording the path of the particle was sufficient to collapse the superposition! But wait! Doesn’t this mean that the “consciousness causes collapse” theory is wrong? The spin bit was apparently able to cause collapse all by itself, so assuming that it isn’t a conscious system, it looks like consciousness isn’t necessary for collapse! Theory disproved!

No. As you might be expecting, things are not this simple. For one thing, notice that this would ALSO prove false any other theory of wave function collapse that doesn’t allow single bits to cause collapse (including any theory that requires complex systems, macroscopic systems, or complex information processing). We should be suspicious of any simple argument that claims to conclusively prove a significant proportion of experts wrong.

To see what’s going on here, let’s look at what happens if we don’t assume that the spin bit causes the wave function to collapse. Instead, we’ll just model it as becoming fully entangled with the path of the particle, so that the state evolution over time looks like the following:

54114815_2222391794691111_7588527245694599168_n

Now if we observe the particle’s position on the screen, the probability distribution we’ll observe is given by the Born rule. Assuming that we don’t observe the states of the spin bits, there are now two qualitatively indistinguishable branches of the wave function for each possible position on the screen. This means that the total probability for any given landing position will be given by the sum of the probabilities of each branch:

54432750_422324315183100_8833216438886989824_n-e1552873260955.jpg
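
Written out (reconstructing the formula in the image from the surrounding description, with ⟨j|A⟩ and ⟨j|B⟩ standing for the amplitudes to get from each slit to screen position j):

$$
P(j) \;=\; \tfrac{1}{2}\,\big|\langle j|A\rangle\big|^{2} \;+\; \tfrac{1}{2}\,\big|\langle j|B\rangle\big|^{2}
$$

There is no cross term, because the two branches carry orthogonal spin-bit states: |↓⟩ on the branch that went through the top slit, |↑⟩ on the other.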

But hold on! Our final result is identical to the classically expected result! We just get the probability of the particle getting to |j⟩ from |A⟩, multiplied by the probability of being at |A⟩ in the first place (50%), plus the probability of the particle going from |B⟩ to |j⟩ times the same 50% for the particle getting to |B⟩.

54220547_1042042785982437_8822680544807485440_n.jpg

In other words, our prediction is that we’d observe the classical pattern of a bunch of individual particles, each going through exactly one slit, with 50% going through the top slit and 50% through the bottom. The interference has vanished, even though we never assumed that the wave function collapsed!

What this shows is that wave function collapse is not required to get particle-like behavior. All that’s necessary is that the different branches of the superposition end up not interfering with each other. And all that’s necessary for that is environmental decoherence, which is exactly what we had with the single spin bit!

In other words, environmental decoherence is sufficient to produce the same type of behavior that we’d expect from wave function collapse. This is because interference will only occur between non-orthogonal branches of the wave function, and the branches become orthogonal upon decoherence (by definition). A particle can be in a superposition of multiple states but still act as if it has collapsed!

Now, maybe we want to say that the particle’s wave function is collapsed when its position is measured by the screen. But this isn’t necessary either! You could just say that the detector enters into a superposition and quickly decoheres, such that the different branches of the wave function (one for each possible detector state) very suddenly become orthogonal and can no longer interact. And then you could say that the collapse only really happens once a conscious being observes the detector! Or you could be a Many-Worlder and say that the collapse never happens (although then you’d have to figure out where the probabilities are coming from in the first place).

You might be tempted to say at this point: “Well, then all the different theories of wave function collapse are empirically equivalent! At least, the set of theories that say ‘wave function collapse = total decoherence + other necessary conditions possibly’. Since total decoherence removes all interference effects, the results of all experiments will be indistinguishable from the results predicted by saying that the wave function collapsed at some point!”

But hold on! This is forgetting a crucial fact: decoherence is reversible, while wave function collapse is not!!! 

Screen Shot 2019-03-17 at 8.21.51 PM
Pretty picture from doi: 10.1038/srep15330

Let’s say that you run the same setup before with the spin bit recording the information about which slit the particle went through, but then we destroy that information before it interacts with the environment in any way, therefore removing any traces of the measurement. Now the two branches of the wave function have “recohered,” meaning that what we’ll observe is back to the interference pattern! (There’s a VERY IMPORTANT caveat, which is that the time period during which we’re destroying the information stored in the spin bit must be before the particle hits the detector screen and the state of the screen couples to its environment, thus decohering with the record of which slit the particle went through).

If you’re a collapse purist that says that wave function collapse = total decoherence (i.e. orthogonality of the relevant branches of the wave function), then you’ll end up making the wrong prediction! Why? Well, because according to you, the wave function collapsed as soon as the information was recorded, so there was no “other branch of the wave function” to recohere with once the information was destroyed!

This has some pretty fantastic implications. Since IN PRINCIPLE even the type of decoherence that occurs when your brain registers an observation is reversible (after all, the Schrodinger equation is reversible), you could IN PRINCIPLE recohere after an observation, allowing the branches of the wave function to interfere with each other again. These are big “in principle”s, which is why I wrote them big. But if you could somehow do this, then the “Consciousness Causes Collapse” theory would give different predictions from Many-Worlds! If your final observation shows evidence of interference, then “consciousness causes collapse” is wrong, since apparently conscious observation is not sufficient to cause the other branches of the wave function to vanish. Otherwise, if you observe the classical pattern, then Many Worlds is wrong, since the observation indicates that the other branches of the wave function were gone for good and couldn’t come back to recohere.

This suggests a general way to IN PRINCIPLE test any theory of wave function collapse: Look at processes right beyond the threshold where the theory says wave functions collapse. Then implement whatever is required to reverse the physical process that you say causes collapse, thus recohering the branches of the wave function (if they still exist). Now look to see if any evidence of interference exists. If it does, then the theory is proven wrong. If it doesn’t, then it might be correct, and any theory of wave function collapse that demands a more stringent standard for collapse (including Many-Worlds, the most stringent of them all) is proven wrong.

On decoherence

Consider the following simple model of the double-slit experiment:

53215233_424625501621649_7989218325425029120_n

A particle starts out at |O⟩, then evolves via the Schrödinger equation into an equal superposition of being at position |A⟩ (the top slit) and being at position |B⟩ (the bottom slit).

53847892_379521639268098_6802808492659834880_n

To figure out what happens next, we need to define what would happen for a particle leaving from each individual slit. In general, we can describe each possibility as a particular superposition over the screen.

Since quantum mechanics is linear, the particle that started at |O⟩ will evolve as follows:

53847892_379521639268098_6802808492659834880_n
53847892_379521639268098_6802808492659834880_n-3.jpg

If we now look at any given position |j⟩ on the screen, the probability of observing the particle at this position can be calculated using the Born rule:

54435476_1877330259039769_997663582227267584_n-1.jpg

Notice that the first term is what you’d expect to get for the probability of a particle leaving |A⟩ being observed at position |j⟩ and the second term is the probability of a particle from |B⟩ being observed at |j⟩. The final two terms are called interference terms, and they give us the non-classical wave-like behavior that’s typical of these double-slit setups.

53695256_2103980996560637_2413894306092810240_n

Now, what we just imagined was a very idealized situation in which the only parts of the universe that are relevant to our calculation are the particle, the two slits and the detector. But in reality, as the particle is traveling to the detector, it’s likely going to be interacting with the environment. This interaction is probably going to be slightly different for a particle taking the path through |A⟩ than for a particle taking the path through |B⟩, and these differences end up being immensely important.

To capture the effects of the environment in our experimental setup, let’s add an “environment” term to all of our states. At time zero, when the particle is at the origin, we’ll say that the environment is in some state |ε0⟩. Now, as the particle traverses the path to |A⟩ or to |B⟩, the environment might change slightly, so we need to give two new labels for the state of the environment in each case. |εA⟩ will be our description for the state of the environment that would result if the particle traversed the path from |O⟩ to |A⟩, and |εB⟩ will be the label for the state of the environment resulting from the particle traveling from |O⟩ to |B⟩. Now, to describe our system, we need to take the tensor product of the vector for our particle’s state and the vector for the environment’s state:

54519367_308525316478932_3398984521884893184_n.jpg

Now, what is the probability of the particle being observed at position j? The particle can reach j along two branches: one in which the environment ends up in state |εA⟩ and one in which it ends up in state |εB⟩. To get the probability, we add up these two contributions and apply the Born rule; and since the two environment states need not be orthogonal, a cross term between the branches survives.

54211669_1846645752108144_5863174741849800704_n

This final equation gives us the general answer to the double slit experiment, no matter what the changes to the environment are. Notice that all that is relevant about the environment is the overlap term ⟨εA|εB⟩, which we’ll give a special name to:

54278294_2524715951075704_6933032674168668160_n

This term tells us how different the two possible end states for the environment look. If the overlap is zero, then the two environment states are completely orthogonal (corresponding to perfect decoherence of the initial superposition). If the overlap is one, then the environment states are identical.

53298673_333616094167211_1463267431869841408_n

And look what we get when we express the final probability in terms of this term!

53208725_1847238688713413_8130541857772929024_n-1.jpg
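
Spelling that out (reconstructing the formula in the image; the sign and conjugation conventions may differ slightly from the picture):

$$
P(j) \;=\; \tfrac{1}{2}\,\big|\langle j|A\rangle\big|^{2} \;+\; \tfrac{1}{2}\,\big|\langle j|B\rangle\big|^{2} \;+\; \mathrm{Re}\Big[\langle j|A\rangle^{*}\,\langle j|B\rangle\,\langle \varepsilon_A|\varepsilon_B\rangle\Big]
$$

Setting the overlap ⟨εA|εB⟩ to 0 kills the last term and leaves the classical sum; setting it to 1 restores the full interference expression from the first part of the post.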

Perfect decoherence gives us classical probabilities, and perfect coherence gives us the ideal equation we found in the first part of the post! Anything in between allows the two states to interfere with each other to some limited degree, not behaving like totally separate branches of the wavefunction, nor like one single branch.
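
Here’s a tiny numerical sketch of that formula in Python (a toy model with made-up plane-wave amplitudes, not a simulation of any real apparatus), showing how the screen pattern interpolates between interference fringes and the classical pattern as the overlap goes from 1 to 0:

```python
import numpy as np

x = np.linspace(-10, 10, 1001)       # screen position (arbitrary units)
k, d = 2.0, 1.5                      # made-up wavenumber and slit separation
amp_A = np.exp(1j * k * d * x / 2)   # toy amplitude <x|A>: slit A to position x
amp_B = np.exp(-1j * k * d * x / 2)  # toy amplitude <x|B>: slit B to position x

def screen_pattern(overlap):
    """P(x) = |<x|A>|^2/2 + |<x|B>|^2/2 + Re(<x|A>* <x|B> <eps_A|eps_B>)."""
    classical = 0.5 * (np.abs(amp_A) ** 2 + np.abs(amp_B) ** 2)
    cross = np.real(np.conj(amp_A) * amp_B * overlap)
    return classical + cross

fringes = screen_pattern(overlap=1.0)    # perfect coherence: full interference fringes
classical = screen_pattern(overlap=0.0)  # perfect decoherence: flat classical pattern
partial = screen_pattern(overlap=0.5)    # partial decoherence: washed-out fringes

print(fringes.min(), fringes.max())      # ~0.0 and ~2.0 (dark and bright fringes)
print(classical.min(), classical.max())  # 1.0 and 1.0 (no fringes at all)
print(partial.min(), partial.max())      # ~0.5 and ~1.5 (reduced contrast)
```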

The problem with the many worlds interpretation of quantum mechanics

The Schrödinger equation is the formula that describes the dynamics of quantum systems – how small stuff behaves.

One fundamental feature of quantum mechanics that differentiates it from classical mechanics is the existence of something called superposition. In the same way that a particle can be in the state of “being at position A” and could also be in the state of “being at position B”, there’s a weird additional possibility that the particle is in the state of “being in a superposition of being at position A and being at position B”. It’s necessary to introduce a new word for this type of state, since it’s not quite like anything we are used to thinking about.

Now, people often talk about a particle in a superposition of states as being in both states at once, but this is not technically correct. The behavior of a particle in a superposition of positions is not the behavior you’d expect from a particle that was at both positions at once. Suppose you sent a stream of small particles towards each position and looked to see if either one was deflected by the presence of a particle at that location. You would always find that exactly one of the streams was deflected. Never would you observe the particle having been in both positions, deflecting both streams.

But it’s also just as wrong to say that the particle is in either one state or the other. Again, particles simply do not behave this way. Throw a bunch of electrons, one at a time, through a pair of thin slits in a wall and see how they spread out when they hit a screen on the other side. What you’ll get is a pattern that is totally inconsistent with the image of the electrons always being either at one location or the other. Instead, the pattern you’d get only makes sense under the assumption that the particle traveled through both slits and then interfered with itself.

If a superposition of A and B is not the same as “A and B” and it’s not the same as “A or B”, then what is it? Well, it’s just that: a superposition! A superposition is something fundamentally new, with some of the features of “and” and some of the features of “or”. We can do no better than to describe the empirically observed features and then give that cluster of features a name.

Now, quantum mechanics tells us that for any two possible states that a system can be in, there is another state that corresponds to the system being in a superposition of the two. In fact, there’s an infinity of such superpositions, each corresponding to a different weighting of the two states.

Now, the Schrödinger equation is what tells how quantum mechanical systems evolve over time. And since all of nature is just one really big quantum mechanical system, the Schrödinger equation should also tell us how we evolve over time. So what does the Schrödinger equation tell us happens when we take a particle in a superposition of A and B and make a measurement of it?

The answer is clear and unambiguous: The Schrödinger equation tells us that we ourselves enter into a superposition of states, one in which we observe the particle in state A, the other in which we observe it in B. This is a pretty bizarre and radical answer! The first response you might have may be something like “When I observe things, it certainly doesn’t seem like I’m entering into a superposition… I just look at the particle and see it in one state or the other. I never see it in this weird in-between state!”

But this is not a good argument against the conclusion, as it’s exactly what you’d expect by just applying the Schrödinger equation! When you enter into a superposition of “observing A” and “observing B”, neither branch of the superposition observes both A and B. And naturally, since neither branch of the superposition “feels” the other branch, nobody freaks out about being superposed.

But there is a problem here, and it’s a serious one. The problem is the following: Sure, it’s compatible with our experience to say that we enter into superpositions when we make observations. But what predictions does it make? How do we take what the Schrödinger equation says happens to the state of the world and turn it into a falsifiable experimental setup? The answer appears to be that we can’t. At least, not using just the Schrödinger equation on its own. To get out predictions, we need an additional postulate, known as the Born rule.

This postulate says the following: For a system in a superposition, each branch of the superposition has an associated complex number called the amplitude. The probability of observing any particular branch of the superposition upon measurement is simply the squared magnitude of that branch’s amplitude.

For example: A particle is in a superposition of positions A and B. The amplitude attached to A is 0.6. The amplitude attached to B is 0.8. If we now observe the position of the particle, we will find it to be at either A with probability (0.6)² (i.e. 36%), or B with probability (0.8)² (i.e. 64%).

Simple enough, right? The problem is to figure out where the Born rule comes from and what it even means. The rule appears to be completely necessary to make quantum mechanics a testable theory at all, but it can’t be derived from the Schrödinger equation. And it’s not at all inevitable; it could easily have been that probabilities were associated with the amplitude itself rather than the amplitude squared. Or why not the fourth power of the amplitude? There’s a substantive claim here, that probabilities go as the square of the amplitudes that appear in the Schrödinger equation, and it needs to be made sense of. There are a lot of different ways that people have tried to do this, and I’ll list a few of the more prominent ones here.

The Copenhagen Interpretation

(Prepare to be disappointed.) The Copenhagen interpretation, which has historically been the dominant position among working physicists, is that the Born rule is just an additional rule governing the dynamics of quantum mechanical systems. Sometimes systems evolve according to the Schrödinger equation, and sometimes according to the Born rule. When they evolve according to the Schrödinger equation, they split into superpositions endlessly. When they evolve according to the Born rule, they collapse into a single determinate state. What determines when the systems evolve one way or the other? Something measurement something something observation something. There’s no real consensus here, nor even a clear set of well-defined candidate theories.

If you’re familiar with the way that physics works, this idea should send your head spinning. The claim here is that the universe operates according to two fundamentally different laws, and that the dividing line between the two hinges crucially on what we mean by the words “measurement” and “observation”. Suffice it to say, if this was the right way to understand quantum mechanics, it would go entirely against the spirit of the goal of finding a fundamental theory of physics. In a fundamental theory of physics, macroscopic phenomena like measurements and observations need to be built out of the behavior of lots of tiny things like electrons and quarks, not the other way around. We shouldn’t find ourselves in the position of trying to give a precise definition to these words, debating whether frogs have the capacity to collapse superpositions or if that requires a higher “measuring capacity”, in order to make predictions about the world.

The Copenhagen interpretation is not an elegant theory, it’s not a clearly defined theory, and it’s fundamentally in tension with the project of theoretical physics. So why has it been, as I said, the dominant approach over the last century to understanding quantum mechanics? This really comes down to physicists not caring enough about the philosophy behind the physics to notice that the approach they are using is fundamentally flawed. In practice, the Copenhagen interpretation works. It allows somebody working in the lab to quickly assess the results of their experiments and to make predictions about how future experiments will turn out. It gives the right empirical probabilities and is easy to implement, even if the fuzziness in the details can start to make your head hurt if you start to think about it too much. As Jean Bricmont said, “You can’t blame most physicists for following this ‘shut up and calculate’ ethos because it has led to tremendous developments in nuclear physics, atomic physics, solid-state physics and particle physics.” But the Copenhagen interpretation is not good enough for us. A serious attempt to make sense of quantum mechanics requires something more substantive. So let’s move on.

Objective Collapse Theories

These approaches hinge on the notion that the Schrödinger equation really is the only law at work in the universe, it’s just that we have that equation slightly wrong. Objective collapse theories add slight nonlinearities to the Schrödinger equation so that systems sometimes spread out in superpositions and other times collapse into definite states, all according to one single equation. The most famous of these is the spontaneous collapse theory, according to which quantum systems collapse with a probability that grows with the number of particles in the system.

This approach is nice for several reasons. For one, it gives us the Born rule without requiring a new equation. It makes sense of the Born rule as a fundamental feature of physical reality, and it makes precise and empirically testable predictions that can distinguish it from other interpretations. The drawback? It makes the Schrödinger equation ugly and complicated, and it adds extra parameters that determine how often collapse happens. And as we know, whenever you start adding parameters you run the risk of overfitting your data.

Hidden Variable Theories

These approaches claim that superpositions don’t really exist; they’re just a high-level consequence of the unusual behavior of the stuff at the smallest level of reality. They deny that the Schrödinger equation is truly fundamental, and say instead that it is a higher-level approximation of an underlying deterministic reality. “Deterministic?! But hasn’t quantum mechanics been shown conclusively to be indeterministic??” Well, not entirely. For a while there was a common sentiment amongst physicists that John von Neumann and others had proved beyond a doubt that no deterministic theory could make the predictions that quantum mechanics makes. Later, subtle mistakes were found in these purported proofs, leaving a door open for determinism. Today there are well-known, fleshed-out hidden variable theories that successfully reproduce the predictions of quantum mechanics, and do so fully deterministically.

The most famous of these is certainly Bohmian mechanics, also called pilot wave theory. Here’s a nice video on it if you’d like to know more, complete with pretty animations. Bohmian mechanics is interesting, appears to work, gives us the Born rule, and is probably empirically distinguishable from other theories (at least in principle). A serious issue with it is that it requires nonlocality, which is a challenge to any attempt to make it consistent with special relativity. Locality is such an important and well-understood feature of our reality that this constitutes a major challenge to the approach.

Many-Worlds / Everettian Interpretations

Ok, finally we talk about the approach that is most interesting in my opinion, and get to the title of this post. The Many-Worlds interpretation says, in essence, that we were wrong to ever want more than the Schrödinger equation. This is the only law that governs reality, and it gives us everything we need. Many-Worlders deny that superpositions ever collapse. The result of us performing a measurement on a system in superposition is simply that we end up in superposition, and that’s the whole story!

So superpositions never collapse, they just go deeper into superposition. There’s not just one you, there’s every you, spread across the different branches of the wave function of the universe. All these yous exist beside each other, living out all your possible life histories.

But then where does Many-Worlds get the Born rule from? Well, uh, it’s kind of a mystery. The Born rule isn’t an additional law of physics, because the Schrödinger equation is supposed to be the whole story. It’s not an a priori rule of rationality, because, as we said before, probabilities could easily have gone as the fourth power of amplitudes, or something else entirely. But if it’s not an a posteriori fact about physics, and also not an a priori knowable principle of rationality, then what is it?

This issue has seemed to me to be more and more important and challenging for Many-Worlds the more I have thought about it. It’s hard to see what exactly the rule is even saying in this interpretation. Say I’m about to make a measurement of a system in a superposition of states A and B. Suppose that I know the amplitude of A is much smaller than the amplitude of B. I need some way to say “I have a strong expectation that I will observe B, but there’s a small chance that I’ll see A.” But according to Many-Worlds, a moment from now both observations will be made. There will be a branch of the superposition in which I observe A, and another branch in which I observe B. So what I appear to need to say is something like “I am much more likely to be the me in the branch that observes B than the me that observes A.” But this is a really strange claim that leads us straight into the thorny philosophical issue of personal identity.

In what sense are we allowed to say that one and only one of the two resulting humans is really going to be you? Don’t both of them have equal claim to being you? They each have your exact memories and life history so far, the only difference is that one observed A and the other B. Maybe we can use anthropic reasoning here? If I enter into a superposition of observing-A and observing-B, then there are now two “me”s, in some sense. But that gives the wrong prediction! Using the self-sampling assumption, we’d just say “Okay, two yous, so there’s a 50% chance of being each one” and be done with it. But obviously not all binary quantum measurements we make have a 50% chance of turning out either way!

Maybe we can say that the world actually splits into some huge number of branches, maybe even infinite, and the fraction of the total branches in which we observe A is exactly the square of the amplitude of A? But this is not what the Schrödinger equation says! The Schrödinger equation tells exactly what happens after we make the observation: we enter a superposition of two states, no more, no less. We’re importing a whole lot into our interpretive apparatus by interpreting this result as claiming the literal existence of an infinity of separate worlds, most of which are identical, and the distribution of which is governed by the amplitudes.

What we’re seeing here is that Many-Worlds, by being too insistent on the reality of the superposition, the sole sovereignty of the Schrödinger equation, and the unreality of collapse, ends up running into a lot of problems in actually doing what a good theory of physics is supposed to do: making empirical predictions. The Many-Worlders can of course use the Born Rule freely to make predictions about the outcomes of experiments, but they have little to say in answer to what, in their eyes, this rule really amounts to. I don’t know of any good way out of this mess.

Basically where this leaves me is where I find myself with all of my favorite philosophical topics: totally puzzled and unsatisfied with all of the options that I can see.

Consistently reflecting on decision theory

There are many principles of rational choice that seem highly intuitively plausible at first glance. Sometimes upon further reflection we realize that a principle we initially endorsed is not quite as appealing as we first thought, or that it clashes with other equally plausible principles, or that it requires certain exceptions and caveats that were not initially apparent. We can think of many debates in philosophy as clashes of these general principles, where the thought experiments generated by philosophers serve as the data that put their relative merits and deficits on display. In this essay, I’ll explore a variety of different principles for rational decision making and consider the ways in which they satisfy and frustrate our intuitions. I will focus especially on the notion of reflective consistency, and see what sort of decision theory results from treating this as our primary desideratum.

I want to start out by illustrating the back-and-forth between our intuitions and the general principles we formulate. Consider the following claim, known as the dominance principle: If a rational agent believes that doing A is better than not doing A in every possible world, then that agent should do A even if uncertain about which world they are in. Upon first encountering this principle, it seems perfectly uncontroversial and clearly valid. But now consider the following application of the dominance principle:

“A student is considering whether to study for an exam. He reasons that if he will pass the exam, then studying is wasted effort. Also, if he will not pass the exam, then studying is wasted effort. He concludes that because whatever will happen, studying is wasted effort, it is better not to study.” (Titelbaum 237)

The fact that this is clearly a bad argument casts doubt on the dominance principle. It is worth taking a moment to ask what went wrong here. How did this example turn our intuitions on their heads so completely? Well, the flaw in the student’s line of reasoning was that he was ignoring the effect of his studying on whether or not he ends up passing the exam. This dependency between his action and the possible world he ends up in should be relevant to his decision, and it apparently invalidates his dominance reasoning.

A restricted version of the dominance principle fixes this flaw: If a rational agent prefers doing A to not doing A in all possible worlds and which world they are in is independent of whether they do A or not, then that agent should do A even if they are uncertain about which world they are in. I’ll call this the simple dominance principle. This principle is much harder to disagree with than our starting principle, but the caveat about independence greatly limits its scope. It applies only when our uncertainty about the state of the world is independent of our decision, which is not the case in most interesting decision problems. We’ll see by the end of this essay that even this seemingly obvious principle can be made to conflict with another intuitively plausible principle of rational choice.
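
To see concretely why the independence caveat matters, here’s a minimal sketch of the student’s predicament in Python. The specific numbers are illustrative assumptions of mine: passing is worth 100 units, studying costs 10, and studying raises the chance of passing from 30% to 90%.

```python
# Illustrative assumptions: passing is worth 100, studying costs 10.
PASS_VALUE, STUDY_COST = 100, 10

def utility(passed, studied):
    return (PASS_VALUE if passed else 0) - (STUDY_COST if studied else 0)

# The student's dominance reasoning: hold the world fixed and compare actions.
for passed in (True, False):
    assert utility(passed, studied=False) > utility(passed, studied=True)
# In each fixed world, not studying looks better -- the dominance argument.

# But which world you end up in depends on the action: studying changes
# the probability of passing (assumed here: 90% vs 30%).
P_PASS = {True: 0.9, False: 0.3}

def expected_utility(studied):
    p = P_PASS[studied]
    return p * utility(True, studied) + (1 - p) * utility(False, studied)

print(expected_utility(True), expected_utility(False))  # 80.0 vs 30.0
```

Once the dependence between action and world is accounted for, studying wins handily, which is exactly why the unrestricted dominance principle fails here.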

The process of honing our intuitions and fine-tuning our principles like this is sometimes called seeking reflective consistency, where reflective consistency is the hypothetical state you end up in after a sufficiently long period of consideration. Reflective consistency is achieved when you have reached a balance between your starting intuitions and other meta-level desiderata like consistency and simplicity, such that your final framework is stable against further intuition-pumping. This process has parallels in other areas of philosophy such as ethics and epistemology, but I want to suggest that it is particularly potent when applied to decision theory. The reason for this is that a decision theory makes recommendations for what action to take in any given setup, and we can craft setups where the choice to be made is about what decision theory to adopt. I’ll call these setups self-reflection problems. By observing what choices a decision theory makes in self-reflection problems, we get direct evidence about whether the decision theory is reflectively consistent or not. In other words, we don’t need to do all the hard work of allowing thought experiments to bump our intuitions around; we can just take a specific decision algorithm and observe how it behaves upon self-reflection!

What we end up with is the following principle: Whatever decision theory we end up endorsing should be self-recommending. We should not end up in a position where we endorse decision theory X as the final best theory of rational choice, but then decision theory X recommends that we abandon it for some other decision theory that we consider less rational.

The connection between self-recommendation and reflective consistency is worth fleshing out in a little more detail. I am not saying that self-recommendation is sufficient for reflective consistency. A self-recommending decision theory might be obviously in contradiction with our notion of rational choice, such that any philosopher considering this decision theory would immediately discard it as a candidate. Consider, for instance, alphabetical decision theory, which always chooses the option which comes alphabetically first in its list of choices. When faced with a choice between alphabetical decision theory and, say, evidential decision theory, alphabetical decision theory will presumably choose itself, reasoning that ‘a’ comes before ‘e’. But we don’t want to call this a virtue of alphabetical decision theory. Even if it is uniformly self-recommending, alphabetical decision theory is sufficiently distant from any reasonable notion of rational choice that we can immediately discard it as a candidate.
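
As a minimal sketch of just how trivially such a theory can be self-recommending:

```python
def alphabetical_decision_theory(options):
    """Choose whichever option comes first alphabetically."""
    return min(options)

# Facing the "which decision theory should I adopt?" choice:
options = ["alphabetical decision theory", "evidential decision theory"]
print(alphabetical_decision_theory(options))
# -> 'alphabetical decision theory', since 'a' comes before 'e'
```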

On the other hand, even though not all self-recommending theories are reflectively consistent, any reflectively consistent decision theory must be self-recommending. Self-recommendation is a necessary but not sufficient condition for an adequate account of rational choice.

Now, it turns out that this principle is too strong as I’ve phrased it and requires a few caveats. One issue with it is what I’ll call the problem of unfair decision problems. For example, suppose that we are comparing evidential decision theory (henceforth EDT) to causal decision theory (henceforth CDT). (For the sake of time and space, I will assume as background knowledge the details of how each of these theories works.) We put each of them up against the following self-reflection problem:

An omniscient agent peeks into your brain. If they see that you are an evidential decision theorist, they take all your money. Otherwise they leave you alone. Before they peek into your brain, you have the ability to modify your psychology such that you become either an evidential or causal decision theorist. What should you do?

EDT reasons as follows: If I stay an EDT, I lose all my money. If I self-modify to CDT, I don’t. I don’t want to lose all my money, so I’ll self-modify to CDT. So EDT is not self-recommending in this setup. But clearly this is just because the setup is unfairly biased against EDT, not because of any intrinsic flaw in EDT. In fact, it’s a virtue of a decision theory to not be self-recommending in such circumstances, as doing so indicates a basic awareness of the payoff structure of the world it faces.

While this certainly seems like the right thing to say about this particular decision problem, we need to consider how exactly to formalize this intuitive notion of “being unfairly biased against a decision theory.” There are a few things we might say here. For one, the distinguishing feature of this setup seems to be that the payout is determined not by the decision made by an agent, but by their decision theory itself. This seems to be at the root of the intuitive unfairness of the problem: EDT is being penalized not for making a bad decision, but simply for being EDT. A decision theory should be accountable for the decisions it makes, not for being the particular decision theory that it happens to be.

In addition, by swapping “evidential decision theory” and “causal decision theory” everywhere in the setup, we end up arriving at the exact opposite conclusion (evidential decision theory looks stable, while causal decision theory does not). As long as we don’t have any a priori reason to consider one of these setups more important to take into account than the other, then there is no net advantage of one decision theory over the other. If a decision problem belongs to a set of equally a priori important problems obtained by simply swapping out the terms for different decision theories, and no decision theory comes out ahead on the set as a whole, then perhaps we can disregard the entire set for the purposes of evaluating decision theories.

The upshot of all of this is that what we should care about is decision problems that don’t make any direct reference to a particular decision theory, only to decisions. We’ll call such problems decision-determined. Our principle then becomes the following: Whatever decision theory we end up endorsing should be self-recommending in all decision-determined problems.

There’s certainly more to be said about this principle and if any other caveats need be applied to it, but for now let’s move on to seeing what we end up with when we apply this principle in its current state. We’ll start out with an analysis of the infamous Newcomb problem.

You enter a room containing two boxes, one opaque and one transparent. The transparent box contains $1,000. The opaque box contains either $0 or $1,000,000. Your choice is to either take just the opaque box (one-box) or to take both boxes (two-box). Before you entered the room, a predictor scanned your brain and created a simulation of you to see what you would do. If the simulation one-boxed, then the predictor filled the opaque box with $1,000,000. If the simulation two-boxed, then the opaque box was left empty. What do you choose?

EDT reasons as follows: If I one-box, then this gives me strong evidence that I have the type of brain that decides to one-box, which gives me strong evidence that the predictor’s simulation of me one-boxed, which in turn gives me strong evidence that the opaque box is full. So if I one-box, I expect to get $1,000,000. On the other hand, if I two-box, then this gives me strong evidence that my simulation two-boxed, in which case the opaque box is empty. So if I two-box, I expect to get only $1,000. Therefore one-boxing is better than two-boxing.

CDT reasons as follows: Whether the opaque box is full or empty is already determined by the time I entered the room, so my decision has no causal effect upon the box’s contents. And regardless of the contents of the box, I always expect to leave $1,000 richer by two-boxing than by one-boxing. So I should two-box.
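
Here’s a minimal sketch of the two calculations. The 99% predictor accuracy and the 50/50 credence that CDT assigns to the opaque box being full are illustrative assumptions of mine, not part of the original setup.

```python
ACC = 0.99                 # assumed predictor accuracy
SMALL, BIG = 1_000, 1_000_000

def edt_value(action):
    # EDT conditions on the action: the probability that the opaque box
    # is full tracks what the action says about the simulation's behavior.
    p_full = ACC if action == "one-box" else 1 - ACC
    return BIG * p_full + (SMALL if action == "two-box" else 0)

def cdt_value(action, p_full=0.5):
    # CDT treats the box's contents as a fixed fact, unaffected by the action.
    return BIG * p_full + (SMALL if action == "two-box" else 0)

print(edt_value("one-box"), edt_value("two-box"))  # 990000.0 vs 11000.0
print(cdt_value("one-box"), cdt_value("two-box"))  # 500000.0 vs 501000.0
```

Whatever credence CDT has in the box being full, two-boxing comes out exactly $1,000 ahead in its calculation, which is the heart of its argument.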

At this point it’s important to ask whether Newcomb’s problem is a decision-determined problem. After all, the predictor decides whether to fill the opaque box by scanning your brain and simulating you. Isn’t that suspiciously similar to our earlier example of penalizing agents based on their decision theory? No. The predictor decides what to do not by evaluating your decision theory, but by predicting your decision. You aren’t penalized for being a causal decision theorist, only for being the type of agent that is predicted to two-box. To see this, you only need to observe that any decision theory that two-boxes is treated exactly the same as CDT in this problem, and any that one-boxes is treated exactly the same as EDT. The determining factor is the decision, not the decision theory.

Now, let’s make the Newcomb problem into a test of reflective consistency. Instead of your choice being about whether to one-box or to two-box while in the room, your choice will now take place before you enter the room, and will be about whether to be an evidential decision theorist or a causal decision theorist when in the room. What does each theory do?

EDT’s reasoning: If I choose to be an evidential decision theorist, then I will one-box when in the room. The predictor will simulate me as one-boxing, so I’ll end up walking out with $1,000,000. If I choose to be a causal decision theorist, then I will two-box when in the room, the predictor will predict this, and I’ll walk out with only $1,000. So I will stay an EDT.

Interestingly, CDT agrees with this line of reasoning. The decision to be an evidential or causal decision theorist has a causal effect on how the predictor’s simulation behaves, so a causal decision theorist sees that the decision to stay a causal decision theorist will end up leaving them worse off than if they had switched over. So CDT switches to EDT. Notice that in CDT’s estimation, the decision to switch ends up making them $999,000 better off. This means that CDT would pay up to $999,000 just for the privilege of becoming an evidential decision theorist!

I think that looking at an actual example like this makes it more salient why reflective consistency and self-recommendation are things that we actually care about. There’s something very obviously off about a decision theory that knows beforehand that it will reliably perform worse than its opponent, so much so that it would be willing to pay up to $999,000 just for the privilege of becoming its opponent. This is certainly not the type of behavior that we associate with a rational agent that trusts itself to make good decisions.

Classically, this argument has been phrased in the literature as the “why ain’tcha rich?” objection to CDT, but I think that the objection goes much deeper than this framing would suggest. There are several plausible principles that all apply here: a rational decision maker shouldn’t regret having the decision theory they have, a rational decision maker shouldn’t pay to limit their future options, and a rational decision maker shouldn’t pay to decrease the values in their payoff matrix. The first of these is fairly self-explanatory. One influential response to it comes from James Joyce, who said that the causal decision theorist does not regret their decision theory, just the situation they find themselves in. I’d suggest that this response makes little sense when the situation they find themselves in is a direct result of their decision theory. As for the second and third, we could imagine giving a causal decision theorist the choice to pay money to remove the future possibility of two-boxing, or to hire a gunman who would shoot them if they try to take the transparent box. In each of these cases, CDT would endorse the decision to pay. I mention these other principles to suggest that we have sound philosophical reason to care about self-recommendation independent of the reflective consistency considerations that we started out by discussing.

The takeaway from this modified Newcomb problem is that CDT is not reflectively consistent, and that it will self-modify to EDT in the right circumstances. Does this mean that EDT is more reflectively consistent than CDT? It turns out that no, this is not the case. We can construct another thought experiment in which EDT self-modifies to become CDT, called the Newcomb’s Soda problem:

You have just been given either Soda 1 or Soda 2; the two are indistinguishable, and each was equally likely to be the one you received. You must now choose between chocolate ice cream and vanilla ice cream. Those who had Soda 1 have a strong unconscious inclination to choose chocolate ice cream and will be given $1,000,000 after they choose their ice cream. Those who had Soda 2 have a strong unconscious inclination to choose vanilla ice cream and are given nothing. If you choose vanilla ice cream, you get $1,000. What do you choose?

EDT reasons that choosing chocolate ice cream gives them strong evidence that they were given Soda 1, in which case they will be given $1,000,000. So they are willing to give up the $1,000 reward for choosing vanilla ice cream in order to increase their chances of getting the million. CDT reasons that which soda they were given is a fixed fact that isn’t causally influenced by the decision they make. So they have a 50% chance of getting the million regardless of their choice, but choosing vanilla guarantees them an additional $1,000. Therefore they take vanilla.
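
To make the disagreement concrete, here’s a minimal sketch of both calculations. The assumption that 90% of Soda 1 drinkers choose chocolate (and 90% of Soda 2 drinkers choose vanilla) is mine, standing in for the “strong unconscious inclination” in the setup.

```python
MILLION, BONUS = 1_000_000, 1_000
P_CHOC = {"soda1": 0.9, "soda2": 0.1}   # assumed P(choose chocolate | soda)

def edt_value(choice):
    # Bayes: update the 50/50 prior on the soda, conditional on the choice.
    like_s1 = P_CHOC["soda1"] if choice == "chocolate" else 1 - P_CHOC["soda1"]
    like_s2 = P_CHOC["soda2"] if choice == "chocolate" else 1 - P_CHOC["soda2"]
    p_s1 = like_s1 / (like_s1 + like_s2)
    return p_s1 * MILLION + (BONUS if choice == "vanilla" else 0)

def cdt_value(choice):
    # The soda is a fixed fact, so the 50/50 prior is untouched by the choice.
    return 0.5 * MILLION + (BONUS if choice == "vanilla" else 0)

print(edt_value("chocolate"), edt_value("vanilla"))  # 900000.0 vs 101000.0
print(cdt_value("chocolate"), cdt_value("vanilla"))  # 500000.0 vs 501000.0
```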

This is clearly a decision-determined problem, so let’s see how EDT and CDT do upon self-reflection. Given the chance to modify themselves to EDT before being given the soda, a causal decision theorist would see no advantage to doing so. After all, being an evidential decision theorist doesn’t make you more or less likely to get the $1,000,000; it just ensures that you won’t get the $1,000. And the evidential decision theorist agrees! By changing to causal decision theory, they guarantee themselves an extra $1,000 while not making it any more or less likely that they were given Soda 1 (and accordingly, the million dollars).

The upshot of this is that both CDT and EDT are reflectively inconsistent in the class of decision-determined problems. What we seek, then, is a new decision theory that behaves like EDT in the Newcomb problem and like CDT in Newcomb’s Soda. One such theory was pioneered by AI researcher Eliezer Yudkowsky, who named it timeless decision theory (henceforth TDT). To deliver different verdicts in the two problems, we must find some feature that distinguishes their structures. TDT does this by distinguishing between the type of correlation arising from ordinary common causes (like the soda in Newcomb’s Soda) and the type of correlation arising from faithful simulations of your behavior (as in Newcomb’s problem).

This second type of correlation is called logical dependence, and is the core idea motivating TDT. The simplest example of this is the following: two twins, physically identical down to the atomic level, raised in identical environments in a deterministic universe, will have perfectly correlated behavior throughout the lengths of their lives, even if they are entirely causally separated from each other. This correlation is apparently not due to a common cause or to any direct causal influence. It simply arises from the logical fact that two faithful instantiations of the same function will return the same output when fed the same input. Considering the behavior of a human being as an instantiation of an extremely complicated function, it becomes clear why you and your parallel-universe twin behave identically: you are instantiations of the same function! We can take this a step further by noting that two functions can have a similar input-output structure, in which case the physical instantiations of each function will have correlated input-output behavior. This correlation is what’s meant by logical dependence.

To spell this out a bit further, imagine that in a far away country, there are factories that sell very simple calculators. Each calculator is designed to run only one specific computation. Some factories are multiplication-factories; they only sell calculators that compute 713*291. Others are addition-factories; they only sell calculators that compute 713+291. You buy two calculators from one of these factories, but you’re not sure which type of factory you purchased from. Your credences are split 50/50 between the factory you purchased from being a multiplication-factory and it being an addition-factory. You also have some logical uncertainty regarding what the value of 713*291 is. You are evenly split between the value being 207,481 and the value being 207,483. On the other hand, you have no uncertainty about what the value of 713+291 is; you know that it is 1004.

Now, you press “ENTER” on one of the two calculators you purchased, and find that the result is 207,483. For a rational reasoner, two things should now happen: First, you should treat this result as strong evidence that the factory from which both calculators were bought was a multiplication-factory, and therefore that the other calculator is also a multiplier. And second, you should update strongly on the other calculator outputting 207,483 rather than 207,481, since two calculators running the same computation will output the same result.

The point of this example is that it clearly separates out ordinary common cause correlation from a different type of dependence. The common cause dependence is what warrants you updating on the other calculator being a multiplier rather than an adder. But it doesn’t warrant you updating on the result on the other calculator being specifically 207,483; to do this, we need the notion of logical dependence, which is the type of dependence that arises whenever you encounter systems that are instantiating the same or similar computations.
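
Here’s a minimal sketch of the double update, using the credences from the example. In this simplified model the calculators never err, so the “strong evidence” of the prose becomes conclusive evidence, but the structure of the two updates is the same.

```python
# Joint prior over (factory type, true value of 713*291): 50/50 on the
# factory, 50/50 between 207,481 and 207,483. (713+291 is known to be 1004.)
prior = {
    ("mult", 207_481): 0.25,
    ("mult", 207_483): 0.25,
    ("add", 207_481): 0.25,
    ("add", 207_483): 0.25,
}

def display(factory, true_product):
    # What a calculator from this factory shows when you press ENTER.
    return true_product if factory == "mult" else 1_004

# Observation: the first calculator reads 207,483.
posterior = {h: p for h, p in prior.items() if display(*h) == 207_483}
total = sum(posterior.values())
posterior = {h: p / total for h, p in posterior.items()}

# Common-cause update: the other calculator came from the same factory.
p_other_is_multiplier = sum(p for (f, _), p in posterior.items() if f == "mult")
# Logical update: the other calculator runs the same computation, so under
# each remaining hypothesis it displays the same number.
p_other_reads_207483 = sum(p for h, p in posterior.items() if display(*h) == 207_483)

print(p_other_is_multiplier, p_other_reads_207483)  # 1.0 1.0
```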

Connecting this back to decision theory, TDT treats our decision as the output of a formal algorithm: our decision-making process. The behavior of this algorithm is entirely determined by its logical structure, which is why it has no upstream causal influences like the soda in Newcomb’s Soda. But the behavior of this algorithm will be correlated with the parts of the universe that instantiate the same or a similar function (as well as the parts of the universe it causally influences). In Newcomb’s problem, for example, the predictor generates a detailed simulation of your decision process based on a brain scan. This simulation of you is highly logically correlated with you, in that it will faithfully reproduce your behavior in a variety of situations. So if you decide to one-box, you also learn that your simulation is very likely to one-box (and therefore that the opaque box is full).

Notice that the exact mechanism by which the predictor operates becomes very important for TDT. If the predictor operates by means of some ordinary common cause where no logical dependence exists, TDT will treat its prediction as independent of your choice. This translates over to why TDT behaves like CDT on Newcomb’s Soda, as well as other so-called “medical Newcomb problems” such as the smoking lesion problem. When the reason for the correlation between your behavior and the outcome is merely that both depend on a common input, TDT treats your decision as an intervention and therefore independent of the outcome.

One final way to conceptualize TDT and the differences between these types of correlation is with structural equation modeling:

Direct causal dependence exists between A and B when A is a function of B or when B is a function of A.
 > A = f(B) or B = g(A)

Common cause dependence exists between A and B when A and B are both functions of some other variable C.
 > A = f(C) and B = g(C)

Logical dependence exists between A and B when A and B depend on their inputs in similar ways.
 > A = f(C) and B = f(D)

TDT takes direct causal dependence and logical dependence seriously, and ignores common cause dependence. We can express this formally by saying that TDT calculates the expected utility of a decision by treating it as a causal intervention and fixing the output of every other instantiation of its decision algorithm to that same intervened-upon value. Using Judea Pearl’s do-calculus notation for causal intervention, this looks like:

 > EU(D) = Σ_w U(w) · P(w | K, do(TDT(K) = D))

Here K is the TDT agent’s background knowledge, D is chosen from a set of possible decisions, and the sum is over all possible worlds. This equation isn’t quite right, since it doesn’t indicate what to do when the computation a given system instantiates is merely similar to TDT but not logically identical, but it serves as a first approximation to the algorithm.
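
As a rough illustration of how this intervention differs from CDT’s, here is a sketch of Newcomb’s problem again, assuming a perfectly faithful simulation: intervening on the agent’s decision also fixes the simulation’s output (TDT), whereas CDT’s intervention leaves the simulation’s output at its prior, taken here to be 50/50 as an arbitrary assumption.

```python
SMALL, BIG = 1_000, 1_000_000

def payoff(decision, sim_decision):
    box_full = (sim_decision == "one-box")
    return (BIG if box_full else 0) + (SMALL if decision == "two-box" else 0)

def cdt_eu(decision, p_sim_one_boxes=0.5):
    # Intervene on the decision alone; the simulation's output is untouched.
    return (p_sim_one_boxes * payoff(decision, "one-box")
            + (1 - p_sim_one_boxes) * payoff(decision, "two-box"))

def tdt_eu(decision):
    # Intervene on the shared algorithm: every instantiation of it, including
    # the predictor's simulation, is fixed to the same output.
    return payoff(decision, sim_decision=decision)

print(cdt_eu("one-box"), cdt_eu("two-box"))  # 500000.0 vs 501000.0
print(tdt_eu("one-box"), tdt_eu("two-box"))  # 1000000 vs 1000
```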

You might notice that the notion of logical dependence depends on the idea of logical uncertainty: without it, the results of the computations would be known with certainty as soon as you learned that the calculators came out of a multiplication-factory, without your ever having to observe their outputs. Thus any theory that incorporates logical dependence into its framework must confront the problem of logical omniscience; that is, it will have to give some account of how to place and update reasonable probability distributions over tautologies.

The upshot of all of this is that TDT is reflectively consistent on a larger class of problems than both EDT and CDT. Both EDT and CDT would self-modify into TDT in Newcomb-like problems if given the choice. Correspondingly, if you throw a bunch of TDTs and EDTs and CDTs into a world full of Newcomb and Newcomb-like problems, the TDTs will come out ahead. However, it turns out that TDT is not itself reflectively consistent on the whole class of decision-determined problems. Examples like the transparent Newcomb problem, Parfit’s hitchhiker, and counterfactual mugging all expose reflective inconsistency in TDT.

Let’s look at the transparent Newcomb problem. The structure is identical to a Newcomb problem (you walk into a room with two boxes, $1,000 in one and either $1,000,000 or $0 in the other, determined by the behavior of your simulation), except that both boxes are transparent. This means that you already know with certainty the contents of both boxes. CDT two-boxes here, as always. EDT also two-boxes, since any dependence between your decision and the box’s contents is made irrelevant as soon as you see the contents. TDT agrees with this line of reasoning; even though it sees a logical dependence between your behavior and your simulation’s behavior, knowing whether the box is full or empty fully screens off this dependence.

Two-boxing feels to many like the obvious rational choice here. The choice you face is simply whether to take $1,000,000 or $1,001,000 if the box is full; if it’s empty, your choice is between taking $1,000 and walking out empty-handed. But two-boxing also has a few strange consequences. For one, imagine that you are placed, blindfolded, in a transparent Newcomb problem. You can at any moment decide to remove your blindfold. If you are an evidential decision theorist, you will reason that if you don’t remove your blindfold, you are essentially in an ordinary Newcomb problem, so you will one-box and correspondingly walk away with $1,000,000. But if you do remove your blindfold, you’ll end up two-boxing and most likely walking away with only $1,000. So an EDT agent would pay up to $999,000 just for the privilege of staying blindfolded. This seems to conflict with an intuitive principle of rational choice, which goes something like: a rational agent should never expect to be made worse off simply by gaining information. Paying money to keep yourself from learning relevant information seems like a sure sign of a pathological decision theory.

Of course, there are two ways out of this. One way is to follow the causal decision theorist and two-box in both the ordinary Newcomb problem and the transparent problem. This has all the issues that we’ve already discussed, most prominently that you end up systematically and predictably worse off by doing so. If you pit a causal decision theorist against an agent that always one-boxes, even in transparent Newcomb problems, CDT ends up the poorer. And since CDT can reason this through beforehand, they would willingly self-modify to become the other type of agent.

What type of agent is this? None of the three decision theories we’ve discussed give the reflectively consistent response here, so we need to invent a new decision theory. The difficulty with any such theory is that it has to be able to justify sticking to its guns and one-boxing even after conditioning on the contents of the box.

In general, similar issues will arise whenever the recommendations made by a decision theory are not time-consistent. For instance, the decision that TDT prescribes for an agent with background knowledge K depends heavily on the information that TDT has at the time of prescription. This means that at different times, TDT will make different recommendations for what to do in the same situation (before entering the room TDT recommends one-boxing once in the room, while after entering the room TDT recommends two-boxing). This leads to suboptimal performance. Agents that can decide on one course of action and credibly precommit to it get certain benefits that aren’t available to agents that don’t have this ability. I think the clearest example of this is Parfit’s hitchhiker:

You are stranded in the desert, running out of water, and soon to die. A Predictor approaches and tells you that they will drive you to town only if they predict you will pay them $100 once you get there.

All of EDT, CDT, and TDT wish that they could credibly precommit to paying the money once in town, but can’t. Once they are in town, they no longer have any reason to pay the $100, since they condition on the fact that they are already in town. The fact that EDT, CDT, and TDT all make time-sensitive recommendations leaves them worse off: all three are left stranded in the desert to die. Each of these agents would willingly switch to a decision theory whose recommendations don’t change over time. How would such a decision theory work? It looks like we need a decision theory that acts as if it doesn’t know it’s in town even once it’s in town, and acts as if it doesn’t know the contents of the box even after seeing them. One strategy for achieving this behavior is simple: you decide on your strategy without ever conditioning on the fact that you are in town!

The decision theory that arises from this choice is appropriately named updateless decision theory (henceforth UDT). UDT is peculiar in that it never actually updates on any information when determining how to behave. That’s not to say that a UDT agent’s decisions don’t depend on the information it receives over its lifetime. Instead, UDT tells you to choose a policy – a mapping from the possible pieces of information you might receive to the possible decisions you could make – that maximizes expected utility, calculated using your prior over possible worlds. This policy is set from time zero and never changes, and it determines how the UDT agent responds to any information it might receive at later points. So, for instance, a UDT agent reasons that adopting a policy of one-boxing in the transparent Newcomb case, regardless of what it sees, maximizes expected utility as calculated using its prior. So once the UDT agent is in the room with the transparent boxes, it one-boxes. We can formalize this by analogy with TDT:

 > π* = argmax_π Σ_w U(w) · P(w | do(UDT = π))

Here π ranges over policies (mappings from possible observations to decisions), and, unlike in TDT’s formula, the probabilities are computed using the prior rather than conditioned on what the agent has learned so far.

One concern with this approach is that a UDT agent might end up making silly decisions as a result of not taking into account information that is relevant to its decisions. But once again, the UDT agent does take into account the information it learns over its lifetime; it merely decides what to do with that information before receiving it and never revises this prescription. For example, suppose that a UDT agent anticipates facing exactly one decision problem in its life: whether or not to push a button. It has a 50% prior credence that pushing the button will result in the loss of $10, and 50% that it will result in gaining $10. Now, at some point before it decides whether to push the button, it is told whether pushing the button leads to gaining $10 or losing $10. UDT deals with this by choosing, in advance, a policy for how to respond to that information in either case. The expected-utility-maximizing policy here is to push the button if you learn that pushing it leads to gaining $10, and not to push it if you learn the opposite.
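
Here’s a minimal sketch of that policy-selection step for the button example; the brute-force enumeration over all policies is just for illustration.

```python
from itertools import product

# Worlds: pushing the button gains $10 or loses $10, with a 50/50 prior.
worlds = {"+10": 10, "-10": -10}
prior = {"+10": 0.5, "-10": 0.5}
actions = ("push", "dont_push")

def outcome(world, action):
    return worlds[world] if action == "push" else 0

def policy_eu(policy):
    # Expected utility under the prior: in world w you will be told w
    # and then act according to policy[w].
    return sum(prior[w] * outcome(w, policy[w]) for w in worlds)

# Enumerate every policy (mapping from possible information to an action).
policies = [dict(zip(worlds, choice)) for choice in product(actions, repeat=len(worlds))]
best = max(policies, key=policy_eu)
print(best, policy_eu(best))  # {'+10': 'push', '-10': 'dont_push'} 5.0
```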

Since UDT chooses its preferred policy based on its prior, this recommendation never changes throughout a UDT agent’s lifetime. This seems to indicate that UDT will be self-recommending in the class of all decision-determined problems, although I’m not aware of a full proof of this. If this is correct, then we have reached our goal of finding a self-recommending decision theory. It is interesting to consider what other principles of rational choice ended up being violated along the way. The simple dominance principle that we started off by discussing appears to be an example of this. In the transparent Newcomb problem, there is only one possible world that the agent considers when in the room (the one in which the box is full, say), and in this one world, two-boxing dominates one-boxing. Given that the box is full, your decision to one-box or to two-box is completely independent of the box’s contents. So the simple dominance principle recommends two-boxing. But UDT disagrees.

Another example of a deeply intuitive principle that UDT violates is the irrelevance of impossible outcomes. This principle says that impossible outcomes should not factor into your decision-making process. But UDT often seems to recommend acting as if some impossible world might come to be. For instance, suppose a predictor walks up to you and gives you a choice: either give them $10 or give them $100. You will not face any future consequences on the basis of your decision (besides whether you’re out $100 or only $10). However, you learn that the predictor only approached you because it predicted that you would give the $10. Do you give the $10 or the $100? UDT recommends giving the $100, because agents that do so are less likely to have been approached by the predictor in the first place. But if you’ve already been approached, then you are letting considerations about an impossible world influence your decision process!

Our quest for reflective consistency took us from EDT and CDT to timeless decision theory. TDT used the notion of logical dependence to get self-recommending behavior in the Newcomb problem and medical Newcomb cases. But we found that TDT was itself reflectively inconsistent in problems like the Transparent Newcomb problem. This led us to create a new theory that made its recommendations without updating on information, which we called updateless decision theory. UDT turned out to be a totally static theory, setting forth a policy determining how to respond to all possible bits of information and never altering this policy. The unchanging nature of UDT indicates the possibility that we have found a truly self-recommending decision theory, while also leading to some quite unintuitive consequences.


Decision Theory

Everywhere below where a Predictor is mentioned, assume that their predictions are made by scanning your brain to create a highly accurate simulation of you and then observing what this simulation does.

All the below scenarios are one-shot games. Your action now will not influence the future decision problems you end up in.

Newcomb’s Problem
Two boxes: A and B. B contains $1,000. A contains $1,000,000 if the Predictor thinks you will take just A, and $0 otherwise. Do you take just A or both A and B?

Transparent Newcomb, Full Box
Newcomb problem, but you can see that box A contains $1,000,000. Do you take just A or both A and B?

Transparent Newcomb, Empty Box
Newcomb problem, but you can see that box A contains nothing. Do you take just A or both A and B?

Newcomb with Precommitment
Newcomb’s problem, but you have the ability to irrevocably resolve to take just A in advance of the Predictor’s prediction (which will still be just as good if you do precommit). Should you precommit?

Take Opaque First
Newcomb’s problem, but you have already taken A and it has been removed from the room. Should you now also take B or leave it behind?

Smoking Lesion
Some people have a lesion that causes cancer as well as a strong desire to smoke. Smoking doesn’t cause cancer and you enjoy it. Do you smoke?

Smoking Lesion, Unconscious Inclination
Some people have a lesion that causes cancer as well as a strong unconscious inclination to smoke. Smoking doesn’t cause cancer and you enjoy it. Do you smoke?

Smoking and Appletinis
Drinking a third appletini is the kind of act much more typical of people with addictive personalities, who tend to become smokers. I’d like to drink a third appletini, but I really don’t want to be a smoker. Should I order the appletini?

Expensive Hospital
You just got into an accident that gave you amnesia. You need to choose to be treated at either a cheap hospital or an expensive one. The quality of treatment in the two is the same, but you know that billionaires, out of unconscious habit, are biased towards using the expensive one. Which do you choose?

Rocket Heels and Robots
The world contains robots and humans, and you don’t know which you are. Robots rescue people whenever possible and have rockets in their heels that activate whenever necessary. Your friend falls down a mine shaft and will die soon without robotic assistance. Should you jump in after them?

Death in Damascus
If you and Death are in the same city tomorrow, you die. Death is a perfect predictor, and will come where he predicts you will be. You can stay in Damascus or pay $1000 to flee to Aleppo. Do you stay or flee?

Psychopath Button
If you press a button, all psychopaths will be killed. Only a psychopath would press such a button. Do you press the button?

Parfit’s Hitchhiker
You are stranded in the desert, running out of water, and soon to die. A Predictor will drive you to town only if they predict you will pay them $1000 once you get there. You have been brought into town. Do you pay?

XOR Blackmail
An honest predictor sends you this letter: “I sent this letter because I predicted that you have termites iff you won’t send me $100. Send me $100.” Do you send the money?

Twin Prisoner’s Dilemma
You are in a prisoner’s dilemma with a twin of yourself. Do you cooperate or defect?

Predictor Extortion
A Predictor approaches you and threatens to torture you unless you hand over $100. They only approached you because they predicted beforehand that you would hand over the $100. Do you pay up?

Counterfactual Mugging
A Predictor flips a coin, which lands heads, then approaches you and asks for $100. If the coin had landed tails, the Predictor would have tortured you if it predicted that you wouldn’t give the $100. Do you give?

Newcomb’s Soda
You have 50% credence that you were given Soda 1, and 50% that you were given Soda 2. Those who had Soda 1 have a strong unconscious inclination to choose chocolate ice cream and will be given $1,000,000. Those who had Soda 2 have a strong unconscious inclination to choose vanilla ice cream and are given nothing. If you choose vanilla ice cream, you get $1,000. Do you choose chocolate or vanilla ice cream?

Meta-Newcomb Problem
Two boxes: A and B. A contains $1,000. Box B will contain either nothing or $1,000,000. What B will contain is (or will be) determined by a Predictor just as in the standard Newcomb problem. Half of the time, the Predictor makes his move before you by predicting what you’ll do. The other half, the Predictor makes his move after you by observing what you do. There is a Metapredictor, who has an excellent track record of predicting Predictor’s choices as well as your own. The Metapredictor informs you that either (1) you choose A and B and Predictor will make his move after you make your choice, or else (2) you choose only B, and Predictor has already made his choice. Do you take only B or both A and B?