The Monty Hall non-paradox

I recently showed the famous Monty Hall problem to a friend. This friend solved the problem right away, and we realized quickly that the standard presentation of the problem is highly misleading.

Here’s the setup as it was originally described in the magazine column that made it famous:

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?

I encourage you to think through this problem for yourself and come to an answer. I’ll provide some blank space so that you don’t accidentally read ahead.

 

 

 

 

 

 

Now, the writer of the column was Marilyn vos Savant, famous for having an impossible IQ of 228 according to an interpretation of a test that violated “almost every rule imaginable concerning the meaning of IQs” (psychologist Alan Kaufman). In her response to the problem, she declared that switching gives you a 2/3 chance of winning the car, as opposed to a 1/3 chance for staying. She argued by analogy:

Yes; you should switch. The first door has a 1/3 chance of winning, but the second door has a 2/3 chance. Here’s a good way to visualize what happened. Suppose there are a million doors, and you pick door #1. Then the host, who knows what’s behind the doors and will always avoid the one with the prize, opens them all except door #777,777. You’d switch to that door pretty fast, wouldn’t you?

Notice that this answer contains a crucial detail that is not contained in the statement of the problem! Namely, the answer adds the stipulation that the host “knows what’s behind the doors and will always avoid the one with the prize.”

The original statement of the problem in no way implies this general statement about the host’s behavior. All you are justified in assuming on an initial reading of the problem are the observational facts that (1) the host happened to open door No. 3, and (2) this door happened to contain a goat.

When nearly a thousand PhDs wrote in to the magazine explaining that her answer was wrong, she gave further arguments that failed to reference the crucial point: that her answer was only true given additional unstated assumptions.

My original answer is correct. But first, let me explain why your answer is wrong. The winning odds of 1/3 on the first choice can’t go up to 1/2 just because the host opens a losing door. To illustrate this, let’s say we play a shell game. You look away, and I put a pea under one of three shells. Then I ask you to put your finger on a shell. The odds that your choice contains a pea are 1/3, agreed? Then I simply lift up an empty shell from the remaining other two. As I can (and will) do this regardless of what you’ve chosen, we’ve learned nothing to allow us to revise the odds on the shell under your finger.

Notice that this argument is literally just a restatement of the original problem. If one didn’t buy the conclusion initially, restating it in terms of peas and shells is unlikely to do the trick!

This problem was made even more famous by this scene in the movie “21”, in which the protagonist demonstrates his brilliance by coming to the same conclusion as vos Savant. While the problem is stated slightly better in this scene, enough ambiguity still exists that the proper response should be that the problem is underspecified, or perhaps to give a set of different answers for different sets of auxiliary assumptions.

The wiki page on this ‘paradox’ describes it as a veridical paradox, “because the correct choice (that one should switch doors) is so counterintuitive it can seem absurd, but is nevertheless demonstrably true.”

Later on the page, we see the following:

In her book The Power of Logical Thinking, vos Savant (1996, p. 15) quotes cognitive psychologist Massimo Piattelli-Palmarini as saying that “no other statistical puzzle comes so close to fooling all the people all the time,” and “even Nobel physicists systematically give the wrong answer, and that they insist on it, and they are ready to berate in print those who propose the right answer.”

There’s something to be said about adequacy reasoning here; when thousands of PhDs and some of the most brilliant mathematicians in the world are making the same point, perhaps we are too quick to write it off as “Wow, look at the strength of this cognitive bias! Thank goodness I’m bright enough to see past it.”

In fact, the source of all of the confusion is fairly easy to understand, and I can demonstrate it in a few lines.

Solution to the problem as presented

Initially, all three doors are equally likely to contain the car.
So Pr(1) = Pr(2) = Pr(3) = ⅓

We are interested in how these probabilities update upon the observation that 3 does not contain the car.
Pr(1 | ~3) = Pr(1)・Pr(~3 | 1) / Pr(~3)
= (⅓ ・1) / ⅔ = ½

By the same argument,
Pr(2 | ~3) = ½

Voila. There’s the simple solution to the problem as it is presented, with no additional presumptions about the host’s behavior. Accepting this argument requires only accepting three premises:

(1) Initially all doors are equally likely to be hiding the car.

(2) Bayes’ rule.

(3) There is only one car.

(3) implies that Pr(the car is not behind a door | the car is behind a different door) = 100%, which we use when we replace Pr(~3 | 1) with 1.

The answer we get is perfectly obvious; in the end all you know is that the car is either in door 1 or door 2, and that you picked door 1 initially. Since which door you initially picked has nothing to do with which door the car was behind, and the host’s decision gives you no information favoring door 1 over door 2, the probabilities should be evenly split between the two.

It is also the answer that all the PhDs gave.
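
To see this concretely, here is a quick Monte Carlo sketch (my own illustration, not part of the original column). Since the problem as stated tells us nothing about the host’s policy, one model consistent with the bare observational facts is a host who opens one of the two unchosen doors at random; conditioning on the cases where that door happens to reveal a goat reproduces the ½–½ split.

```python
import random

def trial():
    """One game with a host who opens one of the other two doors at random."""
    car = random.randrange(3)        # door hiding the car (0, 1, 2)
    pick = 0                         # you pick door No. 1 (index 0); by symmetry this loses nothing
    opened = random.choice([1, 2])   # host opens one of the other doors, knowing nothing
    if opened == car:
        return None                  # host revealed the car; not the situation described
    return pick == car, (3 - opened) == car   # (stay wins, switch wins)

results = [r for r in (trial() for _ in range(200_000)) if r is not None]
print("P(stay wins | goat revealed)   =", sum(r[0] for r in results) / len(results))  # ≈ 0.5
print("P(switch wins | goat revealed) =", sum(r[1] for r in results) / len(results))  # ≈ 0.5
```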

Now, why does taking into account the host’s decision process change things? Simply because the host’s decision is now contingent on your decision, as well as the actual location of the car. Given that you initially opened door 1, the host is guaranteed to not open door 1 for you, and is also guaranteed to not open up a door hiding the car.

Solution with specified host behavior

Initially, all three doors are equally likely to contain the car.
So Pr(1) = Pr(2) = Pr(3) = ⅓

We update these probabilities upon the observation that 3 does not contain the car, using the likelihood formulation of Bayes’ rule.

Pr(1 | open 3) / Pr(2 | open 3)
= Pr(1) / Pr(2)・Pr(open 3 | 1) / Pr(open 3 | 2)
= ⅓ / ⅓・½ / 1 = ½

So Pr(1 | open 3) = ⅓ and Pr(2 | open 3) = ⅔

Pr(open 3 | 2) = 1, because the host has no choice of which door to open if you have selected door 1 and the car is behind door 2.

Pr(open 3 | 1) = ½, because the host has a choice of either opening 2 or 3.

In fact, it’s worth pointing out that this requires another behavioral assumption about the host that is nowhere stated in the original problem or in vos Savant’s solution: namely, that if there is a choice about which of two doors to open, the host will pick randomly between them.

This assumption is again not obviously correct from the outset; perhaps the host chooses the larger of the two door numbers in such cases, or the one closer to themselves, or the one with the smaller number with only 25% probability. There are infinitely many possible strategies the host could be using, and this particular strategy must be explicitly stipulated to get the answer that the wiki proclaims to be correct.

It’s also worth pointing out that once these additional assumptions are made explicit, the ⅓–⅔ answer is fairly obvious and not much of a paradox. If you know that the host is guaranteed to choose a door with a goat behind it, and not one with a car, then of course their decision about which door to open gives you information. It gives you information because opening door 3 would have been less likely in the world where the car was behind door 1 than in the world where the car was behind door 2.
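
For comparison, here is the same kind of sketch with the fully specified host: one who knows where the car is, never opens your door or the car’s door, and flips a fair coin when both unchosen doors hide goats. (Again my own illustration, not from the original.)

```python
import random

def trial():
    """One game with the standard Monty Hall host."""
    car = random.randrange(3)
    pick = 0   # you pick door No. 1 (index 0)
    # Host opens a door that is neither your pick nor the car, choosing at random if both qualify.
    opened = random.choice([d for d in range(3) if d != pick and d != car])
    remaining = next(d for d in range(3) if d not in (pick, opened))
    return pick == car, remaining == car   # (stay wins, switch wins)

N = 200_000
results = [trial() for _ in range(N)]
print("P(stay wins)   =", sum(r[0] for r in results) / N)  # ≈ 1/3
print("P(switch wins) =", sum(r[1] for r in results) / N)  # ≈ 2/3
```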

In terms of causal diagrams, the second formulation of the Monty Hall problem makes your initial choice of door and the location of the car dependent upon one another. There is a path of causal dependency that goes forwards from your decision to the host’s decision, which is conditioned upon, and then backward from the host’s decision to which door the car is behind.

Any unintuitiveness in this version of the Monty Hall problem is ultimately due to the unintuitiveness of the effects of conditioning on a common effect of two variables.

[Figure: causal diagram for the Monty Hall problem]

In summary, there is no paradox behind the Monty Hall problem, because there is no single Monty Hall problem. There are two different problems, each containing different assumptions, and each with different answers. The answers to each problem are fairly clear after a little thought, and the only appearance of a paradox comes from apparent disagreements between individuals that are actually just talking about different problems. There is no great surprise when ambiguous wording turns out to admit multiple plausible solutions; it’s just surprising that so many people see something deeper than mere ambiguity here.

Taxonomy of infinity catastrophes for expected utility theory

Basics of expected utility theory

I’ve talked quite a bit in past posts about the problems that infinities raise for expected utility theory. In this post, I want to systematically go through and discuss the different categories of problems.

First of all, let’s define expected utility theory.

Definitions:
Given an action A, we have a utility function U over the possible consequences
U = { U1, U2, U3, … UN }
and a credence distribution P over the consequences
P = { P1, P2, P3, … PN }.
We define the expected utility of A to be EU(A) = P1U1 + P2U2 + … + PNUN

Expected Utility Theory:
The rational action is that which maximizes expected utility.

Just to give an example of how this works out, suppose that we can choose between two actions A1 and A2, defined as follows:

Action A1
U1 = { 20, -10 }
P1 = { 50%, 50% }

Action A2
U2 = { 10, -20 }
P2 = { 80%, 20% }

We can compare the expected utilities of these two actions by using the above formula.

EU(A1) = 20∙50% + -10∙50% = 5
EU(A2) = 10∙80% + -20∙20% = 4

Since EU(A1) is greater than EU(A2), expected utility theory mandates that A1 is the rational act for us to take.
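
This calculation is trivial to mechanize; here is a minimal sketch (mine, purely illustrative):

```python
def expected_utility(utilities, probabilities):
    """EU(A) = P1·U1 + P2·U2 + … + PN·UN"""
    return sum(p * u for p, u in zip(probabilities, utilities))

eu_a1 = expected_utility([20, -10], [0.5, 0.5])   # 5.0
eu_a2 = expected_utility([10, -20], [0.8, 0.2])   # 4.0
print(eu_a1, eu_a2, "take A1" if eu_a1 > eu_a2 else "take A2")   # 5.0 4.0 take A1
```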

Expected utility theory seems to work out fine in the case of finite payouts, but becomes strange when we begin to introduce infinities. Before even talking about the different problems that arise, though, you might be tempted to brush off this issue, thinking that infinite payouts don’t really exist in the real world.

While this is a tenable position to hold, it is certainly not obviously correct. We can easily construct games that are actually doable that have an infinite expected payout. For instance, a friend of mine runs the following procedure whenever it is getting late and he is trying to decide whether or not he should head home: First, he flips a coin. If it lands heads, he heads home. If tails, he waits one minute and then re-flips the coin. If it lands heads this time, he heads home. If tails, then he waits two minutes and re-flips the coin. On the next flip, if it lands tails, he waits four minutes. Then eight. And so on. The danger of this procedure is that on average, he ends up staying out for an infinitely long period of time.

This is a more dangerous real-world application of the St. Petersburg Paradox (although you’ll be glad to know that he hasn’t yet been stuck hanging out with me for an infinite amount of time). We might object: Yes, in theory this has an infinite expected time. But we know that in practice, there will be some cap on the total possible time. Perhaps this cap corresponds to the limit of tolerance that my friend has before he gives up on the game. Or, more conclusively, there is certainly an upper limit in terms of his life span.
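
To see how such a cap tames the infinity, here is a small calculation sketch (mine, with the cap as a made-up parameter): if the procedure is cut off after at most max_flips flips, the expected stay works out to exactly half the cap, in minutes, and so grows without bound as the cap is lifted.

```python
def expected_wait(max_flips):
    """Exact expected wait, in minutes, if the doubling procedure is capped at max_flips flips."""
    ev = sum(0.5 ** m * (2 ** (m - 1) - 1) for m in range(1, max_flips + 1))  # first heads on flip m
    ev += 0.5 ** max_flips * (2 ** max_flips - 1)                             # all tails: give up, go home
    return ev

for cap in (10, 20, 40, 80):
    print(f"cap = {cap:2d} flips -> expected stay = {expected_wait(cap):.1f} minutes")
# Prints 5.0, 10.0, 20.0, 40.0 minutes: exactly cap/2, so removing the cap removes any finite expectation.
```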

Are there any real infinities out there that could translate into infinite utilities? Once again, plausibly no. But it doesn’t seem impossible that such infinities could arise. For instance, even if we wanted to map utilities onto positive-valence experiences and believed that there was a theoretical upper limit on the amount of positivity you could possibly experience in a finite amount of time, we could still appeal to the possibility of an eternity of happiness. If God appeared before you and offered you an eternity of existence in Heaven, then you would presumably be considering an offer with a net utility of positive infinity. Maybe you think this is implausible (I certainly do), but it is at least a possibility that we could be confronted with real infinities in expected utility calculations.

Reassured that infinite utilities are probably not a priori ruled out, we can now ask: How does expected utility theory handle these scenarios?

The answer is: not well.

There are three general classes of failures:

  1. Failure of dominance arguments
  2. Undefined expected utilities
  3. Nonsensical expected utilities

Failure of dominance arguments

A dominance argument is an argument that says that if one action leaves you at least as well off as another no matter what is the case (and strictly better off in at least one case), then you should prefer it.

Here’s an example. Consider two lotteries: Lottery 1 and Lottery 2. Each one decides on whether a player wins or not by looking at some fixed random event (say, whether or not a radioactive atom decays within a fixed amount of time T), but the reward for winning differs. If the radioactive atom does decay within time T, then you would get $100,000 from Lottery 1 and $200,000 from Lottery 2. If it does not, then you lose $200 with Lottery 1 and $100 with Lottery 2. Now imagine that you can choose only one of these two lotteries.

To summarize: If the atom decays, then Lottery 1 gives you $100,000 less than Lottery 2. And if the atom doesn’t decay, then Lottery 1 charges you $100 more than Lottery 2.

In other words, no matter what ends up happening, you are better off choosing Lottery 2 than Lottery 1. This means that Lottery 2 dominates Lottery 1 as a strategy. There is no possible configuration of the world in which you would have been better off by choosing Lottery 1 than by choosing Lottery 2, so this choice is essentially risk-free.

So we have the following general principle, which seems to follow nicely from a simple application of expected utility theory:

Dominance: If action A1 dominates action A2, then it is irrational to choose A2 over A1.

Amazingly, this straightforward and apparently obvious rule ends up failing us when we start to talk about infinite payoffs.

Consider the following setup:

Action 1
U = { ∞, 0 }
P = { .5, .5 }

Action 2
U = { ∞, 10 }
P = { .5, .5 }

Action 2 weakly dominates Action 1. This means that no matter what consequence ends up obtaining, we always end up either better off or equally well off by taking Action 2 rather than Action 1. But when we calculate the expected utilities…

EU(Action 1) = .5 ∙ ∞ + .5 ∙ 0 = ∞
EU(Action 2) = .5 ∙ ∞ + .5 ∙ 10 = ∞

… we find that the two actions are apparently equal in utility, so we should have no preference between them.
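
You can watch this failure happen by mechanically plugging the infinities into the formula (a toy illustration using Python’s float('inf'), which behaves like the extended reals here):

```python
inf = float('inf')

eu_1 = 0.5 * inf + 0.5 * 0    # Action 1
eu_2 = 0.5 * inf + 0.5 * 10   # Action 2, which weakly dominates Action 1
print(eu_1, eu_2, eu_1 == eu_2)   # inf inf True -- expected utility registers no preference
```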

This is pretty bizarre. Imagine the following scenario: God is about to appear in front of you and ship you off to Heaven for an eternity of happiness. In the few minutes before he arrives, you are able to enjoy a wonderfully delicious-looking Klondike bar if you so choose. Obviously the rational thing to do is to eat the Klondike bar, right? Apparently not, according to expected utility theory. The additional little burst of pleasure you get fades into irrelevance as soon as the infinities enter the calculation.

Not only do infinities make us indifferent between two actions, one of which dominates the other, but they can even make us end up choosing actions that are clearly dominated! My favorite example of this is one that I’ve talked about earlier, featuring a recently deceased Donald Trump sitting in Limbo negotiating with God.

To briefly rehash this thought experiment, every day Donald Trump is given an offer by God: spend one day in Hell and in return get two days in Heaven afterwards. Each day, the rational choice is for Trump to take the offer, spending one more day in Hell before being able to receive his reward. But since he accepts the offer every day, he ends up always delaying his payout in Heaven, and therefore spends all of eternity in Hell, thinking that he’s making a great deal.

We can think of Trump’s reason for accepting each day as a simple expected utility calculation: U(2 days in Heaven) + U(1 day in Hell) > 0. But iterating this decision an infinity of times ends up leaving Trump in the worst possible scenario – eternal torture.

Undefined expected utilities

Now suppose that you get the following deal from God: Either (Option 1) you die and stop existing (suppose this has utility 0 to you), or (Option 2) you die and continue existing in the afterlife forever. If you choose the afterlife, then your schedule will be arranged as follows: 1,000 days of pure bliss in heaven, then one day of misery in hell. Suppose that each day of bliss has finite positive value to you, and each day of misery has finite negative value to you, and that these two values perfectly cancel each other out (a day in Hell is as bad as a day in Heaven is good).

Which option should you take? It seems reasonable that Option 2 is preferable, as you get a thousand to one ratio of happiness to unhappiness for all of eternity.

Option 1: 💀, 💀, 💀, 💀, …
Option 2: 😇 x 1000, 😟, 😇 x 1000, 😟, …

Since U(💀) = 0, we can calculate the expected utility of Option 1 fine. But what about Option 2? The answer we get depends on the order in which we add up the utilities of each day. If we take the days in chronological order, then we get an infinite positive total utility. If we alternate between Heaven days and Hell days, then we get a total utility of zero. And if we add them up in the order (Hell, Hell, Heaven, Hell, Hell, Heaven, …), then we end up getting an infinite negative total utility.
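
The order-dependence is easy to see numerically. The sketch below (mine) sums the same collection of daily utilities, +1 for each Heaven day and -1 for each Hell day, under the three orderings just described; the partial sums run off to +∞, hover around zero, and run off to -∞ respectively.

```python
from itertools import islice

def chronological():    # 1,000 Heaven days, then 1 Hell day, forever
    while True:
        for _ in range(1000):
            yield +1
        yield -1

def alternating():      # Heaven, Hell, Heaven, Hell, ...
    while True:
        yield +1
        yield -1

def hell_heavy():       # Hell, Hell, Heaven, Hell, Hell, Heaven, ...
    while True:
        yield -1
        yield -1
        yield +1

for name, days in [("chronological", chronological()),
                   ("alternating", alternating()),
                   ("hell-heavy", hell_heavy())]:
    print(f"{name:13s}: sum of first 100,000 days = {sum(islice(days, 100_000)):>7}")
# -> about +99,800, 0, and -33,334: the same days, totalled in different orders.
```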

In other words, the expected utility of Option 2 is undefined, giving us no guidance as to which option we should prefer. Intuitively, we would want a rational theory of preference to tell us that Option 2 is preferable.

A slightly different example of this: Consider the following three lotteries:

Lottery 1
U = { ∞, -∞ }
P = { .5, .5 }

Lottery 2
U = { ∞, -∞ }
P = { .01, .99 }

Lottery 3
U = { ∞, -∞ }
P = { .99, .01 }

Lottery 1 corresponds to flipping a fair coin to determine whether you go to Heaven forever or Hell forever. Lottery 2 corresponds to picking a number between 1 and 100 to decide. And Lottery 3 corresponds to getting to pick 99 numbers between 1 and 100 to decide. It should be obvious that if you were in this situation, then you should prefer Lottery 3 over Lottery 1, and Lottery 1 over Lottery 2. But here, again, expected utility theory fails us. None of these lotteries have defined expected utilities, because ∞ – ∞ is not well defined.

Nonsensical expected utilities

A stranger approaches you and demands twenty bucks, on pain of an eternity of torture. What should you do?

Expected utility theory tells us that as long as we have some non-zero credence in this person’s threat being credible, then we should hand over the twenty bucks. After all, a small but nonzero probability multiplied by -∞ is still just -∞.

Should we have a non-zero credence in the threat being credible? Plausibly so. To have a zero credence in the threat’s credibility is to imagine that there is no possible evidence that could make it any more likely. Is it true that no experience you could have would make the threat any more credible? What if the stranger demonstrated incredible control over the world around you?

In the end, we have an inconsistent triad.

  1. The rational thing to do is that which maximizes expected utility.
  2. There is a nonzero chance that the stranger threatening you with eternal torture is actually able to follow through on this threat.
  3. It is irrational to hand over the twenty dollars to the stranger.

This is a rephrasing of Pascal’s wager, but without the same problems as that thought experiment.

Pascal’s mugging

  • You should make decisions by evaluating the expected utilities of your various options and choosing the largest one.

This is a pretty standard and uncontroversial idea. There is room for controversy about how to fill in the details about how to evaluate expected utilities, but this basic premise is hard to argue against. So let’s argue against it!

Suppose that a stranger walks up to you in the street and says to you “I have been wired in from outside the simulation to give you the following message: If you don’t hand over five dollars to me right now, your simulator will teleport you to a dungeon and torture you for all eternity.” What should you do?

The obviously correct answer is that you should chuckle, continue on with your day, and laugh about the incident later on with your friends.

The answer you get from a simple application of decision theory is that as long as you aren’t absolutely, 100% sure that they are wrong, you should give them the five dollars. And you should definitely not be 100% sure. Why?

Suppose that the stranger says next: “I know that you’re probably skeptical about the whole simulation business, so here’s some evidence. Say any word that you please, and I will instantly reshape the clouds in the sky into that word.” You do so, and sure enough the clouds reshape themselves. Would this push your credences around a little? If so, then you didn’t start at 100%. Truly certain beliefs are those that can’t be budged by any evidence whatsoever. You can never update downwards on truly certain beliefs, by the definition of ‘truly certain’.

To go more extreme, just suppose that they demonstrate to you that they’re telling you the truth by teleporting you to a dungeon for five minutes of torture, and then bringing you back to your starting spot. If you would even slightly update your beliefs about their credibility in this scenario, then you had a non-zero credence in their credibility from the start.

And after all, this makes sense. You should only have complete confidence in the falsity of logical contradictions, and it’s not literally logically impossible that we are in a simulation, or that the simulator decides to mess with our heads in this bizarre way.

Okay, so you have a nonzero credence in their ability to do what they say they can do. And any nonzero credence, no matter how tiny, will result in the rational choice being to hand over the $5. After all, if expected utility is just calculated by summing up utilities weighted by probabilities, then you have something like the following:

EU(keep $5) – EU(give $5) = U(keep $5) + ε · U(infinite torture)
where ε = P(infinite torture | keep $5) – P(infinite torture | give $5)

Since ε > 0 and U(infinite torture) = -∞, this difference is -∞. As long as the value of $5 to you is finite, you should hand over the money. This seems like a problem, either for our intuitions or for decision theory.

***

So here are four propositions, and you must reject at least one of them:

  1. There is a nonzero chance of the stranger’s threat being credible.
  2. Infinite torture is infinitely worse than losing $5.
  3. The rational thing to do is that which maximizes expected utility.
  4. It is irrational to give the stranger $5.

I’ve already argued for (1), and (2) seems virtually definitional. So our choice is between (3) and (4). In other words, we either abandon the principle of maximizing expected utility as a guide to instrumental rationality, or we reject our intuitive confidence in the correctness of (4).

Maybe at this point you feel more willing to accept (4). After all, intuitions are just intuitions, and humans are known to be bad at reasoning about very small probabilities and very large numbers. Maybe it actually makes sense to hand over the $5.

But consider where this line of reasoning leads.

The exact same argument should lead you to give in to any demand that the stranger makes of you, as long as it doesn’t have a literal negative infinity utility value. So if the stranger tells you to hand over your car keys, to go dance around naked in a public square, or to commit heinous crimes… all of these behaviors would be apparently rationally mandated.

Maybe, maybe, you might be willing to bite the bullet and say that yes, these behaviors are all perfectly rational, because of the tiny chance that this stranger is telling the truth. I’d still be willing to bet that you wouldn’t actually behave in this self-professedly “rational” manner if I now made this threat to you.

Also, notice that this dilemma is almost identical to Pascal’s wager. If you buy the argument here, then you should also be doing all that you can to ensure that you stay out of Hell. If you’re queasy about the infinities and think decision theory shouldn’t be messing around with such things, then we can easily modify the problem.

Instead of “your simulator will teleport you to a dungeon and torture you for all eternity”, make it “your simulator will teleport you to a dungeon and torture you for 3↑↑↑↑3 years.” The negative utility of this is large enough as to outweigh any reasonable credence you could place in the credibility of the threat. And if it isn’t, we can just make the number of years even larger.

Maybe the probability of a given payout scales inversely with the size of the payout? But this seems fairly arbitrary. Is it really the case that the ability to torture you for 3↑↑↑↑3 years is twice as likely as the ability to torture you for 2 ∙ 3↑↑↑↑3 years? I can’t imagine why. It seems like the probabilities of these are going to be roughly equal – essentially, once you buy into the prospect of a simulator that is able to torture you for 3↑↑↑↑3 years, you’ve already basically bought into the prospect that they are able to torture you for twice that amount of time.

All we’re left with is to throw our hands up and say “I can’t explain why this argument is wrong, and I don’t know how decision theory has gone wrong here, but I just know that it’s wrong. There is no way that the actually rational thing to do is to allow myself to get mugged by anybody that has heard of Pascal’s wager.”

In other words, it seems like the correct response to Pascal’s mugging is to reject (3) and deny the expected-utility-maximizing approach to decision theory. The natural next question is: If expected utility maximization has failed us, then what should replace it? And how would it deal with Pascal’s mugging scenarios? I would love to see suggestions in the comments, but I suspect that this is a question that we are simply not advanced enough to satisfactorily answer yet.

Timeless decision theory and homogeneity

Something that seems difficult to me about timeless decision theory is how to reason in a world where most people are not TDTists. In such a world, it seems like the subjunctive dependence between you and others gets weaker and weaker the more TDT influences your decision process.

Suppose you are deciding whether or not to vote. You think through all of the standard arguments you know of: your single vote is virtually guaranteed to not swing the election, so the causal effect of your vote is essentially nothing; the cost to you of voting is tiny, and the experience might even be a net positive if you go with a friend and show off your “I Voted” sticker all day; if you vote, you might be able to persuade others to vote as well; etc. At the end of your pondering, you decide that it’s overall not worth it to vote.

Now a TDTist pops up behind your shoulder and says to you: “Look, think about all the other people out there reasoning similarly to you. If you end up not voting as a result of this reasoning, then it’s pretty likely that they’ll all not vote as well. On the other hand, if you do end up voting, then they probably will vote too! So instead of treating your decision as if it only makes the world 1 vote different, you should treat it is if it influences all the votes of those sufficiently similar to you.”

 Maybe you instantly find this convincing, and decide to go to the voting booth right away. But the problem is that in taking into account this extra argument, you have radically reduced the set of people whose overall reasoning process is similar to you!

This set was initially everybody that had thought through all the similar arguments and felt similarly to you about them, and most of these people ended up not voting. But as soon as the TDTist popped up and presented their argument, the set of people that were subjunctively dependent upon you shrunk to just those in the initial set that had also heard this argument.

In a world in which only a single person ever had thought about subjunctive dependence, and this person was not going to vote before thinking about it, the evidential effect of not voting is basically zero. Given this, the argument would have no sway on them.

This seems like it would weaken the TDTist’s case that TDTists do better in real world problems. At the same time, it seems actually right. In a case where very few people follow the same reasoning processes as you, your decisions tell you very little about the decisions of others, for the same reason that a highly neuro-atypical person should be hesitant to generalize information about their brain to other people.

Another conclusion of this is that timeless decision theory is most powerful in a community where there is homogeneity of thought and information. Propagation of the idea of timeless decision theory would amplify the coordination-inducing power of the procedure.

I’m not sure if this implies that a TDTist is motivated to spread the idea and homogenize their society, as doing so increases subjunctive dependence and thus enhances their influence. I’d guess that they would only reason this way if they thought themselves to be above average in decision-making, or to have information that others don’t, so that the expected utility of them having increased decision-making ability would outweigh the costs of homogeneity.

Timeless ethics and Kant

The more I think about timeless decision theory, the more it seems obviously correct to me.

The key idea is that sometimes there is a certain type of non-causal logical dependency (called a subjunctive dependence) between agents that must be taken into account by those agents in order to make rational decisions. The class of cases in which subjunctive dependences become relevant involve agents in environments that contain other agents trying to predict their actions, and also environments that contain other agents that are similar to them.

Here’s my favorite motivating thought experiment for TDT: Imagine that you encounter a perfect clone of yourself. You have lived identical lives, and are structurally identical in every way. Now you are placed in a prisoner’s dilemma together. Should you cooperate or defect?

A non-TDTist might see no good reason to cooperate – after all, defecting dominates cooperation as a strategy, and your decision doesn’t affect your clone’s decision. If the two of you share no common cause explanation for your similarity, then this conclusion is even stronger – both evidential and causal decision theory would defect. So both you and your clone defect and you both walk away unhappy.

TDT is just the admission that there is an additional non-causal dependence between your decision and your clone’s decision that must be taken into account. This dependence comes from the fact that you and your clone have a shared input-output structure. That is, no matter what you end up doing, you know that your clone must do the same thing, because your clone is operating identically to you.

In a deterministic world, it is logically impossible that you choose to do X and your clone does Y. The initial conditions are the same, so the final conditions must be the same. So you end up cooperating, as does your clone, and everybody walks away happy.

With an imperfect clone, it is no longer logically impossible, but there still exists a subjunctive dependence between your actions and your clone’s.

This is a natural and necessary modification to decision theory. We take into account not only the causal effects of our actions, but the evidential effects of our actions. Even if your action does not causally affect a given outcome, it might still make it more or less likely, and subjunctive dependence is one of the ways that this can happen.

TDTists interacting with each other would get along really nicely. They wouldn’t fall victim to coordination problems, because they wouldn’t see their decisions as isolated and disconnected from the decisions of the others. They wouldn’t undercut each other in bargaining problems in which one side gets to make the deals and the other can only accept or reject.

In general, they would behave in a lot of ways that are standardly depicted as irrational (like one-boxing in Newcomb’s problem and cooperating in the prisoner’s dilemma), and end up much better off as a result. Such a society seems potentially much nicer and subject to fewer of the common failure modes of standard decision theory.

In particular, in a society in which it is common knowledge that everybody is a perfect TDTist, there can be strong subjunctive dependencies between the actions of causally disconnected agents. If a TDTist is considering whether or not to vote for their preferred candidate, they aren’t comparing outcomes that differ by a single vote. They are considering outcomes that differ by the size of the entire class of individuals that would be reasoning similarly to them.

In simple enough cases, this could mean that your decision about whether to vote is really a decision about if millions of people will vote or not. This may sound weird, but it follows from the exact same type of reasoning as in the clone prisoner’s dilemma.

Imagine that the society consisted entirely of 10 million exact clones of you, each deciding whether or not to vote. In such a world, each individual’s choice is perfectly subjunctively dependent upon every other individual’s choice. If one of them decides not to vote, then all of them decide not to vote.

In a more general case, perfect clones of you don’t exist in your environment. But in any given context, there is still a large class of individuals that reason similarly to you as a result of a similar input-output process.

For example, all humans are very similar in certain ways. If I notice that my blood is red, and I had previously never heard about or seen the blood color of anybody else, then I should now strongly update on the redness of the blood of other humans. This is obviously not because my blood being red causes others to have red blood. It is also not because of a common cause – in principle, any such cause could be screened off, and we would expect the same dependence to exist solely in virtue of the similarity of structure. We would expect red blood in an alien whose evolutionary history has been entirely causally separated from ours but who by a wild coincidence has the same DNA structure as humans.

Our similarities in structure can be less salient to us when we think about our minds and the way we make decisions, but they still are there. If you notice that you have a strong inclination to decide to take action X, then this actually does serve as evidence that a large class of other people will take action X. The size of this class and the strength of this evidence depends on which particular X is being analyzed.

Ethics and TDT

It is natural to wonder: what sort of ethical systems naturally arise out of TDT?

We can turn a decision theory into an ethical framework by choosing a utility function that encodes the values associated with that ethical framework. The utility function for hedonic utilitarianism assigns utility according to the total well-being in the universe. The utility function for egoism assigns utility only to your own happiness, apathetic to the well-being of others.

Virtue ethics and deontological ethics are harder to encode. We could do the first by assigning utility to virtuous character traits and disutility to vices. The second could potentially be achieved by assigning negative infinities to violations of the moral rules.

Let’s brush aside the fact that some of these assignments are less feasible than others. Pretend that your favorite ethical system has a nice neat way of being formalized as a utility function. Now, the distinctive feature of TDT-based ethics is that when we are trying to decide on the most ethical course of action, TDT says that we must imagine that our decision would also be taken by anybody else that is sufficiently similar to you in a sufficiently similar context.

In other words, in contemplating what the right action to take is, you imagine a world in which these actions are universalized! This sounds very Kantian. One of his more famous descriptions of the categorical imperative was:

Act only according to that maxim whereby you can at the same time will that it should become a universal law.

This could be a tagline for ethics in a world of TDTists! The maxim for your action resembles the notion of similarity in motivation and situational context that generates subjunctive dependence, and the categorical imperative is the demand that you must take into account this subjunctive dependence if you are to reason consistently.

But actually, I think that the resemblance between Kantian ethical reasoning and timeless decision theory begins to fade away when you look closer. I’ll list three main points of difference:

  1. Consistency vs expected utility
  2. Maxim vs subjunctive dependence
  3. Differences in application

1. Consistency vs expected utility

Kantian universalizability is not about expected utility, it is about consistency. The categorical imperative forbids acts that, when universalized, become self-undermining. If an act is consistently universalizable, then it is not a violation of the categorical imperative, even if it ends up with everybody in horrible misery.

Timeless decision theory looks at a world in which everybody acts according to the same maxim that you are acting under, and then asks whether this world looks nice or not. “Looks nice” refers to your utility function, not any notion of consistency or non-self-underminingness (not a word, I know).

So this is the first major difference: A timeless ethical theorist cares ultimately about optimizing their moral values, not about making sure that their values are consistently applicable in the limit of universal instantiation. This puts TDT-based ethics closer to a rule-based consequentialism than to Kantian ethics, although this comparison is also flawed.

Is this a bug or a feature of TDT?

I’m tempted to say it’s a feature. My favorite example of why Kantian consistency is not a desirable meta-ethical principle is that if everybody were to give to charity, then all the problems that could be solved by giving to charity would be solved, and the opportunity to give to charity would disappear. So the act of giving to charity becomes self-undermining upon universalization.

To which I think the right response is: “So what?”

If a world in which everybody gives to charity is a world in which there are no more problems left for charity to solve, so that giving to charity becomes impossible, then that sounds pretty great to me. If this consistency requirement prevents you from solving the problems you set out to solve, then it seems like a pretty useless requirement for ethical reasoning.

If your values can be encoded into an expected utility function, then the goal of your ethics should be to maximize that function. The antecedent of this conditional could be reasonably disputed, but I think the conditional as a whole is fairly unobjectionable.

2. Maxim versus subjunctive dependence

One of the most common retorts to Kant’s formulation of the categorical imperative rests on the ambiguity of the term ‘maxim’.

For Kant, your maxim is supposed to be the motivating principle behind your action. It can be thought of as a general rule that determines the contexts in which you would take this action.

If your action is to donate money, then your maxim might be to give 10% of your income to charity every year. If your action is to lie to your boss about why you missed work, then your maxim might be to be dishonest whenever doing otherwise will damage your career prospects.

Now, the maxim is the thing that is universalized, not the action itself. So you don’t suppose that everybody suddenly starts lying to their boss. Instead, you imagine that anybody in a situation where being honest would hurt their career prospects begins lying.

In this situation, Kant would argue that if nobody was honest in these situations, then their bosses would just assume dishonesty, in which case, the employees would never even get the chance to lie in the first place. This is self-undermining; hence, forbidden!

I actually like this line of reasoning a lot. Scott Alexander describes it as similar to the following rule:

Don’t do things that undermine the possibility to offer positive-sum bargains.

Coordination problems arise because individuals decide to defect from optimal equilibriums. If these defectors were reasoning from the Kantian principle of universalizability, they would realize that if everybody behaved similarly then the ability to defect might be undermined.

But the problem largely lies in how one specifies the maxim. For example, compare the following two maxims:

Maxim 1: Lie to your boss whenever being honest would hurt your career opportunities.

Maxim 2: Lie to your boss about why you missed work whenever the real reason is that you went on an all-night bar-hopping marathon with your friends Jackie and Khloe and then stayed up all night watching Breaking Bad highlight clips on your Apple TV.

If Maxim 2 is the true motivating principle of your action, then it seems a lot less obvious that the action is a violation of the categorical imperative. If only people in this precisely specified context lied to their bosses, then bosses would overall probably not become less trusting of their employees (unless your boss knows an unusual amount about your personal life). So the maxim is not self-undermining under universalization, and is therefore not forbidden.

Under Maxim 1, lying is forbidden, and under Maxim 2, it is not. But what is the true maxim? There’s no clear answer to this question. Any given action can be truthfully described as arising from numerous different motivational schema, and in general these choices will result in a variety of different moral guidelines.

In TDT, the analog to the concept of a maxim is subjunctive dependence, and this can be defined fairly precisely, without ambiguity. Subjunctive dependence between agents in a given context is just the degree of evidence you get about the actions of an agent given information about the actions of the other agents in that context.

More precisely, it is the degree of non-causal dependence between the actions of agents. It essentially arises from the fact that in a lawful physical universe, similar initial conditions will result in similar final conditions. This can be worded as similarity in initial conditions, in input-output structure, in computational structure, or in logical structure, but the basic idea is the same.

Not only is this precisely defined, it is a real dependence. You don’t have to imagine a fictional universe in which your action makes it more likely that others will act similarly; the claim of TDT is that this is actually the case!

In this sense, TDT is rooted in a simple acknowledgement of dependencies that really do exist and that can be precisely defined, while Kant’s categorical imperative relies on the ambiguous notion of a maxim, as well as a seemingly arbitrary hypothetical consideration. One might be tempted to ask: “Who cares what would happen if hypothetically everybody else acted according to a similar maxim? What we should care about is what will actually happen in the real world; we shouldn’t be basing our decisions off of absurd hypothetical worlds!”

3. Difference in application

These two theoretical reasons are fairly convincing to me that Kantianism and TDT ethics are only superficially similar, and are theoretically quite different. But there still remains a question of how much the actual “outputs” of the two frameworks converge. Do they just end up giving similar ethical advice?

I don’t think so. First, consider the issue I touched on previously. I said that defectors that paid attention to the categorical imperative would rethink their decision, because it is not universalizable. But this is not in general true.

If defectors are always free to defect, regardless of how many others defect as well, then defecting will still be universalizable! It is only in special cases that Kantians will not defect, like when a mob boss will come in and necessitate cooperation if enough people defect, or where universal defection depletes an expendable resource that would otherwise be renewable.

Coordination problems in which defecting automatically becomes impossible at a certain point are the easiest cases of coordination problems. It’s much harder to get individuals to coordinate when there is no mob boss to step in and set everybody right. These are the cases where Kantianism fails and TDT succeeds.

TDTists with shared goals for whom cooperation would be more effective for achieving these goals would always cooperate, even if “each would individually be better off” if they defected. (I put scare quotes because you only come to this conclusion by ignoring important dependencies in the problem).

The key difference here comes down again to #1: timeless decision theorists maximize expected utility, not Kantian consistency.

In addition, TDTists don’t necessarily have Kantian hangups about using people as means to an end: if doing so ends up producing a higher expected utility than not, then they’ll go for it without hesitation.

A TDTist that can save two people’s lives by causing a little harm to one person would probably do it if their utility function was relatively impartial and placed a positive value on life. A Kantian would forbid this act.

(Why? Well, Kant thought that this principle of treating people as ends in themselves rather than means to an end was equivalent to the universalizability principle, and as far as I know, pretty much nobody was convinced by his argument for why this was the case. As such, a lot of Kantian ethics looks like it doesn’t actually follow from the universalizability principle.)

An application that might be similar for Kantian ethics and TDT ethics is the treatment of dishonesty and deception. Kant famously forbade any lying of any kind, regardless of the consequences, on the basis that universal lying would undermine the trust that is necessary to make lying a possibility.

One can imagine a similar case made for honesty in TDT ethics. In a society of TDTs, a decision to lie is a decision to produce a society that is overall less honest and less trusting. In situations where the individual benefits of dishonesty are zero-sum, only the negative effects of dishonesty are amplified. This could plausibly make dishonesty on the whole a net negative policy.

Hyperreal decision theory

I’ve recently been writing a lot about how infinities screw up decision theory and ethics. In his paper Infinite Ethics, Nick Bostrom talks about attempts to apply nonstandard analysis to try to address the problems of infinities. I’ll just briefly describe nonstandard analysis in this post, as well as the types of solutions that he sketches out.

Hyperreals

So first of all, what is nonstandard analysis? It is a mathematical formalism that extends the ordinary number system to include infinitely large numbers and infinitely small numbers. It doesn’t do so just by adding the symbol ∞ to the set of real numbers ℝ. Instead, it adds an infinite amount of different infinitely large numbers, as well as an infinity of infinitesimal numbers, and proceeds to extend the ordinary operations of addition and multiplication to these numbers. The new number system is called the hyperreals.

So what actually are hyperreals? How does one do calculations with them?

A hyperreal number is an infinite sequence of real numbers. Here are some examples of them:

(3, 3, 3, 3, 3, …)
(1, 2, 3, 4, 5, …)
(1, ½, ⅓, ¼, ⅕, …)

It turns out that the first of these examples is just the hyperreal version of the ordinary number 3, the second is an infinitely larger hyperreal, and the third is an infinitesimal hyperreal. Weirded out yet? Don’t worry, we’ll explain how to make sense of this in a moment.

So every ordinary real number is associated with a hyperreal in the following way:

N = (N, N, N, N, N, …)

What if we just switch the first number in this sequence? For instance:

N’ = (1, N, N, N, N, …)

It turns out that this change doesn’t change the value of the hyperreal. In other words:

N = N’
(N, N, N, N, N, …) = (1, N, N, N, N, …)

In general, if you take any hyperreal number and change a finite amount of the numbers in its sequence, you end up with the same number you started with. So, for example,

3 = (3, 3, 3, 3, 3, …)
= (1, 5, 99, 3, 3, 3, …)
= (3, 3, 0, 0, 0, 0, 3, 3, …)

The general rule for when two hyperreals are equal relies on the concept of a free ultrafilter, which is a little above the level that I want this post to be at. Intuitively, however, the idea is that for two hyperreals to be equal, the set of places at which their sequences differ must be either finite or one of certain special kinds of infinite sets (I’ll leave “special kinds” vague for exposition purposes).

Adding and multiplying hyperreals is super simple:

(a1, a2, a3, …) + (b1, b2, b3, …) = (a1 + b1, a2 + b2, a3 + b3, …)
(a1, a2, a3, …) · (b1, b2, b3, …) = (a1 · b1, a2 · b2, a3 · b3, …)

Here’s something that should be puzzling:

A = (0, 1, 0, 1, 0, 1, …)
B = (1, 0, 1, 0, 1, 0, …)
A · B = ?

Apparently, the answer is that A · B = 0. This means that at least one of A or B must also be 0. But both of them differ from 0 in an infinity of places! Subtleties like this are why we need to introduce the idea of a free ultrafilter, to allow certain types of equivalencies between infinitely differing sequences.

Anyway, let’s go on to the last property of hyperreals I’ll discuss:

(a1, a2, a3, …) < (b1, b2, b3, …)
if
aₙ ≥ bₙ for only finitely many values of n

(This again has the same weird infinite exceptions as before, which we’ll ignore for now.)

Now at last we can see why (1, 2, 3, 4, …) is an infinite number:

Choose any real number N
N = (N, N, N, N, …)
ω = (1, 2, 3, 4, …)
So ωₙ ≤ Nₙ for n = 1, 2, 3, …, floor(N)
and ωₙ > Nₙ for all other n

This means that ω is larger than N, because there are only finitely many places in the sequence at which Nₙ ≥ ωₙ. And since this is true for any real number N, ω must be larger than every real number! In other words, you can now give an answer to somebody who asks you to name a number that is bigger than every real number!

ε = (1, ½, ⅓, ¼, ⅕, …) is an infinitesimal hyperreal for a similar reason:

Choose any real number N > 0
N = (N, N, N, N, …)
ε = (1, ½, ⅓, ¼, ⅕, …)
So εₙ ≥ Nₙ for n = 1, 2, …, floor(1/N)
and εₙ < Nₙ for all other n

Once again, εₙ is greater than or equal to Nₙ at only finitely many places, and is smaller at the infinitely many others. So ε is smaller than every real number greater than 0.

In addition, the sequence (0, 0, 0, …) is smaller than ε at every place in the sequence, so ε is larger than 0. A number that is smaller than every positive real and greater than 0 is an infinitesimal.
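
Here is a toy sketch (mine) of the arithmetic and comparisons just described, representing each hyperreal by a long initial segment of its defining sequence. A genuine construction needs the free ultrafilter mentioned earlier to settle the tricky infinite cases (and no such ultrafilter is computable), so the comparison helper below only handles the finitely-many-exceptions situations.

```python
K = 1000   # how many terms of each sequence we keep

three = [3.0] * K                               # the real number 3 = (3, 3, 3, ...)
omega = [float(n) for n in range(1, K + 1)]     # ω = (1, 2, 3, ...)
eps   = [1.0 / n for n in range(1, K + 1)]      # ε = (1, ½, ⅓, ...)

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def mul(a, b):
    return [x * y for x, y in zip(a, b)]

def apparently_less(a, b):
    """a < b when a_n >= b_n at only finitely many positions.

    With truncated sequences we can only check that the exceptions die out
    inside our window; the free ultrafilter is what settles the genuinely
    infinite borderline cases."""
    exceptions = [n for n, (x, y) in enumerate(zip(a, b)) if x >= y]
    return not exceptions or max(exceptions) < K - 1

print(apparently_less([100.0] * K, omega))   # True: ω exceeds 100 from position 101 onward
print(apparently_less(eps, [0.01] * K))      # True: ε drops below 0.01 from position 101 onward
print(add(three, eps)[:3])                   # [4.0, 3.5, 3.333...]: the hyperreal 3 + ε
print(mul(eps, omega)[:5])                   # [1.0, 1.0, 1.0, 1.0, 1.0]: ε · ω = 1
```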

Okay, done introducing hyperreals! Let’s now see how this extended number system can help us with our decision theory problems.

Saint Petersburg Paradox

One standard example of weird infinities in decision theory is the St Petersburg Paradox, which I haven’t talked about yet on this blog. I’ll use this thought experiment as a template for the discussion. Briefly, then, imagine a game that works as follows:

Game 1

Round 1: Flip a coin.
If it lands H, then you get $2 and the game ends.
If it lands T, then you move on to Round 2.

Round 2: Flip the coin again.
If it lands H, then you get $4 and the game ends.
If T, then move on to Round 3.

Round 3: Flip a coin.
If it lands H, then you get $8 and the game ends.
If it lands T, then you move on to Round 4.

(et cetera to infinity)

This game looks pretty nice! You are guaranteed at least $2, and your payout doubles every time the coin lands H. The question is, how nice really is the game? What’s the maximum amount that you should be willing to pay in to play?

Here we run into a problem. To calculate this, we want to know what the expected value of the game is – how much you make on average. We do this by adding up the product of each outcome and the probability of that outcome:

EV = ½ · $2 + ¼ · $4 + ⅛ · $8 + …
= $1 + $1 + $1 + …
= ∞

Apparently, the expected payout of this game is infinite! This means that if you are maximizing expected money, you should be willing to give literally all of your money to play the game just once! This should seem wrong… If you pay $1,000,000 to play, then the only way you make a profit is if the coin lands tails nineteen times in a row before the first heads. Does it really make sense to risk all of this money on such a tiny chance?
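
Two quick computations (my own sketch; the $1,000,000 buy-in is just an arbitrary example) illustrate both halves of this: the chance of profiting on a large buy-in is minuscule, and yet the expected payout keeps growing by $1 for every extra round you allow.

```python
import math

def prob_of_profit(buy_in):
    """Chance that a single play pays out more than buy_in."""
    first_good_round = math.floor(math.log2(buy_in)) + 1   # smallest n with 2**n > buy_in
    return 0.5 ** (first_good_round - 1)                   # needs tails on every earlier round

def truncated_ev(max_rounds):
    """Expected payout if the game were cut off (with no payout) after max_rounds rounds."""
    return sum(0.5 ** n * 2 ** n for n in range(1, max_rounds + 1))

print(prob_of_profit(1_000_000))                   # ≈ 1.9e-06, about 1 in 500,000
print([truncated_ev(k) for k in (10, 100, 1000)])  # [10.0, 100.0, 1000.0]
```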

The response to this is that while the chance that this happens is of course tiny, the payout if it does happen is enormous – you stand to double, quadruple, octuple, (et cetera) your money. In this case, the paradox seems to really be a result of the failure of our brains to intuitively comprehend exponential growth.

There’s an even stronger reason to be unhappy with the St Petersburg Paradox. Say that instead of starting with a payout of $2 and doubling each time from there, you had started with a payout of $2000 and doubled from there.

Game 2

Round 1: Flip a coin.
If it lands H, then you get $2000 and the game ends.
If it lands T, then you move on to Round 2.

Round 2: Flip the coin again.
If it lands H, then you get $4000 and the game ends.
If T, then move on to Round 3.

Round 3: Flip a coin.
If it lands H, then you get $8000 and the game ends.
If it lands T, then you move on to Round 4.

(et cetera to infinity)

This alternative game must be better than the initial game – after all, no matter how many times the coin lands T before finally landing H, your payout is 1000 times what it would have been previously. So if you’re playing the first of the two games, then you should always wish that you were playing the second, no matter how many times the coin ends up landing T.

But the expected value comparison doesn’t grant you this! Both games have an infinite expected value, and infinity is infinity. We can’t have one infinity being larger than another infinity, right?

Enter the hyperreals! We’ll turn the expected value of the first game into a hyperreal as follows:

EV1 = ½ · $2 = $1
EV2 = ½ · $2 + ¼ · $4 = $1 + $1 = $2
EV3 = ½ · $2 + ¼ · $4 + ⅛ · $8 = $1 + $1 + $1 = $3

EV = (EV1, EV2, EV3, …)
= $(1, 2, 3, …)

Now we can compare it to the second game:

Game 1: $(1, 2, 3, …) = ω
Game 2: $(1000, 2000, 3000, …) = $1000 · ω

So hyperreals allow us to compare infinities, and justify why Game 2 has a 1000 times larger expected value than Game 1!

Let me give another nice result of this type of analysis. Imagine Game 1′, which is identical to Game 1 except for the first payout, which is $4 instead of $2. We can calculate the payouts:

Game 1: $(1, 2, 3, …) = ω
Game 1′: $(2, 3, 4, …) = $1 + ω

The result is that Game 1′ gives us an expected increase of just $1. And this makes perfect sense! After all, the only difference between the games is if they end in the first round, which happens with probability ½. And in this case, you get $4 instead of $2. The expected difference between the games should therefore be ½ · $2 = $1! Yay hyperreals!
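
As a sanity check on these identities, we can build the partial-sum sequences directly and compare them term by term (a quick sketch of mine, using the same hyperreal-as-sequence-of-truncated-expected-values picture as above):

```python
K = 50   # how many entries of each expected-value sequence to build

def partial_evs(payout):
    """EV_k = sum over the first k rounds of P(first heads on round n) * payout(n)."""
    evs, total = [], 0.0
    for n in range(1, K + 1):
        total += 0.5 ** n * payout(n)
        evs.append(total)
    return evs

game1       = partial_evs(lambda n: 2 ** n)                   # (1, 2, 3, ...)    = ω
game1_prime = partial_evs(lambda n: 4 if n == 1 else 2 ** n)  # (2, 3, 4, ...)    = 1 + ω
game2       = partial_evs(lambda n: 1000 * 2 ** n)            # (1000, 2000, ...) = 1000 · ω

print(all(g2 == 1000 * g1 for g1, g2 in zip(game1, game2)))        # True: Game 2 = 1000 · Game 1
print(all(g1p == g1 + 1 for g1, g1p in zip(game1, game1_prime)))   # True: Game 1′ = Game 1 + 1
```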

Of course, this analysis still ends up concluding that the St Petersburg game does have an infinite expected payout. Personally, I’m (sorta) okay with biting this bullet and accepting that if your goal is to maximize money, then you should in principle give any arbitrary amount to play the game.

But what I really want to talk about are variants of the St Petersburg paradox where things get even crazier.

Getting freaky

For instance, suppose that instead of the initial game setup, we have the following setup:

Game 3

Round 1: Flip a coin.
If it lands H, then you get $2 and the game ends.
If it lands T, then you move on to Round 2.

Round 2: Flip the coin again.
If it lands H, then you pay $4 and the game ends.
If T, then move on to Round 3.

Round 3: Flip the coin again.
If it lands H, then you get $8 and the game ends.
If it lands T, then you move on to Round 4.

Round 4: Flip the coin again.
If it lands H, then you pay $16 and the game ends.
If it lands T, then you move on to Round 5.

(et cetera to infinity)

The only difference now is that if the coin lands H on any even round, then instead of getting money that round, you have to pay that amount to the dealer! Clearly this is a less fun game than the last one. How much less fun?

Here things get really weird. If we only look at the odd rounds, then the expected value is ∞.

EV = ½ · $2 + ⅛ · $8 + …
= $1 + $1 + …
= ∞

But if we look at the even rounds, then we get an expected value of -∞!

EV = ¼ · -$4 + 1/16 · -$16 + …
= -$1 + -$1 + …
= -∞

We find the total expected value by adding together these two. But can we add ∞ to -∞? Not with ordinary numbers! Let’s convert our numbers to hyperreals instead, and see what happens.

EV = (EV1, EV2, EV3, EV4, …)
= $(1, 0, 1, 0, …)

(The truncated sums alternate: $1 after each odd round, $0 after each even round.)

This time, our result is a bit less intuitive than before. As a result of the ultrafilter business we’ve been avoiding talking about, the sequence (1, 0, 1, 0, …) equals either 1 or 0 as a hyperreal, depending on whether the set of odd positions or the set of even positions belongs to our ultrafilter. Suppose it’s the odd positions, so that:

(1, 0, 1, 0, …) = 1
(-1, 0, -1, 0, …) = -1

This means that the expected value of Game 3 is $1. In addition, if Game 3 had started with you having to pay $2 for the first round rather than getting $2, then the truncated sums would be $(-1, 0, -1, 0, …) and the expected value would be -$1.
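Here's a quick sketch of those truncated sums in code (just my own check); the sequence really does bounce between $1 and $0 forever, and it's the ultrafilter that decides which of those two values the corresponding hyperreal counts as:

```python
def game3_partial_evs(n_rounds):
    """Truncated expected values for Game 3: round k pays +2**k if k is odd, -2**k if k is even."""
    evs, total = [], 0.0
    for k in range(1, n_rounds + 1):
        payout = (2 ** k) * (1 if k % 2 == 1 else -1)
        total += payout * 2 ** -k          # each round contributes +1 or -1
        evs.append(total)
    return evs

print(game3_partial_evs(8))   # [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
```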

So hyperreal decision theory recommends that you play the game, but only buy in if it costs you less than $1.

Now, the last thought experiment I’ll present is the weirdest of them.

Game 4

Round 1: Flip a coin.
If it lands H, then you pay $2 and the game ends.
If it lands T, then you move on to Round 2.

Round 2: Flip the coin again.
If it lands H, then you get $2 and the game ends.
If T, then move on to Round 3.

Round 3: Flip the coin again.
If it lands H, then you pay $2.67 and the game ends.
If it lands T, then you move on to Round 4.

Round 4: Flip the coin again.
If it lands H, then you get $4 and the game ends.
If it lands T, then you move on to Round 5.

Round 5: Flip the coin again.
If it lands H, then you pay $6.40 and the game ends.
If it lands T, then you move on to Round 6.

(et cetera to infinity)

The pattern is that the payoff on the nth round is (-2)^n / n, where a negative payoff means you pay. Since heads first lands on round n with probability 1/2^n, that round contributes (-1)^n / n to the expected value. Summed in the natural order, these contributions converge:

EV = -1 + ½ – ⅓ + ¼ – … = -ln(2) ≈ -$0.69

But by the Riemann rearrangement theorem, it turns out that by rearranging the terms of this sum, we can make it add up to any value that we want! (This follows from the fact that the series converges only conditionally: the sum of the absolute values of the terms is infinite.)

This means that not only is the expected value of this game undefined, it can be “justified” as taking any value whatsoever. We don’t know whether it’s a positive game or a negative game, and we can’t even say whether it’s a finite game or an infinite game!
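To see the rearrangement phenomenon numerically, here's a hedged sketch: it greedily reorders the per-round expected values (-1)^n / n so that the running total homes in on whatever target you pick. The greedy strategy is my own choice of illustration (it mirrors the standard proof idea), not anything from the original discussion:

```python
import math

def rearranged_sum(target, n_terms=100_000):
    """Greedily reorder the terms (-1)**n / n so the running total approaches `target`."""
    pos = (1 / n for n in range(2, 10**9, 2))    # +1/2, +1/4, +1/6, ... (even rounds)
    neg = (-1 / n for n in range(1, 10**9, 2))   # -1, -1/3, -1/5, ...  (odd rounds)
    total = 0.0
    for _ in range(n_terms):
        total += next(pos) if total < target else next(neg)
    return total

print(rearranged_sum(-math.log(2)))   # ~ -0.693, the value the natural ordering gives
print(rearranged_sum(0.0))            # ~ 0
print(rearranged_sum(3.0))            # ~ 3: just front-load more positive terms
```

Same terms every time; only the order changes, and with it the answer.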

Let’s apply hyperreal numbers.

EV1 = -$1
EV2 = $(-1 + ½) = -$0.50
EV3 = $(-1 + ½ – ⅓) = -$0.83
EV4 = $(-1 + ½ – ⅓ + ¼) = -$0.58

So EV = $(-1.00, -0.50, -0.83, -0.58, …)
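As a sanity check on those numbers (and on the -ln(2) claim), here's a small sketch extending the sequence of truncated sums a bit further:

```python
import math

def game4_partial_evs(n_rounds):
    """Truncated expected values for Game 4: round n contributes ((-2)**n / n) * 2**-n = (-1)**n / n."""
    evs, total = [], 0.0
    for n in range(1, n_rounds + 1):
        total += (-1) ** n / n
        evs.append(round(total, 4))
    return evs

print(game4_partial_evs(8))   # [-1.0, -0.5, -0.8333, -0.5833, -0.7833, -0.6167, -0.7595, -0.6345]
print(-math.log(2))           # -0.6931..., the value the sums close in on from both sides
```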

Since these truncated sums converge to -ln(2) ≈ -$0.69, alternating above and below it, the expected value is -ln(2) + ε, where ε is a particular infinitesimal number (which one depends on the ultrafilter). So we get a precisely defined expectation value! One could imagine just empirically testing this value by running large numbers of simulations.

A weirdness about all of this is that the order in which you count up your expected value is extremely important. This is a general property of infinite sums that don’t converge absolutely, and fixing some ordering seems like a requirement for consistent reasoning about infinities.

We’ve seen that hyperreal numbers can be helpful in providing a way to compare different infinities. But hyperreal numbers are only the first step into the weird realm of the infinite. The surreal number system is a generalization of the hyperreals that is much more powerful. In a future post, I’ll talk about the highly surreal decision theory that results from application of these numbers.

Infinite ethics

There are a whole bunch of ways in which infinities make decision theory go screwy. I’ve written about some of those ways here. This post is about a thought experiment in which infinities make ethics go screwy.

WaitButWhy has a great description of the thought experiment, and I recommend you check out the post. I’ll briefly describe it here anyway:

Imagine two worlds, World A and World B. Each is an infinite two-dimensional checkerboard, and on each square sits a conscious being that can either be very happy or very sad. At the birth of time, World A is entirely populated by happy beings, and World B entirely by sad beings.

From that moment forwards, World A gradually becomes infected with sadness in a growing bubble, while World B gradually becomes infected with happiness in a growing bubble. Both universes exist forever, so the bubble continues to grow forever.

Picture from WaitButWhy

The decision theory question is: if you could choose to be placed in one of these two worlds in a random square, which should you choose?

The ethical question is: which of the universes is morally preferable? Said another way: if you had to bring one of the two worlds into existence, which would you choose?

On spatial dominance

At every moment of time, World A contains an infinity of happiness and a finite amount of sadness. On the other hand, World B always contains an infinity of sadness and a finite amount of happiness.

This suggests the answer to both the ethical question and the decision theory question: World A is better. Ethically, it seems obvious that infinite happiness minus finite sadness is infinitely better than infinite sadness minus finite happiness. And rationally, given that there are always infinitely more people outside the bubble than inside, at any given moment in time you can be sure that you are on the outside.

A plot of the bubble radius over time in each world would look like this:

[Plot: bubble radius over time in Worlds A and B]

In this image, we can see that no matter what moment of time you’re looking at, World A dominates World B as a choice.

On temporal dominance

But there’s another argument.

Let’s look at a person at any given square on the checkerboard. In World A, they start out happy and stay that way for some finite amount of time. But eventually, they are caught by the expanding sadness bubble, and then stay sad forever. In World B, they start out sad for a finite amount of time, but eventually are caught by the expanding happiness bubble and are happy forever.

Plotted, this looks like:

[Plot: happiness over time at a fixed square in Worlds A and B]

So which do you prefer? Well, clearly it’s better to be sad for a finite amount of time and happy for an infinite amount of time than vice versa. And ethically, choosing World A amounts to dooming every individual to a lifetime of finite happiness and then infinite sadness, while World B is the reverse.

So no matter which position on the checkerboard you’re looking at, World B dominates World A as a choice!

An impossible synthesis

Let’s summarize: if you look at the spatial distribution for any given moment of time, you see that World A is infinitely preferable to World B. And if you look at the temporal distribution for any given position in space, you find that B is infinitely preferable to A.

Interestingly, I find that the spatial argument seems more compelling when considering the ethical question, while the temporal argument seems more compelling when considering the decision theory question. But both of these arguments apply equally well to both questions. For instance, if you are wondering which world you should choose to be in, then you can think forward to any arbitrary moment of time, and consider your chances of being happy vs being sad in that moment. This will get you the conclusion that you should go with World A, as for any moment in the future, you have a 100% chance of being one of the happy people as opposed to the sad people.

I wonder if the difference is that when we are thinking about decision theory, we are imagining ourselves in the world at a fixed location with time flowing past us, and it is less intuitive to think of ourselves at a fixed time and ask where we likely are.

Regardless, what do we do in the face of these competing arguments? One reasonable thing is to try to combine the two approaches. Instead of just looking at a fixed position for all time, or a fixed time over all space, we look at all space and all time, summing up total happiness moments and subtracting total sadness moments.

But now we have a problem… how do we evaluate this? What we have in both worlds is essentially a +∞ and a -∞ added together, and no clear procedure for how to make sense of this addition.

In fact, it’s worse than this. By cleverly choosing a way of adding up the total amount of the quantity happiness – sadness, we can make the result turn out however we want! For instance, we can reach the conclusion that World A results in a net +33 happiness – sadness by first counting up 33 happy moments, and then ever afterwards alternating between counting a happy moment and a sad moment. This summation will eventually count every happy and sad moment, and it concludes that the total is +33.

But of course, there’s nothing special about +33; we could have chosen to reach any conclusion we wanted by just changing our procedure accordingly. This is unusual. It seems that both the total expected value and moral value are undefined for this problem.
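Here's a toy sketch of that bookkeeping trick, treating each happy moment as +1 and each sad moment as -1 (an abstraction I'm introducing purely for illustration). Count 33 happy moments up front, then strictly alternate, and the running total never leaves the neighborhood of +33:

```python
from itertools import islice

def biased_enumeration():
    """Yield +1 per happy moment and -1 per sad moment: 33 happy moments up front, then alternate."""
    for _ in range(33):
        yield +1
    while True:
        yield +1
        yield -1

trace, total = [], 0
for moment in islice(biased_enumeration(), 2033):
    total += moment
    trace.append(total)
print(trace[:5], "...", trace[-5:])   # [1, 2, 3, 4, 5] ... [33, 34, 33, 34, 33]
```

Swap the 33 for any other number (or front-load sad moments instead) and the same enumeration "proves" a different total.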

The undefinability of the total happiness – sadness of this universe is a special case of the general rule that you can’t subtract infinity from infinity. This seems fairly harmless… maybe it keeps us from giving a satisfactory answer to this one thought experiment, but surely nothing like this could matter to real-world ethical or decision theory dilemmas?

Wrong! If in fact we live in an infinite universe, then we are faced with exactly this problem. If there are an infinite number of conscious experiencers out there, some suffering and some happy, then the total quantity of happiness – sadness in the universe is undefined! What’s more, a moral system that says we ought to increase the total happiness of the universe will return an error if asked to evaluate what we ought to do in an infinite universe!

If you think that you should do your part to make the universe a happier place, then you must have some notion of a total amount of happiness that can be increased. And if the total amount of happiness is infinite (or simply undefined), then there is no sensible way to increase it. This seems like a serious problem for most brands of consequentialism, albeit a very unusual one.