- You should make decisions by evaluating the expected utilities of your various options and choosing the largest one.

This is a pretty standard and uncontroversial idea. There is room for controversy over the details of how to evaluate expected utilities, but this basic premise is hard to argue against. So let’s argue against it!

Suppose that a stranger walks up to you in the street and says to you “I have been wired in from outside the simulation to give you the following message: If you don’t hand over five dollars to me right now, your simulator will teleport you to a dungeon and torture you for all eternity.” What should you do?

The obviously correct answer is that you should chuckle, continue on with your day, and laugh about the incident later on with your friends.

The answer you get from a simple application of decision theory is that as long as you aren’t *absolutely,* *100%* sure that they are wrong, you should give them the five dollars. And you should *definitely* not be 100% sure. Why?

Suppose that the stranger says next: “I know that you’re probably skeptical about the whole simulation business, so here’s some evidence. Say any word that you please, and I will instantly reshape the clouds in the sky into that word.” You do so, and sure enough the clouds reshape themselves. Would this push your credences around a little? If so, then you didn’t start at 100%. Truly certain beliefs are those that can’t be budged by any evidence whatsoever. You can never update downwards on truly certain beliefs, by the *definition* of ‘truly certain’.
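The claim about certainty can be checked against Bayes' rule. Here is a minimal sketch; the prior and the two likelihoods are made-up illustrative numbers, and `posterior` is just a helper name chosen for this example.

```python
from fractions import Fraction

def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule: credence in a hypothesis after seeing the evidence."""
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1 - prior) * p_evidence_if_false
    return numerator / denominator

# Hypothetical likelihoods: the clouds reshaping on command is nearly certain
# if the stranger is credible, and wildly unlikely otherwise.
if_true = Fraction(99, 100)
if_false = Fraction(1, 10**12)

tiny_prior = Fraction(1, 10**20)
print(float(posterior(tiny_prior, if_true, if_false)))  # moves upward
print(posterior(Fraction(0), if_true, if_false))        # stays at zero
```

Under these assumptions, a prior of exactly zero is the only one that the cloud trick leaves untouched; any other prior, however small, gets pushed upward.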

To go more extreme, just suppose that they demonstrate to you that they’re telling you the truth by teleporting you to a dungeon for five minutes of torture, and then bringing you back to your starting spot. If you would even slightly update your beliefs about their credibility in this scenario, then you had a non-zero credence in their credibility from the start.

And after all, this makes sense. You should only have complete confidence in the falsity of logical contradictions, and it’s not literally logically impossible that we are in a simulation, or that the simulator decides to mess with our heads in this bizarre way.

Okay, so you have a nonzero credence in their ability to do what they say they can do. And any nonzero credence, *no matter how tiny*, will result in the rational choice being to hand over the $5. After all, if expected utility is just calculated by summing up utilities weighted by probabilities, then you have something like the following:

EU(keep $5) – EU(give $5) = ε · U(infinite torture) + U(keep $5)

where ε = P(infinite torture | keep $5) – P(infinite torture | give $5)

As long as losing $5 isn’t infinitely bad to you, you should hand over the money. This seems like a problem, either for our intuitions or for decision theory.
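To see how a naive probability-weighted sum behaves here, consider the following rough sketch. The credence and the utility values are illustrative assumptions, and `expected_utility` is a hypothetical helper, not anything standard.

```python
import math

def expected_utility(outcomes):
    """Sum utilities weighted by probabilities over (probability, utility) pairs.

    Zero-probability outcomes are skipped, because 0 * inf is NaN in
    floating point rather than 0.
    """
    return sum(p * u for p, u in outcomes if p > 0)

# All numbers below are illustrative assumptions.
U_TORTURE = -math.inf  # infinite torture
U_KEEP = 1.0           # utility of still having the $5
U_GIVE = 0.0           # utility of having handed it over (baseline)
EPSILON = 1e-30        # tiny credence that keeping the money leads to torture

eu_keep = expected_utility([(EPSILON, U_TORTURE), (1 - EPSILON, U_KEEP)])
eu_give = expected_utility([(0.0, U_TORTURE), (1.0, U_GIVE)])

print(eu_keep, eu_give, eu_keep < eu_give)
```

With these numbers the expected utility of keeping the money is negative infinity, so handing it over wins no matter how small `EPSILON` is made, as long as it stays above zero.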

*******

So here are four propositions, and you must reject at least one of them:

1. There is a nonzero chance of the stranger’s threat being credible.
2. Infinite torture is infinitely worse than losing $5.
3. The rational thing to do is that which maximizes expected utility.
4. It is irrational to give the stranger $5.

I’ve already argued for (1), and (2) seems virtually definitional. So our choice is between (3) and (4). In other words, we either abandon the principle of maximizing expected utility as a guide to instrumental rationality, or we reject our intuitive confidence in the correctness of (4).

Maybe at this point you feel more willing to reject (4). After all, intuitions are just intuitions, and humans are known to be bad at reasoning about very small probabilities and very large numbers. Maybe it *actually makes sense* to hand over the $5.

But consider where this line of reasoning leads.

The exact same argument should lead you to give in to *any* demand that the stranger makes of you, as long as it doesn’t have a literal negative infinity utility value. So if the stranger tells you to hand over your car keys, to go dance around naked in a public square, or to commit heinous crimes… all of these behaviors would be apparently rationally mandated.

Maybe, *maybe*, you might be willing to bite the bullet and say that yes, these behaviors are all perfectly rational, because of the tiny chance that this stranger is telling the truth. I’d still be willing to bet that you wouldn’t *actually* behave in this self-professedly “rational” manner if I now made this threat to you.

Also, notice that this dilemma is almost identical to Pascal’s wager. If you buy the argument here, then you should also be doing all that you can to ensure that you stay out of Hell. If you’re queasy about the infinities and think decision theory shouldn’t be messing around with such things, then we can easily modify the problem.

Instead of “your simulator will teleport you to a dungeon and torture you for all eternity”, make it “your simulator will teleport you to a dungeon and torture you for 3↑↑↑↑3 years.” The negative utility of this is large enough to outweigh any reasonable credence you could place in the credibility of the threat. And if it isn’t, we can just make the number of years *even larger*.
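For a feel for the notation, here is a small sketch of Knuth's up-arrow operation. It is only usable on tiny inputs, since 3↑↑↑↑3 itself is far beyond anything computable. The threshold at the end assumes, purely for illustration, one unit of disutility per year of torture and one unit of utility for keeping the $5.

```python
from fractions import Fraction

def up_arrow(a, n, b):
    """Knuth's up-arrow a ↑...↑ b with n arrows, for small inputs only."""
    if n == 1:
        return a ** b
    if b == 0:
        return 1
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))

print(up_arrow(3, 1, 3))  # 3↑3 = 27
print(up_arrow(3, 2, 3))  # 3↑↑3 = 3**27 = 7625597484987

# Illustrative assumption: U(torture) = -years and U(keep $5) = 1, so paying
# wins once the credence gap ε exceeds 1 / years.
years = up_arrow(3, 2, 3)
threshold = Fraction(1, years)
print(threshold)
```

Even the modest 3↑↑3 pushes the break-even credence below one in a trillion under these assumptions, and each extra arrow makes the number of years, and so the smallness of the threshold, grow unimaginably faster.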

Maybe the probability of a given payout scales inversely with the size of the payout? But this seems fairly arbitrary. Is it really the case that the ability to torture you for 3↑↑↑↑3 years is twice as likely as the ability to torture you for 2 ∙ 3↑↑↑↑3 years? I can’t imagine why. It seems like these probabilities are going to be roughly equal – essentially, once you buy into the prospect of a simulator that is able to torture you for 3↑↑↑↑3 years, you’ve *already* basically bought into the prospect that they are able to torture you for twice that amount of time.
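The difference between the two pictures can be made concrete with a toy comparison. Both credence functions below, and their constants, are hypothetical stand-ins chosen for illustration, with one unit of disutility assumed per year of torture.

```python
from fractions import Fraction

def expected_loss(years, credence):
    """Probability-weighted disutility, assuming 1 unit of disutility per year."""
    return credence(years) * years

def inverse_prior(years, c=Fraction(1, 1000)):
    """Credence that shrinks in proportion to the size of the threat."""
    return c / years

def flat_prior(years, p=Fraction(1, 10**30)):
    """Credence that does not depend on the size of the threat."""
    return p

for years in (10**10, 10**50, 10**100):
    print(years, expected_loss(years, inverse_prior), expected_loss(years, flat_prior))
```

Under the inversely scaling credence the expected loss stays fixed at `c` however large the threat gets, so the mugger gains nothing by inflating the number. Under the roughly flat credence the expected loss grows without bound, which is the situation the argument above relies on.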

All we’re left with is to throw our hands up and say “I can’t explain *why* this argument is wrong, and I don’t know *how* decision theory has gone wrong here, but I just *know* that it’s wrong. There is no way that the actually rational thing to do is to allow myself to get mugged by anybody that has heard of Pascal’s wager.”

In other words, it seems like the correct response to Pascal’s mugging is to reject (3) and deny the expected-utility-maximizing approach to decision theory. The natural next question is: If expected utility maximization has failed us, then what should replace it? And how would it deal with Pascal’s mugging scenarios? I would love to see suggestions in the comments, but I suspect that this is a question that we are simply not advanced enough to satisfactorily answer yet.

No, we don’t have to reject utility maximization, and I would argue there’s no paradox here. Were you going to give the stranger $5 before he started speaking? Of course not. Did you learn any new (substantial) information during the interaction? No. So why would your optimal strategy (i.e. not giving him any money) change? People only feel like this is a paradox because they are inclined to believe the stranger without proof, but that is *not* a correct thing to do from a utilitarian perspective. Maybe the stranger is telling the truth, but it’s just as likely that the exact opposite is true: if you hand over the money THEN you get eternal torture. Since the probabilities of “he is telling the truth” and “the exact opposite of what he says is true” are equal, any utility calculation cancels out. Unless of course he gives you evidence, e.g. he actually rearranges the clouds as you suggested. But then the paradox also disappears, because giving him the $5 doesn’t seem ridiculous anymore. Therefore the only interesting question is why we have an inclination to believe the stranger without proof, but that is a question of evolution, not math (I assume you’re familiar with The Selfish Gene – evolution doesn’t really care about the utility of the individual).

I haven’t assumed anywhere that you must “believe the stranger without proof”, and I think that talk of whether you believe the stranger is too binary to be useful. All we need for the paradox is that you have some non-zero credence, no matter how tiny, that the stranger’s threat is credible, i.e. that P(infinite torture | keep $5) > P(infinite torture | give $5).

“All we need for the paradox is that you have some non-zero credence” – this is exactly my point. Credence is given by proof, not claims; if the stranger doesn’t give proof, then the credence is zero. Not very tiny, but exactly zero. Therefore “P(infinite torture | keep $5) > P(infinite torture | give $5)” does not hold. To change this, not only does the stranger need to prove that I live in a simulation, and that some power has the means and intention to torture me, but also that said torture is linked to me giving him $5. If he fails to do so, there is no paradox because my expected utilities are unchanged. If he does show me proof, there is also no paradox because giving him $5 is clearly a reasonable thing to do.

“To change this, not only does the stranger need to prove that I live in a simulation, and that some power has the means and intention to torture me, but also that said torture is linked to me giving him $5.” There’s a lot of confusion indicated in this comment. I suggest that you familiarize yourself with the notion of Bayesian reasoning; it’s a helpful framework for thinking about these things. https://arbital.com/p/bayes_rule/ is a good starting point.

Why must “P(infinite torture | keep $5) > P(infinite torture | give $5)” be true? Although the chance that the stranger’s threat is credible is nonzero, what if the stranger instead said that you would get tortured if you gave the money and you wouldn’t if you kept it? Then the inequality would be reversed, because it would be the exact same situation, but with the actions of giving and keeping the $5 switched. But logically it shouldn’t matter what the stranger says; without evidence, it shouldn’t change your decisions. It just prompts you to think more deeply about the topic. There’s a nonzero chance that if you don’t punch the stranger in the face, then you’ll get tortured for eternity.

It’s like an objection some have to Pascal’s Wager. Maybe if you become a pious Christian, then you won’t go to Hell if God exists, but how do you know that it isn’t some other god that exists, one who gets angrier and angrier at you each time you attend church?
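The disagreement in this thread comes down to the sign of ε = P(torture | keep $5) – P(torture | give $5). A rough sketch, using a finite torture length so that the arithmetic stays well defined; every number here is a made-up assumption, and `eu_gap` is a hypothetical helper.

```python
from fractions import Fraction

def eu_gap(p_threat, p_mirror, years, u_keep=1):
    """EU(keep) - EU(give), assuming 1 unit of disutility per year of torture.

    p_threat: credence that keeping the $5 triggers the torture.
    p_mirror: credence in the mirror-image claim that giving the $5 triggers it.
    """
    epsilon = p_threat - p_mirror
    return u_keep - epsilon * years

years = 10**100
p = Fraction(1, 10**30)

print(eu_gap(p, p, years))      # equal credences: the threat terms cancel
print(eu_gap(2 * p, p, years))  # the stated threat is slightly more credible
```

If the two credences are exactly equal, the threat drops out and keeping the money wins. If hearing the threat makes the stated version even marginally more credible than its mirror image, the huge payoff swamps everything else, so the whole dispute is over whether the stranger's words shift ε away from zero at all.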