- You should make decisions by evaluating the expected utilities of your various options and choosing the largest one.

This is a pretty standard and uncontroversial idea. There is room for controversy about how to fill in the details about how to evaluate expected utilities, but this basic premise is hard to argue against. So let’s argue against it!

Suppose that a stranger walks up to you in the street and says to you “I have been wired in from outside the simulation to give you the following message: If you don’t hand over five dollars to me right now, your simulator will teleport you to a dungeon and torture you for all eternity.” What should you do?

The obviously correct answer is that you should chuckle, continue on with your day, and laugh about the incident later on with your friends.

The answer you get from a simple application of decision theory is that as long as you aren’t *absolutely,* *100%* sure that they are wrong, you should give them the five dollars. And you should *definitely* not be 100% sure. Why?

Suppose that the stranger says next: “I know that you’re probably skeptical about the whole simulation business, so here’s some evidence. Say any word that you please, and I will instantly reshape the clouds in the sky into that word.” You do so, and sure enough the clouds reshape themselves. Would this push your credences around a little? If so, then you didn’t start at 100%. Truly certain beliefs are those that can’t be budged by any evidence whatsoever. You can never update downwards on truly certain beliefs, by the *definition* of ‘truly certain’.
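The point about updating can be made concrete with a toy Bayesian calculation. All the numbers below are made-up placeholders for illustration, not anything claimed in the post:

```python
# A toy Bayesian update: how the cloud demonstration would budge even a
# tiny prior credence in the stranger's claim. All numbers are assumptions.
prior = 1e-12                 # assumed one-in-a-trillion prior that he's legit
p_evidence_if_true = 0.99     # clouds almost surely reshape if he's legit
p_evidence_if_false = 1e-9    # essentially impossible otherwise

# Bayes' rule: P(claim | clouds reshaped)
posterior = (p_evidence_if_true * prior) / (
    p_evidence_if_true * prior + p_evidence_if_false * (1 - prior)
)
print(posterior)  # ~0.001: the evidence moved the prior by nine orders of magnitude
```

The only prior that this evidence could not move is exactly zero, which is the sense in which truly certain beliefs are unbudgeable.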

To take a more extreme case, suppose that they demonstrate that they're telling you the truth by teleporting you to a dungeon for five minutes of torture, and then bringing you back to your starting spot. If you would even slightly update your beliefs about their credibility in this scenario, then you had a non-zero credence in their credibility from the start.

And after all, this makes sense. You should only have complete confidence in the falsity of logical contradictions, and it’s not literally logically impossible that we are in a simulation, or that the simulator decides to mess with our heads in this bizarre way.

Okay, so you have a nonzero credence in their ability to do what they say they can do. And any nonzero credence, *no matter how tiny*, will result in the rational choice being to hand over the $5. After all, if expected utility is just calculated by summing up utilities weighted by probabilities, then you have something like the following:

EU(keep $5) – EU(give $5) = U(keep $5) + ε · U(infinite torture)

where ε = P(infinite torture | keep $5) – P(infinite torture | give $5) > 0 and U(infinite torture) = –∞

As long as losing $5 isn’t infinitely bad to you, you should hand over the money. This seems like a problem, either for our intuitions or for decision theory.
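The arithmetic can be sketched numerically; the specific values below are arbitrary stand-ins (a huge finite number in place of infinity, and an absurdly small ε), chosen only to show that the sign of the comparison doesn't depend on them:

```python
# A minimal sketch of the mugger's expected-utility argument.
# All numbers are illustrative assumptions, not claims from the post.

def expected_utility(p_torture, u_torture, u_money):
    """EU = utility of the money you end up with, plus probability-weighted torture."""
    return u_money + p_torture * u_torture

U_KEEP_5 = 1.0       # modest positive utility of keeping $5 (assumed scale)
U_TORTURE = -1e100   # stand-in for a 'practically infinite' disutility
EPSILON = 1e-30      # assumed tiny extra torture risk if you refuse

eu_keep = expected_utility(EPSILON, U_TORTURE, U_KEEP_5)  # refuse the mugger
eu_give = expected_utility(0.0, U_TORTURE, 0.0)           # hand over the $5

# Even an absurdly small epsilon makes refusing look worse:
print(eu_keep < eu_give)  # True
```

No matter how small ε is made, multiplying it by a sufficiently large disutility swamps the finite value of the $5.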

*******

So here are four propositions, and you must reject at least one of them:

- There is a nonzero chance of the stranger’s threat being credible.
- Infinite torture is infinitely worse than losing $5.
- The rational thing to do is that which maximizes expected utility.
- It is irrational to give the stranger $5.

I’ve already argued for (1), and (2) seems virtually definitional. So our choice is between (3) and (4). In other words, we either abandon the principle of maximizing expected utility as a guide to instrumental rationality, or we reject our intuitive confidence in the correctness of (4).

Maybe at this point you feel more willing to accept (4). After all, intuitions are just intuitions, and humans are known to be bad at reasoning about very small probabilities and very large numbers. Maybe it *actually makes sense* to hand over the $5.

But consider where this line of reasoning leads.

The exact same argument should lead you to give in to *any* demand that the stranger makes of you, as long as it doesn’t have a literal negative infinity utility value. So if the stranger tells you to hand over your car keys, to go dance around naked in a public square, or to commit heinous crimes… all of these behaviors would be apparently rationally mandated.

Maybe, *maybe*, you might be willing to bite the bullet and say that yes, these behaviors are all perfectly rational, because of the tiny chance that this stranger is telling the truth. I’d still be willing to bet that you wouldn’t *actually* behave in this self-professedly “rational” manner if I now made this threat to you.

Also, notice that this dilemma is almost identical to Pascal’s wager. If you buy the argument here, then you should also be doing all that you can to ensure that you stay out of Hell. If you’re queasy about the infinities and think decision theory shouldn’t be messing around with such things, then we can easily modify the problem.

Instead of “your simulator will teleport you to a dungeon and torture you for all eternity”, make it “your simulator will teleport you to a dungeon and torture you for 3↑↑↑↑3 years.” The negative utility of this is large enough as to outweigh any reasonable credence you could place in the credibility of the threat. And if it isn’t, we can just make the number of years *even larger*.
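For readers unfamiliar with the up-arrow notation: 3↑↑↑↑3 is written in Knuth's up-arrow notation, which iterates exponentiation. A short sketch (standard notation, nothing specific to this post; only tiny inputs are computable, which is rather the point):

```python
def knuth_arrow(a, n, b):
    """Knuth's up-arrow a ↑^n b: n = 1 is ordinary exponentiation,
    and each extra arrow iterates the previous operation b - 1 times."""
    if n == 1:
        return a ** b
    result = a
    for _ in range(b - 1):
        result = knuth_arrow(a, n - 1, result)
    return result

print(knuth_arrow(3, 1, 3))  # 3^3 = 27
print(knuth_arrow(3, 2, 3))  # 3↑↑3 = 3^(3^3) = 7625597484987
# 3↑↑↑↑3 (n = 4) is far too large to ever compute or even meaningfully write down.
```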

Maybe the probability of a given payout scales inversely with the size of the payout? But this seems fairly arbitrary. Is it really the case that the ability to torture you for 3↑↑↑↑3 years is twice as likely as the ability to torture you for 2 ∙ 3↑↑↑↑3 years? I can’t imagine why. It seems like the probabilities of these are going to be roughly equal – essentially, once you buy into the prospect of a simulator that is able to torture you for 3↑↑↑↑3 years, you’ve *already* basically bought into the prospect that they are able to torture you for twice that amount of time.

All we’re left with is to throw our hands up and say “I can’t explain *why* this argument is wrong, and I don’t know *how* decision theory has gone wrong here, but I just *know* that it’s wrong. There is no way that the actually rational thing to do is to allow myself to get mugged by anybody that has heard of Pascal’s wager.”

In other words, it seems like the correct response to Pascal’s mugging is to reject (3) and deny the expected-utility-maximizing approach to decision theory. The natural next question is: If expected utility maximization has failed us, then what should replace it? And how would it deal with Pascal’s mugging scenarios? I would love to see suggestions in the comments, but I suspect that this is a question that we are simply not advanced enough to satisfactorily answer yet.

No, we don’t have to reject utility maximization, and I would argue there’s no paradox here. Were you going to give the stranger $5 before he started speaking? Of course not. Did you learn any new (substantial) information during the interaction? No. So why would your optimal strategy (i.e. not giving him any money) change? People only feel like this is a paradox because they are inclined to believe the stranger without proof, but that is *not* a correct thing to do from a utilitarian perspective. Maybe the stranger is telling the truth, but it’s just as likely that the exact opposite is true: if you hand over the money, THEN you get eternal torture. Since the probabilities of “he is telling the truth” and “the exact opposite of what he says is true” are equal, any utility calculation cancels out. Unless of course he gives you evidence, e.g. he actually rearranges the clouds as you suggested. But then the paradox also disappears, because giving him the $5 doesn’t seem ridiculous anymore. Therefore the only interesting question is why we have an inclination to believe the stranger without proof, but that is a question of evolution, not math (I assume you’re familiar with The Selfish Gene – evolution doesn’t really care about the utility of the individual).
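The cancellation claim in the comment above can be sketched with exact arithmetic; the credence and disutility values are arbitrary placeholders:

```python
from fractions import Fraction

# Sketch of the cancellation argument: if "he's telling the truth" and
# "the exact opposite is true" get equal credence, the torture terms cancel.
p = Fraction(1, 10**20)          # assumed equal credence in each hypothesis
u_torture = -Fraction(10**100)   # stand-in (huge but finite) disutility
u_keep_5 = Fraction(1)           # utility of keeping the $5

# Truth hypothesis: torture iff you keep. Opposite hypothesis: torture iff you give.
eu_keep = u_keep_5 + p * u_torture   # torture under the "truth" hypothesis
eu_give = p * u_torture              # torture under the "opposite" hypothesis

print(eu_keep - eu_give)  # 1: the torture terms cancel, only the $5 remains
```

Exact rationals are used here because with floats the huge torture terms would swallow the $5 entirely before cancelling.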

I haven’t assumed anywhere that you must “believe the stranger without proof”, and I think that talk of whether you believe the stranger is too binary to be useful. All we need for the paradox is that you have some non-zero credence, no matter how tiny, that the stranger’s threat is credible, i.e. that P(infinite torture | keep $5) > P(infinite torture | give $5).

“All we need for the paradox is that you have some non-zero credence” – this is exactly my point. Credence is given by proof, not claims; if the stranger doesn’t give proof, then the credence is zero. Not very tiny, but exactly zero. Therefore “P(infinite torture | keep $5) > P(infinite torture | give $5)” does not hold. To change this, not only does the stranger need to prove that I live in a simulation, and that some power has the means and intention to torture me, but also that said torture is linked to me giving him $5. If he fails to do so, there is no paradox because my expected utilities are unchanged. If he does show me proof, there is also no paradox because giving him $5 is clearly a reasonable thing to do.

“To change this, not only does the stranger need to prove that I live in a simulation, and that some power has the means and intention to torture me, but also that said torture is linked to me giving him $5.” There’s a lot of confusion indicated in this comment. I suggest that you familiarize yourself with the notion of Bayesian reasoning, it’s a helpful framework for thinking about these things: https://arbital.com/p/bayes_rule/ is a good starting point.

Why must “P(infinite torture | keep $5) > P(infinite torture | give $5)” be true? Although the chance of the stranger’s threat is nonzero, what if the stranger instead said that you would get tortured if you gave the money and you wouldn’t if you kept it? Then the inequality would be in reverse, because it would be the exact same situation, but with the actions of giving and keeping the $5 switched. But logically it shouldn’t matter what the stranger says; without evidence, it shouldn’t change your decisions. It just prompts you to think more deeply about the topic. There’s a nonzero chance that if you don’t punch the stranger in the face, then you’ll get tortured for eternity.

It’s like an objection some have to Pascal’s Wager. Maybe if you become a pious Christian, then you won’t go to Hell if God exists, but how do you know that it’s not true that it’s some other god that exists, and that god gets angrier and angrier at you each time you attend church?

> “Maybe the probability of a given payout scales inversely with the size of the payout? But this seems fairly arbitrary.”

I would argue that this assumption is wrong. Why? Because it takes a lot of resources and will to torture you for any amount of time and the larger the amount of time it is, the smaller the probability that someone has the resources or the will to do it.

I will make an analogy with money, as it is closely related to power. Let’s say a stranger comes to you and gives you $5 and leaves. What is the probability of this happening to you in your lifetime? Maybe 1% (probably much less)? People do get generous, and $5 is an amount of money available to almost anyone, so we can safely say that it is possible and somewhat likely. Let’s increase this number to $500,000. Now this seems much less likely. A stranger approaching you and giving you $500,000 in cash seems like it might have happened a couple of times in history, and maybe it hasn’t happened even once. Let’s go further. A stranger approaches you and gives you $500 trillion? This is impossible given that no one has this kind of wealth (at least as far as I am aware). But even if someone had that amount of money, why would he give it to you? I could imagine the stranger giving you $5, but $500 trillion? I think the chances are slim to none. I will admit that the chances are not 0, but the more we increase the amount, the less of a chance there is that this happens to you. I would even argue that the chance of an event where you get X amount of money from a complete stranger decreases much faster than linearly with X (probably like one over the factorial of X).

If we now return to the stranger mugging you, we have the same situation as with a stranger giving you money. The larger the punishment that he is threatening, the smaller the chance of the threat being real. In order for someone to be able to threaten you with a punishment in a dungeon for 3↑↑↑↑3 units of time, that stranger would have to have a lot of power and knowledge about the universe, and a really strong will to put that many resources into torturing you. He would have to understand teleportation and human immortality, know how to keep the universe from going into heat death, and of course have enough energy to do the torturing. The chance of the stranger being able and willing to punish you is ridiculously small. So small, in fact, that when you multiply the chance by the disutility of the torture, you get a ridiculously small number, close to zero and much less than the utility of your $5.

In mathematical terms, if we think of the utility of the torture as being a function of the amount of torture x, let’s call it A(x). The probability of the torture happening is also a function of x, let’s say B(x). The limit of A(x)*B(x) as x tends to infinity is 0.

The most important part of my argument is of course the question of why the probability of the torture would be so much smaller than the negative utility of the torture. The answer is that as an entity becomes more powerful, the possible configurations of the universe that it could achieve become more numerous. A bacterium has only a small leverage to influence the universe and a small number of actions it can take. This number is still huge, but much smaller than for an average human being, let alone a supernatural entity that could torture you forever. Such an entity would have immense power to shape the universe in different ways, and it torturing you for an eternity is just a speck in the probability spectrum. The number of different configurations of the universe that an entity can successfully access scales close to the factorial of its power (f(x) ~ x!), while the disutility it can inflict scales only linearly with its power (f(x) ~ x). If we combine these, we get that the limit of A(x)·B(x) is lim (x→∞) x · (1/x!) = 0. The expected utility is thus 0 – $5 = –$5 if we hand over the money and 0 – 0 = 0 if we refuse.
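The limit claimed in the comment above is easy to check numerically: x/x! = 1/(x–1)!, which collapses toward zero almost immediately.

```python
import math

# Numerical sketch of the claim above: if achievable disutility grows like x
# (power) while its probability shrinks like 1/x!, the product x/x! vanishes.
for x in (1, 5, 10, 20):
    print(x, x / math.factorial(x))

# x / x! equals 1 / (x-1)!, so the expected harm goes to 0 as power increases.
```

Whether probability really falls off factorially in power is of course the commenter's assumption, not an established fact; the code only shows that *if* it does, the product vanishes.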

We can also think about the utility of the stranger and this is a whole other point which I will not explain in much detail. Why would he torture you for an infinite amount of time just to gain $5? Is it rational on his behalf to do this? If we know it is irrational for the stranger to spend so much of his energy and resources just to gain $5, we should probably feel safe rejecting his offer. Furthermore, a stranger with this amount of power and knowledge of the universe would probably be able to read your mind and already know your answer so there is no need for him to even pose the question.
