Paradoxes of infinite decision theory

(Two decision theory puzzles from this paper.)

Trumped

Donald Trump has just arrived in Purgatory. God visits him and offers him the following deal. If he spends tomorrow in Hell, Donald will be allowed to spend the next two days in Heaven, before returning to Purgatory forever. Otherwise he will spend forever in Purgatory. Since Heaven is as pleasant as Hell is unpleasant, Donald accepts the deal. The next evening, as he runs out of Hell, God offers Donald another deal: if Donald spends another day in Hell, he’ll earn an additional two days in Heaven, for a total of four days in Heaven (the two days he already owed him, plus two new ones) before his return to Purgatory. Donald accepts for the same reason as before. In fact, every time he drags himself out of Hell, God offers him the same deal, and he accepts. Donald spends all of eternity writhing in agony in Hell. Where did he go wrong?

Satan’s Apple

Satan has cut a delicious apple into infinitely many pieces, labeled by the natural numbers. Eve may take whichever pieces she chooses. If she takes merely finitely many of the pieces, then she suffers no penalty. But if she takes infinitely many of the pieces, then she is expelled from the Garden for her greed. Either way, she gets to eat whatever pieces she has taken.

Eve’s first priority is to stay in the Garden. Her second priority is to eat as much apple as possible. She is deciding what to do when Satan speaks. “Eve, you should make your decision one piece at a time. Consider just piece #1. No matter what other pieces you end up taking and rejecting, you do better to take piece #1 than to reject it. For if you take only finitely many other pieces, then taking piece #1 will get you more apple without incurring the greed penalty. On the other hand, if you take infinitely many other pieces, then you will be ejected from the Garden whether or not you take piece #1. So in that case you might as well take it, so that you can console yourself with some additional apple as you are escorted out.”

Eve finds this reasonable, and decides provisionally to take piece #1. Satan continues, “By the way, the same reasoning holds for piece #2.” Eve agrees. “And piece #3, and piece #4 and…” Eve takes every piece, and is ejected from the Garden. Where did she go wrong?

The second thought experiment is sufficiently similar to the first that I won’t say much about it – I’ve just included it here because I like it.

Analysis

Let’s assume that Trump is able to keep track of how many days he has been in Hell, and can credibly pre-commit to a strategy of accepting only a fixed number of offers before rejecting. Now we can write out all possible strategies Trump could follow:

Strategy 0 Accept none of the offers, and stay in Purgatory forever.

Strategy N Accept some finite number N of offers and then reject, after which you spend 2N days in Heaven and then an eternity in Purgatory.

Strategy ∞ Accept all of the offers, and stay in Hell forever.

Assuming that a day in Hell is exactly as bad as a day in Heaven is good, Strategy 0 nets you 0 days in Heaven, Strategy N nets you N days in Heaven (2N days in Heaven minus N days in Hell), and Strategy ∞ nets you an eternity in Hell.

Obviously Strategy ∞ is the worst option (it is infinitely worse than all other strategies). And for every N, Strategy N is better than Strategy 0. So we have the preference ordering Strategy ∞ < Strategy 0 < Strategy N.

So we should choose Strategy N for some N. But which N? Obviously, for any choice of N, there are arbitrarily better choices you could have made. The problem is that there is no optimal choice of N. Any reasonable decision theory, when asked to optimize N for utility, is simply going to return an error. It’s like asking somebody to tell you the largest integer. This may be difficult to come to terms with, but it is not paradoxical – there is no law of decision theory saying that every problem has a best solution.

But we still want to know what we would do in Trump’s shoes. If we actually have to pick an N, what should we do? I think the right answer is that there is no right answer. We can say “strategy x is better than strategy y” for particular pairs of strategies, but we cannot point to a definitive best one… because there is no best one.

One technique that I thought of, however, is the following (inspired by the Saint Petersburg Paradox):

On the first day, Trump should flip a coin. If it lands heads, then he chooses Strategy 1. If it lands tails, then he flips the coin again.

If on the next flip the coin lands heads, then he chooses Strategy 2. And if it lands tails, again he flips the coin.

If on this third flip the coin lands heads, then he chooses Strategy 4. And if not, then he flips again.

Et cetera to infinity.

With this decision strategy, we can calculate the expected number N that Trump will choose. This number is:

E[N] = ½·1 + ¼·2 + ⅛·4 + … = ∞

But at the same time, the coin will certainly eventually land heads, and the process will terminate. The probability that the coin lands tails an infinite number of times is zero! So by leveraging infinities in his favor, Trump gets an infinite positive expected value for days spent in heaven, and is guaranteed to not spend all eternity in Hell.
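This coin-flipping procedure is easy to simulate. Here’s a minimal sketch (the function name is my own); every single draw terminates with a finite N, yet rare long runs of tails produce enormous values, which is exactly what drives the infinite expectation:

```python
import random

def sample_strategy(rng):
    """Flip a fair coin until it lands heads, doubling the
    strategy number on each tails (1, 2, 4, 8, ...)."""
    n = 1
    while rng.random() < 0.5:  # tails: flip again
        n *= 2
    return n

rng = random.Random(0)
samples = [sample_strategy(rng) for _ in range(100_000)]

# Every draw is finite, but the sample mean is dominated by
# rare huge draws -- the empirical signature of E[N] = infinity.
print(max(samples), sum(samples) / len(samples))
```

Run it a few times with different seeds and the sample mean jumps around wildly: a heavy-tailed distribution with infinite expectation never settles down.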

A weird question now arises: Why should Trump have started at Strategy 1? Or why multiply by 2 each time? Consider the following alternative decision process for the value of N:

On the first day, Trump should flip a coin. If it lands heads, then he chooses Strategy 1,000,000. If it lands tails, then he flips the coin again.

If on the next flip the coin lands heads, then he chooses Strategy 10,000,000. And if it lands tails, again he flips the coin.

If on this third flip the coin lands heads, then he chooses Strategy 100,000,000. And if not, then he flips again.

Et cetera to infinity.

This decision process seems obviously better than the previous one – the minimum number of days in Heaven Trump nets is 1 million, which under the original process would only have happened if the coin had landed tails 20 times in a row. And the growth in net days in Heaven per tails flip is 5× better than it was originally.

But now we have an analogous problem to the one we started with in choosing N. Any choice of starting strategy or growth rate seems suboptimal – there are always an infinity of arbitrarily better strategies.

At least here we have a way out: all such strategies are equivalent in that they net an infinite expected number of days in Heaven. And none of these infinities is larger than any other. So even if one decision process intuitively seems better than another, on average both will do equally well.

This is weird, and I’m not totally satisfied with it. But as far as I can tell, there isn’t a better alternative response.

Schelling fence

How could a strategy like Strategy N actually be instantiated? One potential way would be for Trump to set up a Schelling fence at a particular value of N. For example, Trump could pre-commit from the first day to only allowing himself to say yes 500 times, and after that saying no.

But there’s a problem with this – if Trump has any doubts about his ability to stick to his plan, and puts any credence in the possibility of breezing past his fence and staying in Hell forever, then the Schelling fence has infinite negative expected value. In other words, a Schelling fence seems advisable only if you are 100% sure of your ability to credibly pre-commit.

Here’s an alternative strategy for instantiating Strategy N that smooths out this wrinkle: Each time Trump is given another offer by God, he accepts it with probability N/(N+1), and rejects it with probability 1/(N+1). By doing this, he will on average do Strategy N, but will sometimes do a different strategy M for an M that is close to N.
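As a sanity check on this probabilistic approach, here’s a sketch (the function name is my own): accepting each offer with probability N/(N+1) makes the number of accepted offers geometrically distributed with mean exactly N.

```python
import random

def days_accepted(n, rng):
    """Accept each offer with probability n/(n+1); return the number
    of offers accepted before the first rejection."""
    days = 0
    while rng.random() < n / (n + 1):
        days += 1
    return days

rng = random.Random(42)
n = 500
trials = [days_accepted(n, rng) for _ in range(20_000)]
print(sum(trials) / len(trials))  # should hover near n = 500
```

The spread is wide (the standard deviation is roughly n itself), so individual runs land on strategies M that can be quite far from N, but the average works out to N.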

A harder variant would be if Trump’s memory is wiped clean after every day he spends in Hell, so that each day when he receives God’s offer, it is as if it is the first time. Even if Trump knows that his memory will be wiped clean on subsequent days, he now has a problem: he has no way to remember his Schelling fence, or to even know if he has reached it yet. And if he tries the probabilistic acceptance approach, he has no way to remember the value of N that he decided on.

But there’s still a way for him to do well despite the memory wipe: he can run a Saint Petersburg process like the one above not just on the first day, but on every day. Each day he picks a value of N using the coin-flipping process (which has infinite expected value but a guaranteed finite outcome), and then probabilistically accepts or rejects the offer using this N.

Quick proof that this still guarantees a finite stay in Hell: suppose for contradiction that Trump stays in Hell forever, never rejecting an offer. Since there is a nonzero chance that he selects N = 1 on any given day, he will select N = 1 on infinitely many days. On each of those days he rejects the offer with probability ½, so with probability 1 he will eventually reject an offer – a contradiction.

Question: what’s the expected number of days in Heaven under this new process? Perhaps surprisingly, it is now finite. Each day Trump rejects with probability 1/(N+1) for a freshly drawn N, which averages out to a fixed per-day rejection probability of E[1/(N+1)] ≈ 0.37. The number of days he spends in Hell is therefore geometrically distributed with a small finite mean, and his expected number of days in Heaven is just twice that. The memory wipe costs Trump his infinite expected value, though he is still guaranteed to escape Hell. (As for the single-draw process from before: there should be a name for these guaranteed-finite-but-infinite-expected-value quantities.)
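The memoryless procedure can be simulated directly; this is a sketch with hypothetical function names. Each day draws a fresh N and rejects with probability 1/(N+1), and every simulated run escapes Hell in finite time.

```python
import random

def sample_n(rng):
    # Fresh St. Petersburg draw each day: 1, 2, 4, ... with
    # probabilities 1/2, 1/4, 1/8, ...
    n = 1
    while rng.random() < 0.5:
        n *= 2
    return n

def days_until_rejection(rng):
    """Simulate memoryless Trump: each day draw a fresh N and
    reject God's offer with probability 1/(N+1)."""
    days = 0
    while True:
        n = sample_n(rng)
        if rng.random() < 1 / (n + 1):  # reject: leave Hell
            return days
        days += 1

rng = random.Random(7)
runs = [days_until_rejection(rng) for _ in range(10_000)]
print(max(runs))  # every simulated run terminates
```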

Anyway, the conclusion of all of this? Infinite decision theory is really weird.

Does fine-tuning give evidence for God?

I used to think that the fine-tuning argument was the strongest argument out there for the existence of a creator deity. I was especially impressed by the apparent magnitude of the fine-tuning – Steven Weinberg has stated that the value of the cosmological constant was fine-tuned to one part in 10^120.

If one takes a naïve (and as we’ll see, incorrect) Bayesian approach to assessing this as evidence, then it looks like this should serve as an incredible amount of evidence for the existence of a God, enough to totally overwhelm all other possible considerations. Why? Because if there is a God, then we expect fine-tuning, while if not, then the fine-tuning looks incredibly unlikely. Given this, the God explanation should receive a credence bump proportional to 10^120 upon updating on the observation of fine-tuning.

As a quick aside before diving into the numbers: there is a lot of dispute about whether or not there even is fine-tuning in our universe. For the purposes of this post, I’m going to ignore all of these disputes and pretend that there is a strong consensus on the matter. I’ll use Weinberg’s estimate of 10^-120 for the fine-tuning of the cosmological constant. I know that this is controversial, but the point I’m making will stand even for this insanely tiny value.

Okay, so let’s first present a formal version of the fine-tuning argument for God.

F = “The universe is fine-tuned for life.”
G = “A creator deity fine-tuned the universe for life.”

O(G | F) = L(F | G) · O(G)
L(F | G) = P(F | G) / P(F | ~G) ≈ 1 / 10^-120 = 10^120 (1200 decibels of evidence)

So O(G | F) = 10^120 · O(G)

This uses the odds formulation of Bayes’ rule – look it up if you’re unfamiliar.

This argument says that your credences should be adjusted by a factor of 10^120 upon observing the fine-tuning of the universe. In other words, to avoid ending up virtually certain that there exists a creator deity that rules the universe, you’d have to have started with a prior credence on the order of 10^-120.

Let me point out that 10^-120 is a really, really small number. It’s virtually impossible to imagine any good reason why you would be justified in having a prior credence on this order of magnitude. Nobody should be that sure about anything. For comparison, a sound a quadrillion times more intense than the threshold of human hearing comes to only 150 dB; 1200 dB of evidence is on an entirely different scale.
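For reference, the decibel figure is just ten times the base-10 logarithm of the likelihood ratio; a quick sketch:

```python
import math

def evidence_in_db(likelihood_ratio):
    """Strength of evidence in decibels: 10 * log10 of the ratio."""
    return 10 * math.log10(likelihood_ratio)

print(evidence_in_db(1e120))  # ~1200 dB
print(evidence_in_db(1e15))   # ~150 dB: a quadrillion-fold ratio
```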

So what’s wrong with this argument? In fact, it fails at the first step. In calculating the strength of the evidence, we only considered two possible hypotheses: either God or, if not, then random coincidence. But there are many other options that we have to factor in as well, most famously the multiverse hypothesis.

But even if there are other hypotheses out there, shouldn’t they just all share the benefit of the credence boost? The existence of another hypothesis that made the same prediction shouldn’t count as a penalty, right?

Wrong! Probabilities have to add up to 1, and you can’t have multiple mutually exclusive competing hypotheses that you have virtual certainty about. Whatever happens when you add other hypotheses must be more subtle than that. So let’s calculate using Bayes’ rule!

O(T | E) = L(E | T) · O(T)
L(E | T) = P(E | T) / P(E | ~T)

For each theory T we consider, we have to take into account all other theories in the denominator of our likelihood function L. We’ll want to keep in mind the following identity:

P(B & C) = 0
implies
P(A | B or C) = [P(A | B) P(B) + P(A | C) P(C)] / [P(B) + P(C)]
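The identity above is an instance of the law of total probability, and can be checked exactly on a toy sample space (the two-dice setup here is purely illustrative):

```python
from fractions import Fraction
from itertools import product

# Toy sample space: two fair dice, exact arithmetic via Fraction.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond(a, given):
    return prob(lambda w: a(w) and given(w)) / prob(given)

A = lambda w: w[0] + w[1] >= 9   # high total
B = lambda w: w[0] == 1          # disjoint from C
C = lambda w: w[0] == 6

assert prob(lambda w: B(w) and C(w)) == 0  # P(B & C) = 0

lhs = cond(A, lambda w: B(w) or C(w))
rhs = (cond(A, B) * prob(B) + cond(A, C) * prob(C)) / (prob(B) + prob(C))
assert lhs == rhs  # the identity holds exactly
print(lhs)
```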

So, for instance, let’s divide our explanations of the fine-tuning F into three mutually exclusive categories: (1) random coincidence, C; (2) a deistic God, G; and (3) all other explanations, O (such as a multiverse).

L(F | X) = P(F | X) / P(F | ~X)
= P(F | X) · (1 − P(X)) / ∑_{Y≠X} P(F | Y) P(Y)

P(F | C) ≈ 10^-120
P(F | G) ≈ 1
P(F | O) ≈ 1

L(F | C) ≈ 10^-120
L(F | G) ≈ P(~G) / P(O)
L(F | O) ≈ P(~O) / P(G)

O(C | F) = 10^-120 · O(C)
O(G | F) = P(G) / P(O)
O(O | F) = P(O) / P(G)

In the end, what we find is that the “Coincidence” hypothesis has been down-voted completely out of existence, leaving only the “God” hypothesis and the “Other” hypothesis.

And importantly, our final credence in either of these hypotheses is not on the order of magnitude of 1 − 10^-120. The final balance depends entirely on the ratio of the prior credences in the two explanations.

Trial Run

Let’s look at two individuals updating on the observation of fine-tuning.

Atheist
P(G) = .01%
P(O) = 50%

Deist
P(G) = 99%
P(O) = 1%

(The exact details of these numbers aren’t that important, just that they’re somewhat qualitatively accurate.) Their final credences will be:

Atheist
P(G | F) = 0.02%
P(O | F) = 99.98%

Deist
P(G | F) = 99%
P(O | F) = 1%
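The numbers above can be reproduced with a short sketch (the function name is mine), using the earlier result O(G | F) = P(G) / P(O):

```python
def credences_after_finetuning(p_g, p_o):
    """Posterior credences in God (G) vs. some other explanation (O),
    given that coincidence is driven to ~zero by the update.
    Depends only on the ratio of the two priors."""
    odds_g = p_g / p_o               # O(G | F) = P(G) / P(O)
    p_g_f = odds_g / (1 + odds_g)    # convert odds back to probability
    return p_g_f, 1 - p_g_f

print(credences_after_finetuning(0.0001, 0.5))  # atheist: ~0.02% vs ~99.98%
print(credences_after_finetuning(0.99, 0.01))   # deist: 99% vs 1%
```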

And we see that nobody ends up significantly updating their religious beliefs on the evidence of fine-tuning. The deist held a worldview in which the random coincidence hypothesis was already ruled out, so the observation of fine-tuning doesn’t change anything for them. And the atheist was initially fairly agnostic about whether the fine-tuning was a random coincidence, but very confident in the existence of other explanations besides God. As such, the observation of fine-tuning produces a minor increase in their belief in God (+.01%), while making them extremely confident that there must be some other explanation.

Fine-tuning would only serve as strong evidence for God if you initially gave very little credence to explanations other than God or random coincidence, while being fairly agnostic between those two. Even in this case, the bump in credence you’d receive would be nothing like the massive update suggested by the naïve (and wrong) application of Bayesian reasoning.