What do I find conceptually puzzling?

There are lots of things that I don’t know, like, say, what the birth rate in Sweden is or what the effect of poverty on IQ is. There are also lots of things that I find really confusing and hard to understand, like quantum field theory and monetary policy. There’s also a special category of things that I find conceptually puzzling. These things aren’t difficult to grasp because the facts about them are difficult to understand or require learning complicated jargon. Instead, they’re difficult to grasp because I suspect that I’m confused about the concepts in use.

This is a much deeper level of confusion. It can’t be adjudicated by just reading lots of facts about the subject matter. It requires philosophical reflection on the nature of these concepts, which can sometimes leave me totally confused about everything and grasping for the solid ground of mere factual ignorance.

As such, it feels like a big deal when something I’ve been conceptually puzzled about becomes clear. I want to compile a list for future reference of things that I’m currently conceptually puzzled about and things that I’ve become un-puzzled about. (This is not a complete list, but I believe it touches on the major themes.)

Things I’m conceptually puzzled about

What is the relationship between consciousness and physics?

I’ve written about this here.

Essentially, at this point every available viewpoint on consciousness seems wrong to me.

Eliminativism amounts to a denial of pretty much the only thing that we can be sure can’t be denied – that we are having conscious experiences. Physicalism entails the claim that facts about conscious experience can be derived from laws of physics, which is wrong as a matter of logic.

Dualism entails that the laws of physics by themselves cannot account for the behavior of the matter in our brains, which is wrong. And epiphenomenalism entails that our beliefs about our own conscious experience are almost certainly wrong, and are no better representations of our actual conscious experiences than random chance.

How do we make sense of decision theory if we deny libertarian free will?

Written about this here and here.

Decision theory is ultimately about finding the decision D that maximizes expected utility EU(D). But to do this calculation, we have to decide what the set of possible decisions we are searching is.

EU confusion

Make this set too large, and you end up getting fantastical and impossible results (like that the optimal decision is to snap your fingers and make the world into a utopia). Make it too small, and you end up getting underwhelming results (in the extreme case, you just get that the optimal decision is to do exactly what you are going to do, since this is the only thing you can do in a strictly deterministic world).

We want to find a nice middle ground between these two – a boundary where we can say “inside here the things that are actually possible for us to do, and outside are those that are not.” But any principled distinction between what’s in the set and what’s not must be based on some conception of some actions being “truly possible” to us, and others being truly impossible. I don’t know how to make this distinction in the absence of a robust conception of libertarian free will.

Are there objectively right choices of priors?

I’ve written about this here.

If you say no, then there are no objectively right answers to questions like “What should I believe given the evidence I have?” And if you say yes, then you have to deal with thought experiments like the cube problem, where any choice of priors looks arbitrary and unjustifiable.

(If you are going to be handed a cube, and all you know is that it has a volume less than 1 cm3, then setting maximum entropy priors over volumes gives different answers than setting maximum entropy priors over side areas or side lengths. This means that what qualifies as “maximally uncertain” depends on whether we frame our reasoning in terms of side length, areas, or cube volume. Other approaches besides MaxEnt have similar problems of concept dependence.)

How should we deal with infinities in decision theory?

I wrote about this here, here, here, and here.

The basic problem is that expected utility theory does great at delivering reasonable answers when the rewards are finite, but becomes wacky when the rewards become infinite. There are a huge amount of examples of this. For instance, in the St. Petersburg paradox, you are given the option to play a game with an infinite expected payout, suggesting that you should buy in to the game no matter how high the cost. You end up making obviously irrational choices, such as spending $1,000,000 on the hope that a fair coin will land heads 20 times in a row. Variants of this involve the inability of EU theory to distinguish between obviously better and worse bets that have infinite expected value.

And Pascal’s mugging is an even worse case. Roughly speaking, a person comes up to you and threatens you with infinite torture if you don’t submit to them and give them 20 dollars. Now, the probability that this threat is credible is surely tiny. But it is non-zero! (as long as you don’t think it is literally logically impossible for this threat to come true)

An infinite penalty times a finite probability is still an infinite expected penalty. So we stand to gain an infinite expected utility by just handing over the 20 dollars. This seems ridiculous, but I don’t know any reasonable formalization of decision theory that allows me to refute it.

Is causality fundamental?

Causality has been nicely formalized by Pearl’s probabilistic graphical models. This is a simple extension of probability theory, out of which naturally falls causality and counterfactuals.

One can use this framework to represent the states of fundamental particles and how they change over time and interact with one another. What I’m confused about is that in some ways of looking at it, the causal relations appear to be useful but un-fundamental constructs for the sake of easing calculations. In other ways of looking at it, causal relations are necessarily built into the structure of the world, and we can go out and empirically discover them. I don’t know which is right. (Sorry for the vagueness in this one – it’s confusing enough to me that I have trouble even precisely phrasing the dilemma).

How should we deal with the apparent dependence of inductive reasoning upon our choices of concepts?

I’ve written about this here. Beyond just the problem of concept-dependence in our choices of priors, there’s also the problem presented by the grue/bleen thought experiment.

This thought experiment proposes two new concepts: grue (= the set of things that are either green before 2100 or blue after 2100) and bleen (the inverse of grue). It then shows that if we reasoned in terms of grue and bleen, standard induction would have us concluding that all emeralds will suddenly turn blue after 2100. (We repeatedly observed them being grue before 2100, so we should conclude that they will be grue after 2100.)

In other words, choose the wrong concepts and induction breaks down. This is really disturbing – choices of concepts should be merely pragmatic matters! They shouldn’t function as fatal epistemic handicaps. And given that they appear to, we need to develop some criterion we can use to determine what concepts are good and what concepts are bad.

The trouble with this is that the only proposals I’ve seen for such a criterion reference the idea of concepts that “carve reality at its joints”; in other words, the world is composed of green and blue things, not grue and bleen things, so we should use the former rather than the latter. But this relies on the outcome of our inductive process to draw conclusions about the starting step on which this outcome depends!

I don’t know how to cash out “good choices of concepts” without ultimately reasoning circularly. I also don’t even know how to make sense of the idea of concepts being better or worse for more than merely pragmatic reasons.

How should we reason about self defeating beliefs?

The classic self-defeating belief is “This statement is a lie.” If you believe it, then you are compelled to disbelieve it, eliminating the need to believe it in the first place. Broadly speaking, self-defeating beliefs are those that undermine the justifications for belief in them.

Here’s an example that might actually apply in the real world: Black holes glow. The process of emission is known as Hawking radiation. In principle, any configuration of particles with a mass less than the black hole can be emitted from it. Larger configurations are less likely to be emitted, but even configurations such as a human brain have a non-zero probability of being emitted. Henceforth, we will call such configurations black hole brains.

Now, imagine discovering some cosmological evidence that the era in which life can naturally arise on planets circling stars is finite, and that after this era there will be an infinite stretch of time during which all that exists are black holes and their radiation. In such a universe, the expected number of black hole brains produced is infinite (a tiny finite probability multiplied by an infinite stretch of time), while the expected number of “ordinary” brains produced is finite (assuming a finite spatial extent as well).

What this means is that discovering this cosmological evidence should give you an extremely strong boost in credence that you are a black hole brain. (Simply because most brains in your exact situation are black hole brains.) But most black hole brains have completely unreliable beliefs about their environment! They are produced by a stochastic process which cares nothing for producing brains with reliable beliefs. So if you believe that you are a black hole brain, then you should suddenly doubt all of your experiences and beliefs. In particular, you have no reason to think that the cosmological evidence you received was veridical at all!

I don’t know how to deal with this. It seems perfectly possible to find evidence for a scenario that suggests that we are black hole brains (I’d say that we have already found such evidence, multiple times). But then it seems we have no way to rationally respond to this evidence! In fact, if we do a naive application of Bayes’ theorem here, we find that the probability of receiving any evidence in support of black hole brains to be 0!

So we have a few options. First, we could rule out any possible skeptical scenarios like black hole brains, as well as anything that could provide any amount of evidence for them (no matter how tiny). Or we could accept the possibility of such scenarios but face paralysis upon actually encountering evidence for them! Both of these seem clearly wrong, but I don’t know what else to do.

How should we reason about our own existence and indexical statements in general?

This is called anthropic reasoning. I haven’t written about it on this blog, but expect future posts on it.

A thought experiment: imagine a murderous psychopath who has decided to go on an unusual rampage. He will start by abducting one random person. He rolls a pair of dice, and kills the person if they land snake eyes (1, 1). If not, he lets them free and hunts down ten new people. Once again, he rolls his pair of die. If he gets snake eyes he kills all ten. Otherwise he frees them and kidnaps 100 new people. On and on until he eventually gets snake eyes, at which point his murder spree ends.

Now, you wake up and find that you have been abducted. You don’t know how many others have been abducted alongside you. The murderer is about to roll the dice. What is your chance of survival?

Your first thought might be that your chance of death is just the chance of both dice landing 1: 1/36. But think instead about the proportion of all people that are ever abducted by him that end up dying. This value ends up being roughly 90%! So once you condition upon the information that you have been captured, you end up being much more worried about your survival chance.

But at the same time, it seems really wrong to be watching the two dice tumble and internally thinking that there is a 90% chance that they land snake eyes. It’s as if you’re imagining that there’s some weird anthropic “force” pushing the dice towards snake eyes. There’s way more to say about this, but I’ll leave it for future posts.

Things I’ve become un-puzzled about

Newcomb’s problem – one box or two box?

To almost everyone, it is perfectly clear and obvious what should be done. The difficulty is that these people seem to divide almost evenly on the problem, with large numbers thinking that the opposing half is just being silly.

– Nozick, 1969

I’ve spent months and months being hopelessly puzzled about Newcomb’s problem. I now am convinced that there’s an unambiguous right answer, which is to take the one box. I wrote up a dialogue here explaining the justification for this choice.

In a few words, you should one-box because one-boxing makes it nearly certain that the simulation of you run by the predictor also one-boxed, thus making it nearly certain that you will get 1 million dollars. The dependence between your action and the simulation is not an ordinary causal dependence, nor even a spurious correlation – it is a logical dependence arising from the shared input-output structure. It is the same type of dependence that exists in the clone prisoner dilemma, where you can defect or cooperate with an individual you are assured is identical to you in every single way. When you take into account this logical dependence (also called subjunctive dependence), the answer is unambiguous: one-boxing is the way to go.

Summing up:

Things I remain conceptually confused about:

  • Consciousness
  • Decision theory & free will
  • Objective priors
  • Infinities in decision theory
  • Fundamentality of causality
  • Dependence of induction on concept choice
  • Self-defeating beliefs
  • Anthropic reasoning

The Monty Hall non-paradox

I recently showed the famous Monty Hall problem to a friend. This friend solved the problem right away, and we realized quickly that the standard presentation of the problem is highly misleading.

Here’s the setup as it was originally described in the magazine column that made it famous:

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?

I encourage you to think through this problem for yourself and come to an answer. Will provide some blank space so that you don’t accidentally read ahead.







Now, the writer of the column was Marilyn vos Savant, famous for having an impossible IQ of 228 according to an interpretation of a test that violated “almost every rule imaginable concerning the meaning of IQs” (psychologist Alan Kaufman). In her response to the problem, she declared that switching gives you a 2/3 chance of winning the car, as opposed to a 1/3 chance for staying. She argued by analogy:

Yes; you should switch. The first door has a 1/3 chance of winning, but the second door has a 2/3 chance. Here’s a good way to visualize what happened. Suppose there are a million doors, and you pick door #1. Then the host, who knows what’s behind the doors and will always avoid the one with the prize, opens them all except door #777,777. You’d switch to that door pretty fast, wouldn’t you?

Notice that this answer contains a crucial detail that is not contained in the statement of the problem! Namely, the answer adds the stipulation that the host “knows what’s behind the doors and will always avoid the one with the prize.”

The original statement of the problem in no way implies this general statement about the host’s behavior. All you are justified to assume in an initial reading of the problem are the observational facts that (1) the host happened to open door No. 3, and (2) this door happened to contain a goat.

When nearly a thousand PhDs wrote in to the magazine explaining that her answer was wrong, she gave further arguments that failed to reference the crucial point; that her answer was only true given additional unstated assumptions.

My original answer is correct. But first, let me explain why your answer is wrong. The winning odds of 1/3 on the first choice can’t go up to 1/2 just because the host opens a losing door. To illustrate this, let’s say we play a shell game. You look away, and I put a pea under one of three shells. Then I ask you to put your finger on a shell. The odds that your choice contains a pea are 1/3, agreed? Then I simply lift up an empty shell from the remaining other two. As I can (and will) do this regardless of what you’ve chosen, we’ve learned nothing to allow us to revise the odds on the shell under your finger.

Notice that this argument is literally just a restatement of the original problem. If one didn’t buy the conclusion initially, restating it in terms of peas and shells is unlikely to do the trick!

This problem was made even more famous by this scene in the movie “21”, in which the protagonist demonstrates his brilliance by coming to the same conclusion as vos Savant. While the problem is stated slightly better in this scene, enough ambiguity still exists that the proper response should be that the problem is underspecified, or perhaps a set of different answers for different sets of auxiliary assumptions.

The wiki page on this ‘paradox’ describes it as a veridical paradox, “because the correct choice (that one should switch doors) is so counterintuitive it can seem absurd, but is nevertheless demonstrably true.”

Later on the page, we see the following:

In her book The Power of Logical Thinking, vos Savant (1996, p. 15) quotes cognitive psychologist Massimo Piattelli-Palmarini as saying that “no other statistical puzzle comes so close to fooling all the people all the time,” and “even Nobel physicists systematically give the wrong answer, and that they insist on it, and they are ready to berate in print those who propose the right answer.”

There’s something to be said about adequacy reasoning here; when thousands of PhDs and some of the most brilliant mathematicians in the world are making the same point, perhaps we are too quick to write it off as “Wow, look at the strength of this cognitive bias! Thank goodness I’m bright enough to see past it.”

In fact, the source of all of the confusion is fairly easy to understand, and I can demonstrate it in a few lines.

Solution to the problem as presented

Initially, all three doors are equally likely to contain the car.
So Pr(1) = Pr(2) = Pr(3) = ⅓

We are interested in how these probabilities update upon the observation that 3 does not contain the car.
Pr(1 | ~3) = Pr(1)・Pr(~3 | 1) / Pr(~3)
= (⅓ ・1) / ⅔ = ½

By the same argument,
Pr(2 | ~3) = ½

Voila. There’s the simple solution to the problem as it is presented, with no additional presumptions about the host’s behavior. Accepting this argument requires only accepting three premises:

(1) Initially all doors are equally likely to be hiding the car.

(2) Bayes’ rule.

(3) There is only one car.

(3) implies that Pr(the car is not behind a door | the car is behind a different door) = 100%, which we use when we replace Pr(~3 | 1) with 1.

The answer we get is perfectly obvious; in the end all you know is that the car is either in door 1 or door 2, and that you picked door 1 initially. Since which door you initially picked has nothing to do with which door the car was behind, and the host’s decision gives you no information favoring door 1 over door 2, the probabilities should be evenly split between the two.

It is also the answer that all the PhDs gave.

Now, why does taking into account the host’s decision process change things? Simply because the host’s decision is now contingent on your decision, as well as the actual location of the car. Given that you initially opened door 1, the host is guaranteed to not open door 1 for you, and is also guaranteed to not open up a door hiding the car.

Solution with specified host behavior

Initially, all three doors are equally likely to contain the car.
So Pr(1) = Pr(2) = Pr(3) = ⅓

We update these probabilities upon the observation that 3 does not contain the car, using the likelihood formulation of Bayes’ rule.

Pr(1 | open 3) / Pr(2 | open 3)
= Pr(1) / Pr(2)・Pr(open 3 | 1) / Pr(open 3 | 2)
= ⅓ / ⅓・½ / 1 = ½

So Pr(1 | open 3) = ⅓ and Pr(2 | open 3) = ⅔

Pr(open 3 | 2) = 1, because the host has no choice of which door to open if you have selected door 1 and the car is behind door 2.

Pr(open 3 | 1) = ½, because the host has a choice of either opening 2 or 3.

In fact, it’s worth pointing out that this requires another behavioral assumption about the host that is nowhere stated in the original post, or Savant’s solution. This is that if there is a choice about which of two doors to open, the host will pick randomly.

This assumption is again not obviously correct from the outset; perhaps the host chooses the larger of the two door numbers in such cases, or the one closer to themselves, or the one or the smaller number with 25% probability. There are an infinity of possible strategies the host could be using, and this particular strategy must be explicitly stipulated to get the answer that Wiki proclaims to be correct.

It’s also worth pointing out that once these additional assumptions are made explicit, the ⅓ answer is fairly obvious and not much of a paradox. If you know that the host is guaranteed to choose a door with a goat behind it, and not one with a car, then of course their decision about which door to open gives you information. It gives you information because it would have been less likely in the world where the car was under door 1 than in the world where the car was under door 2.

In terms of causal diagrams, the second formulation of the Monty Hall problem makes your initial choice of door and the location of the car dependent upon one another. There is a path of causal dependency that goes forwards from your decision to the host’s decision, which is conditioned upon, and then backward from the host’s decision to which door the car is behind.

Any unintuitiveness in this version of the Monty Hall problem is ultimately due to the unintuitiveness of the effects of conditioning on a common effect of two variables.

Monty Hall Causal

In summary, there is no paradox behind the Monty Hall problem, because there is no single Monty Hall problem. There are two different problems, each containing different assumptions, and each with different answers. The answers to each problem are fairly clear after a little thought, and the only appearance of a paradox comes from apparent disagreements between individuals that are actually just talking about different problems. There is no great surprise when ambiguous wording turns out multiple plausible solutions, it’s just surprising that so many people see something deeper than mere ambiguity here.

Taxonomy of infinity catastrophes for expected utility theory

Basics of expected utility theory

I’ve talked quite a bit in past posts about the problems that infinities raise for expected utility theory. In this post, I want to systematically go through and discuss the different categories of problems.

First of all, let’s define expected utility theory.

Given an action A, we have a utility function U over the possible consequences
U = { U1, U2, U3, … UN }
and a credence distribution P over the consequences
P = { P1, P2, P3, … PN }.
We define the expected utility of A to be EU(A) = P1U1 + P2U2 + … + PNUN

Expected Utility Theory:
The rational action is that which maximizes expected utility.

Just to give an example of how this works out, suppose that we can choose between two actions A1 and A2, defined as follows:

Action A1
U1 = { 20, -10 }
P1 = { 50%, 50% }

Action A2
U2 = { 10, -20 }
P2 = { 80%, 20% }

We can compare the expected utilities of these two actions by using the above formula.

EU(A1) = 20∙50% + -10∙50% = 5
EU(A2) = 10∙80% + -20∙20% = 4

Since EU(A1) is greater than EU(A2), expected utility theory mandates that A1 is the rational act for us to take.

Expected utility theory seems to work out fine in the case of finite payouts, but becomes strange when we begin to introduce infinities. Before even talking about the different problems that arise, though, you might be tempted to brush off this issue, thinking that infinite payouts don’t really exist in the real world.

While this is a tenable position to hold, it is certainly not obviously correct. We can easily construct games that are actually do-able that have an infinite expected payout. For instance, a friend of mine runs the following procedure whenever it is getting late and he is trying to decide whether or not he should head home: First, he flips a coin. If it lands heads, he heads home. If tails, he waits one minute and then re-flips the coin. If it lands heads this time, he heads home. If tails, then he waits two minutes and re-flips the coin. On the next flip, if it lands tails, he waits four minutes. Then eight. And so on. The danger of this procedure is that on overage, he ends up staying out for an infinitely long period of time.

This is a more dangerous real-world application of the St. Petersburg Paradox (although you’ll be glad to know that he hasn’t yet been stuck hanging out with me for an infinite amount of time). We might object: Yes, in theory this has an infinite expected time. But we know that in practice, there will be some cap on the total possible time. Perhaps this cap corresponds to the limit of tolerance that my friend has before he gives up on the game. Or, more conclusively, there is certainly an upper limit in terms of his life span.

Are there any real infinities out there that could translate into infinite utilities? Once again, plausibly no. But it doesn’t seem impossible that such infinities could arise. For instance, even if we wanted to map utilities onto positive-valence experiences and believed that there was a theoretical upper limit on the amount of positivity you could possible experience in a finite amount of time, we could still appeal to the possibility of an eternity of happiness. If God appeared before you and offered you an eternity of existence in a Heaven, then you would presumably be considering an offer with a net utility of positive infinity. Maybe you think this is implausible (I certainly do), but it is at least a possibility that we could be confronted with real infinities in expected utility calculations.

Reassured that infinite utilities are probably not a priori ruled out, we can now ask: How does expected utility theory handle these scenarios?

The answer is: not well.

There are three general classes of failures:

  1. Failure of dominance arguments
  2. Undefined expected utilities
  3. Nonsensical expected utilities

Failure of dominance arguments

A dominance argument is an argument that says that if the expected utility of one action is greater than the expected utility of another, no matter what is the case.

Here’s an example. Consider two lotteries: Lottery 1 and Lottery 2. Each one decides on whether a player wins or not by looking at some fixed random event (say, whether or not a radioactive atom decays within a fixed amount of time T), but the reward for winning differs. If the radioactive atom does decay within time T, then you would get $100,000 from Lottery 1 and $200,000 from Lottery 2. If it does not, then you lose $200 dollars from Lottery 1 and lose $100 dollars from Lottery 2. Now imagine that you can choose only one of these two lotteries.

To summarize: If the atom decays, then Lottery 1 gives you $100,000 less than Lottery 2. And if the atom doesn’t decay, then Lottery 1 charges you $100 more than Lottery 2.

In other words, no matter what ends up happing, you are better off choosing Lottery 2 than Lottery 1. This means that Lottery 2 dominates Lottery 1 as a strategy. There is no possible configuration of the world in which you would have been better off by choosing Lottery 1 than you were by Lottery 2, so this choice is essentially risk-free.

So we have the following general principle, which seems to follow nicely from a simple application of expected utility theory:

Dominance: If action A1 dominates action A2, then it is irrational to choose A2 over A1.

Amazingly, this straightforward and apparently obvious rule ends up failing us when we start to talk about infinite payoffs.

Consider the following setup:

Action 1
U = { ∞, 0 }
P = { .5, .5 }

Action 2
U = { ∞, 10 }
P = { .5, .5 }

Action 2 weakly dominates Action 1. This means that no matter what consequence ends up obtaining, we always end up either better off or equally well off if we take Action 2 than Action 1. But when we calculate the expected utilities…

EU(Action 1) = .5 ∙ ∞ + .5 ∙ 0 = ∞
EU(Action 2) = .5 ∙ ∞ + .5 ∙ 10 = ∞

… we find that the two actions are apparently equal in utility, so we should have no preference between them.

This is pretty bizarre. Imagine the following scenario: God is about to appear in front of you and ship you off to Heaven for an eternity of happiness. In the few minutes before he arrives, you are able to enjoy a wonderfully delicious-looking Klondike bar if you so choose. Obviously the rational thing to do is to eat the Klondike bar, right? Apparently not, according to expected utility theory. The additional little burst of pleasure you get fades into irrelevance as soon as the infinities enter the calculation.

Not only do infinities make us indifferent between two actions, one of which dominates the other, but they can even make us end up choosing actions that are clearly dominated! My favorite example of this is one that I’ve talked about earlier, featuring a recently deceased Donald Trump sitting in Limbo negotiating with God.

To briefly rehash this thought experiment, every day Donald Trump is given an offer by God that he spend one day in Hell and in reward get two days in Heaven afterwards. Each day, the rational choice is for Trump to take the offer, spending one more day in Hell before being able to receive his reward. But since he accepts the offer every day, he ends up always delaying his payout in Heaven, and therefore spends all of eternity in Hell, thinking that he’s making a great deal.

We can think of Trump’s reason for accepting each day as a simple expected utility calculation: U(2 days in Heaven) + U(1 day in Hell) > 0. But iterating this decision an infinity of times ends up leaving Trump in the worst possible scenario – eternal torture.

Undefined expected utilities

Now suppose that you get the following deal from God: Either (Option 1) you die and stop existing (suppose this has utility 0 to you), or (Option 2) you die and continue existing in the afterlife forever. If you choose the afterlife, then your schedule will be arranged as follows: 1,000 days of pure bliss in heaven, then one day of misery in hell. Suppose that each day of bliss has finite positive value to you, and each day of misery has finite negative value to you, and that these two values perfectly cancel each other out (a day in Hell is as bad as a day in Heaven is good).

Which option should you take? It seems reasonable that Option 2 is preferable, as you get a thousand to one ratio of happiness to unhappiness for all of eternity.

Option 1: 💀, 💀, 💀, 💀, 
Option 2:
😇 x 1000, 😟, 😇 x 1000, 😟, …

Since U(💀) = 0, we can calculate the expected utility of Option 1 fine. But what about Option 2? The answer we get depends on the order in which we add up the utilities of each day. If we take the days in chronological order, than we get a total infinite positive utility. If we alternate between Heaven days and Hell days, then we get a total expected utility of zero. And if we add up in the order (Hell, Hell, Heaven, Hell, Hell, Heaven, …), then we end up getting an infinite negative expected utility.

In other words, the expected utility of Option 2 is undefined, giving us no guidance as to which we should prefer. Intuitively, we would want a rational theory of preference would tell us that Option 2 is preferable.

A slightly different example of this: Consider the following three lotteries:

Lottery 1
U = { ∞, -∞ }
P = { .5, .5 }

Lottery 2
U = { ∞, -∞ }
P = { .01, .99 }

Lottery 3
U = { ∞, -∞ }
P = { .99, .01 }

Lottery 1 corresponds to flipping a fair coin to determine whether you go to Heaven forever or Hell forever. Lottery 2 corresponds to picking a number between 1 and 100 to decide. And Lottery 3 corresponds to getting to pick 99 numbers between 1 and 100 to decide. It should be obvious that if you were in this situation, then you should prefer Lottery 3 over Lottery 1, and Lottery 1 over Lottery 2. But here, again, expected utility theory fails us. None of these lotteries have defined expected utilities, because ∞ – ∞ is not well defined.

Nonsensical expected utilities

A stranger approaches you and demands twenty bucks, on pain of an eternity of torture. What should you do?

Expected utility theory tells us that as long as we have some non-zero credence in this person’s threat being credible, then we should hand over the twenty bucks. After all, a small but nonzero probability multiplied by -∞ is still just -∞.

Should we have a non-zero credence in the threat being credible? Plausibly so. To have a zero credence in the threat’s credibility is to imagine that there is no possible evidence that could make it any more likely. It is true that no experience you could have would make the threat any more credible? What if he demonstrated incredible control over the universe?

In the end, we have an inconsistent triad.

  1. The rational thing to do is that which maximizes expected utility.
  2. There is a nonzero chance that the stranger threatening you with eternal torture is actually able to follow through on this threat.
  3. It is irrational to hand over the five dollars to the stranger.

This is a rephrasing of Pascal’s wager, but without the same problems as that thought experiment.

Pascal’s mugging

  • You should make decisions by evaluating the expected utilities of your various options and choosing the largest one.

This is a pretty standard and uncontroversial idea. There is room for controversy about how to fill in the details about how to evaluate expected utilities, but this basic premise is hard to argue against. So let’s argue against it!

Suppose that a stranger walks up to you in the street and says to you “I have been wired in from outside the simulation to give you the following message: If you don’t hand over five dollars to me right now, your simulator will teleport you to a dungeon and torture you for all eternity.” What should you do?

The obviously correct answer is that you should chuckle, continue on with your day, and laugh about the incident later on with your friends.

The answer you get from a simple application of decision theory is that as long as you aren’t absolutely, 100% sure that they are wrong, you should give them the five dollars. And you should definitely not be 100% sure. Why?

Suppose that the stranger says next: “I know that you’re probably skeptical about the whole simulation business, so here’s some evidence. Say any word that you please, and I will instantly reshape the clouds in the sky into that word.” You do so, and sure enough the clouds reshape themselves. Would this push your credences around a little? If so, then you didn’t start at 100%. Truly certain beliefs are those that can’t be budged by any evidence whatsoever. You can never update downwards on truly certain beliefs, by the definition of ‘truly certain’.

To go more extreme, just suppose that they demonstrate to you that they’re telling you the truth by teleporting you to a dungeon for five minutes of torture, and then bringing you back to your starting spot. If you would even slightly update your beliefs about their credibility in this scenario, then you had a non-zero credence in their credibility from the start.

And after all, this makes sense. You should only have complete confidence in the falsity of logical contradictions, and it’s not literally logically impossible that we are in a simulation, or that the simulator decides to mess with our heads in this bizarre way.

Okay, so you have a nonzero credence in their ability to do what they say they can do. And any nonzero credence, no matter how tiny, will result in the rational choice being to hand over the $5. After all, if expected utility is just calculated by summing up utilities weighted by probabilities, then you have something like the following:

EU(keep $5) – EU(give $5) = ε · U(infinite torture) – U(keep $5)
where ε = P(infinite torture | keep $5) – P(infinite torture | give $5)

As long as losing $5 isn’t infinitely bad to you, you should hand over the money. This seems like a problem, either for our intuitions or for decision theory.


So here are four propositions, and you must reject at least one of them:

  1. There is a nonzero chance of the stranger’s threat being credible.
  2. Infinite torture is infinitely worse than losing $5.
  3. The rational thing to do is that which maximizes expected utility.
  4. It is irrational to give the stranger $5.

I’ve already argued for (1), and (2) seems virtually definitional. So our choice is between (3) and (4). In other words, we either abandon the principle of maximizing expected utility as a guide to instrumental rationality, or we reject our intuitive confidence in the correctness of (4).

Maybe at this point you feel more willing to accept (4). After all, intuitions are just intuitions, and humans are known to be bad at reasoning about very small probabilities and very large numbers. Maybe it actually makes sense to hand over the $5.

But consider where this line of reasoning leads.

The exact same argument should lead you to give in to any demand that the stranger makes of you, as long as it doesn’t have a literal negative infinity utility value. So if the stranger tells you to hand over your car keys, to go dance around naked in a public square, or to commit heinous crimes… all of these behaviors would be apparently rationally mandated.

Maybe, maybe, you might be willing to bite the bullet and say that yes, these behaviors are all perfectly rational, because of the tiny chance that this stranger is telling the truth. I’d still be willing to bet that you wouldn’t actually behave in this self-professedly “rational” manner if I now made this threat to you.

Also, notice that this dilemma is almost identical to Pascal’s wager. If you buy the argument here, then you should also be doing all that you can to ensure that you stay out of Hell. If you’re queasy about the infinities and think decision theory shouldn’t be messing around with such things, then we can easily modify the problem.

Instead of “your simulator will teleport you to a dungeon and torture you for all eternity”, make it “your simulator will teleport you to a dungeon and torture you for 3↑↑↑↑3 years.” The negative utility of this is large enough as to outweigh any reasonable credence you could place in the credibility of the threat. And if it isn’t, we can just make the number of years even larger.

Maybe the probability of a given payout scales inversely with the size of the payout? But this seems fairly arbitrary. Is it really the case that the ability to torture you for 3↑↑↑↑3 years is twice as likely as the ability to torture you for 2 ∙ 3↑↑↑↑3 years? I can’t imagine why. It seems like the probability of these are going to be roughly equal – essentially, once you buy into the prospect of a simulator that is able to torture you for 3↑↑↑↑3 years, you’ve already basically bought into the prospect that they are able to torture you for twice that amount of time.

All we’re left with is to throw our hands up and say “I can’t explain why this argument is wrong, and I don’t know how decision theory has gone wrong here, but I just know that it’s wrong. There is no way that the actually rational thing to do is to allow myself to get mugged by anybody that has heard of Pascal’s wager.”

In other words, it seems like the correct response to Pascal’s mugging is to reject (3) and deny the expected-utility-maximizing approach to decision theory. The natural next question is: If expected utility maximization has failed us, then what should replace it? And how would it deal with Pascal’s mugging scenarios? I would love to see suggestions in the comments, but I suspect that this is a question that we are simply not advanced enough to satisfactorily answer yet.

Timeless decision theory and homogeneity

Something that seems difficult to me about timeless decision theory is how to reason in a world where most people are not TDTists. In such a world, it seems like the subjunctive dependence between you and others gets weaker and weaker the more TDT influences your decision process.

Suppose you are deciding whether or not to vote. You think through all of the standard arguments you know of: your single vote is virtually guaranteed to not swing the election, so the causal effect of your vote is essentially nothing; the cost to you of voting is tiny, or maybe even positive if you go with a friend and show off your “I Voted” sticker all day;  if you vote, you might be able to persuade others to vote as well; etc. At the end of your pondering, you decide that it’s overall not worth it to vote.

Now a TDTist pops up behind your shoulder and says to you: “Look, think about all the other people out there reasoning similarly to you. If you end up not voting as a result of this reasoning, then it’s pretty likely that they’ll all not vote as well. On the other hand, if you do end up voting, then they probably will vote too! So instead of treating your decision as if it only makes the world 1 vote different, you should treat it is if it influences all the votes of those sufficiently similar to you.”

 Maybe you instantly find this convincing, and decide to go to the voting booth right away. But the problem is that in taking into account this extra argument, you have radically reduced the set of people whose overall reasoning process is similar to you!

This set was initially everybody that had thought through all the similar arguments and felt similarly to you about them, and most of these people ended up not voting. But as soon as the TDTist popped up and presented their argument, the set of people that were subjunctively dependent upon you shrunk to just those in the initial set that had also heard this argument.

In a world in which only a single person ever had thought about subjunctive dependence, and this person was not going to vote before thinking about it, the evidential effect of not voting is basically zero. Given this, the argument would have no sway on them.

This seems like it would weaken the TDTist’s case that TDTists do better in real world problems. At the same time, it seems actually right. In a case where very few people follow the same reasoning processes as you, your decisions tell you very little about the decisions of others, for the same reason that a highly neuro-atypical person should be hesitant to generalize information about their brain to other people.

Another conclusion of this is that timeless decision theory is most powerful in a community where there is homogeneity of thought and information. Propagation of the idea of timeless decision theory would amplify the coordination-inducing power of the procedure.

I’m not sure if this implies that a TDTist is motivated to spread the idea and homogenize their society, as doing so increases subjunctive dependence and thus enhances their influence. I’d guess that they would only reason this way if they thought themselves to be above average in decision-making, or to have information that others don’t, so that the expected utility of them having increased decision-making ability would outweigh the costs of homogeneity.

Timeless ethics and Kant

The more I think about timeless decision theory, the more it seems obviously correct to me.

The key idea is that sometimes there is a certain type of non-causal logical dependency (called a subjunctive dependence) between agents that must be taken into account by those agents in order to make rational decisions. The class of cases in which subjunctive dependences become relevant involve agents in environments that contain other agents trying to predict their actions, and also environments that contain other agents that are similar to them.

Here’s my favorite motivating thought experiment for TDT: Imagine that you encounter a perfect clone of yourself. You have lived identical lives, and are structurally identical in every way. Now you are placed in a prisoner’s dilemma together. Should you cooperate or defect?

A non-TDTist might see no good reason to cooperate – after all, defecting dominates cooperation as a strategy, and your decision doesn’t affect your clone’s decision. If the two of you share no common cause explanation for your similarity, then this conclusion is even stronger – both evidential and causal decision theory would defect. So both you and your clone defect and you both walk away unhappy.

TDT is just the admission that there is an additional non-causal dependence between your decision and your clone’s decision that must be taken into account. This dependence comes from the fact that you and your clone have a shared input-output structure. That is, no matter what you end up doing, you know that your clone must do the same thing, because your clone is operating identically to you.

In a deterministic world, it is logically impossible that you choose to do X and your clone does Y. The initial conditions are the same, so the final conditions must be the same. So you end up cooperating, as does your clone, and everybody walks away happy.

With an imperfect clone, it is no longer logically impossible, but there still exists a subjunctive dependence between your actions and your clone’s.

This is a natural and necessary modification to decision theory. We take into account not only the causal effects of our actions, but the evidential effects of our actions. Even if your action does not causally affect a given outcome, it might still make it more or less likely, and subjunctive dependence is one of the ways that this can happen.

TDTists interacting with each other would get along really nicely. They wouldn’t fall victim to coordination problems, because they wouldn’t see their decisions as isolated and disconnected from the decisions of the others. They wouldn’t undercut each other in bargaining problems in which one side gets to make the deals and the other can only accept or reject.

In general, they would behave in a lot of ways that are standardly depicted as irrational (like one-boxing in Newcomb’s problem and cooperating in the prisoner’s dilemma), and end up much better off as a result. Such a society seems potentially much nicer and subject to fewer of the common failure modes of standard decision theory.

In particular, in a society in which it is common knowledge that everybody is a perfect TDTist, there can be strong subjunctive dependencies between the actions causally disconnected agents. If a TDTist is considering whether or not to vote for their preferred candidate, they aren’t comparing outcomes that differ by a single vote. They are considering outcomes that differ by the size of the entire class of individuals that would be reasoning similar to them.

In simple enough cases, this could mean that your decision about whether to vote is really a decision about if millions of people will vote or not. This may sound weird, but it follows from the exact same type of reasoning as in the clone prisoner’s dilemma.

Imagine that the society consisted entirely of 10 million exact clones of you, each deciding whether or not to vote. In such a world, each individual’s choice is perfectly subjunctively dependent upon every other individual’s choice. If one of them decides not to vote, then all of them decide not to vote.

In a more general case, perfect clones of you don’t exist in your environment. But in any given context, there is still a large class of individuals that reason similarly to you as a result of a similar input-output process.

For example, all humans are very similar in certain ways. If I notice that my blood is red, and I had previously never heard about or seen the blood color of anybody else, then I should now strongly update on the redness of the blood of other humans. This is obviously not because my blood being red causes others to have red blood. It is also not because of a common cause – in principle, any such cause could be screened off, and we would expect the same dependence to exist solely in virtue of the similarity of structure. We would expect red blood in alien whose evolutionary history has been entirely causally separated from ours but who by a wild coincidence has the same DNA structure as humans.

Our similarities in structure can be less salient to us when we think about our minds and the way we make decisions, but they still are there. If you notice that you have a strong inclination to decide to take action X, then this actually does serve as evidence that a large class of other people will take action X. The size of this class and the strength of this evidence depends on which particular X is being analyzed.

Ethics and TDT

It is natural to wonder: what sort of ethical systems naturally arise out of TDT?

We can turn a decision theory into an ethical framework by choosing a utility function that encodes the values associated with that ethical framework. The utility function for hedonic utilitarianism assigns utility according to the total well-being in the universe. The utility function for egoism assigns utility only to your own happiness, apathetic to the well-being of others.

Virtue ethics and deontological ethics are harder to encode. We could do the first by assigning utility to virtuous character traits and disutility to vices. The second could potentially be achieved by assigning negative infinities to violations of the moral rules.

Let’s brush aside the fact that some of these assignments are less feasible than others. Pretend that your favorite ethical system has a nice neat way of being formalized as a utility function. Now, the distinctive feature of TDT-based ethics is that when we are trying to decide on the most ethical course of action, TDT says that we must imagine that our decision would also be taken by anybody else that is sufficiently similar to you in a sufficiently similar context.

In other words, in contemplating what the right action to take is, you imagine a world in which these actions are universalized! This sounds very Kantian. One of his more famous descriptions of the categorical imperative was:

Act only according to that maxim whereby you can at the same time will that it should become a universal law.

This could be a tagline for ethics in a world of TDTists! The maxim for your action resembles the notion of similarity in motivation and situational context that generates subjunctive dependence, and the categorical imperative is the demand that you must take into account this subjunctive dependence if you are to reason consistently.

But actually, I think that the resemblance between Kantian ethical reasoning and timeless decision theory begins to fade away when you look closer. I’ll list three main points of difference:

  1. Consistency vs expected utility
  2. Maxim vs subjunctive dependence
  3. Differences in application

1. Consistency vs expected utility

Kantian universalizability is not about expected utility, it is about consistency. The categorical imperative forbids acts that, when universalized, become self-undermining. If an act is consistently universalizable, then it is not a violation of the categorical imperative, even if it ends up with everybody in horrible misery.

Timeless decision theory looks at a world in which everybody acts according to the same maxim that you are acting under, and then asks whether this world looks nice or not. “Looks nice” refers to your utility function, not any notion of consistency or non-self-underminingness (not a word, I know).

So this is the first major difference: A timeless ethical theorist cares ultimately about optimizing their moral values, not about making sure that their values are consistently applicable in the limit of universal instantiation. This puts TDT-based ethics closer to a rule-based consequentialism than to Kantian ethics, although this comparison is also flawed.

Is this a bug or a feature of TDT?

I’m tempted to say it’s a feature. My favorite example of why Kantian consistency is not a desirable meta-ethical principle is that if everybody were to give to charity, then all the problems that could be solved by giving to charity would be solved, and the opportunity to give to charity would disappear. So the act of giving to charity becomes self-undermining upon universalization.

To which I think the right response is: “So what?”

If a world in which everybody gives to charity is a world in which there are no more problems to be solved by charity-giving, then that sounds pretty great to me. If this consistency requirement prevents you from solving the problems you set out to solve, then it seems like a pretty useless requirement for ethical reasoning.

If your values can be encoded into an expected utility function, then the goal of your ethics should be to maximize that function. The antecedent of this conditional could be reasonably disputed, but I think the conditional as a whole is fairly unobjectionable.

2. Maxim versus subjunctive dependence

One of the most common retorts to Kant’s formulation of the categorical imperative rests on the ambiguity of the term ‘maxim’.

For Kant, your maxim is supposed to be the motivating principle behind your action. It can be thought of as a general rule that determines the contexts in which you would take this action.

If your action is to donate money, then your maxim might be to give 10% of your income to charity every year. If your action is to lie to your boss about why you missed work, then your maxim might be to be dishonest whenever doing otherwise will damage your career prospects.

Now, the maxim is the thing that is universalized, not the action itself. So you don’t suppose that everybody suddenly starts lying to their boss. Instead, you imagine that anybody in a situation where being honest would hurt their career prospects begins lying.

In this situation, Kant would argue that if nobody was honest in these situations, then their bosses would just assume dishonesty, in which case, the employees would never even get the chance to lie in the first place. This is self-undermining; hence, forbidden!

I actually like this line of reasoning a lot. Scott Alexander describes it as similar to the following rule:

Don’t do things that undermine the possibility to offer positive-sum bargains.

Coordination problems arise because individuals decide to defect from optimal equilibriums. If these defectors were reasoning from the Kantian principle of universalizability, they would realize that if everybody behaved similarly then the ability to defect might be undermined.

But the problem largely lies in how one specifies the maxim. For example, compare the following two maxims:

Maxim 1: Lie to your boss whenever being honest would hurt your career opportunities.

Maxim 2: Lie to your boss about why you missed work whenever the real reason is that you went on all-night bar-hopping marathon with your friends Jackie and Khloe and then stayed up all night watched Breaking Bad highlight clips on your Apple TV.

If Maxim 2 is the true motivating principle of your action, then it seems a lot less obvious that the action is a violation of the categorical imperative. If only people in this precisely specified context lied to their bosses, then bosses would overall probably not become less trusting of their employees (unless your boss knows an unusual amount about your personal life). So the maxim is not self-undermining under universalization, and is therefore not forbidden.

Under Maxim 1, lying is forbidden, and under Maxim 2, it is not. But what is the true maxim? There’s no clear answer to this question. Any given action can be truthfully described as arising from numerous different motivational schema, and in general these choices will result in a variety of different moral guidelines.

In TDT, the analog to the concept of a maxim is subjunctive dependence, and this can be defined fairly precisely, without ambiguity. Subjunctive dependence between agents in a given context is just the degree of evidence you get about the actions of an agent given information about the actions of the other agents in that context.

More precisely, it is the degree of non-causal dependence between the actions of agents. It essentially arises from the fact that in a lawful physical universe, similar initial conditions will result in similar final conditions. This can be worded as similarity in initial conditions, in input-output structure, in computational structure, or in logical structure, but the basic idea is the same.

Not only is this precisely defined, it is a real dependence. You don’t have to imagine a fictional universe in which your action makes it more likely that others will act similarly; the claim of TDT is that this is actually the case!

In this sense, TDT is rooted in a simple acknowledgement of dependencies that really do exist and that can be precisely defined, while Kant’s categorical imperative relies on the ambiguous notion of a maxim, as well as a seemingly arbitrary hypothetical consideration. One might be tempted to ask: “Who cares what would happen if hypothetically everybody else acted according to a similar maxim? We should care about is what will actually happen in the real world; we shouldn’t be basing our decisions off of absurd hypothetical worlds!”

3. Difference in application

These two theoretical reasons are fairly convincing to me that Kantianism and TDT ethics are only superficially similar, and are theoretically quite different. But there still remains a question of how much the actual “outputs” of the two frameworks converge. Do they just end up giving similar ethical advice?

I don’t think so. First, consider the issue I touched on previously. I said that defectors that paid attention to the categorical imperative would rethink their decision, because it is not universalizable. But this is not in general true.

If defectors are always free to defect, regardless of how many others defect as well, then defecting will still be universalizable! It is only in special cases that Kantians will not defect, like when a mob boss will come in and necessitate cooperation if enough people defect, or where universal defection depletes an expendable resource that would otherwise be renewable.

The set of coordination problems in which defecting automatically becomes impossible at a certain point are the easiest cases of coordination problems. It’s much harder to get individuals to coordinate if there is no mob boss to step in and set everybody right. These are the cases where Kantianism fails, and TDT succeeds.

TDTists with shared goals for whom cooperation would be more effective for achieving these goals would always cooperate, even if “each would individually be better off” if they defected. (I put scare quotes because you only come to this conclusion by ignoring important dependencies in the problem).

The key difference here comes down again to #1: timeless decision theorists maximize expected utility, not Kantian consistency.

In addition, TDTists don’t necessarily have Kantian hangups about using people as means to an end: if doing so ends up producing a higher expected utility than not, then they’ll go for it without hesitation.

A TDTist that can save two people’s lives by causing a little harm to one person would probably do it if their utility function was relatively impartial and placed a positive value on life. A Kantian would forbid this act.

(Why? Well, Kant thought that this principle of treating people as ends in themselves rather than means to an end was equivalent to the universalizability principle, and as far as I know, pretty much nobody was convinced by his argument for why this was the case. As such, a lot of Kantian ethics looks like it doesn’t actually follow from the universalizability principle.)

An application that might be similar for Kantian ethics and TDT ethics is the treatment of dishonesty and deception. Kant famously forbid any lying of any kind, regardless of the consequences, on the basis that universal lying would undermine the trust that is necessary to make lying a possibility.

One can imagine a similar case made for honesty in TDT ethics. In a society of TDTs, a decision to lie is a decision to produce a society that is overall less honest and less trusting. In situations where the individual benefits of dishonesty are zero-sum, only the negative effects of dishonesty are amplified. This could plausibly make dishonesty on the whole a net negative policy.

Hyperreal decision theory

I’ve recently been writing a lot about how infinities screw up decision theory and ethics. In his paper Infinite Ethics, Nick Bostrom talks about attempts to apply nonstandard analysis to try to address the problems of infinities. I’ll just briefly describe nonstandard analysis in this post, as well as the types of solutions that he sketches out.


So first of all, what is nonstandard analysis? It is a mathematical formalism that extends the ordinary number system to include infinitely large numbers and infinitely small numbers. It doesn’t do so just by adding the symbol ∞ to the set of real numbers ℝ. Instead, it adds an infinite amount of different infinitely large numbers, as well as an infinity of infinitesimal numbers, and proceeds to extend the ordinary operations of addition and multiplication to these numbers. The new number system is called the hyperreals.

So what actually are hyperreals? How does one do calculations with them?

A hyperreal number is an infinite sequence of real numbers. Here are some examples of them:

(3, 3, 3, 3, 3, …)
(1, 2, 3, 4, 5, …)
(1, ½, ⅓, ¼, ⅕, …)

It turns out that the first of these examples is just the hyperreal version of the ordinary number 3, the second is an infinitely larger hyperreal, and the third is an infinitesimal hyperreal. Weirded out yet? Don’t worry, we’ll explain how to make sense of this in a moment.

So every ordinary real number is associated with a hyperreal in the following way:

N = (N, N, N, N, N, …)

What if we just switch the first number in this sequence? For instance:

N’ = (1, N, N, N, N, …)

It turns out that this change doesn’t change the value of the hyperreal. In other words:

N = N’
(N, N, N, N, N, …) = (1, N, N, N, N, …)

In general, if you take any hyperreal number and change a finite amount of the numbers in its sequence, you end up with the same number you started with. So, for example,

3 = (3, 3, 3, 3, 3, …)
= (1, 5, 99, 3, 3, 3, …)
= (3, 3, 0, 0, 0, 0, 3, 3, …)

The general rule for when two hyper-reals are equal relies on the concept of a free ultrafilter, which is a little above the level that I want this post to be at. Intuitively, however, the idea is that for two hyperreals to be equal, the number of ways in which their sequences differ must be either finite or certain special kinds of infinities (I’ll leave this “special kinds” vague for exposition purposes).

Adding and multiplying hyperreals is super simple:

(a1, a2, a3, …) + (b1, b2, b3, …) = (a1 + b1, a2 + b2, a3 + b3, …)
(a1, a2, a3, …) · (b1, b2, b3, …) = (a1 · b1, a2 · b2, a3 · b3, …)

Here’s something that should be puzzling:

A = (0, 1, 0, 1, 0, 1, …)
B = (1, 0, 1, 0, 1, 0, …)
A · B = ?

Apparently, the answer is that A · B = 0. This means that at least one of A or B must also be 0. But both of them differ from 0 in an infinity of places! Subtleties like this are why we need to introduce the idea of a free ultrafilter, to allow certain types of equivalencies between infinitely differing sequences.

Anyway, let’s go on to the last property of hyperreals I’ll discuss:

(a1, a2, a3, …) < (b1, b2, b3, …)
an ≥ bn for only a finite amount of values of n

(This again has the same weird infinite exceptions as before, which we’ll ignore for now.)

Now at last we can see why (1, 2, 3, 4, …) is an infinite number:

Choose any real number N
N = (N, N, N, N, …)
ω = (1, 2, 3, 4, …)
So ωn ≤ Nn for n = 1, 2, 3, …, floor(N)
and ωn > Nn for all other n

This means that ω is larger than N, because there are only a finite amount of members of the sequence for which ωn is greater than ωn. And since this is true for any real number N, then ω must be larger than every real number! In other words, you can now give an answer to somebody who asks you to name a number that is bigger than every real number!

ε = (1, ½, ⅓, ¼, ⅕, …) is an infinitesimal hyperreal for a similar reason:

Choose any real number N > 0
N = (N, N, N, N, …)
ε = (1, ½, ⅓, ¼, ⅕, …)
So εn ≥ Nn for n = 1, 2, …, ceiling(1/N)
and εn < Nn for all other n

Once again, ε is only larger than N in a finite number of places, and is smaller in the other infinity. So ε is smaller than every real number greater than 0.

In addition, the sequence (0, 0, 0, …) is smaller than ω for every value of its sequence, so ε is larger than 0. A number that is smaller than every positive real and greater than 0 is an infinitesimal.

Okay, done introducing hyperreals! Let’s now see how this extended number system can help us with our decision theory problems.

Saint Petersburg Paradox

One standard example of weird infinities in decision theory is the St Petersburg Paradox, which I haven’t talked about yet on this blog. I’ll use this thought experiment as a template for the discussion. Briefly, then, imagine a game that works as follows:

Game 1

Round 1: Flip a coin.
If it lands H, then you get $2 and the game ends.
If it lands T, then you move on to Round 2.

Round 2: Flip the coin again.
If it lands H, then you get $4 and the game ends.
If T, then move on to Round 3.

Round 3: Flip a coin.
If it lands H, then you get $8 and the game ends.
If it lands T, then you move on to Round 4.

(et cetera to infinity)

This game looks pretty nice! You are guaranteed at least $2, and your payout doubles every time the coin lands H. The question is, how nice really is the game? What’s the maximum amount that you should be willing to pay in to play?

Here we run into a problem. To calculate this, we want to know what the expected value of the game is – how much you make on average. We do this by adding up the product of each outcome and the probability of that outcome:

EV = ½ · $2 + ¼ · $4 + ⅛ · $8 + …
= $1 + $1 + $1 + …
= ∞

Apparently, the expected payout of this game is infinite! This means that in order to make a profit, you should be willing to give literally all of your money in order to play just a single round of the game! This should seem wrong… If you pay $1,000,000 to play the game, then the only way that you make a profit is if the coin lands heads twenty times in a row. Does it really make sense to risk all of this money on such a tiny chance?

The response to this is that while the chance that this happens is of course tiny, the payout if it does happen is enormous – you stand to double, quadruple, octuple, (et cetera) your money. In this case, the paradox seems to really be a result of the failure of our brains to intuitively comprehend exponential growth.

There’s an even stronger reason to be unhappy with the St Petersburg Paradox. Say that instead of starting with a payout of $2 and doubling each time from there, you had started with a payout of $2000 and doubled from there.

Game 2

Round 1: Flip a coin.
If it lands H, then you get $2000 and the game ends.
If it lands T, then you move on to Round 2.

Round 2: Flip the coin again.
If it lands H, then you get $4000 and the game ends.
If T, then move on to Round 3.

Round 3: Flip a coin.
If it lands H, then you get $8000 and the game ends.
If it lands T, then you move on to Round 4.

(et cetera to infinity)

This alternative game must be better than the initial game – after all, no matter how many times the coin lands T before finally landing H, your payout is 1000 better than it would have been previously. So if you’re playing the first of the two games, then you should always wish that you were playing the second, no matter how many times the coin ends up landing T.

But the expected value comparison doesn’t grant you this! Both games have an infinite expected value, and infinity is infinity. We can’t have one infinity being larger than another infinity, right?

Enter the hyperreals! We’ll turn the expected value of the first game into a hyperreal as follows:

EV1 = ½ · $2 = $1
EV2 = ½ · $2 + ¼ · $4 = $1 + $1 = $2
EV3 = ½ · $2 + ¼ · $4 + ⅛ · $8 = $1 + $1 + $1 = $3

EV = (EV1, EV2, EV3, …)
= $(1, 2, 3, …)

Now we can compare it to the second game:

Game 1: $(1, 2, 3, …) = ω
Game 2: $(1000, 2000, 3000, …) = $1000 · ω

So hyperreals allow us to compare infinities, and justify why Game 2 has a 1000 times larger expected value than Game 1!

Let me give another nice result of this type of analysis. Imagine Game 1′, which is identical to Game 1 except for the first payout, which is $4 instead of $2. We can calculate the payouts:

Game 1: $(1, 2, 3, …) = ω
Game 1′: $(2, 3, 4, …) = $1 + ω

The result is that Game 1′ gives us an expected increase of just $1. And this makes perfect sense! After all, the only difference between the games is if they end in the first round, which happens with probability ½. And in this case, you get $4 instead of $2. The expected difference between the games should therefore be ½ · $2 = $1! Yay hyperreals!

Of course, this analysis still ends up concluding that the St Petersburg game does have an infinite expected payout. Personally, I’m (sorta) okay with biting this bullet and accepting that if your goal is to maximize money, then you should in principle give any arbitrary amount to play the game.

But what I really want to talk about are variants of the St Petersburg paradox where things get even crazier.

Getting freaky

For instance, suppose that instead of the initial game setup, we have the following setup:

Game 3

Round 1: Flip a coin.
If it lands H, then you get $2 and the game ends.
If it lands T, then you move on to Round 2.

Round 2: Flip the coin again.
If it lands H, then you pay $4 and the game ends.
If T, then move on to Round 3.

Round 3: Flip the coin again.
If it lands H, then you get $8 and the game ends.
If it lands T, then you move on to Round 4.

Round 4: Flip the coin again.
If it lands H, then you pay $16 and the game ends.
If it lands T, then you move on to Round 5.

(et cetera to infinity)

The only difference now is that if the coin lands H on any even round, then instead of getting money that round, you have to pay that money back to the dealer! Clearly this is a less fun game than the last one. How much less fun?

Here things get really weird. If we only looked at the odd rounds, then the expected value is ∞.

EV = ½ · $2 + ⅛ · $8 + …
= $1 + $1 + …
= ∞

But if we look at the odd rounds, then we get an expected value of -∞!

EV = ¼ · -$4 + 1/16 · -$16 + …
= -$1 + -$1 + …
= -∞

We find the total expected value by adding together these two. But can we add ∞ to -∞? Not with ordinary numbers! Let’s convert our numbers to hyperreals instead, and see what happens.

EV = $(1, -1, 1, -1, …)

This time, our result is a bit less intuitive than before. As a result of the ultrafilter business we’ve been avoiding talking about, we can use the following two equalities:

(1, -1, 1, -1, …) = 1
(-1, 1, -1, 1, …) = -1

This means that the expected value of Game 3 is $1. In addition, if Game 3 had started with you having to pay $2 for the first round rather than getting $2, then the expected value would be -$1.

So hyperreal decision theory recommends that you play the game, but only buy in if it costs you less than $1.

Now, the last thought experiment I’ll present is the weirdest of them.

Game 4

Round 1: Flip a coin.
If it lands H, then you pay $2 and the game ends.
If it lands T, then you move on to Round 2.

Round 2: Flip the coin again.
If it lands H, then you get $2 and the game ends.
If T, then move on to Round 3.

Round 3: Flip the coin again.
If it lands H, then you pay $2.67 and the game ends.
If it lands T, then you move on to Round 4.

Round 4: Flip the coin again.
If it lands H, then you get $4 and the game ends.
If it lands T, then you move on to Round 4.

Round 4: Flip the coin again.
If it lands H, then you pay $6.40 and the game ends.
If it lands T, then you move on to Round 4.

(et cetera to infinity)

The pattern is that the payoff on the nth round is (-2)n / n. From this, we see that the expected value of the nth round is 1/n. This sum converges as follows:

n=1 (-1)/ n = -ln(2) ≈ -.69

But by Cauchy’s rearrangement theorem, it turns out that by rearranging the terms of this sum, we can make it add up to any amount that we want! (this follows from the fact that the sum of the absolute values of the term is infinite)

This means that not only is the expected value for this game undefined, but it can be justified having every possible value. Not only do we not know the expected value of the game, but we don’t know whether it’s a positive game or a negative game. We can’t even figure out if it’s a finite game or an infinite game!

Let’s apply hyperreal numbers.

EV1 = -$1
EV2 = $(-1 + ½) = -$0.50
EV3 = $(-1 + ½ – ⅓) = -$0.83
EV4 = $(-1 + ½ – ⅓ + ¼) = -$0.58

So EV = $(-1.00, -0.50, -0.83, -0.58, …)

Since this series converges from above and below to -ln(2) ≈ -$0.69, the expected value is -$0.69 + ε, where ε is a particular infinitesimal number. So we get a precisely defined expectation value! One could imagine just empirically testing this value by running large numbers of simulations.

A weirdness about all of this is that the order in which you count up your expected value is extremely important. This is a general property of infinite summation, and seems like a requirement for consistent reasoning about infinities.

We’ve seen that hyperreal numbers can be helpful in providing a way to compare different infinities. But hyperreal numbers are only the first step into the weird realm of the infinite. The surreal number system is a generalization of the hyperreals that is much more powerful. In a future post, I’ll talk about the highly surreal decision theory that results from application of these numbers.

Infinite ethics

There are a whole bunch of ways in which infinities make decision theory go screwy. I’ve written about some of those ways here. This post is about a thought experiment in which infinities make ethics go screwy.

WaitButWhy has a great description of the thought experiment, and I recommend you check out the post. I’ll briefly describe it here anyway:

Imagine two worlds, World A and World B. Each is an infinite two-dimensional checkerboard, and on each square sits a conscious being that can either be very happy or very sad. At the birth of time, World A is entirely populated by happy beings, and World B entirely by sad beings.

From that moment forwards, World A gradually becomes infected with sadness in a growing bubble, while World B gradually becomes infected with happiness in a growing bubble. Both universes exist forever, so the bubble continues to grow forever.

Picture from WaitButWhy

The decision theory question is: if you could choose to be placed in one of these two worlds in a random square, which should you choose?

The ethical question is: which of the universes is morally preferable? Said another way: if you had to bring one of the two worlds into existence, which would you choose?

On spatial dominance

At every moment of time, World A contains an infinity of happiness and a finite amount of sadness. On the other hand, World B always contains an infinity of sadness and a finite amount of happiness.

This suggests the answer to both the ethical question and the decision theory question: World A is better. Ethically, it seems obvious that infinite happiness minus finite sadness is infinitely better than infinite sadness minus finite happiness. And rationally, given that there are always infinitely more people outside the bubble than inside, at any given moment in time you can be sure that you are on the outside.

A plot of the bubble radius over time in each world would look like this:

Infinite Ethics Plots

In this image, we can see that no matter what moment of time you’re looking at, World A dominates World B as a choice.

On temporal dominance

But there’s another argument.

Let’s look at a person at any given square on the checkerboard. In World A, they start out happy and stay that way for some finite amount of time. But eventually, they are caught by the expanding sadness bubble, and then stay sad forever. In World B, they start out sad for a finite amount of time, but eventually are caught by the expanding happiness bubble and are happy forever.

Plotted, this looks like:

Infinite Ethics Plots 2.png

So which do you prefer? Well, clearly it’s better to be sad for a finite amount of time and happy for an infinite amount of time than vice versa. And ethically, choosing World A amounts to dooming every individual to a lifetime of finite happiness and then infinite sadness, while World B is the reverse.

So no matter which position on the checkerboard you’re looking at, World B dominates World A as a choice!

An impossible synthesis

Let’s summarize: if you look at the spatial distribution for any given moment of time, you see that World A is infinitely preferable to World B. And if you look at the temporal distribution for any given position in space, you find that B is infinitely preferable to A.

Interestingly, I find that the spatial argument seems more compelling when considering the ethical question, while the temporal argument seems more compelling when considering the decision theory question. But both of these arguments apply equally well to both questions. For instance, if you are wondering which world you should choose to be in, then you can think forward to any arbitrary moment of time, and consider your chances of being happy vs being sad in that moment. This will get you the conclusion that you should go with World A, as for any moment in the future, you have a 100% chance of being one of the happy people as opposed to the sad people.

I wonder if the difference is that when we are thinking about decision theory, we are imagining ourselves in the world at a fixed location with time flowing past us, and it is less intuitive to think of ourselves at a fixed time and ask where we likely are.

Regardless, what do we do in the face of these competing arguments? One reasonable thing is to try to combine the two approaches. Instead of just looking at a fixed position for all time, or a fixed time over all space, we look at all space and all time, summing up total happiness moments and subtracting total sadness moments.

But now we have a problem… how do we evaluate this? What we have in both worlds is essentially a +∞ and a -∞ added together, and no clear procedure for how to make sense of this addition.

In fact, it’s worse than this. By cleverly choosing a way of adding up the total amount of the quantity happiness – sadness, we can make the result turn out however we want! For instance, we can reach the conclusion that World A results in a net +33 happiness – sadness by first counting up 33 happy moments, and then ever afterwords switching between counting a happy moment and a sad moment. This summation will eventually end up counting all the happy and sad moments, and will conclude that the total is +33.

But of course, there’s nothing special about +33; we could have chosen to reach any conclusion we wanted by just changing our procedure accordingly. This is unusual. It seems that both the total expected value and moral value are undefined for this problem.

The undefinability of the total happiness – sadness of this universe is a special case of the general rule that you can’t subtract infinity from infinity. This seems fairly harmless… maybe it keeps us from giving a satisfactory answer to this one thought experiment, but surely nothing like this could matter to real-world ethical or decision theory dilemmas?

Wrong! If in fact we live in an infinite universe, then we are faced with exactly this problem. If there are an infinite number of conscious experiencers out there, some suffering and some happy, then the total quantity of happiness – sadness in the universe is undefined! What’s more, a moral system that says that we ought to increase the total happiness of the universe will return an error if asked to evaluate what we ought to do in an in infinite universe!

If you think that you should do your part to make the universe a happier place, then you must have some notion of a total amount of happiness that can be increased. And if the total amount of happiness is unbounded, then there is no sensible way to increase it. This seems like a serious problem for most brands of consequentialism, albeit a very unusual one.

Free will and decision theory (Part 2)

In a previous post, I talked about something that has been confusing me regarding free will and decision theory. I want to revisit this topic and express a different way to frame the issue.

Here goes:

A decision theory is an algorithm used for calculating the expected utilities of the different possible actions you could take: EU(A). It returns a recommendation for you to take the action that maximizes expected utility: A* = argmax EU(A).

I have underlined the word that is the source of the confusion. The question is: how can we make sense of this notion of possible actions given determinism? If we take determinism very seriously, then the set of possible actions is a set with a single member, which is the action that you end up actually taking. There’s an intuitive sense of possibility at play here that looks benign enough, but upon closer examination becomes problematic.

For instance, we obviously want our set of actions to be restricted to some degree – we don’t want our decision theory telling us to snap our fingers and magically turn the world into a utopia. One seemingly clear line we could draw is to say that possibility here just means physical possibility. Actions that require us to exceed the speed of light, violate conservation of energy, or other such physical impossibilities are not allowed to be included in the set of possible actions.

But this is no solution at all! After all, if the physical laws are the things uniquely generating our actual actions, then all other actions must be violations! Determinism dictates that there can’t be multiple different answers to the question of “what happens next?”. We have an intuitive notion of physical possibility that includes things like “wave my hand through the air” and “take a nap”. But upon close examination, these seem to really just be the product of our ignorance of the true workings of the laws of nature. If we could deeply internalize the way that physics generates behaviors like hand-waving and napping, then we would be able to see why in a particular case hand-waving is possible (and thus happens), and why in other cases it cannot happen.

In other words, the claim I am making is that there is no clear distinction on the level of physics between the claim that I can jump to the moon and the claim that I could have waved my hand around in front of me even though I didn’t. The only difference between these two, it seems to me, is in terms of the intuitive obviousness of the impossibility of the first, and the lack of intuitive obviousness for the second.

Let’s say that eventually physicists reduce all of fundamental physics to a single principle, for example the Principle of Minimum Action. Then for any given action, either it is true that this action minimizes Action, or it is false. (sorry for the two meanings of the word ‘action’, it couldn’t be helped) If it is true, then the action is physically possible, and will in fact happen. And if it is false, then the action is physically impossible, and will not happen. We can explicitly lay out an explanation of why me jumping to the moon does not minimize Action, but it is much much much harder to lay out explicitly why me waving my hand in front of my face right now does not minimize Action. The key point is that the only difference here is an epistemic one – some actions are easier for us to diagnose as non-action-minimizing than others, but in reality, they either are or are not.

If this is all true, then physical possibility is hopeless as a source to ground a choice of the set of possible actions, and any formal decision theory will ultimately rest on an unreal distinction between possible and impossible actions. This distinction will not be represented in any real features of the physical world, and will be vulnerable to future discoveries or increases in computational power that expand our knowledge of the causal determinants of our actions.

Are there other notions of possibility that might be more fruitful for grounding the choice of the set of possible actions? I think not. Here’s a general argument for why not.

Ought implies can

(1) If you should do something, then you can do it.
(2) There is only a single thing that you can do.
(3) Therefore, there is at most a single thing that you should do.

This is an argument that I initially saw in the context of morality. I regarded it as a mere intellectual curiosity, fun to ponder but fairly unimportant (given that I didn’t expect much out of ethics in the first place).

But I think that the exact same argument applies for any theory of normative instrumental rationality. This is much more troubling to me! Unlike morality, I actually feel fairly strongly that there are objective facts about instrumental rationality – that is, facts about how an agent should act in order to optimize their values. (This is no longer an ethical should, but an epistemic one)

But I also feel strongly tempted to endorse both premises (1) and (2) with regard to this epistemic should, and want to reject the conclusion. Let’s lay out our options.

Reject (1): But then this means that there are some actions that it is true that you should do, even though you can’t do them. Do we really want a theory of instrumental rationality that tells us that the most rational course of action is one that we definitely cannot take? This seems obviously undesirable, for the same reason that the decision theory that says that the optimal action is to snap your fingers and turn the world into a utopia is undesirable. If this premise is not true of our decision theory, then we might sometimes have to accept that the action we should take is physically impossible, and what’s the use of a decision theory like that?

Reject (2): But this entails an abandonment of our best understanding of physical reality. Even in standard formulations of quantum mechanics, the wave function that describes the state of the universe evolves completely deterministically. (You might now wonder why quantum mechanics is always thought of as a fundamentally probabilistic theory, but this is definitely too big of a topic to go into here.) So it seems likely that this premise is just empirically correct.

Accept (3): But then our theory of rationality is useless, as it tells us nothing besides “Just do what you are going to do”!

This is the puzzle. Do you see any way out?

Paradoxes of infinite decision theory

(Two decision theory puzzles from this paper.)


Donald Trump has just arrived in Purgatory. God visits him and offers him the following deal. If he spends tomorrow in Hell, Donald will be allowed to spend the next two days in Heaven, before returning to Purgatory forever. Otherwise he will spend forever in Purgatory. Since Heaven is as pleasant as Hell is unpleasant, Donald accepts the deal. The next evening, as he runs out of Hell, God offers Donald another deal: if Donald spends another day in Hell, he’ll earn an additional two days in Heaven, for a total of four days in Heaven (the two days he already owed him, plus two new ones) before his return to Purgatory. Donald accepts for the same reason as before. In fact, every time he drags himself out of Hell, God offers him the same deal, and he accepts. Donald spends all of eternity writhing in agony in Hell. Where did he go wrong?

Satan’s Apple

Satan has cut a delicious apple into infinitely many pieces, labeled by the natural numbers. Eve may take whichever pieces she chooses. If she takes merely finitely many of the pieces, then she suffers no penalty. But if she takes infinitely many of the pieces, then she is expelled from the Garden for her greed. Either way, she gets to eat whatever pieces she has taken.

Eve’s first priority is to stay in the Garden. Her second priority is to eat as much apple as possible. She is deciding what to do when Satan speaks. “Eve, you should make your decision one piece at a time. Consider just piece #1. No matter what other pieces you end up taking and rejecting, you do better to take piece #1 than to reject it. For if you take only finitely many other pieces, then taking piece #1 will get you more apple without incurring the greed penalty. On the other hand, if you take infinitely many other pieces, then you will be ejected from the Garden whether or not you take piece #1. So in that case you might as well take it, so that you can console yourself with some additional apple as you are escorted out.”

Eve finds this reasonable, and decides provisionally to take piece #1. Satan continues, “By the way, the same reasoning holds for piece #2.” Eve agrees. “And piece #3, and piece #4 and…” Eve takes every piece, and is ejected from the Garden. Where did she go wrong?

The second thought experiment is sufficiently similar to the first one that I won’t say much about it – just included it in here because I like it.


Let’s assume that Trump is able to keep track of how many days he has been in hell, and can credibly pre-commit to strategies involving only accepting a fixed number of offers before rejecting. Now we can write out all possible strategies for sequences of responses that Trump could make:

Strategy 0 Accept none of the offers, and stay in Purgatory forever.

Strategy N Accept some finite number N of offers, after which you spend 2N days in Heaven and then infinity in Purgatory.

Strategy ∞ Accept all of the offers, and stay in Hell forever.

Assuming that a day in hell is exactly as bad as a day in heaven, Strategy 0 nets you 0 days in Heaven, Strategy N nets you N days in Heaven, and Strategy ∞ nets you ∞ days in Hell.

Obviously Strategy ∞ is the worst option (it is infinitely worse than all other strategies). And for every N, Strategy N is better than Strategy 0. So we have < 0 < N.

So we should choose Strategy N for some N. But which N? Obviously, for any choice of N, there will be arbitrarily better choices that you could have done. The problem is that there is no optimal choice of N. Any reasonable decision theory, when asked to optimize N for utility, is going to just return an error. It’s like asking somebody to tell you the largest integer. This is perhaps something that is difficult to come to terms with, but it is not paradoxical – there is no law of decision theory that every problem has a best solution.

But we still want to answer what we would do if we were in Trump’s shoes. If we actually have to pick an N, what should we do? I think the right answer is that there is no right answer for what we should do. We can say “x is better than y” for different strategies, but cannot say definitively the best answer… because there is no best answer.

One technique that I thought of, however, is the following (inspired by the Saint Petersburg Paradox):

On the first day, Trump should flip a coin. If it lands heads, then he chooses Strategy 1. If it lands tails, then he flips the coin again.

If on the next flip the coin lands heads, then he chooses Strategy 2. And if it lands tails, again he flips the coin.

If on this third flip the coin lands heads, then he chooses Strategy 4. And if not, then he flips again.

Et cetera to infinity.

With this decision strategy, we can calculate the expected number N that Trump will choose. This number is:

E[N] = ½・1 + ¼・2 + ⅛・4 + … = ∞

But at the same time, the coin will certainly eventually land heads, and the process will terminate. The probability that the coin lands tails an infinite number of times is zero! So by leveraging infinities in his favor, Trump gets an infinite positive expected value for days spent in heaven, and is guaranteed to not spend all eternity in Hell.

A weird question now arises: Why should Trump have started at Strategy 1? Or why multiply by 2 each time? Consider the following alternative decision process for the value of N:

On the first day, Trump should flip a coin. If it lands heads, then he chooses Strategy 1,000,000. If it lands tails, then he flips the coin again.

If on the next flip the coin lands heads, then he chooses Strategy 10,000,000. And if it lands tails, again he flips the coin.

If on this third flip the coin lands heads, then he chooses Strategy 100,000,000. And if not, then he flips again.

Et cetera to infinity.

This decision process seems obviously better than the previous one – the minimum number of days in heaven Trump nets is 1 million, which would only have previously happened if the coin had landed tails 20 times in a row. And the growth in number of net days in heaven per tail flip is 5x better than it was originally.

But now we have an analogous problem to the one we started with in choosing N. Any choice of starting strategy or growth rate seems suboptimal – there are always an infinity of arbitrarily better strategies.

At least here we have a way out: All such strategies are equivalent in that they net an infinite number of days. And none of these infinities are any larger than any others. So even if it intuitively seems like one decision process is better than another, on average both strategies will do equally well.

This is weird, and I’m not totally satisfied with it. But as far as I can tell, there isn’t a better alternative response.

Schelling fence

How could a strategy like Strategy N actually be instantiated? One potential way would be for Trump to set up a Schelling fence at a particular value of N. For example, Trump could pre-commit from the first day to only allowing himself to say yes 500 times, and after that saying no.

But there’s a problem with this – if Trump has any doubts about his ability to stick with his plan, and puts any credence in his breezing past Strategy N and staying in hell forever, then this will result in an infinite negative expected value of using a Schelling fence. In other words, use of a Schelling fence seems only advisable if you are 100% sure of your ability to credibly pre-commit.

Here’s an alternative strategy for instantiating Strategy N that smooths out this wrinkle: Each time Trump is given another offer by God, he accepts it with probability N/(N+1), and rejects it with probability 1/(N+1). By doing this, he will on average do Strategy N, but will sometimes do a different strategy M for an M that is close to N.

A harder variant would be if Trump’s memory is wiped clean after every day he spends in Hell, so that each day when he receives God’s offer, it is as if it is the first time. Even if Trump knows that his memory will be wiped clean on subsequent days, he now has a problem: he has no way to remember his Schelling fence, or to even know if he has reached it yet. And if he tries the probabilistic acceptance approach, he has no way to remember the value of N that he decided on.

But there’s still a way for him to get the infinite positive expected utility! He can do so by running a Saint Petersburg Paradox like above not just the first day, but every day! Every day he chooses a value of N using a process with an infinite expected value but a guaranteed finite actual value, and then probabilistically accepts/rejects the offer using this N.

Quick proof that this still ensures finitude: Suppose that he stays in Hell forever, never rejecting the offer. Since there is a finite chance that he selects N = 1, this means that he will select N = 1 an infinite number of times. For each of these times, he has a ½ chance of rejecting and ½ chance of accepting. Since this happens an infinite number of times, he is guaranteed to eventually reject an offer.

Question: what’s the expected number of days in Heaven for this new process? Infinite, just as before! But guaranteed finite. (There should be a name for these types of guaranteed-finite-but-infinite-expected-value quantities.)

Anyway, the conclusion of all of this? Infinite decision theory is really weird.