A failure of Bayesian reasoning

Bayesianism does great when the true model of the world is included in the set of models that we are using. It can run into issues, however, when the true model starts with zero prior probability.

We’d hope that even in these cases, the Bayesian agent ends up doing as well as possible, given their limitations. But this paper presents lots of examples of how a Bayesian thinker can go horribly wrong as a result of accidentally excluding the true model from the set of models they are considering. I’ll present one such example here.

Here’s the setup: A fair coin is being repeatedly flipped, while being watched by a Bayesian agent that wants to predict the bias in the coin. This agent starts off with the correct credence distribution over outcomes: they have a 50% credence in it landing heads and a 50% credence in it landing tails.

However, this agent only has two theories available to them:

T1: The coin lands heads 80% of the time.
T2: The coin lands heads 20% of the time.

Even though the Bayesian doesn’t have access to the true model of reality, they are still able to correctly forecast a 50% chance of the coin landing heads by evenly splitting their credences in these two theories. Given this, we’d hope that they wouldn’t be too handicapped and in the long run would be able to do pretty well at predicting the next flip.

Here’s the punchline, before diving into the math: The Bayesian doesn’t do this. In fact, their behavior becomes more and more unreasonable the more evidence they get.

They end up spending almost all of their time being virtually certain that the coin is biased, and occasionally flip-flopping in their belief about the direction of the bias. As a result of this, their forecast will almost always be very wrong. Not only will it fail to converge to a realistic forecast, but in fact, it will get further and further away from the true value on average. And remember, this is the case even though convergence is possible!

Alright, so let’s see why this is true.

First of all, our agent starts out thinking that T1 and T2 are equally likely. This gives them an initially correct forecast:

P(T1) = 50%
P(T2) = 50%

P(H) = P(H | T1) · P(T1) + P(H | T2) · P(T2)
= 80% · 50% + 20% · 50% = 50%

So even though the Bayesian doesn’t have the correct model in their model set, they are able to distribute their credences in a way that will produce the correct forecast. If they’re smart enough, then they should just stay near this distribution of credences in the long run, and in the limit of infinite evidence converge to it. So do they?

Nope! If they observe n heads and m tails, then the likelihood ratio between the two theories moves exponentially with n – m. This means that the credences will almost certainly end up very highly uneven.

In what follows, I’ll write the difference in the number of heads and the number of tails as z.

z = n – m

P(n, m | T1) = 0.8^n · 0.2^m
P(n, m | T2) = 0.2^n · 0.8^m

L(n, m | T1) = 4^z
L(n, m | T2) = 1 / 4^z

P(T1 | n, m) = 4^z / (4^z + 1)
P(T2 | n, m) = 1 / (4^z + 1)

Notice that the final credences only depend on z. It doesn’t matter if you’ve done 100 trials or 1 trillion, all that matters is how many more heads than tails there are.

Also notice that the final credences are exponential in z. This means that for positive z, P(T1 | n, m) goes to 100% exponentially quickly, and vice versa.

z          0      1      2        3        4        5        6
P(T1|z)    50%    80%    94.12%   98.46%   99.61%   99.90%   99.98%
P(T2|z)    50%    20%    5.88%    1.54%    0.39%    0.10%    0.02%
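
As a quick check on these numbers, here is a minimal Python sketch that evaluates the posterior formula exactly. The function name posterior_t1 is my own choice for illustration, and the even 50/50 prior is the one assumed in the text.

```python
from fractions import Fraction

def posterior_t1(z: int) -> Fraction:
    """P(T1 | n, m) for z = n - m, starting from a 50/50 prior."""
    ratio = Fraction(4) ** z  # likelihood ratio P(n, m | T1) / P(n, m | T2)
    return ratio / (ratio + 1)

# Print both posteriors for z = 0 through 6.
for z in range(0, 7):
    p1 = posterior_t1(z)
    print(f"z={z}: P(T1)={float(p1):.2%}  P(T2)={float(1 - p1):.2%}")
```

Negative values of z work too and simply swap the roles of the two theories.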

The Bayesian agent is almost always virtually certain in the truth of one of their two theories. But which theory they think is true is constantly flip-flopping, resulting in a belief system that is vacillating helplessly between two suboptimal extremes. This is clearly really undesirable behavior for a supposed model of epistemic rationality.

In addition, as the number of coin tosses increases, it becomes less and less likely that z is exactly 0. After N tosses, the typical magnitude of z (its root-mean-square value) is √N. This means that the more evidence they receive, the further on average they will be from the ideal distribution.
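
Here is a rough simulation sketch of this behavior, under the assumptions of the setup above (fair coin, 50/50 prior, theories at 80% and 20%). The function names, the 99% threshold for "virtually certain", and the seeds are arbitrary choices of mine. It estimates how much of the time the agent is near-certain of one theory, and compares the average size of z with √(2N/π), the large-N value of the mean of |z| for a fair coin.

```python
import math
import random

def posterior_t1(z: int) -> float:
    """P(T1 | z) from a 50/50 prior, arranged to avoid float overflow."""
    if z >= 0:
        return 1.0 / (1.0 + 4.0 ** (-z))
    ratio = 4.0 ** z
    return ratio / (1.0 + ratio)

def share_of_time_near_certain(num_flips: int, seed: int,
                               threshold: float = 0.99) -> float:
    """Fraction of flips after which one theory has credence above threshold."""
    rng = random.Random(seed)
    z = 0
    near_certain = 0
    for _ in range(num_flips):
        z += 1 if rng.random() < 0.5 else -1  # heads: +1, tails: -1
        p = posterior_t1(z)
        if max(p, 1.0 - p) > threshold:
            near_certain += 1
    return near_certain / num_flips

def mean_abs_z(num_flips: int, trials: int, seed: int) -> float:
    """Average of |heads - tails| over many independent runs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(num_flips))
        total += abs(2 * heads - num_flips)
    return total / trials

print("share of time near-certain:", share_of_time_near_certain(10_000, seed=1))
print("mean |z| after 400 flips:", mean_abs_z(400, 2000, seed=2))
print("sqrt(2N/pi) for N = 400:", math.sqrt(2 * 400 / math.pi))
```

Exact figures will vary with the seed, so treat the output as illustrative rather than as a result.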

Sure, you can object that in this case, it would be dead obvious to just include a T3: “The coin lands heads 50% of the time.” But that misses the point.

The Bayesian agent had a way out – they could have noticed after a long time that their beliefs were constantly wavering from extreme confidence in T1 to extreme confidence in T2, and seemed to be doing the opposite of converging to reality. They could have noticed that an even distribution of credences would allow them to do much better at predicting the data. And if they had done so, they would end up always giving an accurate forecast of the next outcome.

But they didn’t, and they didn’t because the model that exactly fit reality was not in their model set. Their epistemic system didn’t allow them the flexibility needed to realize that they needed to learn from their failures and rethink their priors.

Reality is very messy and complicated and rarely adheres exactly to the nice simple models we construct. It doesn’t seem crazily implausible that we might end up accidentally excluding the true model from our set of possible models, and this example demonstrates a way that Bayesian reasoning can lead you astray in exactly these circumstances.

Bayesianism as natural selection of ideas

There’s a beautiful parallel between Bayesian updating of beliefs and evolutionary dynamics of a population that I want to present.

Let’s start by deriving some basic evolutionary game theory! We’ll describe a population as made up of N different genotypes:

(1, 2, 3, …, N)

Each of these genotypes is represented in some proportion of the population, which we’ll label with an X.

Distribution of genotypes in the population X =  (X1, X2, X3, …, XN)

Each of these fractions will in general change with time. For example, if some ecosystem change occurs that favors genotype 1 over the other genotypes, then we expect X1 to grow. So we’ll write:

Distribution of genotypes over time = (X1(t), X2(t), X3(t), …, XN(t))

Each genotype has a particular fitness that represents how well-adjusted it is to survive onto the next generation in a population.

Fitness of genotypes = (f1, f2, f3, …, fN)

Now, if Genotype 1 corresponds to a predator, and Genotype 2 to its prey, then the fitness of Genotype 2 very much depends on the population of Genotype 1 organisms as well as its own population. In general, the fitness function for a particular genotype is going to depend on the distribution of all the genotypes, not just that one. This means that we should write each fitness as a function of all the Xi:

Fitness of genotypes = (f1(X), f2(X), f3(X), …, fN(X))

Now, what is relevant to the change of any Xi is not the absolute value of the fitness function fi, but the comparison of fi to the average fitness of the entire population. This reflects the fact that natural selection is competitive. It’s not enough to just be fit, you need to be more fit than your neighbors to successfully pass on your genes.

We can find the average fitness of the population by the standard method of summing over each fitness weighted by the proportion of the population that has that fitness:

favg = X1 f1 + X2 f2 + … + XN fN

And since the fitness of a genotype is relative to the average population fitness, the change of Xi is proportional to the ratio fi / favg. In addition, the change of Xi at time t should be proportional to the size of Xi at time t (larger populations grow faster than small populations). Here is the simplest equation we could write with these properties:

Xi(t + 1) = Xi(t) · fi / favg

This is the discrete replicator equation. Each genotype either grows or shrinks over time according to the ratio of its fitness to the average population fitness. If the fitness of a given genotype is exactly the same as the average fitness, then the proportion of the population that has that genotype stays the same.
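
To make the update concrete, here is a small Python sketch of one step of the discrete replicator equation. The names replicator_step and constant_fitness, and the fitness values themselves, are made up for illustration. The fitness is passed as a function of the whole distribution X, as in the text, even though this toy example ignores its argument.

```python
from typing import Callable, List, Sequence

Fitness = Callable[[Sequence[float]], Sequence[float]]

def replicator_step(x: Sequence[float], fitness: Fitness) -> List[float]:
    """One generation: Xi(t + 1) = Xi(t) * fi(X) / favg."""
    f = fitness(x)
    f_avg = sum(xi * fi for xi, fi in zip(x, f))  # population-weighted mean fitness
    return [xi * fi / f_avg for xi, fi in zip(x, f)]

def constant_fitness(x: Sequence[float]) -> List[float]:
    """Hypothetical fitness values: genotype 1 is twice as fit as the others."""
    return [2.0, 1.0, 1.0]

population = [1 / 3, 1 / 3, 1 / 3]
for generation in range(1, 6):
    population = replicator_step(population, constant_fitness)
    print(generation, [round(p, 4) for p in population])
```

Because every update divides by favg, the proportions keep summing to 1 without any separate normalization step.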

Now, how does this relate to Bayesian inference? Instead of a population composed of different genotypes, we have a population composed of beliefs in different theories. The fitness function for each theory corresponds to how well it predicts new evidence. And the evolution over time corresponds to the updating of these beliefs upon receiving new evidence.

Xi(t + 1) → P(Ti | E)
Xi(t) → P(Ti)
fi → P(E | Ti)

What does favg become?

favg = X1 f1 + … + XN fN
P(E) = P(T1) P(E | T1) + … + P(TN) P(E | TN)

But now our equation describing evolutionary dynamics just becomes identical to Bayes’ rule!

Xi(t + 1) = Xi(t) · fi / favg
P(Ti | E) = P(Ti) P(E | Ti) / P(E)
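
As a sanity check on this correspondence, here is a sketch that runs the replicator update with each theory's likelihood playing the role of fitness. The two theories (heads 80% or 20% of the time) are borrowed from the coin example earlier; the function names are my own.

```python
P_HEADS = [0.8, 0.2]  # P(H | T1), P(H | T2)

def replicator_step(x, f):
    """Xi(t + 1) = Xi(t) * fi / favg, with fitness given as a list."""
    f_avg = sum(xi * fi for xi, fi in zip(x, f))  # plays the role of P(E)
    return [xi * fi / f_avg for xi, fi in zip(x, f)]

def update_on_flips(credences, flips):
    """Treat each flip as one generation, with fitness = likelihood of that flip."""
    for flip in flips:
        fitness = [p if flip == "H" else 1.0 - p for p in P_HEADS]
        credences = replicator_step(credences, fitness)
    return credences

posterior = update_on_flips([0.5, 0.5], "HHT")
print(posterior)  # close to [0.8, 0.2]: one more head than tail
```

Two heads and one tail leave z = 1, so the result matches the closed-form posterior 4 / (4 + 1) from the coin example.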

This is pretty fantastic. It means that we can quite literally think of Bayesian reasoning as a form of natural selection, where only the best ideas survive and all others are outcompeted. A Bayesian treats their beliefs as if they are organisms in an ecosystem that punishes those that fail to accurately predict what will happen next. It is evolution towards maximum predictive power.

There are some intriguing hints here of further directions for study. For example, the Bayesian fitness function only depended on the particular theory whose fitness was being evaluated, but it could have more generally depended on all of the different theories as in the original replicator equation.

Plus, the discrete replicator equation is only one simple idealized model of patterns of evolutionary change in populations. There is a continuous replicator equation, where populations evolve smoothly as analytic functions of time. There are also generalizations that introduce mutation, allowing a population to spontaneously generate new genotypes and transition back and forth between similar genotypes. Evolutionary graph theory incorporates population structure into the model, allowing for subtleties regarding complex spatial population interactions.

What would an inference system based off of these more general evolutionary dynamics look like? How would it compare to Bayesianism?

Against falsifiability

What if time suddenly stopped everywhere for 5 seconds?

Your first instinct might be to laugh at the question and write it off as meaningless, given that such questions are by their nature unfalsifiable. I think this is a mistaken impulse, and that we can in general have justified beliefs about such questions. Doing so requires moving beyond outdated philosophies of science, and exploring the nature of evidence and probability. Let me present two thought experiments.

The Cyclic Universe

Imagine that the universe evolves forward in time in such a way that at one time t1 its state is exactly identical to an earlier state at time t0. I mean exactly identical – the wave function of the universe at time t1 is quantitatively identical to the wave function at time t0.

By construction, we have two states of the universe that cannot be distinguished in any way whatsoever – no observation or measurement that you could make of the one will distinguish it from the other. And yet we still want to say that they are different from one another, in that one was earlier than the other.

But then we are allowing the universe to have a quantity (the ‘time-position’ of events) that is completely undetectable and makes no measurable difference in the universe. This should certainly make anybody that’s read a little Popper uneasy, and should call into question the notion that a question is meaningless if it refers to unfalsifiable events. But let’s leave this there for the moment and consider a stronger reason to take such questions seriously.

The Freezing Rooms

The point of this next thought experiment will be that we can be justified in our beliefs about unobservable and undetectable events. It’s a little subtler, but here we go.

Let’s imagine a bizarre building in which we have three rooms with an unusual property: at regular intervals, everything in each room seems to completely freeze. By everything I mean everything – a complete cessation of change in every part of the room, as if time has halted within.

Let’s further imagine that you are inside the building and can freely pass from one room to the other. From your observations, you conclude that Room 1 freezes every other day, Room 2 every fourth day, and Room 3 every third day. You also notice that when you are in any of the rooms, the other two rooms occasionally seem to suddenly “jump forward” in time by a day, exactly when you expect that your room would be frozen.

[Figure: the three rooms of the building – Room 1, Room 2, and Room 3]

So you construct this model of how these bizarre rooms work, and suddenly you come to a frightening conclusion – once every twelve days, all three rooms will be frozen at the same time! So no matter what room you are in, there will be a full day that passes without anybody noticing it in the building, and with no observable consequences in any of the rooms.
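
The twelve-day figure is just the least common multiple of the three freeze periods. Here is a small sketch of the model, assuming days are numbered from 1 and that a room with period p is frozen on every day that is a multiple of p (my reading of "every other day", "every fourth day" and "every third day").

```python
from functools import reduce
from math import gcd

FREEZE_PERIODS = {"Room 1": 2, "Room 2": 4, "Room 3": 3}

def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

def is_frozen(room: str, day: int) -> bool:
    """Assumed convention: a room with period p is frozen on days p, 2p, 3p, ..."""
    return day % FREEZE_PERIODS[room] == 0

total_freeze_period = reduce(lcm, FREEZE_PERIODS.values())
total_freeze_days = [
    day for day in range(1, 37)
    if all(is_frozen(room, day) for room in FREEZE_PERIODS)
]

print(total_freeze_period)  # 12
print(total_freeze_days)    # [12, 24, 36]
```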

Sure, you can just step outside the building and observe it for yourself. But let’s expand our thought experiment: instead of a building with three rooms, let’s imagine that the entire universe is partitioned into three regions of space, in which the same strange temporal features exist. You can go from one region of the universe to another, allowing you to construct an equivalent model of how things work. And you will come to a justified belief that there are periods of time in which absolutely NOTHING is changing in the universe, and yet time is still passing.

Let’s just go a tiny bit further with this line of thought – imagine that suddenly somehow the other two rooms are destroyed (or the other two regions of space become causally disconnected in the extended case). Now the beings in one region will truly have no ability to do the experiments that allowed them to conclude that time is frozen on occasion in their own universe – and yet they are still justified in this belief. They are justified in the same way that somebody that observed a beam of light heading towards the event horizon of the universe is justified in continuing to believe in the existence of the beam of light, even though it is entirely impossible to ‘catch up’ to the light and do an experiment that verifies that no, it hasn’t gone out of existence.

This thought experiment demonstrates that questions that refer to empirically indistinguishable states of the universe can be meaningful. This is a case that is not easy for Popperian falsifiability or old logical positivists to handle, but can be analyzed through the lens of modern epistemology.

Compare the following two theories of the time patterns of the building, where the brackets indicate a repeating pattern:

Theory 1
Room 1: [ ✓, ✗ ]
Room 2: [ ✓, ✓, ✓, ✗ ]
Room 3: [ ✓, ✓, ✗ ]

Theory 2
Room 1: [ ✓, ✗, ✓, ✗, ✓, ✗, ✓, ✗, ✓, ✗, ✓ ]
Room 2: [ ✓, ✓, ✓, ✗, ✓, ✓, ✓, ✗, ✓, ✓, ✓ ]
Room 3: [ ✓, ✓, ✗, ✓, ✓, ✗, ✓, ✓, ✗, ✓, ✓ ]

Notice that these two theories make all the same predictions about what everybody in each room will observe. But Theory 2 denies the existence of the total freeze every 12 days, while Theory 1 accepts it.

Notice also that Theory 2 requires a much more complicated description to describe the pattern that it postulates. In Theory 1, you only need 9 bits to specify the pattern, and the days of total freeze are entailed as natural consequences of the pattern.

In Theory 2, you need 33 bits to be able to match the predictions of Theory 1 while also removing the total freeze!
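
Both claims can be checked mechanically. In the sketch below each day is encoded as A (active, ✓) or F (frozen, ✗); the encoding and the helper names are my own. It unrolls Theory 1 over twelve days, deletes the days on which every room is frozen, and compares what is left with Theory 2. The bit counts are just the total lengths of the repeating patterns.

```python
THEORY_1 = {"Room 1": "AF", "Room 2": "AAAF", "Room 3": "AAF"}
THEORY_2 = {
    "Room 1": "AFAFAFAFAFA",
    "Room 2": "AAAFAAAFAAA",
    "Room 3": "AAFAAFAAFAA",
}

def unroll(pattern: str, days: int) -> str:
    """Repeat a pattern until it covers the requested number of days."""
    return (pattern * (days // len(pattern) + 1))[:days]

def drop_total_freezes(patterns: dict) -> dict:
    """Delete every day on which no room is active."""
    days = len(next(iter(patterns.values())))
    keep = [d for d in range(days)
            if any(p[d] == "A" for p in patterns.values())]
    return {room: "".join(p[d] for d in keep) for room, p in patterns.items()}

def bits(patterns: dict) -> int:
    """One bit per day in each room's repeating pattern."""
    return sum(len(p) for p in patterns.values())

twelve_days = {room: unroll(p, 12) for room, p in THEORY_1.items()}
observed = drop_total_freezes(twelve_days)

print(observed == THEORY_2)            # True
print(bits(THEORY_1), bits(THEORY_2))  # 9 33
```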

Since observational evidence does not distinguish between these theories, this difference in complexity must be accounted for in the prior probabilities for Theory 1 and Theory 2, and would give us a rational reason to prefer Theory 1, even given the impossibility of falsification of Theory 2. This preference wouldn’t go away even in the limit of infinite evidence, and could in fact become stronger.

For instance, suppose that the prior probability of each theory is inversely proportional to the amount of information required to specify it (9 bits for Theory 1, 33 bits for Theory 2). In addition, suppose that all other theories of the universe that are empirically distinguishable from Theory 1 and Theory 2 start with a total prior of 50%. If in the limit of infinite evidence we find that all other theories have been empirically ruled out, then we’ll see:

Initial priors
P(Theory 1) = 39.29%
P(Theory 2) = 10.71%
P(All else) = 50%

Infinite evidence limit
P(Theory 1) = 78.57%
P(Theory 2) = 21.43%
P(All else) = 0%
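
These figures can be reproduced with a few lines of exact arithmetic. The sketch assumes that each theory's prior is inversely proportional to its bit count (9 and 33), which is the reading that matches the numbers above; the variable names are mine.

```python
from fractions import Fraction

BITS = {"Theory 1": 9, "Theory 2": 33}
PRIOR_ALL_ELSE = Fraction(1, 2)

# Assumption: prior weight is inversely proportional to description length.
weights = {name: Fraction(1, n_bits) for name, n_bits in BITS.items()}
total_weight = sum(weights.values())
prior = {name: (1 - PRIOR_ALL_ELSE) * w / total_weight
         for name, w in weights.items()}

# Ruling out every other theory renormalizes over the two that remain.
remaining = sum(prior.values())
posterior = {name: p / remaining for name, p in prior.items()}

for name in BITS:
    print(f"{name}: prior {float(prior[name]):.2%}, "
          f"limit {float(posterior[name]):.2%}")
```

The ratio between the two theories is 33 to 9 both before and after; only the share freed up by the eliminated theories is redistributed.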

The initial epistemic tax levied on Theory 2 due to its complexity has functionally doubled: Theory 2 is still 33/9 ≈ 3.7 times less likely than Theory 1, but the gap between their credences has grown from about 29 to about 57 percentage points! Notice how careful probabilistic thinking does a great job of dealing with philosophical subtleties that are too much for obsolete frameworks of philosophy of science based on the concept of falsifiability. The powers of Bayesian reasoning are on full display here.

Metaphysics and fuzziness: Why tables don’t exist and nobody’s tall

  • The tallest man in the world is tall.
  • If somebody is one nanometer shorter than a tall person, then they are themselves tall.

If the word tall is to mean anything, then it must imply at least these two premises. But from the two it follows by mathematical induction that a two-foot infant is tall, that a one-inch bug is tall, and worst, that a zero-inch tall person is tall. Why? If the tallest man in the world is tall (let’s name him Fred), then he would still be tall if he was shrunk by a single nanometer. We can call this new person ‘Fred – 1 nm’. And since ‘Fred – 1 nm’ is tall, so is ‘Fred – 2 nm’. And then so is ‘Fred – 3 nm’. Et cetera until absurdity ensues.

So what went wrong? Surely the first premise can’t be wrong – who could the word apply to if not the tallest man in the world?

The second seems to be the only candidate for denial. But this should make us deeply uneasy; the implication of such a denial is that there is a one-nanometer wide range of heights, during which somebody makes the transition from being completely not tall to being completely tall. Somebody exactly at this line could be wavering back and forth between being tall and not every time a cell dies or divides, and every time a tiny draft rearranges the tips of their hairs.

Let’s be clear just how tiny a nanometer really is: A sheet of paper is about a hundred thousand nanometers thick. That’s more than the number of inches that make up a mile. If the word ‘tall’ means anything at all, this height difference just can’t make a difference in our evaluation of tallness.

So we are led to the conclusion: Fred is not tall. And if the tallest man on the planet isn’t tall, then nobody is tall. Our concept of tallness is just a useful idea that falls apart on any close examination.

This is the infamous Sorites paradox. What else is vulnerable to versions of the Sorites paradox? Almost every concept that we use in our day to day life! Adulthood, intelligence, obesity, being cold, personhood, wealthiness, and on and on. It’s harder to look for concepts that aren’t affected than those that are!

The Sorites paradox is usually seen in discussions of properties, but it can equally well be applied to discussions of objects. This application leads us to a view of the world that differs wildly from our common sense view. Let’s take a standard philosophical case study: the table. What is it for something to be a table? What changes to a table make it no longer a table?

Whatever answers these questions about tables have, they will hopefully embody our common sense notions about tables and allow us to make the statements that we ordinarily want to make about tables. One such common sense notion involves what it takes for a table to cease being a table; presumably little changes in the table are allowed, while big changes (cleaving it into small pieces) are not. But here we run into the problem of vagueness.

If X is a table, then X would still be a table if it lost a tiny bit of the matter constituting it. Like before, we’ll take this to the extreme to maximize its intuitive plausibility: If a single atom is shed from a table, it’s still a table. Denial of this is even worse than it was before; if changes by single atoms could change table-hood, we would be in a position where we should be constantly skeptical of whether objects are tables, given the microscopic changes that are happening to ordinary tables all the time.


And so we are led inevitably to the conclusion that single atoms are tables, and even that empty space is a table. (Iteratively remove single atoms from a table until it has become arbitrarily small.) Either that, or there are no tables. I take this second option to be preferable.


How far do these arguments reach? It seems like most or all macroscopic objects are vulnerable to them. After all, we don’t change our view of macroscopic objects that undergo arbitrarily small losses of constituent material. And this leads us to a worldview in which the things that actually exist match up with almost none of the things that our common-sense intuitions tell us exist: tables, buildings, trees, planets, computers, people, and so on.

But is everything eliminated? Plausibly not. What can be said about a single electron, for instance, that would lead to a continuity premise? Probably nothing; electrons are defined by a set of intrinsic properties, none of which can differ to any degree while the particle still remains an electron. In general, all of the microscopic entities that are thought to fundamentally compose everything else in our macroscopic world will be (seemingly) invulnerable to attack by a version of the Sorites paradox.

The conclusion is that some form of eliminativism is true (objects don’t exist, but their lowest-level constituents do). I think that this is actually the right way to look at the world, and is supported by a host of other considerations besides those in this post.

Closing comments

  • The subjectivity of ‘tall’ doesn’t remove the paradox. What’s in question isn’t the agreement between multiple people about what tall means, but the coherency of the concept as used by a single person. If a single person agrees that Fred is tall, and that arbitrarily small height differences can’t make somebody go from not tall to tall, then they are led straight into the paradox.
  • The most common response to this puzzle I’ve noticed is just to balk and laugh it off as absurd, while not actually addressing the argument. Yes, the conclusion is absurd, which is exactly why the paradox is powerful! If you can resolve the paradox and erase the absurdity, you’ll be doing more than 2000 years of philosophers and mathematicians have been able to do!

Is [insert here] a religion?

(Everything I’m saying here is based on experiences in a few religious studies classes I’ve taken, some papers that I’ve read, and some conversations with religious studies people. The things I say might not actually be representative of the aggregate of religious studies scholars, though Google Scholar would seem to provide some evidence for it.)

Religious studies people tend to put a lot of emphasis on the fact that ‘religion’ is a fuzzy word. That is, while there are some organizations that everybody will agree are religions (Judaism, Christianity, Islam), there are edge cases that are less clear (Unitarian Universalism, Hare Krishnas, Christian Science). In addition, attempts to lay out a set of necessary and sufficient conditions for membership in the category “religion” tend to either let in too many things or not enough things.

For some reason this is taken to be a very significant fact, and people solemnly intone things like “Is nationalism a type of religion?” and “Isn’t atheism really just the new popular religion for the young?”. Sociologists spend hours arguing with each other about different definitions of religion, and invoking new typologies to distinguish between religions and non-religions.

The strange thing about this is that religion is not at all unique in this regard. Virtually every word that we use is similarly vague, with fuzzy edges and ambiguities. That’s just how language works. Words don’t attain meanings through careful systematic processes of defining necessary and sufficient conditions. Words attain meanings by being attached to clusters of concepts that intuitively feel connected, and evolve over time as these clusters shift and reshape themselves.

There is a cluster of important ideas about language, realization of which can keep you from getting stuck in philosophical dead ends. The vagueness inherent to much of natural language is one of these ideas. Another is that semantic prescriptivism is wrong. Humans invent the mapping of meanings to words, we don’t pluck it out of an objective book of the Universe’s Preferred Definitions of Terms. When two people are arguing about what the word religion means, they aren’t arguing about a matter of fact. There are some reasons why such an argument might be productive – for instance, there might be pragmatic reasons for redefining words. But there is no sense in which the argument is getting closer to the truth about what the actual meaning of ‘religion’ is.

Similarly, every time somebody says that football fans are really engaging in a type of religious ritual, because look, football matches their personal favorite list of sufficient conditions for being a religion, they are confused about semantic prescriptivism. At best, such comparisons might reveal previously unrecognized features of football fanaticism. But these comparisons can also end up serving to cause mistaken associations to carry over to the new term from the old. (Hm, so football is a religion? Well, religions are about supernatural deities, so Tom Brady must be a supernatural deity of the football religion. And religious belief tends to be based on faith, so football fans must be irrationally hanging on to their football-shaped worldview.)

It seems to me that scholars of religious studies have accepted the first of these ideas, but are still in need of recognizing the second. It also seems like there is a similar phenomenon going on in sociological discussions of racial terms and gender terms, where the ordinary fuzziness of language is treated as uniquely applying to these terms, taken as exceptionally important, and analyzed to death. I would be interested to hear hypotheses for why this type of thing happens where it does.

Hyperreal decision theory

I’ve recently been writing a lot about how infinities screw up decision theory and ethics. In his paper Infinite Ethics, Nick Bostrom talks about attempts to apply nonstandard analysis to try to address the problems of infinities. I’ll just briefly describe nonstandard analysis in this post, as well as the types of solutions that he sketches out.


So first of all, what is nonstandard analysis? It is a mathematical formalism that extends the ordinary number system to include infinitely large numbers and infinitely small numbers. It doesn’t do so just by adding the symbol ∞ to the set of real numbers ℝ. Instead, it adds an infinite amount of different infinitely large numbers, as well as an infinity of infinitesimal numbers, and proceeds to extend the ordinary operations of addition and multiplication to these numbers. The new number system is called the hyperreals.

So what actually are hyperreals? How does one do calculations with them?

A hyperreal number is an infinite sequence of real numbers. Here are some examples of them:

(3, 3, 3, 3, 3, …)
(1, 2, 3, 4, 5, …)
(1, ½, ⅓, ¼, ⅕, …)

It turns out that the first of these examples is just the hyperreal version of the ordinary number 3, the second is an infinitely large hyperreal, and the third is an infinitesimal hyperreal. Weirded out yet? Don’t worry, we’ll explain how to make sense of this in a moment.

So every ordinary real number is associated with a hyperreal in the following way:

N = (N, N, N, N, N, …)

What if we just switch the first number in this sequence? For instance:

N’ = (1, N, N, N, N, …)

It turns out that this change doesn’t change the value of the hyperreal. In other words:

N = N’
(N, N, N, N, N, …) = (1, N, N, N, N, …)

In general, if you take any hyperreal number and change a finite amount of the numbers in its sequence, you end up with the same number you started with. So, for example,

3 = (3, 3, 3, 3, 3, …)
= (1, 5, 99, 3, 3, 3, …)
= (3, 3, 0, 0, 0, 0, 3, 3, …)

The general rule for when two hyperreals are equal relies on the concept of a free ultrafilter, which is a little above the level that I want this post to be at. Intuitively, however, the idea is that for two hyperreals to be equal, the set of places in which their sequences differ must be either finite or one of certain special kinds of infinite sets (I’ll leave this “special kinds” vague for exposition purposes).

Adding and multiplying hyperreals is super simple:

(a1, a2, a3, …) + (b1, b2, b3, …) = (a1 + b1, a2 + b2, a3 + b3, …)
(a1, a2, a3, …) · (b1, b2, b3, …) = (a1 · b1, a2 · b2, a3 · b3, …)

Here’s something that should be puzzling:

A = (0, 1, 0, 1, 0, 1, …)
B = (1, 0, 1, 0, 1, 0, …)
A · B = ?

Apparently, the answer is that A · B = 0. This means that at least one of A or B must also be 0. But both of them differ from 0 in an infinity of places! Subtleties like this are why we need to introduce the idea of a free ultrafilter, to allow certain types of equivalencies between infinitely differing sequences.
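
Here is a minimal sketch of the componentwise operations, working on lazily generated sequences and inspecting only a finite prefix. The function names are my own. A finite prefix can never settle whether two hyperreals are equal (that is the ultrafilter’s job), so this only illustrates the arithmetic.

```python
from itertools import count, islice

def hyper_add(a, b):
    """Componentwise sum of two sequences."""
    return (x + y for x, y in zip(a, b))

def hyper_mul(a, b):
    """Componentwise product of two sequences."""
    return (x * y for x, y in zip(a, b))

def alternating(first: int):
    """(0, 1, 0, 1, ...) if first is 0, or (1, 0, 1, 0, ...) if first is 1."""
    return ((first + n) % 2 for n in count())

def prefix(seq, length: int):
    """First `length` entries of a (possibly infinite) sequence."""
    return list(islice(seq, length))

product = prefix(hyper_mul(alternating(0), alternating(1)), 8)
total = prefix(hyper_add(alternating(0), alternating(1)), 8)

print(product)  # [0, 0, 0, 0, 0, 0, 0, 0]
print(total)    # [1, 1, 1, 1, 1, 1, 1, 1]
```

The sum is a useful extra clue: A + B = 1 and A · B = 0 together mean that one of A and B is 0 and the other is 1, with the choice of ultrafilter deciding which.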

Anyway, let’s go on to the last property of hyperreals I’ll discuss:

(a1, a2, a3, …) < (b1, b2, b3, …)
if an ≥ bn for only a finite number of values of n

(This again has the same weird infinite exceptions as before, which we’ll ignore for now.)

Now at last we can see why (1, 2, 3, 4, …) is an infinite number:

Choose any real number N
N = (N, N, N, N, …)
ω = (1, 2, 3, 4, …)
So ωn ≤ Nn for n = 1, 2, 3, …, floor(N)
and ωn > Nn for all other n

This means that ω is larger than N, because there are only a finite number of members of the sequence for which Nn is greater than or equal to ωn. And since this is true for any real number N, then ω must be larger than every real number! In other words, you can now give an answer to somebody who asks you to name a number that is bigger than every real number!

ε = (1, ½, ⅓, ¼, ⅕, …) is an infinitesimal hyperreal for a similar reason:

Choose any real number N > 0
N = (N, N, N, N, …)
ε = (1, ½, ⅓, ¼, ⅕, …)
So εn ≥ Nn for n = 1, 2, …, floor(1/N)
and εn < Nn for all other n

Once again, ε is only larger than N in a finite number of places, and is smaller in the other infinity. So ε is smaller than every real number greater than 0.

In addition, the sequence (0, 0, 0, …) is smaller than ε for every value of its sequence, so ε is larger than 0. A number that is smaller than every positive real and greater than 0 is an infinitesimal.
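
The “only finitely many exceptions” claims are easy to check for particular values of N. The sketch below lists the indices at which ω fails to exceed N and at which ε fails to be smaller than N. The function names and the search_limit cutoff are my own, and the cutoff only needs to be larger than N (for ω) or 1/N (for ε).

```python
from fractions import Fraction

def omega_exceptions(real: Fraction, search_limit: int = 1000):
    """Indices n where ω_n = n is NOT larger than the constant sequence for `real`."""
    return [n for n in range(1, search_limit + 1) if n <= real]

def epsilon_exceptions(real: Fraction, search_limit: int = 1000):
    """Indices n where ε_n = 1/n is NOT smaller than the constant sequence for `real`."""
    return [n for n in range(1, search_limit + 1) if Fraction(1, n) >= real]

print(omega_exceptions(Fraction(15, 2)))    # [1, 2, 3, 4, 5, 6, 7]
print(epsilon_exceptions(Fraction(3, 10)))  # [1, 2, 3]
```

Everywhere past those few indices the comparison goes the other way, which is what the ordering rule needs.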

Okay, done introducing hyperreals! Let’s now see how this extended number system can help us with our decision theory problems.

Saint Petersburg Paradox

One standard example of weird infinities in decision theory is the St Petersburg Paradox, which I haven’t talked about yet on this blog. I’ll use this thought experiment as a template for the discussion. Briefly, then, imagine a game that works as follows:

Game 1

Round 1: Flip a coin.
If it lands H, then you get $2 and the game ends.
If it lands T, then you move on to Round 2.

Round 2: Flip the coin again.
If it lands H, then you get $4 and the game ends.
If T, then move on to Round 3.

Round 3: Flip a coin.
If it lands H, then you get $8 and the game ends.
If it lands T, then you move on to Round 4.

(et cetera to infinity)

This game looks pretty nice! You are guaranteed at least $2, and your payout doubles every time the coin lands T. The question is, how nice really is the game? What’s the maximum amount that you should be willing to pay in to play?

Here we run into a problem. To calculate this, we want to know what the expected value of the game is – how much you make on average. We do this by adding up the product of each outcome and the probability of that outcome:

EV = ½ · $2 + ¼ · $4 + ⅛ · $8 + …
= $1 + $1 + $1 + …
= ∞
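Here is a minimal Python sketch of that sum, assuming we cut the game off after a fixed number of rounds (the cutoff and the function name are mine, purely for illustration). Each round contributes exactly $1, so the truncated expected value just counts rounds.

```python
from fractions import Fraction

def truncated_expected_value(max_rounds, first_payout=2):
    """Expected payout if the game is cut off after max_rounds rounds.

    The first H arrives on round k with probability 1/2**k,
    and that round pays first_payout * 2**(k-1).
    """
    total = Fraction(0)
    for k in range(1, max_rounds + 1):
        probability = Fraction(1, 2**k)
        payout = first_payout * 2 ** (k - 1)
        total += probability * payout
    return total

if __name__ == "__main__":
    for rounds in (1, 2, 3, 10, 100):
        print(rounds, truncated_expected_value(rounds))
```

There is no cutoff at which the total stops growing, which is all that "EV = ∞" means here.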

Apparently, the expected payout of this game is infinite! This means that in order to make a profit, you should be willing to give literally all of your money in order to play the game just once! This should seem wrong… If you pay $1,000,000 to play the game, then the only way that you make a profit is if the coin lands tails nineteen times in a row, so that the payout reaches at least 2^20 = $1,048,576. Does it really make sense to risk all of this money on such a tiny chance?

The response to this is that while the chance that this happens is of course tiny, the payout if it does happen is enormous – you stand to double, quadruple, octuple, (et cetera) your money. In this case, the paradox seems to really be a result of the failure of our brains to intuitively comprehend exponential growth.

There’s an even stronger reason to be unhappy with the St Petersburg Paradox. Say that instead of starting with a payout of $2 and doubling each time from there, you had started with a payout of $2000 and doubled from there.

Game 2

Round 1: Flip a coin.
If it lands H, then you get $2000 and the game ends.
If it lands T, then you move on to Round 2.

Round 2: Flip the coin again.
If it lands H, then you get $4000 and the game ends.
If T, then move on to Round 3.

Round 3: Flip a coin.
If it lands H, then you get $8000 and the game ends.
If it lands T, then you move on to Round 4.

(et cetera to infinity)

This alternative game must be better than the initial game – after all, no matter how many times the coin lands T before finally landing H, your payout is 1000 times better than it would have been previously. So if you’re playing the first of the two games, then you should always wish that you were playing the second, no matter how many times the coin ends up landing T.

But the expected value comparison doesn’t grant you this! Both games have an infinite expected value, and infinity is infinity. We can’t have one infinity being larger than another infinity, right?

Enter the hyperreals! We’ll turn the expected value of the first game into a hyperreal as follows:

EV1 = ½ · $2 = $1
EV2 = ½ · $2 + ¼ · $4 = $1 + $1 = $2
EV3 = ½ · $2 + ¼ · $4 + ⅛ · $8 = $1 + $1 + $1 = $3

EV = (EV1, EV2, EV3, …)
= $(1, 2, 3, …)

Now we can compare it to the second game:

Game 1: $(1, 2, 3, …) = $ω
Game 2: $(1000, 2000, 3000, …) = $1000 · ω

So hyperreals allow us to compare infinities, and justify why Game 2 has a 1000 times larger expected value than Game 1!

Let me give another nice result of this type of analysis. Imagine Game 1′, which is identical to Game 1 except for the first payout, which is $4 instead of $2. We can calculate the expected values:

Game 1: $(1, 2, 3, …) = $ω
Game 1′: $(2, 3, 4, …) = $1 + $ω

The result is that Game 1′ gives us an expected increase of just $1. And this makes perfect sense! After all, the only difference between the games is if they end in the first round, which happens with probability ½. And in this case, you get $4 instead of $2. The expected difference between the games should therefore be ½ · $2 = $1! Yay hyperreals!
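These entrywise comparisons are easy to check by machine. The following is a rough Python sketch that builds the first few entries of each game's sequence of partial expected values and compares them position by position. The helper names are hypothetical, and looking at only the first few entries is of course just an illustration of a pattern that holds at every position.

```python
from fractions import Fraction
from itertools import accumulate

def partial_expected_values(payout, rounds):
    """First `rounds` entries of the sequence (EV1, EV2, EV3, ...).

    payout(k) is the dollar payout if the first H arrives on round k,
    which happens with probability 1/2**k.
    """
    contributions = [Fraction(payout(k), 2**k) for k in range(1, rounds + 1)]
    return list(accumulate(contributions))

def game1(k):
    return 2**k

def game2(k):
    return 1000 * 2**k

def game1_prime(k):
    # Same as Game 1 except the first payout is $4 instead of $2.
    return 4 if k == 1 else 2**k

if __name__ == "__main__":
    n = 8
    g1 = partial_expected_values(game1, n)
    g2 = partial_expected_values(game2, n)
    g1p = partial_expected_values(game1_prime, n)
    print([int(x) for x in g1])                  # entries of Game 1's sequence
    print([int(b / a) for a, b in zip(g1, g2)])  # entrywise ratio, Game 2 / Game 1
    print([int(b - a) for a, b in zip(g1, g1p)]) # entrywise difference, Game 1' - Game 1
```

Because the ratio and the difference are the same at every position, no appeal to the ultrafilter is needed to read off the comparison.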

Of course, this analysis still ends up concluding that the St Petersburg game does have an infinite expected payout. Personally, I’m (sorta) okay with biting this bullet and accepting that if your goal is to maximize money, then you should in principle give any arbitrary amount to play the game.

But what I really want to talk about are variants of the St Petersburg paradox where things get even crazier.

Getting freaky

For instance, suppose that instead of the initial game setup, we have the following setup:

Game 3

Round 1: Flip a coin.
If it lands H, then you get $2 and the game ends.
If it lands T, then you move on to Round 2.

Round 2: Flip the coin again.
If it lands H, then you pay $4 and the game ends.
If T, then move on to Round 3.

Round 3: Flip the coin again.
If it lands H, then you get $8 and the game ends.
If it lands T, then you move on to Round 4.

Round 4: Flip the coin again.
If it lands H, then you pay $16 and the game ends.
If it lands T, then you move on to Round 5.

(et cetera to infinity)

The only difference now is that if the coin lands H on any even round, then instead of getting money that round, you have to pay that money back to the dealer! Clearly this is a less fun game than the last one. How much less fun?

Here things get really weird. If we only look at the odd rounds, then the expected value is ∞.

EV = ½ · $2 + ⅛ · $8 + …
= $1 + $1 + …
= ∞

But if we look at the even rounds, then we get an expected value of -∞!

EV = ¼ · (-$4) + 1/16 · (-$16) + …
= (-$1) + (-$1) + …
= -∞

We find the total expected value by adding together these two. But can we add ∞ to -∞? Not with ordinary numbers! Let’s convert our numbers to hyperreals instead, and see what happens.

EV = $(1, 0, 1, 0, …)

This time, our result is a bit less intuitive than before: the partial sums bounce between $1 and $0 forever. As a result of the ultrafilter business we’ve been avoiding talking about, we can take the hyperreal to agree with its odd-numbered entries, which gives us the following two equalities:

(1, 0, 1, 0, …) = 1
(-1, 0, -1, 0, …) = -1

This means that the expected value of Game 3 is $1. In addition, if Game 3 had started with you having to pay $2 for the first round rather than getting $2 (with every later payout flipped as well), then the expected value would be -$1.

So hyperreal decision theory recommends that you play the game, but only buy in if it costs you less than $1.
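For anyone who wants to see the raw numbers behind Game 3, here is a small Python sketch that tabulates each round's probability-weighted contribution alongside the running total after that round. The function names are mine. Nothing in the code chooses between the values the running total keeps returning to; that choice is exactly the ultrafilter step.

```python
from fractions import Fraction
from itertools import accumulate

def game3_payout(k):
    """Payout on round k of Game 3: win 2**k on odd rounds, pay 2**k on even rounds."""
    return 2**k if k % 2 == 1 else -(2**k)

def contributions(rounds):
    """Probability-weighted payout of each round (round k ends the game with probability 1/2**k)."""
    return [Fraction(game3_payout(k), 2**k) for k in range(1, rounds + 1)]

if __name__ == "__main__":
    terms = contributions(8)
    running = list(accumulate(terms))
    print([int(t) for t in terms])    # per-round contributions
    print([int(s) for s in running])  # running totals after each round
```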

Now, the last thought experiment I’ll present is the weirdest of them.

Game 4

Round 1: Flip a coin.
If it lands H, then you pay $2 and the game ends.
If it lands T, then you move on to Round 2.

Round 2: Flip the coin again.
If it lands H, then you get $2 and the game ends.
If T, then move on to Round 3.

Round 3: Flip the coin again.
If it lands H, then you pay $2.67 and the game ends.
If it lands T, then you move on to Round 4.

Round 4: Flip the coin again.
If it lands H, then you get $4 and the game ends.
If it lands T, then you move on to Round 5.

Round 5: Flip the coin again.
If it lands H, then you pay $6.40 and the game ends.
If it lands T, then you move on to Round 6.

(et cetera to infinity)

The pattern is that the payoff on the nth round is (-2)^n / n. Since the nth round ends the game with probability 1/2^n, the expected contribution of the nth round is (-1)^n / n. Summed in this order, the series converges as follows:

∑_{n=1}^{∞} (-1)^n / n = -ln(2) ≈ -0.69

But by Riemann’s rearrangement theorem, it turns out that by rearranging the terms of this sum, we can make it add up to any amount that we want! (This follows from the fact that the series converges even though the sum of the absolute values of its terms is infinite.)
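The rearrangement claim can be watched in action. Below is a minimal Python sketch of the standard greedy trick: whenever the running sum is at or below the target, take the next unused positive term, and otherwise take the next unused negative term. The function name, the step count, and the sample targets are arbitrary choices of mine. Floating point is used, so treat the printed values as approximations.

```python
from math import log

def rearranged_partial_sum(target, steps):
    """Greedily reorder the terms (-1)**n / n so the running sum tracks `target`.

    Positive terms are 1/2, 1/4, 1/6, ... (even n); negative terms are
    -1, -1/3, -1/5, ... (odd n). Each term is used at most once, and terms of
    the same sign are taken in their original order.
    """
    next_even, next_odd = 2, 1
    total = 0.0
    for _ in range(steps):
        if total <= target:
            total += 1.0 / next_even   # take the next unused positive term
            next_even += 2
        else:
            total -= 1.0 / next_odd    # take the next unused negative term
            next_odd += 2
    return total

if __name__ == "__main__":
    for target in (-log(2), 0.0, 1.0, -3.0):
        print(target, rearranged_partial_sum(target, 200_000))
```

The same terms, counted in a different order, settle near whichever target you ask for.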

This means that not only is the expected value for this game undefined, but any value whatsoever can be justified for it. Not only do we not know the expected value of the game, but we don’t know whether it’s a positive game or a negative game. We can’t even figure out if it’s a finite game or an infinite game!

Let’s apply hyperreal numbers.

EV1 = -$1
EV2 = $(-1 + ½) = -$0.50
EV3 = $(-1 + ½ – ⅓) = -$0.83
EV4 = $(-1 + ½ – ⅓ + ¼) = -$0.58

So EV = $(-1.00, -0.50, -0.83, -0.58, …)

Since this sequence of partial sums converges from above and below to -ln(2) ≈ -$0.69, the expected value is -$0.69 + ε, where ε here is some particular infinitesimal number (not necessarily the ε defined earlier). So we get a precisely defined expectation value! One could imagine just empirically testing this value by running large numbers of simulations.
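As a quick numerical check of the "from above and below" behavior, here is a short Python sketch that computes the partial sums in the order the rounds are actually played. The function name is mine, and floating point makes the values approximate.

```python
from math import log

def game4_partial_sums(rounds):
    """Running expected value of Game 4 after each round, in playing order.

    Round n pays (-2)**n / n with probability 1/2**n, contributing (-1)**n / n.
    """
    sums, total = [], 0.0
    for n in range(1, rounds + 1):
        total += (-1) ** n / n
        sums.append(total)
    return sums

if __name__ == "__main__":
    sums = game4_partial_sums(1000)
    print([round(s, 2) for s in sums[:4]])
    # Consecutive entries sit on opposite sides of -ln(2).
    print(sums[-2], -log(2), sums[-1])
```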

A weirdness about all of this is that the order in which you count up your expected value is extremely important. This is a general property of infinite summation, and seems like a requirement for consistent reasoning about infinities.

We’ve seen that hyperreal numbers can be helpful in providing a way to compare different infinities. But hyperreal numbers are only the first step into the weird realm of the infinite. The surreal number system is a generalization of the hyperreals that is much more powerful. In a future post, I’ll talk about the highly surreal decision theory that results from application of these numbers.

Infinite ethics

There are a whole bunch of ways in which infinities make decision theory go screwy. I’ve written about some of those ways here. This post is about a thought experiment in which infinities make ethics go screwy.

WaitButWhy has a great description of the thought experiment, and I recommend you check out the post. I’ll briefly describe it here anyway:

Imagine two worlds, World A and World B. Each is an infinite two-dimensional checkerboard, and on each square sits a conscious being that can either be very happy or very sad. At the birth of time, World A is entirely populated by happy beings, and World B entirely by sad beings.

From that moment forwards, World A gradually becomes infected with sadness in a growing bubble, while World B gradually becomes infected with happiness in a growing bubble. Both universes exist forever, so the bubble continues to grow forever.

Picture from WaitButWhy

The decision theory question is: if you could choose to be placed in one of these two worlds in a random square, which should you choose?

The ethical question is: which of the universes is morally preferable? Said another way: if you had to bring one of the two worlds into existence, which would you choose?

On spatial dominance

At every moment of time, World A contains an infinity of happiness and a finite amount of sadness. On the other hand, World B always contains an infinity of sadness and a finite amount of happiness.

This suggests the answer to both the ethical question and the decision theory question: World A is better. Ethically, it seems obvious that infinite happiness minus finite sadness is infinitely better than infinite sadness minus finite happiness. And rationally, given that there are always infinitely more people outside the bubble than inside, at any given moment in time you can be sure that you are on the outside.

A plot of the bubble radius over time in each world would look like this:

[Figure: bubble radius over time in World A and World B]

In this image, we can see that no matter what moment of time you’re looking at, World A dominates World B as a choice.

On temporal dominance

But there’s another argument.

Let’s look at a person at any given square on the checkerboard. In World A, they start out happy and stay that way for some finite amount of time. But eventually, they are caught by the expanding sadness bubble, and then stay sad forever. In World B, they start out sad for a finite amount of time, but eventually are caught by the expanding happiness bubble and are happy forever.

Plotted, this looks like:

[Figure: an individual’s happiness over time at a fixed square in World A and World B]

So which do you prefer? Well, clearly it’s better to be sad for a finite amount of time and happy for an infinite amount of time than vice versa. And ethically, choosing World A amounts to dooming every individual to a lifetime of finite happiness and then infinite sadness, while World B is the reverse.

So no matter which position on the checkerboard you’re looking at, World B dominates World A as a choice!

An impossible synthesis

Let’s summarize: if you look at the spatial distribution for any given moment of time, you see that World A is infinitely preferable to World B. And if you look at the temporal distribution for any given position in space, you find that B is infinitely preferable to A.

Interestingly, I find that the spatial argument seems more compelling when considering the ethical question, while the temporal argument seems more compelling when considering the decision theory question. But both of these arguments apply equally well to both questions. For instance, if you are wondering which world you should choose to be in, then you can think forward to any arbitrary moment of time, and consider your chances of being happy vs being sad in that moment. This will get you the conclusion that you should go with World A, as for any moment in the future, you have a 100% chance of being one of the happy people as opposed to the sad people.

I wonder if the difference is that when we are thinking about decision theory, we are imagining ourselves in the world at a fixed location with time flowing past us, and it is less intuitive to think of ourselves at a fixed time and ask where we likely are.

Regardless, what do we do in the face of these competing arguments? One reasonable thing is to try to combine the two approaches. Instead of just looking at a fixed position for all time, or a fixed time over all space, we look at all space and all time, summing up total happiness moments and subtracting total sadness moments.

But now we have a problem… how do we evaluate this? What we have in both worlds is essentially a +∞ and a -∞ added together, and no clear procedure for how to make sense of this addition.

In fact, it’s worse than this. By cleverly choosing a way of adding up the total amount of the quantity happiness – sadness, we can make the result turn out however we want! For instance, we can reach the conclusion that World A results in a net +33 happiness – sadness by first counting up 33 happy moments, and then ever afterwards switching between counting a happy moment and a sad moment. This summation will eventually end up counting all the happy and sad moments, and will conclude that the total is +33.

But of course, there’s nothing special about +33; we could have chosen to reach any conclusion we wanted by just changing our procedure accordingly. This is unusual. It seems that both the total expected value and moral value are undefined for this problem.
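Here is a toy Python sketch of that counting trick, under the assumption that each world contains infinitely many happy moments and infinitely many sad moments when summed over all space and time. The generator name and the convention that a negative head start means "count that many sad moments first" are mine.

```python
from itertools import islice

def totals_after_each_pair(head_start):
    """Yield the running (happy - sad) total under one particular counting order.

    First count |head_start| moments of a single kind (happy if head_start > 0,
    sad if head_start < 0), then forever after count one happy moment followed
    by one sad moment. Every moment of either kind is eventually counted.
    """
    total = head_start
    yield total
    while True:
        total += 1  # the next uncounted happy moment
        total -= 1  # the next uncounted sad moment
        yield total

if __name__ == "__main__":
    for head_start in (33, -7, 1_000_000):
        print(head_start, list(islice(totals_after_each_pair(head_start), 5)))
```

The same world, counted with a different head start, gets a different "total".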

The undefinability of the total happiness – sadness of this universe is a special case of the general rule that you can’t subtract infinity from infinity. This seems fairly harmless… maybe it keeps us from giving a satisfactory answer to this one thought experiment, but surely nothing like this could matter to real-world ethical or decision theory dilemmas?

Wrong! If in fact we live in an infinite universe, then we are faced with exactly this problem. If there are an infinite number of conscious experiencers out there, some suffering and some happy, then the total quantity of happiness – sadness in the universe is undefined! What’s more, a moral system that says that we ought to increase the total happiness of the universe will return an error if asked to evaluate what we ought to do in an infinite universe!

If you think that you should do your part to make the universe a happier place, then you must have some notion of a total amount of happiness that can be increased. And if the total amount of happiness is unbounded, then there is no sensible way to increase it. This seems like a serious problem for most brands of consequentialism, albeit a very unusual one.

More on random sampling from Nature’s Urn

In a previous post, I developed an analogy between patterns of reasoning and sampling procedures. I want to go a little further with two expansions on this idea.

Scientific laws and domains of validity

First, different sampling procedures can focus on sampling from different regions of the urn. This is analogous to how scientific theories have specific domains of validity that they were built to explain, and in general their conclusions do not spread beyond this domain.

Classical Newtonian mechanics is a great theory to explain slowly swinging pendulums and large gravitating bodies, but if you apply it to particles that are too small, or moving too fast, or too massive, then you’ll get bad results. In general, any scientific law will be known to work within a certain range of energies or sizes or speeds.

By analogy, the Super Persuader was not a good source of evidence, because its sampling procedure was to scour the urn for any black balls it could find, and ignore all white balls. Ideally, we want our truth-seeking enterprises to function like random sampling of balls from an urn. But of course, the way that scientists seek out evidence is not analogous to randomly sampling from the entire urn consisting of all pieces of evidence as to the structure of reality. Instead, a psychologist will focus on one region of the urn, a biologist another, and a physicist another.

In this way, a psychologist can say that the evidence they receive is representative of the general state of evidence in a certain region of the urn. The region of the urn being sampled by the scientist represents the domain of validity of the laws they develop.

Developing this line further, we might imagine that there is a general positioning of pieces of evidence or good arguments in terms of accessibility to humans. Some arguments or ideas or pieces of evidence about reality will lie near the top of the urn, and will be low-hanging fruit for any investigators. (Mixing metaphors!) Others will lie deeper down, requiring more serious thought and dedicated investigation to come across.

Advances in tech can allow scientists to dig deeper into Nature’s urn, expanding the domains of validity of their theories and becoming better acquainted with the structure of reality.

Cognitive biases and generalized distortions of reasoning

Second, a taxonomy of different ways in which reasoning can go wrong naturally arises from the metaphor. Some of these correspond nicely to well-known cognitive biases.

For instance, the sampling procedure used by the Super Persuader involved selectively choosing evidence to support a certain hypothesis. In general, this corresponds to selection biases. A special case of this is motivated reasoning. When we strongly desire a hypothesis to be true, we are more likely to find, remember, and fairly judge evidence in its favor than evidence against it. Selection biases are in general just non-random sampling procedures.

Another class of error is misjudgment, where we draw a black ball, but see it as a white ball. This would correspond to things like the backfire effect (where evidence against a proposition we favor serves to strengthen our belief in it) or just failure to understand an argument or a piece of evidence.

A third class of error is bad extrapolation, where we are sampling randomly from one region of the urn, but then act as if we are sampling from some other region. This would include hasty generalizations and all forms of irrational stereotyping.

Generalizing argument strength

Finally, a weakness of the urn analogy is that it treats all arguments as equally strong. We can fix this by imagining that some balls come clustered together as a single, stronger argument. Additionally, we could imagine argument strength as ball density, and suppose that we actually want to estimate the ratio of mass of black balls to mass of white balls. In this way, denser balls affect our judgment of the ratio more severely than less dense ones.

Free will and decision theory (Part 2)

In a previous post, I talked about something that has been confusing me regarding free will and decision theory. I want to revisit this topic and express a different way to frame the issue.

Here goes:

A decision theory is an algorithm used for calculating the expected utilities of the different possible actions you could take: EU(A). It returns a recommendation for you to take the action that maximizes expected utility: A* = argmax EU(A).

The word “possible” in that definition is the source of the confusion. The question is: how can we make sense of this notion of possible actions given determinism? If we take determinism very seriously, then the set of possible actions is a set with a single member, which is the action that you end up actually taking. There’s an intuitive sense of possibility at play here that looks benign enough, but upon closer examination becomes problematic.
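To see how much work the set of possible actions is doing, here is a bare-bones Python sketch of the argmax step. The action names and utilities are hypothetical, made up purely for illustration, and real decision theories differ in how EU(A) itself gets computed.

```python
def recommend(possible_actions, expected_utility):
    """Return A* = argmax EU(A) over the supplied set of possible actions."""
    return max(possible_actions, key=expected_utility)

if __name__ == "__main__":
    # Hypothetical utilities, invented for this example only.
    eu = {"wave hand": 1.0, "take a nap": 3.0, "keep reading": 2.0}

    # The intuitive picture: several options are treated as possible.
    print(recommend(eu.keys(), eu.get))

    # The strict determinist picture: only the action actually taken is possible,
    # so the argmax has nothing to choose between.
    print(recommend(["keep reading"], eu.get))
```

The algorithm runs the same either way. All of the content is in what we hand it as `possible_actions`.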

For instance, we obviously want our set of actions to be restricted to some degree – we don’t want our decision theory telling us to snap our fingers and magically turn the world into a utopia. One seemingly clear line we could draw is to say that possibility here just means physical possibility. Actions that require us to exceed the speed of light, violate conservation of energy, or other such physical impossibilities are not allowed to be included in the set of possible actions.

But this is no solution at all! After all, if the physical laws are the things uniquely generating our actual actions, then all other actions must be violations! Determinism dictates that there can’t be multiple different answers to the question of “what happens next?”. We have an intuitive notion of physical possibility that includes things like “wave my hand through the air” and “take a nap”. But upon close examination, these seem to really just be the product of our ignorance of the true workings of the laws of nature. If we could deeply internalize the way that physics generates behaviors like hand-waving and napping, then we would be able to see why in a particular case hand-waving is possible (and thus happens), and why in other cases it cannot happen.

In other words, the claim I am making is that there is no clear distinction on the level of physics between the claim that I can jump to the moon and the claim that I could have waved my hand around in front of me even though I didn’t. The only difference between these two, it seems to me, is in terms of the intuitive obviousness of the impossibility of the first, and the lack of intuitive obviousness for the second.

Let’s say that eventually physicists reduce all of fundamental physics to a single principle, for example the Principle of Least Action. Then for any given action, either it is true that this action minimizes Action, or it is false. (sorry for the two meanings of the word ‘action’, it couldn’t be helped) If it is true, then the action is physically possible, and will in fact happen. And if it is false, then the action is physically impossible, and will not happen. We can explicitly lay out an explanation of why me jumping to the moon does not minimize Action, but it is much much much harder to lay out explicitly why me waving my hand in front of my face right now does not minimize Action. The key point is that the only difference here is an epistemic one – some actions are easier for us to diagnose as non-action-minimizing than others, but in reality, they either are or are not.

If this is all true, then physical possibility is hopeless as a source to ground a choice of the set of possible actions, and any formal decision theory will ultimately rest on an unreal distinction between possible and impossible actions. This distinction will not be represented in any real features of the physical world, and will be vulnerable to future discoveries or increases in computational power that expand our knowledge of the causal determinants of our actions.

Are there other notions of possibility that might be more fruitful for grounding the choice of the set of possible actions? I think not. Here’s a general argument for why not.

Ought implies can

(1) If you should do something, then you can do it.
(2) There is only a single thing that you can do.
(3) Therefore, there is at most a single thing that you should do.

This is an argument that I initially saw in the context of morality. I regarded it as a mere intellectual curiosity, fun to ponder but fairly unimportant (given that I didn’t expect much out of ethics in the first place).

But I think that the exact same argument applies for any theory of normative instrumental rationality. This is much more troubling to me! Unlike morality, I actually feel fairly strongly that there are objective facts about instrumental rationality – that is, facts about how an agent should act in order to optimize their values. (This is no longer an ethical should, but an epistemic one)

But I also feel strongly tempted to endorse both premises (1) and (2) with regard to this epistemic should, and want to reject the conclusion. Let’s lay out our options.

Reject (1): But then this means that there are some actions that it is true that you should do, even though you can’t do them. Do we really want a theory of instrumental rationality that tells us that the most rational course of action is one that we definitely cannot take? This seems obviously undesirable, for the same reason that the decision theory that says that the optimal action is to snap your fingers and turn the world into a utopia is undesirable. If this premise is not true of our decision theory, then we might sometimes have to accept that the action we should take is physically impossible, and what’s the use of a decision theory like that?

Reject (2): But this entails an abandonment of our best understanding of physical reality. Even in standard formulations of quantum mechanics, the wave function that describes the state of the universe evolves completely deterministically. (You might now wonder why quantum mechanics is always thought of as a fundamentally probabilistic theory, but this is definitely too big of a topic to go into here.) So it seems likely that this premise is just empirically correct.

Accept (3): But then our theory of rationality is useless, as it tells us nothing besides “Just do what you are going to do”!

This is the puzzle. Do you see any way out?

Paradoxes of infinite decision theory

(Two decision theory puzzles from this paper.)


Donald Trump has just arrived in Purgatory. God visits him and offers him the following deal. If he spends tomorrow in Hell, Donald will be allowed to spend the next two days in Heaven, before returning to Purgatory forever. Otherwise he will spend forever in Purgatory. Since Heaven is as pleasant as Hell is unpleasant, Donald accepts the deal. The next evening, as he runs out of Hell, God offers Donald another deal: if Donald spends another day in Hell, he’ll earn an additional two days in Heaven, for a total of four days in Heaven (the two days he already owed him, plus two new ones) before his return to Purgatory. Donald accepts for the same reason as before. In fact, every time he drags himself out of Hell, God offers him the same deal, and he accepts. Donald spends all of eternity writhing in agony in Hell. Where did he go wrong?

Satan’s Apple

Satan has cut a delicious apple into infinitely many pieces, labeled by the natural numbers. Eve may take whichever pieces she chooses. If she takes merely finitely many of the pieces, then she suffers no penalty. But if she takes infinitely many of the pieces, then she is expelled from the Garden for her greed. Either way, she gets to eat whatever pieces she has taken.

Eve’s first priority is to stay in the Garden. Her second priority is to eat as much apple as possible. She is deciding what to do when Satan speaks. “Eve, you should make your decision one piece at a time. Consider just piece #1. No matter what other pieces you end up taking and rejecting, you do better to take piece #1 than to reject it. For if you take only finitely many other pieces, then taking piece #1 will get you more apple without incurring the greed penalty. On the other hand, if you take infinitely many other pieces, then you will be ejected from the Garden whether or not you take piece #1. So in that case you might as well take it, so that you can console yourself with some additional apple as you are escorted out.”

Eve finds this reasonable, and decides provisionally to take piece #1. Satan continues, “By the way, the same reasoning holds for piece #2.” Eve agrees. “And piece #3, and piece #4 and…” Eve takes every piece, and is ejected from the Garden. Where did she go wrong?

The second thought experiment is sufficiently similar to the first one that I won’t say much about it – just included it in here because I like it.


Let’s assume that Trump is able to keep track of how many days he has been in hell, and can credibly pre-commit to strategies involving only accepting a fixed number of offers before rejecting. Now we can write out all possible strategies for sequences of responses that Trump could make:

Strategy 0 Accept none of the offers, and stay in Purgatory forever.

Strategy N Accept some finite number N of offers, after which you spend 2N days in Heaven and then infinity in Purgatory.

Strategy ∞ Accept all of the offers, and stay in Hell forever.

Assuming that a day in hell is exactly as bad as a day in heaven is good, Strategy 0 nets you 0 days in Heaven, Strategy N nets you N days in Heaven, and Strategy ∞ nets you ∞ days in Hell.

Obviously Strategy ∞ is the worst option (it is infinitely worse than all other strategies). And for every N, Strategy N is better than Strategy 0. So we have Strategy ∞ < Strategy 0 < Strategy N.

So we should choose Strategy N for some N. But which N? Obviously, for any choice of N, there will be arbitrarily better choices that you could have made. The problem is that there is no optimal choice of N. Any reasonable decision theory, when asked to optimize N for utility, is going to just return an error. It’s like asking somebody to tell you the largest integer. This is perhaps something that is difficult to come to terms with, but it is not paradoxical – there is no law of decision theory that every problem has a best solution.

But we still want to answer what we would do if we were in Trump’s shoes. If we actually have to pick an N, what should we do? I think the right answer is that there is no right answer for what we should do. We can say “x is better than y” for different strategies, but cannot say definitively the best answer… because there is no best answer.

One technique that I thought of, however, is the following (inspired by the Saint Petersburg Paradox):

On the first day, Trump should flip a coin. If it lands heads, then he chooses Strategy 1. If it lands tails, then he flips the coin again.

If on the next flip the coin lands heads, then he chooses Strategy 2. And if it lands tails, again he flips the coin.

If on this third flip the coin lands heads, then he chooses Strategy 4. And if not, then he flips again.

Et cetera to infinity.

With this decision strategy, we can calculate the expected number N that Trump will choose. This number is:

E[N] = ½ · 1 + ¼ · 2 + ⅛ · 4 + … = ½ + ½ + ½ + … = ∞

But at the same time, the coin will certainly eventually land heads, and the process will terminate. The probability that the coin lands tails an infinite number of times is zero! So by leveraging infinities in his favor, Trump gets an infinite positive expected value for days spent in heaven, and is guaranteed to not spend all eternity in Hell.
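Both halves of this claim can be poked at with a short Python sketch: one function actually runs the coin-flipping procedure (and always returns a finite N), and the other computes the expected N if we stop the sum after a fixed number of flips. The function names, the seed, and the truncation are my own illustrative choices.

```python
import random
from fractions import Fraction

def choose_strategy(rng):
    """Flip a fair coin until it lands heads; k tails first means Strategy 2**k."""
    tails = 0
    while rng.random() < 0.5:   # treat this branch as "tails"
        tails += 1
    return 2 ** tails

def truncated_expectation(max_flips):
    """Sum of P(first heads on flip k) * N_k for k = 1..max_flips, with N_k = 2**(k-1)."""
    return sum(Fraction(1, 2**k) * 2 ** (k - 1) for k in range(1, max_flips + 1))

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print([choose_strategy(rng) for _ in range(10)])         # every sample is finite
    print([truncated_expectation(m) for m in (2, 10, 100)])  # grows without bound
```

Every individual run stops at some finite N, yet the truncated expectation gains ½ with each extra flip allowed.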

A weird question now arises: Why should Trump have started at Strategy 1? Or why multiply by 2 each time? Consider the following alternative decision process for the value of N:

On the first day, Trump should flip a coin. If it lands heads, then he chooses Strategy 1,000,000. If it lands tails, then he flips the coin again.

If on the next flip the coin lands heads, then he chooses Strategy 10,000,000. And if it lands tails, again he flips the coin.

If on this third flip the coin lands heads, then he chooses Strategy 100,000,000. And if not, then he flips again.

Et cetera to infinity.

This decision process seems obviously better than the previous one – the minimum number of days in heaven Trump nets is 1 million, which would only have previously happened if the coin had landed tails 20 times in a row. And the growth in number of net days in heaven per tail flip is 5x better than it was originally.

But now we have an analogous problem to the one we started with in choosing N. Any choice of starting strategy or growth rate seems suboptimal – there are always an infinity of arbitrarily better strategies.

At least here we have a way out: All such strategies are equivalent in that they net an infinite expected number of days. And none of these infinities are any larger than any others. So even if it intuitively seems like one decision process is better than another, on average both strategies will do equally well.

This is weird, and I’m not totally satisfied with it. But as far as I can tell, there isn’t a better alternative response.

Schelling fence

How could a strategy like Strategy N actually be instantiated? One potential way would be for Trump to set up a Schelling fence at a particular value of N. For example, Trump could pre-commit from the first day to only allowing himself to say yes 500 times, and after that saying no.

But there’s a problem with this – if Trump has any doubts about his ability to stick with his plan, and puts any credence in his breezing past Strategy N and staying in hell forever, then this will result in an infinite negative expected value of using a Schelling fence. In other words, use of a Schelling fence seems only advisable if you are 100% sure of your ability to credibly pre-commit.

Here’s an alternative strategy for instantiating Strategy N that smooths out this wrinkle: Each time Trump is given another offer by God, he accepts it with probability N/(N+1), and rejects it with probability 1/(N+1). By doing this, he will on average do Strategy N, though on any given run he may end up doing a different strategy M: the number of acceptances before the first rejection is geometrically distributed with mean N, so M can fall well above or below N.
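To see why this averages out to Strategy N: if each offer is accepted independently with probability p = N/(N+1), the expected number of acceptances before the first rejection is p/(1-p) = N. Here is a minimal simulation of that claim, with N = 5 and the helper name `acceptances_before_rejection` picked purely for illustration.

```python
import random

def acceptances_before_rejection(n, rng):
    """Count the offers accepted before the first rejection, accepting
    each offer independently with probability n / (n + 1)."""
    accepted = 0
    while rng.random() < n / (n + 1):
        accepted += 1
    return accepted

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5                   # illustrative value of N, not from the original post
trials = 20_000

counts = [acceptances_before_rejection(n, rng) for _ in range(trials)]
mean = sum(counts) / trials

print(mean)                      # sample mean, which should land near n
print(min(counts), max(counts))  # individual runs vary widely around n
```

The sample mean lands near N, but the spread is wide: individual runs range from immediate rejection to many times N.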

A harder variant would be if Trump’s memory is wiped clean after every day he spends in Hell, so that each day when he receives God’s offer, it is as if it is the first time. Even if Trump knows that his memory will be wiped clean on subsequent days, he now has a problem: he has no way to remember his Schelling fence, or to even know if he has reached it yet. And if he tries the probabilistic acceptance approach, he has no way to remember the value of N that he decided on.

But there’s still a way for him to get the infinite positive expected utility! He can do so by running a St. Petersburg-style coin-flip process like the one above not just on the first day, but every day! Every day he chooses a value of N using a process with an infinite expected value but a guaranteed finite actual value, and then probabilistically accepts/rejects the offer using this N.

Quick proof that this still ensures finitude: Suppose that he stays in Hell forever, never rejecting the offer. Since there is a nonzero chance each day that he selects N = 1, he will (with probability 1) select N = 1 an infinite number of times. Each of these times, he has a ½ chance of rejecting and a ½ chance of accepting. Since this happens an infinite number of times, he is guaranteed to eventually reject an offer, contradicting the supposition.
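The memoryless procedure can also be simulated directly. The sketch below assumes the daily value of N comes from the original process (start at Strategy 1, double on each tails); the function names are hypothetical. A finite simulation can only illustrate termination, not prove it, and it says nothing about the expected payoff, which depends on the details of God's offer.

```python
import random

def sample_n(rng, start=1, growth=2):
    """Draw N by flipping a fair coin until heads, doubling N on each tails."""
    n = start
    while rng.random() < 0.5:
        n *= growth
    return n

def offers_accepted(rng):
    """Simulate the memoryless process: each day draw a fresh N, then accept
    the offer with probability N / (N + 1). Returns the number of offers
    accepted before the first rejection."""
    accepted = 0
    while True:
        n = sample_n(rng)  # no memory: N is re-drawn from scratch every day
        if rng.random() < n / (n + 1):
            accepted += 1
        else:
            return accepted

rng = random.Random(0)  # fixed seed so the sketch is reproducible
runs = [offers_accepted(rng) for _ in range(1_000)]

print(max(runs))  # longest simulated stay before a rejection
```

Every simulated run ends with a rejection after finitely many days, in line with the argument above.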

Question: what’s the expected number of days in Heaven for this new process? Infinite, just as before! But guaranteed finite. (There should be a name for these types of guaranteed-finite-but-infinite-expected-value quantities.)

Anyway, the conclusion of all of this? Infinite decision theory is really weird.