If in the future we develop the ability to make accurate simulations of other humans, a lot of things will change for the weirder. In many situations where agents with access to simulations of each other interact, a strange type of apparent backward causality will arise. For instance…

When you have two agents that both have access to simulations of the other, things get weird. In such situations, there’s no clear notion of whose decision is a response to the other (as both are responding to each other’s future decision), and so there’s no clear notion of whose decision is causally first. But the question of “who comes first” (in this strange non-temporal sense) turns out to be very important to what strategy the various agents should take!

Let’s consider some examples.

# Chicken

Two agents are driving head-on towards each other. Each has a choice to swerve or to stay driving straight ahead. If they both stay, then they crash and die, the worst outcome for all. If one stays and the other swerves, then the one that swerves pays a reputational cost and the one that stays gains some reputation. And if both swerve, then neither gains or loses any reputation. To throw some numerical values on these outcomes, here’s a payoff matrix:

This is the game of chicken. It is an anti-coordination game, in that if one side knows what the other is going to do, then they want to do the opposite. The (swerve, swerve) outcome is unstable, as both players are incentivized to stay if they know that their opponent will swerve. But so is the (stay, stay) outcome, as this is the worst possible outcome for both players and they both stand to gain by switching to swerve. There are two pure-strategy Nash equilibria, (swerve, stay) and (stay, swerve), and one mixed-strategy equilibrium (with the payoff matrix above, it corresponds to swerving with probability 90% and staying with probability 10%).
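
Here's a minimal sketch of how those equilibria can be checked. The +1, -1 and 0 payoffs are the ones described in the text; the crash payoff of -10 is my assumption, picked because it is the value that reproduces the 90% / 10% mix quoted above. The function names are just illustrative.

```python
from fractions import Fraction

# Payoff to "me", indexed by (my_move, their_move). The game is symmetric.
# The +1 / -1 / 0 entries come from the text; the crash payoff of -10 is an
# ASSUMPTION, chosen because it reproduces the quoted 90% / 10% mix.
PAYOFF = {
    ("swerve", "swerve"): Fraction(0),
    ("swerve", "stay"): Fraction(-1),
    ("stay", "swerve"): Fraction(1),
    ("stay", "stay"): Fraction(-10),
}
MOVES = ("swerve", "stay")


def pure_nash_equilibria(payoff):
    """Return all (my_move, their_move) pairs where neither player gains by deviating."""
    def is_best_response(mine, theirs):
        return all(payoff[(mine, theirs)] >= payoff[(alt, theirs)] for alt in MOVES)

    return [
        (mine, theirs)
        for mine in MOVES
        for theirs in MOVES
        if is_best_response(mine, theirs) and is_best_response(theirs, mine)
    ]


def mixed_swerve_probability(payoff):
    """Swerve probability p that leaves the opponent indifferent between moves."""
    # Indifference: p*U(stay,swerve) + (1-p)*U(stay,stay)
    #             = p*U(swerve,swerve) + (1-p)*U(swerve,stay)
    gain_from_staying = payoff[("stay", "swerve")] - payoff[("swerve", "swerve")]
    loss_from_crashing = payoff[("swerve", "stay")] - payoff[("stay", "stay")]
    return loss_from_crashing / (gain_from_staying + loss_from_crashing)


print(pure_nash_equilibria(PAYOFF))
print(mixed_swerve_probability(PAYOFF))
```

Under the assumed payoffs, the only pure equilibria are the two asymmetric outcomes, and the indifference condition gives a swerve probability of 9/10.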

That’s all standard game theory, in a setting where you don’t have access to your opponent’s algorithm. But now let’s take this thought experiment to the future, where each player is able to simulate the other. Imagine that you’re one of the agents. What should you do?

The first thought might be the following: you have access to a simulation of your opponent. So you can just observe what the simulation of your opponent does, and do the opposite. If you observe the simulation swerving you stay, and if you observe the simulation staying you swerve. This has the benefit of avoiding the really bad (stay, stay) outcomes, while also exploiting opponents that decide to swerve.

The issue is that this strategy is exploitable. While you’re making use of your ability to simulate your opponent, you are neglecting the fact that your opponent is also simulating you. Your opponent can see that this is your strategy, so they know that whatever they decide to play, you’ll play the opposite. So if they decide to tear off their steering wheel to ensure that they will not swerve no matter what, they know that you’ll fall in line and swerve, thus winning them +1 utility and leaving you with -1. This is a precommitment: a strategy that an agent uses that restricts the number of future choices available to them. It’s quite unintuitive and cool that this sort of tying-your-hands ends up being an enormously powerful weapon for those that have access to it.

In other words, if Agent 1 sees that Agent 2 is treating their decision as a fixed fact and responding to it accordingly, then Agent 1 gets an advantage, as they can precommit to staying and force Agent 2 to yield to them. But if Agent 2 now sees Agent 1 as responding to their algorithm rather than the other way around, then Agent 2 benefits by precommitting to stay. If there’s a fact about which agent precommits “first”, then we can conclusively say that this agent does better, as they can force the outcome they want.

But again, this is not a temporal first. Suppose that Agent 2 is asleep at the wheel, about to wake up, and Agent 1 is trying to decide what to do. Agent 1 simulates them and sees that once they wake up they will tear out their steering wheel without even considering what Agent 1 does. Now Agent 1’s hand is forced; they will swerve in response to Agent 2’s precommitment, even though it hasn’t yet been made.

It appears that for two agents in a chicken-like scenario, with access to simulations of one another, the best action is to precommit as quickly and firmly as possible, with as little regard for their opponents’ precommitments as they can manage (the best-performing agent is the one that tears off their steering wheel without even simulating their opponent and seeing their precommitments, as this agent puts themselves causally upstream of anybody that simulates them). But this obviously just leads straight to the (stay, stay) outcome!
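
The exploit and the regress described above can be made concrete with a toy model. This is only a sketch under a strong simplifying assumption of mine: an agent is a plain Python function that receives its opponent's decision procedure and may call it as a "simulation". The names `committed_stayer`, `contrarian`, `play` and `regress_detected` are hypothetical.

```python
def committed_stayer(opponent):
    """Has torn off the steering wheel: ignores the opponent entirely."""
    return "stay"


def contrarian(opponent):
    """Simulates the opponent's move against us, then plays the opposite."""
    predicted = opponent(contrarian)
    return "swerve" if predicted == "stay" else "stay"


def play(agent_1, agent_2):
    """Each agent is handed the other's decision procedure to simulate."""
    return agent_1(agent_2), agent_2(agent_1)


def regress_detected(agent_1, agent_2):
    """True if mutual simulation never bottoms out (Python hits its recursion limit)."""
    try:
        play(agent_1, agent_2)
    except RecursionError:
        return True
    return False


print(play(contrarian, committed_stayer))        # the simulator yields
print(play(committed_stayer, committed_stayer))  # both precommit: crash
print(regress_detected(contrarian, contrarian))  # nobody is "first"
```

The contrarian gets exploited by the precommitter, two precommitters crash, and two contrarians never reach a decision at all, which is one way of seeing why "who is causally first" has no obvious answer here.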

This pattern of precommitting, then precommitting to not respond to precommitments, then precommitting to not respond to precommitments to not respond to precommitments, and so on, shows up all over the place. Let’s have another example, from the realm of economics.

# Company Coordination and Boycotts

In my last post, I talked about the Cournot model of firms competing to produce a homogeneous good. We saw that competing firms face a coordination problem with respect to the prices they set: every firm sees it in their rational self-interest to undercut other firms to take their customers, but then other firms follow suit, ending up with the price dropping for everybody. That’s good for consumers, but bad for producers! The process of undercutting and then re-equilibrating continues until the price is at the bare minimum that it takes for a producer to be willing to make the good – essentially just minutely above the cost of production. At this point, producers are making virtually no profit and consumer surplus is maximized.

This coordination problem, like all coordination problems, could be solved if only the firms had the ability to precommit. Imagine that the heads of all the companies meet up at some point. They all see the problem that they’re facing, and recognize that if they can stop the undercutting, they’ll all be much richer. So they sign on to a vow to never undercut each other. Of course, signing a piece of paper doesn’t actually restrict your future options. Every company is still just as incentivized as before to break the agreement and undercut their competitors. It helps if they have plausible deniability; the ability to say that their price drop was actually not intended to undercut, but a response to some unrelated change in the market. All that the meeting does is introduce some social cost to undercutting and breaking the vow that wasn’t there before.

To actually permanently fix the coordination problem, the companies need to be able to sign on to something that truly and irrevocably ties their hands, giving them no ability to back out later on (equivalent to the tearing-off-the-steering-wheel as a credible precommitment). Maybe they all decide to put some money towards the creation of a final-check mechanism that looks over all price changes and intervenes to stop any changes that it detects to be intended to undercut opponents. This is precommitment in the purer sense of literally removing an option that the firms previously had. And if this type of tying-of-hands was actually possible, then each company would be rationally incentivized to sign on! (Of course, they’d all be looking for ways to cheat the system and break the mechanism at every step, which would make its actual creation a tad bit difficult.)

So, if you give all companies the ability to jointly sign on to a credible precommitment to not undercut their opponents, then they will take that opportunity. This will keep prices high and keep profits flowing in to the companies. Producer surplus will be maximized, and consumers will get the short end of the stick. Is there any way for the consumers to fight back?

Sure there is! All they need is the ability to precommit as well. Suppose that all consumers are now given the opportunity to come together and boycott any and all companies that precommit to not undercutting each other. If every consumer signs on, and if producers know this, then it’s no longer worth it for them to put in place the price-monitoring mechanism, as they’d just lose all their customers! Of course, the consumers now face their own coordination problem; many of them will still value the product at a price higher than that which is being offered by the companies, even if they’re colluding. And each individual reasons that as long as everybody else is still boycotting the companies, it makes little difference if just one mutually beneficial trade is made with them. So the consumers will themselves face the problem of how to enforce the boycott. But let’s assume that the consumers work this out so that they credibly precommit to never buying from a company that credibly precommits to not undercutting its competitors. Now the market price swings back in their favor, dropping to the cost of production! The consumers win! Whoohoo!

But we’re not done yet. It was only worth it for the consumers to sign on to this precommitment because they predicted that the companies would respond to their precommitment. But what if the companies, seeing the boycott-tactic coming, credibly precommit to never yielding to boycotters? Then the consumers, responding to this precommitment, will realize that boycotting will have no effect on prices, and will just cause them all to lose out on mutually beneficial trades! So they won’t boycott, and therefore the producers get the surplus once more. And just like before, this swings back and forth, with the outcome at each stage depending on which agent treats the other agent’s precommitment as being more primal. But if they each run their apparently-best strategy (that is, making their precommitments with no regard to the precommitments of the other party so as to force their hand and place their own precommitments at the beginning of the causal chain), then we end up with the worst possible outcome for all: producers don’t produce anything and consumers don’t consume, and everybody loses out.

This question of how agents that can simulate one another AND precommit to courses of action should ultimately behave is something that I find quite puzzling and am not sure how to resolve.

# Solving the Linear Cournot Model

The Cournot model is a simple economic model used to describe what happens when multiple companies compete with one another to produce some homogeneous product. I’ve been playing with it a bit and ended up solving the general linear case. I assume that this solution is already known by somebody, but I couldn’t find it anywhere. So I will post it here! It gives some interesting insight into the way that less-than-perfectly-competitive markets operate. First let’s talk about the general structure of the Cournot model.

Suppose we have n firms. Each produces some quantity of the product, which we’ll label as $q_1, q_2, ..., q_n$. The total amount of product on the market will be given the label $Q = q_1 + q_2 + ... + q_n$. Since the firms are all selling identical products, it makes sense to assume that the consumer demand function $P(q_1, q_2, ..., q_n)$ will just be a function of the total quantity of the product that is on the market: $P(q_1, q_2, ..., q_n) = P(q_1 + q_2 + ... + q_n) = P(Q)$. (This means that we’re also disregarding effects like customer loyalty to a particular company or geographic closeness to one company location over another. Essentially, the only factor in a consumer’s choice of which company to go to is the price at which that company is selling the product.)

For each firm, there is some cost to producing the good. We capture this by giving each firm a cost function $C_1(q_1), C_2(q_2), ..., C_n(q_n)$. Now we can figure out the profit of each firm for a given set of output values $q_1, q_2, ..., q_n$. We’ll label the profit of the kth firm as $\Pi_k$. This profit is just the amount of money they get by selling the product minus the cost of producing the product: $\Pi_k = q_k P(Q) - C_k(q_k)$.

If we now assume that all firms are maximizing profit, we can find the outputs of each firm by taking the derivative of the profit and setting it to zero.

$\frac{d\Pi_k}{dq_k} = P(Q) + q_k \frac{dP}{dQ} - \frac{dC_k}{dq_k} = 0$

This is a set of n equations with n unknowns, so solving this will fully specify the behavior of all firms!

Of course, without any more assumptions about the functions $P$ and $C_k$, we can’t go too much further with solving this equation in general. To get some interesting general results, we’ll consider a very simple set of assumptions. Our assumptions will be that both consumer demand and producer costs are linear. This is the linear Cournot model, as opposed to the more general Cournot model.

In the linear Cournot model, we write that $P(Q) = a - bQ$ (for some constants $a$ and $b$) and $C_k(q_k) = c_k q_k$. As an example, we might have $P(Q) = 100 - 2Q$ with prices in dollars, which would mean that at a price of 40 dollars, 30 units of the good will be bought in total. The constants $c_k$ represent the marginal cost of production for each firm, and the linearity of the cost function means that the cost of producing the next unit is always the same, regardless of how many have been produced before. (This is unrealistic, as generally it’s cheaper per unit to produce large quantities of a good than to produce small quantities.)

Now we can write out the profit-maximization equations for the linear Cournot model.

$\frac{d\Pi_k}{dq_k} = P(Q) + q_k \frac{dP}{dQ} - \frac{dC_k}{dq_k} = a - bQ - b q_k - c_k = 0$

Rewriting, we get $q_k + Q = \frac{a - c_k}{b}$. We can’t immediately solve this for $q_k$, because remember that $Q$ is the sum of all the quantities produced. All n of the quantities we’re trying to solve for are in each equation, so to solve the system of equations we have to do some linear algebra!

$\begin{aligned} 2q_1 + q_2 + q_3 + \ldots + q_n &= \frac{a - c_1}{b} \\ q_1 + 2q_2 + q_3 + \ldots + q_n &= \frac{a - c_2}{b} \\ q_1 + q_2 + 2q_3 + \ldots + q_n &= \frac{a - c_3}{b} \\ &\;\;\vdots \\ q_1 + q_2 + q_3 + \ldots + 2q_n &= \frac{a - c_n}{b} \end{aligned}$

Translating this to a matrix equation…

$\begin{bmatrix} 2 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 1 & \cdots & 1 \\ 1 & 1 & 2 & \cdots & 1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & 1 & \cdots & 2 \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \\ q_3 \\ \vdots \\ q_n \end{bmatrix} = \frac{1}{b} \begin{bmatrix} a - c_1 \\ a - c_2 \\ a - c_3 \\ \vdots \\ a - c_n \end{bmatrix}$

Now if we could only find the inverse of the first matrix, we’d have our solution!
$\begin{bmatrix} q_1 \\ q_2 \\ q_3 \\ \vdots \\ q_n \end{bmatrix} = \begin{bmatrix} 2 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 1 & \cdots & 1 \\ 1 & 1 & 2 & \cdots & 1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & 1 & \cdots & 2 \end{bmatrix}^{-1} \frac{1}{b} \begin{bmatrix} a - c_1 \\ a - c_2 \\ a - c_3 \\ \vdots \\ a - c_n \end{bmatrix}$

I found the inverse of this matrix by using the symmetry in the matrix to decompose it into two matrices that were each easier to work with:

$\mathcal{I} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$

$\mathcal{J} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix}$

$\begin{bmatrix} 2 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 1 & \cdots & 1 \\ 1 & 1 & 2 & \cdots & 1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & 1 & \cdots & 2 \end{bmatrix} = \mathcal{I} + \mathcal{J}$

As a hypothesis, suppose that the inverse matrix has a similar form (one value for the diagonal elements, and another value for all off-diagonal elements). This allows us to write an equation for the inverse matrix:

$(\mathcal{I} + \mathcal{J}) (A \mathcal{I} + B \mathcal{J}) = \mathcal{I}$

To solve this, we’ll use the following easily proven identities.

$\begin{aligned} \mathcal{I} \cdot \mathcal{I} &= \mathcal{I} \\ \mathcal{I} \cdot \mathcal{J} &= \mathcal{J} \\ \mathcal{J} \cdot \mathcal{I} &= \mathcal{J} \\ \mathcal{J} \cdot \mathcal{J} &= n \mathcal{J} \end{aligned}$

$\begin{aligned} (\mathcal{I} + \mathcal{J}) (A \mathcal{I} + B \mathcal{J}) &= A \mathcal{I} + A \mathcal{J} + B \mathcal{J} + nB \mathcal{J} \\ &= A \mathcal{I} + \left( A + B(n+1) \right) \mathcal{J} \\ &= \mathcal{I} \end{aligned}$

$\begin{aligned} A &= 1 \\ A + B(n+1) &= 0 \end{aligned}$

$\begin{aligned} A &= 1 \\ B &= - \frac{1}{n+1} \end{aligned}$

$(\mathcal{I} + \mathcal{J})^{-1} = \mathcal{I} - \frac{1}{n+1} \mathcal{J} = \frac{1}{n+1} \begin{bmatrix} n & -1 & -1 & \cdots & -1 \\ -1 & n & -1 & \cdots & -1 \\ -1 & -1 & n & \cdots & -1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -1 & -1 & -1 & \cdots & n \end{bmatrix}$

Alright awesome!
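
As a quick sanity check on the algebra, here's a small Python sketch that multiplies the matrices out exactly using rational arithmetic. The helper names are hypothetical, and `inverse_coefficients` is my own generalization of the same argument to a matrix of the form $a\mathcal{I} + b\mathcal{J}$ (it assumes $a \neq 0$ and $a + nb \neq 0$, otherwise the matrix is singular).

```python
from fractions import Fraction


def combo(n, a, b):
    """Return the n x n matrix a*I + b*J, where J is the all-ones matrix."""
    return [[Fraction(b) + (Fraction(a) if i == j else 0) for j in range(n)]
            for i in range(n)]


def matmul(X, Y):
    """Plain square-matrix product, exact because the entries are Fractions."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inverse_coefficients(a, b, n):
    """Coefficients (A, B) with (a*I + b*J)^(-1) = A*I + B*J.

    Derived from J*J = n*J; assumes a != 0 and a + n*b != 0.
    """
    a, b = Fraction(a), Fraction(b)
    return 1 / a, -b / (a * (a + n * b))


def check(n, a=1, b=1):
    """True if the hypothesized inverse really multiplies out to the identity."""
    A, B = inverse_coefficients(a, b, n)
    return matmul(combo(n, a, b), combo(n, A, B)) == combo(n, 1, 0)


print(inverse_coefficients(1, 1, 4))
print(all(check(n) for n in range(1, 8)))
```

With `a = b = 1` the coefficients reduce to $A = 1$ and $B = -\frac{1}{n+1}$, matching the result above.
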
Our hypothesis turned out to be true! (And it would have even if the entries in our matrix hadn’t been 1s and 2s, provided the matrix is invertible. This is a really cool general method to find inverses of this family of matrices.) Now we just use this inverse matrix to solve for the output from each firm!

$\begin{bmatrix} q_1 \\ q_2 \\ q_3 \\ \vdots \\ q_n \end{bmatrix} = \left( \mathcal{I} - \frac{1}{n+1} \mathcal{J} \right) \frac{1}{b} \begin{bmatrix} a - c_1 \\ a - c_2 \\ a - c_3 \\ \vdots \\ a - c_n \end{bmatrix}$

Define: $\mathcal{C} = \sum_{i=1}^{n}{c_i}$

$\begin{aligned} q_k^* &= \frac{1}{b} \left( a - c_k - \frac{1}{n+1} \sum_{i=1}^{n}{(a - c_i)} \right) \\ &= \frac{1}{b} \left( a - c_k - \frac{1}{n+1} (an - \mathcal{C}) \right) \\ &= \frac{1}{b} \left( \frac{a + \mathcal{C}}{n+1} - c_k \right) \end{aligned}$

$\begin{aligned} Q^* &= \sum_{k=1}^n {q_k^*} \\ &= \frac{1}{b} \sum_{k=1}^n \left( \frac{a + \mathcal{C}}{n+1} - c_k \right) \\ &= \frac{1}{b} \left( \frac{n}{n+1} (a + \mathcal{C}) - \mathcal{C} \right) \\ &= \frac{1}{b} \left( \frac{n}{n+1} a - \frac{\mathcal{C}}{n+1} \right) \\ &= \frac {an - \mathcal{C}} {b(n+1)} \end{aligned}$

$\begin{aligned} P^* &= a - bQ^* \\ &= a - \frac{an - \mathcal{C}}{n+1} \\ &= \frac{a + \mathcal{C}}{n+1} \end{aligned}$

$\begin{aligned} \Pi_k^* &= q_k^* P^* - c_k q_k^* \\ &= \frac{1}{b} \left( \frac{a+\mathcal{C}}{n+1} - c_k \right) \frac{a + \mathcal{C}}{n+1} - \frac{c_k}{b} \left( \frac{a + \mathcal{C}}{n+1} - c_k \right) \\ &= \frac{1}{b} \left( \left( \frac{a + \mathcal{C}}{n+1} \right)^2 - 2c_k\left( \frac{a + \mathcal{C}}{n+1} \right) + c_k^2 \right) \\ &= \frac{1}{b} \left( \frac{a + \mathcal{C}}{n+1} - c_k \right)^2 \end{aligned}$

And there we have it, the full solution to the general linear Cournot model! Let’s discuss some implications of these results. First of all, let’s look at the two extreme cases: monopoly and perfect competition.
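
Before turning to those cases, here's a rough numerical check of the closed-form solution. It reuses the example demand curve $P(Q) = 100 - 2Q$ from earlier; the three marginal costs (10, 20, 30) are made-up numbers for illustration, and the function names are hypothetical. The sketch also assumes every firm ends up producing a positive quantity, which the derivation takes for granted.

```python
from fractions import Fraction


def solve_linear_cournot(a, b, costs):
    """Closed-form equilibrium for P(Q) = a - b*Q and C_k(q) = c_k * q.

    Assumes an interior solution (every firm produces a positive quantity).
    """
    n = len(costs)
    price = Fraction(a + sum(costs)) / (n + 1)       # P* = (a + C) / (n + 1)
    quantities = [(price - c) / b for c in costs]    # q_k* = (P* - c_k) / b
    profits = [(price - c) ** 2 / b for c in costs]  # Pi_k* = (P* - c_k)^2 / b
    return quantities, price, profits


def first_order_residuals(a, b, costs, quantities):
    """a - b*Q - b*q_k - c_k for each firm; all zero at the equilibrium."""
    total = sum(quantities)
    return [a - b * total - b * q - c for q, c in zip(quantities, costs)]


a, b = 100, 2
costs = [10, 20, 30]  # hypothetical marginal costs
quantities, price, profits = solve_linear_cournot(a, b, costs)
print(quantities, price, profits)
print(first_order_residuals(a, b, costs, quantities))
```

For these numbers the firms produce 15, 10 and 5 units, the price is 40 (which agrees with $100 - 2 \cdot 30$), and every first-order condition comes out to exactly zero.
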
Monopoly: $n = 1$

$\begin{aligned} Q^* &= \frac{1}{2b} (a - c) \\ P^* &= \frac{1}{2} (a + c) \\ \Pi^* &= \frac{1}{b} \left( \frac{a - c}{2} \right)^2 \end{aligned}$

Perfect Competition: $n \rightarrow \infty$ (here $\bar c = \mathcal{C}/n$ is the average marginal cost across firms)

$\begin{aligned} q_k^* &\rightarrow \frac{1}{b} (\bar c - c_k) \\ Q^* &\rightarrow \frac{1}{b} (a - \bar c) \\ P^* &\rightarrow \bar c \\ \Pi_k^* &\rightarrow \frac{1}{b} (\bar c - c_k)^2 \end{aligned}$

The first observation is that the behavior of the market under monopoly looks very different from the case of perfect competition. For one thing, notice that the price under perfect competition is always going to be lower than the price under monopoly. This is a nice demonstration of the so-called monopoly markup.

The quantity $a$ intuitively corresponds to the highest possible price you could get for the product (the most that the highest bidder would pay). And the quantity $c$, the production cost, is the lowest possible price at which the product would be sold. So the monopoly price is the average of the highest price you could get for the good and the lowest price at which it could be sold.

The flip side of the monopoly markup is that less of the good is produced and sold under a monopoly than under perfect competition. There are trades that could be happening (trades which would be mutually beneficial!) which do not occur. Think about it: the monopoly price is halfway between the cost of production and the highest bidder’s price. This means that there are a bunch of people that would buy the product at above the cost of production but below the monopoly price. And since the price they would buy it for is above the cost of production, this would be a profitable exchange for both sides! But alas, the monopoly doesn’t allow these trades to occur, as it would involve lowering the price for everybody, including those who are willing to pay a higher price, and thus decreasing net profit.

Things change as soon as another firm joins the market.
This firm can profitably sell the good at a lower price than the monopoly price and snatch up all of their business. This introduces a downward pressure on the price. Here’s the exact solution for the case of duopoly.

Duopoly: $n = 2$

$\begin{aligned} q_1^* &= \frac{1}{3b} (a - 2c_1 + c_2) \\ q_2^* &= \frac{1}{3b} (a + c_1 - 2c_2) \\ Q^* &= \frac{1}{3b} (2a - c_1 - c_2) \\ P^* &= \frac{1}{3} (a + c_1 + c_2) \\ \Pi_1^* &= \frac{1}{9b} (a - 2c_1 + c_2)^2 \\ \Pi_2^* &= \frac{1}{9b} (a + c_1 - 2c_2)^2 \end{aligned}$

Interestingly, in the duopoly case the market price still rests at a value above the marginal cost of production for either firm. As more and more firms enter the market, competition pushes the price down further and further until, in the limit of perfect competition, it converges to the cost of production.

The implication of this is that in the limit of perfect competition, firms do not make any profit! This may sound a little unintuitive, but it’s the inevitable consequence of the line of argument above. If a bunch of companies were all making some profit, then their price is somewhere above the cost of production. But this means that one company could slightly lower its price, thus snatching up all the customers and making massively more money than its competitors. So its competitors will all follow suit, pushing down their prices to get back their customers. And in the end, all the firms will have just decreased their prices and their profits, even though every step in the sequence appeared to be the rational and profitable action by each firm!

This is just an example of a coordination problem. If the companies could all just agree to hold their price fixed at, say, the monopoly price, then they’d all be better off. But each individual has a strong monetary incentive to lower their price and gather all the customers.
So the price will drop and drop until it can drop no more (that is, until it has reached the cost of production, at which point it is no longer profitable for a company to lower their price).

This implies that in some sense, the limit of perfect competition is the best possible outcome for consumers and the worst outcome for producers. Every consumer that values the product above the cost of its production will get it, and they will all get it at the lowest possible price. So the consumer surplus will be enormous. And companies producing the product make no net profit; any attempt to do so immediately loses them their entire customer base. (In which case, what is the motivation for the companies to produce the product in the first place? This is known as the Bertrand paradox.)

We can also get the easier-to-solve special case where all firms have the same cost of production.

Equal Production Costs: $\forall k \, (c_k = c)$

$\begin{aligned} q_k^* &= \frac{1}{n+1} \frac{a - c}{b} \\ Q^* &= \frac{n}{n+1} \frac{a - c}{b} \\ P^* &= \frac{a + nc}{n + 1} \\ \Pi_k^* &= \frac{1}{b} \left( \frac{a - c}{n+1} \right)^2 \end{aligned}$

It’s curious that in the Cournot model, prices don’t immediately drop to the cost of production as soon as you go from a monopoly to a duopoly. After all, the intuitive argument I presented before works for two firms: if both firms are pricing the goods at any value above the cost of production, then each stands to gain by lowering the price a slight bit and getting all the customers. And this continues until the price settles at the cost of production. We didn’t build any ability of the firms to collude into the model, so what gives? What the Cournot model tells us is certainly more realistic (we don’t expect a duopoly to behave like a perfectly competitive market), but where does this realism come from?

The answer is that in a certain sense we did build in collusion between firms from the start, in the form of agreement on what price to sell at.
Notice that our model did not allow different firms to set different prices. In this model, firms compete only on quantity of goods sold, not prices. The price is set automatically by the consumer demand function, and no single firm can unilaterally change its price. This constraint is what gives us the more realistic-in-character results that we see, and also what invalidates the intuitive argument I’ve made here.

One final observation. Consider the following procedure. You line up a representative from each of the n firms, as well as the highest bidder for the product (representing the highest price at which the product could be sold). Each of the firms states their cost of production (the lowest they could profitably bring the price to), and the highest bidder states the amount that he values the product (the highest price at which he would still buy it). Now all n + 1 stated values (the n costs and the highest bidder’s valuation) are averaged, and the result is set as the market price of the good. Turns out that this procedure gives exactly the market price that the linear Cournot model predicts! This might be meaningful or just a curious coincidence. But it’s quite surprising to me that the slope of the demand curve ($b$) doesn’t show up at all in the ultimate market price, only the value that the highest bidder puts on the product!

# Backwards induction and rationality

A fun problem I recently came across:

Consider two players: Alice and Bob. Alice moves first. At the start of the game, Alice has two piles of coins in front of her: one pile contains 4 coins and the other pile contains 1 coin. Each player has two moves available: either “take” the larger pile of coins and give the smaller pile to the other player or “push” both piles across the table to the other player. Each time the piles of coins pass across the table, the quantity of coins in each pile doubles.
For example, assume that Alice chooses to “push” the piles on her first move, handing the piles of 1 and 4 coins over to Bob, doubling them to 2 and 8. Bob could now use his first move to either “take” the pile of 8 coins and give 2 coins to Alice, or he can “push” the two piles back across the table again to Alice, again increasing the size of the piles to 4 and 16 coins. The game continues for a fixed number of rounds or until a player decides to end the game by pocketing a pile of coins.

(from the wiki)

(Assume that if the game gets to the final round and the last player decides to “push”, the pot is doubled and they get the smaller pile.)

Assuming that they are self-interested, what do you think is the rational strategy for each of Alice and Bob to adopt? What is the rational strategy if they each know that the other reasons about decision-making in the same way that they themselves do? And what happens if two updateless decision theorists are pitted against each other?

If you have some prior familiarity with game theory, you might have seen the backwards induction proof right away. It turns out that standard game theory teaches us that the Nash equilibrium is to defect as soon as you can, thus never exploiting the “doubling” feature of the setup.

Why? Supposing that you have made it to the final round of the game, you stand to get a larger payout by “defecting” and taking the larger pile rather than the doubled smaller pile. But your opponent knows that you’ll reason this way, so they reason that they are better off defecting the round before… and so on all the way to the first round.

This sucks. The game ends right away, and none of that exponential goodness gets taken advantage of. If only Alice and Bob weren’t so rational!

We can show that this conclusion follows as long as the following three things are true of Alice and Bob:

1. Given a choice between a definite value A and a smaller value B, both Alice and Bob will choose the larger value (A).
2. Both Alice and Bob can accurately perform deductive reasoning.
3. Both (1.) and (2.) are common knowledge to Alice and Bob.

It’s pretty hard to deny the reasonableness of any of these three assumptions!

Here’s a related problem:

An airline loses two suitcases belonging to two different travelers. Both suitcases happen to be identical and contain identical antiques. An airline manager tasked to settle the claims of both travelers explains that the airline is liable for a maximum of $100 per suitcase; he is unable to find out directly the price of the antiques.

To determine an honest appraised value of the antiques, the manager separates both travelers so they can’t confer, and asks them to write down the amount of their value at no less than $2 and no larger than $100. He also tells them that if both write down the same number, he will treat that number as the true dollar value of both suitcases and reimburse both travelers that amount. However, if one writes down a smaller number than the other, this smaller number will be taken as the true dollar value, and both travelers will receive that amount along with a bonus/malus: $2 extra will be paid to the traveler who wrote down the lower value and a $2 deduction will be taken from the person who wrote down the higher amount. The challenge is: what strategy should both travelers follow to decide the value they should write down?

(again, from the wiki)

Suppose you put no value on honesty, and only care about getting the most money possible. Further, suppose that both travelers reason the same way about decision problems, and that they both know this fact (and that they both know that they both know this fact, and so on).

The first intuition you might have is that both should just write down $100. But if you know that your partner is going to write down $100, then you stand to gain one whole dollar by defecting and writing $99 (thus collecting the $2 bonus for a total of $101). But if they know that you’re going to write $99, then they stand to gain one whole dollar by defecting and writing $98 (thus netting $100). And so on.
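
The undercutting chain above can be traced mechanically. Here's a minimal sketch that encodes the payout rule from the problem statement and repeatedly replaces a claim with the best response to it; the function names are hypothetical, and restricting claims to whole dollars is my simplifying assumption.

```python
def payoff(mine, theirs, bonus=2):
    """Reimbursement I receive given both written claims."""
    if mine == theirs:
        return mine
    if mine < theirs:
        return mine + bonus   # I wrote the lower value: lower value plus bonus
    return theirs - bonus     # I wrote the higher value: lower value minus malus


def best_response(theirs, low=2, high=100):
    """The whole-dollar claim that maximizes my payoff against a fixed opposing claim."""
    return max(range(low, high + 1), key=lambda mine: payoff(mine, theirs))


def undercutting_path(start=100):
    """Follow best responses from `start` until a claim is a best response to itself."""
    path = [start]
    while True:
        nxt = best_response(path[-1])
        if nxt == path[-1]:
            return path
        path.append(nxt)


path = undercutting_path()
print(path[:4], "...", path[-3:])
```

Starting from $100, each best response undercuts the previous claim by exactly one dollar, and the chain only stops once it reaches the minimum claim.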

In the end both of these unfortunate “rational” individuals end up writing down $2. Once again, we see the tragedy of being a rational individual.

Of course, we could take these thought experiments to be an indication not of the inherent tragedy of rationality, but instead of the need for a better theory of rationality. For instance, you might have noticed that the arguments we used in both cases relied on a type of reasoning where each agent assumes that they can change their decision, holding fixed the decision of the other agent. This is not a valid move in general, as it assumes independence! It might very well be that the information about what decision you make is relevant to your knowledge about what the other agent’s decision will be. In fact, when we stipulated that you reason similarly to the other agent, we were in essence stipulating an evidential relationship between your decision and theirs! So the arguments we gave above need to be looked at more closely.

If the agents do end up taking into account their similarity, then their behavior is radically different. For example, we can look at the behavior of updateless decision theory: two UDTs playing each other in the Centipede game “push” every single round (including the final one!), thus ending up with exponentially higher rewards (on the order of $2^N$ coins, where N is the number of rounds). And two UDTs in the Traveller’s Dilemma would write down $100, thus both ending up roughly $98 better off than otherwise. So perhaps we aren’t doomed to a gloomy view of rationality as a burden eternally holding us back!
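
To make the contrast concrete for the coin-pushing game, here's a small sketch that runs backward induction on the rules as stated (piles of 4 and 1, doubling on every push, and a final-round push handing the pusher the smaller doubled pile) and compares it with what happens if both players simply push every round. I'm not modeling UDT itself here; `always_push` only computes the payoffs of the all-push outcome, and both function names are hypothetical.

```python
def backward_induction(total_rounds, round_no=1, large=4, small=1):
    """Return (mover's payoff, other player's payoff, mover's choice) from this round on."""
    take = (large, small, "take")
    if round_no == total_rounds:
        # Final-round push: the pot doubles and the pusher gets the smaller pile.
        push = (2 * small, 2 * large, "push")
    else:
        # After a push the piles double and the other player becomes the mover.
        their_payoff, my_payoff, _ = backward_induction(
            total_rounds, round_no + 1, 2 * large, 2 * small
        )
        push = (my_payoff, their_payoff, "push")
    return take if take[0] >= push[0] else push


def always_push(total_rounds, large=4, small=1):
    """Payoffs (final mover, other player) if both players push every round."""
    factor = 2 ** total_rounds
    return small * factor, large * factor


print(backward_induction(6))
print(always_push(6))
```

With six rounds, backward induction has the first mover take immediately for payoffs of 4 and 1, while pushing throughout leaves the players with 64 and 256 coins.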

One final problem.

Two players, this time with just one pile of coins in front of them. Initially this pile contains just 1 coin. The players take turns, and each turn they can either take the whole pile or push it to the other side, in which case the size of the pile will double. This will continue for a fixed number of rounds or until a player ends the game by taking the pile.

On the final round, the last player has a choice of either taking all the coins or pushing them over, thus giving the entire doubled pile to their opponent. Both players are perfectly self-interested, and this fact is common knowledge. And finally, suppose that who goes first is determined by a coin flip.

Standard decision theory obviously says that the first person should just take the 1 coin and the game ends there. What would UDT do here? What do you think is the rational policy for each player?
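For reference, the standard backward-induction verdict can be checked with a short sketch. I assume the rules exactly as stated above (the pile starts at 1 coin, doubles on each push, and a push on the final round hands the doubled pile to the opponent). The function names are illustrative. The second function only tabulates what an all-push profile would pay; it is not a claim about what UDT recommends.

```python
def backward_induction(round_no: int, rounds: int, pile: int = 1):
    """Return (action, (mover_payoff, other_payoff)) for the player moving now."""
    take = (pile, 0)
    if round_no == rounds:
        # Pushing on the final round gives the doubled pile to the opponent.
        push = (0, 2 * pile)
    else:
        _, (next_mover, next_other) = backward_induction(
            round_no + 1, rounds, 2 * pile
        )
        # After a push the roles swap: today's mover is tomorrow's "other".
        push = (next_other, next_mover)
    return ("take", take) if take[0] >= push[0] else ("push", push)


def always_push(rounds: int):
    """Payoffs (first mover, second mover) if everyone pushes every round."""
    final_pile = 2 ** rounds
    # The player who does NOT make the final move receives the pile.
    return (final_pile, 0) if rounds % 2 == 0 else (0, final_pile)


if __name__ == "__main__":
    print(backward_induction(1, 10))
    print(always_push(10))
```

Since a coin flip decides who goes first, each player's expected payoff before the flip is half of the total in either profile, which is one way to frame the comparison the questions above are asking about.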

# Consistently reflecting on decision theory

There are many principles of rational choice that seem highly intuitively plausible at first glance. Sometimes upon further reflection we realize that a principle we initially endorsed is not quite as appealing as we first thought, or that it clashes with other equally plausible principles, or that it requires certain exceptions and caveats that were not initially apparent. We can think of many debates in philosophy as clashes of these general principles, where the thought experiments generated by philosophers serve as the data that put on display their relative merits and deficits. In this essay, I’ll explore a variety of different principles for rational decision making and consider the ways in which they satisfy and frustrate our intuitions. I will focus especially on the notion of reflective consistency, and see what sort of decision theory results from treating this as our primary desideratum.

I want to start out by illustrating the back-and-forth between our intuitions and the general principles we formulate. Consider the following claim, known as the dominance principle: If a rational agent believes that doing A is better than not doing A in every possible world, then that agent should do A even if uncertain about which world they are in. Upon first encountering this principle, it seems perfectly uncontroversial and clearly valid. But now consider the following application of the dominance principle:

“A student is considering whether to study for an exam. He reasons that if he will pass the exam, then studying is wasted effort. Also, if he will not pass the exam, then studying is wasted effort. He concludes that because whatever will happen, studying is wasted effort, it is better not to study.” (Titelbaum 237)

The fact that this is clearly a bad argument casts doubt on the dominance principle. It is worth taking a moment to ask what went wrong here. How did this example turn our intuitions on their heads so completely? Well, the flaw in the student’s line of reasoning was that he was ignoring the effect of his studying on whether or not he ends up passing the exam. This dependency between his action and the possible world he ends up in should be relevant to his decision, and it apparently invalidates his dominance reasoning.

A restricted version of the dominance principle fixes this flaw: If a rational agent prefers doing A to not doing A in all possible worlds and which world they are in is independent of whether they do A or not, then that agent should do A even if they are uncertain about which world they are in. I’ll call this the simple dominance principle. This principle is much harder to disagree with than our starting principle, but the caveat about independence greatly limits its scope. It applies only when our uncertainty about the state of the world is independent of our decision, which is not the case in most interesting decision problems. We’ll see by the end of this essay that even this seemingly obvious principle can be made to conflict with another intuitively plausible principle of rational choice.
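To see concretely why the independence caveat matters, here is a small sketch of the student example. The utilities and pass probabilities are invented for illustration (passing is worth 10, studying costs 1, and studying raises the chance of passing from 0.3 to 0.9); only the structure comes from the example above.

```python
# Illustrative utilities (assumed): passing is worth 10, studying costs 1.
UTILITY = {
    ("study", "pass"): 9, ("study", "fail"): -1,
    ("slack", "pass"): 10, ("slack", "fail"): 0,
}

# Assumed chance of passing, which depends on the act chosen.
P_PASS = {"study": 0.9, "slack": 0.3}


def dominates(a: str, b: str) -> bool:
    """True if act `a` is strictly better than act `b` in every state."""
    return all(UTILITY[(a, s)] > UTILITY[(b, s)] for s in ("pass", "fail"))


def expected_utility(act: str) -> float:
    """Expected utility using the act-dependent probability of passing."""
    p = P_PASS[act]
    return p * UTILITY[(act, "pass")] + (1 - p) * UTILITY[(act, "fail")]


if __name__ == "__main__":
    print("slacking dominates studying:", dominates("slack", "study"))
    print("EU(study) =", expected_utility("study"))
    print("EU(slack) =", expected_utility("slack"))
```

Slacking is better state by state, yet studying has the higher expected utility, because the act changes which state is likely. If `P_PASS` were the same for both acts, the dominance verdict and the expected-utility verdict would agree.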

The process of honing our intuitions and fine-tuning our principles like this is sometimes called seeking reflective consistency, where reflective consistency is the hypothetical state you end up in after a sufficiently long period of consideration. Reflective consistency is achieved when you have reached a balance between your starting intuitions and other meta-level desiderata like consistency and simplicity, such that your final framework is stable against further intuition-pumping. This process has parallels in other areas of philosophy such as ethics and epistemology, but I want to suggest that it is particularly potent when applied to decision theory. The reason for this is that a decision theory makes recommendations for what action to take in any given setup, and we can craft setups where the choice to be made is about what decision theory to adopt. I’ll call these setups self-reflection problems. By observing what choices a decision theory makes in self-reflection problems, we get direct evidence about whether the decision theory is reflectively consistent or not. In other words, we don’t need to do all the hard work of allowing thought experiments to bump our intuitions around; we can just take a specific decision algorithm and observe how it behaves upon self-reflection!

What we end up with is the following principle: Whatever decision theory we end up endorsing should be self-recommending. We should not end up in a position where we endorse decision theory X as the final best theory of rational choice, but then decision theory X recommends that we abandon it for some other decision theory that we consider less rational.

The connection between self-recommendation and reflective consistency is worth fleshing out in a little more detail. I am not saying that self-recommendation is sufficient for reflective consistency. A self-recommending decision theory might be obviously in contradiction with our notion of rational choice, such that any philosopher considering this decision theory would immediately discard it as a candidate. Consider, for instance, alphabetical decision theory, which always chooses the option which comes alphabetically first in its list of choices. When faced with a choice between alphabetical decision theory and, say, evidential decision theory, alphabetical decision theory will presumably choose itself, reasoning that ‘a’ comes before ‘e’. But we don’t want to call this a virtue of alphabetical decision theory. Even if it is uniformly self-recommending, alphabetical decision theory is sufficiently distant from any reasonable notion of rational choice that we can immediately discard it as a candidate.

On the other hand, even though not all self-recommending theories are reflectively consistent, any reflectively consistent decision theory must be self-recommending. Self-recommendation is a necessary but not sufficient condition for an adequate account of rational choice.

Now, it turns out that this principle is too strong as I’ve phrased it and requires a few caveats. One issue with it is what I’ll call the problem of unfair decision problems. For example, suppose that we are comparing evidential decision theory (henceforth EDT) to causal decision theory (henceforth CDT). (For the sake of time and space, I will assume as background knowledge the details of how each of these theories works.) We put each of them up against the following self-reflection problem:

An omniscient agent peeks into your brain. If they see that you are an evidential decision theorist, they take all your money. Otherwise they leave you alone. Before they peek into your brain, you have the ability to modify your psychology such that you become either an evidential or causal decision theorist. What should you do?

EDT reasons as follows: If I stay an EDT, I lose all my money. If I self-modify to CDT, I don’t. I don’t want to lose all my money, so I’ll self-modify to CDT. So EDT is not self-recommending in this setup. But clearly this is just because the setup is unfairly biased against EDT, not because of any intrinsic flaw in EDT. In fact, it’s a virtue of a decision theory to not be self-recommending in such circumstances, as doing so indicates a basic awareness of the payoff structure of the world it faces.

While this certainly seems like the right thing to say about this particular decision problem, we need to consider how exactly to formalize this intuitive notion of “being unfairly biased against a decision theory.” There are a few things we might say here. For one, the distinguishing feature of this setup seems to be that the payout is determined not based off the decision made by an agent, but by their decision theory itself. This seems to be at the root of the intuitive unfairness of the problem; EDT is being penalized not for making a bad decision, but simply for being EDT. A decision theory should be accountable for the decisions it makes, not for simply being the particular decision theory that it happens to be.

In addition, by swapping “evidential decision theory” and “causal decision theory” everywhere in the setup, we end up arriving at the exact opposite conclusion (evidential decision theory looks stable, while causal decision theory does not). As long as we don’t have any a priori reason to consider one of these setups more important to take into account than the other, then there is no net advantage of one decision theory over the other. If a decision problem belongs to a set of equally a priori important problems obtained by simply swapping out the terms for different decision theories, and no decision theory comes out ahead on the set as a whole, then perhaps we can disregard the entire set for the purposes of evaluating decision theories.

The upshot of all of this is that what we should care about is decision problems that don’t make any direct reference to a particular decision theory, only to decisions. We’ll call such problems decision-determined. Our principle then becomes the following: Whatever decision theory we end up endorsing should be self-recommending in all decision-determined problems.

There’s certainly more to be said about this principle and whether any other caveats need to be applied to it, but for now let’s move on to seeing what we end up with when we apply this principle in its current state. We’ll start out with an analysis of the infamous Newcomb problem.

You enter a room containing two boxes, one opaque and one transparent. The transparent box contains $1,000. The opaque box contains either $0 or $1,000,000. Your choice is to either take just the opaque box (one-box) or to take both boxes (two-box). Before you entered the room, a predictor scanned your brain and created a simulation of you to see what you would do. If the simulation one-boxed, then the predictor filled the opaque box with $1,000,000. If the simulation two-boxed, then the opaque box was left empty. What do you choose?

EDT reasons as follows: If I one-box, then this gives me strong evidence that I have the type of brain that decides to one-box, which gives me strong evidence that the predictor’s simulation of me one-boxed, which in turn gives me strong evidence that the opaque box is full. So if I one-box, I expect to get $1,000,000. On the other hand, if I two-box, then this gives me strong evidence that my simulation two-boxed, in which case the opaque box is empty. So if I two-box, I expect to get only $1,000. Therefore one-boxing is better than two-boxing.

CDT reasons as follows: Whether the opaque box is full or empty was already determined by the time I entered the room, so my decision has no causal effect upon the box’s contents. And regardless of the contents of the box, I always expect to leave $1,000 richer by two-boxing than by one-boxing. So I should two-box.

At this point it’s important to ask whether Newcomb’s problem is a decision-determined problem. After all, the predictor decides whether to fill the opaque box by scanning your brain and simulating you. Isn’t that suspiciously similar to our earlier example of penalizing agents based off their decision theory? No. The predictor decides what to do not by evaluating your decision theory, but by its prediction about your decision. You aren’t penalized for being a CDT, just for being the type of agent that two-boxes. To see this you only need to observe that any decision theory that two-boxes would be treated identically to CDT in this problem. The determining factor is the decision, not the decision theory.

Now, let’s make the Newcomb problem into a test of reflective consistency. Instead of your choice being about whether to one-box or to two-box while in the room, your choice will now take place before you enter the room, and will be about whether to be an evidential decision theorist or a causal decision theorist when in the room. What does each theory do?

EDT’s reasoning: If I choose to be an evidential decision theorist, then I will one-box when in the room. The predictor will simulate me as one-boxing, so I’ll end up walking out with $1,000,000. If I choose to be a causal decision theorist, then I will two-box when in the room, the predictor will predict this, and I’ll walk out with only $1,000. So I will stay an EDT.

Interestingly, CDT agrees with this line of reasoning. The decision to be an evidential or causal decision theorist has a causal effect on how the predictor’s simulation behaves, so a causal decision theorist sees that the decision to stay a causal decision theorist will end up leaving them worse off than if they had switched over. So CDT switches to EDT. Notice that in CDT’s estimation, the decision to switch ends up making them $999,000 better off. This means that CDT would pay up to $999,000 just for the privilege of becoming an evidential decision theorist!

I think that looking at an actual example like this makes it more salient why reflective consistency and self-recommendation are things that we actually care about. There’s something very obviously off about a decision theory that knows beforehand that it will reliably perform worse than its opponent, so much so that it would be willing to pay up to $999,000 just for the privilege of becoming its opponent. This is certainly not the type of behavior that we associate with a rational agent that trusts itself to make good decisions.
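The two lines of reasoning, and the $999,000 figure, can be reproduced in a few lines. This is a sketch under stated assumptions: the `accuracy` parameter (how often the simulation matches your actual choice) is my addition and defaults to a perfect predictor, and `p_full` stands for whatever fixed credence CDT assigns to the opaque box being full.

```python
SMALL = 1_000        # transparent box
BIG = 1_000_000      # opaque box, if filled


def payout(action: str, prediction: str) -> int:
    """Money received given the actual action and the predictor's prediction."""
    opaque = BIG if prediction == "one-box" else 0
    return opaque + (SMALL if action == "two-box" else 0)


def edt_value(action: str, accuracy: float = 1.0) -> float:
    """EDT conditions on the action: the prediction matches it with prob `accuracy`."""
    other = "two-box" if action == "one-box" else "one-box"
    return accuracy * payout(action, action) + (1 - accuracy) * payout(action, other)


def cdt_value(action: str, p_full: float) -> float:
    """CDT holds the box contents fixed: full with prob `p_full` whatever I do."""
    return p_full * payout(action, "one-box") + (1 - p_full) * payout(action, "two-box")


if __name__ == "__main__":
    print("EDT:", edt_value("one-box"), "vs", edt_value("two-box"))
    for p in (0.0, 0.5, 1.0):
        print("CDT gain from two-boxing at p_full =", p, ":",
              cdt_value("two-box", p) - cdt_value("one-box", p))
    # Value of entering the room as a one-boxer rather than a two-boxer.
    print("switch value:", edt_value("one-box") - edt_value("two-box"))
```

Inside the room, CDT's advantage for two-boxing is $1,000 at every value of `p_full`. Before the room, the choice of disposition changes the prediction itself, which is where the $999,000 difference comes from.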

Classically, this argument has been phrased in the literature as the “why ain’tcha rich?” objection to CDT, but I think that the objection goes much deeper than this framing would suggest. There are several plausible principles that all apply here, such as that a rational decision maker shouldn’t regret having the decision theory they have, a rational decision maker shouldn’t pay to limit their future options, and a rational decision maker shouldn’t pay to decrease the values in their payoff matrix. The first of these is fairly self-explanatory. One influential response to it has been from James Joyce, who said that the causal decision theorist does not regret their decision theory, just the situation they find themselves in. I’d suggest that this response makes little sense when the situation they find themselves in is a direct result of their decision theory. As for the second and third of these, we could imagine giving a causal decision theorist the choice to pay money to remove the future possibility of two-boxing, or to hire a gunman that would shoot them if they try to take the transparent box. In each of these cases, CDT would endorse the decision to pay. I mention these other principles just to suggest that we have sound philosophical reason to care about self-recommendation independent of the reflective consistency considerations that we started out by discussing.

The takeaway from this modified Newcomb problem is that CDT is not reflectively consistent, and that it will self-modify to EDT in the right circumstances. Does this mean that EDT is more reflectively consistent than CDT? It turns out that no, this is not the case. We can construct another thought experiment in which EDT self-modifies to become CDT, called the Newcomb’s Soda problem:

You were just given either Soda 1 or Soda 2, which are indistinguishable. There was a 50% chance of each being the type you were given. You are now given a choice as to whether to choose chocolate ice cream or vanilla ice cream. Those that had Soda 1 have a strong unconscious inclination to choose chocolate ice cream and will be given $1,000,000 after they choose the ice cream. Those that had Soda 2 have a strong unconscious inclination to choose vanilla ice cream and are given nothing. If you choose vanilla ice cream, you get $1,000. What do you choose?

EDT reasons that choosing chocolate ice cream gives them strong evidence that they were given Soda 1, in which case they will be given $1,000,000. So they would be willing to give up the $1,000 reward for choosing vanilla ice cream, in order to increase their chances of getting the million.

CDT reasons that what soda they were given is a fixed fact that isn’t causally influenced by the decision they make. So they have a 50% chance of getting the million regardless of their choice, but choosing to take the vanilla guarantees them an additional $1,000. Therefore they take vanilla.

This is clearly a decision-determined problem, so let’s see how EDT and CDT do upon self-reflection. Given the chance to modify themselves to EDT before being given the soda, a causal decision theorist would see no advantage to doing so. After all, being an evidential decision theorist doesn’t make you more or less likely to get the $1,000,000, it just ensures that you won’t get the $1,000. And the evidential decision theorist agrees! By changing to causal decision theory, they guarantee themselves an extra $1,000 while not making it any more or less likely that they’ll be given Soda 1 (and accordingly, the million dollars).
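Here is the same comparison as a sketch. The problem only says the inclination is “strong”, so the value 0.9 for `INCLINATION` is an assumption of mine, as are the function names.

```python
PRIOR_SODA1 = 0.5
INCLINATION = 0.9    # assumed P(chocolate | Soda 1) = P(vanilla | Soda 2)
MILLION = 1_000_000
VANILLA_BONUS = 1_000


def p_soda1_given(choice: str) -> float:
    """Bayes' rule: how strongly the ice cream choice indicates Soda 1."""
    like1 = INCLINATION if choice == "chocolate" else 1 - INCLINATION
    like2 = 1 - INCLINATION if choice == "chocolate" else INCLINATION
    return like1 * PRIOR_SODA1 / (like1 * PRIOR_SODA1 + like2 * (1 - PRIOR_SODA1))


def bonus(choice: str) -> int:
    return VANILLA_BONUS if choice == "vanilla" else 0


def edt_value(choice: str) -> float:
    """EDT treats its own choice as evidence about which soda it drank."""
    return p_soda1_given(choice) * MILLION + bonus(choice)


def cdt_value(choice: str) -> float:
    """CDT holds the soda fixed at its prior, whatever the choice."""
    return PRIOR_SODA1 * MILLION + bonus(choice)


if __name__ == "__main__":
    for choice in ("chocolate", "vanilla"):
        print(choice, "EDT:", edt_value(choice), "CDT:", cdt_value(choice))
```

Viewed from before the soda is handed out, the soda does not depend on which decision theory you hold, so `cdt_value` is also the payoff each policy actually expects. That is why both theories agree that switching to CDT is worth an extra $1,000.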

The upshot of this is that both CDT and EDT are reflectively inconsistent in the class of decision-determined problems. What we seek, then, is a new decision theory that behaves like EDT in the Newcomb problem and like CDT in Newcomb’s Soda. One such theory was pioneered by machine learning researcher Eliezer Yudkowsky, who named it timeless decision theory (henceforth TDT). To deliver different verdicts in the two problems, we must find some feature that allows us to distinguish between their structure. TDT does this by distinguishing between the type of correlation arising from ordinary common causes (like the soda in Newcomb’s Soda) and the type of correlation arising from faithful simulations of your behavior (as in Newcomb’s problem).

This second type of correlation is called logical dependence, and is the core idea motivating TDT. The simplest example of this is the following: two twins, physically identical down to the atomic level, raised in identical environments in a deterministic universe, will have perfectly correlated behavior throughout the lengths of their lives, even if they are entirely causally separated from each other. This correlation is apparently not due to a common cause or to any direct causal influence. It simply arises from the logical fact that two faithful instantiations of the same function will return the same output when fed the same input. Considering the behavior of a human being as an instantiation of an extremely complicated function, it becomes clear why you and your parallel-universe twin behave identically: you are instantiations of the same function! We can take this a step further by noting that two functions can have a similar input-output structure, in which case the physical instantiations of each function will have correlated input-output behavior. This correlation is what’s meant by logical dependence.

To spell this out a bit further, imagine that in a far away country, there are factories that sell very simple calculators. Each calculator is designed to run only one specific computation. Some factories are multiplication-factories; they only sell calculators that compute 713*291. Others are addition-factories; they only sell calculators that compute 713+291. You buy two calculators from one of these factories, but you’re not sure which type of factory you purchased from. Your credences are 50/50 split between the factory you purchased from being a multiplication-factory and it being an addition-factory. You also have some logical uncertainty regarding what the value of 713*291 is. You are evenly split between the value being 207,481 and the value being 207,483. On the other hand, you have no uncertainty about what the value of 713+291 is; you know that it is 1004.

Now, you press “ENTER” on one of the two calculators you purchased, and find that the result is 207,483. For a rational reasoner, two things should now happen: First, you should treat this result as strong evidence that the factory from which both calculators were bought was a multiplication-factory, and therefore that the other calculator is also a multiplier. And second, you should update strongly on the other calculator outputting 207,483 rather than 207,481, since two calculators running the same computation will output the same result.

The point of this example is that it clearly separates out ordinary common cause correlation from a different type of dependence. The common cause dependence is what warrants you updating on the other calculator being a multiplier rather than an adder. But it doesn’t warrant you updating on the result on the other calculator being specifically 207,483; to do this, we need the notion of logical dependence, which is the type of dependence that arises whenever you encounter systems that are instantiating the same or similar computations.
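The two kinds of update can be separated in a small enumeration. This is a sketch: the `shared_logic` flag is my device for contrasting a reasoner who recognizes that both calculators run the same computation with one who only tracks the common cause (the factory) and treats each multiplier’s output as a separate 50/50 guess.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)
CANDIDATES = (207_481, 207_483)   # the two values you consider for 713*291
SUM = 1004                        # 713+291, known with certainty


def joint(shared_logic: bool):
    """List of (probability, output_1, output_2) over all hypotheses."""
    worlds = [(HALF, SUM, SUM)]   # addition-factory
    if shared_logic:
        # One logical fact fixes both multipliers' outputs.
        worlds += [(HALF * HALF, v, v) for v in CANDIDATES]
    else:
        # Common cause only: each output is guessed independently.
        worlds += [(HALF * HALF * HALF, v1, v2)
                   for v1, v2 in product(CANDIDATES, repeat=2)]
    return worlds


def p_second(target: int, observed_first: int, shared_logic: bool) -> Fraction:
    """P(second calculator shows `target` | first calculator showed `observed_first`)."""
    worlds = joint(shared_logic)
    evidence = sum(p for p, o1, _ in worlds if o1 == observed_first)
    hit = sum(p for p, o1, o2 in worlds
              if o1 == observed_first and o2 == target)
    return hit / evidence


if __name__ == "__main__":
    print("with logical dependence:", p_second(207_483, 207_483, True))
    print("common cause only:      ", p_second(207_483, 207_483, False))
```

In both models the observation rules out the addition-factory. Only the model with shared logic goes on to pin down the other calculator’s output.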

Connecting this back to decision theory, TDT treats our decision as the output of a formal algorithm, which is our decision-making process. The behavior of this algorithm is entirely determined by its logical structure, which is why there are no upstream causal influences such as the soda in Newcomb’s Soda. But the behavior of this algorithm is going to be correlated with the parts of the universe that instantiate a similar function (as well as the parts of the universe it has a causal influence on). In Newcomb’s problem, for example, the predictor generates a detailed simulation of your decision process based off of a brain scan. This simulation of you is highly logically correlated with you, in that it will faithfully reproduce your behavior in a variety of situations. So if you decide to one-box, you are also learning that your simulation is very likely to one-box (and therefore that the opaque box is full).

Notice that the exact mechanism by which the predictor operates becomes very important for TDT. If the predictor operates by means of some ordinary common cause where no logical dependence exists, TDT will treat its prediction as independent of your choice. This translates over to why TDT behaves like CDT on Newcomb’s Soda, as well as other so-called “medical Newcomb problems” such as the smoking lesion problem. When the reason for the correlation between your behavior and the outcome is merely that both depend on a common input, TDT treats your decision as an intervention and therefore independent of the outcome.

One final way to conceptualize TDT and the difference between the different types of correlation is using structural equation modeling:

Direct causal dependence exists between A and B when A is a function of B or when B is a function of A.
> A = f(B) or B = g(A)

Common cause dependence exists between A and B when A and B are both functions of some other variable C.
> A = f(C) and B = g(C)

Logical dependence exists between A and B when A and B depend on their inputs in similar ways.
> A = f(C) and B = f(D)

TDT takes direct causal dependence and logical dependence seriously, and ignores common cause dependence. We can formally express this by saying that TDT calculates the expected utility of a decision by treating it like a causal intervention and fixing the output of all other instantiations of TDT to be identical interventions. Using Judea Pearl’s do-calculus notation for causal intervention, this looks like:
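One way to write this down is the following sketch. It is reconstructed from the surrounding description rather than offered as a definitive statement; the symbols U (the utility of a world w) and P (the agent’s probability distribution) are notational assumptions of mine.

```latex
\mathrm{TDT}(K) \;=\; \operatorname*{arg\,max}_{D} \sum_{w} U(w)\,
  P\bigl(w \mid \mathrm{do}(\mathrm{TDT}(K) = D),\, K\bigr)
```

The intervention is on the output of the algorithm itself, so it sets the output of every system that instantiates the same computation, not just the one physical agent.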

Here K is the TDT agent’s background knowledge, D is chosen from a set of possible decisions, and the sum is over all possible worlds. This equation isn’t quite right, since it doesn’t indicate what to do when the computation a given system instantiates is merely similar to TDT but not logically identical, but it serves as a first approximation to the algorithm.

You might notice that the notion of logical dependence depends on the idea of logical uncertainty, as without it the result of the computations would be known with certainty as soon as you learn that the calculators came out of a multiplication-factory, without ever having to observe their results. Thus any theory that incorporates logical dependence into its framework will be faced with a problem of logical omniscience, which is to say, it will have to give some account of how to place and update reasonable probability distributions over tautologies.

The upshot of all of this is that TDT is reflectively consistent on a larger class of problems than both EDT and CDT. Both EDT and CDT would self-modify into TDT in Newcomb-like problems if given the choice. Correspondingly, if you throw a bunch of TDTs and EDTs and CDTs into a world full of Newcomb and Newcomb-like problems, the TDTs will come out ahead. However, it turns out that TDT is not itself reflectively consistent on the whole class of decision-determined problems. Examples like the transparent Newcomb problem, Parfit’s hitchhiker, and counterfactual mugging all expose reflective inconsistency in TDT.

Let’s look at the transparent Newcomb problem. The structure is identical to a Newcomb problem (you walk into a room with two boxes, $1,000 in one and either $1,000,000 or $0 in the other, determined based on the behavior of your simulation), except that both boxes are transparent. This means that you already know with certainty the contents of both boxes. CDT two-boxes here like always. EDT also two-boxes, since any dependence between your decision and the box’s contents is made irrelevant as soon as you see the contents. TDT agrees with this line of reasoning; even though it sees a logical dependence between your behavior and your simulation’s behavior, knowing whether the box is full or empty fully screens off this dependence.

Two-boxing feels to many like the obvious rational choice here. The choice you face is simply whether to take $1,000,000 or $1,001,000 if the box is full. If it’s empty, your choice is between taking $1,000 or walking out empty-handed. But two-boxing also has a few strange consequences. For one, imagine that you are placed, blindfolded, in a transparent Newcomb problem. You can at any moment decide to remove your blindfold. If you are an EDT, you will reason that if you don’t remove your blindfold, you are essentially in an ordinary Newcomb problem, so you will one-box and correspondingly walk away with $1,000,000. But if you do remove your blindfold, you’ll end up two-boxing and most likely walking away with only $1,000. So an EDT would pay up to $999,000, just for the privilege of staying blindfolded.

This seems to conflict with an intuitive principle of rational choice, which goes something like: A rational agent should never expect to be worse off by simply gaining information. Paying money to keep yourself from learning relevant information seems like a sure sign of a pathological decision theory.

Of course, there are two ways out of this. One way is to follow the causal decision theorist and two-box in both the ordinary Newcomb problem and the transparent problem. This has all the issues that we’ve already discussed, most prominently that you end up systematically and predictably worse off by doing so. If you pit a causal decision theorist against an agent that always one-boxes, even in transparent Newcomb problems, CDT ends up the poorer. And since CDT can reason this through beforehand, they would willingly self-modify to become the other type of agent.

The other way out is to be that other type of agent, one that one-boxes in both problems. What type of agent is this? None of the three decision theories we’ve discussed gives the reflectively consistent response here, so we need to invent a new decision theory. The difficulty with any such theory is that it has to be able to justify sticking to its guns and one-boxing even after conditioning on the contents of the box.

In general, similar issues will arise whenever the recommendations made by a decision theory are not time-consistent. For instance, the decision that TDT prescribes for an agent with background knowledge K depends heavily on the information that TDT has at the time of prescription. This means that at different times, TDT will make different recommendations for what to do in the same situation (before entering the room TDT recommends one-boxing once in the room, while after entering the room TDT recommends two-boxing). This leads to suboptimal performance. Agents that can decide on one course of action and credibly precommit to it get certain benefits that aren’t available to agents that don’t have this ability. I think the clearest example of this is Parfit’s hitchhiker:

You are stranded in the desert, running out of water, and soon to die. A Predictor approaches and tells you that they will drive you to town only if they predict you will pay them $100 once you get there.

All of EDT, CDT, and TDT wish that they could credibly precommit to paying the money once in town, but can’t. Once they are in town they no longer have any reason to pay the $100. The fact that EDT, CDT, and TDT all have time-sensitive recommendations makes them worse off, leaving them all stranded in the desert to die. Each of these agents would willingly switch to a decision theory that doesn’t change its recommendations over time.

How would such a decision theory work? It looks like we need a decision theory that acts as if it doesn’t know whether it’s in town even once in town, and acts as if it doesn’t know the contents of the box even after seeing them. One strategy for achieving this behavior is simple; you just decide on your strategy without ever conditioning on the fact that you are in town! The decision theory that arises from this choice is appropriately named updateless decision theory (henceforth UDT).

UDT is peculiar in that it never actually updates on any information when determining how to behave. That’s not to say that for UDT, the decision you make does not depend on the information you get through your lifetime. Instead, UDT tells you to choose a policy (a mapping from the possible pieces of information you might receive to possible decisions you could make) that maximizes the expected utility, calculated using your prior on possible worlds. This policy is set from time zero and never changes, and it determines how the UDT agent responds to any information they might receive at later points. So, for instance, a UDT agent reasons that adopting a policy of one-boxing in the transparent Newcomb case regardless of what you see maximizes expected utility as calculated using your prior. So once the UDT agent is in the room with the transparent box, it one-boxes.

We can formalize this by analogy with TDT: the maximization now ranges over policies rather than individual decisions, and the probabilities come from the prior rather than from updated background knowledge.

One concern with this approach is that a UDT agent might end up making silly decisions as a result of not taking into account information that is relevant to their decisions. But once again, the UDT agent does take into account the information they learn in their lifetime. It’s merely that they decide what to do with that information before receiving it and never update this prescription.

For example, suppose that a UDT agent anticipates facing exactly one decision problem in their life, regarding whether to push a button or not. They have a 50% prior credence that pushing the button will result in the loss of $10, and 50% that it will result in gaining $10. Now, at some point before they decide to push the button, they are given the information about whether pushing the button causes you to gain $10 or to lose $10. UDT deals with this by choosing a policy for how to respond to that information in either case. The expected utility maximizing policy here would be to push the button if you learn that pushing the button leads to gaining $10, and to not push the button if you learn the opposite.
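The button example is small enough to enumerate every policy. This is a sketch of policy selection under the prior, not a full implementation of UDT; the world labels and function names are my own.

```python
from itertools import product

# Prior over worlds and the payoff of pushing the button in each.
PRIOR = {"button_gains": 0.5, "button_loses": 0.5}
PUSH_PAYOFF = {"button_gains": 10, "button_loses": -10}
ACTIONS = ("push", "refrain")


def policy_value(policy: dict) -> float:
    """Expected utility of a policy, computed with the prior (no updating)."""
    total = 0.0
    for world, prob in PRIOR.items():
        observation = world          # the agent is told which world it is in
        if policy[observation] == "push":
            total += prob * PUSH_PAYOFF[world]
    return total


def all_policies():
    """Every mapping from possible observations to actions."""
    observations = list(PRIOR)
    for acts in product(ACTIONS, repeat=len(observations)):
        yield dict(zip(observations, acts))


def best_policy() -> dict:
    return max(all_policies(), key=policy_value)


if __name__ == "__main__":
    for policy in all_policies():
        print(policy, policy_value(policy))
    print("chosen:", best_policy())
```

The chosen policy responds to the information exactly as an updating agent would. The only difference is that the response was fixed in advance, using prior probabilities.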

Since UDT chooses its preferred policy based on its prior, this recommendation never changes throughout a UDT agent’s lifetime. This seems to indicate that UDT will be self-recommending in the class of all decision-determined problems, although I’m not aware of a full proof of this. If this is correct, then we have reached our goal of finding a self-recommending decision theory. It is interesting to consider what other principles of rational choice ended up being violated along the way. The simple dominance principle that we started off by discussing appears to be an example of this. In the transparent Newcomb problem, there is only one possible world that the agent considers when in the room (the one in which the box is full, say), and in this one world, two-boxing dominates one-boxing. Given that the box is full, your decision to one-box or to two-box is completely independent of the box’s contents. So the simple dominance principle recommends two-boxing. But UDT disagrees.

Another example of a deeply intuitive principle that UDT violates is the irrelevance of impossible outcomes. This principle says that impossible outcomes should not factor into your decision-making process. But UDT seems to often recommend acting as if some impossible world might come to be. For instance, suppose a predictor walks up to you and gives you a choice to either give them $10 or to give them $100. You will not face any future consequences on the basis of your decision (besides whether you’re out $100 or only $10). However, you learn that the predictor only approached you because it predicted that you would give the $10. Do you give the $10 or the $100? UDT recommends giving the $100, because agents that do so are less likely to have been approached by the predictor. But if you’ve already been approached, then you are letting considerations about an impossible world influence your decision process!

Our quest for reflective consistency took us from EDT and CDT to timeless decision theory. TDT used the notion of logical dependence to get self-recommending behavior in the Newcomb problem and medical Newcomb cases. But we found that TDT was itself reflectively inconsistent in problems like the Transparent Newcomb problem. This led us to create a new theory that made its recommendations without updating on information, which we called updateless decision theory. UDT turned out to be a totally static theory, setting forth a policy determining how to respond to all possible bits of information and never altering this policy. The unchanging nature of UDT indicates the possibility that we have found a truly self-recommending decision theory, while also leading to some quite unintuitive consequences.

# Decision Theory

Everywhere below where a Predictor is mentioned, assume that their predictions are made by scanning your brain to create a highly accurate simulation of you and then observing what this simulation does.

All the below scenarios are one-shot games. Your action now will not influence the future decision problems you end up in.

Newcomb’s Problem
Two boxes: A and B. B contains $1,000. A contains $1,000,000 if the Predictor thinks you will take just A, and $0 otherwise. Do you take just A or both A and B?

Transparent Newcomb, Full Box
Newcomb problem, but you can see that box A contains $1,000,000. Do you take just A or both A and B?

Transparent Newcomb, Empty Box
Newcomb problem, but you can see that box A contains nothing. Do you take just A or both A and B?

Newcomb with Precommitment
Newcomb’s problem, but you have the ability to irrevocably resolve to take just A in advance of the Predictor’s prediction (which will still be just as good if you do precommit). Should you precommit?

Take Opaque First
Newcomb’s problem, but you have already taken A and it has been removed from the room. Should you now also take B or leave it behind?

Smoking Lesion
Some people have a lesion that causes cancer as well as a strong desire to smoke. Smoking doesn’t cause cancer and you enjoy it. Do you smoke?

Smoking Lesion, Unconscious Inclination
Some people have a lesion that causes cancer as well as a strong unconscious inclination to smoke. Smoking doesn’t cause cancer and you enjoy it. Do you smoke?

Smoking and Appletinis
Drinking a third appletini is the kind of act much more typical of people with addictive personalities, who tend to become smokers. I’d like to drink a third appletini, but I really don’t want to be a smoker. Should I order the appletini?

Expensive Hospital
You just got into an accident which gave you amnesia. You need to choose to be treated at either a cheap hospital or an expensive one. The quality of treatment in the two is the same, but you know that billionaires, due to unconscious habit, will be biased towards using the expensive one. Which do you choose?

Rocket Heels and Robots
The world contains robots and humans, and you don’t know which you are. Robots rescue people whenever possible and have rockets in their heels that activate whenever necessary. Your friend falls down a mine shaft and will die soon without robotic assistance. Should you jump in after them?

Death in Damascus
If you and Death are in the same city tomorrow, you die. Death is a perfect predictor, and will come where he predicts you will be. You can stay in Damascus or pay $1000 to flee to Aleppo. Do you stay or flee?

Psychopath Button
If you press a button, all psychopaths will be killed. Only a psychopath would press such a button. Do you press the button?

Parfit’s Hitchhiker
You are stranded in the desert, running out of water, and soon to die. A Predictor will drive you to town only if they predict you will pay them $1000 once you get there. You have been brought into town. Do you pay?

XOR Blackmail
An honest predictor sends you this letter: “I sent this letter because I predicted that you have termites iff you won’t send me $100. Send me $100.” Do you send the money?

Twin Prisoner’s Dilemma
You are in a prisoner’s dilemma with a twin of yourself. Do you cooperate or defect?

Predictor Extortion
A Predictor approaches you and threatens to torture you unless you hand over $100. They only approached you because they predicted beforehand that you would hand over the $100. Do you pay up?

Counterfactual Mugging
A Predictor flips a coin, which lands heads, and then approaches you and asks you for $100. If the coin had landed tails, it would have tortured you if it predicted you wouldn’t give the $100. Do you give?

Newcomb’s Soda
You have 50% credence that you were given Soda 1, and 50% that you were given Soda 2. Those that had Soda 1 have a strong unconscious inclination to choose chocolate ice cream and will be given $1,000,000. Those that had Soda 2 have a strong unconscious inclination to choose vanilla ice cream and are given nothing. If you choose vanilla ice cream, you get $1000. Do you choose chocolate or vanilla ice cream?

Meta-Newcomb Problem
Two boxes: A and B. A contains $1,000. Box B will contain either nothing or $1,000,000. What B will contain is (or will be) determined by a Predictor just as in the standard Newcomb problem. Half of the time, the Predictor makes his move before you by predicting what you’ll do. The other half, the Predictor makes his move after you by observing what you do. There is a Metapredictor, who has an excellent track record of predicting Predictor’s choices as well as your own. The Metapredictor informs you that either (1) you choose A and B and Predictor will make his move after you make your choice, or else (2) you choose only B, and Predictor has already made his choice. Do you take only B or both A and B?

# Sapiens: How Shared Myths Change the World

I recently read Yuval Noah Harari’s book Sapiens and loved it. In addition to fascinating and disturbing details about the evolutionary history of Homo sapiens and a wonderful account of human history, he has a really interesting way of talking about the cognitive abilities that make humans distinct from other species. I’ll dive right into this latter topic in this post.

Imagine two people in a prisoner’s dilemma. To try to make it relevant to our ancestral environment, let’s say that they are strangers running into one another, and each sees that the other has some resources. There are four possible outcomes. First, they could both cooperate and team up to catch some food that neither would be able to get on their own, and then share the food. Second, they could both defect, attacking each other and both walking away badly injured. And third and fourth, one could cooperate while the other defects, corresponding to one of them stabbing the other in the back and taking their resources. (Let’s suppose that each of the two is currently holding resources of more value than they could obtain by teaming up and hunting.)

Now, the problem is that on standard accounts of rational decision making, the decision that maximizes expected reward for each individual is to defect. That’s bad! The best outcome for everybody is that the two team up and share the loot, and neither walks away injured!

You might just respond “Well, who cares about what our theory of rational decision making says? Humans aren’t rational.” We’ll come back to this in a bit. But for now I’ll say that the problem is not just that our theory of rationality says that we should defect. It’s that this line of reasoning implies that cooperating is an unstable strategy. Imagine a society fully populated with cooperators. Now suppose an individual appears with a mutation that causes them to defect. This defector outperforms the cooperators, because they get to keep stabbing people in the back and stealing their loot and never have to worry about anybody doing the same to them. The result is then that the “gene for defecting” (speaking very metaphorically at this point; the behavior doesn’t necessarily have to be transmitted genetically) spreads like a virus through the population, eventually transforming our society of cooperators to a society of defectors. And everybody’s worse off.

On the other hand, imagine a society full of defectors. What if a cooperator is born into this society? Well, they pretty much right away get stabbed in the back and die out. So a society of defectors stays a society of defectors, and a society of cooperators degenerates into a society of defectors. The technical way of speaking about this is to say that in prisoner’s dilemmas, cooperation is not a Nash equilibrium – a strategy that is stable against mutations when universally adopted. The only Nash equilibrium is universal defection.
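
The equilibrium claim can be checked mechanically. The sketch below uses the standard textbook test for a Nash equilibrium (no player gains by unilaterally switching actions) on a made-up payoff table. The numbers in `PAYOFFS` are hypothetical and only their ordering matters: stabbing a cooperator pays best, mutual cooperation next, mutual defection after that, and being stabbed is worst.

```python
from itertools import product

ACTIONS = ("cooperate", "defect")

# Hypothetical payoffs as (row player, column player); illustrative only.
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}


def is_nash(profile, payoffs):
    """True if neither player gains by unilaterally switching actions."""
    a, b = profile
    row_ok = all(payoffs[(alt, b)][0] <= payoffs[(a, b)][0] for alt in ACTIONS)
    col_ok = all(payoffs[(a, alt)][1] <= payoffs[(a, b)][1] for alt in ACTIONS)
    return row_ok and col_ok


def pure_nash_equilibria(payoffs):
    """List every pure-strategy profile that passes the Nash test."""
    return [p for p in product(ACTIONS, repeat=2) if is_nash(p, payoffs)]


print(pure_nash_equilibria(PAYOFFS))
```

With these payoffs the only profile that survives is mutual defection, even though both players would prefer mutual cooperation.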

Okay, so this is all bad news. We have good game theoretic reasons to expect society to degenerate into a bunch of people stabbing each other in the back. But mysteriously, the record of history has humans coming together to form larger and larger cooperative institutions. What Yuval Noah Harari and many others argue is that the distinctively human force that saves us from these game theoretic traps and creates civilizations is the power of shared myths.

For instance, suppose that the two strangers happened to share a belief in a powerful all-knowing God that punishes defectors in the afterlife and rewards cooperators. Think about how this shifts the reasoning. Now each person thinks “Even if I successfully defect and loot this other person’s resources, I still will have hell to pay in the afterlife. It’s just not worth it to risk incurring God’s wrath! I’ll cooperate.” And thus we get a cooperative equilibrium!

Still you might object “Okay, but what if an atheist is born into this society of God-fearing cooperative people? They’ll begin defecting and successfully spread through the population, right? And then so much for your cooperative equilibrium.”

The superbly powerful thing about these shared myths is the way in which they can restructure society around them. So for instance, it would make sense for a society with the defector-punishing God myth to develop social norms around punishing defectors. The mythical punishment becomes an actual real-world punishment by the myth’s adherents. And this is enough to tilt the game-theoretic balance even for atheists.

The point being: The spreading of a powerful shared myth can shift the game theoretic structure of the world, altering the landscape of possible social structures. What’s more, such myths can increase the overall fitness of a society. And we need not rely on group selection arguments here; the presence of the shared myth increases the fitness of every individual.

A deeper point is that the specific way in which the landscape is altered depends on the details of the shared myth. So if we contrast the God myth above to a God that punishes defectors but also punishes mortals who punish defectors, we lose the stability property that we sought. The suggestion being: different ideas alter the game theoretic balance of the world in different ways, and sometimes subtle differences can be hugely important.

Another take-away from this simple example is that shared myths can become embodied within us, both in our behavior and in our physiology. Thus we come back to the “humans aren’t rational” point: The cooperator equilibrium becomes more stable if the God myth somehow becomes hardwired into our brains. These ideas take hold of us and shape us in their image.

Let’s go further into this. In our sophisticated secular society, it’s not too controversial to refer to the belief in all-good and all-knowing gods as a myth. But Yuval Noah Harari goes further. To him, the concept of the shared myth goes much deeper than just our ideas about the supernatural. In fact, most of our native way of viewing the world consists of a network of shared myths and stories that we tell one another.

After all, the universe is just physics. We’re atoms bumping into one another. There are no particles of fairness or human rights, no quantum fields for human meaning or karmic debts. These are all shared myths. Economic systems consist of mostly shared stories that we tell each other, stories about how much a dollar bill is worth and what the stock price of Amazon is. None of these things are really out there in the world. They are in our brains, and they are there for an important reason: they open up the possibility for societal structures that would otherwise be completely impossible. Imagine having a global trade network without the shared myth of the value of money. Or a group of millions of humans living packed together in a city that didn’t all on some level believe in the myths of human value and respect.

Just think about this for a minute. Humans have this remarkable ability to radically change our way of interacting with one another and our environments by just changing the stories that we tell one another. We are able to do this because of two features of our brains. First, we are extraordinarily creative. We can come up with ideas like money and God and law and democracy and whole-heartedly believe in them, to the point that we are willing to sacrifice our lives for them. Second, we are able to communicate these ideas to one another. This allows the ideas to spread and become shared myths. And most remarkably, all of these ideas (capitalism and communism, democracy and fascism) are running on essentially the same hardware! In Harari’s words:

While the behaviour patterns of archaic humans remained fixed for tens of thousands of years, Sapiens could transform their social structures, the nature of their interpersonal relations, their economic activities and a host of other behaviours within a decade or two. Consider a resident of Berlin, born in 1900 and living to the ripe age of one hundred. She spent her childhood in the Hohenzollern Empire of Wilhelm II; her adult years in the Weimar Republic, the Nazi Third Reich and Communist East Germany; and she died a citizen of a democratic and reunited Germany. She had managed to be a part of five very different sociopolitical systems, though her DNA remained exactly the same.

# Anthropic reasoning in everyday life

Thought experiment from a past post:

A stranger comes up to you and offers to play the following game with you: “I will roll a pair of dice. If they land snake eyes (i.e. they both land 1), you give me one dollar. Otherwise, if they land anything else, I give you a dollar.”

Do you play this game?

[…]

Now imagine that the stranger is playing the game in the following way: First they find one person and offer to play the game with them. If the dice land snake eyes, then they collect a dollar and stop playing the game. Otherwise, they find ten new people and offer to play the game with them. Same as before: snake eyes, the stranger collects $1 from each and stops playing, otherwise he moves on to 100 new people. Et cetera forever.

When we include this additional information about the other games the stranger is playing, then the thought experiment becomes identical in form to the dice killer thought experiment. Thus updating on the anthropic information that you have been offered the game gives a 90% chance of snake-eyes, which means you have a 90% chance of losing a dollar and only a 10% chance of gaining a dollar. Apparently you should now not take the offer!

This seems a little weird. Shouldn’t it be irrelevant if the game is being offered to other people? To an anthropic reasoner, the answer is a resounding no. It matters who else is, or might be, playing the game, because it gives us additional information about our place in the population of game-players.

Thus far this is nothing new. But now we take one more step: Just because you don’t know the spatiotemporal distribution of game offers doesn’t mean that you can ignore it!

So far the strange implications of anthropic reasoning have been mostly confined to bizarre thought experiments that don’t seem too relevant to the real world. But the implication of this line of reasoning is that anthropic calculations bleed out into ordinary scenarios. If there is some anthropically relevant information that would affect your probabilities, then you need to consider the probability that this information is true.

In other words, if somebody comes up to you and makes you the offer described above, you can’t just calculate the expected value of the game and make your decision.
Instead, you have to consider all possible distributions of game offers, calculate the probability of each, and average over the implied probabilities! This is no small order.

For instance, suppose that you have a 50% credence that the game is being offered only one time to one person: you. The other 50% is given to the “dice killer” scenario: that the game is offered in rounds to a group that decuples in size each round, and that this continues until the dice finally land snake-eyes. Presumably you then have to average over the expected value of playing the game for each scenario.

$EV_1 = \$1 \cdot \frac{35}{36} - \$1 \cdot \frac{1}{36} = \$\frac{34}{36} \approx \$0.94$

$EV_2 = \$1 \cdot 0.1 - \$1 \cdot 0.9 = -\$0.80$

$EV = 0.50 \cdot EV_1 + 0.50 \cdot EV_2 \approx \$0.07$

In this case, the calculation wasn’t too bad. But that’s because it was highly idealized. In general, representing your knowledge of the possible distributions of games offered seems quite difficult.

But the more crucial point is that it is apparently not enough to go about your daily life calculating the expected value of the decisions facing you. You have to also consider who else might be facing the same decisions, and how this influences your chances of winning.

Can anybody think of a real-life example where these considerations change the sign of the expected value calculation?

# Pushing anti-anthropic intuitions

A stranger comes up to you and offers to play the following game with you: “I will roll a pair of dice. If they land snake eyes (i.e. they both land 1), you give me one dollar. Otherwise, if they land anything else, I give you a dollar.”

Do you play this game?

Here’s an intuitive response: Yes, of course you should! You have a 35/36 chance of gaining $1, and only a 1/36 chance of losing $1. You’d have to be quite risk averse to refuse those odds.

What if the stranger tells you that they are giving this same bet to many other people? Should that change your calculation?
Intuitively: No, of course not! It doesn’t matter what else the stranger is doing with other people.

What if they tell you that they’ve given this offer to people in the past, and might give the offer to others in the future? Should that change anything?

Once again, it seems intuitively not to matter. The offers given to others simply have nothing to do with you. What matters are your possible outcomes and the probabilities of each of these outcomes. And what other people are doing has nothing to do with either of these.

… Right?

Now imagine that the stranger is playing the game in the following way: First they find one person and offer to play the game with them. If the dice land snake eyes, then they collect a dollar and stop playing the game. Otherwise, they find ten new people and offer to play the game with them. Same as before: snake eyes, the stranger collects $1 from each and stops playing, otherwise he moves on to 100 new people. Et cetera forever.

We now ask the question: How does the average person given the offer do if they take the offer? Well, no matter how many rounds of offers the stranger gives, at least 90% of people end up in his last round. That means that at least 90% of people end up handing over $1 and at most 10% gain $1. This is clearly net negative for the population as a whole!

Think about it this way: Imagine a population of individuals who all take the offer, and compare them to a population that all reject the offer. Which population does better on average?

For the population who takes the offer, the average person loses money. An upper bound on their average payoff is 10% ($1) + 90% (-$1) = -$0.80. For the population that rejects the offer, nobody gains or loses money: the average case is exactly $0. $0 is better than -$0.80, so the strategy of rejecting the offer is better, on average!
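
The -$0.80 bound can be checked for any stopping point. Here is a small sketch, assuming the game has ended with snake eyes in a known round; the function name `average_payoff` and its parameters are illustrative, not part of the original setup.

```python
from fractions import Fraction


def average_payoff(final_round, growth=10):
    """Average payoff per player when snake eyes first lands in `final_round`.

    Round r (1-indexed) has growth**(r-1) players. Players in the final
    round each lose $1; players in every earlier round each gain $1.
    """
    sizes = [growth ** r for r in range(final_round)]
    losers = sizes[-1]
    winners = sum(sizes[:-1])
    return Fraction(winners - losers, winners + losers)


for n in (1, 2, 3, 6):
    print(n, float(average_payoff(n)))
```

Whichever round the game stops in, the average over everyone who played stays below -$0.80, while a population that declines averages exactly $0.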

This thought experiment is very closely related to the dice killer thought experiment. I think of it as a variant that pushes our anti-anthropic-reasoning intuitions. It just seems really wrong to me that if somebody comes up to you and offers you this deal that has a 35/36 chance of paying out you should reject it. The details of who else is being offered the deal seem totally irrelevant.

But of course, all of the previous arguments I’ve made for anthropic reasoning apply here as well. And it is just true that the average person that rejects the offer does better than the average person that accepts it. Perhaps this is just another bullet that we have to bite in our attempt to formalize rationality!

# An expected value puzzle

Consider the following game setup:

Each round of the game starts with you putting in all of your money. If you currently have $10, then you must put in all of it to play. Now a coin is flipped. If it lands heads, you get back 10 times what you put in ($100). If not, then you lose it all. You can keep playing this game until you have no more money.

What does a perfectly rational expected value reasoner do?

Supposing that this reasoner’s sole goal is to maximize the quantity of money that they own, then the expected value for putting in the money is always greater than 0. If you put in $X, then you stand a 50% chance of getting $10X back (a net gain of $9X) and a 50% chance of losing $X. Thus, your expected net gain is 9X/2 – X/2 = 4X.

This means that the expected value reasoner that wants to maximize their winnings would keep putting in their money until, eventually, they lose it all.
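
To see the tension in numbers, here is a minimal sketch of what happens to a player who keeps playing for a fixed number of rounds. The function name `after_rounds` and its default parameters are illustrative assumptions; the only facts used are the fair coin and the tenfold payout from the setup.

```python
def after_rounds(stake, rounds, multiplier=10, p_heads=0.5):
    """Return (expected wealth, probability of still having any money).

    The player only has money left if every flip so far came up heads,
    in which case the stake has been multiplied once per round.
    """
    p_survive = p_heads ** rounds
    wealth_if_alive = stake * multiplier ** rounds
    return p_survive * wealth_if_alive, p_survive


for n in (1, 5, 10, 20):
    expected, p_alive = after_rounds(10, n)
    print(f"rounds={n:2d}  expected=${expected:,.0f}  P(not broke)={p_alive:.6f}")
```

Expected wealth grows by a factor of 5 each round, while the chance of having anything left halves each round. The growing expectation is carried entirely by an ever less likely streak of heads.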

What’s wrong with this line of reasoning (if anything)? Does it serve as a reductio ad absurdum of expected value reasoning?

# The Anthropic Dice Killer

Today we discuss anthropic reasoning.

## The Problem

Imagine the following scenario:

One piece of information that you have is that you are aware of the maniacal schemes of your captor. His plans began by capturing one random person. He then rolled a pair of dice to determine their fate. If the dice landed snake eyes (both 1), then the captive would be killed. If not, then they would be let free.

But if they are let free, the killer will search for new victims, and this time bring back ten new people and lock them alone in rooms. He will then determine their fate just as before, with a pair of dice. Snake eyes means they die, otherwise they will be let free and he will search for new victims (ten times as many as he just let free).

His murder spree will continue until the first time he rolls snake eyes. Then he will kill the group that he currently has imprisoned and retire from the serial-killer life.

Now. You become aware of a risky way out of the room you are locked in and to freedom. The chances of surviving this escape route are only 50%. Your choices are thus either (1) to traverse the escape route with a 50% chance of survival or (2) to just wait for the killer to roll his dice, and hope that it doesn’t land snake eyes.

What should you do?

Your chance of dying if you stay and wait is just the chance that the dice land snake eyes. The probability of snake eyes is just 1/36 (1/6 for each die landing 1).

So your chance of death is only 1/36 (≈ 3%) if you wait, and it’s 50% if you try to run for it. Clearly, you are better off waiting!

## But…

You guessed it, things aren’t that easy. You have extra information about your situation besides just how the dice work, and you should use it. In particular, the killing pattern of your captor turns out to be very useful information.

Ask the following question: Out of all of the people that have been captured or will be captured at some point by this madman, how many of them will end up dying? This is just the very last group, which, incidentally, is the largest group.

Consider: if the dice land snake eyes the first time they are rolled, then only one person is ever captured, and this person dies. So the fraction of those captured that die is 100%.

If they land snake eyes the second time they are rolled, then 11 people total are captured, 10 of whom die. So the fraction of those captured that die is 10/11, or ≈ 91%.

If it’s the third time, then 111 people total are captured, 100 of whom die. Now the fraction is just over 90%.

In general, no matter how many times the dice are rolled before landing snake eyes, it always ends up that over 90% of those captured are in the last round, and thus end up dying.

So! This looks like bad news for you… you’ve been captured, and over 90% of those that are captured always die. Thus, your chance of death is guaranteed to be greater than 90%.

The escape route with a 50% survival chance is looking nicer now, right?

## Wtf is this kind of reasoning??

What we just did is called anthropic reasoning. Anthropic reasoning really just means updating on all of the information available to you, including indexical information (information about your existence, age, location, and so on). In this case, the initial argument neglected the very crucial information that you are one of the people that were captured by the killer. When updating on this information, we get an answer that is very very different from what we started with. And in this life-or-death scenario, this is an important difference!

You might still feel hesitant about the answer we got. After all, if you expect a 90% chance of death, this means that you expect a 90% chance for the dice to land snake eyes. But it’s not that you think the dice are biased or anything… Isn’t this just blatantly contradictory?

This is a convincing-sounding rebuttal, but it’s subtly wrong. The key point is that even though the dice are fair, there is a selection bias in the results you are seeing. This selection bias amounts to the fact that when the dice inevitably land snake-eyes, there are more people around to see it. The fact that you are more likely than 1/36 to see snake-eyes is kind of like the fact that if you are given the ticket of a random concert-goer, you have a higher chance of ending up seeing a really popular band than if you just looked at the current proportion of shows performed by really popular bands.

It’s kind of like the fact that in your life you will spend more time waiting in long lines than short lines, and that on average your friends have more friends than you. This all seems counterintuitive and wrong until you think closely about the selection biases involved.

Anyway, I want to impress upon you that 90% really is the right answer, so I’ll throw some math at you. Let’s calculate in full detail what fraction of the group ends up surviving on average.

By the way, the discrepancy between the baseline chance of death (1/36) and the anthropic chance of death (90%) can be made as large as you like by manipulating the starting problem. Suppose that instead of 1/36, the chance of the group dying was 1/100, and instead of the group multiplying by 10 in size each round, it grew by a factor of 100. Then the baseline chance of death would be 1%, and the anthropic probability would be 99%.

We can find the general formula for any such scenario:

IF ANYBODY CAN SOLVE THIS, PLEASE TELL ME! I’ve been trying for too long now and would really like an analytic general solution. 🙂
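
Short of an analytic solution, the average can at least be approximated numerically. The sketch below is one way to do that and is not a closed form: `fraction_dying` gives the share of all captives who are in the final group when the fatal roll happens in a given round, and `expected_fraction_dying` averages that share over when the fatal roll happens. Both function names, and the truncation at `max_rounds`, are assumptions of this sketch.

```python
from fractions import Fraction


def fraction_dying(final_round, growth=10):
    """Share of all captives who are in the last (killed) group.

    Group sizes are 1, growth, growth**2, ..., so the total captured by
    round n is the geometric sum (growth**n - 1) / (growth - 1).
    """
    last = growth ** (final_round - 1)
    return Fraction(last * (growth - 1), growth ** final_round - 1)


def expected_fraction_dying(p_kill, growth, max_rounds=2000):
    """Average of fraction_dying over the round in which the fatal roll lands.

    The fatal roll lands in round n with probability p*(1-p)**(n-1).
    Assumption: the series is cut off at max_rounds, which drops a tail
    of total probability (1-p)**max_rounds.
    """
    total = 0.0
    for n in range(1, max_rounds + 1):
        prob = p_kill * (1 - p_kill) ** (n - 1)
        total += prob * float(fraction_dying(n, growth))
    return total


print(expected_fraction_dying(1 / 36, 10))
print(expected_fraction_dying(1 / 100, 100))
```

For the original problem the result sits just above 90%, and for the 1/100 variant with hundredfold growth it sits just above 99%, in line with the per-round bounds.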

There is a lot more to be said about this thought experiment, but I’ll leave it there for now. In the next post, I’ll present a slight variant on this thought experiment that appears to give us a way to get direct Bayesian evidence for different theories of consciousness! Stay tuned.