If in the future we develop the ability to make accurate simulations of other humans, a lot of things will change for the weirder. In many situations where agents with access to simulations of each other interact, a strange type of apparent backward causality will arise. For instance…
Imagine that you’re competing in a prisoner’s dilemma against an agent that you know has access to a scarily accurate simulation of you. Prior to your decision to either defect or cooperate, you’re mulling over arguments for one course of action over the other. In such a situation, you have to continuously face the fact that for each argument you come up with, your opponent has already anticipated and taken measures to respond to it. As soon as you think up a new line of reasoning, you must immediately update as if your opponent has just heard your thoughts and adjusted their strategy accordingly. Even if your opponent’s decision has already been made and is set in stone, you have no ability to do anything that your opponent hasn’t already incorporated into their decision. Though you are temporally ahead of them, they are in some sense causally ahead of you. Their decision is a response to your (yet-to-be-made) decision, and your decision is not a response to theirs. (I don’t actually believe and am not claiming that this apparent backwards causality is real backwards causality. The arrow of time still points in only one direction and causality follows suit. But it’s worth it in these situations to act as if your opponent has backwards causation abilities.)
When you have two agents that both have access to simulations of the other, things get weird. In such situations, there’s no clear notion of whose decision is a response to the other (as both are responding to each other’s future decision), and so there’s no clear notion of whose decision is causally first. But the question of “who comes first” (in this strange non-temporal sense) turns out to be very important to what strategy the various agents should take!
Let’s consider some examples.
Two agents are driving head-on towards each other. Each has a choice to swerve or to stay driving straight ahead. If they both stay, then they crash and die, the worst outcome for all. If one stays and the other swerves, then the one that swerves pays a reputational cost and the one that stays gains some reputation. And if both swerve, then neither gains or loses any reputation. To throw some numerical values on these outcomes, here’s a payoff matrix:
This is the game of chicken. It is an anti-cooperation game, in that if one side knows what the other is going to do, then they want to do the opposite. The (swerve, swerve) outcome is unstable, as both players are incentivized to stay if they know that their opponent will swerve. But so is the (stay, stay) outcome, as this is the worst possible outcome for both players and they both stand to gain by switching to swerve. There are two pure strategy Nash equilibria (swerve, stay) and (stay, swerve), and one mixed strategy equilibria (with the payoff matrix above, it corresponds to swerving with probability 90% and staying with probability 10%).
That’s all standard game theory, in a setting where you don’t have access to your opponent’s algorithm. But now let’s take this thought experiment to the future, where each player is able to simulate the other. Imagine that you’re one of the agents. What should you do?
The first thought might be the following: you have access to a simulation of your opponent. So you can just observe what the simulation of your opponent does, and do the opposite. If you observe the simulation swerving you stay, and if you observe the simulation staying you swerve. This has the benefit of avoiding the really bad (stay, stay) outcomes, while also exploiting opponents that decide to swerve.
The issue is that this strategy is exploitable. While you’re making use of your ability to simulate your opponent, you are neglecting the fact that your opponent is also simulating you. Your opponent can see that this is your strategy, so they know that whatever they decide to play, you’ll play the opposite. So if they decide to tear off their steering wheel to ensure that they will not swerve no matter what, they know that you’ll fall in line and swerve, thus winning them +1 utility and losing you -1 utility. This is a precommitment: a strategy that an agent uses that restricts the number of future choices available to them. It’s quite unintuitive and cool that this sort of tying-your-hands ends up being an enormously powerful weapon for those that have access to it.
In other words, if Agent 1 sees that Agent 2 is treating their decision as a fixed fact and responding to it accordingly, then Agent 1 gets an advantage, as they can precommit to staying and force Agent 2 to yield to them. But if Agent 2 now sees Agent 1 as responding to their algorithm rather than the other way around, then Agent 2 benefits by precommitting to stay. If there’s a fact about which agent precommits “first”, then we can conclusively say that this agent does better, as they can force the outcome they want. But again, this is not a temporal first. Suppose that Agent 2 is asleep at the wheel, about to wake up, and Agent 1 is trying to decide what to do. Agent 1 simulates them and sees that once they wake up they will tear out their steering wheel without even considering what Agent 2 does. Now Agent 1’s hand is forced; he will swerve in response to Agent 2’s precommitment, even though it hasn’t yet been made. It appears that for two agents in a chicken-like scenario, with access to simulations of one another, the best action is to precommit as quickly and firmly as possible, with as little regard for their opponents’ precommitments as they can manage (the best-performing agent is the one that tears off their steering wheel without even simulating their opponent and seeing their precommitments, as this agent puts themselves fully causally behind anybody that simulates them). But this obviously just leads straight to the (stay, stay) outcome!
This pattern of precommitting, then precommitting to not respond to precommitments, then precommiting to not respond to precommitments to not respond to precommitments, and so on, shows up all over the place. Let’s have another example, from the realm of economics.
Company Coordination and Boycotts
In my last post, I talked about the Cournot model of firms competing to produce a homogenous good. We saw that competing firms face a coordination problem with respect to the prices they set: every firm sees it in their rational self-interest to undercut other firms to take their customers, but then other firms follow suit, ending up with the price dropping for everybody. That’s good for consumers, but bad for producers! The process of undercutting and then re-equilibrating continues until the price is at the bare minimum that it takes for a producer to be willing to make the good – essentially just minutely above the cost of production. At this point, producers are making virtually no profit and consumer surplus is maximized.
This coordination problem, like all coordination problems, could be solved if only the firms had the ability to precommit. Imagine that the heads of all the companies meet up at some point. They all see the problem that they’re facing, and recognize that if they can stop the undercutting, they’ll all be much richer. So they sign on to a vow to never undercut each other. Of course, signing a piece of paper doesn’t actually restrict your future options. Every company is still just as incentivized as before to break the agreement and undercut their competitors. It helps if they have plausible deniability; the ability to say that their price drop was actually not intended to undercut, but a response to some unrelated change in the market. All that the meeting does is introduce some social cost to undercutting and breaking the vow that wasn’t there before.
To actually permanently fix the coordination problem, the companies need to be able to sign on to something that truly and irrevocably ties their hands, giving them no ability to back out later on (equivalent to the tearing-off-the-steering-wheel as a credible precommitment). Maybe they all decide to put some money towards the creation of a final-check mechanism that looks over all price changes and intervenes to stop any changes that it detects to be intended to undercut opponents. This is precommitment in the purer sense of literally removing an option that the firms previously had. And if this type of tying-of-hands was actually possible, then each company would be rationally incentivized to sign on! (Of course, they’d all be looking for ways to cheat the system and break the mechanism at every step, which would make its actual creation a tad bit difficult.)
So, if you give all companies the ability to jointly sign on to a credible precommitment to not undercut their opponents, then they will take that opportunity. This will keep prices high and keep profits flowing in to the companies. Producer surplus will be maximized, and consumers will get the short end of the stick. Is there any way for the consumers to fight back?
Sure there is! All they need is the ability to precommit as well. Suppose that all consumers are now given the opportunity to come together and boycott any and all companies that precommit to not undercutting each other. If every consumer signs on, and if producers know this, then it’s no longer worth it for them to put in place the price-monitoring mechanism, as they’d just lose all their customers! Of course, the consumers now face their own coordination problem; many of them will still value the product at a price higher than that which is being offered by the companies, even if they’re colluding. And each individual reasons that as long as everybody else is still boycotting the companies, it makes little difference if just one mutually beneficial trade is made with them. So the consumers will themselves face the problem of how to enforce the boycott. But let’s assume that the consumers work this out so that they credibly precommit to never buying from a company that credibly precommits to not undercutting its competitors. Now the market price swings back in their favor, dropping to the cost of production! The consumers win! Whoohoo!
But we’re not done yet. It was only worth it for the consumers to sign on to this precommitment because they predicted that the companies would respond to their precommitment. But what if the companies, seeing the boycott-tactic coming, credibly precommit to never yielding to boycotters? Then the consumers, responding to this precommitment, will realize that boycotting will have no effect on prices, and will just cause them all to lose out on mutually beneficial trades! So they won’t boycott, and therefore the producers get the surplus once more. And just like before, this swings back and forth, with the outcome at each stage depending on which agent treats the other agent’s precommitment as being more primal. But if they each run their apparently-best strategy (that is, making their precommitments with no regard to the precommitments of the other party so as to force their hand and place their own precommitments at the beginning of the causal chain), then we end up with the worst possible outcome for all: producers don’t produce anything and consumers don’t consume, and everybody loses out.
This question of how agents that can simulate one another AND precommit to courses of action should ultimately behave is something that I find quite puzzling and am not sure how to resolve.