# Six Case Studies in Consequentialist Reasoning

Consequentialism is a family of moral theories that say that an act is moral or immoral based on its consequences. If an act has overall good consequences then it is moral, and if it has bad consequences then it is immoral. What precisely counts as a “good” or “bad” consequence is what distinguishes one consequentialist theory from another. For instance, act utilitarians say that the only morally relevant feature of the consequences of our actions is the aggregate happiness and suffering produced, while preference utilitarians say that the relevant feature of the consequences is the number and strength of desires satisfied. Another form of consequentialism might strike a balance between aggregate happiness and social equality.

What all these different consequentialist theories have in common is that the ultimate criteria being used to evaluate the moral status of an action is only a function of the consequences of that action, as opposed to, say, the intentions behind the action, or whether the action is an instance of a universalizable Kantian rule.

In this essay, we’ll explore some puzzles in consequentialist theories that force us to take a more nuanced and subtle view of consequentialism. These puzzles are all adapted from Derek Parfit’s Reasons and Persons, with very minor changes.

First, we’ll consider a simple puzzle regarding how exactly to evaluate the consequences of one’s actions, when one is part of a collective that jointly accomplishes some good.

Case 1: There are 100 miners stuck in a mineshaft with flood waters rising. These men can be brought to the surface in a lift raised by weights on long levers. The leverage is such that just four people can stand on a platform and provide sufficient weight to raise the lift and save the lives of the hundred men. But if any fewer than four people stand on the platform, it will not be enough to raise the lift. As it happens, you and three other people happen to be standing there. The four of you stand on the platform, raising the lift and saving the lives of the hundred men.

The question for us to consider is, how many lives did you save by standing on the platform? The answer to this question matters, because to be a good consequentialist, each individual needs to be able to compare their contribution here to the contribution they might make by going elsewhere. As a first thought, we might say that you saved 100 lives by standing on the platform. But the other three people were in the same position as you, and it seems a little strange to say that all four of you saved 100 lives each (since there weren’t 400 lives saved total). So perhaps we want to say that each of you saved one quarter of the total: 25 lives each.

Parfit calls this the Share-of-the-Total View. We can characterize this view as saying that in general, if you are part of a collective of N people who jointly save M lives, then your share of lives saved is M/N.

There are some big problems with this view. To see this, let’s amend Case 1 slightly by adding an opportunity cost.

Case 2: Just as before, there are 100 miners stuck in a mineshaft with flood waters rising, and they can be saved by four or more people standing on a platform. This time though, you and four other people happen to be standing there. The other four are going to stand on the platform no matter what you do. Your choice is either to stand on the platform, or to go elsewhere to save 10 lives. What should you do?

The correct answer here is obviously that you should leave to save the 10 lives. The 100 miners will be saved whether you stay or leave, and the 10 lives will be lost if you stick around. But let’s consider what the Share-of-the-Total View says. According to this view, if you stand on the platform, your share of the lives saved is 100/5 = 20. And if you leave to go elsewhere, you only save 10 lives. So you save more lives by staying and standing on the platform!

This is a reductio of the Share-of-the-Total View. We must revise this view to get a sensible consequentialist theory. Parfit’s suggestion is that we say that when you join others who are doing good, the good that you do is not just your own share of the total benefit. You should also add to your share the change that you caused in the shares of the benefits produced by each other by joining. On their own, the four would each have a share of 25 lives. So by joining, you have a share of 20 lives, minus the 5 lives that have been reduced from the share of each of the other four. In other words, by joining, you have saved 20 – 5(4) lives, in other words, 0 lives. And of course, this is the right answer, because you have done nothing at all by stepping onto the platform!

Applying our revised view to Case 1, we see that if you hadn’t stepped onto the platform, zero lives would be saved. By stepping onto the platform, 100 lives are saved. So your share of those lives is 25, plus 25 lives for each of the others that would have had zero without you. So your share is actually 100 lives! The same applies to the others, so in our revised view, each of the four is responsible for saving all 100 lives. Perhaps on reflection this is not so unintuitive; after all, it’s true for each of them that if they change their behavior, 100 lives are lost.

Case 3: Just as in Case 2, there are 100 miners stuck in a mineshaft. You and four others are standing on the platform while the miners are slowly being raised up. Each of you know of an opportunity to save 10 lives elsewhere (a different 10 lives for each of you), but to successfully save the lives you have to leave immediately, before the miners are rescued. The five of you have to make your decision right away, without communicating with each other.

We might think that if each of the five of you reasons as before, each of you will go off and save the other 10 lives (as by staying, they see that they are saving zero lives). In the end, 50 lives will be saved and 100 lost. This is not good! But in fact, it’s not totally clear that this is the fault of our revised view. The problem here is lack of information. If each of the five knew what the other four planned on doing, then they would make the best decision (if all four planned to stay then the fifth would leave, and if one of the other four planned to leave then the fifth would stay). As things stand, perhaps the best outcome would be that all five stay on the platform (losing the opportunity to save 10 extra lives, but ensuring the safety of the 100). If they can use a randomized strategy, then the optimal strategy is to each stay on the platform with probability 97.2848% (saving an expected 100.66 lives)

Let’s move on to another type of scenario.

Case 4: X and Y simultaneously shoot and kill me. Either shot, by itself, would have killed.

The consequence of X’s action is not that I die, because if X had not shot, I would have died by Y’s bullet. And the same goes for Y. So if we’re evaluating the morality of X or Y’s action based on its consequences, it seems that we have to say that neither one did anything immoral. But of course, the two of them collectively did do something immoral by killing me. What this tells us that the consequentialist’s creed cannot be “an act is immoral if its consequences are bad”, as an act can also be immoral if it is part of a set of acts whose collective consequences are bad.

Inheriting immorality from group membership has some problems, though. X and Y collectively did something immoral. But what about the group X, Y, and Barack Obama, who was napping at home when this happened? The collective consequences of their actions were bad as well. So did Obama do something immoral too? No. We need to restrict our claim to the following:

“When some group together harm or benefit other people, this group is the smallest group of whom it is true that, if they had all acted differently, the other people would not have been harmed, or benefited.” -Parfit

A final scenario involves the morality of actions that produce imperceptible consequences.

Case 5: One million torturers stand in front of one million buttons. Each button, if pushed, induces a tiny stretch in each of a million racks, each of which has a victim on it. The stretch induced by a single press of the button is so minuscule that it is imperceptible. But the stretch induced by a million button presses produces terrible pain in all the victims.

Clearly we want to say that each torturer is acting immorally. But the problem is that the consequences of each individual torturer’s action are imperceptible! It’s only when enough of the torturers press the button that the consequence becomes perceptible. So what we seem to be saying is that it’s possible to act immorally, even though your action produces no perceptible change in anybody’s conscious experience, if your action is part of a collection of actions that together produce negative changes in conscious experiences.

This is already unintuitive. But we can make it even worse.

Case 6: Consider the final torturer of the million. At the time that he pushes his button, the victims are all in terrible agony, and his press doesn’t make their pain any perceptibly worse. Now, imagine that instead of there being 999,999 other torturers, there are zero. There is just the one torturer, and the victims have all awoken this morning in immense pain, caused by nobody in particular. The torturer presses the button, causing no perceptible change in the victims’ conditions. Has the torturer done something wrong?

It seems like we have to say the same thing about the torturer in Case 6 as we did in Case 5. The only change is that Nature has done the rest of the harm instead of other human beings, but this can’t matter for the morality of the torturer’s action. But if we believe this, then the scope of our moral concerns is greatly expanded, to a point that seems nonsensical. My temptation here is to say “all the worse for consequentialism, then!” and move to a theory that inherently values intentions, but I am curious if there is a way to make a consequentialist theory workable in light of these problems.

# Diversification is magic

“The only free lunch in finance is diversification”

Harry Markowitz

Suppose you have two assets that you can invest in, and $1000 dollars total to split between them. By the end of year 1, Asset A doubles in price and Asset B halves in price. And by the end of year 2, A halves and B doubles (so that they each return to their starting point). If you had initially invested all$1000 in Asset A, then after year 1 you would have $2000 total. But after year 2, that$2000 is cut in half and you end up back at $1000. If you had invested the$1000 in Asset B, then you would go down to $500 after year 1 and then back up to$1000 after year 2. Either way, you don’t get any profit.

Now, the question is: can you find some way to distribute the $1000 dollars across Assets A and B such that by the end of year 2 you have made a profit? For fairness sake, you cannot change your distribution at the end of year 1 (as that would allow you to take advantage of the advance knowledge of how the prices will change). Whatever weights you choose initially for Assets A and B, at the end of year 1 you must move around your money so that you ensure it’s still distributed with the exact same weights as before. So what do you think? Does it seem impossible to make a profit without changing your distribution of money between years 1 and 2? Amazingly, the answer is that it’s not impossible; you can make a profit! Consider a 50/50 mix of Assets A and B. So$500 initially goes into Asset A and $500 to Asset B. At the end of year 1, A has doubled in price (netting you$500) and B has halved (losing you $250). So at the end of year 1 you have gained 25%. To keep the weights the same at the start of year 2 as they were at the start of year 1, you redistribute your new total of$1250 across A and B according to the same 50/50 mix ($625 in each). What happens now? Now Asset A halves (losing you$312.50) and Asset B doubles (gaining you $625). And by the end of year 2, you end up with$312.50 in A and $1250 in B, for a total of$1562.50! You’ve gained $562.50 by investing in two assets whose prices ultimately are the same as where they started! This is the magic of diversification. By investing in multiple independent assets instead of just one, you can up your rate of return and decrease your risk, sometimes dramatically. Another example: In front of you are two fair coins, A and B. You have in your hand$100 that you can distribute between the two coins any way you like, so long as all $100 is on the table. Now, each of the coins will be tossed. If a coin lands heads, the amount of money beside it will be doubled and returned to you. But if the coin lands tails, then all the money beside it will be destroyed. In front of you are two coins, A and B. You have$100 dollars to distribute between these two coins. You get to choose how to distribute the $100, but at the end every dollar bill must be beside one coin or the other. Coin A has a 60% chance of landing H. If it lands H, the amount of money placed beside it will be multiplied by 2.1 and returned to you. But if it lands T, then the money placed beside it will be lost. Coin B has only a 40% chance of landing H. But the H outcome also has a higher reward! If the coin lands H, then the amount of money beside it will be multiplied by 2.5 and returned to you. And just like Coin A, if it lands T then the money beside it will be destroyed. Coin A: .6 chance of getting 2.1x return Coin B: .4 chance of getting 2.5x return The coins are totally independent. How should you distribute your money in order to maximize your return, given a specific level of risk? (Think about it for a moment before reading on.) If you looked at the numbers for a few moments, you might have noticed that Coin A has a higher expected return than Coin B (126% vs 100%) and is also the safer of the two. So perhaps your initial guess was that putting everything into Coin A would minimize risk and maximize return. Well, that’s incorrect! Let’s do the math. We’ll start by giving the relevant quantities some names. X = amount of money that is put by Coin A Y = amount of money that is put by Coin B (All your money is put down, so Y = 100 – X) We can easily compute the expected amount of money you end up with, as a function of X: Expected return (X) = (0.6)(0.4)(2.1X + 2.5Y) + (0.6)(0.6)(2.1X) + (0.4)(0.4)(2.5Y) + (0.4)(0.6)(0) = 100 + .26 X Alright, so clearly your expected return is maximized by making X as large as possible (by putting all of your money by Coin A). This makes sense, since Coin A’s expected return is higher than Coin B’s. But we’re not just interested in return, we’re also interested in risk. Could we possibly find a better combination of risk and reward by mixing our investments? It might initially seem like the answer is no; after all Coin A is the safer of the two. How could we possibly decrease risk by mixing in a riskier asset to our investments? The key insight is that even though Coin A is the safer of the two, the risk of Coin B is uncorrelated with the risk of Coin A. If you invest everything in Coin A, then you have a 40% chance of losing it all. But if you split your investments between Coin A and Coin B, then you only lose everything if both coins come up heads (which happens with probability .6*.4 = 24%, much lower than 40%!) Let’s go through the numbers. We’ll measure risk by the standard deviation of the possible outcomes. Risk(X)2 = (0.6)(0.4)(2.1X + 2.5Y – 100 – .26X)2 + (0.6)(0.6)(2.1X – 100 – .26X)2 + (0.4)(0.4)(2.5Y – 100 – .26X)2 + (0.4)(0.6)(0 – 100 – .26X)2 This function is just a parabola (to be precise, Risk2 is a parabola, which means that Risk(x) is a hyperbola). Here’s a plot of Risk(X) vs X (amount placed beside A): Looking at this plot, you can see that risk is actually minimized at a roughly even mix of A and B, with slightly more in B. You can also see this minimum risk on a plot of return vs risk: Notice that somebody that puts most of their money on coin B (these mixes are in the bottom half of the curve) is making a strategic choice is strictly dominated. That is, they could choose a different mix that has a higher rate of return for the same risk! Little did you know, but you’ve actually just gotten an introduction to modern portfolio theory! Instead of putting money beside coins, portfolio managers consider investing in assets with various risks and rates of return. The curve of reward vs risk is famous in finance as the Markowitz Bullet. The upper half of the curve is the set of portfolios that are not strictly dominated. This section of the curve is known as the efficient frontier, the basic idea being that no rational investor would put themselves on the lower half. Let’s reframe the problem in terms that would be familiar to somebody in finance. We have two assets, A and B. We’ll model our knowledge of the rate of return of each asset as a normal distribution with some known mean and standard deviation. The mean of the distribution represents the expected rate of return on a purchase of the asset, and the standard deviation represents the risk of purchasing the asset. Asset A has an expected rate of return of 1.2, which means that for every dollar you put in you expect (on average) to get back$0.20 a year from now. Asset B’s expected rate of return is 1.3, so it has a higher average payout. But Asset B is riskier; the standard deviation for A is 0.5, while B’s standard deviation is 0.8. There’s also a risk-free asset that you can invest in, which we’ll call Asset F. This asset has an expected rate of return of 1.1.

Asset F: R = 1.1, σ = 0
Asset A: R = 1.2, σ = 0.5
Asset B: R = 1.3, σ = 0.8

Suppose that you have \$1000 that you want to invest in some combination of these three assets in such a way as to maximize your expected rate of return and minimize your risk. Since rate of return and risk will in general be positively correlated, you have to decide the highest risk that you’re comfortable with. Let’s say that you decide that the highest risk you’ll accept is 0.6. Now, how much of each asset should you purchase?

First of all, let’s disregard the risk-free asset and just consider combinations of A and B. A portfolio of A and B is represented by the weighted sum of A and B. wA is the percentage of your investment in the portfolio that goes to just A, and wB is the percentage that goes towards B. Since we’re just considering a combination of A and B for now, wA + wB = 1. The mean and standard deviation of the new distribution for this portfolio will in general depend on the correlation between A and B, which we’ll call ρ. Perfectly correlated assets have ρ = 1, uncorrelated assets have ρ = 0, and perfectly anti-correlated assets have ρ = -1. Correlation between assets is bad for investors, because it destroys the benefit of diversification. If two assets are perfectly correlated, then you don’t get any lower risk by combining them. On the other hand, if they are perfectly anti-correlated, you can entirely cancel the risk from one with the risk from the other, and get fantastically low risks for great rates of return.

RP = wARA + wBRB
σP2 = wA2σA2 + wB2σB2 + 2ρwAwBσAσB

Let’s suppose that the correlation between A and B is ρ = .2. Since both RP and σP are functions of wA and wB, we can visualize the set of all possible portfolios as a curve on a plot of return-vs-risk:

The Markowitz Bullet again! Each point on the curve represents the rate of return and risk of a particular portfolio obtained by mixing A and B. Just like before, some portfolios dominate others and thus should never be used, regardless of your desired level of risk. In particular, for any portfolio that weights asset A (the less risky one) too highly, there are other portfolios that give a higher rate of return with the exact same risk.

In other words, you should pretty much never purchase only a low-risk low-return item. If your portfolio consists entirely of Asset A, then by mixing in a little bit of the higher-risk item, you can actually end up massively decreasing your risk and upping your rate of return. Of course, this drop in risk is only because the two assets are not perfectly correlated. And it would be even more extreme if we had negatively correlated assets; indeed with perfect negative correlation (as we saw in the puzzle I started this post with), your risk can drop to zero!

Now, we can get our desired portfolio with a risk of 0.6 by just looking at the point on this curve that has σP = 0.6 and calculating which values of wA and wB give us this value. But notice that we haven’t yet used our riskless asset! Can we do better by adding in a little of Asset F to the mix? It turns out that yes, we can. In fact, every portfolio is weakly dominated by some mix of that portfolio and a riskless asset!

We can easily calculate what we get by combining a riskless asset with some other asset X (which can in general be a portfolio consisting of multiple assets):

RP = wXRX + wFRF
σP2 = wX2σX2 + wF2σF2 + 2ρwXwFσXσF = wX2σX2

So σP = wXσX, from which we get that RP = (σPX)RX + (1 – σPX)RF = RF + σP (RX – RF)/σX

What we find is that RPP) is just a line whose slope depends on the rate of return and risk of asset X. So essentially, for any risky asset or portfolio you choose, you can easily visualize all the possible ways it can be combined with a risk-free asset by stretching a line from (0, RF) – the risk and return of the risk-free asset – to (sX, RX) – the risk and return of the risky asset. We can even stretch the line beyond this second point by borrowing some of the risk-free asset in order to buy more of the risky asset, which corresponds to a negative weighting wF.

So, we have a quadratic curve representing the possible portfolios obtained from two risky assets, and a line representing the possible portfolios obtained from a risky asset and a risk-free asset. What we can do now is consider the line that starts at (0, RF) and just barely brushes against the quadratic curve – the tangent line to the curve that passes through (0, RF). This point where the curves meet is known as the tangency portfolio.

Every point on this line is a possible combination of Assets A, B, and F. Why? Well, because the points on the line can be thought of as portfolios consisting of Asset F and the tangency portfolio. And here’s the crucial point: this line is above the curve everywhere except at that single point! What this means is that the combination of A, B, and F dominates combinations of just A and B. For virtually any level of desired risk, you do better by choosing a portfolio on the line than by choosing the portfolio on the quadratic curve! (The only exception to this is the point at which the two curves meet, and in that case you do equally well.)

And that is how you optimize your rate of return for a desired level of risk! First generate the hyperbola for portfolios made from your risky assets, then find the tangent to that curve that passes through the point representing the risk-free asset, and then use that line to calculate the optimal portfolio at your chosen level of risk!

If in the future we develop the ability to make accurate simulations of other humans, a lot of things will change for the weirder. In many situations where agents with access to simulations of each other interact, a strange type of apparent backward causality will arise. For instance…

When you have two agents that both have access to simulations of the other, things get weird. In such situations, there’s no clear notion of whose decision is a response to the other (as both are responding to each other’s future decision), and so there’s no clear notion of whose decision is causally first. But the question of “who comes first” (in this strange non-temporal sense) turns out to be very important to what strategy the various agents should take!

Let’s consider some examples.

# Chicken

Two agents are driving head-on towards each other. Each has a choice to swerve or to stay driving straight ahead. If they both stay, then they crash and die, the worst outcome for all. If one stays and the other swerves, then the one that swerves pays a reputational cost and the one that stays gains some reputation. And if both swerve, then neither gains or loses any reputation. To throw some numerical values on these outcomes, here’s a payoff matrix:

This is the game of chicken. It is an anti-cooperation game, in that if one side knows what the other is going to do, then they want to do the opposite. The (swerve, swerve) outcome is unstable, as both players are incentivized to stay if they know that their opponent will swerve. But so is the (stay, stay) outcome, as this is the worst possible outcome for both players and they both stand to gain by switching to swerve. There are two pure strategy Nash equilibria (swerve, stay) and (stay, swerve), and one mixed strategy equilibria (with the payoff matrix above, it corresponds to swerving with probability 90% and staying with probability 10%).

That’s all standard game theory, in a setting where you don’t have access to your opponent’s algorithm. But now let’s take this thought experiment to the future, where each player is able to simulate the other. Imagine that you’re one of the agents. What should you do?

The first thought might be the following: you have access to a simulation of your opponent. So you can just observe what the simulation of your opponent does, and do the opposite. If you observe the simulation swerving you stay, and if you observe the simulation staying you swerve. This has the benefit of avoiding the really bad (stay, stay) outcomes, while also exploiting opponents that decide to swerve.

The issue is that this strategy is exploitable. While you’re making use of your ability to simulate your opponent, you are neglecting the fact that your opponent is also simulating you. Your opponent can see that this is your strategy, so they know that whatever they decide to play, you’ll play the opposite. So if they decide to tear off their steering wheel to ensure that they will not swerve no matter what, they know that you’ll fall in line and swerve, thus winning them +1 utility and losing you -1 utility. This is a precommitment: a strategy that an agent uses that restricts the number of future choices available to them. It’s quite unintuitive and cool that this sort of tying-your-hands ends up being an enormously powerful weapon for those that have access to it.

In other words, if Agent 1 sees that Agent 2 is treating their decision as a fixed fact and responding to it accordingly, then Agent 1 gets an advantage, as they can precommit to staying and force Agent 2 to yield to them. But if Agent 2 now sees Agent 1 as responding to their algorithm rather than the other way around, then Agent 2 benefits by precommitting to stay. If there’s a fact about which agent precommits “first”, then we can conclusively say that this agent does better, as they can force the outcome they want. But again, this is not a temporal first. Suppose that Agent 2 is asleep at the wheel, about to wake up, and Agent 1 is trying to decide what to do. Agent 1 simulates them and sees that once they wake up they will tear out their steering wheel without even considering what Agent 2 does. Now Agent 1’s hand is forced; he will swerve in response to Agent 2’s precommitment, even though it hasn’t yet been made. It appears that for two agents in a chicken-like scenario, with access to simulations of one another, the best action is to precommit as quickly and firmly as possible, with as little regard for their opponents’ precommitments as they can manage (the best-performing agent is the one that tears off their steering wheel without even simulating their opponent and seeing their precommitments, as this agent puts themselves fully causally behind anybody that simulates them). But this obviously just leads straight to the (stay, stay) outcome!

This pattern of precommitting, then precommitting to not respond to precommitments, then precommiting to not respond to precommitments to not respond to precommitments, and so on, shows up all over the place. Let’s have another example, from the realm of economics.

# Company Coordination and Boycotts

In my last post, I talked about the Cournot model of firms competing to produce a homogenous good. We saw that competing firms face a coordination problem with respect to the prices they set: every firm sees it in their rational self-interest to undercut other firms to take their customers, but then other firms follow suit, ending up with the price dropping for everybody. That’s good for consumers, but bad for producers! The process of undercutting and then re-equilibrating continues until the price is at the bare minimum that it takes for a producer to be willing to make the good – essentially just minutely above the cost of production. At this point, producers are making virtually no profit and consumer surplus is maximized.

This coordination problem, like all coordination problems, could be solved if only the firms had the ability to precommit. Imagine that the heads of all the companies meet up at some point. They all see the problem that they’re facing, and recognize that if they can stop the undercutting, they’ll all be much richer. So they sign on to a vow to never undercut each other. Of course, signing a piece of paper doesn’t actually restrict your future options. Every company is still just as incentivized as before to break the agreement and undercut their competitors. It helps if they have plausible deniability; the ability to say that their price drop was actually not intended to undercut, but a response to some unrelated change in the market. All that the meeting does is introduce some social cost to undercutting and breaking the vow that wasn’t there before.

To actually permanently fix the coordination problem, the companies need to be able to sign on to something that truly and irrevocably ties their hands, giving them no ability to back out later on (equivalent to the tearing-off-the-steering-wheel as a credible precommitment). Maybe they all decide to put some money towards the creation of a final-check mechanism that looks over all price changes and intervenes to stop any changes that it detects to be intended to undercut opponents. This is precommitment in the purer sense of literally removing an option that the firms previously had. And if this type of tying-of-hands was actually possible, then each company would be rationally incentivized to sign on! (Of course, they’d all be looking for ways to cheat the system and break the mechanism at every step, which would make its actual creation a tad bit difficult.)

So, if you give all companies the ability to jointly sign on to a credible precommitment to not undercut their opponents, then they will take that opportunity. This will keep prices high and keep profits flowing in to the companies. Producer surplus will be maximized, and consumers will get the short end of the stick. Is there any way for the consumers to fight back?

Sure there is! All they need is the ability to precommit as well. Suppose that all consumers are now given the opportunity to come together and boycott any and all companies that precommit to not undercutting each other. If every consumer signs on, and if producers know this, then it’s no longer worth it for them to put in place the price-monitoring mechanism, as they’d just lose all their customers! Of course, the consumers now face their own coordination problem; many of them will still value the product at a price higher than that which is being offered by the companies, even if they’re colluding. And each individual reasons that as long as everybody else is still boycotting the companies, it makes little difference if just one mutually beneficial trade is made with them. So the consumers will themselves face the problem of how to enforce the boycott. But let’s assume that the consumers work this out so that they credibly precommit to never buying from a company that credibly precommits to not undercutting its competitors. Now the market price swings back in their favor, dropping to the cost of production! The consumers win! Whoohoo!

But we’re not done yet. It was only worth it for the consumers to sign on to this precommitment because they predicted that the companies would respond to their precommitment. But what if the companies, seeing the boycott-tactic coming, credibly precommit to never yielding to boycotters? Then the consumers, responding to this precommitment, will realize that boycotting will have no effect on prices, and will just cause them all to lose out on mutually beneficial trades! So they won’t boycott, and therefore the producers get the surplus once more. And just like before, this swings back and forth, with the outcome at each stage depending on which agent treats the other agent’s precommitment as being more primal. But if they each run their apparently-best strategy (that is, making their precommitments with no regard to the precommitments of the other party so as to force their hand and place their own precommitments at the beginning of the causal chain), then we end up with the worst possible outcome for all: producers don’t produce anything and consumers don’t consume, and everybody loses out.

This question of how agents that can simulate one another AND precommit to courses of action should ultimately behave is something that I find quite puzzling and am not sure how to resolve.

# Can chess be solved?

There are some chess positions in which one player can force a win. Here’s an extremely simple example:

White just moves their queen up to a8 and checkmates Black. Some winning positions are harder to see than this. Take a look at the following position. Can you find a guaranteed two-move checkmate for White?

And for fun, here’s a harder one, again a guaranteed two-move checkmate, this time by Black:

Notice that in this last one, the opponent had multiple possible moves to choose from. A forced mate does not necessarily mean restricting your opponent to exactly one move on each of their turns. It just means that no matter what they do, you can still guarantee a win. Forced wins can become arbitrarily complicated and difficult to see if you’re looking many moves down the line, as you have to consider all the possible responses your opponent has at each turn. The world record for the longest forced win is the following position:

It’s White’s move, and White does have a strategy for a forced win. It just takes 549 turns to actually do this! (This strategy does violate the 50-move rule, which says that after 50 turns with no pawn moves or capture the game is drawn.) At this link you can watch the entire 549 move game. Most of it is totally incomprehensible to human players, and apparently top chess players that look at this game have reported that the reasoning behind the first 400 moves is opaque to them. Interestingly, White gets a pawn promotion after six moves, and it promotes it to a knight instead of a queen! It turns out that promoting to a queen actually loses for White, and their only way to victory is the knight promotion!

This position is the longest forced win with 7 pieces on the board. There are a few others that are similarly long. All of them represent a glimpse at the perfect play we might expect to see if a hypercomputer could calculate the whole game tree for chess and select the optimal move.

A grandmaster wouldn’t be better at these endgames than someone who had learned chess yesterday. It’s a sort of chess that has nothing to do with chess, a chess that we could never have imagined without computers. The Stiller moves are awesome, almost scary, because you know they are the truth, God’s Algorithm – it’s like being revealed the Meaning of Life, but you don’t understand one word.

Tim Krabbe

With six pieces on the board, the longest mate takes 262 moves (you can play out this position here). For five pieces, it’s 127 moves, for four it’s 43 moves, and the longest 3-man mate takes 28 moves.

But now a natural question arises. We know that a win can be forced in some positions. But how about the opening position? That is, is there a guaranteed win for White (or for Black) starting in this position?

Said more prosaically: Can chess be solved?

Zermelo’s theorem, published in “On an Application of Set Theory to the Theory of the Game of Chess” (1913), was the first formal theorem in game theory. It predated the work of von Neumann (the so-called “father of game theory”) by 15 years. It proves that yes, it is in fact possible to solve chess. We don’t know what the solution is, but we know that either White can force a win, or Black can force a win, or the result will be a draw if both play perfectly.

Of course, the guarantee that in principle there is a solution to chess doesn’t tell us much in practice. The exponential blowup in the number of possible games is so enormous that humans will never find this solution. Nonetheless, I still find it fascinating to think that the mystery of chess is ultimately a product of computational limitations, and that in principle, if we had a hypercomputer, we could just find the unique best chess game and watch it play out, either to a win by one side or to a draw. That would be a game that I would love to see.

✯✯✯

Here’s another fun thing. There’s an extremely bizarre variant on chess called suicide chess (or anti-chess). The goal of suicide chess is to lose all your pieces. Of course, if all the rules of play were the same, it would be practically impossible to win (since your opponent could always just keep refusing to take a piece that you are offering them). To remedy this, in suicide chess, capturing is mandatory! And if you have multiple possible captures, then you can choose among them.

Suicide chess gameplay is extremely complicated and unusual looking, and evaluating who is winning at any given moment tends to be really difficult, as sudden turnarounds are commonplace compared to ordinary chess. But one simplifying factor is that it tends to be easier to restrict your opponents’ moves. In ordinary chess, you can only restrict your opponents’ moves by blocking off their pieces or threatening their king. But in suicide chess, your opponents’ moves are restricted ANY time you put one of your pieces in their line of fire! This feature of the gameplay makes the exponential blow up in possible games more manageable.

Given this, it probably won’t be much of a surprise that suicide chess is, just like ordinary chess, in principle solvable. But here’s the crazy part. Suicide chess is solved!!

That’s right: it was proven a triple of years ago that White can force a win by moving first with e3!

Here’s the paper. The proof amounts to basically running a program that looks at all possible responses to e3 and expands out the game tree, ultimately showing that all branches can be terminated with White losing all pieces and winning the game.

Not only do we know that by starting with e3, White is guaranteed a win, we also know that Black can force a win if White starts with any of the following moves: a3, b4, c3, d3, d4, e4, f3, f4, h3, h4, Nc3, Nf3. As far as I was able to tell, there are only six opening moves remaining for which we don’t know if White wins, Black wins, or they draw: a4, b3, c4, e3, g3, and g4.

✯✯✯

Alright, final chess variant trivia. Infinite chess is just chess, but played on an infinite board.

There’s a mind-blowing connection between infinite chess and mathematical logic. As a refresher, a little while back I discussed the first-order theory of Peano arithmetic. This is the theory of natural numbers with addition and multiplication. If you recall, we found that Peano arithmetic was incomplete (in that not all first-order sentences about the natural numbers can be proven from its axioms). First order PA is also undecidable, in that there exists no algorithm that takes in a first order sentence and returns whether it is provable from the axioms. (In fact, first order logic in general is undecidable! To get decidability, you have to go to a weaker fragment of first order logic known as monadic predicate calculus, in which predicates take only one argument and there are no functions. As soon as you introduce a single binary predicate, you lose decidability.)

Okay, so first order PA (the theory of natural numbers with addition and multiplication) is incomplete and undecidable. But there are weaker fragments of first order PA that are decidable! Take away multiplication, and you have Presburger arithmetic, the theory of natural numbers with addition. Take away addition, and you have Skolem arithmetic, the theory of natural numbers with multiplication. Both of these fragments are significantly weaker than Peano arithmetic (each is unable to prove general statements about the missing operation, like that multiplication is commutative for Presburger arithmetic). But in exchange for this weakness, you get both decidability and completeness!

How does all this relate to infinite chess? Well, consider the problem of determining whether there exists a checkmate in n turns from a given starting position. This seems like a really hard problem, because unlike in ordinary chess, now it’s possible for there to be literally infinite possible moves for a given player from a position. (For instance, a queen on an empty diagonal can move to any of the infinite locations on this diagonal.) So apparently, the game tree for infinite chess, in general, branches infinitely. Given this, we might expect that this problem is not decidable.

Well, it turns out that any instance of this problem (any particular board setup, with the question of whether there’s a mate-in-n for one of the players) can be translated into a sentence in Presburger arithmetic. You do this by translating a position into a fixed length sequence of natural numbers, where each piece is given a sequence of numbers indicating its type and location. The possibility of attacks can be represented as equations about these numbers. And since the distance pieces (bishops, rooks, and queens – those that have in general an infinite number of available moves) all move in straight lines, there are simple equations expressible in Presburger arithmetic that describe whether these pieces can attack other pieces! From the attack relations, you can build up more complicated relations, including the mate-in-n relation.

So we have a translation from the mate-in-n problem to a sentence in Presburger arithmetic. But Presburger arithmetic is decidable! So there must also be a decision procedure for the mate-in-n problem in infinite chess. And not only is there a decision procedure for the mate-in-n problem, but there’s an algorithm that gives the precise strategy that achieves the win in the fewest number of moves!

Here’s the paper in which all of this is proven. It’s pretty wild. Many other infinite chess problems can be proven to be decidable by the same method (demonstrating interpretability of the problem in Presburger arithmetic). But interestingly, not all of them! This has a lot to do with the limitations of first-order logic. The question of whether, in general, there is a forced win from a given position can not be shown to be decidable in this way. (This relates to the general impossibility in first-order logic of expressing infinitely long statements. Determining whether a given position is a winning position for a given player requires looking at the mate-in-n problem, but without any upper bound on what this n is – on how many moves the win may take.) It’s not even clear whether the winning-position problem can be phrased in first-order arithmetic, or whether it requires going to second-order!

The paper takes this one step further. This proof of the decidability of the mate-in-n problem for infinite chess doesn’t crucially rest upon the two-dimensionality of the chess board. We could easily translate the proof to a three-dimensional board, just by changing the way we code positions! So in fact, we have a proof that the mate-in-n problem for k-dimensional infinite chess is decidable!

I’ll leave you with this infinite chess puzzle:

It’s White’s turn. Can they guarantee a checkmate in 12 moves or less?

# There’s a problem with infinity

Last post I described the Ross-Littlewood paradox, in which an ever-expanding quantity of numbered billiard balls are placed into a cardboard box in such a way that after an infinite number of steps the box ends up empty. Here’s a version of this paradox:

Process 1
Step 1: Put 1 through 9 into the box.
Step 2: Take out 1, then put 10 through 19 into the box.
Step 3: Take out 2, then put 20 through 29 into the box.
Step 4: Take out 3, then put 30 through 39 into the box.
And so on.

Box contents after each step
Step 1: 1 through 9
Step 2: 2 through 19
Step 3: 3 through 29
Step 4: 4 through 39
And so on.

Now take a look at a similar process, where instead of removing balls from the box, we just change the number that labels them (so, for example, we paint a 0 after the 1 to turn “Ball 1” to “Ball 10″).

Process 2
Step 1: Put 1 through 9 into the box
Step 2: Change 1 to 10, then put 11 through 19 into the box.
Step 3: Change 2 to 20, then put 21 through 29 in.
Step 3: Change 3 to 30, then put 31 through 39 in.
And so on.

Box contents after each step
Step 1: 1 through 9
Step 2: 2 through 19
Step 3: 3 through 29
Step 4: 4 through 39
And so on.

Notice that the box contents are identical after each step. If that’s all that you are looking at (and you are not looking at what the person is doing during the step), then the two processes are indistinguishable. And yet, Process 1 ends with an empty box, and Process 2 ends with infinitely many balls in the box!

Why does Process 2 end with an infinite number of balls in it, you ask?

Process 2 ends with infinitely many balls in the box, because no balls are ever taken out. 1 becomes 10, which later becomes 100 becomes 1000, and so on forever. At infinity you have all the natural numbers, but with each one appended an infinite number of zeros.

So apparently the method you use matters, even when two methods provably get you identical results! There’s some sort of epistemic independence principle being violated here. The outputs of an agent’s actions should be all that matters, not the specific way in which the agent obtains those outputs! Something like that.

Somebody might respond to this: “But the outputs of the actions aren’t the same! In Process 1, each step ten are added and one removed, whereas in Process 2, each step nine are added. This is the same with respect to the box, but not with respect to the rest of the universe! After all, those balls being removed in Process 1 have to go somewhere. So somewhere in the universe there’s going to be a big pile of discarded balls, which will not be there in Process 2.

This responds holds water as long as our fictional universe doesn’t violate conservation of information, as if not, these balls can just vanish into thin air, leaving no trace of their existence. But that rebuttal feels cheap. Instead, let’s consider another variant that gets at the same underlying problem of “relevance of things that should be irrelevant”, but avoids this problem.

Process 1 (same as before)
Step 1: Put 1 through 9 into the box.
Step 2: Take out 1, then put 10 through 19 into the box.
Step 3: Take out 2, then put 20 through 29 into the box.
Step 4: Take out 3, then put 30 through 39 into the box.
And so on.

Box contents after each step
Step 1: 1 through 9
Step 2: 2 through 19
Step 3: 3 through 29
Step 4: 4 through 39
And so on.

And…

Process 3
Step 1: Put 1 through 9 into the box.
Step 2: Take out 9, then put 10 through 19 into the box.
Step 3: Take out 19, then put 20 through 29 into the box.
Step 4: Take out 29, then put 30 through 39 into the box.
And so on.

Box contents after each step
Step 1: 1 through 9
Step 2: 1 to 8, 10 to 19
Step 3: 1 to 8, 10 to 18, 20 to 29
Step 4: 1 to 8, 10 to 18, 20 to 28, 30 to 39
And so on

Okay, so as I’ve written it, the contents of each box after each step are different in Processes 1 and 3. Just one last thing we need to do: erase the labels on the balls. The labels will now just be stored safely inside our minds as we look over the balls, which will be indistinguishable from one another except in their positions.

Now we have two processes that look identical at each step with respect to the box, AND with respect to the external world. And yet, the second process ends with an infinite number of balls in the box, and the first with none! (Every number that’s not one less than a multiple of ten will be in there.) It appears that you have to admit that the means used to obtain an end really do matter.

But it’s worse than this. You can arrange things so that you can’t tell any difference between the two processes, even when observing exactly what happens in each step. How? Well, if the labelling is all in your heads, then you can switch around the labels you’ve applied without doing any harm to the logic of the thought experiment. So let’s rewrite Process 3, but fill in both the order of the balls in the box and the mental labelling being used:

Process 3
1 2 3 4 5 6 7 8 9
Mentally rotate labels to the right:
9 1 2 3 4 5 6 7 8
Remove the furthest left ball:
1 2 3 4 5 6 7 8
Add the next ten balls to the right in increasing order:
1 2 3 4 5 6 7 8 10 11 12 13 14 15 16 17 18 19
Repeat!

Compare this to Process 1, supposing that it’s done without any relabelling:

Process 1
1 2 3 4 5 6 7 8 9
Remove the furthest left ball:
2 3 4 5 6 7 8 9
Add the next tell balls to the right in increasing order:
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 18 19
Repeat!

If the labels are all in your head, then these two processes are literally identical except for how a human being is thinking about them.

But looking at Process 3, you can prove that after Step 1 there will always be a ball labelled 1 in the box. Same with 2, 3, 4, and all other numbers that are not a multiple of 10 minus one. Even though we remove an infinity of balls, there are ball numbers that are never removed. And if we look at the pile of discarded balls, we’ll see that it consists of 9, 19, 29, 39, and so on, but none of the others. Unless some ball numbers vanish in the process (which they never do!), all the remainders must still be sitting in the box!

So we have two identical-in-every-relevant-way processes, one of which ends with an infinite number of balls in the box and the other with zero. Do you find this troubling? I find this very troubling. If we add some basic assumption that an objective reality exists independent of our thoughts about it, then we’ve obtained a straightforward contradiction.

✯✯✯

Notice that it’s not enough to say “Well, in our universe this process could never be completed.” This is for two reasons:

First of all, it’s actually not obvious that supertasks (tasks involving the completion of an infinite number of steps in a finite amount of time) cannot be performed in our universe. In fact, if space and time are continuous, then every time you wave your hand you are completing a sort of supertask.

You can even construct fairly physically plausible versions of some of the famous paradoxical supertasks. Take the light bulb that blinks on and off at intervals that get shorter and shorter, such that after some finite duration it has blinked an infinity of times. We can’t say that the bulb is on at the end (as that would seem to imply that the sequence 0101010… had a last number) or that it is off (for much the same reason). But these are the only two allowed states of the bulb! (Assume the bulb is robust against bursting and all the other clever ways you can distract from the point of the thought experiment.)

Now, here’s a variant that seems fairly physically reasonable:

A ball is dropped onto a conductive plate that is attached by wire to a light bulb. The ball is also wired to the bulb, so that when the ball contacts the plate, a circuit is completed that switches the light bulb on. Each bounce, the ball loses some energy to friction, cutting its velocity exactly in half. This means that after each bounce, the ball hangs in the air for half as long as it did the previous bounce.

Suppose the time between the first and second bounce was 1 second. Then the time between the second and third will be .5 seconds. And next will be .25 seconds. And so on. At 2 seconds, the ball will have bounced an infinite number of times. So at 2 seconds, the light bulb will have switched on and off an infinite number of times.

And of course, at 2 seconds the ball is at rest on the plate, completing the circuit. So at 2 seconds, upon the completion of the supertask, the light will be on.

Notice that there are no infinite velocities here, or infinite quantities of energy. Just ordinary classical mechanics applied to a bouncing ball and a light bulb. What about infinite accelerations? Well even that is not strictly speaking necessary; we just imagine that each velocity reversal takes some amount of time, which shrinks to zero as the velocity shrinks to zero in such a way as to keep all accelerations finite and sum to a finite total duration.

All this is just to say that we shouldn’t be too hasty in dismissing the real-world possibility of apparently paradoxical supertasks.

But secondly, and more importantly, physical possibility is not the appropriate barometer of whether we should take a thought experiment seriously. Don’t be the person that argues that the fat man wouldn’t be sufficient to stop a trolley’s momentum. When we find that some intuitive conceptual assumptions lead us into trouble, the takeaway is that we need to closely examine and potentially revise our concepts!

Think about Russell’s paradox, which showed that some of our most central intuitions about the concept of a set lead us to contradiction. Whether or not the sets that Bertie was discussing can be pointed to in the physical world is completely immaterial to the argument. Thinking otherwise would have slowed down progress in axiomatic set theory immensely!

These thought experiments are a problem if you believe that it is logically possible for there to be a physical universe in which these setups are instantiated. That’s apparently all that’s required to get a paradox, not that the universe we live in happens to be that one.

So it appears that we have to conclude some limited step in the direction of finitism, in which we rule out a priori the possibility of a universe that allows these types of supertasks. I’m quite uncomfortable with this conclusion, for what it’s worth, but I don’t currently see a better option.

## The Puzzle

You have in front of you an empty box. You also have on hand an infinite source of billiard balls, numbered 0, 1, 2, 3, 4, and so on forever.

At time zero, you place balls 0 and 1 in the box.

In thirty minutes, you remove ball 0 from the box, and place in two new balls (2 and 3).

Fifteen minutes after that, you remove ball 1 from the box, and place in two new balls (4 and 5).

7.5 minutes after that, you remove ball 2 and place in balls 6 and 7.

And so on.

After an hour, you will have taken an infinite number of steps. How many billiard balls will be in the box?

✯✯✯

At time zero, the box contains two balls (0 and 1). After thirty minutes, it contains three (1, 2, and 3). After 45 minutes, it contains four (2, 3, 4, and 5). You can see where this is going…

Naively taking the limit of this process, we arrive at the conclusion that the box will contain an infinity of balls.

But hold on. Ask yourself the following question: If you think that the box contains an infinity of balls, name one ball that’s in there. Go ahead! Give me a single number such that at the end of this process, the ball with that number is sitting in the box.

The problem is that you cannot do this. Every single ball that is put in at some step is removed at some later step. So for any number you tell me, I can point you to the exact time at which that ball was removed from the box, never to be returned to it!

But if any ball that you can name can be proven to not be in the box.. and every ball you put in there was named… then there must be zero balls in the box at the end!

In other words, as time passes and you get closer and closer to the one-hour mark, the number of balls in the box appears to be growing, more and more quickly each moment, until you hit the one-hour mark. At that exact moment, the box suddenly becomes completely empty. Spooky, right??

Let’s make it weirder.

What if at each step, you didn’t just put in two new balls, but one MILLION? So you start out at time zero by putting balls 0, 1, 2, 3, and so on up to 1 million into the empty box. After thirty minutes, you take out ball 1, but replace it with the next 1 million numbered balls. And at the 45-minute mark, you take out ball 2 and add the next 1 million.

What’ll happen now?

Well, the exact same argument we gave initially applies here! Any ball that is put in the box at any point, is also removed at a later point. So you literally cannot name any ball that will still be in the box after the hour is up, because there are no balls left in the box! The magic of infinity doesn’t care about how many more balls you’ve put in than removed at any given time, it still delivers you an empty box at the end!

Now, here’s a final variant. What if, instead of removing the smallest numbered ball each step, you removed the largest numbered ball?

So, for instance, at the beginning you put in balls 0 and 1. Then at thirty minutes you take out ball 1, and put in balls 2 and 3. At 45 minutes, you take out ball 3, and put in balls 4 and 5. And so on, until you hit the one hour mark. Now how many balls are there in the box?

Infinity! Why not zero like before? Well, because now I can name you an infinity of numbers whose billiard balls are still guaranteed to be in the box when the hour’s up. Namely, 0, 2, 4, 6, and all the other even numbered balls are still going to be in there.

Take a moment to reflect on how bizarre this is. We removed the exact same number of balls each step as we did last time. All that changed is the label on the balls we removed! We could even imagine taking off all the labels so that all we have are identical plain billiard balls, and just labeling them purely in our minds. Now apparently the choice of whether to mentally label the balls in increasing or decreasing order will determine whether at the end of the hour the box is empty or packed infinitely full. What?!? It’s stuff like this that makes me sympathize with ultrafinitists.

One final twist: what happens if the ball that we remove each step is determined randomly? Then how many balls will there be once the hour is up? I’ll leave it to you all to puzzle over!

# Can an irrational number raised to an irrational power be rational?

There’s a wonderful proof that yes, indeed, this is possible, and it goes as follows:

Let’s consider the number $\sqrt 2 ^ {\sqrt 2}$. This number is either rational or irrational. Let’s examine each case.

## Case 1: $\sqrt 2 ^ {\sqrt 2}$$\sqrt 2 ^ {\sqrt 2}$ is rational.

Recall that $\sqrt 2$ is irrational. So if $\sqrt 2 ^ {\sqrt 2}$ is rational, then we have proven that it’s possible to raise an irrational number to an irrational power and get a rational value. Done!

## Case 2: $\sqrt 2 ^ {\sqrt 2}$$\sqrt 2 ^ {\sqrt 2}$ is irrational.

In this case, $\sqrt 2 ^ {\sqrt 2}$ and $\sqrt 2$ are both irrational numbers. So what if we raise the $\sqrt 2 ^ {\sqrt 2}$ to the power of $\sqrt 2$?

$\left( \sqrt 2 ^ {\sqrt 2} \right) ^{\sqrt 2} = \sqrt 2 ^ {\sqrt 2 \cdot \sqrt 2} = \sqrt 2 ^ 2 = 2$

So in this case, we have again found a pair of irrational numbers such that one raised to the power of the other is a rational number! Proof complete!

✯✯✯

One thing that’s fun about this proof is that the result is pretty surprising. I would not have guessed a priori that you could get a rational by raising one irrational to another; it just seems like irrationality is the type of thing that would be closed under ordinary arithmetic operations.

But an even cooler thing is that it’s a non-constructive proof. By the end of the proof, we know for sure that there is a pair of irrational numbers such that one raised to the other gives us a rational number, but we have no idea whether it’s ($\sqrt 2$, $\sqrt 2$) or ($\sqrt 2 ^ {\sqrt 2}$,$\sqrt 2$).

(It turns out that it’s the second. The Gelfond–Schneider theorem tells us that for any two non-zero algebraic numbers a and b with a ≠ 1 and b irrational, the number ab is irrational. So $\sqrt 2 ^ {\sqrt 2}$ is in fact irrational.)

Now, most mathematicians are totally fine with non-constructive proofs, as long as they follow all the usual rules of proofs. But there is a branch of mathematics known as constructive mathematics that only accepts constructive proofs of existence. Within constructive mathematics, this proof is not valid!

Now, it so happens that you can prove the irrationality of $\sqrt 2 ^ {\sqrt 2}$ by purely constructive means, but that’s besides the point. To my eyes, the refusal to accept such an elegant and simple proof because it asserts a number’s existence without telling us exactly what it is just looks a little silly!

Along similar lines, here’s one more fun problem.

## Are $\pi + e$$\pi + e$ and $\pi - e$$\pi - e$ transcendental?

We know that $\pi$ and $e$ are both transcendental numbers (i.e. they cannot be expressed as the roots of any polynomial with rational coefficients). But are $\pi + e$ and $\pi - e$ both transcendental?

It turns out that this amazingly simple sounding problem is unsolved to this day! But one thing that we do know is that it can’t be that neither of them are transcendental. Because if this was the case, then their sum $(\pi + e) + (\pi - e) = 2 \pi$ would also not be transcendental, which we know is false! So we know that at least one of them has to be true, using a proof that doesn’t guarantee the truth of either of them! Cool, right?

# The Weirdest Consequence of the Axiom of Choice

This post is about the most startling puzzle I’ve ever heard. First, two warm-up hat puzzles.

## Four prisoners, two black hats, two white hats

Four prisoners are in a line, with the front-most one hidden from the rest behind a wall. Each is wearing a hat, but they can’t see its color. There are two black hats and two white hats, and all four know this.

As soon as any prisoner figures out the color of their own hat, they must announce it. If they’re right, everybody goes free. They cannot talk to each other or look any direction but forwards.

Are they guaranteed freedom?

Solution

Yes, they are! If D sees the same color hat on B and C’s heads, then he can conclude his own hat’s color is the other, so everybody goes free. If he sees differently colored hats, then he cannot conclude his hat color. But C knows this, so if D doesn’t announce his hat color, then C knows that his hat color is different from B’s and they all go free. Done!

Next:

## Ten prisoners, unknown number of black and white hats

Each prisoner is randomly assigned a hat. The number of black and white hats is unknown. Starting from the back, each must guess their hat color. If it matches they’re released, and if not then they’re killed on the spot. They can coordinate beforehand, but cannot exchange information once the process has started.

There is a strategy that gives nine of the prisoners 100% chance of survival, and the other a 50% chance. What is it?

Solution

A10 counts the number of white hats in front of him. If it’s odd, he says ‘white’. Otherwise he says ‘black’. This allows each prisoner to learn their own hat color once they hear the prisoner behind them.

— — —

Alright, now that you’re all warmed up, let’s make things a bit harder.

## Countably infinite prisoners, unknown number of black and white hats, no hearing

There are a countable infinity of prisoners lined up with randomly assigned hats. They each know their position in line.

The hat-guessing starts from A1 at the back of the line. Same consequences as before: the reward for a right guess is freedom and the punishment for a wrong guess is death.

The prisoners did have a chance to meet and discuss a plan beforehand, but now are all deaf. Not only can they not coordinate once the guessing begins, but they also have no idea what happened behind them in the line.

The puzzle for you is: Can you find a strategy that ensures that only finitely many prisoners are killed?

Oh boy, the prisoners are in a pretty tough spot here. We’ll give them a hand; let’s allow them logical omniscience and the ability to memorize an uncountable infinity of information. Heck, let’s also give them an oracle to compute any function they like.

Give it a try!

(…)

(…)

Solution

Amazingly, the answer is yes. All but a finite number of prisoners can be saved.

Here’s the strategy. First start out by identifying white hats with the number 0, and black hats with the number 1. Now the set of all possible sequences of hats in the line is the set of all binary sequences. We define an equivalence relation on such sequences as follows: 𝑥 ~ 𝑦 if and only if 𝑥 and 𝑦 are identical after a finite number of digits in the sequence. This will partition all possible sequences into equivalence classes.

For example, the equivalence class of 0 will just be the subset of the rationals whose binary expansion ends at some point (i.e. the subset of the rationals that can be written as an integer over a power of 2). Why? Well, if a binary sequence 𝑥 is equivalent to .000000…, then after a finite number of digits of 𝑥, it will have to be all 0s forever. And this means that it can be written as some integer over a power of 2.

When the prisoners meet up beforehand, they use the axiom of choice to select one representative from each equivalence class. (Quick reminder: the axiom of choice says that for any set 𝑥 of nonempty disjoint sets, you can form a new set that shares exactly one element with each of the sets in 𝑥.) Now each prisoner is holding in their head an uncountable set of sequences, each one of which represents an equivalence class.

Once they’re in the room, every prisoner can see all but a finite number of hats, and therefore they know exactly which equivalence class the sequence of hats belongs to. So each prisoner guesses their hat color as if they were in the representative sequence from the appropriate equivalence class. Since the actual sequence and the representative sequence differ in only finitely many places (all at the start), all entries are going to be the same after some finite number of prisoners. So every single prisoner after this first finite batch will be saved!

This result is so ridiculous that it actually makes me crack up thinking about it. There is surely some black magic going on here. Remember, each prisoner can see all the hats in front of them, but they know nothing about the total number of hats of each color, so there is no correlation between the hats they see and the hat on their head. And furthermore, they are deaf! So they can’t learn any new information from what’s going on behind them! They literally have no information about the color of their own hat. So the best that each individual could do must be 50/50. Surely, surely, this means that there will be an infinite number of deaths.

But nope! Not if you accept the axiom of choice! You are guaranteed only a finite number of deaths, just a finite number that can be arbitrarily large. How is this not a contradiction? Well, for it to be a contradiction, there has to be some probability distribution over the possible outcomes which says that Pr(finite deaths) = 0. And it turns out that the set of representative sequences form a non-measurable set (a set which cannot have a well-defined size using the ordinary Lebesgue measure). So no probability can be assigned to it (not zero probability, literally undefined)! Now remember that zero deaths occur exactly when the real sequence is one of the representative sequences. This means that no probability can be assigned to this state of affairs. The same thing applies to the state of affairs in which one prisoner dies, or two prisoners, and so on. You literally cannot define a probability distribution over the number of prisoners to die.

By the way, what happens if you have an uncountable infinity of prisoners? Say we make them infinitely thin and then squeeze them along the real line so as to populate every point. Each prisoner can see all but a finite number of the other prisoner’s hats. Let’s even give them hats that have an uncountable number of different colors. Maybe we pick each hat color by just randomly selecting any frequency of visible light.

Turns out that we can still use the axiom of choice to guarantee the survival of all but finitely many prisoners!

One last one.

## Countably infinite prisoners, unknown number of black and white hats, with hearing

We have a countable infinity of prisoners again, each with either a black or white hat, but this time they can hear the colors called out by previous prisoners. Now how well can they do?

The answer? (Assuming the axiom of choice) Every single prisoner can be guaranteed to survive except for the first one, who survives with 50% probability. I really want this to sink in. When we had ten prisoners with ten hats, they could pull this off by using their knowledge of the total number of black and white hats amongst them. Our prisoners don’t have this anymore! They start off knowing nothing about the number of white and black hats besides what they can see in front of them. And yet they still get out of it with all but one prisoner guaranteed freedom.

How do they do this? They start off same as before, defining the equivalence relation and selecting a representative sequence from each equivalence class. Now they label every single sequence with either a 0 or a 1. A sequence gets a 0 if it differs from the representative sequence in its equivalence class in an even number of places, and otherwise it gets a 1. This labeling has the result that any two sequences that differ by exactly one digit have opposite labels.

Now the first person (A) just says the label of the sequence he sees. For the next person up (B), this is the sequence that starts with their hat. And remember, they know which equivalence class they’re in, since they can see everybody in front of them! So all they need to do is consider what the label of the sequence starting with them would be if they had a white hat on. If it would be different than the label they just heard, then they know that their hat is black. And if it would be the same, then their hat is white!

The person in front of B knows the equivalence class they’re in, and now also knows what color B’s hat was. So they can do the exact same reasoning to figure out their hat color! And so on, saving every last countable prisoner besides A..

Let’s see an example of this, for the sequence 100000…

And so on forever, until every last prisoner besides A is free.

This set of results is collectively the most compelling argument I know for rejecting AC, and I love it.

# Introduction to Mathematical Logic (Part 1)

Mathematical logic is the study of the type of reasoning we perform when we do mathematics, and the attempt to formulate a general language as the setting in which all mathematics is done. In essence, it is an attempt to form a branch of mathematics, of which all other branches of mathematics will emerge as special cases.

You might sort of think that when speaking at this level of abstraction, there will nothing general and interesting to say. After all, we’re trying to prove statements not within a particular domain of mathematics, but theorems that are true across a wide swath of mathematics, potentially encompassing all of it.

The surprising and amazing thing is that this is not the case. It turns out that there are VERY general and VERY surprising things you can discover by looking at the logical language of mathematics, a host of results going by names like the Completeness Theorem, the Incompleteness Theorem, the Compactness Theorem, the Löwenheim-Skolem Theorem, and so on. These results inevitably have a great deal of import to our attitudes towards the foundations of mathematics, being that they generally establish limitations or demonstrate eccentricities in the types of things that we can say in the language of mathematics.

My goal in this post is to provide a soft introduction to the art of dealing in mathematics at this level of ultimate abstraction, and then to present some of the most strange things that we know to be true. I think that this is a subject that’s sorely missing this type of soft introduction, and hope I can convey some of the subject’s awesomeness!

— — —

To start out with, why think that there is any subject matter to be explored here? Different branches of mathematics sometimes appear to be studying completely different types of structures. I remember an anecdote from an old math professor of mine, who worked within one very narrow and precisely defined area of number theory, and who told me that when she goes to talks that step even slightly outside her area of specialty, the content of the  lectures quickly incomprehensible to her. Why think that there is such a common language of mathematics, if specialists in mathematics can’t even understand each other when talking between fields?

The key thing to notice here is that although different fields of mathematics are certainly wildly different in many ways, there nevertheless remain certain fundamental features that are shared in all fields. Group theorists, geometrists, and number theorists will all accept the logical inference rule of modus ponens (if P is true and P implies Q, then Q is true), but none of them will accept its converse (if Q is true and P implies Q, then P is true). No matter what area of mathematics you study, you will accept that if P(x) is true for all x, then it is true for any particular x you choose. And so on. These similarities may seem obvious and trivial, but they HAVE to be obvious and trivial to be things that every mathematician agrees on. The goal, then, is to formalize a language that has these fundamental inference rules and concepts built in, and that has many special cases to account for the differences between domains of math, specified by some parameters that are freely chosen by any user of the system.

There are actually several distinct systems that attempt to accomplish this task. Generally speaking, there are three main branches, in order of increasing expressive power: propositional logic, first order (predicate) logic, and second order logic.

## Reasoning In Zeroth Order

Let’s start with propositional logic, sometimes called “zeroth order logic.” Propositional logic is the framework developed to deal with the validity of the following types of arguments:

Argument 1

1. 2+2=4.
2. If 2+2=4, then 1+3=4.
3. So 1+3=4.

Argument 2

1. The Riemann hypothesis is false and P = NP.
2. So P = NP.

Notice that it doesn’t matter if our premises are true or not. Logical validity doesn’t care about this, it just cares that the conclusions really do follow from the premises. This is a sign of the great generality at which we’re speaking. We’re perfectly fine with talking about a mathematical system in which the Riemann hypothesis is false, or in which 2+2 is not 4, just so long as we accept the logical implications of our assumptions.

Propositional logic can express the validity of these arguments by formalizing rules about valid uses of the concepts ‘and’, ‘if…then…’, ‘or’, and so on. It remains agnostic to the subject matter being discussed by not fully specifying the types of sentences that are allowed to be used in the language. Instead, any particular user of the language can choose their set of propositions that they want to speak about.

To flesh this out more, propositional logic fulfills the following three roles:

1. Defines an alphabet of symbols.
2. Specifies a set of rules for which strings are grammatical and which are not.
3. Details rules for how to infer new strings from existing strings.

In more detail:

1. The set of symbols in propositional logic are split into two categories: logical symbols and what I’ll call “fill-in-the-blank” symbols. The logical symbols are (, ), , , ¬, and →. The fill-in-the-blank symbols represent specific propositions, that are specified by any particular user of the logic.
2. Some strings are sensible and others not. For example, the string “P∧∧” will be considered to be nonsensical, while “PQ” will not. Some synonyms for sensible strings are well-formed formulas (WFFs), grammatical sentences, and truth-apt sentences. There is a nice way to inductively generate the set of all WFFs: Any proposition is a WFF, and for any two WFFs F and F’, the following are also WFFs: (FF’), (FF’), ¬F, (F→F’).
3. These include rules like modus ponens (from P and P→Q, derive Q), conjunction elimination (from PQ, derive P), double negation elimination (from ¬¬P, derive P), and several more. They are mechanical rules that tell you how to start with one set of strings and generate new ones in a logically valid way, such that if the starting strings are true than the derived ones must also be true. There are several different but equivalent formulations of the rules of inference in propositional logic.

A propositional language fills in the blanks in the logic. Say that I want to talk about two sentences using propositional logic: “The alarm is going off”, “A robber has broken in.” For conciseness, we’ll abbreviate these two propositions as A for alarm and R for robber. All I’ll do to specify my language is to say “I have two propositions: {A, R}”

The next step is to fill in some of the details about the relationships between the propositions in my language. This is done by supplementing the language with a set of axioms, and we call the resulting constrained structure a propositional theory. For instance, in my example above we might add the following axioms:

1. A→R
2. AR

In plain English, these axioms tell us that (1) if the alarm is going off, then a robber has broken in, and (2) an alarm is going off or a robber has broken in.

Finally, we talk about the models of our theory. Notice that up until now, we haven’t talked at all about whether any statements are true or false, just about syntactic properties like “The string P¬ is not grammatical” and about what strings follow from each other. Now we interpret our theory by seeing what possible assignments of truth values to the WFFs in our language are consistent with our axioms and logical inference rules. In our above example, there are exactly two interpretations:

Model 1: A is true, R is true
Model 2: A is false, R is true

These models can be thought of as the possible worlds that are consistent with our axioms. In one of them, the alarm has gone off and a robber has broken in, and in the other, the alarm hasn’t gone off and the robber has broken in.

Notice that R turns out true in both models. When a formula F is true in all models of a theory, we say that the theory semantically entails F, and write this as T F. When a formula can be proven from the axioms of the theory using the rules of inference given by the logic, then we say that the theory syntactically entails F, and write this as T F.

This distinction between syntax and semantics is really important, and will come back to us in later discussion of several important theorems (notably the completeness and incompleteness theorems). To give a sneak peek: above we found that R was semantically entailed by our theory. If you’re a little familiar with propositional logic, you might have also realized that R can be proven from the axioms. In general, syntactic truths will always be semantic truths (if you can prove something, then it must be true in all models, or else the models would be inconsistent. But a model is by definition a consistent assignment of truth values to all WFFs). But a natural question is: are all semantic consequences of a theory also syntactic consequences? That is, are all universal truths of the theory (things that are true in every model of the theory) provable from the theory?

If the answer is yes, then we say that our logic is complete. And it turns out that the answer is yes, for propositional logic. Whether more complex logics are complete turns out to be a more interesting question. More on this later.

This four-step process I just laid out (logic to language to theory to model) is a general pattern we’ll see over and over again. In general, we have the following division of labor between the four concepts:

1. Logic: The logic tells us the symbols we may use (including some fill-in-the-blank categories of symbols), the rules of grammar, and a set of inference rules for deriving new strings from an existing set.
2. Language: The language fills in the blanks in our logic, fully specifying the set of symbols we will be using.
3. Theory: The theory adds axioms to the language.
4. Model: A model is an assignment of truth values to all WFFs in the language, consistent with the axioms and the inference rules of our logic.

It’s about time that we apply this four-step division to a more powerful logic. Propositional logic is pretty weak. Not much interesting math can be done in a purely propositional language, and it’s wildly insufficient to capture our notion of logically valid reasoning. Consider, for example, the following argument:

1. Socrates is a man.
2. All men are mortal.
3. So, Socrates is mortal.

This is definitely a valid argument. No rational agent could agree that 1 and 2 are true, and yet deny the truth of 3. But can we represent the validity of this argument in propositional logic? No!

Consider that the three sentences “Socrates is a man”, “All men are mortal”, and “Socrates is mortal” are distinct propositions, and the relationships between them are too subtle for propositional logic to capture. Propositional logic can’t see that the first proposition is asserting the membership of Socrates to a general class of things (“men”), and that the second proposition is then making a statement about a universal property of things in this class. It just sees two distinct propositions. To propositional logic, this argument just looks like

1. P
2. Q
3. Therefore, R

But this is not logically valid! We could make it valid by adding as a premise the sentence (PQ)→R, which corresponds to the English sentence “If Socrates is a man and all men are mortal, then Socrates is mortal.” But this should be seen as a tautology, something that is provable in any first order theory that contains the propositions P Q and R, not a required additional assumption. Worse, if somebody came along and stated the proposition A = “Aristotle is a man”, then we would need a whole ‘nother assumption to assert that Aristotle is also mortal! And in general, for any individual instance of this argument, we’d need an independent explanation for its validity. This is not parsimonious, and indicative that propositional logic is missing something big.

Missing what? To understand why this argument is valid, you must be able to reason about objects, properties, and quantification. This is why we must move on to an enormously more powerful and interesting logic: first order logic.

## Reasoning In First Order

First order logic is a logic, so it must fill the same three roles as we saw propositional logic did above. Namely, it must define the alphabet, the grammar, and the inference rules.

Symbols
Logical: ¬ → ( ) =
Variables: x y z w …
Constants: ______
Predicates: ______
Functions: ______

That’s right, in first order we have three distinct categories of fill-in-the-blank symbols. Intuitively, constants will be names that refer to objects, predicates will be functions from objects to truth values, and functions will take objects to objects. To take an everyday example, if the objects in consideration are people, then we might take ‘a’ to be a constant referring to a person named Alex, ’T’ to be a predicate representing ‘is tall’, and ‘f’ to be a function representing “the father of”. So T(a) is either true or false, while f(a) doesn’t have a truth value (it just refers to another object). But T(f(a)) does have a truth value, because it represents the sentence “Alex’s father is tall.”

Next, we define well-formed formulas. This process is more complicated than it was for propositional logic, because we have more types of things than we did before, but it’s not too bad. We start by defining a “term”. The set of terms is inductively generated by the following scheme: All constants and variables are terms, and any function of terms is itself a term. Intuitively, the set of terms is the set of objects that our language is able to “point at”.

With the concept of terms in hand, we can define WFFs through a similar inductive scheme: Any predicate of terms is a WFF. And for any WFFs F and F’, (FF’), (FF’), (¬F), (F→F’), x F, x F are all WFFs. The details of this construction are not actually that important, I just think it’s nice how you can generate all valid first order formulas from fairly simple rules.

Good! We have an alphabet, a grammar, and now all we need from our logic is a set of inference rules. It turns out that this set is just going to be the inference rules from propositional logic, plus some new ones:

Quantifier elimination: From x P(x) derive P(t) (for any term t and predicate P)
Quantifier introduction: From P(t) derive x P(x) (for any term t and predicate P)

That’s it, we’ve defined first order logic! Now let’s talk about a first order language. Just like before, the language is just obtained by filling in the blanks left open by our logic. So for instance, we might choose the following language:

Constants: a
Functions: f
Predicates: none

In specifying our function, we have to say exactly what type of function it is. Functions can take as inputs a single object (“the father of”) or multiple objects (“the nearest common ancestor of”), and this will make a difference to how they are treated. So for simplicity, let’s say that our function f just takes in a single object.

A first order theory will simply be a language equipped with some set of axioms. Using our language above, we might have as axioms:

1. x (f(x) ≠ x)
2. x (f(x) ≠ a)

In plain English, we’re saying that f never takes any object to itself or to a.

And lastly, we get to the models of a first order theory. There’s an interesting difference between models here and models in propositional logic, which is that to specify a first order model, you need to first decide on the size of your set of objects (the size of the ‘universe’, as it’s usually called), and then find a consistent assignment of truth values to all propositions about objects in this universe.

So, for instance, we can start searching for models of our theory above by starting with models with one object, then two objects, then three, and so on. We’ll draw little diagrams below in which points represent objects, and the arrow represents the action of the function f on an object.

1. No 1-element universe.
2. No 2-element universe.
3. 3-element universe
4. 4 element universes
5. And so on…

It’s a fun little exercise to go through these cases and see if you can figure out why there are no models of size 1 or 2, or why the one above is the only model of size 3. Notice, by the way, that in the last two images, we have an object for which there is no explicit term! We can’t get there just using our constant and our functions. Of course, in this case we can still be clever with our quantifiers to talk about the Nameless One indirectly (for instance, in both of these models we can refer to the object that is not equal to a, f(a), or f(f(a))) But in general, the set of all things in the universe is not going to be the same as the set of things in the universe that we can name.

Here’s a puzzle for you: Can you make a first order sentence that says “I have exactly four objects”? That is, can you craft a sentence that, if added as an axiom to our theory, will rule out all models besides the ones that have exactly four elements?

(…)

(Think about it for a moment before moving on…)

(…)

Here’s how to do it for a universe with two objects (as well as a few other statements of interest). The case of four objects follows pretty quickly from this.

• “I have at most two objects” = xyz (z=x z=y)
• “I have exactly two objects” = xy (x≠y z(z=x z=y))
• “I have at least two objects” = xy (x≠y)

Another puzzle: How would you say “I have an infinity of objects”, ruling out all finite models?

(…)

(Think about it for a moment before moving on…)

(…)

This one is trickier. One way to do it is to introduce an infinite axiom schema: “I have at least n objects” for each n.

This brings up an interesting point: theories with infinite axioms are perfectly permissible for us. This is a choice that we make that we could perfectly well deny, and end up with a different and weaker system of logic. How about infinitely long sentences? Are those allowed? No, not in any of the logics we’re talking about here. A logic in which infinite sentences are allowed is called an infinitary logic, and I don’t know too much about such systems (besides that propositional, first, and second order logics are not infinitary).

Okay… so how about infinitely long derivations? Are those allowed? No, we won’t allow those either. This one is more easy to justify, because if infinite derivations were allowed, then we could prove any statement P simply by the following argument “P, because P, because P, because P, because …”. Each step logically follows from the previous, and in any finite system we’d eventually bottom out and realize that the proof has no basis, but in a naive infinite-proof system, we couldn’t see this.

Alright, one last puzzle. How would you form the sentence “I have a finite number of objects”? I.e., suppose you want to rule out all infinite models but keep the finite ones. How can you do it?

(…)

(Think about it for a moment before moving on…)

(…)

This one is especially tricky, because it turns out to be impossible! (Sorry about that.) You can prove, and we will prove in a little bit, that no first order axiom (or even infinite axiom schema) is in general capable of restricting us to finite models. We have run up against our first interesting expressive limitation of first-order logic!

Okay, let’s now revisit the question of completeness that I brought up earlier. Remember, a logic is complete if for any theory in that logic, the theory’s semantic implications are the same as its syntactic implications (all necessary truths are provable). Do you think that first order logic is complete?

The answer: Yes! Kurt Gödel proved that it is in his 1929 doctoral dissertation. Anything that is true in all models of a first order theory, can be proven from the axioms of that theory (and vice versa). This is a really nice feature to have in a logic. It’s exactly the type of thing that David Hilbert was hoping would be true of mathematics in general. (“We must know. We will know!”) But this hope was dashed by the same Kurt Gödel as above, in his infamous incompleteness theorems.

There will be a lot more to say about that in the future, but I’ll stop this post for now. Next time, we’ll harness the power of first order logic to create a first order model of number theory! This will give us a chance to apply some powerful results in mathematical logic, and to discover surprising truths about the logic’s limitations.

# A Talmudic Probability Puzzle

Today we’ll take a little break from the more intense abstract math stuff I’ve been doing, and do a quick dive into a fun probabilistic puzzle I found on the internet.

Background for the puzzle: In Ancient Israel, there was a court of 23 wise men that tried important cases, known as the Sanhedrin. If you were being tried for a crime by the Sanhedrin and a majority of them found you guilty, you were convicted. But there was an interesting twist on this! According to the Talmud (Tractate Sanhedrin: Folio 17a), if the Sanhedrin unanimously found you guilty, you were to be acquitted.

If the Sanhedrin unanimously find [the accused] guilty, he is acquitted. Why? — Because we have learned by tradition that sentence must be postponed till the morrow in hope of finding new points in favour of the defence. But this cannot be anticipated in this case.

Putting aside the dubious logic of this rule, it gives rise to an interesting probability puzzle with a counterintuitive answer. Imagine that an accused murderer has been brought before the Sanhedrin, and that the evidence is strong enough that no judge has any doubt in their mind about his guilt. Each judge obviously wants for the murderer to be convicted, and would ordinarily vote to convict. But under this Talmudic rule, they need to be worried about the prospect of them all voting guilty and therefore letting him off scot-free!

So: If a probability p can be chosen such that each and every judge votes to convict with probability p, and to acquit with probability 1 – p, which p will give them the highest probability of ultimately convicting the guilty man?

Furthermore, imagine that the number of judges is not 23, but some arbitrarily high number. As the number of judges goes to infinity, what does p converge to?

(…)

(…)

(…)

So, it turns out that the optimal p for 23 judges is actually ≈ 75.3%. And as the number of judges goes to infinity? The optimal value of p converges to…

80%!

This was a big shock to me. I think the natural first thought is that when you have thousands and thousands of judges, you only need a minuscule chance for any one judge to vote ‘acquit’ in order to ensure a majority and prevent him from getting off free. So I initially guessed that p would be something like 99%, and would converge to 100% in the limit of infinite judges.

But this is wrong! And of the small sample of mathematically gifted friends I asked this question to, they mostly guessed the same as me.

There’s clearly a balance going on between the risk of a minority voting to convict and the risk of a unanimous vote to convict. For small p, the first of these is ~1 and the second is ~0, and for p ~ 1, the first is ~0 and the second ~1.  It seems that we are naturally underemphasizing the danger of a minority vote to convict, and overemphasizing the danger of the unanimous vote.

Here are some plots of the various relevant values, for different numbers of judges:

One thing to notice is that as the number of judges gets larger, the graph’s peak becomes more and more of a plateau. And in the limit of infinite judges, you can show that the graph is actually just a simple step function: Pr(conviction) = 0 if p < .5, and 1 if p > .5. This means that while yes, technically, 80% is the optimal value, you can do pretty much equally well by choosing any value of p greater than 50%.

My challenge to you is to come up with some justification for the value 80%. Good luck!