The Weirdest Consequence of the Axiom of Choice

This post is about the most startling puzzle I’ve ever heard. First, two warm-up hat puzzles.

Four prisoners, two black hats, two white hats

Four prisoners are in a line, with the front-most one hidden from the rest behind a wall. Each is wearing a hat, but they can’t see its color. There are two black hats and two white hats, and all four know this.


As soon as any prisoner figures out the color of their own hat, they must announce it. If they’re right, everybody goes free. They cannot talk to each other or look any direction but forwards.

Are they guaranteed freedom?


Yes, they are! If D sees the same color hat on B and C’s heads, then he can conclude his own hat’s color is the other, so everybody goes free. If he sees differently colored hats, then he cannot conclude his hat color. But C knows this, so if D doesn’t announce his hat color, then C knows that his hat color is different from B’s and they all go free. Done!


Ten prisoners, unknown number of black and white hats


Each prisoner is randomly assigned a hat. The number of black and white hats is unknown. Starting from the back, each must guess their hat color. If it matches they’re released, and if not then they’re killed on the spot. They can coordinate beforehand, but cannot exchange information once the process has started.

There is a strategy that gives nine of the prisoners 100% chance of survival, and the other a 50% chance. What is it?


A10 counts the number of white hats in front of him. If it’s odd, he says ‘white’. Otherwise he says ‘black’. This allows each prisoner to learn their own hat color once they hear the prisoner behind them.

— — —

Alright, now that you’re all warmed up, let’s make things a bit harder.

Countably infinite prisoners, unknown number of black and white hats, no hearing

There are a countable infinity of prisoners lined up with randomly assigned hats. They each know their position in line.


The hat-guessing starts from A1 at the back of the line. Same consequences as before: the reward for a right guess is freedom and the punishment for a wrong guess is death.

The prisoners did have a chance to meet and discuss a plan beforehand, but now are all deaf. Not only can they not coordinate once the guessing begins, but they also have no idea what happened behind them in the line.

The puzzle for you is: Can you find a strategy that ensures that only finitely many prisoners are killed?

Oh boy, the prisoners are in a pretty tough spot here. We’ll give them a hand; let’s allow them logical omniscience and the ability to memorize an uncountable infinity of information. Heck, let’s also give them an oracle to compute any function they like.

Give it a try!




Amazingly, the answer is yes. All but a finite number of prisoners can be saved.

Here’s the strategy. First start out by identifying white hats with the number 0, and black hats with the number 1. Now the set of all possible sequences of hats in the line is the set of all binary sequences. We define an equivalence relation on such sequences as follows: 𝑥 ~ 𝑦 if and only if 𝑥 and 𝑦 are identical after a finite number of digits in the sequence. This will partition all possible sequences into equivalence classes.

Hat Puzzle (2)

For example, the equivalence class of 0 will just be the subset of the rationals whose binary expansion ends at some point (i.e. the subset of the rationals that can be written as an integer over a power of 2). Why? Well, if a binary sequence 𝑥 is equivalent to .000000…, then after a finite number of digits of 𝑥, it will have to be all 0s forever. And this means that it can be written as some integer over a power of 2.

When the prisoners meet up beforehand, they use the axiom of choice to select one representative from each equivalence class. (Quick reminder: the axiom of choice says that for any set 𝑥 of nonempty disjoint sets, you can form a new set that shares exactly one element with each of the sets in 𝑥.) Now each prisoner is holding in their head an uncountable set of sequences, each one of which represents an equivalence class.

Hat Puzzle (3)

Once they’re in the room, every prisoner can see all but a finite number of hats, and therefore they know exactly which equivalence class the sequence of hats belongs to. So each prisoner guesses their hat color as if they were in the representative sequence from the appropriate equivalence class. Since the actual sequence and the representative sequence differ in only finitely many places (all at the start), all entries are going to be the same after some finite number of prisoners. So every single prisoner after this first finite batch will be saved!


This result is so ridiculous that it actually makes me crack up thinking about it. There is surely some black magic going on here. Remember, each prisoner can see all the hats in front of them, but they know nothing about the total number of hats of each color, so there is no correlation between the hats they see and the hat on their head. And furthermore, they are deaf! So they can’t learn any new information from what’s going on behind them! They literally have no information about the color of their own hat. So the best that each individual could do must be 50/50. Surely, surely, this means that there will be an infinite number of deaths.

But nope! Not if you accept the axiom of choice! You are guaranteed only a finite number of deaths, just a finite number that can be arbitrarily large. How is this not a contradiction? Well, for it to be a contradiction, there has to be some probability distribution over the possible outcomes which says that Pr(finite deaths) = 0. And it turns out that the set of representative sequences form a non-measurable set (a set which cannot have a well-defined size using the ordinary Lebesgue measure). So no probability can be assigned to it (not zero probability, literally undefined)! Now remember that zero deaths occur exactly when the real sequence is one of the representative sequences. This means that no probability can be assigned to this state of affairs. The same thing applies to the state of affairs in which one prisoner dies, or two prisoners, and so on. You literally cannot define a probability distribution over the number of prisoners to die.

By the way, what happens if you have an uncountable infinity of prisoners? Say we make them infinitely thin and then squeeze them along the real line so as to populate every point. Each prisoner can see all but a finite number of the other prisoner’s hats. Let’s even give them hats that have an uncountable number of different colors. Maybe we pick each hat color by just randomly selecting any frequency of visible light.

Turns out that we can still use the axiom of choice to guarantee the survival of all but finitely many prisoners!

Colbert’s reaction when he first learned about the uncountable version of the hat problem

One last one.

Countably infinite prisoners, unknown number of black and white hats, with hearing

We have a countable infinity of prisoners again, each with either a black or white hat, but this time they can hear the colors called out by previous prisoners. Now how well can they do?

The answer? (Assuming the axiom of choice) Every single prisoner can be guaranteed to survive except for the first one, who survives with 50% probability. I really want this to sink in. When we had ten prisoners with ten hats, they could pull this off by using their knowledge of the total number of black and white hats amongst them. Our prisoners don’t have this anymore! They start off knowing nothing about the number of white and black hats besides what they can see in front of them. And yet they still get out of it with all but one prisoner guaranteed freedom.

How do they do this? They start off same as before, defining the equivalence relation and selecting a representative sequence from each equivalence class. Now they label every single sequence with either a 0 or a 1. A sequence gets a 0 if it differs from the representative sequence in its equivalence class in an even number of places, and otherwise it gets a 1. This labeling has the result that any two sequences that differ by exactly one digit have opposite labels.


Now the first person (A) just says the label of the sequence he sees. For the next person up (B), this is the sequence that starts with their hat. And remember, they know which equivalence class they’re in, since they can see everybody in front of them! So all they need to do is consider what the label of the sequence starting with them would be if they had a white hat on. If it would be different than the label they just heard, then they know that their hat is black. And if it would be the same, then their hat is white!

The person in front of B knows the equivalence class they’re in, and now also knows what color B’s hat was. So they can do the exact same reasoning to figure out their hat color! And so on, saving every last countable prisoner besides A..

Let’s see an example of this, for the sequence 100000…


Hat Puzzle (6)

Hat Puzzle (7)Hat-Puzzle-8.png

And so on forever, until every last prisoner besides A is free.

This set of results is collectively the most compelling argument I know for rejecting AC, and I love it.

26 thoughts on “The Weirdest Consequence of the Axiom of Choice

  1. This isn’t mathematics, this is brainwashing. The results you prove using the axiom of choice on the continuum are not correct, they DO NOT become more acceptable as you stare at them longer, there is NO ANALOGY with finite problems with prisoners and hats. It is simply impossible for all but finitely many prisoners to guess the correct random color on their head.

    This is a theorem, there is no strategy when hats can be picked at random, i.e. in a universe where every set of reals is Lebesgue measurable.

    So stop trying to make this nonsense look correct or intuitive. It just isn’t. The correct intuitive answer for the infinite problem is that there is no strategy. The axiom of choice is simply bad mathematical practice for sets of size the continuum, and that was the point of O’Connor’s original blog post.

    The reason you get the opposite answer for universes with choice is that they resemble Godel’s L universe, they are ordered. So you can’t place random hats on infinitely many heads in universes with choice, it doesn’t make sense. You have to pick one of the restricted un-random real numbers (subsets of Z) in your universe restricted by having uncountable choice

    1. I don’t disagree with any of the content of this comment, but the tone is puzzling. I feel like it’s pretty clear from reading this post that my stance is “Look, if you allow the axiom of choice you get this totally insane and obviously wrong result. This gives us a good reason to reject the axiom of choice.” So I was not “trying to make this nonsense look correct or intuitive”, and certainly not trying to brainwash anybody into ACCEPTING the axiom of choice! I was simply attempting to describe the proof from AC of existence of a strategy in as understandable a way as possible.

  2. I don’t understand the part that says “if you have an uncountable infinity of prisoners? Say we make them infinitely thin and then squeeze them along the real line so as to populate every point. Each prisoner can see all but a finite number of the other prisoner’s hats … we can still use the axiom of choice to guarantee the survival of all but finitely many prisoners!”

    If the prisoners are indexed by, say, the nonnegative real numbers, then doesn’t each prisoner (except for the very first one) fail to see the uncountably infinite number of hats of the prisoners behind him/her? I agree that you can save all of the prisoners except for those that lie in a bounded interval, but how can you save all but a finite number?

    1. Ah sorry, I worded that badly. Should have said “IF each prisoner can see all but a finite number of the other prisoner’s hats”. In fact, for any cardinality of prisoners and any cardinality of colors, if every prisoner sees all but finitely many of the other hats, then there’s a strategy under which all but finitely many prisoners guess correctly (theorem due to Gabay-O’Connor).

      (Another result from Lenstra I just came across: for any cardinality of prisoners and exactly two hat colors, if every prisoner sees all the hats besides their own then there’s a strategy under which either everybody’s guess is correct or everybody’s guess is incorrect)

  3. I don’t see why this is a reason to reject AC. I mean, sure, this is really unintuitive, but why is it surprising that you would get an unintuitive result? We’re talking about agents that can hold a literal uncountable infinity of strategies in their head and choose between them by looking at the entirety of an infinite sequence. There’s no reason for them to behave the same way as finite agents like ourselves.

    Sure, maybe it’s a reason to say that AC and higher infinities aren’t super useful for practical applications, but I already figured that was the case.

    1. That’s a fair point. The epistemology is weird here… I’m not sure what exactly should count as evidence for or against a proposed axiom of set theory. I think that the background model I’m using is something like: ZFC is an attempt to make a consistent formalization of our intuitive notion of “collections”. Given that our intuitive notion of collections has built-in contradictions (for instance, that there’s a collection of all things, and that any group of things that you can describe forms a collection), we know that there will be certain compromises between intuitiveness and consistency, and that we shouldn’t expect formal set theory to be intuitive. With that said, though, ZF and ZFC are equiconsistent, which I think makes intuition more relevant

    2. But on your point about the setup involving beings that are so unlike ourselves that our intuitions about what’s possible aren’t relevant, I can sympathize with that perspective for sure,

      1. Well… you’ve certainly piqued my curiosity. Who do you think I am? I post a few things online in various places but I’m not anybody famous or anything.

        1. Ok maybe I’m just very confused. When you followed me I got an email that included a “Great posts worth seeing from Plain Sight section”, all of which were very familiar to me (ssc). Maybe it’s a WordPress mistake? If not then you certainly qualify as famous in my circles.

          1. You think I’m Scott Alexander? I mean, I’m flattered, but I’m not. I do like his blog though. Maybe there is a glitch, I’m still figuring out how this site works. I see links to his posts in my “posts” section for some reason. Which is strange since I don’t even follow ssc

            This site is so confusing to me lmao. In summary I am a fan of his but I am certainly not Scott.

  4. If they are lined up wouldn’t that make them countably infinite? Like the first person you ask is 1 and the second person you ask is 2?

    1. Yes, there are countably infinitely many prisoners. On the other hand, there are uncountably many ways of assigning hats to prisoners (each assignment is a map from a countably infinite set – the prisoners – to a set of size 2 – the distinct hat colors, and there are continuum many such maps).

  5. Am I the only person who realises that the weirdness of the infinite hat problem is nothing to do with the Axiom of Choice, but the weirdness that infinite sets have non-empty null subsets ?

    If there were 27,000 prisoners instead of an infinite number, then you could not set up the equivalence classes in the first place, never mind choose a representative from each. The relationship, say, “differs in less than 27 places” between two guesses (two sequences of 27,000 ‘0’s or ‘1’s) is not transitive, and thus cannot be an equivalence relationship. On the other hand, “differs in only a finite number of places” is a transitive relationship with regard to an infinite sequence of ‘0’s or ‘1’s, and thus can build equivalence classes.

    Looking at the wrong, or differing, guesses as subsets of the sequences, this is the same as the fact that the only null subsets of a finite set are the empty set, whereas infinite sets have non-empty null subsets. You can extend the concept of “null-measure sets” to null subsets in general. A null subset is a subset of a parent set that is not only insignificantly small compared to the parent set, but also any union of two such sets is also insignificantly small and also a null subset. Thus, by daisy-chaining, any finite grand union of null subsets must also be a null subset.

    Actually humans are used to the concept of non-empty null subsets, but only in the context of mass-objects (that you weigh or measure, like milk, string, fields, water-tanks). We understand that a line has no area, and an area or surface has no volume. But non-empty null subsets of a set of count-objects (ones we count like pints of milk, door-handles, or people) are “weird” and counter-intuitive.

    1. I think I disagree with this framing of the problem. The particular subset we’re talking about isn’t zero-measure, it’s non-measurable (you can look into Vitali sets for more info). And the existence of non-measurable subsets of R IS in fact tied to the axiom of choice. Of course the full force of the axiom of choice isn’t needed, but nonetheless choice-like constructions are playing an important role here.

  6. The Vitali set, as I understand it, is an uncountable number of points that nowhere make a line of any length, and is unmeasurable. But I always thought a countable number of points have zero length. Remove them from a line and length of the grand union of what is left is the same as before. After all, the sequence 0, 0+0, 0+0+0, etc. is convergent. Here the number of wrong guesses, or differences between two sets of guesses, if finite, never mind countable, and thus is zero-measure.

    1. Yes, countable subsets of R always have zero measure. The set which is non-measurable here is the set of representative sequences we used, which is uncountable.

  7. Hi,

    It’s a nice challenge, but I don’t believe it works as an argument against the AC, for the following reasons.

    1. [This objection was raised by someone else, so any mistakes in the presentation are mine; but the objection seems decisive to me, barring perhaps a significant modification of the scenario] There seems to be a gap in the proof: The AC says there is a set with one representative of each kind. But the set is not unique, and there is no stipulation that the prisoners have the power to agree on a choice set. For all we know, each of them might pick a different one. If you give them the power to pick one, I’d say the weird conclusion is the result of their huge powers, not the AC. On that note…

    2. [This objection is mine] If we give them the power to agree on things after considering infinitely many choice sets (or some relevantly similar power), it seems to me a solution is possible without assuming AC, even if it requires “cheating” to introduce AC (but it’s a kind of cheating that seems allowed by the logical omniscience and other relevant powers given to the prisoners). Here’s how.

    We consider, for example, the countably infinite case, no hearing. So, there are denumerably many prisoners P1, P2, etc., lined up with randomly assigned hats. They have all of the powers listed in the scenario plus one that makes it work with AC as described above. But there is a twist: the prisoners accept ZF+(¬C), because that is very intuitive to them. So, what can they do? P1 proposes a plan:

    P1: Okay, we have no AC. But we still have logical omniscience. We can, for example, pick an ordinal O1 that cannot be put into one-to-one correspondence with the powerset of the reals, or whatever. Those things exist, because of ZF.
    P2: Sure, but how does that help?
    P1: The ordinal? It was just as example, to get started.
    P2: Alright, so get to the point. How do we solve the problem?
    P1: We construct a suitable scenario. You see, by logical omniscience, we know all of the logical consequences of any statements we come up with. So, we can construct a scenario – whether concrete, involving people, etc., or abstract, involving whatever abstract objects we invent -, and we will know everything that follows from our initial conditions/axioms, including everything that exists in the scenario because its existence follows from those axioms.
    P3: Sure, but how would that help?
    P1: Let us create our abstract scenario. It has something we call the ’empty set’ and we denote 0*. It is a scenario in which all of the ZFC conditions hold. We impose no further conditions, so it is consistent, which we know by logical omniscience.
    P3: Sure, it’s logically consistent. But so what?
    P1: So, we also have 1*={0*}, 2*={0*,1*}, etc.; all of those abstract entities exist in our scenario. I know, those aren’t the ‘true’ natural numbers, but some analogue in a scenario we come up with. It matters not. There are sequences of zeroes* and ones*, equivalence classes*, etc. in our scenario, and a choice set* that contains exactly one representative for each class*. So, let us pick one such choice set* C*, agree on it, identify our numbers on the line with the numbers* in our scenario, and proceed as in the plan that would use the AC.
    P4: That’s cheating! We cannot choose a choice set, or a choice set* (or choice function*, or however you like to put it), because it does not actually exist. You’re writing fiction. Fictional “existence” is not existence. A fictional monster is not something we should be afraid of!
    P1: Let’s take a closer look. We have two consistent abstract scenarios: one is the true mathematics, with ZF¬C, and the other one is our hypothetical scenario, with ZFC. In which sense of ‘exists’ does the ordinal O1 above exists but our choice set C* does not? I can tell the true mathematics scenario is an abstract scenario that is more intuitive to us. Some other beings – say, aliens with different dispositions – might find the one with ZFC more intuitive, and then they might do mathematics* instead of mathematics. But both scenarios are logically possible; in one of them, it follows from the axioms that there exists O1, but not a choice set; in the other one, it follows that there exists O1* (an analogue of O1 in our scenario), and also C*. But why would we be able to work only in the scenario that is more intuitive to us? We have logical omniscience. Both are logically possible. Both are abstract. We can pick O1; why can’t we pick C* in our scenario, and use it as well?
    P5: But that would not be true! It would be fiction, whereas ¬AC is true!
    P1: What does that mean? True on which domain of discourse? It seems to me ¬AC is true because in the intuitive abstract scenarios we call ‘math’, there are sets of disjoint nonempty subsets for which there is no choice set, or function, etc. In other words, it is true in those intuitive scenarios. But there is no logical obstacle to constructing an abstract scenario that is less intuitive to beings with dispositions such as ours, and apply our logical omniscience in that scenario. AC does not need to be true in the powerset of the set of the natural numbers or the real numbers, or any familiar setting. It suffices for us to have a logically consistent domain in which it holds and in which we have analogues to the natural numbers, the sequences of zeroes and ones, etc., that are close enough to have good analogues of the relevant properties.
    P6: No, I disagree: you see, we can only find sets that actually exist, in the true Platonic realm. Logical omniscience only tell us that there it is logically consistent that ZFC holds in some scenario, but we cannot really pick the choice set* C*, because it is not actual. On the other hand, O1 is actual; it dwelss in the Platonic realm.
    P1: Well, then, I don’t think Platonism is true, so let’s try me plan. Since we already know we can pick O1 and similar things, provided that Platonism is false, we should be able to pick C*.

    (Side note: There is an objection to the above dialogue: arguably, the concept of logical omniscience is such that logically omniscient beings would know beforehand whether they can actually pick the set*, so there would be no debate between them. I’d reply that my argument is independent of the debate, but it’s more fun 😉 with it. )

    So, would P1’s plan fail?

    I haven’t been able to find a good reason why it would (but that does not look weird to me, given the powers of the prisoners – those, on the other hand, look weird to me). One can come up with ad-hoc modifications of the powers, and/or embrace Platonism, but other than that, how would that fail?

    1. I did say that they had the ability to memorize uncountable amounts of information, but perhaps also need to stipulate the ability to *communicate* uncountable amounts of information, so that all prisoners could get on board with the same choice set. Your point here seems to just be that you don’t need the full axiom of choice, only a much weaker variant of it. We just need to find some way to well-order these sequences of naturals and then we’re fine, we don’t need the ability to well-order sets of arbitrary cardinality. This in fact strengthens the argument: all that’s required to generate the paradox is a weak variant of choice, and so certainly full choice is going to be too strong if even this weak variant generates problems. Perhaps I’ve misunderstood what role the ordinal O1 is playing in your setup, though.

      1. I did say that they had the ability to memorize uncountable amounts of information, but perhaps also need to stipulate the ability to *communicate* uncountable amounts of information, so that all prisoners could get on board with the same choice set.

        Yeah, you said that, but I think memorizing is not the issue. They need to agree on the same choice set. That requires not only to communicate uncountable amounts of informacion, but to in a sense go back and forth and settle on a choice set, with a kind of ‘back and forth’ involving uncountably many steps. These entities are very much unlike humans, so I am not sure there is any intuition that suggests entities with such vast powers being able to avoid more than finitely many fatalities is weird.

        That said, my other point is meant to deal with the addition of these sort of power: i.e., let us assume these entities do have the power described above.

        Your point here seems to just be that you don’t need the full axiom of choice, only a much weaker variant of it.

        I may not have been clear; O1 does not play a significant role. It was just an example to make the dialog more like a conversation.

        But what I’m arguing is that once these entities are given the relevant amount of power, they can avoid more than finitely many fatalities even under the assumption that AC is false. For that reason, I do not believe the argument works against AC – i.e., it goes through with or without AC, or even with its negation.

        The reason is that even if AC is false (in whatever sense) in the ‘real’ realm of math (whatever that might be), there is a logically consistent scenario (even if not ‘true’ in some sense) that is pretty much the same as what class of hereditary sets would be if ZFC were true (I’m assuming consistency of ZF). So, perhaps the ‘real’ class of hereditary sets is not like that and ZF¬C holds in the ‘real’ mathematical world, but that does not matter, because the entities have logical – not just mathematical – omniscience, and ZFC holds in a relevantly similar logically possible (even if, say, not mathematically possible) realm.

        What is the difference between logical and mathematical omniscience (or possibility)? I don’t know, because I do not know how math is being distinguished from logic so that there may be a logically consistent set of axioms (ZFC) which is not mathematically true. But however this distinction is made, it seems it is implicit in any argument against AC that there is such a distinction, else it’s hard to grok a question of whether AC is true or false.
        At any rate, my argument does not require that we settle the above question, only that the beings in question have logical omniscience and ZFC be consistent.

        Granted, one might set their powers to mathematical omniscience but not logical omniscience (whatever the correct distinction might be), and that would block my objection, but I think only at the cost of making it clear that what is doing all the philosophical work here is the power given to these entities, not the axiom of choice.

Leave a Reply