How hard is classification? Equivalence relations and Borel reductions

What is classification?

Classification is one of the most basic human activities. We wake up to a world of vibrant experience and immediately begin structuring it, organizing it into objects and actions, people and animals, edible and non-edible, friend and foe, and so on. Eventually our system of classifications becomes immense and interconnected, partitioning up the world of blooming buzzing confusion into a million tiny but intelligible pieces. 

In the real world, classification is often vague. In math, it can be made a bit more precise through the notion of an equivalence relation. An equivalence relation can be thought of in two ways. First, concretely, it’s a way to “carve up” a set, partitioning it into disjoint pieces called equivalence classes. Every element x of the original set appears in exactly one equivalence class, which is referred to as [x]. 

More abstractly, an equivalence relation on a set X is a binary relation E on X satisfying three axioms:

  1. Reflexivity: (∀x ∈ X) (x E x)
  2. Symmetry: (∀x,y ∈ X) (x E y ⇒ y E x)
  3. Transitivity: (∀x,y,z ∈ X) (x E y E z ⇒  x E z)

One can prove that any binary relation satisfying these three axioms yields a carving-up of X in the sense described above.

One thing that classification systems allow you to do is to coarse-grain the world, forgetting about the finer details and remembering only higher-order properties. Rather than think about my dog in particular, I can think about the class of all dogs, and treat this class as an object in its own right. Mathematically, this is called quotienting

Quotienting is a very common mathematical move that appears wherever there are equivalence relations around. If E is an equivalence relation on a set X, then the quotient of X by E (written X/E) is just the set of all the equivalence classes. 

X/E = { [x] | x ∈ X }

Quotienting can change structures in interesting and complicated ways. One of the most common examples is ℤn (the integers mod n), which we get by quotienting the integers ℤ by the “differs-by-n” equivalence relation:

x ~ y ⇔ |x – y| = n

For instance, in ℤ5, the number 2 is identified with the equivalence class [2] = {…, -8, -3, 2, 7, 12, …}.

Equally common, the real numbers are often defined by quotienting. You start with the set of all Cauchy sequences of rational numbers, and consider the equivalence relation of “converging to one another” between sequences:

x ~ y ⇔ limn→∞ |xn – yn| = 0

A real number is defined to be an equivalence class of such sequences. This is a good example of coarse-graining in practice. You never really think of real numbers as sets of Cauchy sequences of rationals (outside of an analysis course). Once you quotient out by this equivalence relation, you forget about this “internal structure” and treat each real number as a primitive object. Similarly, in ℤ5 you think of 2 as a primitive element, not an infinite set of integers.

Some classifications are harder than others

Let’s quickly recap the discussion in the last post.

We began with an infinite group of prisoners whose freedom rested on their ability to pick representatives from a particular equivalence relation on Cantor space, 2. Cantor space can be thought of in many ways. In our case, it’s the space of all ways of assigning black and white hats to the lineup. It’s also the space of all infinite binary sequences, or equivalently all functions from ℕ to {0,1}. And it can be visualized as the infinite paths through the complete infinite binary tree.

The equivalence relation the prisoners found themselves stuck with was the “eventually agrees” relation, E0, defined by:

x E0 y ⇔ (∃n ∈ ℕ) (∀m > n) (xm = ym)

For instance, here’s what the equivalence class of the all-zeros sequence 000… looks like:

The prisoners had to find some way of agreeing on a choice of representative from each equivalence class. There’s actually a few different ways to formalize this idea: transversals, selectors, and reductions. 

A transversal of an equivalence relation E on X is a subset A ⊆ X which intersects each E-class exactly once.

(∀C ∈ X/E) (|A ∩ C| = 1)

A selector for an equivalence relation E on X is a function f: X → X which takes the elements of an equivalence class C to the representative element for C.

(∀x ∈ X) (f(x) ∈ [x]E) and (∀x,y ∈ X) (xEy ⇒ f(x) = f(y))

And finally (and most importantly) there’s the idea of a reduction. This idea is significantly more general than the previous two, and will play a big role in the upcoming posts. First, the formal definition:

Given two sets X,Y and equivalence relations E (on X) and F (on Y), 
a reduction of E to F is a function f: X → Y such that
(∀x, x’ ∈ X) (x E x’ ⟺ f(x) F f(x’))

If such a function exists, we say that E is reducible to F and write E ≤ F.

Informally, reducibility measures the relative complexity of equivalence relations. If E ≤ F, then E is “simpler” or “easier. to compute” than F. For instance, if we want to check if two elements x and x’ are E-related, we can instead check if f(x) and f(x’) are F-related. Thus if we had an oracle for F then we could figure out E (using f). 

A special case of reduction is where F is just the identity relation =Y on Y, in which case we have:

xEx’ if and only if f(x) = f(x’)

Now, if we’re allowed to use any function f whatsoever, then this notion of reducibility ends up not being not so interesting. For instance, we can reduce any equivalence relation to equality by choosing Y = X/E and defining f(x) = [x]E. More generally, reducibility with arbitrary functions turns out to just be a matter of comparing the cardinalities of the quotients. Thus we shift our focus from arbitrary functions to definable functions, in the sense of Borel. 

In the last post we talked about Borel subsets of a space, not functions. But a function can be identified with its graph and treated as a subset of X × Y. So f: X → Y is Borel if and only if it is Borel as a subset of X × Y. Borel relations are defined similarly.

(Reminder: the Borel sets in a topological space X are just the sets you can construct out of open sets through countable unions, intersections, and complements. Equivalently, they’re the sets definable in countable propositional logic, with atomic propositions interpreted as defining the basic open sets.)

(Notice that we’re taking advantage of the relationship between definability and topology: if X and Y are both topological spaces, then the product X × Y already has a canonical “product topology”, generated by products of open sets in X and Y. So once we know how to interpret the atomic propositions in X and in Y, we can automatically interpret atomic propositions in X × Y.)

We’ve finally arrived at the central concept: Borel reducibility or definable reducibility.

Given two topological spaces X,Y and equivalence relations E (on X) and F (on Y), 
a Borel reduction of E to F is a Borel function f: X → Y such that
(∀x, x’ ∈ X) (x E x’ ⟺ f(x) F f(x’))

If such a function exists, we say that E is Borel reducible to F and write E ≤B F.

Classifying classifications

Let me now get to the punchline.

We carve up the mathematical universe by defining equivalence relations on the sets we’re interested in. When these sets are topological spaces, we can compare these equivalence relations through the relationship of Borel reducibility. At the end of the last post, I told you that there was no Borel transversal of E0. By the same token, there is no Borel reduction of E0 to the identity relation on Cantor space. “Eventual equality” is a strictly more complicated notion than “equality”. 

This might not sound very surprising. Of course eventual equality is more complicated than equality, it has an extra word in its name! But it turns out that lots of complicated-looking equivalence relations are Borel reducible to the equality relation. Such equivalence relations are called smooth or concretely classifiable. For example, the relationship of “similarity” between square matrices (intuitively, two matrices are similar if they represent the same linear transformation but in different bases) turns out to be smooth.

E is smooth if and only if there’s a Borel reduction E ≤B =

(Notice that I defined it here in terms of identity on ℝ rather than 2. Not all identity relations are of equal complexity, but these two are. We’ll see that for many purposes ℝ and 2 are interchangeable.)

The smooth equivalence relations are the simplest ones out there. If E is smooth, then there’s some definable way to assign a different real number to each class. We can begin to draw a picture of the Borel reducibility hierarchy:

Natural questions immediately arise. Are there any equivalence relations strictly between smooth and E0? (No.) Are there equivalence relations above E0? (Yes, many.) Is there a most complex equivalence relation? (No, for any equivalence relation there’s a strictly harder one.) Are there equivalence relations of incomparable complexity? (Yes, in fact there’s uncountably many such equivalence relations!)

The Borel reducibility hierarchy for equivalence relations is a relatively recent discovery in the history of mathematics. It’s only about twenty years old. As such, there are many open questions about its structure. For instance, at the time of writing it’s unknown whether there’s exactly one class directly above E0. There could be multiple incomparable classes directly above E0, or it could be that for any equivalence relation E above E0, there’s another one strictly in between E0 and E.

The big balloon represents the unknown territory waiting to be explored. But one thing that is clear at this point is that the internal structure of this balloon is very rich. In upcoming posts I hope to describe some of what we do know about it, and describe some recent attempts to probe its structure using techniques in model theory and infinitary first-order logic.

Choosing things is hard: infinite hats, definability, and topology

A hat puzzle

Hat-Puzzle-1-3964260265-1571331739116.png

Infinitely many prisoners are assembled in a line as pictured. Each knows their place in the line. Each wears either a black or white hat, and each can only see the hats in front of them. Starting from the back of the line, each prisoner has to guess the color of their own hat. The prisoners were allowed to coordinate before the hats were assigned, but now no communication is allowed. Even the guesses must be silently submitted.

If only finitely many prisoners guess wrong, then everybody goes free. Can they succeed?

(Pause for thought.)

Amazingly, yes! Here’s the strategy: 

Label white hats as 1 and black as 0. Then an assignment of hats becomes an infinite binary sequence, i.e. an element of 2. Define an equivalence relation called E0 on 2 as follows: 

x E0 y if and only if (∃n ∈ ℕ) (∀m > n) (xm = ym)
“x and y eventually agree”

When the prisoners meet up beforehand, they coordinate by agreeing on a choice of one representative from each class.

Once they’re in the room, every prisoner can see all but a finite number of hats. So they all know exactly which equivalence class they’re in. Now each prisoner guesses as if they were in the representative sequence from this class. Since the actual sequence and the representative sequence eventually agree, the prisoners’ guesses eventually agree with reality, and so they go free!

Making choices is hard

I talked about this puzzle a few years ago in this post. Several commenters balked at the solution and said something like: “but there are uncountably many equivalence classes, so therefore the prisoners need to be able to coordinate on uncountably many representatives. Surely this is unreasonable!”

I think that uncountably many representatives is not what’s at issue here. Consider the equivalence relation on the reals defined by:

x ~ y if and only if x – y ∈ ℤ.

Here we also have uncountably many equivalence classes, but the prisoners could easily come to an agreement on which representative to pick. They could for instance agree to choose the unique representative which lies in the interval [0,1). Here the prisoners are able to coordinate on uncountably many representatives, simply by agreeing on a function (f(x) = x mod 1) which takes each real to the representative for its class. A function f like this is called a reduction of ~ to =, as it converts the problem of deciding x ~ y into the problem of deciding if two real numbers are equal (in particular, f(x) = f(y)).

Now, is there a function f from 2 to 2 that takes an infinite binary sequence to the representative sequence for its class? That is, is there a reduction of E0 to the identity relation on 2? Sure! Each E0-class C is non-empty, so we can “make a choice” of any element γC ∈ C. Then set f(x) = γ[x], where [x] denotes the equivalence class of x.

I highlighted the key phrase in the above definition: make a choice. I said that we could choose an element from each class, but didn’t tell you how. And this is a problem for the prisoners! For them to all agree on the function’s values, they must be able to communicate how this choice is made to each other.

In the case of the equivalence relation ~, we were able to find a precise recipe for choosing representatives, namely the definition of the function (x ↦ x mod 1). But can the prisoners find a precise recipe for choosing representatives for the E0-classes?

That is, is there a definable function that reduces E0 to =? Well, what exactly is definability?

What is definability?

A major theme of descriptive set theory is the following identification, which I’d like to try to motivate:

DEFINABLE = BOREL

A definition is a syntactic thing. It’s a sentence with a free variable, like “x is the 15th digit in the decimal expansion of π” or “f is the identity function on ℝ”. To precisely state what definability is, we must specify a formal language to work in. The simplest logical language is that given by propositional logic. Here we begin with an alphabet, a countable set of basic atomic propositions, and build all other sentences through finite conjunctions, disjunctions, and negation. 

Returning to our puzzle, we were interested in describing 2 and its subsets. We want our atomic propositions to represent easily definable subsets of 2, or equivalently, simple properties of infinite binary sequences. A natural choice for these basic atomic properties is Pnm = “x’s nth bit is m”, interpreted as defining the set {x ∈ 2 | x’s nth bit is m}. Translated back to prisoners and hats, these are sentences like “Prisoner 35 is wearing a black hat” or “Prisoner 15 is wearing a white hat”. Intuitively, everything that can be said about infinite binary sequences, should be in principle expressible just in terms of sentences like these. 

With finite conjunctions, disjunctions, and negation, we can define sets like {x ∈ 2 | x starts with 010110} and {x ∈ 2 | x’s first two bits agree}. Identifying 0 with “left” and 1 with “right”, we can draw these sets as subsets of the infinite binary tree:

x starts with 010110
x’s first two bits agree

How about a set like {x ∈ 2 | x contains at least one 1}?

x contains at least one 1

What we want is a proposition like “x’s first bit is 0 or x’s second bit is 0 or …”, i.e. 

(P00 ∨ P10 ∨ P20 ∨ …), or \/n∈ℕ Pn,0

What we need is the ability to take countably infinite conjunctions and disjunctions. Ordinary propositional logic doesn’t allow this. So we graduate to countable propositional logic. In other words, we expand the syntax by closing it under countable conjunctions and disjunctions:

For any countable collection of sentences {φn | n ∈ ℕ},
/\n φn and \/n φn are also sentences

On the semantic side, our collection of definable sets is now closed under countable unions, intersections, and negations. For the measure theorist, this is a familiar object: we’ve just defined a sigma-algebra!

(Technical note: when we say “closed under countable unions and intersections”, we also include empty unions and intersections, which correspond to ∅ and X, respectively. For notational convenience, we introduce the symbols ⊥ and ⊤ into our syntax, thought of as the atomic propositions “False” and “True”.) 

In general there are many different sigma algebras you can put on a set, corresponding to different choices of the atomic propositions. But when our set is also a topological space, as in ℝ and 2, there’s a natural choice of sigma-algebra, called the Borel sigma-algebra. Here we take our atomic propositions to define the basic (or sub-basic) open sets. Then the Borel sets are all the sets constructible through countable unions and intersections from basic opens, or equivalently,  the sets definable in countable propositional logic.

In 2 the topology is generated by sets of the form

{x ∈ 2 | x’s nth bit is m} for any n, m ∈ ℕ,

which are the same as our earlier Pmn.

How about in ℝ? Here the topology is generated by basic sets of the form 

(a,b) = {x ∈ ℝ | a < x < b}  for any a, b ∈ ℚ

So we choose our atomic propositions accordingly: for any two rationals a,b, we have an atomic proposition Pab, which we interpret as “a < x < b”.

Let’s pause to recall how we got here. We began by trying to define “definability”, and have found that there’s a natural way to interpret countable propositional logic through Borel sigma algebras on topological spaces. We have an atomic proposition for each (sub-)basic open set, and every set is defined by some countable propositional sentence. As we vary our interpretation of the atomic propositions, we move between different topological spaces. 

The question “can the prisoners coordinate on a strategy?” has now taken on a definite form: “is there a Borel subset of 2 that picks exactly one element from each equivalence class?” And it turns out that the answer is no! For the prisoners to coordinate on a choice function, they need more syntactic resources at hand than countable propositional logic.

(To be continued…)

The Hypergame Paradox

Credit to Joel David Hamkins, who I heard discussing this paradox on an episode of the podcast My Favorite Theorem.

Define a finite game to be any two-player turn-based game such that every possible playthrough ends after finitely many turns. For example, tic-tac-toe is a finite game because every game ends in at most nine turns.. So is chess with the 50-move-rule enforced (if 50 moves are taken without any pawn advances or captures, then the game ends in a draw). In both these examples, there’s an upper bound to how long the game can last, but this is not required. A game of transfinite Nim would count as finite; every game lasts for only finitely many turns, even though there is no upper bound on the number of turns it takes.

Now, consider the game Hypergame. To play Hypergame, Player 1 begins by choosing any finite game G. Then Player 2 plays the first move of G, Player 1 plays the second move of G, and so on until the game is completed. (Since Player 1 chose a finite game, this will always happen after some finite amount of time.)

Is Hypergame a finite game? Yes, we can easily see that it must be. Whatever game Player 1 chooses will be over after n steps, for some finite n. So that playthrough of Hypergame will have taken n+1 steps.

But if Hypergame is a finite game, then it is a valid choice for the first move of Hypergame! So we can now imagine the following playthrough of Hypergame:

A Troubling Playthrough of Hypergame
Player 1: For the finite game that we shall play, I pick Hypergame.
Player 2: Hm, okay. So now I’m playing the first move of Hypergame. So I must now choose any finite game. I’ll choose Hypergame!
Player 1: Alright so I’m again playing the first move of this new game of Hypergame. I’ll choose Hypergame again.
Player 2: And I choose Hypergame again as well.
So on forever…

At no point does either player violate the rules of Hypergame. And yet, we ended up with an infinite playthrough of Hypergame, which we proved was impossible! So we have a contradiction. What is the resolution?

✵✵✵

Here’s one possible resolution, analogous to the resolutions of similar set-theoretic paradoxes.

We can think about a game as a directed rooted tree. The vertices of the tree correspond to game states, and the edges correspond to the allowed moves. The root of the tree corresponds to the starting game state, and Player 1 gets to choose which edge to travel along first. From the new vertex, Player 2 decides the next edge to travel along. And so on. The tree’s leaves correspond to ending states of the game, and each leaf is labelled according to which player won in that ending state.

In this framework, what is a finite game? As I defined it above, a finite game is just any directed rooted tree such that every path starting at the root ends at a leaf after passing through finitely many edges. This corresponds perfectly to the idea that every possible playthrough of the game takes only finitely many turns. Notice that a finite game is not necessarily a finite tree! The game tree of a finite game is only finite if it’s also finitely branching. In other words, for a game to have a finite tree requires not just that every playthrough is finitely long, but that each player always has only finitely many choices on their turn.

For instance, the game tree of Hypergame is not a finite tree, because Player 1 has infinitely many possible finite games to choose from on his first turn. How big exactly is the game tree of Hypergame? We know that we have to have a vertex corresponding to the start of any finite game, so it must be at least as large as the set of all finite games. But how large is this set?

This is where we run into problems. The game tree of a finite game can be arbitrarily large. Consider the game which starts by Player 1 choosing any real number and then immediately losing. The height of the game tree is 1, but its width is the cardinality of the continuum. Similarly for any set X we can find a game whose tree has cardinality |X|. This means that there are finite games of arbitrary cardinalities. But then as a corollary to the nonexistence of a largest cardinality, we know that there is no set of all finite games! And this implies that Hypergame has no game tree! More precisely, there is no set corresponding to the game tree of Hypergame as we defined it.

Couldn’t we instead think about Hypergame as a proper class? Sure! But then when we choose a finite game in our first move, we couldn’t be picking Hypergame, as the property “is a finite game” would only apply to sets and not proper classes. This means that we can’t actually select Hypergame as our first move! And so we avoid the paradoxical conclusion that we can keep picking Hypergame ad infinitum.

I find it quite fascinating that the seemingly innocent notion of a finite game can lead us into paradoxes involving proper classes!

A Dating Puzzle, and Bayesian Experimental Design

A puzzle for you!

A group of six men (A, B, C, D, E, and F) and six women (a, b, c, d, e, and f) are chosen so that everybody has a perfect match of the opposite gender. They are given the following clues:

  1. E and e are a match.
  2. A and c are a match.
  3. The pairing (A,b), (B,a), (C,c), (D,d), (E,e), (F,f) contains exactly 2 matches.
  4. The pairing (A,c), (B,d), (C,a), (D,b), (E,e), (F,f) contains exactly 2 matches.
  5. The pairing (A,c), (B,f), (C,b), (D,d), (E,e), (F,a) contains exactly 3 matches.

If they can figure out everybody’s perfect matches, the group will get a million dollars to share! Can they figure it out?

If you figure that one out, then here’s the next level, with ten pairs and twelve clues!

A group of ten men (ABCDEFGHIJ) and ten women (abcdefghij) are chosen so that everybody has a perfect match of the opposite gender. They are given the following twelve clues:

  1. F and f are not a match
  2. J and h are not a match
  3. B and e are not a match
  4. D and d are a match
  5. H and c are a match
  6. The pairing (Ai Bb Ca Dd Ee Fc Gj Hg If Jh) contains exactly 2 matches.
  7. The pairing (Af Be Cg Dj Eh Fd Gi Hc Ia Jb) contains exactly 4 matches.
  8. The pairing (Af Be Ca Dd Ej Fh Gi Hb Ig Jc) contains exactly 2 matches.
  9. The pairing (Aa Bc Ci Dd Eg Fj Gf Hb Ih Je) contains exactly 2 matches.
  10. The pairing (Af Bi Ce Dd Eh Fg Gj Hc Ia Jb) contains exactly 5 matches.
  11. The pairing (Af Ba Cb Dd Eh Fi Gj Hc Ie Jg) contains exactly 5 matches.
  12. The pairing (Af Bi Ch Dd Eb Fg Gj Hc Ia Je) contains exactly 7 matches.

Can you help them get their million dollars?

✯✯✯

Some background on the puzzle:

I didn’t actually come up with it out of nowhere, I encountered in the wild! A few days ago I started watching a new Netflix reality show called “Are You The One?” The premise of the show: ten men and ten women are paired up via a matchmaking algorithm, and if they can all figure out their other halves after living together for a month, then they win one million dollars. It’s an extremely corny show, but there was one aspect of it which I found pretty interesting: it’s a perfect setting for Bayesian experimental design! Me being me, I spent the whole show thinking about the math of how contestants get information about their potential “perfect matches”.

Let’s start with a very basic look at the underlying math. Ten pairs, chosen from two groups of ten, gives ten factorial possible matchings. 10! is 3,628,800, which is a lot of possibilities. If you were to randomly choose the ten matches, with no background information, you’d have a .000028% chance of getting everything right. This is about as likely as you are to correctly guess the outcome of 22 tosses of an unbiased coin in a row!

Of course, the show isn’t so cruel as to force them to just guess with no information. For one thing, they have a whole month to get to know each other and gather lots of that subtle tricky-to-quantify social evidence. But more importantly, every episode they get two crucial pieces of information:

First, they get to choose one pair (a man and a woman) to go into the “Truth Booth”. The group is then informed whether these two are a genuine match or not.

And second, at the end of each episode the entire group gathers together, everybody pairs off with one another, and they are told how many pairs they got right (though not which ones). On the final episode, the matching they choose determines whether they get the million dollars or not.

I call the first type of test the individual pair test and the second the group test. And this is where the two types of clues in the above puzzles come from! The clues for the second puzzle are actually just a translation of the tests that the group decided to do in the first seven episodes. (So if you successfully solved it, then feel good about yourself for being the type of person that would have won the million.) Interestingly, it turns out that by the seventh episode they could had already figured out everybody’s perfect match, but it took them three more episodes to get to that point! Silly non-logically-omniscient humans.

Putting aside the puzzle we started with, the show setup naturally lends itself to some interesting questions. First of all, is there a strategy that guarantees that the group will get the million dollars by the tenth episode? In other words, is ten pair tests and nine group tests sufficient to go from 3,628,800 possibilities to 1?

I am not yet sure of the answer. The fact that the group in season one managed to narrow it down to a single possible world with only seven episodes seems like evidence that yes, those 19 tests do provide enough evidence. In addition, there have been eight seasons and in only one did the group fail to find the correct matches. (And in the season whose group failed, they started with eleven pairs – multiplying the number of possible worlds by 11 – and used two fewer pair tests than previous seasons.)

However, it’s worth keeping in mind that while the group of twenty individuals was far from logically omniscient, they did have a great wealth of additional social evidence, and that evidence may have allowed them to make choices for their tests that yielded much more expected information than the optimal test in the absence of social information. (I’m also suspicious that the producers might be a little more involved in the process than they seem. There are a few seasons where the group is consistently getting 2 or 3 matches until the last episode where they suddenly get everything right. This happens in season 3, and by my calculations there were still four possible worlds consistent with all their information by the time of the last matching ceremony!)

We can also ask the Bayesian experimental design question. What’s the optimal pair test/group test to perform, given the set of remaining possible worlds?

We can solve this fairly easily using a brute force approach. Consider the problem of finding the optimal pair test. First we actually generate the set of all possible worlds (where each world is a specification of all ten matches). Let N be the size of this set. Then we look through all possible pairs of individuals (i, j), and for each pair calculate the number of worlds in which this pair is a match. Call this quantity nij. Then the expected number of worlds remaining after performing a pair test on (i, j) is:

nij Pr(i and j are a match) + (N – nij) Pr(i and j are not a match) = nij2/N+ (N – nij)2/N

So we simply search for the pair (i, j) that minimizes nij2 + (N – nij)2. This is equivalent to maximizing nij (N – nij): the product of the number of worlds where (i, j) is a match and the number of worlds where (i, j) is not a match.

We do pretty much the same thing for the group test. But here’s where we run into a bit of trouble: though our algorithm is guaranteed to return an optimal test, the brute force approach has to search through 10!10! possibilities, and this is just not feasible. The time taken to solve the problem grows exponentially in the number of worlds, which grows exponentially in the number of individuals.

So we have another interesting question on our hands: Is there an efficient algorithm for calculating the optimal pair/group test? In the more general setting, where the tests are not restricted to just being pair tests or group tests, but can be a test of ANY boolean expression, this is the question of whether SAT can be solved efficiently. And given that SAT is known to be NP-complete (it was in fact the first problem to be proven NP-complete!), this more general question ends up being equivalent to whether P = NP!

What’s the probability that an election winner leads throughout the entire vote?

Try to answer the question in the title for yourself. An election has n votes for candidate A, m votes for candidate B, and no third candidate. Candidate A wins, so n > m. Assuming that the votes were equally likely to be counted in any order, what’s the probability that candidate A was always ahead? (Note that “always ahead” precludes the vote being tied at any point.)

My friend showed me this puzzle a couple of days ago and told me that there’s an extremely simple way to solve it. After trying a bunch of complicated things (including some wild Pascal’s triangle variants), I eventually relented and asked for the simple solution. Not only is there a short proof of the answer, but the answer is itself incredibly simple and elegant. See if you can do better than I did!

(…)

(…)

(…)

(Read on only after you’ve attempted the problem!)

(…)

(…)

(…)

So, here’s the solution. We start by considering the opposite of A always being ahead, which is that A is either behind or that the vote is tied at some point. Since we know that A eventually wins, A being behind at some point implies that at a later point A and B are tied. So the opposite of A always being ahead is really just that there is a tie at some point.

Pr(A is always ahead | n votes for A, m votes for B)
= 1 – Pr(A is behind at some point or tied at some point | n, m)
= 1 – Pr(There’s a tie at some point | n, m)

Now, let’s consider the probability of a tie at some point. There are two ways for this to happen: either the first vote is for A or the first vote is for B. The first vote being for B entails that there must be a tie at some point, since A must eventually pull ahead to win. This allows us to say the following:

Pr(Tie at some point | n, m)
= Pr(Tie at some point & A is first vote | n, m) + Pr(Tie at some point & B is first vote | n, m)
= Pr(Tie at some point & A is first vote | n, m) + Pr(B is first vote | n, m)
= Pr(Tie at some point & A is first vote | n, m) + m/(n+m)

Now, the final task is to figure out Pr(Tie at some point & A is first vote | n, m). If you haven’t yet figured it out, I encourage you to pause for a minute and think it over.

(…)

(…)

Alright, so here’s the trick. If there’s a tie at some point, then up to and including that point there are an equal number of votes for A and B. But this means that there are the same number of possible worlds in which A votes first as there are possible worlds in which B votes first! And this means that we can say the following:

Pr(Tie at some point & A is first vote | n, m)
= Pr(Tie at some point & B is first vote | n, m)

And this is exactly the probability we’ve already solved!

Pr(Tie at some point & B is first vote | n, m)
= Pr(B is first vote | n, m)
= m/(n+m)

And now we’re basically done!

Pr(Tie at some point | n, m)
= m/(n+m) + m/(n+m)
= 2m/(n+m)

Pr(A is always ahead | n, m)
= 1 – Pr(Tie at some point | n, m)
= 1 – 2m/(n+m)
= (n – m) / (n + m)

And there we have it: The probability that A is always ahead is just the difference in votes over the total number of votes! Beautiful, right?

We can even more elegantly express this as simply the percent of people that voted for A minus the percent that voted for B.

(n – m) / (n + m)
= n/(n+m) – m/(n+m)
=%A – %B

This tells us that even in the case where candidate A gets 75% of the vote, there’s still a 50/50 chance that they fall behind at some point!

An example of this: the recent election had Biden with 81,283,485 votes and Trump with 74,223,744 votes. Imagining that there were no third candidates, this would mean that Biden had 52.27% of the popular vote and Trump had 47.73%. And if we now pretend that the votes were equally likely to be counted in any order, then this tells us that there would only be a 9.54% chance that Biden would be ahead the entire time! Taking into account the composition of mail-in ballots, which were counted later, this means that Trump having an early lead was in fact exactly what we should have expected. The chance that Biden would have fallen behind at some point was likely quite a bit higher than 90.5%!

Polish Notation and Garden-Path Sentences

Polish notation is a mathematical notation system that allows you to eliminate parentheses without ambiguity. It’s called “Polish” because the name of its Polish creator, Jan Łukasiewicz, was too difficult for people to pronounce.

A motivating example: Suppose somebody says “p and q implies r”. There are two possible interpretations of this: “(p and q) implies r” and “p and (q implies r)”. The usual way to disambiguate these two is to simply add in parentheses like I just did. Another way is to set an order-of-operations convention, like that “and” always applies before “implies”. This is what’s used in basic algebra, and what allows you to write 2 + 2 ⋅ 4 without any fear that you’ll be interpreted as meaning (2 + 2) ⋅ 4.

Łukasiewicz’s method was to make all binary connectives into prefixes. So “A and B” because “and A B”, “P implies Q” becomes “implies P Q”, and so on. In this system, “(p and q) implies r” translates to “implies and p q r”, and “p and (q implies r)” translates to “and p implies q r”. Since the two expressions are different, there’s no need for parentheses! And in general, no ambiguity ever arises from lack of parentheses when using Polish notation.

If this is your first time encountering Polish notation, your first reaction might be to groan and develop a slight headache. But there’s something delightfully puzzling about reading an expression written in Polish notation and trying to understand what it means. Try figuring out what this means: “implies and not p or q s r”. Algebra can be written in Polish notation just as easily, removing the need for both parentheses AND order-of-operations. “2 + 2 = 4” becomes “+ 2 2 = 4”, or even better, “= + 2 2 4”.

Other binary connectives can be treated in Polish notation as well, creating gems like: “If and you’re happy you know it clap your hands!” “When life is what happens you’re busy making plans.” “And keep calm carry on.” “Therefore I think, I am.” (This last one is by of the author the Meditations). Hopefully you agree with me that these sentences have a nice ring to them, though the meaning is somewhat obscured.

But putting connectives in front of the two things being connected is not unheard of. Some examples in English: “ever since”, “because”, “nonwithstanding”, “whenever”, “when”, “until”, “unless”. Each of these connects two sentences, and yet can appear in front of both. When we hear a sentence like “Whenever he cheated on a test the professor caught him”, we don’t have any trouble parsing it. (And presumably you had no trouble parsing that entire last sentence either!) One could imagine growing up in a society where “and” and “or” are treated the same way as “ever since” and “until”, and perhaps in this society Polish notation would seem much more natural!

Slightly related to sentential connectives are verbs, which connect subjects and objects. English places its verbs squarely between the subject and the object, as does Chinese, French, and Spanish. But in fact the most common ordering is subject-object-verb! 45% of languages, including Hindi, Japanese, Korean, Latin, and Ancient Greek, use this pattern. So for instance, instead of “She burned her hand”, one would say “she her hand burned”. This is potentially weirder to English-speakers than Polish notation; it’s reverse Polish notation!

9% of languages use Polish notation for verbs (the verb-subject-object pattern). These include Biblical Hebrew, Arabic, Irish, and Filipino. In such languages, it would be grammatical to say “Loves she him” but not “She loves him”. (3% of languages are VOS – loves him she – 1% are OVS – him loves she – and just a handful are OSV – him she loves).

Let’s return to English. Binary prepositions like “until” appear out front, but they also swap the order of the two things that they connect. For instance, “Until you do your homework, you cannot go outside” is the same as “You cannot go outside until you do your homework”, not “You do your homework until you cannot go outside”, which sounds a bit more sinister.

I came up with some examples of sentences with several layers of these binary prepositions to see if the same type of confusion as we get when examining Polish notation for “and” or “implies” sets in here, and oh boy does it.

Single connective
Since when the Americans dropped the bomb the war ended, some claimed it was justified.

Two connectives, unlayered
Since when the Americans dropped the bomb the war ended, when some claimed it was an atrocity others argued it was justified.

Still pretty readable, no? Now let’s layer the connectives.

One layer
Whenever he was late she would weep.
She would weep whenever he was late.

Two layers
Since whenever he was late she would weep, he hurried over.
He hurried over, since she would weep whenever he was late.

Three layers
Because since whenever he was late she would weep he hurried over, he left his wallet at home.
He left his wallet at home, because he hurried over since she would weep whenever he was late.

Four layers
Because because since whenever he was late she would weep he hurried over he left his wallet at home, when he was pulled over the officer didn’t give him a ticket.
The officer didn’t give him a ticket when he was pulled over, because he left his wallet at home because he hurried over since she would weep whenever he was late.

Five layers
When he heard because because since whenever he was late she would weep he hurried over he left his wallet at home, when he was pulled over the officer didn’t give the man a ticket, the mayor was outraged at the lawlessness.
The mayor was outraged at the lawlessness when he heard the officer didn’t give the man a ticket when he was pulled over because he left his wallet at home because he hurried over since she would weep whenever he was late.

Read that last one out loud to a friend and see if they believes you that it makes grammatical sense! With each new layer, things become more and more… Polish. That is, indecipherable. (Incidentally, Polish is SVO just like English). Part of the problem is that when we have multiple layers like this, phrases that are semantically connected can become more and more distant in the sentence. It reminds me of my favorite garden-path sentence pattern:

The mouse the cat the dog chased ate was digested.
(The mouse that (the cat that the dog chased) ate) was digested.
The mouse (that the cat (that the dog chased) ate) was digested.

The phrases that are meant to be connected, like “the mouse” and “was digested” are sandwiched on either side of the sentence, and can be made arbitrarily distant by the addition of more “that the X verbed” clauses.

Does anybody know of any languages where “and” comes before the two conjuncts? What about “or”? English does this with “if”, so it might not be too much of a stretch.