How hard is classification? Equivalence relations and Borel reductions

September 26, 2023 ~ squarishbracket

What is classification?

Classification is one of the most basic human activities. We wake up to a world of vibrant experience and immediately begin structuring it, organizing it into objects and actions, people and animals, edible and non-edible, friend and foe, and so on. Eventually our system of classifications becomes immense and interconnected, partitioning up the world of blooming buzzing confusion into a million tiny but intelligible pieces.

In the real world, classification is often vague. In math, it can be made a bit more precise through the notion of an equivalence relation. An equivalence relation can be thought of in two ways. First, concretely, it’s a way to “carve up” a set, partitioning it into disjoint pieces called equivalence classes. Every element x of the original set appears in exactly one equivalence class, which is referred to as [x].

More abstractly, an equivalence relation on a set X is a binary relation E on X satisfying three axioms:

Reflexivity: (∀x ∈ X) (x E x)
Symmetry: (∀x,y ∈ X) (x E y ⇒ y E x)
Transitivity: (∀x,y,z ∈ X) (x E y E z ⇒ x E z)

One can prove that any binary relation satisfying these three axioms yields a carving-up of X in the sense described above.
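On a finite set, both descriptions can be checked by brute force. The following is only a sketch for illustration: the helper names `is_equivalence` and `classes` and the two sample relations are made up for this example.

```python
def is_equivalence(elements, related):
    """Brute-force check of reflexivity, symmetry, and transitivity."""
    reflexive = all(related(x, x) for x in elements)
    symmetric = all(related(y, x)
                    for x in elements for y in elements if related(x, y))
    transitive = all(related(x, z)
                     for x in elements for y in elements for z in elements
                     if related(x, y) and related(y, z))
    return reflexive and symmetric and transitive


def classes(elements, related):
    """Partition `elements` into classes.

    Only meaningful when `related` really is an equivalence relation.
    """
    partition = []
    for x in elements:
        for block in partition:
            if related(x, block[0]):  # compare against one member of the block
                block.append(x)
                break
        else:
            partition.append([x])  # x starts a new class
    return partition


X = range(10)
same_parity = lambda x, y: (x - y) % 2 == 0
close = lambda x, y: abs(x - y) <= 1  # reflexive and symmetric, not transitive

print(is_equivalence(X, same_parity))  # True
print(classes(X, same_parity))         # [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
print(is_equivalence(X, close))        # False
```

The `close` relation shows why transitivity matters: 0 is close to 1 and 1 is close to 2, but 0 is not close to 2, so there is no consistent way to carve the set into pieces.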

One thing that classification systems allow you to do is to coarse-grain the world, forgetting about the finer details and remembering only higher-order properties. Rather than think about my dog in particular, I can think about the class of all dogs, and treat this class as an object in its own right. Mathematically, this is called quotienting.

Quotienting is a very common mathematical move that appears wherever there are equivalence relations around. If E is an equivalence relation on a set X, then the quotient of X by E (written X/E) is just the set of all the equivalence classes.

X/E = { [x] | x ∈ X }

Quotienting can change structures in interesting and complicated ways. One of the most common examples is ℤ_n (the integers mod n), which we get by quotienting the integers ℤ by the “differs-by-a-multiple-of-n” equivalence relation:

x ~ y ⇔ n divides (x – y)

For instance, in ℤ₅, the number 2 is identified with the equivalence class [2] = {…, -8, -3, 2, 7, 12, …}.
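Here is a small sketch of that class, taking x ~ y to mean that n divides x – y (the relation that produces the class [2] listed above). The function names are invented for this example, and since the class is infinite we can only list the part that falls inside a finite window.

```python
def related(x, y, n):
    """x ~ y exactly when n divides x - y."""
    return (x - y) % n == 0


def class_window(rep, n, lo, hi):
    """The members of [rep] that lie in the finite window lo..hi."""
    return [k for k in range(lo, hi + 1) if related(k, rep, n)]


print(class_window(2, 5, -8, 12))  # [-8, -3, 2, 7, 12]

# Addition does not depend on which representatives are used:
# 7 and 12 both lie in [2], and 7 + 12 lands in the same class as 2 + 2.
print(related(7 + 12, 2 + 2, 5))  # True
```

The last line is the reason the quotient is useful: arithmetic on ℤ descends to arithmetic on the classes.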

Equally common, the real numbers are often defined by quotienting. You start with the set of all Cauchy sequences of rational numbers, and consider the equivalence relation of “converging to one another” between sequences:

x ~ y ⇔ lim_{n→∞} |x_n – y_n| = 0

A real number is defined to be an equivalence class of such sequences. This is a good example of coarse-graining in practice. You never really think of real numbers as sets of Cauchy sequences of rationals (outside of an analysis course). Once you quotient out by this equivalence relation, you forget about this “internal structure” and treat each real number as a primitive object. Similarly, in ℤ₅ you think of 2 as a primitive element, not an infinite set of integers.
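To see two different sequences landing in the same class, here is a sketch with two rational sequences that both approach √2. The two sequences (decimal truncations and Newton's iteration) are my own choice for illustration; any other pair of Cauchy sequences with the same limit would do.

```python
from fractions import Fraction
from math import isqrt


def truncation(k):
    """sqrt(2) truncated to k decimal places, as an exact rational."""
    return Fraction(isqrt(2 * 10 ** (2 * k)), 10 ** k)


def newton(k):
    """k steps of Newton's iteration x -> (x + 2/x) / 2, starting from 1."""
    x = Fraction(1)
    for _ in range(k):
        x = (x + 2 / x) / 2
    return x


# The gap |x_k - y_k| shrinks toward 0, although the sequences never coincide.
for k in (1, 2, 4, 8):
    print(k, float(abs(newton(k) - truncation(k))))
```

A finite printout is of course only evidence that the difference tends to 0, not a proof of it.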

Some classifications are harder than others

Let’s quickly recap the discussion in the last post.

We began with an infinite group of prisoners whose freedom rested on their ability to pick representatives from a particular equivalence relation on Cantor space, 2^ℕ. Cantor space can be thought of in many ways. In our case, it’s the space of all ways of assigning black and white hats to the lineup. It’s also the space of all infinite binary sequences, or equivalently all functions from ℕ to {0,1}. And it can be visualized as the infinite paths through the complete infinite binary tree.

The equivalence relation the prisoners found themselves stuck with was the “eventually agrees” relation, E₀, defined by:

x E₀ y ⇔ (∃n ∈ ℕ) (∀m > n) (x_m = y_m)

For instance, the equivalence class of the all-zeros sequence 000… consists of exactly those sequences that contain only finitely many 1s.
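E₀ cannot be decided by inspecting finitely many bits of arbitrary sequences, but it can be decided on eventually periodic ones. The sketch below assumes a representation I am introducing just for this example: a pair (prefix, cycle) standing for the sequence prefix + cycle + cycle + …. Past the longer prefix both sequences are periodic, so it is enough to compare one common period.

```python
from math import gcd


def bit(seq, i):
    """Bit i of the infinite sequence prefix + cycle + cycle + ..."""
    prefix, cycle = seq
    if i < len(prefix):
        return prefix[i]
    return cycle[(i - len(prefix)) % len(cycle)]


def eventually_agree(x, y):
    """Decide x E0 y for eventually periodic x and y."""
    start = max(len(x[0]), len(y[0]))  # both are periodic from here on
    p, q = len(x[1]), len(y[1])
    period = p * q // gcd(p, q)        # common period of the two tails
    return all(bit(x, i) == bit(y, i) for i in range(start, start + period))


zeros = ([], [0])               # 000...
finite_ones = ([1, 0, 1], [0])  # 101000...
ones = ([], [1])                # 111...

print(eventually_agree(zeros, finite_ones))  # True
print(eventually_agree(zeros, ones))         # False
```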

The prisoners had to find some way of agreeing on a choice of representative from each equivalence class. There are actually a few different ways to formalize this idea: transversals, selectors, and reductions.

A transversal of an equivalence relation E on X is a subset A ⊆ X which intersects each E-class exactly once.

(∀C ∈ X/E) (|A ∩ C| = 1)

A selector for an equivalence relation E on X is a function f: X → X which takes the elements of an equivalence class C to the representative element for C.

(∀x ∈ X) (f(x) ∈ [x]_E) and (∀x,y ∈ X) (xEy ⇒ f(x) = f(y))
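As a toy illustration of both definitions (not an example from the hat puzzle), take X to be the set of finite lowercase strings and let x E y mean that x and y are anagrams. Sorting the letters gives a selector, and the image of the selector is a transversal. The function names and the sample words are made up for this sketch, which only checks the two selector conditions on a finite sample.

```python
def related(x, y):
    """x E y when x and y use the same letters with the same multiplicities."""
    return sorted(x) == sorted(y)


def selector(x):
    """Send each string to the alphabetically sorted member of its class."""
    return "".join(sorted(x))


sample = ["listen", "silent", "enlist", "google", "elgoog", "cat"]

# First selector condition: f(x) stays inside the class of x.
print(all(related(selector(x), x) for x in sample))  # True

# Second condition: related elements get the same representative.
print(all(selector(x) == selector(y)
          for x in sample for y in sample if related(x, y)))  # True

# The image of the selector meets each class exactly once: a transversal.
print(sorted({selector(x) for x in sample}))  # ['act', 'eggloo', 'eilnst']
```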

And finally (and most importantly) there’s the idea of a reduction. This idea is significantly more general than the previous two, and will play a big role in the upcoming posts. First, the formal definition:

Given two sets X,Y and equivalence relations E (on X) and F (on Y),
a reduction of E to F is a function f: X → Y such that
(∀x, x’ ∈ X) (x E x’ ⟺ f(x) F f(x’))

If such a function exists, we say that E is reducible to F and write E ≤ F.

Informally, reducibility measures the relative complexity of equivalence relations. If E ≤ F, then E is “simpler” or “easier to compute” than F. For instance, if we want to check if two elements x and x’ are E-related, we can instead check if f(x) and f(x’) are F-related. Thus if we had an oracle for F then we could figure out E (using f).
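A small example of a reduction between two non-trivial relations, chosen by me for illustration: congruence mod 3 reduces to congruence mod 6 via doubling, since 6 divides 2(x – x’) exactly when 3 divides x – x’. The helper `is_reduction` is a hypothetical name, and it only tests the defining condition on a finite sample, so a True result is evidence rather than proof.

```python
def is_reduction(f, E, F, sample):
    """Check x E x' <=> f(x) F f(x') for every pair drawn from `sample`."""
    return all(E(x, y) == F(f(x), f(y)) for x in sample for y in sample)


E = lambda x, y: (x - y) % 3 == 0  # congruence mod 3
F = lambda x, y: (x - y) % 6 == 0  # congruence mod 6
double = lambda x: 2 * x
identity = lambda x: x

sample = range(-20, 21)
print(is_reduction(double, E, F, sample))    # True
print(is_reduction(identity, E, F, sample))  # False: 0 E 3 but not 0 F 3
```

With `double` in hand, an oracle for F answers every question about E: to compare x and x’ mod 3, ask the oracle about 2x and 2x’.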

A special case of reduction is where F is just the identity relation =_Y on Y, in which case we have:

xEx’ if and only if f(x) = f(x’)

Now, if we’re allowed to use any function f whatsoever, then this notion of reducibility ends up not being so interesting. For instance, we can reduce any equivalence relation to equality by choosing Y = X/E and defining f(x) = [x]_E. More generally, reducibility with arbitrary functions turns out to just be a matter of comparing the cardinalities of the quotients. Thus we shift our focus from arbitrary functions to definable functions, in the sense of Borel.

In the last post we talked about Borel subsets of a space, not functions. But a function can be identified with its graph and treated as a subset of X × Y. So f: X → Y is Borel if and only if it is Borel as a subset of X × Y. Borel relations are defined similarly.

(Reminder: the Borel sets in a topological space X are just the sets you can construct out of open sets through countable unions, intersections, and complements. Equivalently, they’re the sets definable in countable propositional logic, with atomic propositions interpreted as defining the basic open sets.)

(Notice that we’re taking advantage of the relationship between definability and topology: if X and Y are both topological spaces, then the product X × Y already has a canonical “product topology”, generated by products of open sets in X and Y. So once we know how to interpret the atomic propositions in X and in Y, we can automatically interpret atomic propositions in X × Y.)

We’ve finally arrived at the central concept: Borel reducibility or definable reducibility.

Given two topological spaces X,Y and equivalence relations E (on X) and F (on Y),
a Borel reduction of E to F is a Borel function f: X → Y such that
(∀x, x’ ∈ X) (x E x’ ⟺ f(x) F f(x’))

If such a function exists, we say that E is Borel reducible to F and write E ≤_B F.

Classifying classifications

Let me now get to the punchline.

We carve up the mathematical universe by defining equivalence relations on the sets we’re interested in. When these sets are topological spaces, we can compare these equivalence relations through the relationship of Borel reducibility. At the end of the last post, I told you that there was no Borel transversal of E₀. By the same token, there is no Borel reduction of E₀ to the identity relation on Cantor space. “Eventual equality” is a strictly more complicated notion than “equality”.

This might not sound very surprising. Of course eventual equality is more complicated than equality: it has an extra word in its name! But it turns out that lots of complicated-looking equivalence relations are Borel reducible to the equality relation. Such equivalence relations are called smooth or concretely classifiable. For example, the relationship of “similarity” between square matrices (intuitively, two matrices are similar if they represent the same linear transformation but in different bases) turns out to be smooth.

E is smooth if and only if there’s a Borel reduction E ≤_B =_ℝ

(Notice that I defined it here in terms of identity on ℝ rather than 2^ℕ. Not all identity relations are of equal complexity, but these two are. We’ll see that for many purposes ℝ and 2^ℕ are interchangeable.)

The smooth equivalence relations are the simplest ones out there. If E is smooth, then there’s some definable way to assign a different real number to each class. We can begin to draw a picture of the Borel reducibility hierarchy:

Natural questions immediately arise. Are there any equivalence relations strictly between smooth and E₀? (No.) Are there equivalence relations above E₀? (Yes, many.) Is there a most complex equivalence relation? (No, for any equivalence relation there’s a strictly harder one.) Are there equivalence relations of incomparable complexity? (Yes, in fact there are uncountably many such equivalence relations!)

The Borel reducibility hierarchy for equivalence relations is a relatively recent discovery in the history of mathematics. It’s only a few decades old. As such, there are many open questions about its structure. For instance, at the time of writing it’s unknown whether there’s exactly one class directly above E₀. There could be multiple incomparable classes directly above E₀, or it could be that for any equivalence relation E above E₀, there’s another one strictly in between E₀ and E.

The big balloon represents the unknown territory waiting to be explored. But one thing that is clear at this point is that the internal structure of this balloon is very rich. In upcoming posts I hope to describe some of what we do know about it, and describe some recent attempts to probe its structure using techniques in model theory and infinitary first-order logic.

Choosing things is hard: infinite hats, definability, and topology

September 21, 2023 ~ September 25, 2023 ~ squarishbracket

A hat puzzle

Infinitely many prisoners are assembled in a line as pictured. Each knows their place in the line. Each wears either a black or white hat, and each can only see the hats in front of them. Starting from the back of the line, each prisoner has to guess the color of their own hat. The prisoners were allowed to coordinate before the hats were assigned, but now no communication is allowed. Even the guesses must be silently submitted.

If only finitely many prisoners guess wrong, then everybody goes free. Can they succeed?

(Pause for thought.)

Amazingly, yes! Here’s the strategy:

Label white hats as 1 and black as 0. Then an assignment of hats becomes an infinite binary sequence, i.e. an element of 2^ℕ. Define an equivalence relation called E₀ on 2^ℕ as follows:

x E₀ y if and only if (∃n ∈ ℕ) (∀m > n) (x_m = y_m)
“x and y eventually agree”

When the prisoners meet up beforehand, they coordinate by agreeing on a choice of one representative from each class.

Once they’re in the room, every prisoner can see all but a finite number of hats. So they all know exactly which equivalence class they’re in. Now each prisoner guesses as if they were in the representative sequence from this class. Since the actual sequence and the representative sequence eventually agree, the prisoners’ guesses eventually agree with reality, and so they go free!
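The strategy can be simulated on a restricted family of hat assignments. The sketch below assumes the hats form an eventually periodic sequence, written as a pair (prefix, cycle); that restriction and all the names are mine, introduced so that a concrete representative can be computed by extending the periodic tail backwards to position 0. The general puzzle allows arbitrary sequences, where no such recipe is being claimed here.

```python
def bit(seq, i):
    """Bit i of the infinite sequence prefix + cycle + cycle + ..."""
    prefix, cycle = seq
    if i < len(prefix):
        return prefix[i]
    return cycle[(i - len(prefix)) % len(cycle)]


def representative_bit(seq, i):
    """Bit i of the agreed representative of seq's class.

    The periodic tail is extended backwards to position 0, so the answer
    depends only on the tail, which every prisoner can see.
    """
    prefix, cycle = seq
    return cycle[(i - len(prefix)) % len(cycle)]


def wrong_guesses(seq, horizon):
    """Positions below `horizon` where the guess differs from the real hat."""
    return [i for i in range(horizon)
            if representative_bit(seq, i) != bit(seq, i)]


hats = ([1, 1, 0], [0, 1])  # 110 then 0101...
other = ([0, 0], [1, 0])    # 00 then 1010..., agrees with `hats` from position 3 on

print(wrong_guesses(hats, 1000))   # [1, 2]
print(wrong_guesses(other, 1000))  # [0]
```

Both assignments lie in the same class and get the same representative (1010…), and in each case every wrong guess falls inside the finite prefix.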

Making choices is hard

I talked about this puzzle a few years ago in this post. Several commenters balked at the solution and said something like: “but there are uncountably many equivalence classes, so therefore the prisoners need to be able to coordinate on uncountably many representatives. Surely this is unreasonable!”

I think that uncountably many representatives is not what’s at issue here. Consider the equivalence relation on the reals defined by:

x ~_ℤ y if and only if x – y ∈ ℤ.

Here we also have uncountably many equivalence classes, but the prisoners could easily come to an agreement on which representative to pick. They could for instance agree to choose the unique representative which lies in the interval [0,1). Here the prisoners are able to coordinate on uncountably many representatives, simply by agreeing on a function (f(x) = x mod 1) which takes each real to the representative for its class. A function f like this is called a reduction of ~_ℤ to =_ℝ, as it converts the problem of deciding x ~_ℤ y into the problem of deciding if two real numbers are equal (in particular, f(x) =_ℝ f(y)).
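A quick sketch of this reduction, with exact rationals standing in for real numbers so that the equality checks are not disturbed by floating-point rounding. The helper name `related` and the sample values are assumptions of the example.

```python
from fractions import Fraction
from math import floor


def f(x):
    """Fractional part: the unique member of x's class in [0, 1)."""
    return x - floor(x)


def related(x, y):
    """x ~_Z y when x - y is an integer."""
    return (x - y).denominator == 1


x, y, z = Fraction(7, 3), Fraction(-2, 3), Fraction(1, 2)
print(f(x), f(y), f(z))  # 1/3 1/3 1/2

# The reduction property: x ~_Z y holds exactly when f(x) = f(y).
print(related(x, y) == (f(x) == f(y)))  # True
print(related(x, z) == (f(x) == f(z)))  # True
```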

Now, is there a function f from 2^ℕ to 2^ℕ that takes an infinite binary sequence to the representative sequence for its class? That is, is there a reduction of E₀ to the identity relation on 2^ℕ? Sure! Each E₀-class C is non-empty, so we can “make a choice” of any element γ_C ∈ C. Then set f(x) = γ_[x], where [x] denotes the equivalence class of x.

I highlighted the key phrase in the above definition: make a choice. I said that we could choose an element from each class, but didn’t tell you how. And this is a problem for the prisoners! For them to all agree on the function’s values, they must be able to communicate how this choice is made to each other.

In the case of the equivalence relation ~_ℤ, we were able to find a precise recipe for choosing representatives, namely the definition of the function (x ↦ x mod 1). But can the prisoners find a precise recipe for choosing representatives for the E₀-classes?

That is, is there a definable function that reduces E₀ to =_ℝ? Well, what exactly is definability?

What is definability?

A major theme of descriptive set theory is the following identification, which I’d like to try to motivate:

DEFINABLE = BOREL

A definition is a syntactic thing. It’s a sentence with a free variable, like “x is the 15th digit in the decimal expansion of π” or “f is the identity function on ℝ”. To precisely state what definability is, we must specify a formal language to work in. The simplest logical language is that given by propositional logic. Here we begin with an alphabet, a countable set of basic atomic propositions, and build all other sentences through finite conjunctions, disjunctions, and negation.

Returning to our puzzle, we were interested in describing 2^ℕ and its subsets. We want our atomic propositions to represent easily definable subsets of 2^ℕ, or equivalently, simple properties of infinite binary sequences. A natural choice for these basic atomic properties is P_nm = “x’s nth bit is m”, interpreted as defining the set {x ∈ 2^ℕ | x’s nth bit is m}. Translated back to prisoners and hats, these are sentences like “Prisoner 35 is wearing a black hat” or “Prisoner 15 is wearing a white hat”. Intuitively, everything that can be said about infinite binary sequences, should be in principle expressible just in terms of sentences like these.

With finite conjunctions, disjunctions, and negation, we can define sets like {x ∈ 2^ℕ | x starts with 010110} and {x ∈ 2^ℕ | x’s first two bits agree}. Identifying 0 with “left” and 1 with “right”, we can draw these sets as subsets of the infinite binary tree:

How about a set like {x ∈ 2^ℕ | x contains at least one 0}?

What we want is a proposition like “x’s first bit is 0 or x’s second bit is 0 or …”, i.e.

(P₀₀ ∨ P₁₀ ∨ P₂₀ ∨ …), or ⋁_{n∈ℕ} P_{n,0}

What we need is the ability to take countably infinite conjunctions and disjunctions. Ordinary propositional logic doesn’t allow this. So we graduate to countable propositional logic. In other words, we expand the syntax by closing it under countable conjunctions and disjunctions:

For any countable collection of sentences {φ_n | n ∈ ℕ},
⋀_n φ_n and ⋁_n φ_n are also sentences
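Here is a rough sketch of this syntax as code, with a sentence represented by a function that tests a sequence, and a sequence represented by a function from positions to bits. All of the names are hypothetical. A program cannot evaluate a genuinely infinite disjunction, so `BoundedOr` is only a finite stand-in that searches up to a chosen horizon: it can confirm a disjunction by finding a witness, but it cannot refute one.

```python
def P(n, m):
    """Atomic proposition: bit n equals m."""
    return lambda x: x(n) == m


def Not(phi):
    return lambda x: not phi(x)


def And(*phis):
    return lambda x: all(phi(x) for phi in phis)


def Or(*phis):
    return lambda x: any(phi(x) for phi in phis)


def BoundedOr(family, horizon):
    """Finite stand-in for a countable disjunction: tries n = 0 .. horizon - 1."""
    return lambda x: any(family(n)(x) for n in range(horizon))


starts_with_010110 = And(*(P(n, m) for n, m in enumerate([0, 1, 0, 1, 1, 0])))
first_two_agree = Or(And(P(0, 0), P(1, 0)), And(P(0, 1), P(1, 1)))

alternating = lambda n: n % 2               # 0101...
late_zero = lambda n: 0 if n == 1000 else 1  # a single 0 at position 1000

print(first_two_agree(alternating))                   # False
print(BoundedOr(lambda n: P(n, 0), 2000)(late_zero))  # True
print(BoundedOr(lambda n: P(n, 0), 500)(late_zero))   # False, horizon too small
```

The last two lines show why the finite connectives are not enough: whatever horizon we fix, some sequence hides its only 0 beyond it.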

On the semantic side, our collection of definable sets is now closed under countable unions, intersections, and complements. For the measure theorist, this is a familiar object: we’ve just defined a sigma-algebra!

(Technical note: when we say “closed under countable unions and intersections”, we also include empty unions and intersections, which correspond to ∅ and X, respectively. For notational convenience, we introduce the symbols ⊥ and ⊤ into our syntax, thought of as the atomic propositions “False” and “True”.)

In general there are many different sigma-algebras you can put on a set, corresponding to different choices of the atomic propositions. But when our set is also a topological space, as in ℝ and 2^ℕ, there’s a natural choice of sigma-algebra, called the Borel sigma-algebra. Here we take our atomic propositions to define the basic (or sub-basic) open sets. Then the Borel sets are all the sets constructible through countable unions, intersections, and complements from basic opens, or equivalently, the sets definable in countable propositional logic.

In 2^ℕ the topology is generated by sets of the form

{x ∈ 2^ℕ | x’s nth bit is m} for any n ∈ ℕ and m ∈ {0,1},

which are the same as our earlier P_nm.

How about in ℝ? Here the topology is generated by basic sets of the form

(a,b) = {x ∈ ℝ | a < x < b} for any a, b ∈ ℚ

So we choose our atomic propositions accordingly: for any two rationals a,b, we have an atomic proposition P_ab, which we interpret as “a < x < b”.

Let’s pause to recall how we got here. We began by trying to define “definability”, and have found that there’s a natural way to interpret countable propositional logic through Borel sigma-algebras on topological spaces. We have an atomic proposition for each (sub-)basic open set, and every Borel set is defined by some countable propositional sentence. As we vary our interpretation of the atomic propositions, we move between different topological spaces.

The question “can the prisoners coordinate on a strategy?” has now taken on a definite form: “is there a Borel subset of 2^ℕ that picks exactly one element from each equivalence class?” And it turns out that the answer is no! For the prisoners to coordinate on a choice function, they need more syntactic resources at hand than countable propositional logic.

(To be continued…)