How hard is classification? Equivalence relations and Borel reductions

September 26, 2023

What is classification?

Classification is one of the most basic human activities. We wake up to a world of vibrant experience and immediately begin structuring it, organizing it into objects and actions, people and animals, edible and non-edible, friend and foe, and so on. Eventually our system of classifications becomes immense and interconnected, partitioning up the world of blooming buzzing confusion into a million tiny but intelligible pieces.

In the real world, classification is often vague. In math, it can be made a bit more precise through the notion of an equivalence relation. An equivalence relation can be thought of in two ways. First, concretely, it’s a way to “carve up” a set, partitioning it into disjoint pieces called equivalence classes. Every element x of the original set appears in exactly one equivalence class, which is referred to as [x].

More abstractly, an equivalence relation on a set X is a binary relation E on X satisfying three axioms:

Reflexivity: (∀x ∈ X) (x E x)
Symmetry: (∀x,y ∈ X) (x E y ⇒ y E x)
Transitivity: (∀x,y,z ∈ X) (x E y E z ⇒ x E z)

One can prove that any binary relation satisfying these three axioms yields a carving-up of X in the sense described above.

One thing that classification systems allow you to do is to coarse-grain the world, forgetting about the finer details and remembering only higher-order properties. Rather than think about my dog in particular, I can think about the class of all dogs, and treat this class as an object in its own right. Mathematically, this is called quotienting.

Quotienting is a very common mathematical move that appears wherever there are equivalence relations around. If E is an equivalence relation on a set X, then the quotient of X by E (written X/E) is just the set of all the equivalence classes.

X/E = { [x] | x ∈ X }

Quotienting can change structures in interesting and complicated ways. One of the most common examples is ℤ_n (the integers mod n), which we get by quotienting the integers ℤ by the “differs-by-a-multiple-of-n” equivalence relation:

x ~ y ⇔ n divides (x – y)

For instance, in ℤ₅, the number 2 is identified with the equivalence class [2] = {…, -8, -3, 2, 7, 12, …}.
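To make the quotient concrete, here's a minimal Python sketch that carves a finite window of integers into congruence classes mod 5, where x and y are related when 5 divides x – y. The helper `quotient` and the window `range(-10, 15)` are my own illustrative choices, and a finite window can only ever show a slice of each (infinite) class.

```python
def quotient(elements, related):
    """Partition `elements` into the classes of the equivalence relation `related`.

    `quotient` is a hypothetical helper for this sketch; it assumes `related`
    really is reflexive, symmetric and transitive on `elements`.
    """
    classes = []
    for x in elements:
        for cls in classes:
            # By transitivity it is enough to compare with one member of the class.
            if related(x, cls[0]):
                cls.append(x)
                break
        else:
            classes.append([x])
    return classes


n = 5
window = range(-10, 15)  # a finite slice of the integers


def congruent(x, y):
    return (x - y) % n == 0  # n divides x - y


classes = quotient(window, congruent)
class_of_2 = next(cls for cls in classes if 2 in cls)

print(len(classes))  # 5
print(class_of_2)    # [-8, -3, 2, 7, 12]
```

The five lists play the role of the five elements of ℤ₅, and the class containing 2 is the visible part of [2].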

Equally common, the real numbers are often defined by quotienting. You start with the set of all Cauchy sequences of rational numbers, and consider the equivalence relation of “converging to one another” between sequences:

x ~ y ⇔ lim_n→∞ |x_n – y_n| = 0

A real number is defined to be an equivalence class of such sequences. This is a good example of coarse-graining in practice. You never really think of real numbers as sets of Cauchy sequences of rationals (outside of an analysis course). Once you quotient out by this equivalence relation, you forget about this “internal structure” and treat each real number as a primitive object. Similarly, in ℤ₅ you think of 2 as a primitive element, not an infinite set of integers.
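As a small numerical illustration of two sequences "converging to one another", here's a sketch with two quite different rational sequences that both home in on √2: Newton's iteration and decimal truncations. The function names are made up for this example, and checking finitely many terms is only evidence that the gap tends to 0, not a proof.

```python
from fractions import Fraction
from math import isqrt


def newton_sqrt2(k):
    """k-th term of Newton's iteration for sqrt(2), starting from 1 (exact rationals)."""
    x = Fraction(1)
    for _ in range(k):
        x = (x + 2 / x) / 2
    return x


def truncated_sqrt2(k):
    """sqrt(2) truncated to k decimal places, as an exact rational."""
    return Fraction(isqrt(2 * 10 ** (2 * k)), 10 ** k)


# |x_k - y_k| for the first few k; both sequences represent the same real number.
gaps = [abs(newton_sqrt2(k) - truncated_sqrt2(k)) for k in range(9)]

for k, gap in enumerate(gaps):
    print(k, float(gap))
```

Under the quotient, these two sequences are just two names for the single object √2.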

Some classifications are harder than others

Let’s quickly recap the discussion in the last post.

We began with an infinite group of prisoners whose freedom rested on their ability to pick representatives from a particular equivalence relation on Cantor space, 2^ℕ. Cantor space can be thought of in many ways. In our case, it’s the space of all ways of assigning black and white hats to the lineup. It’s also the space of all infinite binary sequences, or equivalently all functions from ℕ to {0,1}. And it can be visualized as the infinite paths through the complete infinite binary tree.

The equivalence relation the prisoners found themselves stuck with was the “eventually agrees” relation, E₀, defined by:

x E₀ y ⇔ (∃n ∈ ℕ) (∀m > n) (x_m = y_m)
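We can't store an arbitrary infinite binary sequence on a computer, but the definition can be tried out on a countable slice of Cantor space. The sketch below assumes sequences that are eventually periodic, encoded (my own convention) as a pair `(prefix, cycle)` of bit strings. Past both prefixes the two sequences are jointly periodic, so checking one full joint period is enough to decide E₀. It needs Python 3.9+ for `math.lcm`.

```python
from math import lcm


def bit(seq, m):
    """m-th bit (0-indexed) of the sequence prefix + cycle + cycle + ..."""
    prefix, cycle = seq
    if m < len(prefix):
        return prefix[m]
    return cycle[(m - len(prefix)) % len(cycle)]


def eventually_agree(x, y):
    """Decide x E0 y for eventually periodic sequences (hypothetical encoding)."""
    start = max(len(x[0]), len(y[0]))       # beyond both prefixes
    period = lcm(len(x[1]), len(y[1]))      # joint period of the two tails
    return all(bit(x, m) == bit(y, m) for m in range(start, start + period))


zeros = ("", "0")                # 000000...
finitely_many_ones = ("1101", "0")   # 1101000...
alternating = ("", "01")         # 010101...
shifted = ("1", "10")            # 1101010..., differs from `alternating` only at bit 0

print(eventually_agree(zeros, finitely_many_ones))  # True
print(eventually_agree(zeros, alternating))         # False
print(eventually_agree(alternating, shifted))       # True
```

This restriction is what makes the check decidable; for arbitrary elements of 2^ℕ no finite amount of inspection settles whether two sequences eventually agree.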

For instance, the equivalence class of the all-zeros sequence 000… consists of exactly the sequences with only finitely many 1s.

The prisoners had to find some way of agreeing on a choice of representative from each equivalence class. There are actually a few different ways to formalize this idea: transversals, selectors, and reductions.

A transversal of an equivalence relation E on X is a subset A ⊆ X which intersects each E-class exactly once.

(∀C ∈ X/E) (|A ∩ C| = 1)

A selector for an equivalence relation E on X is a function f: X → X which takes the elements of an equivalence class C to the representative element for C.

(∀x ∈ X) (f(x) ∈ [x]_E) and (∀x,y ∈ X) (xEy ⇒ f(x) = f(y))

And finally (and most importantly) there’s the idea of a reduction. This idea is significantly more general than the previous two, and will play a big role in the upcoming posts. First, the formal definition:

Given two sets X,Y and equivalence relations E (on X) and F (on Y),
a reduction of E to F is a function f: X → Y such that
(∀x, x’ ∈ X) (x E x’ ⟺ f(x) F f(x’))

If such a function exists, we say that E is reducible to F and write E ≤ F.
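The three notions can be compared side by side on a toy example. This is only a sketch under my own assumptions: X is a small list of words, E is the "is an anagram of" relation, and the names `selector`, `transversal` and `signature` are invented for the illustration. The point is that a reduction need not pick an element of the class at all.

```python
words = ["listen", "silent", "enlist", "tea", "eat", "ate", "dog"]  # the set X


def anagram(x, y):
    """The equivalence relation E on X."""
    return sorted(x) == sorted(y)


def selector(x):
    """Send x to a fixed representative of its class: the alphabetically least word."""
    return min(w for w in words if anagram(x, w))


# The image of a selector is a transversal.
transversal = {selector(x) for x in words}


def signature(x):
    """A reduction of E to equality on strings; its values need not lie in X."""
    return "".join(sorted(x))


meets_each_class_once = all(
    sum(1 for t in transversal if anagram(x, t)) == 1 for x in words
)
selector_ok = all(anagram(x, selector(x)) for x in words) and all(
    selector(x) == selector(y) for x in words for y in words if anagram(x, y)
)
signature_is_reduction = all(
    anagram(x, y) == (signature(x) == signature(y)) for x in words for y in words
)

print(sorted(transversal))       # ['ate', 'dog', 'enlist']
print(signature("listen"))       # eilnst
```

Here `signature("listen")` is not a word in X, so `signature` is a reduction to equality without being a selector.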

Informally, reducibility measures the relative complexity of equivalence relations. If E ≤ F, then E is “simpler” or “easier to compute” than F. For instance, if we want to check if two elements x and x’ are E-related, we can instead check if f(x) and f(x’) are F-related. Thus if we had an oracle for F then we could figure out E (using f).

A special case of reduction is where F is just the identity relation =_Y on Y, in which case we have:

xEx’ if and only if f(x) = f(x’)

Now, if we’re allowed to use any function f whatsoever, then this notion of reducibility ends up not being so interesting. For instance, we can reduce any equivalence relation to equality by choosing Y = X/E and defining f(x) = [x]_E. More generally, reducibility with arbitrary functions turns out to just be a matter of comparing the cardinalities of the quotients. Thus we shift our focus from arbitrary functions to definable functions, in the sense of Borel.

In the last post we talked about Borel subsets of a space, not functions. But a function can be identified with its graph and treated as a subset of X × Y. So f: X → Y is Borel if and only if it is Borel as a subset of X × Y. Borel relations are defined similarly.

(Reminder: the Borel sets in a topological space X are just the sets you can construct out of open sets through countable unions, intersections, and complements. Equivalently, they’re the sets definable in countable propositional logic, with atomic propositions interpreted as defining the basic open sets.)

(Notice that we’re taking advantage of the relationship between definability and topology: if X and Y are both topological spaces, then the product X × Y already has a canonical “product topology”, generated by products of open sets in X and Y. So once we know how to interpret the atomic propositions in X and in Y, we can automatically interpret atomic propositions in X × Y.)

We’ve finally arrived at the central concept: Borel reducibility or definable reducibility.

Given two topological spaces X,Y and equivalence relations E (on X) and F (on Y),
a Borel reduction of E to F is a Borel function f: X → Y such that
(∀x, x’ ∈ X) (x E x’ ⟺ f(x) F f(x’))

If such a function exists, we say that E is Borel reducible to F and write E ≤_B F.

Classifying classifications

Let me now get to the punchline.

We carve up the mathematical universe by defining equivalence relations on the sets we’re interested in. When these sets are topological spaces, we can compare these equivalence relations through the relationship of Borel reducibility. At the end of the last post, I told you that there was no Borel transversal of E₀. By the same token, there is no Borel reduction of E₀ to the identity relation on Cantor space. “Eventual equality” is a strictly more complicated notion than “equality”.

This might not sound very surprising. Of course eventual equality is more complicated than equality, it has an extra word in its name! But it turns out that lots of complicated-looking equivalence relations are Borel reducible to the equality relation. Such equivalence relations are called smooth or concretely classifiable. For example, the relationship of “similarity” between square matrices (intuitively, two matrices are similar if they represent the same linear transformation but in different bases) turns out to be smooth.

E is smooth if and only if there’s a Borel reduction E ≤_B =_ℝ

(Notice that I defined it here in terms of identity on ℝ rather than 2^ℕ. Not all identity relations are of equal complexity, but these two are. We’ll see that for many purposes ℝ and 2^ℕ are interchangeable.)

The smooth equivalence relations are the simplest ones out there. If E is smooth, then there’s some definable way to assign a different real number to each class. We can begin to draw a picture of the Borel reducibility hierarchy:

Natural questions immediately arise. Are there any equivalence relations strictly between smooth and E₀? (No.) Are there equivalence relations above E₀? (Yes, many.) Is there a most complex equivalence relation? (No, for any equivalence relation there’s a strictly harder one.) Are there equivalence relations of incomparable complexity? (Yes, in fact there’s uncountably many such equivalence relations!)

The Borel reducibility hierarchy for equivalence relations is a relatively recent discovery in the history of mathematics. It’s only about twenty years old. As such, there are many open questions about its structure. For instance, at the time of writing it’s unknown whether there’s exactly one class directly above E₀. There could be multiple incomparable classes directly above E₀, or it could be that for any equivalence relation E above E₀, there’s another one strictly in between E₀ and E.

The big balloon represents the unknown territory waiting to be explored. But one thing that is clear at this point is that the internal structure of this balloon is very rich. In upcoming posts I hope to describe some of what we do know about it, and describe some recent attempts to probe its structure using techniques in model theory and infinitary first-order logic.

Choosing things is hard: infinite hats, definability, and topology

September 21, 2023 / April 16, 2026

A hat puzzle

Infinitely many prisoners are assembled in a line as pictured. Each knows their place in the line. Each wears either a black or white hat, and each can only see the hats in front of them. Starting from the back of the line, each prisoner has to guess the color of their own hat. The prisoners were allowed to coordinate before the hats were assigned, but now no communication is allowed. Even the guesses must be silently submitted.

If only finitely many prisoners guess wrong, then everybody goes free. Can they succeed?

(Pause for thought.)

Amazingly, yes! Here’s the strategy:

Label white hats as 1 and black as 0. Then an assignment of hats becomes an infinite binary sequence, i.e. an element of 2^ℕ. Define an equivalence relation called E₀ on 2^ℕ as follows:

x E₀ y if and only if (∃n ∈ ℕ) (∀m > n) (x_m = y_m)
“x and y eventually agree”

When the prisoners meet up beforehand, they coordinate by agreeing on a choice of one representative from each class.

Once they’re in the room, every prisoner can see all but a finite number of hats. So they all know exactly which equivalence class they’re in. Now each prisoner guesses as if they were in the representative sequence from this class. Since the actual sequence and the representative sequence eventually agree, the prisoners’ guesses eventually agree with reality, and so they go free!

Making choices is hard

I talked about this puzzle a few years ago in this post. Several commenters balked at the solution and said something like: “but there are uncountably many equivalence classes, so therefore the prisoners need to be able to coordinate on uncountably many representatives. Surely this is unreasonable!”

I think that uncountably many representatives is not what’s at issue here. Consider the equivalence relation on the reals defined by:

x ~_ℤ y if and only if x – y ∈ ℤ.

Here we also have uncountably many equivalence classes, but the prisoners could easily come to an agreement on which representative to pick. They could for instance agree to choose the unique representative which lies in the interval [0,1). Here the prisoners are able to coordinate on uncountably many representatives, simply by agreeing on a function (f(x) = x mod 1) which takes each real to the representative for its class. A function f like this is called a reduction of ~_ℤ to =_ℝ, as it converts the problem of deciding x ~_ℤ y into the problem of deciding if two real numbers are equal (in particular, f(x) =_ℝ f(y)).
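Here's a quick sanity check of that claim in Python, as a sketch on a finite sample of rationals (exact arithmetic via `Fraction`, so no floating-point fuzz). The sample and the function names are my own choices; a finite check obviously can't cover all of ℝ.

```python
from fractions import Fraction
from itertools import product
from math import floor


def related_mod_Z(x, y):
    """x ~_Z y: the difference x - y is an integer."""
    return (x - y).denominator == 1


def f(x):
    """x mod 1: the unique representative of x's class in [0, 1)."""
    return x - floor(x)


# A small sample of rationals p/q (hypothetical test data).
sample = [Fraction(p, q) for p in range(-12, 13) for q in (1, 2, 3, 4)]

# Reduction property: x ~_Z y if and only if f(x) = f(y).
is_reduction = all(
    related_mod_Z(x, y) == (f(x) == f(y)) for x, y in product(sample, repeat=2)
)
# f really lands in [0, 1) and stays inside the class of x.
in_unit_interval = all(0 <= f(x) < 1 for x in sample)
stays_in_class = all(related_mod_Z(x, f(x)) for x in sample)

print(is_reduction, in_unit_interval, stays_in_class)  # True True True
```

The recipe is a one-liner that everyone can apply independently, which is exactly what the prisoners need.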

Now, is there a function f from 2^ℕ to 2^ℕ that takes an infinite binary sequence to the representative sequence for its class? That is, is there a reduction of E₀ to the identity relation on 2^ℕ? Sure! Each E₀-class C is non-empty, so we can “make a choice” of any element γ_C ∈ C. Then set f(x) = γ_[x], where [x] denotes the equivalence class of x.

I highlighted the key phrase in the above definition: make a choice. I said that we could choose an element from each class, but didn’t tell you how. And this is a problem for the prisoners! For them to all agree on the function’s values, they must be able to communicate how this choice is made to each other.

In the case of the equivalence relation ~_ℤ, we were able to find a precise recipe for choosing representatives, namely the definition of the function (x ↦ x mod 1). But can the prisoners find a precise recipe for choosing representatives for the E₀-classes?

That is, is there a definable function that reduces E₀ to =_ℝ? Well, what exactly is definability?

What is definability?

A major theme of descriptive set theory is the following identification, which I’d like to try to motivate:

DEFINABLE = BOREL

A definition is a syntactic thing. It’s a sentence with a free variable, like “x is the 15th digit in the decimal expansion of π” or “f is the identity function on ℝ”. To precisely state what definability is, we must specify a formal language to work in. The simplest logical language is that given by propositional logic. Here we begin with an alphabet, a countable set of basic atomic propositions, and build all other sentences through finite conjunctions, disjunctions, and negation.

Returning to our puzzle, we were interested in describing 2^ℕ and its subsets. We want our atomic propositions to represent easily definable subsets of 2^ℕ, or equivalently, simple properties of infinite binary sequences. A natural choice for these basic atomic properties is P_nm = “x’s nth bit is m”, interpreted as defining the set {x ∈ 2^ℕ | x’s nth bit is m}. Translated back to prisoners and hats, these are sentences like “Prisoner 35 is wearing a black hat” or “Prisoner 15 is wearing a white hat”. Intuitively, everything that can be said about infinite binary sequences, should be in principle expressible just in terms of sentences like these.

With finite conjunctions, disjunctions, and negation, we can define sets like {x ∈ 2^ℕ | x starts with 010110} and {x ∈ 2^ℕ | x’s first two bits agree}. Identifying 0 with “left” and 1 with “right”, we can draw these sets as subsets of the infinite binary tree:

How about a set like {x ∈ 2^ℕ | x contains at least one 1}?

What we want is a proposition like “x’s first bit is 1 or x’s second bit is 1 or …”, i.e.

(P₀₁ ∨ P₁₁ ∨ P₂₁ ∨ …), or \/_n∈ℕ P_n,1

What we need is the ability to take countably infinite conjunctions and disjunctions. Ordinary propositional logic doesn’t allow this. So we graduate to countable propositional logic. In other words, we expand the syntax by closing it under countable conjunctions and disjunctions:

For any countable collection of sentences {φ_n | n ∈ ℕ},
/\_nφ_n and \/_nφ_n are also sentences
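The finite connectives we started with can be modelled directly in Python, which gives a feel for how sentences define sets. This is a sketch under my own conventions: a sequence is a function from a bit index (starting at 0) to 0 or 1, a sentence is a predicate on such functions, and `P`, `AND`, `OR`, `NOT` are names I've invented. The countable connectives are left out on purpose, since evaluating them would in general mean inspecting infinitely many bits.

```python
def P(n, m):
    """Atomic proposition P_nm: 'x's nth bit is m'."""
    return lambda x: x(n) == m


def AND(*props):
    return lambda x: all(p(x) for p in props)


def OR(*props):
    return lambda x: any(p(x) for p in props)


def NOT(p):
    return lambda x: not p(x)


# {x | x starts with 010110}: a conjunction of six atomic propositions.
starts_with_010110 = AND(*(P(n, int(b)) for n, b in enumerate("010110")))

# {x | x's first two bits agree}.
first_two_agree = OR(AND(P(0, 0), P(1, 0)), AND(P(0, 1), P(1, 1)))


def alternating(n):
    return n % 2  # 0, 1, 0, 1, ...


def x(n):
    return int("010110"[n]) if n < 6 else 1  # 010110 followed by all 1s


def all_ones(n):
    return 1


print(starts_with_010110(x))            # True
print(starts_with_010110(alternating))  # False
print(first_two_agree(alternating))     # False
print(first_two_agree(all_ones))        # True
```

Each finite sentence only ever looks at finitely many bits, which is why the sets it defines are so simple.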

On the semantic side, our collection of definable sets is now closed under countable unions, intersections, and complements. For the measure theorist, this is a familiar object: we’ve just defined a sigma-algebra!

(Technical note: when we say “closed under countable unions and intersections”, we also include empty unions and intersections, which correspond to ∅ and X, respectively. For notational convenience, we introduce the symbols ⊥ and ⊤ into our syntax, thought of as the atomic propositions “False” and “True”.)

In general there are many different sigma algebras you can put on a set, corresponding to different choices of the atomic propositions. But when our set is also a topological space, as in ℝ and 2^ℕ, there’s a natural choice of sigma-algebra, called the Borel sigma-algebra. Here we take our atomic propositions to define the basic (or sub-basic) open sets. Then the Borel sets are all the sets constructible through countable unions and intersections from basic opens, or equivalently, the sets definable in countable propositional logic.

In 2^ℕ the topology is generated by sets of the form

{x ∈ 2^ℕ | x’s nth bit is m} for any n ∈ ℕ and m ∈ {0,1},

which are the sets defined by our earlier P_nm.

How about in ℝ? Here the topology is generated by basic sets of the form

(a,b) = {x ∈ ℝ | a < x < b} for any a, b ∈ ℚ

So we choose our atomic propositions accordingly: for any two rationals a,b, we have an atomic proposition P_ab, which we interpret as “a < x < b”.

Let’s pause to recall how we got here. We began by trying to define “definability”, and have found that there’s a natural way to interpret countable propositional logic through Borel sigma algebras on topological spaces. We have an atomic proposition for each (sub-)basic open set, and every set is defined by some countable propositional sentence. As we vary our interpretation of the atomic propositions, we move between different topological spaces.

The question “can the prisoners coordinate on a strategy?” has now taken on a definite form: “is there a Borel subset of 2^ℕ that picks exactly one element from each equivalence class?” And it turns out that the answer is no! For the prisoners to coordinate on a choice function, they need more syntactic resources at hand than countable propositional logic.

(To be continued…)

The Hypergame Paradox

December 15, 2021 / April 16, 2026

Credit to Joel David Hamkins, who I heard discussing this paradox on an episode of the podcast My Favorite Theorem.

Define a finite game to be any two-player turn-based game such that every possible playthrough ends after finitely many turns. For example, tic-tac-toe is a finite game because every game ends in at most nine turns. So is chess with the 50-move rule enforced (if 50 moves are taken without any pawn advances or captures, then the game ends in a draw). In both these examples, there’s an upper bound to how long the game can last, but this is not required. A game of transfinite Nim would count as finite; every game lasts for only finitely many turns, even though there is no upper bound on the number of turns it takes.

Now, consider the game Hypergame. To play Hypergame, Player 1 begins by choosing any finite game G. Then Player 2 plays the first move of G, Player 1 plays the second move of G, and so on until the game is completed. (Since Player 1 chose a finite game, this will always happen after some finite amount of time.)

Is Hypergame a finite game? Yes, we can easily see that it must be. Whatever game Player 1 chooses will be over after n steps, for some finite n. So that playthrough of Hypergame will have taken n+1 steps.

But if Hypergame is a finite game, then it is a valid choice for the first move of Hypergame! So we can now imagine the following playthrough of Hypergame:

A Troubling Playthrough of Hypergame
Player 1: For the finite game that we shall play, I pick Hypergame.
Player 2: Hm, okay. So now I’m playing the first move of Hypergame. So I must now choose any finite game. I’ll choose Hypergame!
Player 1: Alright so I’m again playing the first move of this new game of Hypergame. I’ll choose Hypergame again.
Player 2: And I choose Hypergame again as well.
So on forever…

At no point does either player violate the rules of Hypergame. And yet, we ended up with an infinite playthrough of Hypergame, which we proved was impossible! So we have a contradiction. What is the resolution?

✵✵✵

Here’s one possible resolution, analogous to the resolutions of similar set-theoretic paradoxes.

We can think about a game as a directed rooted tree. The vertices of the tree correspond to game states, and the edges correspond to the allowed moves. The root of the tree corresponds to the starting game state, and Player 1 gets to choose which edge to travel along first. From the new vertex, Player 2 decides the next edge to travel along. And so on. The tree’s leaves correspond to ending states of the game, and each leaf is labelled according to which player won in that ending state.

In this framework, what is a finite game? As I defined it above, a finite game is just any directed rooted tree such that every path starting at the root ends at a leaf after passing through finitely many edges. This corresponds perfectly to the idea that every possible playthrough of the game takes only finitely many turns. Notice that a finite game is not necessarily a finite tree! The game tree of a finite game is only finite if it’s also finitely branching. In other words, for a game to have a finite tree requires not just that every playthrough is finitely long, but that each player always has only finitely many choices on their turn.
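For games whose trees are actually finite, this framework is easy to play with in code. Below is a minimal sketch, assuming a game tree is a nested dict from moves to subtrees (with `{}` as a leaf) and ignoring who wins. The toy `nim` game and the "menu" version of Hypergame, where Player 1 may only pick from a fixed finite list of games, are my own simplifications and not the real Hypergame.

```python
def longest_playthrough(tree):
    """Number of moves in the longest root-to-leaf path of a finite game tree."""
    return max((1 + longest_playthrough(child) for child in tree.values()), default=0)


def nim(heap):
    """Toy one-heap Nim: on each turn remove 1 or 2 stones (hypothetical example game)."""
    return {take: nim(heap - take) for take in (1, 2) if take <= heap}


# A restricted Hypergame: the first move is the choice of a game from a finite menu.
# Hypergame itself is deliberately not on the menu.
menu_hypergame = {"nim(1)": nim(1), "nim(3)": nim(3)}

print(longest_playthrough(nim(3)))          # 3
print(longest_playthrough(menu_hypergame))  # 4
```

The extra move in the second result is the "n+1" from the argument that Hypergame is finite: one move to choose the game, then at most n moves to play it.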

For instance, the game tree of Hypergame is not a finite tree, because Player 1 has infinitely many possible finite games to choose from on his first turn. How big exactly is the game tree of Hypergame? We know that we have to have a vertex corresponding to the start of any finite game, so it must be at least as large as the set of all finite games. But how large is this set?

This is where we run into problems. The game tree of a finite game can be arbitrarily large. Consider the game which starts by Player 1 choosing any real number and then immediately losing. The height of the game tree is 1, but its width is the cardinality of the continuum. Similarly for any set X we can find a game whose tree has cardinality |X|. This means that there are finite games of arbitrary cardinalities. But then as a corollary to the nonexistence of a largest cardinality, we know that there is no set of all finite games! And this implies that Hypergame has no game tree! More precisely, there is no set corresponding to the game tree of Hypergame as we defined it.

Couldn’t we instead think about Hypergame as a proper class? Sure! But then when we choose a finite game in our first move, we couldn’t be picking Hypergame, as the property “is a finite game” would only apply to sets and not proper classes. This means that we can’t actually select Hypergame as our first move! And so we avoid the paradoxical conclusion that we can keep picking Hypergame ad infinitum.

I find it quite fascinating that the seemingly innocent notion of a finite game can lead us into paradoxes involving proper classes!

ZFC as One of Humankind’s Great Inventions

December 14, 2021 / June 16, 2022

Recently I told a friend that I thought ZFC was one of humankind’s greatest inventions. He pointed out that it was pretty bold to claim this about something that most of mankind has never heard of, which I thought was a fair objection. After thinking for a bit, I reflected that the sense of greatness I meant wasn’t really consequentialist, and thus it was independent of how many people know what ZFC is, or even how many people’s lives are affected in any way by it. Instead I intended greatness in a sort of aesthetic and intellectual sense.

The closest analogy to ZFC outside of math is the idea of a “theory of everything” for physics. If we found a theory of everything for physics, it’d likely have a bunch of important practical consequences, and that’d be part of what makes it a great invention. But it would also be a great invention in an intellectual sense, as a discovery of something fundamental and unifying of many seemingly disparate phenomena we observe. This is what ZFC is like: a mathematical theory of everything. One reason this analogy is imperfect is that due to the incompleteness theorems, we know that there can be no “theory of everything” for mathematics. (Any theory of everything will have at least one thing it can’t prove, namely its own consistency.) So ZFC’s greatness can’t come from being a perfect theory of everything, because we know that it is not. Nonetheless, ZFC serves as a foundation for virtually all known mathematics, and this is what I think is so incredible about it.

What does it mean for something to “serve as a foundation” for math? ZFC is a foundation in (at least) three ways: (1) in terms of its ability to define virtually all mathematical concepts, (2) in terms of its structures being rich enough to contain objects that come from virtually all fields of math, and (3) in terms of being an axiom system that suffices to prove virtually every result in known mathematics.

Syntax

Virtually every mathematical concept you can think of has a definition in the language of ZFC. For example, we have definitions for numbers like “π” and “√2”, sets like ℕ and ℝ, algebraic objects like the group S₅ and the ring ℚ[x], geometric objects like Platonic solids and differential manifolds, computational objects like Turing machines and cellular automata, and even logical entities like models of first order theories and proofs within formal systems. What makes this especially impressive is the simplicity of the language: it uses nothing besides the basic symbols of first order logic and one binary relation symbol: ∈. So one thing that ZFC teaches us is that virtually every concept in mathematics can be defined just in terms of the set membership relation, and all mathematics can be understood as exploring the properties of this relation.

Semantics

Models of ZFC are insanely richly structured. You can navigate within them to find sets corresponding to every object that mathematicians study. π has a representative set within any model of ZFC, as does the Monster group or the torus. These representative sets are not always perfect: there are models of ZFC where ℝ is countable, for instance. But within the model, they nonetheless share enough similarities with the original objects that virtually everything you can prove about the original object, remains true of the ZFC-representative.

Proof

Finally, ZFC is a computable set of sentences, and we may inquire about what can be proven from it. Keeping up the ambition of the previous two sections, we might want to claim that all mathematical truths can be proven from ZFC. But due to the limitations of first order logic discovered over the last century, we now know that this goal is not achievable. The set of all first order truths of arithmetic is not computably enumerable, whereas the set of logical consequences of ZFC is, and so there must be some such truths that aren’t logical consequences of ZFC. Nonetheless, it is commonly claimed that virtually all mathematical truths can be derived from ZFC using the usual proof system for first order logic.

This is especially remarkable given the simplicity of ZFC. I believe that the intuitive content of each axiom could be explained to a smart middle schooler. Additionally, these axioms are extremely intuitively appealing. The most controversial of them has been choice, which is equivalent to the statement that the Cartesian product of non-empty sets is also non-empty. Second most controversial is probably the axiom of infinity, which just says that there’s an infinite set. The rest are even easier to accept than these.

Now, the fact that you can prove virtually everything from ZFC doesn’t mean that you should. So don’t interpret me as saying that ZFC is of practical use to the daily work of mathematicians trying to prove things outside of set theory and logic. Again, an analogy to physics: we might discover a theory of everything that we know reproduces all the known phenomena of GR and QM, but find that it’s so hard to prove things that we are practically never better off using this theory to calculate things. Nonetheless, ZFC as a theory of everything teaches us that most of math can be understood as conceptually quite simple: the logical consequences of a fairly simple and computable set of sentences about sets. People make a big deal out of Euclid’s axiomatization of geometry, but this is a small feat relative to the axiomatization of all of mathematics.

Metamath

And not only can ZFC prove virtually everything in ordinary mathematics, but ZFC can prove much of what we know in metamathematics and logic itself. When logicians are studying model theory, or even when set theorists are studying ZFC, they are almost always working with ZFC as their meta-theory, meaning that they are making sure that all of their proofs could ultimately be expanded out as ZFC proofs. So the big results of logic, like the completeness theorem, the compactness theorem, the incompleteness theorems, the Löwenheim-Skolem theorems, are all theorems of ZFC.

The fact that ZFC can even talk about these model theoretic notions means that models of ZFC are able to talk about models of ZFC, which is where things get very meta. One can prove that every model of ZFC – every one of these crazily richly-structured universes containing virtually all of mathematics – contains another such model of ZFC. This follows from the reflection theorem, which again can be proven in ZFC!

Hopefully I have now roused enough interest in you to get you to take a look at some of the actual mathematics. You might be curious to know what exactly this theory is. And you’re in luck, it’s simple enough that I can write the whole theory in just nine lines!

Note that with the exception of the final axiom, Choice, the only symbols I’ve used are logical symbols and ∈. I used shorthand for Choice for the sake of readability, but this could be expanded out just like the others. I’m also using a convention where any free variables are considered to be universally quantified over, which shortens things further.

I’ll close with a one-sentence description for each axiom.

Extensionality: No two distinct sets have all the same elements.
Pairing: For any two sets, there’s a set containing just those two.
Union: The union of any set of sets exists.
Powerset: There is a set of all subsets of any set.
Specification: For any property Φ and any set x, you can form a set out of just those elements of x with that property.
Replacement: For any definable function and any set, the image of that set under the function exists.
Infinity: There’s an infinite set.
Regularity: Every non-empty set has a member that it shares nothing with.
Choice: For any set of nonempty sets, there is a function that picks out one element from each.

The fact that you can prove everything from the infinitude of primes to Fermat’s Last Theorem from just these basic principles, is really quite mind-blowing.

Hilbert-type Infinitary Logics

November 23, 2021 / November 24, 2021

I want to describe a hierarchy of infinitary logics, and show some properties of one of these logics in particular.

First, a speedy review of first order logic. In the language of first order logic we have access to parentheses {(, )}, the propositional connectives {∧, ∨, ¬, →}, the equals sign {=}, quantifiers {∀, ∃}, and a countably infinite store of variables for quantification over {x₁, x₂, x₃, …}. These are the logical symbols common to any first-order language, but to complete the language we additionally specify a set of constant symbols {c₁, c₂, c₃, …}, function symbols {f₁, f₂, f₃, …}, and relation symbols {R₁, R₂, R₃, …}. These sets can be any cardinality whatsoever. We can then define the set of grammatical sentences (“well-formed formulas”) of this language. We also interpret these symbols in a fairly straightforward way: a first-order structure has a universe of objects, and constants are assigned referents within that universe, function symbols are assigned n-ary functions on the universe, and relation symbols are assigned n-ary relations on the universe.

One consequence of the construction is that all of our sentences are finite, which puts an important cap on expressive power. Consider a language of arithmetic with constants for every natural number: {0, 1, 2, 3, …}. We might naturally want to say that the set of constants exhausts all the objects in our universe. But this takes an infinitely long sentence: ∀x (x=0 ∨ x=1 ∨ x=2 ∨ x=3 ∨ …). You might think you could be clever and find a finite expression of this idea, but it turns out that you can’t. (This is a consequence of the compactness theorem: any first-order theory true of the natural numbers also has models containing extra, non-standard elements.)

So a natural extension of first-order logic is to allow infinitely long sentences. For any two cardinal numbers α and β, define L_α,β to be first-order logic, but with conjunctions of any length < α allowed and blocks of quantifiers of any length < β. (Note that infinite conjunctions give us infinite disjunctions as well, by negation.) For example, L_ω,ω is ordinary first-order logic: conjunctions and quantifier blocks of any finite length. L_ω1,ω is first-order logic plus countably infinite conjunctions, but only finite quantifier blocks. L_ω,ω1 has finite conjunctions but countably infinite quantifier blocks. Question: what logic is this equivalent to?

Notice that countably infinitely many quantifiers use countably infinitely many variables, but if you only have finite conjunctions you can only use finitely many of them. So this ends up being equivalent to L_ω,ω. For this same reason, if β > α then L_α,β is no different from L_α,α.

L_ω1,ω is especially interesting to logicians, because it’s significantly stronger than first-order logic, but not so much stronger as to lose all nice proof-theoretic properties. In particular, there’s a sound and complete proof system for L_ω1,ω! It’s also quite simple: just the FOL proof system plus one new axiom and one infinitary deduction rule:

New axiom
For any m ∈ ω and any set of sentences {φ_n | n ∈ ω},
∧_n∈ω φ_n → φ_m

New inference rule
φ₀, φ₁, φ₂, … ⊢ ∧_n∈ω φ_n

If we allow deductions of any countably infinite (successor) ordinal type, then we get a sound and complete proof system. This means that for any countable set of L_ω1,ω-sentences Σ and any L_ω1,ω-sentence φ, you can deduce φ from Σ just in case Σ actually (ω₁,ω)-logically implies φ.

Infinite conjunctions give you a massive boost in expressive power going from FOL to L_ω1,ω. You can categorically define the natural numbers: Take Peano arithmetic and add to it the axiom: ∀x ∨_n∈ω (x = n). See why this works? We’ll refer back to this theory as PA*.

So L_ω1,ω is powerful enough to be able to categorically define the natural numbers. We might wonder if we can also categorically define all other countable structures. It turns out that while we can’t do that, we can do something slightly weaker. For any countable structure, there’s a single L_ω1,ω-sentence that defines it up to isomorphism among countable structures. The complexity of this sentence is a way of measuring the complexity of the structure, and has close connections to other measures of structure complexity. (The key word to learn more about this is Scott rank.)

What else can you do with L_ω1,ω? Here’s something really cool: Tarski’s theorem on the undefinability of truth tells us that in first-order logic you cannot define a truth predicate. But in L_ω1,ω, you can! Take a countable first-order language L. Add on the language of arithmetic (0, S, +, ×, ≤) to L. L is still countable, so enumerate all its sentences {φ₁, φ₂, φ₃, …}. Now, the sentence Tr(x) := ∨_n∈ω (x=n & φ_n) is a truth predicate: for any n, if φ_n is true then Tr(n) is true and if φ_n is false then Tr(n) is false.

You can express the idea that “infinitely many things satisfy φ(x)” with the following sentence:

∧_n∈ω ∀x₀∀x₁…∀x_n ∃y (φ(y) ∧ ∧_k≤n (y ≠ x_k)).

Expanding this out:

∀x₀ ∃y (φ(y) ∧ y≠x₀)
∧
∀x₀∀x₁ ∃y (φ(y) ∧ y≠x₀ ∧ y≠x₁)
∧
∀x₀∀x₁∀x₂ ∃y (φ(y) ∧ y≠x₀ ∧ y≠x₁ ∧ y≠x₂)
∧
…

The first line says “there’s at least two things satisfying φ”, the second says “there’s at least three things satisfying φ”, and so on forever.
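To make the pattern concrete, here is a small Python sketch that prints the first few conjuncts and checks them against a finite structure. The function names and the toy universe are my own illustrative choices, not part of any standard library, and a finite check like this only illustrates the counting behaviour rather than proving anything about the infinitary sentence.

```python
from itertools import product

def conjunct(n: int, phi: str = "φ") -> str:
    """Spell out the n-th conjunct: ∀x0…∀xn ∃y (φ(y) ∧ y≠x0 ∧ … ∧ y≠xn)."""
    quantifiers = "".join(f"∀x{k}" for k in range(n + 1))
    distinct = " ∧ ".join(f"y≠x{k}" for k in range(n + 1))
    return f"{quantifiers} ∃y ({phi}(y) ∧ {distinct})"

def conjunct_holds(n: int, universe, phi) -> bool:
    """Evaluate the n-th conjunct in a finite structure by brute force."""
    universe = list(universe)
    witnesses = [y for y in universe if phi(y)]
    return all(
        any(y not in xs for y in witnesses)
        for xs in product(universe, repeat=n + 1)
    )

for n in range(3):
    print(conjunct(n))

# Toy structure (an assumption for illustration): exactly three elements satisfy φ.
print([conjunct_holds(n, range(6), lambda y: y < 3) for n in range(3)])
```

With exactly three witnesses, the conjuncts for n = 0 and n = 1 hold and the one for n = 2 fails, matching the reading "at least n + 2 things satisfy φ".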

Finally, unlike FOL, L_ω1,ω is not compact. This means that there are sets of sentences without models, where every finite subset has a model. But it’s actually even worse than that! L_ω1,ω is strongly non-compact. You can have an unsatisfiable set of sentences, every countable subset of which has a model! Once again, this pretty remarkable fact has a simple proof:

Take the language of Peano arithmetic and add on ℵ₁-many constant symbols: {c_α | α < ω₁}. Now add to PA* the set of sentences {c_α ≠ c_β | α ≠ β, both in ω₁}. Let’s call this theory Σ. Every countable subset of Σ has a model, but Σ itself doesn’t. You can always take countably many constant symbols and assign them distinct referents in ℕ such that some natural numbers are left out. However, you can’t assign uncountably many distinct referents in ℕ while still leaving out some natural numbers, by a simple cardinality argument.

There’s an interesting twist here: Consider the set of sentences {c_α ≠ c_β | α ≠ β, both < ω₁^CK} ⊂ Σ, where ω₁^CK is the Church-Kleene ordinal, the smallest uncomputable ordinal. ω₁^CK is countable, so this is a countable set of sentences. This set of sentences has a model, but to obtain it you must choose a bijection from the set of constants {c_α | α < ω₁^CK} to the natural numbers. By the definition of ω₁^CK, this bijection is uncomputable! There are uncountably many countable ordinals above ω₁^CK (there are countably many computable ordinals, and uncountably many countable ordinals), so while every countable subset of Σ has a model, uncountably many of these models will be uncomputable!

Forcing and the Independence of CH (Part 2)

October 2, 2021 / April 16, 2026

Part 1 here

Part 2: How big is 𝒫(ω)?

Now the pieces are all in place to start applying forcing to prove some big results. Everything that follows assumes the existence of a countable transitive model M of ZFC.

First, a few notes on terminology.

The language of ZFC is very minimalistic. All it has on top of the first-order-logic connectives is a single binary relation symbol ∈. Nonetheless, people will usually talk about theorems of ZFC as if it has a much more elaborate syntax, involving symbols like ∅ and ω and ℵ₁ and terms like “bijection” and “ordinal”. For instance, the Continuum Hypothesis can be written as “There’s a bijection between 𝒫(ω) and ℵ₁”, which is obviously using much more than just ∈. But these terms are all simply shorthand for complicated phrases in the primitive language of ZFC: for instance a sentence “φ(∅)” involving ∅ could be translated as “∃x (∀y ¬(y ∈ x) ∧ φ(x))”, or in other words “there’s some set x that is empty and φ(x) is true”.

Something interesting goes on when considering symbols like ∅ and ω and ℵ₁ that act like proper names. Each name picks out a unique set, but for some of these names, the set that gets picked out differs in different models of ZFC. For example, the interpretation of the name “ℵ₁” is model-relative; it’s meant to be the first uncountable cardinal, but there are countable models of ZFC in which it’s actually only countably large! If this makes your head spin, it did the same for Skolem: you can find more reading here.

On the other hand, the name “∅” always picks out the same set. In every model of ZFC, ∅ is the unique set that contains nothing at all. When a name’s interpretation isn’t model-relative, it’s called absolute. Examples include “∅” and “1,729”. When a name isn’t absolute, then we need to take care to distinguish between the name itself as a syntactic object, and the set which it refers to in a particular model of ZFC. So we’ll write “ℵ₁” when referring to the syntactic description of the first uncountable cardinal, and ℵ₁^M when referring to the actual set that is picked out by this description in the model M.

With that said, let’s talk about a few names that we’ll be using throughout this post:

In M, ω is the conventional name for the set that matches the description “the intersection of all inductive sets”, where inductive sets are those that contain the ordinal 0 and are closed under successor. In any transitive model of ZFC like M, ω is exactly ℕ, the set of natural numbers. Since we’re restricting ourselves to only considering transitive models, we can treat ω as if it’s unambiguous; its interpretation won’t actually vary across the models we’re interested in.

ω₁ corresponds to the description “the smallest uncountable ordinal”. This description happens to perfectly coincide with the description “the first uncountable cardinal” in ZFC, which has the name ℵ₁. But unlike with ω, different transitive models pick out different sets for ω₁ and ℵ₁. So in a model M, we’ll write ω₁^M for the set that M believes to be the smallest uncountable ordinal and ℵ₁^M for the first uncountable cardinal.

ω₂ is the name for the first ordinal for which there’s no injection into ω₁ (i.e. the first ordinal that’s larger than ω₁). Again, this is not absolute: ω₂^M depends on M (although again, ω₂^M is the same as ℵ₂^M no matter what M is). And in a countable model like those we’ll be working with, ω₂^M is countable. This will turn out to be very important!

𝒫(ω) is the name for the set that matches the description “the set of all subsets of ω”. Perhaps predictably at this point, this is not absolute either. So we’ll have to write 𝒫(ω)^M when referring to M’s version of 𝒫(ω).

Finally, ZFC can prove that there’s a bijection from 𝒫(ω) to ℝ, meaning that this bijection exists in every model. Thus anything we prove about the size of 𝒫(ω) can be carried over to a statement about the size of ℝ. Model-theoretically, we can say that in every model M, |𝒫(ω)^M| = |ℝ^M|. It will turn out to be more natural to prove things about 𝒫(ω) than ℝ.

Okay, we’re ready to make the continuum hypothesis false! We’ll do this by choosing a partially ordered set (P, ≤) whose extension as a Boolean algebra (B, ≤) has the property that no matter what M-generic filter G in B you choose, M[G] will interpret its existence as proof that |𝒫(ω)| ≥ ℵ₂.

Making |𝒫(ω)| ≥ ℵ₂

Let P = { f ∈ M | f is a finite partial function from ω₂^M × ω to {0,1} }. Let’s have some examples of elements of P. The simplest is the empty function ∅. More complicated is the function {((ω+1, 13), 1)}. This function’s domain has just one element, the ordered pair (ω+1, 13), and this element is mapped to 1. The function {((14, 2), 0), ((ω₁^M•2, 2), 1)} is defined on two elements: (14, 2) and (ω₁^M•2, 2). And so on.

We’ll order P by reverse inclusion: f ≤ g iff f ⊇ g. Intuitively, f ≤ g if f is a function extension of g: f is defined everywhere that g is and they agree in those places. For instance, say f = {((ω, 12), 0)}, g = {((ω, 12), 0), ((13, 5), 1)}, h = {((ω, 12), 1)}, and p = {((ω, 11), 1)}. Check your understanding by verifying that (1) g ≤ f, (2) f and h are incomparable and have no common lower bound, and (3) f and p are incomparable but do have a common lower bound (what is it?).
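If you want to check your answers mechanically, here is a minimal Python sketch of the ordering. It models conditions as dicts from (ordinal, n) pairs to bits, and treats ordinals as opaque labels such as the string "ω"; both are simplifying assumptions of mine, and the helper names are hypothetical.

```python
def leq(f: dict, g: dict) -> bool:
    """f ≤ g iff f ⊇ g, i.e. f is defined wherever g is and agrees with it there."""
    return all(k in f and f[k] == v for k, v in g.items())

def compatible(f: dict, g: dict) -> bool:
    """f and g have a common lower bound iff they agree wherever both are defined."""
    return all(f[k] == g[k] for k in f.keys() & g.keys())

def common_extension(f: dict, g: dict) -> dict:
    """The union f ∪ g, which is the greatest common lower bound when it exists."""
    if not compatible(f, g):
        raise ValueError("incompatible conditions have no common lower bound")
    return {**f, **g}

f = {("ω", 12): 0}
g = {("ω", 12): 0, (13, 5): 1}
h = {("ω", 12): 1}
p = {("ω", 11): 1}

print(leq(g, f))                      # (1) g extends f
print(leq(f, h), leq(h, f), compatible(f, h))
print(leq(f, p), leq(p, f), compatible(f, p))
print(common_extension(f, p))         # a common lower bound for (3)
```

The empty dict plays the role of the top element: `leq(x, {})` is true for every condition `x`.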

The empty function ∅ is a subset of every function in P, meaning that ∅ is bigger than all functions. Thus ∅ is the top element of this partial order. And every f in P is finite, which makes the partial order (P, ≤) easy to visualize: each function sits finitely many one-point extensions below ∅.

Now, G will be an M-generic ultrafilter in the Boolean extension (B, ≤) of (P, ≤). G is the set that we’re ultimately adding onto M, so we want to know some of its properties. In particular, how will we use G to show that |𝒫(ω)| ≥ ℵ₂?

What we’re going to do is construct a new set out of G as follows: first take the intersection of P with G. (remember that G is an ultrafilter in B, so it doesn’t only contain elements of P). P⋂G will be some set of finite partial functions from ω₂^M × ω to {0,1}. We’ll take the union of all these functions, and call the resulting set F. What we’ll prove of F is the following:

(1) F := ⋃(P⋂G) is a total function from ω₂^M × ω to {0,1}
(2) For every pair of distinct α, β ∈ ω₂^M, F(α, •) is a distinct function from F(β, •).

A word on the second clause: F takes as input two things: an element of ω₂^M and an element of ω. If we only give F its first input, then it becomes a function from ω to {0,1}. For α ∈ ω₂^M, we’ll give the name F_α to the function we get by feeding α to F. Our second clause says that all of these functions are pairwise distinct.

Now, the crucial insight is that each F_α corresponds to some subset of ω, namely {n ∈ ω | F_α(n) = 1}. So F defines |ω₂^M|-many distinct subsets of ω. F is built from G, so it lives in M[G] rather than in M, and these subsets are members of 𝒫(ω)^M[G]. So in M[G], it comes out as true that |𝒫(ω)| ≥ |ω₂^M|. And ω₂^M is M’s version of ℵ₂, which is bigger than M’s version of ℵ₁. This is ¬CH!

Well… almost. There’s one final subtlety: ω₂^M is what ℵ₂ looks like in the model M. For ¬CH to be true in M[G], it must be that |𝒫(ω)^M[G]| > ℵ₁^M[G]. So what if ℵ₁^M ≠ ℵ₁^M[G], or ℵ₂^M ≠ ℵ₂^M[G]? This would throw a wrench into our proof: it would mean that M[G] believes that its version of 𝒫(ω) is at least as big as M’s version of ℵ₂, but M[G] might not believe that M’s version of ℵ₂ is bigger than its own version of ℵ₁. This is the subject of cardinal collapse, which I will not be going into. However, for this choice of P, M[G] does in fact believe that ℵ₁^M = ℵ₁^M[G] and that ℵ₂^M = ℵ₂^M[G].

Alright, so now all we need to do to show that M[G] believes ¬CH is to prove (1) that F is a total function from ω₂^M × ω to {0,1}, and (2) that for every pair of distinct α, β ∈ ω₂^M, F_α is distinct from F_β. We do this in three steps:

F is a function.
F is total.
For every pair of distinct α, β ∈ ω₂^M, F_α is distinct from F_β.

Alright so how do we know that F is a function? Remember that F = ⋃(P⋂G). What if some of the partial functions in P⋂G are incompatible with each other? In this case, their union cannot be a function. So to prove step 1 we need to prove that all the functions in P⋂G are compatible. This follows easily from two facts: that G is an ultrafilter in B and P is dense in B. The argument: G is an ultrafilter, so for any f, g ∈ P⋂G the meet f∧g is also in G. That makes f∧g an element of B that’s below both f and g, and since G is proper, f∧g ≠ ⊥. All we know about f∧g is that it’s a non-⊥ element of B, but we have no guarantee that it’s also a member of P, our set of partial functions. In other words, we can’t say for sure that f∧g is actually a partial function from ω₂^M × ω to {0,1}. But now we use the fact that P is dense in B! By the definition of density, since f∧g is a non-⊥ element of B, P must contain some h ≤ f∧g. Now by transitivity, h ≤ f and h ≤ g. So f and g have a common function extension, meaning they must be compatible functions! Pretty magical right?

So F is a function. But how do we know that it’s total? We prove this by looking closely at the dense subsets of P. In particular, for any α ∈ ω₂^M and any n ∈ ω, define D_α,n := {f ∈ P | (α, n) ∈ dom(f)}. This is a dense subset of P. Why? Well, regardless of what α and n are, any function f in P is either already defined on (α, n), in which case we’re done, or it’s not, in which case f has a function extension with (α, n) in its domain. So from any element of P, you can follow the order downwards until you find a function with (α, n) in its domain. Since D_α,n is dense in P and G is M-generic, G must have some element in common with D_α,n. Thus G contains an element f of P that has (α, n) in its domain. This f is one of the functions that we union over to get F, so F must have (α, n) in its domain as well! And since α and n were totally arbitrary, F must be total.

Finally, why are the “F_α”s pairwise distinct? Again we construct dense subsets for our purposes: for any distinct α, β ∈ ω₂^M, define D_α,β := {f ∈ P | f_α ≠ f_β}, where f_α ≠ f_β means that f(α, n) and f(β, n) are both defined and different for some n ∈ ω. This is clearly dense (f is finite, so there’s always some n that f doesn’t mention, and you can extend f to make f_α and f_β disagree there). So G contains an element f of P for which f_α ≠ f_β. And thus it must also be true of F that F_α ≠ F_β!

And we’re done! We’ve shown that once we add G to M, we can construct a new set F = ⋃(P⋂G) such that F encodes |ω₂^M|-many distinct subsets of ω, and thus that M[G] ⊨ |𝒫(ω)| ≥ ℵ₂.
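Here is a finite toy version of the whole argument in Python, as a sanity check rather than a faithful model. Three labels stand in for the ordinals below ω₂^M and a four-column window stands in for ω (both my assumptions). Instead of a genuinely M-generic G, the sketch walks down a chain of conditions that meets each of the finitely many dense sets D_α,β and D_α,n in turn, and then takes the union.

```python
from itertools import combinations

ROWS = ["a", "b", "c"]   # hypothetical stand-ins for ordinals below ω₂^M
COLS = range(4)          # a finite window of ω

def extend_into_point_set(f: dict, alpha, n) -> dict:
    """Return some g ≤ f with (alpha, n) in dom(g), witnessing density of D_α,n."""
    g = dict(f)
    g.setdefault((alpha, n), 0)
    return g

def extend_into_diff_set(f: dict, alpha, beta) -> dict:
    """Return some g ≤ f whose rows alpha and beta disagree, witnessing density of D_α,β."""
    for (a, n), bit in f.items():
        if a == alpha and (beta, n) in f and f[(beta, n)] != bit:
            return dict(f)  # f already lies in D_α,β
    n = 0
    while (alpha, n) in f or (beta, n) in f:
        n += 1              # f is finite, so an unused column always exists
    g = dict(f)
    g[(alpha, n)], g[(beta, n)] = 0, 1
    return g

def build_chain() -> list:
    """A descending chain of conditions meeting every dense set in the toy."""
    f, chain = {}, [{}]
    for alpha, beta in combinations(ROWS, 2):
        f = extend_into_diff_set(f, alpha, beta)
        chain.append(f)
    for alpha in ROWS:
        for n in COLS:
            f = extend_into_point_set(f, alpha, n)
            chain.append(f)
    return chain

chain = build_chain()
F = {}
for f in chain:
    F.update(f)  # union of a chain of compatible conditions, so nothing conflicts

rows = {alpha: tuple(F[(alpha, n)] for n in COLS) for alpha in ROWS}
print(rows)
```

In the real construction there are infinitely many dense sets and genericity does the work of meeting all of them at once; the toy only shows why meeting D_α,n gives totality and meeting D_α,β gives distinct rows.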

Making |𝒫(ω)| ≥ ℵ₄₂₀

What’s great is that this argument barely relied on the “2” in ω₂^M. We could just as easily have started with P = {f ∈ M | f is a finite partial function from ω₄₂₀^M × ω to {0,1}}, constructed an M-generic filter G in the Boolean extension of P, then defined F to be ⋃(P⋂G).

G is still an ultrafilter in B and P is still dense in B, so F is still a function.

D_α,n := {f ∈ P | (α, n) ∈ dom(f)} is still a dense subset of P for any α ∈ ω₄₂₀^M and any n ∈ ω, so F is still total.

And D_α,β := {f ∈ P | f_α ≠ f_β} is still a dense subset of P for any distinct α, β ∈ ω₄₂₀^M, so all the “F_α”s are pairwise distinct.

And now F encodes |ω₄₂₀^M|-many distinct subsets of ω! The final step, which I’m going to skip over again, is showing that M[G] believes that ω₄₂₀^M = ω₄₂₀^M[G], i.e. that cardinal collapse doesn’t occur.

And there we have it: this choice of P gives us a new model M[G] of ZFC that believes that |𝒫(ω)| ≥ ℵ₄₂₀! Note how easy and quick this argument is now that we’ve gone through the argument for how to make M[G] believe |𝒫(ω)| ≥ ℵ₂. This is a great thing about forcing: once you really understand one application, other applications become immensely simpler and easier to understand.

Making |𝒫(ω)| = ℵ₁

Okay, we’ve made CH false. Now let’s make it true! This time our choice for (P, ≤) will be the following:

P = {f ∈ M | f is an M-countable partial function from ω₁^M to 𝒫(ω)^M}. Once more we order by reverse inclusion: f ≤ g iff f ⊇ g.

A crucial thing to notice is that I’ve merely required the functions in P to be M-countable, rather than countable. For a set to be M-countable is for there to exist an injection in M from that set to ω. Any M-countable set is countable, but some countable sets may not be M-countable (like M itself!). In particular, we know that ω₁^M is actually a countable set, meaning that if we required true countability instead of just M-countability, then P would include some total functions! But M has no injection from ω₁^M to ω, so no function in P can be total. This will be important in a moment!

We get our M-generic ultrafilter G and define F := ⋃(P⋂G). Now we want to show that F is a total and surjective function from ω₁^M to 𝒫(ω)^M. As before, we proceed in three steps:

F is a function
F is total
F is surjective

Step 1: G is an ultrafilter and P is dense in B, so the same argument works to show that all elements of P⋂G are compatible: (1) ultrafilter implies that any f, g in P⋂G have a greatest lower bound f∧g in G, which is therefore not ⊥, (2) density of P in B implies that f∧g has a lower bound h in P, so (3) f and g have a common function extension. Thus F is a function.

Step 2: D_α := {f ∈ P | α ∈ dom(f)} is dense in P for any α ∈ ω₁^M, so F is total.

Step 3: D_A := {f ∈ P | A ∈ image(f)} is dense in P for any A ∈ 𝒫(ω)^M. Why? No f in P is total, so any f in P can be extended by adding one more point (α, A) where α ∉ dom(f). Thus A ∈ image(F) for any A ∈ 𝒫(ω)^M, so F is surjective.

So F is a total surjective function from ω₁^M to 𝒫(ω)^M, meaning that in M[G] it’s true that |𝒫(ω)^M| ≤ ℵ₁^M. And since ZFC proves that |𝒫(ω)| > ℵ₀, it follows that M[G] ⊨ |𝒫(ω)| = ℵ₁.

You might have noticed that each of these arguments could just as easily have been made if we had started out with defining P as the finite partial functions in M from ω₁^M to 𝒫(ω)^M. This is true! If we had done this, then we still would have ended up proving that F was a total surjective function from ω₁^M to 𝒫(ω)^M, and therefore that |𝒫(ω)^M| = ℵ₁^M. However, this is where cardinal collapse rears its ugly head: the final step is to prove that |𝒫(ω)^M| = |𝒫(ω)^M[G]| and that ℵ₁^M = ℵ₁^M[G]. This requires that P be the countable partial functions rather than just the finite ones.

And with that final caveat, we see that M[G] believes that |𝒫(ω)| = ℵ₁.

Making |𝒫(ω)| ≤ ℵ₃₁₄

Let P = {f ∈ M | f is an M-countable partial function from ω₃₁₄^M to 𝒫(ω)^M} and order by reverse inclusion: f ≤ g iff f ⊇ g.

⋃(P⋂G) is again a total and surjective function from ω₃₁₄^M to 𝒫(ω)^M, following the exact same arguments as in the previous section. Cardinal collapse doesn’t occur here either, so M[G] believes that |𝒫(ω)| ≤ ℵ₃₁₄.

✯✯✯

We’ve proven the independence of the Continuum Hypothesis from ZFC! We’ve done more: we’ve also shown how to construct models of ZFC in which we have lots of control over the size of 𝒫(ω). Want a model of ZFC in which |𝒫(ω)| is exactly ℵ₁₇₂₉? Just force with the right P and you’re good to go! There’s something a bit unsatisfactory about all this though, which is that in each application of forcing we’ve had to skim over some details about cardinal collapse that were absolutely essential to the proof going through. I hope to go into these details in a future post to close these last remaining gaps.

Forcing and the Independence of CH (Part 1)

October 1, 2021 / April 16, 2026

Part 2 here.

Part 1: What is Forcing?

Forcing is a set-theoretic technique developed by Paul Cohen in the 1960s to prove the independence of the Continuum Hypothesis from ZFC. He won a Fields Medal as a result, and to this day it’s the only Fields Medal to be awarded for a work in logic. It’s safe to say that forcing is one of the most powerful techniques that set theorists have at their disposal as they explore the set theoretic multiverse.

Forcing has an intimidating reputation, which is somewhat deserved. The technique involves many moving parts and it takes some time to see how they all fit together. I found these three resources to be excellent for learning the basics of the technique:

Kenny Easwaran: A Cheerful Introduction to Forcing and the Continuum Hypothesis
Timothy Y. Chow: A beginner’s guide to forcing
Boban Velickovic: Introduction to Forcing (slides)

Each of these sources takes a slightly different approach, and I think that they complement each other quite well. I’ve produced a high-level outline of the argument for why forcing works, as well as how to use forcing to prove the independence of the CH. I don’t feel quite competent to present all of forcing from the ground up, so this outline is not meant to be self-contained. In particular, I’m gonna assume you’re familiar with the basics of first order logic and ZFC and are comfortable with Boolean algebras. I’m also going to skim over the details of cardinal collapse, which are still fairly opaque to me.

Forcing in 5 lines!

Fix a countable transitive model M ⊨ ZFC and a Boolean algebra (B, ≤) ∈ M.
Define M^B := { f ∈ M | f: M^B ⇀ B } ⊆ M.
Choose an M-generic ultrafilter G ⊆ B.
For every n in M^B, define val_G(n) := { val_G(m) | (m, b) ∈ n for some b in G }.
Define M[G] := { val_G(n) | n ∈ M^B }. This is a countable transitive model of ZFC containing all of M as well as G!

We start with a countable transitive model M of ZFC, and then construct a larger countable transitive model M[G] of ZFC that contains a new set G. Let me say a few words on each step.

Step 1
1. Fix a countable transitive model M ⊨ ZFC and a Boolean algebra (B, ≤) ∈ M.

There’s not much to say about Step 1. I suppose it’s worth noting that we’re assuming the existence of a countable transitive model of ZFC, meaning that the results of forcing are all conditioned not just on the consistency of ZFC, but on the existence of transitive models of ZFC, which is significantly stronger. (The existence of countable transitive models is no stronger than the existence of transitive models, by the downward Löwenheim-Skolem theorem together with the Mostowski collapse.)

Moving on!

Step 2
2. Define M^B := { f ∈ M | f: M^B ⇀ B }

M^B is sometimes called a Boolean-valued model of ZFC. The elements of M^B are called “B-names” or “B-valued sets” or just “B-sets” (as I’ll call them). These B-sets are partial functions in M that go from M^B to B. “Huh? Hold on,” I hear you saying, “Isn’t that a circular definition?”

It might look circular, but it’s not; it’s inductive. First of all, regardless of what M^B is, the empty set counts as a partial function from M^B to B: all its inputs are in M^B, all its outputs are in B, and no input gets sent to multiple outputs! ∅ is also in M, so ∅ is an element of M^B. But now that we know that ∅ is in M^B, we can also construct the function {(∅, b)} for each b in B. This satisfies the criterion of “being a partial function from M^B to B”, and it is in M because B is in M. So for any b in B, {(∅, b)} is in M^B. And now we can construct more partial functions with these new elements of M^B, like {(∅, b), ({(∅, b’)}, b’’)} and so on.

If you’re still not convinced that the definition of M^B is sensible, we can define it without any apparent circularity, making the induction explicit:

V₀ = ∅
V_α+1 = { f ∈ M | f is a partial function from V_α to B } for any ordinal α in M.
V_λ = ⋃{V_α | α ∈ λ} for any limit ordinal λ in M.
Finally, M^B = ⋃{V_α | α ∈ M}.
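To see the induction actually producing things, here is a Python sketch that builds the first few levels for the two-element Boolean algebra. Taking B = {⊥, ⊤} and encoding a partial function as a frozenset of (input, output) pairs are my own simplifications, and of course the sketch ignores the "f ∈ M" requirement, which is automatic for finite objects like these.

```python
from itertools import product

B = ("⊥", "⊤")  # the two-element Boolean algebra, chosen only to keep the toy small

def partial_functions(domain, codomain) -> set:
    """All partial functions domain ⇀ codomain, each as a frozenset of (input, output) pairs."""
    domain = list(domain)
    choices = [None, *codomain]  # None means "undefined at this input"
    return {
        frozenset((x, v) for x, v in zip(domain, values) if v is not None)
        for values in product(choices, repeat=len(domain))
    }

def levels(n: int) -> list:
    """Return [V_0, V_1, ..., V_n] following the inductive definition."""
    V = [set()]  # V_0 = ∅
    for _ in range(n):
        V.append(partial_functions(V[-1], B))
    return V

V = levels(3)
print([len(level) for level in V])  # sizes 0, 1, 3, 27
```

V_1 contains only the empty function, V_2 adds {(∅, ⊥)} and {(∅, ⊤)}, and V_3 already has 27 B-sets, since each of the three elements of V_2 can be left out or sent to either value.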

Okay, so now we know the definition of M^B. But what is it? How do we intuitively think about the elements of M^B? One way is to think of them as “fuzzy sets”, sets that contain each other to varying degrees. In this way of seeing things, the Boolean algebra B is our extension of the trivial Boolean algebra {True, False} to a more complicated Boolean algebra that allows intermediate truth values as well. So for instance, of the B-set {(∅, b)} we can say “it contains ∅ to degree b, and contains nothing else”. Of the B-set {(∅, b), ({(∅, b’)}, b’’)} we may say “it contains ∅ to degree b, and contains [the set that contains ∅ to degree b’] to degree b’’, and nothing else.” And so on.

Every Boolean algebra has a top element ⊤ corresponding to “definite truth” and a bottom element ⊥ corresponding to “definite falsity”, so our B-sets don’t have to be entirely fuzzy. For instance the B-set {(∅, ⊤)} definitely contains ∅ and definitely doesn’t contain anything else. It can be thought of as intuitively similar to the set {∅} in M. And the B-set {(∅, ⊥)} definitely doesn’t contain ∅, or anything else for that matter. This brings up an interesting question: how is the B-set {(∅, ⊥)} any different from the B-set ∅? They have the same intuitive properties: both contain nothing (or said differently, contain each thing to degree ⊥). Well that’s right! You can think of {(∅, ⊥)} and ∅ as two different names for the same idea. They are technically distinct as names, but in Steps 3 and 4 when we collapse M^B back down to an ordinary model of ZFC, the distinction between them will vanish. (The details of how exactly we assign B-values to sentences about M^B are interesting in their own right, and might deserve their own post.)

So intuitively what we’ve done is take our starting model M and produce a new structure that looks a lot like M in many ways except that it’s highly ambiguous: in addition to all the old sharp sets we have all these new fuzzy sets that are unsure about their members. In the next two steps we collapse all this ambiguity back down, erasing all of the fuzziness. It turns out that there are many ways to collapse the ambiguity (one for each ultrafilter in B in fact!), but we’ll want to choose our ultrafilter very carefully so that we (1) don’t end up back where we started, (2) end up with a model of ZFC, and (3) end up with an interesting model of ZFC.

Step 3
3. Choose an M-generic ultrafilter G ⊆ B.

There’s some terminology here that you might be unfamiliar with.

Firstly, what’s an ultrafilter? It’s a nonempty proper subset of B that’s closed upwards, closed under meets (∧), and contains exactly one of b or ¬b for each b in B. I talked about ultrafilters here in the context of power sets, where the meet is just intersection, but it generalizes easily to Boolean algebras (and there are some pretty pictures to develop your intuitions a bit).

Okay, so that’s what ultrafilters are. What is it to be a generic ultrafilter? A generic ultrafilter is one that intersects every dense subset of B. What’s a dense subset of B? A dense subset of a Boolean algebra (B, ≤) is any subset of B\{⊥} that lower-bounds all of B\{⊥}. In other words, D is a dense subset of B if for every element of B\{⊥} there’s a lower element of D. Intuitively, D is like a foundation that the rest of B\{⊥} rests upon (where “upwards” corresponds to “greater than”). No matter where you are in B\{⊥}, you can follow the order downwards and eventually find yourself at an element of D. And a generic ultrafilter is an ultrafilter that shares something in common with each of these foundational subsets.

Why have we excluded the bottom element of B, ⊥? If we hadn’t done so then the dense subsets would just be any and all sets containing ⊥! Then any generic ultrafilter would have to contain ⊥, but this is inconsistent with the definition of an ultrafilter!
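Both definitions can be checked by brute force on a small finite Boolean algebra. The Python sketch below uses the power set of a three-element set, which is my choice of example; in a finite algebra like this every ultrafilter is principal and the atoms form a dense set, so it shows the definitions at work but none of the subtlety that forcing relies on.

```python
from itertools import combinations

ATOMS = (0, 1, 2)
BOTTOM = frozenset()
TOP = frozenset(ATOMS)
# B is the power set of ATOMS; ≤ is ⊆, ∧ is ∩, and ¬b is TOP - b.
B = [frozenset(c) for r in range(len(ATOMS) + 1) for c in combinations(ATOMS, r)]

def is_dense(D) -> bool:
    """D ⊆ B\\{⊥}, and every non-⊥ element of B has an element of D below it."""
    D = set(D)
    return BOTTOM not in D and all(any(d <= b for d in D) for b in B if b != BOTTOM)

def is_ultrafilter(G) -> bool:
    """Nonempty, proper, closed upwards and under ∧, and decides every b."""
    G = set(G)
    nonempty_proper = bool(G) and BOTTOM not in G
    upward = all(c in G for b in G for c in B if b <= c)
    meets = all((a & b) in G for a in G for b in G)
    decides = all((b in G) != ((TOP - b) in G) for b in B)
    return nonempty_proper and upward and meets and decides

atoms = [frozenset({a}) for a in ATOMS]
G = [b for b in B if 0 in b]  # the principal ultrafilter generated by the atom {0}

print(is_dense(atoms), is_dense([TOP]))
print(is_ultrafilter(G), is_ultrafilter([TOP]))
print(any(d in G for d in atoms))  # G meets the dense set of atoms
```

Here the atoms are dense but {⊤} is not, and the set of all elements containing 0 passes every ultrafilter condition while {⊤} alone fails to decide {0}.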

Finally, what’s an M-generic ultrafilter? Recalling that a generic ultrafilter has to intersect every dense subset of B, an M-generic ultrafilter only needs to intersect every dense subset of B that’s in M.

I hear a possible objection! “But you already told us that B is in M! And since ZFC has the power set axiom, don’t we know that M must also contain all of B’s subsets? If so then M-genericity is no different from genericity!” The subtlety here is that the power set axiom guarantees us that M contains a set 𝒫(B) that contains all the subsets of B that are in M. Nowhere in ZFC is there a guarantee that every subset of B exists, and in fact such a guarantee is not possible in any first-order theory whatsoever! (Can you see how this follows from the downward Löwenheim-Skolem theorem?) The power set axiom doesn’t create any subsets of B, it merely collects together all the subsets of B that already exist in M.

Now, why do we care about G being M-generic? A few steps down the line we’ll see why M-genericity is such an important property, but in all honesty my intuition is murky here as to how to motivate it a priori. For now, take it on faith that M-genericity ends up being exactly the right property to require of G for our purposes. We’ll end up making extensive use of the fact that G intersects every dense subset of B in M.

Very keen readers might be wondering: how do we know that an M-generic ultrafilter even exists? This is where the countability of M saves us: since M is countable, it only contains countably many dense subsets of B: (D₀, D₁, D₂, …). We construct G as follows: first we choose any d₀ ∈ D₀. Then for any n ∈ ℕ, since D_n+1 is dense in B, we can find a d_n+1 ∈ D_n+1 such that d_n+1 ≤ d_n. Now we have an infinite descending chain of elements from dense subsets. Define G to be the upwards closure of {d_n | n ∈ ℕ}, i.e. G = {b ∈ B | b ≥ d_n for some n ∈ ℕ}. This is M-generic by construction, and it’s an ultrafilter: for any b ∈ B, the set {b’ ∈ B\{⊥} | b’ ≤ b or b’ ≤ ¬b} is a dense subset of B in M, so some d_n lies below b or below ¬b, implying that G contains either b or ¬b. (Convince yourself that it’s also closed upwards and under ∧.)
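The chain construction is easy to imitate in code. The Python sketch below works on a poset of conditions rather than a full Boolean algebra: conditions are finite binary strings, ordered by extension, and each dense set is handed over as a membership test plus a function that extends any condition into the set. The two families of dense sets and all the names here are illustrative assumptions, and only finitely many steps of the chain are ever computed.

```python
from itertools import count, islice

def leq(s: str, t: str) -> bool:
    """s ≤ t iff s extends t, i.e. t is a prefix of s."""
    return s.startswith(t)

def length_set(k: int):
    """D_k: strings of length > k, with a witness that D_k is dense."""
    member = lambda s: len(s) > k
    extend = lambda s: s + "0" * max(0, k + 1 - len(s))
    return member, extend

def late_one_set(k: int):
    """E_k: strings with a 1 at some position >= k, with a density witness."""
    member = lambda s: "1" in s[k:]
    extend = lambda s: s if "1" in s[k:] else s + "0" * max(0, k - len(s)) + "1"
    return member, extend

def dense_sets():
    """Enumerate the countably many dense sets D_0, E_0, D_1, E_1, ..."""
    for k in count():
        yield length_set(k)
        yield late_one_set(k)

def descending_chain(steps: int) -> list:
    """Walk down from the top condition, landing in each dense set in turn."""
    d, chain = "", []
    for member, extend in islice(dense_sets(), steps):
        d = extend(d)          # d_{n+1} ≤ d_n, and d_{n+1} is in the next dense set
        assert member(d)
        chain.append(d)
    return chain

def in_upward_closure(b: str, chain: list) -> bool:
    """b belongs to the upward closure of the chain iff some d_n ≤ b."""
    return any(leq(d, b) for d in chain)

chain = descending_chain(6)
print(chain)
print(in_upward_closure("01", chain), in_upward_closure("1", chain))
```

Meeting every E_k forces the limiting infinite string to contain infinitely many 1s, which is a small taste of how dense sets control the properties of the generic object.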

Really important note: this proof of G’s existence relies on the countability of M. But from M’s perspective, it isn’t countable; i.e. M doesn’t believe that there’s a bijection that puts each set in correspondence with an element of ω. So M can’t prove the existence of G, and in fact G is a set that doesn’t exist within M at all.

Step 4
4. For every n in M^B, define val_G(n) := {val_G(m) | (m, b) ∈ n for some b in G}.

Ok, next we use G to define a “valuation function” val_G(•). This function takes the B-sets created in Step 2 and “collapses” them into ordinary sets. It’s defined inductively: the G-valuation of a B-set n is defined in terms of the G-valuations of the elements of n. (As a side note, the fact that this definition works relies on the transitivity of our starting model! Can you see why?)

Let’s work out some simple examples. Start with n = ∅. Then val_G(∅) = {val_G(m) | (m, b) ∈ ∅ for some b in G} = ∅, because there is no (m, b) in ∅. Thus G evaluates ∅ as ∅. So far so good! Now for a slightly trickier one: n = {(∅, b)}, where b is some element of B.

val_G( {(∅,b)} ) = {val_G(m) | (m, b) ∈ {(∅, b)} for some b in G} = {val_G(m) | m = ∅ & b ∈ G}

There are two cases: either b is in G or b is not in G. In the first case, {(∅, b)} is evaluated to be {∅}. In the second, {(∅,b)} is evaluated to be {}. What’s the intuition here? G is a subset of B, and we can think of it as a criterion for determining which Boolean values will be collapsed to True. And since G is an ultrafilter, every Boolean value will either be collapsed to True (if b ∈ G) or to False (if ¬b ∈ G). Recall that we thought of {(∅, b)} as a B-set that “contained ∅ to degree b”. Our G-evaluation of this set removed all fuzziness: if b was included in G then we evaluate {(∅, b)} as actually containing ∅. And if not, then we said that it didn’t. Either way, all fuzziness has been removed and we’ve ended up with an ordinary set!

The same happens for every B-set. When we feed it into the function val_G(•), we collapse all its fuzziness and end up with an ordinary set, whose members are determined by our choice of G.
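To see the collapse in action, here’s a small Python sketch. It assumes a toy setup that is not part of the construction above: B is the four-element Boolean algebra of subsets of {0, 1}, G is the ultrafilter of elements containing 0, and a B-set is encoded as a frozenset of (B-set, Boolean value) pairs. In a finite algebra like this G is principal and nothing new gets adjoined, so this only illustrates the mechanics of val_G.

```python
from itertools import combinations

def powerset(atoms):
    items = list(atoms)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

# B: subsets of {0, 1}, ordered by inclusion (a toy choice).
B = powerset({0, 1})

# G: the ultrafilter of all Boolean values containing the atom 0.
G = {b for b in B if 0 in b}

def val(name, G):
    """Collapse a B-set: keep val(m) for each pair (m, b) whose b lies in G."""
    return frozenset(val(m, G) for (m, b) in name if b in G)

EMPTY = frozenset()
b_in_G = frozenset({0})        # a Boolean value that G collapses to True
b_not_in_G = frozenset({1})    # a Boolean value that G collapses to False

kept = val(frozenset({(EMPTY, b_in_G)}), G)         # contains ∅ "to degree b", b ∈ G
dropped = val(frozenset({(EMPTY, b_not_in_G)}), G)  # contains ∅ "to degree b", b ∉ G
```

Here `kept` comes out as {∅} and `dropped` comes out as ∅, matching the two cases worked out by hand above.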

Step 5
5. Define M[G] := {val_G(n) | n ∈ M^B}. M[G] is a countable transitive model of ZFC such that M ⊆ M[G] and G ∈ M[G].

Now we simply evaluate all the B-sets and collect them together, and give the result a name: M[G]. This part is easy: the hard work has already been done in defining M^B and val_G. What remains is to show that this new set is a countable transitive model of ZFC containing all of M as well as G. Let’s do this now.

First of all, why is M[G] countable? Well, M is countable and M^B is a subset of M, so M^B is countable. And val_G is a surjective function from M^B onto M[G], so |M[G]| ≤ |M^B|.

Second, why is M[G] transitive? This follows immediately from its definition: the elements of M[G] are G-valuations, and elements of G-valuations are themselves G-valuations. So elements of elements of M[G] are themselves elements of M[G]. That’s transitivity!

Third, why is M[G] a model of ZFC? Er, we’ll come back to this one.

Fourth, why is M ⊆ M[G]? To show this, we’ll take any arbitrary set x in M and show that it exists in M[G]. To do this, we first define a canonical name for x: a B-set N_x that acts as the surrogate of x in M^B. Define N_x to be {(N_y, ⊤) | y ∈ x}. (Make sure that N_x is actually in M^B!) What’s the G-valuation of N_x? It’s just x! Here’s a proof by ∈-induction:

First, N_∅ = {(N_y,⊤) | y ∈ ∅} = ∅.
So val_G(N_∅) = val_G(∅) = {val_G(m) | (m, b) ∈ ∅ for some b in G} = ∅.

Now, assume that val_G(N_y) = y for every y ∈ x.
Then val_G(N_x) = {val_G(N_y) | (N_y, b) ∈ N_x for some b in G} = {y | (N_y, ⊤) ∈ N_x} = {y | y ∈ x} = x.

Thus by ∈-induction, for all x, val_G(N_x) = x.

Finally, why is G ∈ M[G]? We prove this by finding a B-set that G-valuates to G itself. Define A := {(N_b, b) | b ∈ B}. Then val_G(A) = { val_G(m) | (m, b) ∈ A for some b in G } = {val_G(N_b) | b ∈ G} = {b | b ∈ G} = G.
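Both of these facts can be checked by brute force in a toy setting. The Python sketch below is an illustration under my own assumptions: B is the algebra of subsets of {∅, {∅}} (so that every Boolean value is itself a hereditarily finite set and has a canonical name), G is the ultrafilter of elements containing ∅, and B-sets are encoded as frozensets of pairs.

```python
from itertools import combinations

ZERO = frozenset()             # the set ∅
ONE = frozenset({ZERO})        # the set {∅}

# B: all subsets of {∅, {∅}}, ordered by inclusion (a toy choice).
B = [frozenset(c) for r in range(3) for c in combinations([ZERO, ONE], r)]
TOP = frozenset({ZERO, ONE})

# G: the ultrafilter of Boolean values that contain ∅.
G = frozenset(b for b in B if ZERO in b)

def val(name, G):
    """val_G(n) = {val_G(m) | (m, b) ∈ n for some b in G}."""
    return frozenset(val(m, G) for (m, b) in name if b in G)

def canonical_name(x):
    """N_x = {(N_y, ⊤) | y ∈ x}, defined by recursion on x."""
    return frozenset((canonical_name(y), TOP) for y in x)

# An arbitrary hereditarily finite set to test with.
x = frozenset({ZERO, ONE, frozenset({ONE})})

# The name A = {(N_b, b) | b ∈ B} from the text.
A = frozenset((canonical_name(b), b) for b in B)
```

With these definitions, `val(canonical_name(x), G)` gives back `x` and `val(A, G)` gives back `G`.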

And there it is! We’re done!

Ha ha, tried to pull a fast one on you. We still have one crucial step remaining: proving that M[G] is actually a model of ZFC! As far as I know, there’s no concise way to do this… you have to actually go through each axiom of ZFC and verify that M[G] satisfies it. I’m not going to do that here, but I will provide proofs of four axioms to give you a flavor.

Infinity: M contains a set I satisfying the axiom of infinity and M ⊆ M[G]. So I ∈ M[G], and it still satisfies infinity (still contains ∅ and is closed under successor).

Pairing: Let x and y be any two sets in M[G]. Then for some n₁ and n₂ in M^B, x = val_G(n₁) and y = val_G(n₂). Now define n := {(n₁, ⊤), (n₂, ⊤)} ∈ M^B. Then val_G(n) = {x, y}.

Union: Let x be any element of M[G]. Since M[G] is transitive, every element y of an element of x is in M[G]. So for each such y there’s a B-set n_y such that val_G(n_y) = y. Define n := {(n_y,⊤) | y ∈ z for some z ∈ x}. Then val_G(n) = {val_G(n_y) | y ∈ z for some z ∈ x} = {y | y ∈ z for some z ∈ x} = ⋃x.

Comprehension: Let φ(y) be any first-order formula in the language of ZFC, and x ∈ M[G]. For every element y of x, let n_y be any B-set that G-valuates to y. Define n := {(n_y,⊤) | y ∈ x & φ(y)}. Then val_G(n) = {y ∈ x | φ(y)}.

And I leave you to fill in the rest of the axioms of ZFC for yourself. Much of it looks very similar to the proofs of pairing, union, and comprehension: make the natural choice for a B-set which G-valuates to the particular set whose existence you’re trying to prove. And THEN you’re done!

✯✯✯

Okay! If you’ve made it this far, take a breather and congratulate yourself. You now understand how to adjoin a new set to an existing model of ZFC (so long as this new set is an M-generic ultrafilter in a Boolean algebra B ∈ M). And this process works equally well no matter what Boolean algebra you pick!

In fact, the choice of the Boolean algebra in step 1 is the key degree of freedom we have in this whole process. B isn’t the set that we end up adjoining to M, and in fact B is always chosen to be an element of our starting model. But the fact that G is always chosen to be an M-generic filter in B means that the structure of B greatly influences what properties G has.

You might notice that we also technically have another degree of freedom in this process: namely step 3 where I said “choose an M-generic filter G in B”. While this is technically true, in practice much of forcing is about finding really clever choices of B so that any M-generic filter in B has whatever interesting property we’re looking for.

There’s another subtlety about the choice of B, namely that in applications of forcing one generally starts with a partially ordered set (P, ≤) and then extends it to a Boolean algebra B. A really cool result with an even cooler topological proof is that this extension is always possible. For any choice of P one can construct a complete Boolean algebra B such that P is order-isomorphic to a dense subset of B. (In a sentence, we turn P into a topological space whose open sets are the downwards-closed subsets and then let B be the set of regular open sets ordered by inclusion. More on this below.)

What’s great about this is that since P is dense in B, all dense subsets of P are also dense subsets of B. This means that we can analyze properties of our M-generic ultrafilter G in B solely by looking at the dense subsets of P! This means that for many purposes you can simply ignore the Boolean algebra B and let it do its work in the background, while focusing on the partially ordered set (P, ≤) you picked out. The advantage of this should be obvious: there are many more partially ordered sets than there are Boolean algebras, so we have much more freedom to creatively choose P.

Let’s summarize everything by going over the outline we started with and filling in a few details:

Forcing in slightly more than 5 lines

Fix a countable transitive model M ⊨ ZFC and a partial order (P, ≤) ∈ M.
Extend P to a Boolean algebra (B, ≤) ∈ M.
1. Define T := {A ⊆ P | A is closed downwards}.
2. Define B := {S ∈ T | Int(Cl(S)) = S}.
3. (B, ⊆) is a Boolean algebra: ¬A = Int(A^c), A∧B = A⋂B, A∨B = Int(Cl(A∪B)).
4. P is order-isomorphic to a dense subset of B
  1. For p ∈ P, define S(p) := {q ∈ P | q ≤ p} ∈ B.
  2. S is an order-isomorphism from P onto {S(p) | p ∈ P}, and {S(p) | p ∈ P} is dense in B.
Define M^B := { f ∈ M | f: M^B ⇀ B }.
Choose an M-generic ultrafilter G ⊆ B.
1. Since M is countable, we can enumerate the dense subsets of B in M: (D₀, D₁, D₂, …)
2. Choose any d₀ ∈ D₀.
3. For any n ∈ ℕ, since D_n+1 is dense in B, we can find a d_n+1 ∈ D_n+1 s.t. d_n+1 ≤ d_n.
4. G := { b ∈ B | b ≥ d_n for some n ∈ ℕ } is an M-generic ultrafilter in B.
For every n in M^B, define val_G(n) := { val_G(m) | (m, b) ∈ n for some b in G }.
Define M[G] := { val_G(n) | n ∈ M^B }.
1. M[G] is a countable transitive model of ZFC.
2. M ⊆ M[G]
  1. For any x ∈ M, define N_x := {(N_y,⊤) | y ∈ x} ∈ M^B.
  2. Then val_G(N_x) = x, so x ∈ M[G].
3. G ∈ M[G]
  1. Define A := {(N_b, b) | b ∈ B} ∈ M^B.
  2. val_G(A) = G, so G ∈ M[G].
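The step that extends P to B can be checked by brute force on a small example. The Python sketch below is only an illustration: the poset (binary strings of length at most 2, with q ≤ p when q extends p) is my own choice, and `down`, `closure`, and `interior` are hypothetical helper names for S(p), Cl, and Int in the topology whose open sets are the downward-closed subsets.

```python
from itertools import combinations

# P: binary strings of length <= 2; q <= p when q extends p (a toy choice).
P = ["", "0", "1", "00", "01", "10", "11"]

def leq(q, p):
    return q.startswith(p)

def down(p):
    """S(p) = {q in P | q <= p}."""
    return frozenset(q for q in P if leq(q, p))

def is_open(A):
    """Open sets are the downward-closed subsets of P."""
    return all(down(p) <= A for p in A)

def closure(A):
    """Closed sets are upward closed, so Cl(A) is the upward closure of A."""
    return frozenset(p for p in P if any(leq(a, p) for a in A))

def interior(A):
    """Int(A) is the largest downward-closed subset of A."""
    return frozenset(p for p in A if down(p) <= A)

subsets = [frozenset(c) for r in range(len(P) + 1) for c in combinations(P, r)]
T = [A for A in subsets if is_open(A)]
B = [S for S in T if interior(closure(S)) == S]   # the regular open sets
```

For this P, B has 16 elements: a regular open set is determined by which of the four strings of length 2 it contains. Each S(p) lands in B, and q ≤ p holds exactly when S(q) ⊆ S(p).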

We are now ready to apply forcing to prove the independence of CH from ZFC! In Part 2 you will learn exactly what choice of P makes the cardinality of the continuum ≥ ℵ₂ (thus making CH false) and what choice of P makes the cardinality of the continuum exactly ℵ₁ (thus making CH true). In fact, I’ll do you one better: in Part 2 you’ll learn what to make P in order to make the cardinality of the continuum larger than ℵ₆₉, ℵ₄₂₀, or virtually any other cardinality you like!

End of Part 1. Part 2 here.

The Ultra Series: Guide

August 22, 2021 / April 16, 2026

Assumed background knowledge: basic set theory lingo (∅, singleton, subset, power set, cardinality), what is first order logic (structures, universes, and interpretations), what are ℕ and ℝ, what’s the difference between countable and uncountable infinities, and what “continuum many” means.

1 Introduction
Here I give a high-level description of what an ultraproduct is, and provide a few examples. Skippable if you want to jump straight to the math!

2 Hypernaturals Simplified
Here you get a first glimpse of the hypernaturals. It’s a fuzzy glimpse from afar, and our first attempt to define them is overly simplified and imperfect. Nonetheless, we get some good intuitions for how hypernatural numbers are structured, before eventually confronting the problem at the core of the definition.

3 Hypernaturals in all their glory
We draw some pretty pictures and introduce the concept of an ultrafilter. The concept is put to work immediately, allowing us to give a full definition of the hypernaturals with no simplifications. The issues with the previous definition have now been patched, and the hypernaturals are a well-defined structure ripe to be explored.

4 Ultraproducts and Łoś’s theorem
We describe how to pronounce “Łoś”, define what an ultraproduct is, and see how the hypernaturals are actually just the ultraproduct of the naturals. And then we prove Łoś’s theorem!

5 Infinitely Large Primes
With the newfound power of Łoś’s theorem in our hands, we return to the realm of the hypernaturals and start exploring its structure. We describe some infinitely large prime numbers, and prove that there are infinitely many of them. We find more strange infinitely large hypernatural numbers in our exploration: numbers that can be divided by 2 ad infinitum, numbers that are divisible by every finite number, and more. We learn that there’s a subset of the hypernaturals that is arranged just like the positive rational numbers, but that the hypernaturals are not dense.

6 Ultraproducts and Compactness
We zoom out from the hypernaturals, and show that ultraproducts can be used to give the prettiest proof of the compactness theorem for first order logic. We prove it first for countable theories, and then for all theories. We then get a little wild and discuss some meta-logical results involving ultraproducts, definability, and compactness.

7 All About Countable Saturation
We now describe the most powerful property of ultraproducts: countable saturation. And then we prove it! With our new tool, we dive back into the hypernaturals to learn more about their structure. We show that for any countable set of hypernaturals, there’s a hypernatural that’s divisible by them all, and see that this entails the existence of uncountably many hypernatural primes. We prove that the hypernaturals have uncountable cofinality and coinitiality. And from this we see that no two hypernaturals are countably infinitely far apart; all distances are finite or uncountable! We wrap up with a quick proof that ultraproducts are always either finite or uncountable, and a mind-blowing result that relates ultraproducts to the continuum hypothesis.

7.5 Shorter Proof of Countable Saturation
I give a significantly shorter and conceptually simpler proof of countable saturation than the previous post. Then I wax philosophical for a few minutes about constructivism in the context of ultraproduct-related proofs.

Ultraproducts and Łoś’s Theorem (Ultra Series 4)

July 14, 2021 / April 16, 2026

Previous: Hypernaturals in all their glory

First things first, you’re probably asking yourself… how is Łoś pronounced?? I’m not the most knowledgeable when it comes to Polish pronunciation, but from what I’ve seen the Ł is like a “w”, the o is like the vowel in “thought”, and the ś is like “sh”. So it’s something like “wash”. (I think.)

Ok, on to the math! This is probably going to be the hardest post in this series, so I encourage you to read through it slowly and not give up if you start getting lost. Try to work out some examples for yourself; that’s often the best way to get a grasp on an abstract concept. (In general, my biggest tip for somebody starting to dive into serious mathematical content is to not read it like fiction! In fiction, short sentences can safely be read quickly. But math is a language in which complex ideas can be expressed very compactly. So fight your natural urge to rush through short technical sentences, and don’t feel bad about taking your time in parsing them!)

What the heck is an ultraproduct

Last post we defined an ultrafilter. Now let’s define an ultraproduct.

It turns out we’ve already seen an example of an ultraproduct: the hypernaturals. The hyperreals (denoted *ℝ) are another famous example of an ultraproduct, constructed from ℝ in exactly the same way as we obtained the hypernaturals *ℕ from ℕ. For any structure M whatsoever, there exists a “hyperstructure” *M obtained from M via an ultraproduct. So what exactly is an ultraproduct?

We’ll build up to it in a series of six steps.

(1) Choose an index set I.
(2) Choose a first-order language L and a family of L-structures indexed by I (M_i)_i∈I.
(3) Define sequences of elements of these structures.
(4) Define an equivalence relation between these sequences.
(5) Construct the set of equivalence classes under this equivalence relation.
(6) Define the interpretations of the symbols of L in this set.

Let’s get started!

(1) First we select an index set I. This will be the set of indices we use when constructing our sequences of elements. If we want our sequences to be countably infinite, then we choose I = ℕ.

(2) Now, fix some first-order language L = <constants, relations, functions>. Consider any family of L-structures, indexed by I: (M_i)_i∈I. For instance, if I = ℕ and L is the language of group theory <{e}, ∅, {⋅}>, then our family of L-structures might be (ℤ₁, ℤ₂, ℤ₃, ℤ₄, …), where ℤ_k is the integers-mod-k (i.e. the cyclic group of size k).

(3) Now we consider sequences of the elements of these structures. For instance, hyperreals are built from sequences of real numbers that look like (a₀, a₁, a₂, a₃, …). The set of indices used here is {0, 1, 2, 3, …}, i.e. ℕ.

If our index set isn’t countable, then it’s not possible anymore to visualize sequences like this. For instance, if I = ℝ, then our sequence will have an element for every real number. A more general way to formulate sequences is as maps from the index set I to the elements of the component models. The hyperreal sequence (a₀, a₁, a₂, a₃, …) can be thought of as a map a: ℕ → ℝ, where a(n) = a_n for each n ∈ ℕ.

In general, a sequence will be defined as follows:

f: I → ⋃(M_i)_i∈I such that f(i) ∈ M_i for each i ∈ I.

In the ℤ_k example, a sequence might look like (0, 1, 2, 3, 4, …). Note that 0 ∈ ℤ₁, 1 ∈ ℤ₂, 2 ∈ ℤ₃, and so on. On the other hand (2, 1, 2, 3, 4, …) would not be a valid sequence, because there’s no 2 in ℤ₁.

Now, consider the set of all sequences: { f: I → ⋃(M_i)_i∈I | f(i) ∈ M_i for each i ∈ I }. This set is called the direct product of (M_i)_i∈I and is denoted Π(M_i)_i∈I.

(4) We want to construct an equivalence relation on this set. We do so by first defining a free ultrafilter U on I. From the previous post, we know that every infinite set has a free ultrafilter on it, so as long as our index set is infinite, then we’re good to go.

With U in hand, we define the equivalence relation on Π(M_i)_i∈I:

Let f and g be sequences (f,g: I → ⋃(M_i)_i∈I).
Then f ~ g if and only if { i ∈ I | f(i) = g(i) } ∈ U

You might be getting major deja-vu from the last post. Two sequences are said to be equivalent if the set of places-of-agreement is a member of the chosen ultrafilter. Since every free ultrafilter contains all cofinite sets, any two sequences that agree in all but finitely many places will be equivalent.

(5) Now all the pieces are in place. The ultraproduct of (M_i)_i∈I with respect to U is the set of equivalence classes of Π(M_i)_i∈I with respect to ~. This is typically written Π(M_i)_i∈I/U. For a sequence a: I → ⋃(M_i)_i∈I we’ll denote the equivalence class it belongs to as [a].

Now, it’s not necessary for our indexed sequence (M_i)_i∈I of L-structures to all be distinct. In fact, all the models can be the same, in which case we have (M_i)_i∈I = (M)_i∈I, and the ultraproduct Π(M)_i∈I/U is called an ultrapower. The ultrapower of M with index set I and ultrafilter U can be written compactly as M^I/U.

Some examples: The hypernaturals are the ultrapower ℕ^ℕ/U = Π(ℕ)_i∈ℕ/U where U is any free ultrafilter over ℕ. Similarly, the hyperreals are ℝ^ℕ/U = Π(ℝ)_i∈ℕ/U. The hyperintegers are ℤ^ℕ/U. And so on.

(6) So far we’ve just defined the ultraproduct as a set (the set of equivalence classes of I-indexed sequences of elements from the models (M_i)_i∈I). But we want the ultraproduct to have all the same structure as the models that we used as input. In other words, the ultraproduct of a bunch of L-structures will itself be an L-structure. To make this happen, we need to specify how the relation symbols and function symbols of L work in the ultraproduct model.

Here’s how it works. If f is a unary function symbol in the language L, then we define f on Π(M_i)_i∈I/U by applying the function elementwise to the sequences. So:

f: Π(M_i)_i∈I/U → Π(M_i)_i∈I/U is defined as f([a]) = [b], where b(i) = f(a(i)) for every i ∈ I (with f on the right interpreted in M_i).

What if f is a binary function symbol? Then:

f: (Π(M_i)_i∈I/U)² → Π(M_i)_i∈I/U is defined as f([a], [b]) = [c], where c(i) = f(a(i), b(i)) for every i ∈ I.

This generalizes in the obvious way to ternary function symbols, quaternary function symbols, and so on.

What about relations? Suppose R is a unary relation symbol in the language L. We need to define R on the ultraproduct Π(M_i)_i∈I/U, and we do it as follows:

Π(M_i)_i∈I/U ⊨ R([a]) if and only if { i ∈ I | M_i ⊨ R(a(i)) } ∈ U.

In other words, Π(M_i)_i∈I/U affirms R([a]) if and only if the set of indices i such that M_i affirms R(a(i)) is in the ultrafilter. For example, if R holds for cofinitely many members of the sequence a, then R holds of [a].

If R is binary, we define it as follows:

Π(M_i)_i∈I/U ⊨ R([a], [b]) if and only if { i ∈ I | M_i ⊨ R(a(i), b(i)) } ∈ U.

And again, this generalizes in the obvious way.

This fully defines the ultraproduct Π(M_i)_i∈I/U as an L-structure! (If you’re thinking ‘what about constant symbols?’, remember that constants are just 0-ary functions)
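A free ultrafilter can’t be written down explicitly, so the following Python sketch swaps in a principal ultrafilter on a finite index set, purely to make the six steps concrete. That substitution is my own assumption, as are all the names (`moduli`, `J`, `in_U`, `equivalent`, `add`). With a principal ultrafilter the ultraproduct collapses to a copy of the single structure M_J, which is exactly why free ultrafilters are the interesting case.

```python
from itertools import product

I = range(4)                         # (1) the index set
moduli = [i + 1 for i in I]          # (2) M_i is Z_(i+1), the integers mod i+1
J = 2                                # U = {A ⊆ I | J ∈ A}, a principal ultrafilter

def in_U(A):
    return J in A

def equivalent(f, g):
    """(4) f ~ g iff the set of places where they agree is in U."""
    return in_U({i for i in I if f[i] == g[i]})

def add(f, g):
    """(6) Apply the group operation of each M_i coordinatewise."""
    return tuple((f[i] + g[i]) % moduli[i] for i in I)

# (3) All sequences f with f(i) in M_i: the direct product.
sequences = list(product(*(range(m) for m in moduli)))

# (5) Group the sequences into equivalence classes.
classes = []
for f in sequences:
    for cls in classes:
        if equivalent(f, cls[0]):
            cls.append(f)
            break
    else:
        classes.append([f])
```

There are 24 sequences but only 3 equivalence classes, one for each element of ℤ₃. Checking that `add` gives equivalent outputs on equivalent inputs confirms that the coordinatewise definition doesn’t depend on which representatives we pick.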

Say that again, slower

That was really abstract, so let’s go through it again with the (hopefully now-familiar) example of the hypernaturals.

We start by defining an index set I. We choose I = ℕ.

Now define the language we’ll use. This will be the standard language of Peano arithmetic: one constant symbol (0), one relation symbol (<), and three function symbols (S, +, ×).

The family of structures in this language that we’ll consider (M_i)_i∈I will just be a single structure repeated: for each i, M_i will be the L-structure ℕ (the natural numbers with 0, <, S, +, and × defined on it). So our family of structures is just (ℕ)_i∈ℕ = (ℕ, ℕ, ℕ, ℕ, …).

Our sequences are functions from I to ⋃(M_i)_i∈I such that for each i∈I, f(i) ∈ M_i. For the hypernaturals, I and ⋃(ℕ)_i∈ℕ are both just ℕ, so our sequences are functions from ℕ to ℕ. We can represent the function f: ℕ → ℕ in the familiar way: (f(0), f(1), f(2), f(3), …).

The set of all sequences is the set of all functions from ℕ to ℕ. This is the direct product Π(ℕ)_i∈ℕ = ℕ^ℕ.

Now, we take any free ultrafilter on I = ℕ. Call it U. We use U to define the equivalence relation on the direct product ℕ^ℕ:

(a(0), a(1), a(2), …) ~ (b(0), b(1), b(2), …) if and only if { i ∈ ℕ | a(i) = b(i) } ∈ U

And taking equivalence classes of this relation, we’ve recovered our original definition of the hypernatural numbers! ℕ^ℕ/U = *ℕ. Now we finish up by defining all functions and relations on *ℕ.

Functions are defined pointwise:

0 = [0, 0, 0, 0, …]
S[a(0), a(1), a(2), …] = [Sa(0), Sa(1), Sa(2), …]
[a(0), a(1), a(2), …] + [b(0), b(1), b(2), …] = [a(0) + b(0), a(1) + b(1), a(2) + b(2), …]
[a(0), a(1), a(2), …] ⋅ [b(0), b(1), b(2), …] = [a(0) ⋅ b(0), a(1) ⋅ b(1), a(2) ⋅ b(2), …]

We just have one relation symbol <, and relations are defined according to the ultrafilter:

[a(0), a(1), a(2), …] < [b(0), b(1), b(2), …] iff { i ∈ ℕ | ℕ ⊨ (a(i) < b(i)) } ∈ U

And we’re done!

Łoś’s theorem

Now we’re ready to prove Łoś’s theorem in its full generality. First, let’s state the result:

Fix any index set I, any language L, and any family of L-structures (M_i)_i∈I. Choose a free ultrafilter U on I and construct the ultraproduct structure Π(M_i)_i∈I/U. Łoś’s theorem says:

For every L-sentence φ, Π(M_i)_i∈I/U ⊨ φ if and only if { i ∈ I | M_i ⊨ φ } ∈ U.

A special case of this is where our ultraproduct is an ultrapower of M, in which case it reduces to:

For every L-sentence φ, M^I/U ⊨ φ if and only if M ⊨ φ

In other words, any ultrapower of M is elementarily equivalent to M!

The proof is by induction on the set of all L-formulas.

Base case: φ is atomic

Atomic sentences are either of the form R(t₁, …, t_n) or (t₁ = t₂) for an n-ary relation symbol R and terms t₁, … t_n.

Suppose φ is R(t₁, …, t_n). This case is easy: it was literally the way we defined the interpretation of relation symbols in the ultraproduct model that Π(M_i)_i∈I/U ⊨ R(t₁, …, t_n) if and only if {i ∈ I | M_i ⊨ R(t₁, …, t_n)} ∈ U.

Suppose φ is (t₁ = t₂). t₁ and t₂ are terms, so the ultraproduct model (Π(M_i)_i∈I/U) interprets them as equivalence classes of I-sequences, i.e. of functions from I to ⋃(M_i)_i∈I such that t₁(i) and t₂(i) are both in M_i. We’ll write the denotations of t₁ and t₂ as [t₁] and [t₂]. Now, [t₁] = [t₂] iff { i ∈ I | t₁(i) = t₂(i) } ∈ U iff { i ∈ I | M_i ⊨ (t₁ = t₂) } ∈ U, which is what we want.

Inductive step: φ is ¬ψ, ψ∧θ, or ∃x ψ

Assume that Łoś’s theorem holds for ψ and θ. Now we must show that it holds for φ.

Suppose φ is ¬ψ.

Then Π(M_i)_i∈I/U ⊨ φ
iff Π(M_i)_i∈I/U ⊭ ψ
iff { i ∈ I | M_i ⊨ ψ } ∉ U (by the inductive hypothesis)
iff { i ∈ I | M_i ⊨ ψ }^c ∈ U (by the ultra property of U)
iff { i ∈ I | M_i ⊭ ψ } ∈ U
iff { i ∈ I | M_i ⊨ ¬ψ } ∈ U
iff { i ∈ I | M_i ⊨ φ } ∈ U

Suppose φ is ψ∧θ.

Then Π(M_i)_i∈I/U ⊨ φ
iff Π(M_i)_i∈I/U ⊨ ψ and Π(M_i)_i∈I/U ⊨ θ
iff { i ∈ I | M_i ⊨ ψ } ∈ U and { i ∈ I | M_i ⊨ θ } ∈ U (by the inductive hypothesis)
iff { i ∈ I | M_i ⊨ ψ } ⋂ { i ∈ I | M_i ⊨ θ } ∈ U (by closure-under-⋂ of U)
iff { i ∈ I | M_i ⊨ ψ and M_i ⊨ θ } ∈ U
iff { i ∈ I | M_i ⊨ ψ∧θ } ∈ U
iff { i ∈ I | M_i ⊨ φ } ∈ U

Suppose φ is ∃x ψ.

Then Π(M_i)_i∈I/U ⊨ φ
iff Π(M_i)_i∈I/U ⊨ ∃x ψ
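iff Π(M_i)_i∈I/U ⊨ ψ([a]) for some [a] ∈ Π(M_i)_i∈I/U
iff { i ∈ I | M_i ⊨ ψ(a(i)) } ∈ U for some sequence a (by the inductive hypothesis)
iff { i ∈ I | M_i ⊨ ∃x ψ(x) } ∈ U (left to right by closure under supersets; right to left by choosing a(i) to be a witness for ψ in each M_i that has one)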

And that completes the proof! We don’t need to consider ∨, →, ↔, or ∀, because these can all be defined in terms of ¬, ∧, and ∃.

Now we know that the first-order properties of ultraproducts are tied closely to those of their component structures. The ultraproduct of any collection of two-element structures is itself a two-element structure. Same with the ultraproduct of any collection of structures, cofinitely many of which are two-element structures!

The ultraproduct of any collection of PA models is itself a PA model. The ultraproduct of any collection of groups is itself a group. But the ultraproduct of all finite groups need not itself be finite, because “I am finite” isn’t first-order expressible.

And in particular, an ultrapower of a structure M perfectly mimics ALL of the first-order properties of M!

Łoś’s theorem is an incredibly powerful tool we can wield to illuminate the strange structure of the hypernatural numbers. We’re now positioned to discover nonstandard prime numbers, infinitely even numbers, numbers that are divisible by every standard natural number, and infinitely large prime gaps. All of this (and more) in the next post!

Next: Weird nonstandard numbers

Hypernaturals in all their glory (Ultra Series 3)

July 10, 2021 / April 16, 2026

Previous: Hypernaturals simplified

What is an ultrafilter? (with pretty pictures)

To define an ultrafilter we need to first define a filter. Here’s a pretty good initial intuition for what a filter is: a filter on a set X is a criterion for deciding which subsets of X are “large”. In other words, a filter provides us one way of conceptualizing the idea of large and small subsets, and it allows us to do so in a way that gives us more resolution than the cardinality approach (namely, assessing the size of sets just in terms of their cardinality). For example, in a countably infinite set X, the cofinite subsets of X (those that contain all but finitely many elements of X) have the same cardinality as the subsets of X that are infinite but not cofinite. But there’s some intuitive sense in which a set that contains all but finitely many things is larger than a set that leaves out infinitely many things. Filters allow us to capture this distinction.

Alright, so given a set X, a filter F on X is a collection of subsets of X (i.e. it’s a subset of 𝒫(X)) that satisfies the following four conditions:

(i) X ∈ F … “X is large”
(ii) ∅ ∉ F … “the empty set is not large”
(iii) If A ⊆ B and A ∈ F, then B ∈ F … “supersets of large sets are large”
(iv) If A ∈ F and B ∈ F, then A ⋂ B ∈ F … “intersections of large sets are large”

In other words, a filter on X is a set of subsets of X that contains X, doesn’t contain the empty set, and is closed under supersets and intersection. Note that a filter is also closed under union, because of (iii) (the union of A and B is a superset of A).

An ultrafilter is a filter with one more constraint, namely that for any subset of X, either that subset or its complement is in the filter.

(v) For any A ⊆ X, either A ∈ F or (X\A) ∈ F … “a set is either large, or if not, then its complement is large”

There’s a nice way to visualize filters and ultrafilters that uses the Hasse diagram of the power set of X. For a concrete example, let X = {a, b}. We can draw the power-set of X as follows:

We draw an arrow from A to B when A is a subset of B. Now, what are the possible filters on X? There are three, see if you can find them all before reading on.

Only two of these are ultrafilters. Which two?

Remember that for an ultrafilter U, every subset or its complement is in U. So an ultrafilter always contains half of all subsets. This gives an easy way to rule out the first one.

Another example: let X = {a, b, c}. Then the power-set of X looks like:

Note that we’ve left out some arrows, like the arrow from {a} to {a,b,c}. This is okay, because transitivity of the subset relation makes this arrow redundant. Anyway, what are some filters on X? Here are three of them:

Only one of these is an ultrafilter! You should be able to identify it pretty easily. See if you can pick out the other four filters, and identify which of them are ultrafilters (there should be two). And another exercise: why is the following not a filter?

Does it have any extension that’s an ultrafilter?

One thing to notice is that in all of these examples, when something is in the filter then everything it points to is also in the filter. This corresponds to filters being closed under supersets. Also, for any two things in the filter, their meet (their greatest lower bound; the highest set on the diagram that points to both of them) is also in the filter. This corresponds to closure under intersections.

Imagine that there is a stream flowing up the Hasse diagram through all the various paths represented by arrows. Choose any point on the diagram and imagine dripping green dye into the water at that point. The green color filters up through the diagram until it reaches the top. And everything that’s colored green is in the filter! This captures the idea that filters are closed under superset, but what about intersection? If X is finite, this corresponds to the dye all coming from a single source, rather than it being dripped in at multiple distinct points. The infinite case is a little trickier, as we’ll see shortly.

One other important thing to notice is that whenever we had an ultrafilter, it always contained a singleton. An ultrafilter that contains a singleton is called a principal ultrafilter, and an ultrafilter that doesn’t contain any singletons is called a free ultrafilter. So far we haven’t seen any free ultrafilters, and in fact as long as X is finite, any ultrafilter on X will be principal. (Prove this!) But the situation changes when X is an infinite set.
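For small finite sets, the claims above can be confirmed by brute force. Here’s a Python sketch that tests conditions (i) through (v) directly on every family of subsets; the function names are my own, and the approach only scales to tiny X since it enumerates all 2^(2^|X|) families.

```python
from itertools import combinations

def powerset(items):
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_filter(F, X):
    subsets = powerset(X)
    return (X in F                                                  # (i)
            and frozenset() not in F                                # (ii)
            and all(B in F for A in F for B in subsets if A <= B)   # (iii)
            and all((A & B) in F for A in F for B in F))            # (iv)

def is_ultrafilter(F, X):
    return is_filter(F, X) and all(
        A in F or (X - A) in F for A in powerset(X))                # (v)

def filters_on(X):
    """Every family of subsets of X that satisfies (i)-(iv)."""
    X = frozenset(X)
    return [F for F in powerset(powerset(X)) if is_filter(F, X)]

X3 = frozenset("abc")
filters_3 = filters_on(X3)
ultrafilters_3 = [F for F in filters_3 if is_ultrafilter(F, X3)]
```

On X = {a, b, c} this finds seven filters, three of which are ultrafilters, and every one of those ultrafilters contains a singleton.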

The Hasse diagram for an infinite set is a bit harder to visualize, since now we have uncountably many subsets. But let’s try anyway! What does the Hasse diagram of ℕ look like? Well, we know that ∅ is at the bottom and ℕ is at the top, so let’s start there.

Next we can draw all the singleton sets. ∅ points at all of these, so we’re not going to bother drawing each individual arrow.

Next we have all the pair sets, and then the triples. Each singleton points at infinitely many pairs, and each pair points at infinitely many triples.

And so on through all finite cardinalities.

Now what? We’ve only exhausted all the finite sets. We can now start from the top with the cofinite sets, those that are missing only finitely many things. First we have the sets that contain all but a single natural number:

Then the sets containing all but a pair of naturals, and so on through all the cofinite sets.

But we’re not done yet. We haven’t exhausted all of the subsets of ℕ; for instance the set of even numbers is neither finite nor cofinite. In fact, there are only countably many finite and cofinite sets, but there are uncountably many subsets of ℕ, so there must be a thick intermediate section of infinite sets that are not cofinite (i.e. infinite sets with infinite complements).

A sanity check that this diagram makes sense: start with a finite set and then add elements until you have a cofinite set. Between the finite set and the cofinite set there’s always an intermediate set that’s infinite but not cofinite. This matches with our image: any path from the finite to the cofinite passes through the middle section.

Now, what would a filter on the naturals look like on this diagram? If our filter is principal, then we can still roughly sketch it the same way as before:

How about an ultrafilter? Depends on whether it’s principal or free. Any principal ultrafilter must look like the third image above; it must start at the “finite” section and filter upwards (remember that principal means that it contains a singleton).

Any principal ultrafilter on ℕ can be written as { A ⊆ ℕ | n ∈ A } for some n ∈ ℕ.

What about free ultrafilters? A free ultrafilter contains no singletons. This implies that it contains no finite set. See if you can come up with a proof, and only then read on to see mine.

Suppose that U is a free ultrafilter on X and contains some finite set F. U is free, so it contains no singletons. So for every a ∈ F, the singleton {a} ∉ U. By ultra, X\{a} ∈ U. By closure-under-finite-intersection, the intersection of {X\{a} | a ∈ F} is in U. So X\F ∈ U. But now we have F ∈ U and X\F ∈ U, and their intersection is ∅. So ∅ ∈ U, contradicting filter.

So a free ultrafilter must contain no finite sets, meaning that it contains all the cofinite sets. Since it’s ultra, it also contains “half” of all the intermediate sets. So visually it’ll look something like:

That’s what a free ultrafilter on the naturals would look like if such a thing existed. But how do we know that any such object actually does exist? This is not so trivial, and in fact the proof of existence uses the axiom of choice. Here’s a short proof using Zorn’s Lemma (which is equivalent to choice in ZF).

Let F be any filter on X. Consider the set Ω of all filters on X that extend F. (Ω, ⊆) is a partially ordered set, and for any nonempty chain of filters C ⊆ Ω, the union of C is itself a filter on X. (Prove this!) The union of C is also an upper bound on C, meaning that every nonempty chain of filters has an upper bound. Now we apply Zorn’s Lemma to conclude that there’s a maximal filter U in Ω. Maximality of U means that U is not a proper subset of V for any V ∈ Ω.

Almost done! U is maximal, but is it an ultrafilter? Suppose not. Then there’s some A ⊆ X such that A ∉ U and (X\A) ∉ U. Every B ∈ U intersects A, since otherwise B ⊆ X\A and closure under supersets would put X\A in U. So we can extend U by adding in A and all supersets and intersections without ever adding ∅. This is a filter that extends F and properly contains U, contradicting maximality. So U is an ultrafilter on X!

Now, F was a totally arbitrary filter. So we’ve shown that every filter on X has an ultrafilter extension. Now let X be infinite and take the filter on X consisting of all cofinite subsets of X (this is called the Fréchet filter). Any ultrafilter extension of the Fréchet filter also contains all cofinite subsets of X, and thus contains no singletons. So it’s free! Thus any infinite set has a free ultrafilter.

Hypernatural numbers

Still with me? Good! Then you’re ready for the full definition of the hypernatural numbers, using ultrafilters. Take any free ultrafilter U on ℕ. U contains all cofinite sets and no finite sets, and is also decisive on all the intermediate sets. If you remember from the last post, this makes U a perfect fit for our desired “decisiveness criterion”.

Now consider the set of all countable sequences of natural numbers. Define the equivalence relation ~ on this set as follows:

(a₁, a₂, a₃, …) ~ (b₁, b₂, b₃, …) iff { k ∈ ℕ | a_k = b_k } ∈ U

Note the resemblance to our definition last post:

(a₁, a₂, a₃, …) ~ (b₁, b₂, b₃, …) iff { k ∈ ℕ | a_k = b_k } is cofinite

This previous definition corresponded to using the Fréchet filter for our criterion. But since it was not an ultrafilter, it didn’t suffice. Now, with an ultrafilter in hand, we get decisiveness!

Addition and multiplication on the hypernaturals are defined very easily:

[a₁, a₂, a₃, …] + [b₁, b₂, b₃, …] = [a₁+b₁, a₂+b₂, a₃+b₃, …]
[a₁, a₂, a₃, …] ⋅ [b₁, b₂, b₃, …] = [a₁⋅b₁, a₂⋅b₂, a₃⋅b₃, …]
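For these to be genuine operations on equivalence classes, the result shouldn't depend on which representatives we pick. The reason is that if a agrees with a′ on a set S ∈ U and b agrees with b′ on a set T ∈ U, then the sums (and products) agree at least on S ∩ T, which is again in U. A free ultrafilter can't be written down explicitly, so the sketch below only checks that containment on a finite window of indices; the window `N` and the sample sequences are made up for illustration.

```python
N = range(1, 101)  # finite window standing in for the index set

def agree(s, t):
    """Indices in the window where sequences s and t coincide."""
    return {k for k in N if s(k) == t(k)}

def add(s, t):
    """Componentwise sum of two sequences."""
    return lambda k: s(k) + t(k)

def mul(s, t):
    """Componentwise product of two sequences."""
    return lambda k: s(k) * t(k)

def a(k): return k
def a_alt(k): return k if k % 2 == 0 else 0          # agrees with a on even k
def b(k): return 2 * k
def b_alt(k): return 2 * k if k % 3 == 0 else 1      # agrees with b on multiples of 3

# Where both pairs of representatives agree simultaneously.
common = agree(a, a_alt) & agree(b, b_alt)

# Sums and products agree at least on that common set.
print(common <= agree(add(a, b), add(a_alt, b_alt)))
print(common <= agree(mul(a, b), mul(a_alt, b_alt)))
```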

Let’s now define < on the hypernaturals.

(a₁, a₂, a₃, …) < (b₁, b₂, b₃, …) iff { k ∈ ℕ | a_k < b_k } ∈ U

The proof of transitivity in the previous post still works here. Now let’s prove that < is a total order.

Consider the following three sets:

X = { k ∈ ℕ | a_k < b_k }
Y = { k ∈ ℕ | a_k > b_k }
Z = { k ∈ ℕ | a_k = b_k }

The intersection of any pair of these sets is empty, meaning that at most one of them is in U. Could none of them be in U? Suppose X, Y, and Z are not in U. Then ℕ\X and ℕ\Y are in U. So (ℕ\X) ∩ (ℕ\Y) is in U as well. But (ℕ\X) ∩ (ℕ\Y) = Z! So Z is in U, contradicting our assumption.
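The set identities behind this argument are easy to check on a finite window of indices. In this sketch the two sequences and the window `N` are arbitrary choices: X, Y and Z are pairwise disjoint, and whatever lies outside both X and Y is exactly Z.

```python
def a(k):
    return k % 3          # sample sequence a_k

def b(k):
    return (k * k) % 5    # sample sequence b_k

N = set(range(1, 201))    # finite window standing in for ℕ

X = {k for k in N if a(k) < b(k)}
Y = {k for k in N if a(k) > b(k)}
Z = {k for k in N if a(k) == b(k)}

# Pairwise disjoint, so at most one of the three can be in U.
print(not (X & Y) and not (X & Z) and not (Y & Z))

# Outside X and outside Y means a_k = b_k.
print(((N - X) & (N - Y)) == Z)
```

Note that the union (ℕ\X) ∪ (ℕ\Y) would just be all of ℕ, since X and Y are disjoint; it's the intersection that pins down Z.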

So exactly one of these three sets is in U, meaning that a < b or b < a or a = b. This proves that using an ultrafilter really has fixed the problem we ran into previously. This problem was that the hypernaturals were quite different from the naturals in undesirable ways (like < not being a total order). The natural question to ask now is “Just how similar are the hypernaturals to the naturals?”

The answer is remarkable. It turns out that there are no first-order expressible differences between the naturals and the hypernaturals! Any first-order sentence that holds true of the natural numbers also holds true of the hypernatural numbers! This result is actually just one special case of an incredibly general result called Łoś’s theorem. And in the next post we are going to prove it!

Next up: Łoś’s theorem and ultraproducts!