The Ultra Series: Guide

Assumed background knowledge: basic set theory lingo (∅, singleton, subset, power set, cardinality), what is first order logic (structures, universes, and interpretations), what are ℕ and ℝ, what’s the difference between countable and uncountable infinities, and what “continuum many” means.

1 Introduction
Here I give a high-level description of what an ultraproduct is, and provide a few examples. Skippable if you want to jump straight to the math!

2 Hypernaturals Simplified
Here you get a first glimpse of the hypernaturals. It’s a fuzzy glimpse from afar, and our first attempt to define them is overly simplified and imperfect. Nonetheless, we get some good intuitions for how hypernatural numbers are structured, before eventually confronting the problem at the core of the definition.

3 Hypernaturals in all their glory
We draw some pretty pictures and introduce the concept of an ultrafilter. The concept is put to work immediately, allowing us to give a full definition of the hypernaturals with no simplifications. The issues with the previous definition have now been patched, and the hypernaturals are a well-defined structure ripe to be explored.

4 Ultraproducts and Łoś’s theorem
We describe how to pronounce “Łoś”, define what an ultraproduct is, and see how the hypernaturals are actually just the ultraproduct of the naturals. And then we prove Łoś’s theorem!

5 Infinitely Large Primes
With the newfound power of Łoś’s theorem at our hands, we return to the realm of the hypernaturals and start exploring its structure. We describe some infinitely large prime numbers, and prove that there are infinitely many of them. We find more strange infinitely large hypernatural numbers in our exploration: numbers that can be divided by 2 ad infinitum, numbers that are divisible by every finite number, and more. We learn that there’s a subset of the hypernaturals that is arranged just like the positive rational numbers, but that the hypernaturals are not dense.

6 Ultraproducts and Compactness
We zoom out from the hypernaturals, and show that ultraproducts can be used to give the prettiest proof of the compactness theorem for first order logic. We prove it first for countable theories, and then for all theories. We then get a little wild and discuss some meta-logical results involving ultraproducts, definability, and compactness.

7 All About Countable Saturation
We now describe the most powerful property of ultraproducts: countable saturation. And then we prove it! With our new tool, we dive back into the hypernaturals to learn more about their structure. We show that for any countable set of hypernaturals, there’s a hypernaturals that’s divisible by them all, and see that this entails the existence of uncountably many hypernatural primes. We prove that the hypernaturals have uncountable cofinality and coinitiality. And from this we see that no two hypernaturals are countably infinitely far apart; all distances are finite or uncountable! We wrap up with a quick proof that ultraproducts are always either finite or uncountable, and a mind-blowing result that relates ultraproducts to the continuum hypothesis.

7.5 Shorter Proof of Countable Saturation
I give a significantly shorter and conceptually simpler proof of countable saturation than the previous post. Then I wax philosophical for a few minutes about constructivism in the context of ultraproduct-related proofs.

Shorter Proof of Countable Saturation (Ultra 7.5)

Previous: All About Countable Saturation

After I had finished writing up and illustrating the proof of countable saturation in the last post, I came up with a significantly simpler proof. Darn! I don’t want to erase all those pretty pictures I already made, so I’ll just include this proof as a separate little bonus.

I’m also going to be a little more careful with notation that I have been thus far, by distinguishing structures from their universes. The fancy 𝔐 will refer to a structure, which has both a universe as well as an interpretation of all the constant, function, and relation symbols. The universe of 𝔐 will be written with a non-fancy M.

The Proof

Let 𝔐 be the ultraproduct ∏(𝔐k)k∈ℕ/U.
Let S = {φ1(x), φ2(x), φ3(x), …} be any countable set of formulas with countably many parameters.

Suppose S is finitely realizable in 𝔐.
Then for each n, 𝔐 ⊨ ∃x (φ1(x)∧…∧φn(x)).
So by Łoś’s theorem, for each n, the set of models that realize φ1(x)∧…∧φn(x) is U-large.
So for each n, { k | 𝔐k ⊨ ∃x (φ1(x)∧…∧φn(x)) } ∈ U.

Here’s a visualization:

In this image, 𝔐1 realizes φ1(x)∧…∧φ5(x), 𝔐3 realizes all of S, and 𝔐5 doesn’t realize φ1(x).
The finite realizability of S tells us that {1, 2, 3, 4, 6, 7, …} (the indices of models realizing φ1) is U-large. So are {1, 2, 3, 4, 7, …}, {1, 3, 4, 7, …}, {3, 4, 7, …}, and so on.

Now we add a new constant c to L.
We want to define c on each 𝔐k in such a way that in 𝔐, c refers to an element of M that realizes all of S.
We do this by saying that in 𝔐k, c refers to an element of Mk that realizes as large an initial segment of S as possible. More precisely:

For each finite k, consider the largest n for which 𝔐k ⊨ ∃x (φ1(x)∧…∧φn(x)). (In the above example, for k = 1 we’d have n = 5. For k = 2 we’d have n = 2. And so on.)
In 𝔐k, c will refer to any object in Mk that realizes φ1 through φn. Let’s call this object ak.
In other words, ak is an element of Mk that realizes as large of an initial segment of S as possible, and c refers to this element.

Two special cases:
What if 𝔐k realizes all of S? Then in 𝔐k, c will refer to any akMk that realizes all of S.
And if 𝔐k doesn’t realize φ1? Then in 𝔐k, c will refer to any arbitrary element of Mk.

One important consequence of this way of defining the “ak“s is that for any k and n, ak realizes φ1(x)∧…∧φn(x) in 𝔐k if and only if 𝔐k ⊨ ∃x (φ1(x)∧…∧φn(x)). This’ll be used in a few lines, marked by a blue =.

The ultraproduct construction now tells us exactly what c refers to in 𝔐: the equivalence class [a1, a2, a3, …].
All that’s left to do is to prove that this element of 𝔐 realizes all of S. Here’s the argument:

Let φn(x) be any arbitrary formula in S.
{ k | ak realizes φn in 𝔐k } ⊇ { k | ak realizes φ1(x)∧…∧φn(x) in 𝔐k } = { k | 𝔐k ⊨ ∃x (φ1(x)∧…∧φn(x)) } ∈ U.
Since ultrafilters are closed under supersets, { k | ak realizes φn(x) in 𝔐k } ∈ U
Since c refers to ak in 𝔐k, { k | 𝔐k ⊨ φn(c) } ∈ U.
So by Łoś’s theorem, 𝔐 φn(c).
So in 𝔐, [a1, a2, a3, …] realizes φn.
Since φn was an arbitrary formula in S, [a1, a2, a3, …] realizes all of S.

And we’re done! Since [a1, a2, a3, …] realizes all of S, S is realizable!

Concluding thoughts

This proof is pretty similar to the one in the last post, but with some unnecessary bells and whistles removed. Unlike the last one, this one was designed by me personally, so it’s possible there’s something important missing. But I doubt it!

It has a similar “constructive” character to the last proof, in that it doesn’t just say that S is realizable in M, it also tells you exactly what the element of M that realizes S is. I’ve noticed that this “constructiveness” is a common feature of proofs in this area of logic. Look back at our proofs of the compactness theorem. They didn’t just tell us that our starting set of sentences was satisfiable; they actually gave us a recipe for constructing a model of that set of sentences.

Now, why am I putting constructive in quotation marks? The thing is, on the traditional notion of constructiveness, none of these proofs are really constructive, because free ultrafilters can’t be shown to exist for arbitrary infinite sets without the axiom of choice. (Remember our usage of Zorn’s lemma in Part 3 of the series?) The axiom of choice is a big no-no for the constructivist, and there is a sense in which you can’t super concretely “get your hands on” a free ultrafilter. There’s no nice general procedure for specifying what’s in the ultrafilter and what’s not; if there was we wouldn’t need choice, we’d just define the ultrafilter with this procedure!

Nonetheless, I think there’s a nice notion of “conditional constructiveness”, where you say “assuming that you somehow get your hands on an ultrafilter, then what can you construct with it”. And in this sense, you can construct all the things I gave examples of earlier. 

Not all nonconstructive objects are equally scary or hard-to-comprehend in my eyes, so it’s nice to be able to say something like “these are both nonconstructive in the absolute sense, but the first is constructive conditional on the existence of a free ultrafilter on the naturals, and the second is constructive conditional on the ZFC-independent existence of a large cardinal.” An approach like this frees you up to care about constructiveness without having to ascribe to the full strict Brouwerian philosophy.

All About Countable Saturation (Ultra Series 7)

Previous: Ultraproducts and Compactness

What is Countable Saturation?

I’ve spent a lot of time in the last posts gushing about how cool and powerful Łoś’s theorem is. But Łoś’s theorem is only the second most powerful property of ultraproducts that I know. The most powerful is what’s known as countable saturation.

Countable saturation is a property of structures, and it’s all about definability and parameters. Here it is in one sentence: M is countably saturated if any set of formulas using countably many parameters which is finitely realized in M is also realized in M.

Said another way, M is countably saturated means that: for any set of formulas X with countably many parameters, if the set of subsets of M defined by finite subsets of X has nonempty intersection, then there’s an object in M that satisfies all of X.

If that was a little too much to parse, don’t worry. We’ll now slowly build up to the definition by discussing definability and parameters.

Definability

Let L be some first-order language, and M be some L-structure. Suppose φ(x) is an L-formula with one free variable, x. We can consider the set of objects in the universe of M that satisfy φ: { m ∈ M | M ⊨ φ(m) }. We say that this is the set defined by φ.

For instance, let L be the language of PA and M be the natural numbers. What are the sets defined by the following formulas?

φ1(x): x > x
φ2(x): x = x
φ3(x): ∃y (y + y + y = x)
φ4(x): ∃y (x + y = x ⋅ y)
φ5(x): ∀y ∀z (y ⋅ z = x → (y = 1 ∨ z = 1))

Answers:

φ1 defines ∅
φ2 defines ℕ
φ3 defines 3ℕ
φ4 defines {0, 2}
φ5 defines the primes

Many many sets of naturals are definable, including uncomputable sets like the set of Busy Beaver numbers. With the language of PA, we are able to define everything on the arithmetic hierarchy, described here.

We’ll denote the set of all definable subsets of a structure M by writing D(M). So D(ℕ) is the set of all arithmetical sets.

Parameters

That’s definability without parameters. What about definability with parameters? Parameters are like additional constants you tack onto a language that refer to a fixed object of your choice. Let’s illustrate this with another example. Suppose that our language is empty, and our structure M contains three objects.

We don’t have any constants, so we have no way to pick out any object by name. We also don’t have any relations or functions, so we have no way to pick out any object by its properties or relationship to other objects. So take a guess, what is D(M)?

Your first thought may have been that D(M) is empty; after all if you can’t pick out any individual objects in the structure, how can you define any sets of objects? But there are still two sets of objects you can always define: the empty set (defined by any contradiction) and the entire structure (defined by any tautology).

D(M) = {∅, M}

Now, suppose that we have a parameter that refers to one particular object. Let’s say the parameter is a.

We now consider formulas φ(x, a) that depend not just on the free variable x, but also this new parameter. For instance, φ(x, a): x = a. What set does this define?

We’ll write D(M, A) for the set of all subsets of M definable with parameters from A. What is D(M, {a}) in our present example?

Answer:

D(M, {a}) = {∅, {a}, M\{a}, M}

We still can’t get every subset of M, because we have no way to single out the objects besides a. Can we do it with two parameters a and b referring to distinct objects in M?

Yes! We don’t need a parameter for the third object, because we can define it with the formula “x ≠ a ∧ x ≠ b”. And once we’ve defined all singletons, we can use disjunctions to define all two-element sets. So with just two parameters, we get to define every possible subset of M.

D(M, {a,b}) = 𝒫(M)

If our structure had 4 objects instead of 3 we could do it with 3 parameters, and if it had 12 objects we could do it with 11 parameters. In general, any finite structure can be fully defined with some finite number of parameters.

Infinite structures are trickier. How many parameters are necessary to fully define ℕ in the language of PA?

Trick question! This just can’t be done in the language of PA. Adding parameters doesn’t actually let us define anything new, because every object can already be referred to: (0, S0, SS0, SSS0, …). Unlike with finite structures, having a name for every object does not guarantee that every set of objects can be defined (though it does guarantee that every finite set of objects can be defined).

But while parameters don’t help with ℕ, they do help with nonstandard models of arithmetic, as these contain objects that can’t be defined with just the symbols in the language. Suppose that K is a parameter that refers to the infinite hypernatural number [0,1,2,3,4,…]. Then { x ∈ *ℕ | x < K } is in D(*ℕ, {K}), but not in D(*ℕ, ∅). We also get { x ∈ *ℕ | x > K ∧ x is prime }, and many other sets.

Finite intersection property

We’re almost ready to define countable saturation. We just need one more concept: the finite intersection property. A set of formulas has the finite intersection property if each of its finite subsets can be realized by some object in M.

Some examples in ℕ:

(i) P = {x < 1, x < 2, x < 3, x < 4, x < 5, x < 6…}
(ii) P = {x > 1, x > 2, x > 3, x > 4, x > 5, x > 6, …}
(iii) P = {∃y (x = y⋅1), ∃y (x = y⋅2), ∃y (x = y⋅3), …}

For each example, we want to verify that for every finite subset of P, we can find an element that satisfies everything in that finite subset.

For instance, to show that (i) really has the finite intersection property, we think about any finite set of formulas in P, such as {x < 12, x < 33, x < 59, x < 64}. What’s an x that satisfies all of these? Obviously anything less than 12 will work, so we can take x = 11. In general, for any finite set F we can just look at the minimum n for which the sentence “x < n” appears in F and then set x = n – 1. More simply, we can just always set x = 0 and this will satisfy every finite subset.

For (ii), we can take any finite set F and find the maximum n for which the sentence x > n appears in F. Then set x = n + 1.

Figure out (iii) for yourself!

There’s an important difference you may have noticed between (i) and (ii)/(iii). Namely, in (i) we have a number (x = 0) that satisfies all of P. But in (ii) no element of c will satisfy all of P, and the same goes for (iii). (For (ii) this would correspond to an infinitely large number, and for (iii) a number that’s a multiple of every standard natural number. Notice that elements like this do exist in *ℕ!)

What this shows is that a set having the finite intersection property does not guarantee that all formulas in it can be simultaneously satisfied by a single element. That is, unless the structure in question is countably saturated!

Countable saturation

And we’ve arrived! Consider any countable set of parameters A, and then any set of formulas using these parameters. Countable saturation of M means that any such set with the finite intersection property will necessarily be satisfied by some object in M.

So for instance, if M is countably saturated, then the fact that {x is a multiple of 2, x is a multiple of 3, x is a multiple of 4, …} has the finite intersection property implies that there’s an object in M that’s a multiple of 2, 3, 4, 5, and all the rest.

Similarly, in a countably saturated model M, {x > 1, x > 2, x > 3, …} having the finite intersection property implies that there’s an object in M that’s bigger than 1, 2, 3, and every other finite number.

Clearly, ℕ isn’t countably saturated. But *ℕ is! So is *ℝ, and *ℤ! And in fact, as long as your ultraproduct is over models with a countable language and has index set ℕ, it will always be countably saturated! This is the big result we’ve been building up to.

For any countable language L and set of L-structures (M1, M2, M3, …),
∏(Mi)i∈ℕ/U is countably saturated

Let’s prove it!

The Proof

Here’s our setup:

L is a countable language
(M1, M2, M3, …) is a family of L-structures
U is a free ultrafilter on ℕ
M = is the ultraproduct ∏(Mi)i∈ℕ/U
A is a countable subset of M
LA is the language L with an added constant for every a ∈ A
P(x) is a countable set of LA-formulas

Suppose that P(x) is finitely realizable in M. We want to show that it’s also realizable in M. We’ll do this by explicitly constructing the element of M that realizes P(x). Since P(x) is countable, we can choose some enumeration of its formulas: P(x) = {φ1(x), φ2(x), φ3(x), …}.

Begin by writing out the sequence of all of our component models.

We’re going to now start a step-by-step process of eliminating models.

Step 1: Remove all models that don’t realize φ1.

For illustration purposes, let’s suppose that all models except M3 and M7 realize φ1.

All remaining models realize φ1.

Step 2: Remove M1, as well as all models that don’t realize φ1∧φ2.

We’ll suppose that of the remaining models, only M5 doesn’t realize φ1∧φ2. So we remove M1 and M5.

We’re now left with a set of models that all realize φ1∧φ2.

Step 3: Remove M2, as well as all models that don’t realize φ1∧φ2∧φ3.

This time we remove M6 and M10, in addition to M2.

Step 4: Remove M3, as well as all models that don’t realize φ1∧φ2∧φ3∧φ4.

M3 had already been removed, so we don’t do anything with it, but we now also remove M9.

Continue doing this process forever, so that there’s a “Step N” for every natural number N. We end up with something like this:

There’s something really important that we should notice about each of these sets of models: they’re all U-large. In other words, if at any stage we consider the set of indices of the models that remain, this set will be in U. This follows from finite realizability, by the following argument:

Suppose we consider the set of models that remain after four steps. We know that φ1∧φ2∧φ3∧φ4 is realizable in M. By Los’s theorem, this means that that the set of component models that realize φ1∧φ2∧φ3∧φ4 must be U-large. And the set of models remaining after Step 4 is exactly the set of component models that realize φ1∧φ2∧φ3∧φ4, excluding M1, M2, M3, and M4. So the remaining models form a U-large set minus a finite number of elements, which is still U-large!

Of course, the choice of “four” here was totally arbitrary. For any finite N, the set of models remaining after Step N is the set of models that realize φ1∧…∧φN (a U-large set), minus the models M1 through MN (a finite set). So these are always U-large!

Ok, we’re basically done with the proof now. All that’s left is to use our image to construct an element of M that realizes all of P(x). This construction is extremely simple! Here’s what we do:

For each model, go as many steps as you can before that model is crossed out. Visually, this is just going down each column and circling the model at the step before it is crossed out.

Look closely at the circled models. M4 is circled after Step 3, which means that it contains an element that realizes φ1∧φ2∧φ3. M8 is circled after Step 6, so it contains an element realizes φ1∧…∧φ6. And so on. For each circled model, we just pick out the element that realizes as many of the “φ”s as is promised to us by the diagram. Let’s call these elements a1, a2, a3, and so on.

Note that M3 and M7 were ruled out in Step 1. This was because they didn’t even realize φ1. These models are useless for our purposes (they aren’t able to realize any of the φs), so we just choose any arbitrary element from them. M11 wasn’t ruled out in any of the first six steps, so we can’t be sure from the diagram exactly how many of the φs it can realize. But we know that at the very latest, it’ll be removed at Step 12.

Anyway, we now have our sequence:

(a1, a2, a3, a4, …)

The equivalence class of this sequence is an element of M, and it realizes all of P(x)! Why?

Well, let’s think about which elements of the sequence (a1, a2, a3, …) realize φ1. a1 clearly does, as we chose it for that purpose. What about a2? It was chosen to realize φ1∧φ2, so of course it also realizes φ1. a3 doesn’t realize φ1, because M3 didn’t contain any element that realized φ1. a4 realizes φ1 because it realizes φ1∧φ2∧φ3∧φ4. And so on. In general, ak realizes φ1 for every k that’s the index of one of the structures in Step 1 (a U-large set!)

It might be helpful to keep in mind that when we circled a model in row k, the element we chose from that model realizes the conditions in all the earlier rows. We can visualize this as follows:

Now we can easily see which elements realize each formula:

Realizes φ1: a1, a2, a4, a5, a6, a8, a9, a10, a11, a12, …
Realizes φ2: a2, a4, a6, a8, a9, a10, a11, a12, …
Realizes φ3: a4, a8, a10, a11, a12, …
Realizes φ4: a4, a8, a11, a12, …
Realizes φ5: a8, a11, …
Realizes φ6: a8, a11, …
Realizes φ7: a11, …

(Bonus question: is it possible for more elements to realize φ5 than those I’ve written? Which elements? And if so, why does that not affect the proof?)

Remember how we showed that the set of models in each row was U-large? Well each of these sets of elements is U-large for the exact same reason! One more application of Łoś’s theorem and we’re done: every φ(x) in P(x) is realized by a U-large set of elements of the sequence (a1, a2, a3, …) , so every φ(x) in P(x) is realized by the element of M that is the equivalence class of this sequence: [a1, a2, a3, …].

And that’s the proof!

Hypernatural Saturation

Countable saturation is an enormously powerful tool for probing the structure of ultraproducts. Let’s see some applications.

We can use countable saturation to prove that there’s a number that’s a multiple of every finite number. We can also use it to prove that there’s a number that’s above every finite number. But we already knew all this. The real power of countable saturation shines through when we use parameters.

Take any countable set of parameters {K1, K2, K3, …}, and choose them to refer to distinct hypernaturals. Now consider the set of sentences {x is divisible by K1, x is divisible by K2, x is divisible by K3, …}. This set of sentences has the finite intersection property. Why? Because finite products always exist. Applying countable saturation, we can conclude that there’s a number that’s divisible by K1, K2, K3, and all the rest!

In other words, every countable set of hypernaturals has a multiple! This is significantly stronger than what we knew before. For one thing, it implies that there are uncountably many hypernatural primes! Suppose that there were countably many hypernatural primes. Collect them in a set P, and find a multiple N of everything in P. Now simply add 1 to N. N+1 is not divisible by any primes. It’s also larger than everything in P (use Łoś to prove this), meaning that it isn’t itself a prime. But by Łoś’s theorem, every hypernatural is either prime or composite! Contradiction.

We can also now show that *N has uncountable cofinality. In other words, every countable set of hypernaturals can be upper bounded. Let {K1, K2, K3, …} be any countable set of hypernaturals. Now consider the set of formulas {x > K1, x > K2, x > K3, …}. Every finite subset can be satisfied; simply take the largest Kn that appears in the subset and add one. Thus there’s some x that’s above the entire set {K1, K2, K3, …}. So every countable set of hypernaturals has an upper bound, meaning that any subset of *ℕ that has no upper bound must be uncountable.

We can get uncountable coinitiality of the infinite hypernaturals by a similar argument. Let {K1, K2, K3, …} be any countable set of infinite hypernaturals and consider the set of formulas {x > 0, x < K1, x > 1, x < K2, x > 2, x < K3, x > 3, …}. Fill out the rest of the argument for yourself!

Uncountable coinitiality of the infinites is a big deal! For one thing, it implies that every infinite hypernatural K is above uncountably many other hypernaturals (as otherwise the set of infinite hypernaturals below K would be a countable coinitial set of infinites). This in turn implies that distances between hypernaturals are either finite or uncountable. They cannot be countably infinite!

Why? Well, ∀x ∀y (x < y → ∃z (z + x = y)) is a sentence of TA that says essentially that for any two numbers x < y, the difference y – x exists as a number. And since we now know that all hypernaturals are either finitely large or uncountably large, we can conclude that differences must be as well!

Applying countable saturation more generally

Let L be any countable language and (Mi)i∈ℕ be any countable sequence of L-structures. We know that the ultraproduct M = ∏(Mi)i∈ℕ/U is countably saturated. Let’s further assume that M is infinite. Consider any countable set {K0, K1, K2, …} of objects in M, and apply countable saturation to the set of sentences {x ≠ K0, x ≠ K1, x ≠ K2, …} to prove that there’s some object in M outside of this set.

But the countable set {K0, K1, K2, …} was fully arbitrary! What this means is that any infinite ultraproduct M must be uncountable! We know that there are finite ultraproducts over countable languages (like the ultrapower of ℤ3, say). So any ultraproducts over a countable language with index set ℕ must be either finite or uncountable. (Exercise: where in the argument did we use the assumption that M is infinite?)

I hope that this next (and final) result blows your mind. It turns out that the continuum hypothesis (ℵ1 = 20) has some huge implications on ultraproduct-related matters. I tend to assume that CH is false, which makes the upcoming result a little less exciting to me, but nonetheless I hope it knocks your socks off.

Continuum hypothesis and isomorphic ultrapowers

Fix some countable language L. Suppose that M1 and M2 are elementarily equivalent L-structures, and that |M1| = |M2| = ℵ0. If the continuum hypothesis is true, then the ultrapowers of M1 and M2 are isomorphic! For example, if ℕ is the standard model of arithmetic, and ℕ’ is some nonstandard countable model of true arithmetic, then *ℕ = *(ℕ’).

The proof of this is pretty nice, but to get there I need to introduce a few more notions.

So far we’ve just talked about countable saturation. But there’s a more general notion of K-saturation for any cardinal K. A structure M is K-saturated if any set of formulas using less than K parameters that’s finitely realizable in M is also realizable in M. So in a slightly annoying terminological quirk, countable saturation is actually ℵ1-saturation, as “fewer than ℵ1 parameters” is the same as “countably many parameters”.

A structure M is called saturated if it’s |M|-saturated. These structures are “as saturated as possible”. Nothing can be K-saturated for a cardinal K larger than or equal to its own size (to prove this, add a parameter for every object in the structure and declare x to differ from each of them).

Saturation is a really strong constraint on a structure. It turns out that if two structures are elementarily equivalent, the same size, and saturated, then they are necessarily isomorphic. This means that for instance there’s at most one saturated countable model of true arithmetic (up to isomorphism). The same holds for saturated models of any cardinality of Tarskian geometry, the theory of real closed fields, and any other complete theory.

And now we’re ready for the proof!

Assume that M1 and M2 are elementarily equivalent structures in a countable language, and that |M1| = |M2| = ℵ0. Take each of their ultrapowers with index set ℕ and arbitrary ultrafilters, and call them *M1 and *M2. What size are these structures? Since they’re ultraproducts, they’re both ℵ1-saturated, which means that they cannot be countable. And they’re not finite, so they must be uncountable.

Furthermore, |*M1| ≤ |M1|, as the elements of *M1 are just equivalence classes on M1. So |*M1| ≤ |M1| = ℵ00 = 20. Assuming the continuum hypothesis, 20 = ℵ1. So ℵ0 < |M1| ≤ ℵ1, meaning that |*M1| = ℵ1. But remember that *M1 is ℵ1-saturated! So it’s as saturated as can be! Thus we can say that *M1 is saturated. The exact same argument goes for *M2. So *M1 and *M2 are saturated elementarily equivalent structures, and they’re both cardinality ℵ1. So they’re isomorphic!

Thus the continuum hypothesis tells us that ultrapowers erase differences between elementarily equivalent countable structures. Even though countable nonstandard models of TA look so different from the standard model, their ultraproducts are isomorphic. In particular, addition and multiplication are computable on ℕ and uncomputable on any nonstandard countable model of arithmetic. So according to CH, computable and uncomputable structures can have the same ultrapower!

Ultraproducts and Compactness (Ultra Series 6)

Previous: Infinitely Large Primes

We’ve seen how to use Łoś’s theorem to prove properties of hyperstructures. We’ve also seen how to use it to prove limitations of first-order logic. Today we’re going to see a major meta-logical application of Łoś’s theorem: proving the compactness theorem! In my opinion, this proof is the most elegant proof of first-order compactness out there. We’ll also see a few other meta-logical results relating to ultraproducts.

What is compactness?

A logic is compact if any finitely satisfiable set of sentences is satisfiable. If somebody hands you an infinite set of sentences, and you know how to construct a model to satisfy any finite subset of these sentences, then you can conclude that there must be a model that satisfies all those sentences at once. Compactness is interesting because it allows you to prove surprising results (like that true arithmetic has nonstandard models) from very obvious facts (that any finite set of natural numbers has an upper bound). It also lets you prove the completeness theorem, which says that what you can prove from first-order axioms is exactly what logically follows from those axioms.

Countable Theories

First we’ll prove the compactness theorem for countable theories. I’m including it because it’s conceptually simpler than the proof of the general case. However, it’s not necessary for understanding the general proof, so if you just want to see that you can jump ahead to the next section.

Suppose T is a countably infinite theory that’s finitely satisfiable.
T is countable, so choose an enumeration of T’s sentences: T = {φ1, φ2, φ3, …}.

T is finitely satisfiable, so there’s a model that satisfies φ1. Call this model M1.
There’s also a model M2 that satisfies both φ1 and φ2.
And a model M3 that satisfies φ1, φ2, and φ3.
And so on.

Take the ultraproduct of (M1, M2, M3, …) and call it M.
For every n in N, φn is satisfied by all models from Mk on. (For instance, φ4 is satisfied by M4, M5, M6, and so on.)
This is a cofinite set, so (by Los’s theorem) φn is satisfied by M.

But φn was a totally arbitrary sentence inside T. So M satisfies all of T!

So T is satisfiable. But not only do we get this result, we also have an explicit example of a model that satisfies all of T, which we can start analyzing to prove other properties of.

Compactness for any theory

The proof uses a fact that I haven’t proven yet: If X is a set whose every finite subset has nonempty intersection, then there’s some ultrafilter that extends it. Luckily, it’s easily proven. Start by closing X under supersets (notice that every finite subset of X still has nonempty intersection). Then close it under intersection, and you now have a filter that extends X. (Verify that this really is a filter for bonus points!) And in Part 3 of this series we proved that every filter has an ultrafilter extension. So there’s an ultrafilter U that contains X.

Alright, on to the proof.

Let L be any first-order language and S be any set of L-sentences that is finitely satisfiable. We’re going to build an ultraproduct that satisfies S. This requires three ingredients: an index set I, and a set of L-structures (Mi)i∈I indexed by I, and an ultrafilter U on I. We’ll go through these one at a time.

1. The index set I

I will be the set of all finite subsets of S. This is interestingly different from what we’ve seen before. Previously our indices were always natural numbers. But here our indices are sets of sentences in the language of the structure! This means that we’re going to end up looking at models that look like M{φ,ψ,θ}, where φ,ψ, and θ are sentences in S.

2. The component structures (Mi)i∈I

So, our indices are finite subsets of S. And S is finitely satisfiable. So for each index F (each finite subset of S), there’s a model MF that satisfies everything in F. This is how we get our component structures.

For instance, if φ, ψ, and θ are three sentences in S, then {φ,ψ,θ} is an index, and M{φ,ψ,θ} is any structure that satisfies φ∧ψ∧θ. ∅ is also a finite subset of S, so it’s an index and has a corresponding structure M, which can be any L-structure whatsoever.

To briefly rehash, the important feature of our choice of component structures is that they have the following property: For any finite subset F of S, MF ⊨ φ for every φ in F.

3. The ultrafilter U on I

The choice of ultrafilter here is really important. Our component structures were carefully chosen so that each structure satisfies the finite set of sentences that is its index. This guarantees us that for any sentence φ in S, the set of component structures whose indices contain φ all satisfy φ. This is the set of component structures whose indices are finite supersets of {φ}.

Remember that we’re ultimately aiming to show that each φ is satisfied by the ultraproduct model. By Łoś’s theorem, it suffices to show that the set of component models that satisfy φ is U-large (is in our chosen ultrafilter). Conclusion: we want to make sure that for any φ in S, the set of all finite supersets of {φ} is U-large.

How do we know that there exists any ultrafilter that for every φ contains the set of all finite supersets of {φ}? We apply the finite intersection property! Denote by SF the set of all finite supersets of F. Take any finite set of singletons {{φ1}, {φ2}, {φ3}, …, {φn}}. Is the intersection of S{φ1}, S{φ2}, S{φ3}, …, S{φn} empty? No: the set 1, φ2, φ3, …, φn} is in of all of them! So the set { S{φ} | φ ∈ S } has the finite intersection property, and thus there’s some ultrafilter that extends it. And that’s the ultrafilter we want!

Finalizing things

We’ve now got everything we need to build the structure that satisfies S. Take the ultraproduct Π(MF)F∈I/U. This structure satisfies S. Why?

Consider any φ in S. By Łoś’s theorem, φ is satisfied if a U-large set of the component models satisfy φ. And we know that a U-large set of component models satisfy φ, because φ is satisfied in every model whose index contains φ, i.e. the set of models whose indices are supersets of {φ}. So we’re done!

More implications

The remainder of this post will not be necessary to understand the rest of the series. I want to present a few miscellaneous meta-logical results that relate to ultraproducts that don’t quite fit anywhere else.

First: every finite structure is its own ultrapower. Let’s see why.

Suppose M is a finite structure. Denote by MI/U the ultrapower of M over any arbitrary index set I. First of all, we can easily see that M and MI/U have the same cardinality: if |M| = n for some finite n, then there’s a sentence affirmed by M that says “There are exactly n distinct things”. For instance, if M has exactly two elements, then M ⊨ ∃x ∃y (x≠y ∧ ∀z (z=x ∨ z=y)). Łoś tells us that MI/U also affirms this sentence, so MI/U also has exactly n distinct things.

Furthermore, any finite structure M of size n can be categorically specified up to isomorphism by a single first-order sentence of the form ∃x1∃x2…∃xn (∀x (x = x1 ∨ … ∨ x = xn) ∧ (x1 ≠ x2 ∧ … xn-1 ≠ xn) ∧ ψ(x1, …, xn) ], where ψ specifies the behavior of all relation, function, and constant symbols. Here’s an example for a simple finite structure:

We have four elements, so we use four variables, x1 x2 x3 and x4 to specify the structure.

Since M satisfies this sentence, Łoś tells us that MI/U also affirms this sentence, meaning that MI/U and M are isomorphic. So in other words, every finite structure is its own ultrapower (up to isomorphism). 

What about infinite structures? The situation is the exact opposite! No infinite structure is its own ultrapower.

Let M be an infinite structure and *M be its ultrapower. Add a constant for every object in M. The ultraproduct construction tells you how constants from M translate to constants in *M: for each constant c, if c refers to m in M, then c refers to [m, m, m, …] in *M.

Now we want to construct a new element of *M by writing a sequence that contains infinitely many elements of M, each appearing only finitely many times. Crucially, here we must assume that the index set is the same size as or smaller than M! If we have too many indices, then we aren’t going to be able to refer to each element in our original model only a finite number of times. So for instance, this argument won’t work for the ultrapower ℕ/U.

Now, this new element we’ve constructed is distinct from each constant, because they only agree in finitely many place. The constants covered all of M, but they don’t cover all of *M. So M and *M are not isomorphic!

And some more

These next results relate ultraproducts to compactness and definability.

Fix a first-order language L and consider a class C of L-structures. Say that a set of sentences X is C-satisfiable if there’s a structure in C that satisfies all of X. We say that C is compact if for every set A of L-formulas, finite C-satisfiability implies C-satisfiability. This is a different notion from the compactness discussed in the compactness theorem; the ordinary compactness theorem tells us that for every first-order language L, the class of all L-structures is compact. But it doesn’t tell us anything about other more restricted classes of L-structures.

For instance, let L be the language with an empty signature (no constants, functions or relations). This language is extremely simple: essentially the only contentful sentences you can make are about finite cardinalities. E.g. you can say “there are at least three things” with “∃x ∃y ∃z (x≠y ∧ x≠z ∧ y≠z)” and “there are at most three things” with “∀x ∀y ∀z ∀w (x=y ∨ x=z ∨ x=w ∨ y=z ∨ y=w ∨ z=w)”.

Now let C be the class of all finite L-structures. Is this class compact?

If you said no, good job! For a finite n, let φn be the sentence that says “there are at least n things”, and consider the set of sentences A = {φ1, φ2, φ3, …}. Every finite subset of A is satisfiable by a finite L-structure, but the entire set is clearly not satisfiable by any finite L-structure.

What if we add to C a single countably infinite L-structure M? Then our previous construction no longer works; the entire set A is now satisfied by this new member of C.

Alright, so that’s compactness for classes of structures. The relationship between this and ultraproducts is the following:

Any class of models that’s closed under ultraproducts is also compact.

For any language L and any finite n, consider the class of all L-structures of size n. The ultraproduct of a bunch of size-n L-structures is itself a size-n L-structure, so this class is closed under ultraproducts. And therefore it’s compact!

Now consider our example of the class of all finite structures in the language with empty signature. This was not compact, meaning that it cannot be closed under ultraproducts. This means that the ultraproduct of a bunch of finite structures is not necessarily finite! In particular, suppose that M1 has one element, M2 has two elements, M3 has three elements, and so on. Then the ultraproduct of (M1, M2, M3, …) will be an infinite model. We can also see this by applying Łoś’s theorem to the set of sentences {“There is at least 1 thing.”, “There are at least 2 things.”, “There are at least 3 things.”, …}.

A class C of L-structures is first-order definable if it is the set of all models of some set of L-sentences. First-order definability also relates to ultraproducts, in the following way:

If a class of L-structures is first-order definable, then it’s closed under ultraproducts.

The converse of this is that any class of structures that’s not closed under ultraproducts is not definable.

For any L, the class of finite L-structures is not closed under ultraproducts, so it’s not definable. This tells us that “finiteness” is not first-order definable.

The class of all models of PA is first-order definable, so it’s closed under ultraproducts. This means that even if you take an ultraproduct of infinitely many nonstandard models of PA, you still end up with a model of PA!

Okay, next up we come to the most exciting and powerful property of ultraproducts: countable saturation!

Next: All About Countable Saturation

Infinitely Large Primes (Ultra Series 5)

Previous: Ultraproducts and Łoś’s Theorem

Let’s quickly review what we’ve learned so far.

The hypernaturals, *ℕ, are the ultrapower of ℕ with index set I = ℕ. The first-order language in use here is the language of Peano arithmetic: L = <{0}, {<}, {S,+,⋅}>. Łoś’s theorem from the last post tells us that *ℕ and ℕ are elementarily equivalent. In other words:

For every first-order sentence φ, ℕ ⊨ φ if and only if *ℕ ⊨ φ.

This tells us that *ℕ is not just a model of Peano arithmetic, which is a popular incomplete theory of arithmetic. It’s a model of TRUE arithmetic, the complete set of all first-order truths of arithmetic! Given the insane uncomputability of this theory (If T0 is a Turing machine, and Tn+1 is a Turing machine with an oracle for Tn‘s halting problem, then true arithmetic is not computable by Tn for any finite n), it’s pretty remarkable that we have an explicit example of one of its nonstandard models!

Review over! Now let’s begin with some basic applications of Łoś’s theorem. We can say that a number x is prime in first-order logic: isPrime(x) can be translated as ∀y ∀z (y⋅z = x → (y = 1 ∨ z = 1)). We can also refer to any finite natural number by applying the successor function enough times to zero (e.g. 2 = S(S(0))). Since “2 is a prime” is first-order expressible and ℕ affirms its truth, so must *ℕ.

But remember that in *ℕ, 2 is the equivalence class of the sequence (2,2,2,2,…). Is it really true that for any a, b ∈ *ℕ, if a⋅b = [2,2,2,2,…] then one of a and b is 2 and the other is 1? This is probably not immediately obvious… What about (2,1,2,1,2,1,…)⋅(1,2,1,2,1,2,…)? Certainly this product is (2,2,2,2,…), but neither (2,1,2,1,2,1,…) nor (1,2,1,2,1,2,…) look very much like (1,1,1,1,…) or (2,2,2,2,…).

In fact they are the same. Let’s prove this very generally. Suppose p is any finite prime number, and consider the hypernatural [p, p, p, …], corresponding to the ordinary natural number p. Consider any two hypernaturals a and b such that a⋅b = p. We can choose representative sequences (a0, a1, a2, …) and (b0, b1, b2, …) such that their product is (p, p, p, …). In other words:

(a0⋅b0, a1⋅b1, a2⋅b2, …) = (p, p, p, …)

Since p is prime, we know that for each k, either (ak = p and bk = 1) or (ak = 1 and bk = p). Consider the set of indices A = { k ∈ ℕ | ak = p }. Since each ak is either 1 or p, the complement ℕ\A is equal to { k ∈ ℕ | ak = 1 }. Now, since U is an ultrafilter, either A or ℕ\A is in U. In the first case, [a0, a1, a2, …] = [p, p, p, …] = p. And in the second case, [a0, a1, a2, …] = [1, 1, 1, …] = 1. The same argument applies for [b0, b1, b2, …]. So for any two hypernaturals a and b such that a⋅b = p, both a and b are either 1 or p.

Okay, so we’ve proven that 2 is a prime number. That’s a relief! Let’s do something a bit more substantial now.

Consider the following first-order sentence: ∀x ∃y (y > x ∧ isPrime(y)). This sentence says that there are infinitely many primes, and (as first proven by Euclid) ℕ affirms its truth. So *ℕ ⊨ ∀x ∃y (y > x ∧ isPrime(y)).

Okay, so there are infinitely many hypernatural primes. This doesn’t sound so exciting… after all every ordinary prime is a hypernatural prime too, so didn’t we already know this? Sure, but if we look closely at what exactly the sentence says, it turns out that we get a bit more out of it than just that.

∀x ∃y (y > x ∧ isPrime(y)) says that for every number x, there’s a larger prime number y. Applying this in *ℕ, this means that all hypernatural numbers, even the infinitely large ones, have larger primes. Thus there are nonstandard primes in *ℕ, and in fact infinitely many nonstandard primes.

See if you can think of one for yourself before reading on.

.

.

.

Ok, here’s one: P = [2, 3, 5, 7, 11, 13, 17, …]. This sequence is eventually always larger than the sequence of any standard prime p = [p, p, p, p, …], so P is a nonstandard number. And P is also prime! The argument is very similar to the previous argument: take any two hypernaturals a and b such that a⋅b = k. Consider any representative sequences for a and b whose product is the sequence (2, 3, 5, 7, 11, …), and then consider the set of indices at which 1 appears in a’s sequence. If this set is in the ultrafilter, then a = 1 and b = k. If not, then a = k and b = 1.

Now, a challenge for you: construct an infinite ascending sequence of nonstandard primes starting at P.

And another challenge: construct an infinite descending sequence of nonstandard primes starting at P!

Hint for the first challenge: think about the hypernatural [3, 5, 7, 11, 13, 17, …].

Hint for the second challenge: think about the hypernatural [2, 2, 3, 5, 7, 11, …], or [2, 2, 3, 3, 5, 5, 7, 7, …].

While infinite ascending sequences of numbers can exist in ℕ, an infinite descending sequence of numbers can not, so we’ve identified a concrete difference between *ℕ and ℕ. And by Łoś’s theorem, this difference must not be first-order expressible in the language of PA. In other words, “there are no infinitely descending sequences of numbers” cannot be expressed in first-order.

This illustrates another usage of Łoś’s theorem: rather than starting with a first-order sentence that’s true of ℕ and using it to learn about *ℕ, we can start with some facts about *ℕ to learn about limitations of first-order logic!

Alright, what else? Suppose we have a hypernatural number k that has a sequence (k0, k1, k2, …) such that some formula φ(x) holds of cofinitely many elements of the sequence. Then φ(x) also holds of k. This gives us some hypernatural numbers with awfully strange properties.

Consider the hypernatural E = [1, 2, 4, 8, 16, 32, 64, …]. Now consider the formulas that say “x is divisible by 2”, “x is divisible by 4”, “x is divisible by 8”, “x is divisible by 16”, and so on. Each of these formulas holds true of all but some initial segment of the sequence (1, 2, 4, 8, 16, 32, 64, …). This means that each of these sentences hold true of E. In other words, E is divisible by 2, by 4, by 8, by 16, by 32, and so on forever. In other words, E can be divided by 2 forever!

But how exactly does this work? The sequence (1, 2, 4, 8, 16, 32, 64, …) starts with a 1. So how could we divide it by 2? The key is to remember that hypernaturals are equivalence classes of sequences, not sequences. So any time we run into trouble with a particular sequence, we can jump to an equivalent sequence that solves our problem. To divide (1, 2, 4, 8, 16, 32, 64, …) by 2, we jump to the equivalent sequence (2, 2, 4, 8, 16, 32, 64, …), which is still a representative of E. Now we can divide by 2: E/2 = [1, 1, 2, 4, 8, 16, 32, …]. To divide by 2 again, we jump to the sequence (2, 2, 2, 4, 8, 16, 32, …). Then we get E/4 = [1, 1, 1, 2, 4, 8, 16, …]. We can keep doing this as many times as we like, and thus E is infinitely even! (Notice: another infinitely descending sequence of numbers.)

We can use a similar trick to construct a number that’s divisible by every standard natural number: simply consider M = [0!, 1!, 2!, 3!, 4!, 5!, …]. For any standard natural number n, the set of indices at which the sequence (0!, 1!, 2!, 3!, 4!, 5!, …) can be divided by n is cofinite. So M can be divided by n as well!

This lets us run a kind of fun argument. Remember how Euclid’s original proof of the infinitude of primes went? Imagine that there are finitely many primes. Then we can take their product and add 1, and we’ve constructed a new prime. Well, in *ℕ we can find multiples of infinite sets of numbers! One multiple of all standard prime numbers is [2, 2⋅3, 2⋅3⋅5, 2⋅3⋅5⋅7, 2⋅3⋅5⋅7⋅11, …] = [2, 6, 30, 210, 2310, …]. This is divisible by each standard prime, and is also not twice-divisible by any prime (e.g. you can’t divide it by 4). (Interestingly, it’s not the product of all finite primes; because it’s also divisible by some infinite primes! See if you can think of one such infinite factor.) Now add 1 to get X = [3, 7, 31, 211, 2311, …]. X cannot be divisible by any standard prime, meaning that it’s either divisible by a nonstandard prime, or is itself a nonstandard prime. So just as Euclid’s original proof showed that there can’t be only finitely many primes, this proves that there can’t be only finitely large primes.

Alright, let’s get a little more wild. In ℕ we know that for any number x, there’s a number y that’s divisible by all positive numbers up to and including x (for instance, y = x!). In other words, ℕ ⊨ ∀x ∃y ∀z ((z ≠ 0 ∧ z ≤ x) → (y div by z)), where “y div by z” is shorthand for ∃u (z⋅u = y). In other words, ℕ affirms that for every initial segment S with maximum element x, there’s a number that’s divisible by everything in S (for shorthand we can call this number “a multiple of S”).

Invoke Łoś: anything true in ℕ must also be true in *ℕ. This means that every initial segment of hypernaturals with a maximum element has a multiple! (See if you can do better: can you also prove that every initial segment of hypernaturals with a max has a product? What about initial segments without maximum elements?) For instance, there’s a number that’s divisible by every hypernatural number up to and including the hypernatural K = [0,1,2,3,4,5,…].

What other first-order properties does ℕ have? Here’s one: ℕ is not dense. We can express this by saying ¬(∀x ∀z (x < z → ∃y (x < y ∧ y < z)). We can do better: ℕ affirms that ∀x ¬∃y (x < y ∧ y < S(x)). In other words, for any number x, x and its successor are counterexamples to density (they’re distinct numbers that have nothing between them).

So this must be true of hypernaturals as well! For any hypernatural, say K = [0,1,2,3,4,5,…], there’s no hypernatural between it and its successor K+1 = [1,2,3,4,5,6,…].

However, even though the hypernaturals are not dense, they do contain a dense subset! In fact, we can say something even stronger: no segment of the hypernaturals are dense, but nonetheless *ℕ has a dense subset!

Here’s a rough sketch for how to construct this dense subset. Take any nonstandard hypernatural, say K = [0, 1, 2, 3, 4, 5, …]. For any nonzero finite m, we can construct a hypernatural Km a finite distance from K, such that K’ is divisible by m. Now we can construct n⋅Km, and then (n/m)⋅Km. The set { (n/m)⋅Km | n, m ∈ ℕ } is a subset of the hypernaturals, and it has the order type of the positive rationals!

What other unusual hypernatural numbers can you discover?

Next, we will prove the compactness theorem using ultraproducts. It’s definitely the prettiest proof of compactness out there, so stay tuned!

Ultraproducts and Compactness

Ultraproducts and Łoś’s Theorem (Ultra Series 4)

Previous: Hypernaturals in all their glory

First things first, you’re probably asking yourself… how is Łoś pronounced?? I’m not the most knowledgeable when it comes to Polish pronunciation, but from what I’ve seen the Ł is like a “w”, the o is like the vowel in “thought”, and the ś is like “sh”. So it’s something like “wash”. (I think.)

Ok, on to the math! This is probably going to be the hardest post in this series, so I encourage you to read through it slowly and not give up if you start getting lost. Try to work out some examples for yourself; that’s often the best way to get a grasp on an abstract concept. (In general, my biggest tip for somebody starting to dive into serious mathematical content is to not read it like fiction! In fiction, short sentences can safely be read quickly. But math is a language in which complex ideas can be expressed very compactly. So fight your natural urge to rush through short technical sentences, and don’t feel bad about taking your time in parsing them!)

What the heck is an ultraproduct

Last post we defined an ultrafilter. Now let’s define an ultraproduct.

It turns out we’ve already seen an example of an ultraproduct: the hypernaturals. The hyperreals (denoted *ℝ) are another famous example of an ultraproduct, constructed from ℝ in exactly the same way as we obtained the hypernaturals *ℕ from ℕ. For any structure M whatsoever, there exists a “hyperstructure” *M obtained from M via an ultraproduct. So what exactly is an ultraproduct?

We’ll build up to it in a series of six steps.

(1) Choose an index set I.
(2) Choose a first-order language L and a family of L-structures indexed by I (Mi)i∈I.
(3) Define sequences of elements of these structures.
(4) Define an equivalence relation between these sequences.
(5) Construct the set of equivalence classes under this equivalence relation.
(6) Define the interpretations of the symbols of L in this set.

Let’s get started!

(1) First we select an index set I. This will be the set of indices we use when constructing our sequences of elements. If we want our sequences to be countably infinite, then we choose I = ℕ.

(2) Now, fix some first-order language L = <constants, relations, functions>. Consider any family of L-structures, indexed by I: (Mi)i∈I. For instance, if I = ℕ and L is the language of group theory <{e}, ∅, {⋅}>, then our family of L-structures might be (ℤ1, ℤ2, ℤ3, ℤ4, …), where ℤk is the integers-mod-k (i.e. the cyclic group of size k).

(3) Now we consider sequences of the elements of these structures. For instance, hyperreals are built from sequences of real numbers that look like (a0, a1, a2, a3, …). The set of indices used here is {0, 1, 2, 3, …}, i.e. ℕ.

If our index set isn’t countable, then it’s not possible anymore to visualize sequences like this. For instance, if I = ℝ, then our sequence will have an element for every real number. A more general way to formulate sequences is as maps from the index set I to the elements of the component models. The hyperreal sequence (a0, a1, a2, a3, …) can be thought of as a map a: ℕ → ℝ, where a(n) = an for each n ∈ ℕ.

In general, a sequence will be defined as follows:

f: I → U(Mi)i∈I such that f(i) ∈ Mi for each i ∈ I.

In the ℤk example, a sequence might look like (0, 1, 2, 3, 4, …). Note that 0 ∈ ℤ1, 1 ∈ ℤ2, 2 ∈ ℤ3, and so on. On the other hand (2, 1, 2, 3, 4, …) would not be a valid sequence, because there’s no 2 in ℤ1.

Now, consider the set of all sequences: { f: I → U(Mi)i∈I | f(i) ∈ Mi for each i ∈ I }. This set is called the direct product of (Mi)i∈I and is denoted Π(Mi)i∈I.

(4) We want to construct an equivalence relation on this set. We do so by first defining a free ultrafilter U on I. From the previous post, we know that every infinite set has a free ultrafilter on it, so as long as our index set is infinite, then we’re good to go.

With U in hand, we define the equivalence relation on Π(Mi)i∈I:

Let f and g be sequences (f,g: I → U(Mi)i∈I).
Then f ~ g if and only if { i ∈ I | f(i) = g(i) } ∈ U

You might be getting major deja-vu from the last post. Two sequences are said to be equivalent if the set of places-of-agreement is a member of the chosen ultrafilter. Since every free ultrafilter contains all cofinite sets, any two sequences that agree in all but finitely many places will be equivalent.

(5) Now all the pieces are in place. The ultraproduct of (Mi)i∈I with respect to U is the set of equivalence classes of Π(Mi)i∈I with respect to ~. This is typically written Π(Mi)i∈I/U. For a sequence a: I → U(Mi)i∈I we’ll denote the equivalence class it belongs to as [a].

Now, it’s not necessary for our indexed sequence (Mi)i∈I of L-structures to all be distinct. In fact, all the models can be the same, in which case we have (Mi)i∈I = (M)i∈I, and the ultraproduct Π(M)i∈I/U is called an ultrapower. The ultrapower of M with index set I and ultrafilter U can be written compactly as MI/U.

Some examples: The hypernaturals are the ultrapower ℕ/U = Π(ℕ)i∈ℕ/U where U is any free ultrafilter over ℕ. Similarly, the hyperreals are ℝ/U = Π(ℝ)i∈ℕ/U. The hyperintegers are ℤ/U. And so on.

(6) So far we’ve just defined the ultraproduct as a set (the set of equivalence classes of I-indexed sequences of elements from the models (Mi)i∈I). But we want the ultraproduct to have all the same structure as the models that we used as input. In other words, the ultraproduct of a bunch of L-structures will itself be an L-structure. To make this happen, we need to specify how the relation symbols and function symbols of L work in the ultraproduct model.

Here’s how it works. If f is a unary function symbol in the language L, then we define f on Π(Mi)i∈I/U by applying the function elementwise to the sequences. So:

f: Π(Mi)i∈I/U → Π(Mi)i∈I/U is defined as f([a])(i) = f(a(i)) for every i ∈ I.

What if f is a binary function symbol? Then:

f: (Π(Mi)i∈I/U)2 → Π(Mi)i∈I/U is defined as f([a], [b])(i) = f(a(i), b(i)) for every i ∈ I.

This generalizes in the obvious way to trinary function symbols, quaternary function symbols, and so on.

What about relations? Suppose R is a unary relation symbol in the language L. We need to define R on the ultraproduct Π(Mi)i∈I/U, and we do it as follows:

Π(Mi)i∈I/U ⊨ R([a]) if and only if { i ∈ I | Mi ⊨ R(a(i)) } ∈ U.

In other words, Π(Mi)i∈I/U affirms R([a]) if and only if the set of indices i such that Mi affirms R(a(i)) is in the ultrafilter. For example, if R holds for cofinitely many members of the sequence a, then R holds of [a].

If R is binary, we define it as follows:

Π(Mi)i∈I/U ⊨ R([a], [b]) if and only if { i ∈ I | Mi ⊨ R(a(i), b(i)) } ∈ U.

And again, this generalizes in the obvious way.

This fully defines the ultraproduct Π(Mi)i∈I/U as an L-structure! (If you’re thinking ‘what about constant symbols?’, remember that constants are just 0-ary functions)

Say that again, slower

That was really abstract, so let’s go through it again with the (hopefully now-familiar) example of the hypernaturals.

We start by defining an index set I. We choose I = ℕ.

Now define the language we’ll use. This will be the standard language of Peano arithmetic: one constant symbol (0), one relation symbol (<), and three function symbols (S, +, ×).

The family of structures in this language that we’ll consider (Mi)i∈I will just be a single structure repeated: for each i, Mi will be the L-structure ℕ (the natural numbers with 0, <, +, and × defined on it). So our family of structures is just (ℕ)i∈ℕ = (ℕ, ℕ, ℕ, ℕ, …).

Our sequences are functions from I to U(Mi)i∈I such that for each i∈I, f(i) ∈ Mi. For the hypernaturals, I and U(ℕ)i∈ℕ are both just ℕ, so our sequences are functions from ℕ to ℕ. We can represent the function f: ℕ → ℕ in the familiar way: (f(0), f(1), f(2), f(3), …).

The set of all sequences is the set of all functions from ℕ to ℕ. This is the direct product Π(ℕ)i∈ℕ = ℕ.

Now, we take any ultrafilter on I = ℕ. Call it U. We use U to define the equivalence relation on the direct product ℕ:

(a(0), a(1), a(2), …) ~ (b(0), b(1), b(2), …) if and only if { i ∈ ℕ | a(i) = b(i) } ∈ U

And taking equivalence classes of this relation, we’ve recovered our original definition of the hypernatural numbers! ℕ/U = *ℕ. Now we finish up by defining all functions and relations on *ℕ.

Functions are defined pointwise:

0 = [0, 0, 0, 0, …]
S[a(0), a(1), a(2), …] = [Sa(0), Sa(1), Sa(2), …]
[a(0), a(1), a(2), …] + [b(0), b(1), b(2), …] = [a(0) + b(0), a(1) + b(1), a(2) + b(2), …]
[a(0), a(1), a(2), …] ⋅ [b(0), b(1), b(2), …] = [a(0) ⋅ b(0), a(1) ⋅ b(1), a(2) ⋅ b(2), …]

We just have one relation symbol <, and relations are defined according to the ultrafilter:

[a(0), a(1), a(2), …] < [b(0), b(1), b(2), …] iff { i ∈ ℕ | ℕ ⊨ (a(i) < b(i)) } ∈ U

And we’re done!

Łoś’s theorem

Now we’re ready to prove Łoś’s theorem in its full generality. First, let’s state the result:

Fix any index set I, any language L, and any family of L-structures (Mi)i∈I. Choose a free ultrafilter U on I and construct the ultraproduct structure Π(Mi)i∈I/U. Łoś’s theorem says:

For every L-sentence φ, Π(Mi)i∈I/U ⊨ φ if and only if { i ∈ I | Mi ⊨ φ } ∈ U.

A special case of this is where our ultraproduct is an ultrapower of M, in which case it reduces to:

For every L-sentence φ, MI/U ⊨ φ if and only if M ⊨ φ

In other words, any ultrapower of M is elementary equivalent to M!

The proof is by induction on the set of all L-formulas.

Base case: φ is atomic

Atomic sentences are either of the form R(t1, …, tn) or (t1 = t2) for an n-ary relation symbol R and terms t1, … tn.

Suppose φ is R(t1, …, tn). This case is easy: it was literally the way we defined the interpretation of relation symbols in the ultraproduct model that Π(Mi)i∈I/U ⊨ R(t1, …, tn) if and only if {i ∈ I | Mi ⊨ R(t1, …, tn)} ∈ U.

Suppose φ is (t1 = t2). t1 and t2 are terms, so the ultraproduct model (Π(Mi)i∈I/U) interprets them as I-sequences, i.e. functions from I to U(Mi)i∈I such that t1(i) and t2(i) are both in Mi. We’ll write the denotations of t1 and t2 as [t1] and [t2]. Now, [t1] = [t2] iff { i ∈ I | t1 = t2 } ∈ U iff { i ∈ I | Mi ⊨ (t1 = t2) } ∈ U, which is what we want.

Inductive step: φ is ¬ψ, ψ∧θ, or ∃x ψ

Assume that Los’s theorem holds for ψ and θ. Now we must show that it holds for φ

Suppose φ is ¬ψ.

Then Π(Mi)i∈I/U ⊨ φ
iff Π(Mi)i∈I/U ⊭ ψ
iff { i ∈ I | Mi ⊨ ψ } ∉ U (by the inductive hypothesis)
iff { i ∈ I | Mi ⊨ ψ }c ∈ U (by the ultra property of U)
iff { i ∈ I | Mi ⊭ ψ } ∈ U

iff { i ∈ I | Mi ⊨ ¬ψ } ∈ U
iff { i ∈ I | Mi ⊨ φ } ∈ U

Suppose φ is ψ∧θ.

Then Π(Mi)i∈I/U ⊨ φ
iff Π(Mi)i∈I/U ⊨ ψ and Π(Mi)i∈I/U ⊨ θ
iff { i ∈ I | Mi ⊨ ψ } ∈ U and { i ∈ I | Mi ⊨ θ } ∈ U (by the inductive hypothesis)
iff { i ∈ I | Mi ⊨ ψ } ⋂ { i ∈ I | Mi ⊨ θ } ∈ U (by closure-under- of U)
iff { i ∈ I | Mi ⊨ ψ and Mi ⊨ θ } ∈ U
iff { i ∈ I | Mi ⊨ ψ∧θ } ∈ U
iff { i ∈ I | Mi ⊨ φ } ∈ U

Suppose φ is ∃x ψ.

Then Π(Mi)i∈I/U ⊨ φ
iff Π(Mi)i∈I/U ⊨ ∃x ψ
iff Π(Mi)i∈I/U ⊨ ψ(a) for some [a] ∈ Π(Mi)i∈I/U
iff { i ∈ I | Mi ⊨ ψ(a(i)) } ∈ U (by the inductive hypothesis)
iff { i ∈ I | Mi ⊨ ∃x ψ(x) } ∈ U

And that completes the proof! We don’t need to consider ∨, →, ↔, or ∀, because these can all be defined in terms of ¬, ∧, and ∃.

Now we know that the first-order properties of ultraproducts are tied closely to those of their component structures. The ultraproduct of any collection of two-element structures is itself a two-element structure. Same with the ultraproduct of any collection of structures, cofinitely many of which are two-element structures!

The ultraproduct of any collection of PA models is itself a PA model. The ultraproduct of any collection of groups is itself a group. But the ultraproduct of all finite groups need not itself be finite, because “I am finite” isn’t first-order expressible.

And in particular, an ultrapower of a structure M perfectly mimics ALL of the first-order properties of M!

Łoś’s theorem is an incredibly powerful tool we can wield to illuminate the strange structure of the hypernatural numbers. We’re now positioned to discover nonstandard prime numbers, infinitely even numbers, numbers that are divisible by every standard natural number, and infinitely large prime gaps. All of this (and more) in the next post!

Next: Weird nonstandard numbers

Hypernaturals in all their glory (Ultra Series 3)

Previous: Hypernaturals simplified

What is an ultrafilter? (with pretty pictures)

To define an ultrafilter we need to first define a filter. Here’s a pretty good initial intuition for what a filter is: a filter on a set X is a criterion for deciding which subsets of X are “large”. In other words, a filter provides us one way of conceptualizing the idea of large and small subsets, and it allows us to do so in a way that gives us more resolution than the cardinality approach (namely, assess size of sets just in terms of their cardinality). For example, in a countably infinite set X, the cofinite subsets of X (those that contain all but finitely many elements of X) have the same cardinality as the subsets of X that are infinite but not cofinite. But there’s some intuitive sense in which a set that contains all but finitely many things is larger than a set that leaves out infinitely many things. Filters allow us to capture this distinction.

Alright, so given a set X, a filter F on X is a collection of subsets of X (i.e. it’s a subset of 𝒫(X)) that satisfies the following four conditions:

(i) X ∈ F
(ii) ∅ ∉ F
(iii) If A ⊆ B and A ∈ F, then B ∈ F
(iv) If A ∈ F and B ∈ F, then A ⋂ B ∈ F

In other words, a filter on X is a set of subsets of X that contains X, doesn’t contain the empty set, and is closed under supersets and intersection. Note that a filter is also closed under union, because of (iii) (the union of A and B is a superset of A).

An ultrafilter is a filter with one more constraint, namely that for any subset of X, either that subset or its complement is in the filter.

(v) For any A ⊆ X, either A ∈ F or (X\A) ∈ F.

There’s a nice way to visualize filters and ultrafilters that uses the Hasse diagram of the power set of X. For a concrete example, let X = {a, b}. We can draw the power-set of X as follows:

We draw an arrow from A to B when A is a subset of B. Now, what are the possible filters on X? There are three, see if you can find them all before reading on.

Only two of these are ultrafilters. Which two?

Remember that for an ultrafilter U, every subset or its complement is in U. So an ultrafilter always contains half of all subsets. This gives an easy way to rule out the first one.

Another example: let X = {a, b, c}. Then the power-set of X looks like:

Note that we’ve left out some arrows, like the arrow from {a} to {a,b,c}. This is okay, because transitivity of the subset relation makes this arrow redundant. Anyway, what are some filters on X? Here are three of them:

Only one of these is an ultrafilter! You should be able to identify it pretty easily. See if you can pick out the other four filters, and identify which of them are ultrafilters (there should be two). And another exercise: why is the following not a filter?

Does it have any extension that’s an ultrafilter?

One thing to notice is that in all of these examples, when something is in the filter then everything it points to is also in the filter. This corresponds to ultrafilters being closed under supersets. Also, for any two things in the filter, their meet (their greatest lower bound; the highest set on the diagram that points to both of them) is also is the filter. This corresponds to closure under intersections.

Imagine that there is a stream flowing up the Hasse diagram through all the various paths represented by arrows. Choose any point on the diagram and imagine dripping green dye into the water at that point. The green color filters up through the diagram until it reaches the top. And everything that’s colored green is in the filter! This captures the idea that filters are closed under superset, but what about intersection? If X is finite, this corresponds to the dye all coming from a single source, rather than it being dripped in at multiple distinct points. The infinite case is a little trickier, as we’ll see shortly.

One other important thing to notice is that whenever we had an ultrafilter, it always contained a singleton. An ultrafilter that contains a singleton is called a principal ultrafilter, and an ultrafilter that doesn’t contain any singletons is called a free ultrafilter. So far we haven’t seen any free ultrafilters, and in fact as long as X is finite, any ultrafilter on X will be principal. (Prove this!) But the situation changes when X is an infinite set.

The Hasse diagram for an infinite set is a bit harder to visualize, since now we have uncountably many subsets. But let’s try anyway! What does the Hasse diagram of ℕ look like? Well, we know that ∅ is at the bottom and ℕ is at the top, so let’s start there.

Next we can draw all the singleton sets. ∅ points at all of these, so we’re not going to bother drawing each individual arrow.

Next we have all the pair sets, and then the triples. Each singleton points at infinitely many pairs, and each pair points at infinitely many triples.

And so on through all finite cardinalities.

Now what? We’ve only exhausted all the finite sets. We can now start from the top with the cofinite sets, those that are missing only finitely many things. First we have the sets that contain all but a single natural number:

Then the sets containing all but a pair of naturals, and so on through all the cofinite sets.

But we’re not done yet. We haven’t exhausted all of the subsets of ℕ; for instance the set of even numbers is neither finite nor cofinite. In fact, there are only countably many finite and cofinite sets, but there are uncountably many subsets of ℕ, so there must be a thick intermediate section of infinite sets that are not cofinite (i.e. infinite sets with infinite complements).

A sanity check that this diagram makes sense: start with a finite set and then add elements until you have a cofinite set. Between the finite set and the cofinite set there’s always an intermediate set that’s infinite but not cofinite. This matches with our image: any path from the finite to the cofinite passes through the middle section.

Now, what would a filter on the naturals look like on this diagram? If our filter is principal, then we can still roughly sketch it the same way as before:

How about an ultrafilter? Depends on whether it’s principal or free. Any principal ultrafilter must look like the third image above; it must start at the “finite” section and filter upwards (remember that principal means that it contains a singleton).

Any principal ultrafilter on ℕ can be written as { A ⊆ ℕ | n ∈ A } for some n ∈ ℕ.

What about free ultrafilters? A free ultrafilter contains no singletons. This implies that it contains no finite set. See if you can come up with a proof, and only then read on to see mine.

Suppose that U is a free ultrafilter on X and contains some finite set F. U is free, so it contains no singletons. So for every a ∈ F, the singleton {a} ∉ U. By ultra, X\{a} ∈ U. By closure-under-finite-intersection, the intersection of {X\{a} | a ∈ F} is in U. So X\F ∈ U. But now we have F ∈ U and X\F ∈ U, and their intersection is ∅. So ∅ ∈ U, contradicting filter.

So a free ultrafilter must contain no finite sets, meaning that it contains all the cofinite sets. Since it’s ultra, it also contains “half” of all the intermediate sets. So visually it’ll look something like:

That’s what a free ultrafilter on the naturals would look like if such a thing existed. But how do we know that any such object actually does exist? This is not so trivial, and in fact the proof of existence uses the axiom of choice. Here’s a short proof using Zorn’s Lemma (which is equivalent to choice in ZF).

Let F be any filter on X. Consider the set Ω of all filters on X that extend F. (Ω, ⊆) is a partially ordered set, and for any nonempty chain of filters C ⊆ Ω, the union of C is itself a filter on X. (Prove this!) The union of C is also an upper bound on C, meaning that every nonempty chain of filters has an upper bound. Now we apply Zorn’s Lemma to conclude that there’s a maximal filter U in Ω. Maximality of U means that U is not a subset of V for any V ∈ Ω.

Almost done! U is maximal, but is it an ultrafilter? Suppose not. Then there’s some A in X such that A ∉ U and (X\A) ∉ U. Simply extend U by adding in A and all supersets and intersections. This is a filter that extends F and contains U, contradicting maximality. So U is an ultrafilter on X!

Now, F was a totally arbitrary filter. So we’ve shown that every filter on X has an ultrafilter extension. Now let X be infinite and take the filter on X consisting of all cofinite subsets of X (this is called the Fréchet filter). Any ultrafilter extension of the Fréchet filter also contains all cofinite subsets of X, and thus contains no singletons. So it’s free! Thus any infinite set has a free ultrafilter.

Hypernatural numbers

Still with me? Good! Then you’re ready for the full definition of the hypernatural numbers, using ultrafilters. Take any free ultrafilter U on ℕ. U contains all cofinite sets and no finite sets, and is also decisive on all the intermediate sets. If you remember from the last post, this makes U a perfect fit for our desired “decisiveness criterion”.

Now consider the set of all countable sequences of natural numbers. Define the equivalence relation ~ on this set as follows:

(a1, a2, a3, …) ~ (b1, b2, b3, …) iff { k ∈ ℕ | ak = bk } ∈ U

Note the resemblance to our definition last post:

(a1, a2, a3, …) ~ (b1, b2, b3, …) iff { k ∈ ℕ | ak = bk } is cofinite

This previous definition corresponded to using the Fréchet filter for our criterion. But since it was not an ultrafilter, it didn’t suffice. Now, with an ultrafilter in hand, we get decisiveness!

Addition and multiplication on the hypernaturals is defined very easily:

[a1, a2, a3, …] + [b1, b2, b3, …] = [a1+b1, a2+b2, a3+b3, …]
[a1, a2, a3, …] ⋅ [b1, b2, b3, …] = [a1⋅b1, a2⋅b2, a3⋅b3, …]

Let’s now define < on the hypernaturals.

(a1, a2, a3, …) < (b1, b2, b3, …) if { k ∈ ℕ | ak = bk } ∈ U

The proof of transitivity in the previous post still works here. Now let’s prove that < is a total order.

Consider the following three sets:

X = { k ∈ ℕ | ak < bk }
Y = { k ∈ ℕ | ak > bk }
Z = { k ∈ ℕ | ak = bk }

The intersection of any pair of these sets is empty, meaning that at most one of them is in U. Could none of them be in U? Suppose X, Y, and Z are not in U. Then ℕ\X and ℕ\Y are in U. So (ℕ\X) ⋃ (ℕ\Y) is in U as well. But (ℕ\X) ⋃ (ℕ\Y) = Z! So Z is in U, contradicting our assumption.

So exactly one of these three sets is in U, meaning that a < b or b < a or a = b. This proves that using an ultrafilter really has fixed the problem we ran into previously. This problem was that the hypernaturals were quite different from the naturals in undesirable ways (like < not being a total order). The natural question to ask now is “Just how similar are the hypernaturals to the naturals?”

The answer is remarkable. It turns out that there are no first-order expressible differences between the naturals and the hypernaturals! Any first-order sentence that holds true of the natural numbers also holds true of the hypernatural numbers! This result is actually just one special case of an incredibly general result called Łoś’s theorem. And in the next post we are going to prove it!

Next up: Łoś’s theorem and ultraproducts!

Hypernaturals Simplified (Ultra Series 2)

Previous: Introduction

What is a hypernatural number? It is a collection of infinitely long sequences of natural numbers. More precisely, it is an equivalence class of these infinite sequences.

An equivalence class under what equivalence relation? This is a little tricky to describe.

I’ll start with a slight lie to simplify the story. When we see the trouble that results from our simple definition, I will reveal the true nature of the equivalence relation that gives the hypernaturals. In the process you’ll see how the notion of an ultrafilter naturally arises.

So, hypernaturals are all about infinite sequences of natural numbers. Some examples of these:

(0,1,2,3,4,5,6,7,8,9,…)
(0,1,0,2,0,3,0,4,0,5,…)
(1,2,1,2,1,2,1,2,1,2,…)
(0,2,4,6,8,10,12,14,…)
(3,1,4,1,5,9,2,6,5,3,…)

We’ll define an equivalence relation ~ between sequences as follows:

Let x and y be infinite sequences of natural numbers.
Then x ~ y iff x and y agree in all but finitely many places.

For example, (0,1,2,3,4,5,6,…) ~ (19,1,2,3,4,5,6,…), because these two sequences only disagree at one spot (the zero index).

(1,1,2,2,4,4,…) and (1,2,4,8,…) are not equivalent, because these sequences disagree at infinitely many indices (every index besides the zeroth index).

Same with (0,1,2,3,4,5,6,…) and (1,2,3,4,5,6,7,…); even though they look similar, these sequences disagree everywhere.

(2,4,6,8,10,12,14,…) and (2,0,6,0,10,0,14,…) are not equivalent, because these sequences disagree at infinitely many indices (every odd index).

One can easily check that ~ is an equivalence relation, and thus it partitions the set of sequences of naturals into equivalence classes. We’ll denote the equivalence class of the sequence (a1, a2, a3, …) as [a1, a2, a3, …]. These equivalence classes are (our first stab at the definition of) the hypernaturals!

For instance, the equivalence class of (0,0,0,0,0,…) contains (1,4,2,0,0,0,0,0,…), as well as (0,2,4,19,0,0,0,0,…), and every other sequence that eventually agrees with (0,0,0,0,…) forever. So all of these correspond to the same hypernatural number: [0,0,0,0,…]. This object is our first hypernatural number! It is in fact the hypernatural number that corresponds exactly to the ordinary natural number 0. In other words 0 = [0,0,0,0,…].

[1,1,1,1,…] is a distinct equivalence class from [0,0,0,0,…]. After all, the sequences (0,0,0,0,…) and (1,1,1,1,…) disagree everywhere. You might guess that [1,1,1,1,…] is the hypernatural analogue to the natural number 1, and you’d be right!

For any standard natural number N, the corresponding hypernatural number is [N,N,N,N,N,…], the equivalence class of the sequence consisting entirely of Ns.

Now consider the hypernatural [0,1,2,3,4,5,6,…]. Let’s call it K. Does K = N for any standard natural number N? In other words, is (0,1,2,3,4,5,6,…) ~ (N,N,N,N,N,…) true for any finite N? No! Whatever N you choose, it will only agree with (0,1,2,3,4,5,6,…) at one location. We need cofinite agreement, and here we have merely finite agreement. Not good enough! This means that K is our first nonstandard natural number!

How does K relate to the standard naturals in terms of order? We haven’t talked about how to define < on the hypernaturals yet, but it’s much the same as our definition of =.

[a1, a2, a3, …] = [b1, b2, b3, …]
iff
{ k∈ℕ | ak = bk } is cofinite

[a1, a2, a3, …] < [b1, b2, b3, …]
iff
{ k∈ℕ | ak < bk } is cofinite

Exercise: Verify that this is in fact a well-defined relation. Every equivalence class has many different possible representatives; why does the choice of representatives not matter for the purpose of describing order?

Now we can see that K > N for every standard N. Look at (0,1,2,3,4,5,…) and (N,N,N,N,…). The elements of the first sequence are only less than the elements of the second sequence at the first N indices. Then the elements of K are greater than the elements of N forever. So elements of K’s representative sequence are greater than elements of N’s representative sequence in a cofinite set of indices. Thus, K > N for every standard N. So K is an infinitely large number!

Here’s another one: K’ = [0,2,4,6,8,…]. You can see that K’ > K, because the elements of K’ are greater than those of K at all but one index (the first one). So we have another, bigger, infinite number.

Addition and multiplication are defined elementwise, so

K + K
= [0,1,2,3,4,…] + [0,1,2,3,4,…]
= [0+0, 1+1, 2+2, 3+3, 4+4, …]
= [0,2,4,6,8,…]
= K’

K’
= [0,2,4,6,8,…]
= [2⋅0, 2⋅1, 2⋅2, 2⋅3, 2⋅4, …]
= 2⋅[0,1,2,3,4,…]
= 2⋅K

Predictably, we get many many infinities. In fact, there are continuum many nonstandard hypernatural numbers!

Proof: we construct an injection f from ℝ to *ℕ. If x is a real number, then f(x) := [floor(x), floor(10x), floor(100x), floor(1000x), …]. For example, f(35.23957…) = [35,352,3523,35239,352395, …]. For any two distinct reals x and y, the sequences x and y will eventually disagree forever. So each real is mapped to a distinct hypernatural, meaning that there are no more reals than hypernaturals. At the same time, there are no more hypernaturals than reals, because there are only continuum many countable sequences of natural numbers. So |*ℕ| = |ℝ|.

It turns out that every nonstandard hypernatural number is also larger than every standard natural number. We’ll see why in a bit, but it’ll take a bit of subtlety that I’ve yet to introduce.

Now, > is transitive in ℕ. Is it also transitive in *ℕ? Yes! Suppose A > B and B > C. Choose any representative sequences (a1, a2, a3, …), (b1, b2, b3, …), and (c1, c2, c3, …) for A, B, and C. Then X = { k∈ℕ | ak > bk } and Y = { k∈ℕ | bk > ck } are both cofinite. The intersection of cofinite sets is also cofinite, meaning that X⋂Y = { k∈ℕ | ak > bk and bk > ck } = { k∈ℕ | ak > ck } is cofinite. So A > C!

It’s a good sign that > is transitive. But unfortunately, the story I’ve told you thus far starts to break down here. The greater-than relation is a total order on the natural numbers. For any naturals a and b, exactly one of the following is true: a = b, a > b, b > a. But this is not true of the hypernaturals!

Consider the two hypernatural numbers n = [0,1,0,1,0,1,…] and m = [1,0,1,0,1,0,…]. Are n and m equal? Clearly not; they disagree everywhere. So n ≠ m.

Is n > m? No. The set of indices where n’s sequence is greater than m’s sequence is {1, 3, 5, 7, …}, which is not cofinite.

So is m > n? No! The set of indices where m’s sequence is greater than n’s sequence is {0, 2, 4, 6, …}, which is also not cofinite!

So as we’ve currently defined the hypernatural numbers, the > relation is not a total relation on them. This might be fine for some purposes, but we’ll be interested in defining the hypernaturals to mirror the properties of the naturals as closely as possible. So we’ll have to tweak our definition of the hypernaturals. The tweak will occur way back at the start where we defined our equivalence relation on sequences of naturals.

Recall: we said that two sequences (a1, a2, a3, …) and (b1, b2, b3, …), are equivalent if they agree in all but finitely many places. Said another way: a ~ b if { k∈ℕ | ak = bk } is cofinite. We defined > similarly: a > b if the agreement set for > is cofinite.

The problem with this definition was that it wasn’t definitive enough. There are cases where the agreement set is neither cofinite nor finite. (Like in our previous example, where the agreement set was the evens.) In such cases, our system gives us no direction as to whether a > b or b > a. We need a criterion that still deals with all the cofinite and finite cases appropriately, but also gives us a definitive answer in every other case. In other words, for ANY possible set X of indices of agreement, either X or X’s complement must be considered “large enough” to decide in its favor.

For example, maybe we say that if { k∈ℕ | ak = bk } = {0,2,4,6,8,…}, then a > b. Now our criterion for whether a > b is: the set of indices for which ak = bk is either cofinite OR it’s the evens. This implies that [1,0,1,0,1,0,…] > [0,1,0,1,0,1,…].

Once we’ve made this choice, consistency forces us to also accept other sets besides the evens as decisive. For instance, now compare (0,0,1,0,1,0,…) and (0,1,0,1,0,1,…). The set of indices where the first is greater than the second is {2,4,6,8,…}. But notice that the first differs only cofinitely from (1,0,1,0,…), meaning that [0,0,1,0,1,0,…] = [1,0,1,0,1,0,…]. The conclusion is that [0,0,1,0,1,0,…] > [0,1,0,1,0,1,…], which says that the set of indices {2,4,6,8,…} must also be decisive. And in general, once we’ve accepted the evens as a decisive set of indices, we must also accept the evens minus any finite set.

The three criterion we’ve seen as desirable for what sets of indices will count as decisive are (1) includes all cofinite sets, (2) for any set X, either X or X’s complement is decisive, and (3) consistency. These requirements turn out to correspond perfectly to a type of mathematical object called a free ultra-filter!

In the next post, we will define ultrafilters and finalize our definition of the hypernatural numbers!

Ultrafilters, Ultraproducts, and Hypernaturals 1: Introduction

This is the series I wish would have existed a month ago when I first started learning about ultrafilters and ultraproducts. First of all, what’s the motivation for learning about these ultra-objects? I imagine that if you’re here, you probably already have some degree of interest in ultrafilters, ultraproducts, or nonstandard models of arithmetic. But I’ll see if I can bolster that interest a little bit.

The most exciting application to me personally is that ultraproducts give you a recipe for constructing new mathematical structures out of familiar ones. Out of some model M one can construct a a new model *M, which typically has a much richer and stranger structure, but is nonetheless elementarily equivalent to M (this is called Łoś’s theorem). Elementary equivalence means that all the expressive resources of first-order logic are insufficient to tell M and *M apart: they agree on all first-order sentences.

For example, the ultraproduct of ℝ (the real numbers) is *ℝ, the hyperreal numbers. The hyperreals contain an enormous supply of infinitesimal quantities clustered around every real, as well as infinitely large quantities. And all the usual operations on ℝ, like addition and multiplication, are already nicely defined in *ℝ, meaning that these infinitesimals and infinities have a well-defined algebraic structure. And Łoś’s theorem tells us that ℝ and *ℝ are elementary equivalent: any first-order sentence true of the reals is also true of the hyperreals.

Another example: the ultraproduct of ℕ (the natural numbers) is *ℕ, the hypernaturals. The hypernaturals don’t contain any infinitesimals, but they do contain an uncountable infinity of infinite numbers. And since *N and N are elementarily equivalent, *ℕ is a model of true arithmetic! This is super exciting to me. True arithmetic is this unimaginably complicated set of sentences; there’s no Turing machine that decides which sentences reside in true arithmetic, nor is there a Turing machine with a halting oracle that does the job, nor even a Turing machine with a halting oracle for Turing machines with halting oracles! The computational complexity of true arithmetic is the limit of this sequence of progressively harder halting problems: you can only decide the set if you have an oracle for the halting problem, plus an oracle for the halting problem for TMs with oracles for the halting problem, plus an oracle for the halting problem for TMs with oracles for the halting problem for TMs with oracles for the halting problem, and so on. This level of complexity is known as 0(ω).

So true arithmetic is this impossibly uncomputable set of sentences. We know trivially that ℕ is a model of true arithmetic, because that’s how true arithmetic is defined { φ | ℕ ⊨ φ }. And we also know that true arithmetic has nonstandard models, because it’s a first-order theory and no first order theory can categorically define ℕ (I think the easiest way to see why this is true is to apply the Löwenheim-Skolem theorem, but – self-promotion alert – there’s also this very nice and simple proof from first principles). And ultraproducts allow us to construct an explicit example of one of these nonstandard models! You can actually write down some of these nonstandard numbers (they look like infinite sequences of natural numbers) and discover their strange properties. For example, (2, 3, 5, 7, 11, …) is a nonstandard prime number, (1, 2, 4, 8, 16, …) is infinitely even, and (0!, 1!, 2!, 3!, 4!, …) is a multiple of every standard natural number. We’ll dive into all of this soon enough.

Ultrafilters and ultraproducts also have applications outside of logic. A fairly basic result about ultrafilters (that every ultrafilter over a finite set contains a singleton) is equivalent to Arrow’s impossibility theorem in voting theory (that any voting system with unanimity, independence of irrelevant alternatives (IIR), and finitely many voters contains a dictator). And the existence of a singleton-free ultrafilter over an infinite set (a free ultrafilter) shows that with infinitely many voters, there’s a non-dictatorial voting system with unanimity and IIR. There’s a pretty good description of those results here, but finish reading my post first!

Nick Bostrom in an old paper titled Infinite Ethics describes how to apply ultrafilters to resolve some of the issues that arise when trying to apply utilitarian ethics in a universe with infinitely many experiencers. Suppose that the current universe contains an infinite number of experiencers, each of whom is having an experience with a valence of +1. By pressing a button, you can immediately transition to a universe where everybody is now experiencing a valence of +2. Clearly pressing the button is a utilitarian good. But 1+1+1+… = ∞ = 2+2+2+…. So it looks like a standard account of utility as real numbers says that the utility of the +2 world is the same as the utility of the +1 world (both just ∞). But if we treat utilities as hyperreal numbers instead of ordinary real numbers, then we can do the comparison and get the right answer. It’s worth noting that this approach doesn’t fix all the problems raised by an infinite universe; for instance, if pressing the button only increases a finite number of experiencers from valence +1 to valence +2, then the net utility of the universe is unchanged, even with hyperreal utilities. (This corresponds to a property that we’ll see soon, which is that a free ultrafilter contains no finite sets.)

Okay, introduction over. Hopefully you’re now excitedly asking yourself the question “So what are these ultrafilters and ultraproducts anyway??” The story begins in the next post!

Hypernaturals Simplified

Even Numbers are Tautologies

To each atomic proposition P assign a natural number p. If that natural number is even, then its corresponding proposition is true. If the number is odd then the proposition is false.

If you have two numbers n and m and you multiply them, the result is even so long as either n or m is even. So multiplication is like or; P∨Q corresponds to pq.

Negation is like adding 1: even becomes odd and odd becomes even. So ¬P corresponds to p+1.

Consider addition: p+q is even if p and q are both even or both odd. So p+q is like the biconditional ↔.

Other connectives can be formed out of these. Take P∧Q: P∧Q is equivalent to ¬(¬P∨¬Q), which is (p+1)(q+1) + 1. So P∧Q corresponds to pq+p+q+2. P→Q is ¬P∨Q, which is (p+1)q.

The logical constants ⊤ and ⊥ correspond to numerical constants: ⊤ can be assigned 0 and ⊥ assigned 1, for instance.

Tautologies translate to algebraic expressions which are always even. For instance, P→P translates to (p+1)p, which is always even. P→(Q→P) translates to (p+1)(q+1)p, which is also always even. P∨¬P translates to p(p+1), always even.

Contradictions translate to algebraic expressions which are always odd. P∧¬P translates to (p+1)(p+2) + 1, which is always odd. And so on.

Inference rules can be understood as saying: if all the premises are even, then the conclusion will be even. Take conjunction elimination: P∧Q ⊨ P. This says that if (p+1)(q+1) + 1 is even then p is even, which ends up being right if you work it out. Modus ponens: P, P→Q ⊨ Q. This says that if p and (p+1)q are both even, then q is even. Again works out!

You can also work the other way: For any number p, p(p+1) is even. Translating into propositional logic, this says that P∨¬P is a tautology. We’ve proved the law of the excluded middle! It’s interesting to note that earlier we saw that P→P translates to (p+1)p. So in some sense the law of the excluded middle is just self-implication but with the two products reversed!