Forcing and the Independence of CH (Part 2)

Part 1 here

Part 2: How big is 𝒫(ω)?

Now the pieces are all in place to start applying forcing to prove some big results. Everything that follows assumes the existence of a countable transitive model M of ZFC.

First, a few notes on terminology.

The language of ZFC is very minimalistic. All it has on top of the first-order-logic connectives is a single binary relation symbol ∈. Nonetheless, people will usually talk about theorems of ZFC as if it has a much more elaborate syntax, involving symbols like ∅ and ω and 1 and terms like “bijection” and “ordinal”. For instance, the Continuum Hypothesis can be written as “There’s a bijection between 𝒫(ω) and 1“, which is obviously using much more than just ∈. But these terms are all simply shorthand for complicated phrases in the primitive language of ZFC: for instance a sentence “φ(∅)” involving ∅ could be translated as “∃x (∀y ¬(y ∈ x) ∧ φ(x))”, or in other words “there’s some set x that is empty and φ(x) is true”.

Something interesting goes on when considering symbols like ∅ and ω and 1 that act like proper names. Each name picks out a unique set, but for some of these names, the set that gets picked out differs in different models of ZFC. For example, the interpretation of the name “1” is model-relative; it’s meant to be the first uncountable cardinal, but there are countable models of ZFC in which it’s actually only countably large! If this makes your head spin, it did the same for Skolem: you can find more reading here.

On the other hand, the name “∅” always picks out the same set. In every model of ZFC, ∅ is the unique set that contains nothing at all. When a name’s interpretation isn’t model-relative, it’s called absolute. Examples include “∅” and “1,729”. When a name isn’t absolute, then we need to take care to distinguish between the name itself as a syntactic object, and the set which it refers to in a particular model of ZFC. So we’ll write “1” when referring to the syntactic description of the first uncountable cardinal, and 1M when referring to the actual set that is picked out by this description in the model M.

With that said, let’s talk about a few names that we’ll be using throughout this post:

In M, ω is the conventional name for the set that matches the description “the intersection of all inductive sets”, where inductive sets are those that contain the ordinal 0 and are closed under successor. In any transitive model of ZFC like M, ω is exactly ℕ, the set of natural numbers. Since we’re restricting ourselves to only considering transitive models, we can treat ω as if it’s unambiguous; its interpretation won’t actually vary across the models we’re interested in.

ω1 corresponds to the description “the smallest uncountable ordinal”. This description happens to perfectly coincide with the description “the first uncountable cardinal” in ZFC, which has the name 1. But unlike with ω, different transitive models pick out different sets for ω1 and 1. So in a model M, we’ll write ω1M for the set that M believes to be the smallest uncountable ordinal and 1M for the first uncountable cardinal.

ω2 is the name for the first ordinal for which there’s no injection into ω1 (i.e. the first ordinal that’s larger than ω1). Again, this is not absolute: ω2M depends on M (although again, ω2M is the same as 2M no matter what M is). And in a countable model like those we’ll be working with, ω2M is countable. This will turn out to be very important!

𝒫(ω) is the name for the the set that matches the description “the set of all subsets of ω”. Perhaps predictably at this point, this is not absolute either. So we’ll have to write 𝒫(ω)M when referring to M’s version of 𝒫(ω).

Finally, ZFC can prove that there’s a bijection from 𝒫(ω) to , meaning that this bijection exists in every model. Thus anything we prove about the size of 𝒫(ω) can be carried over to a statement about the size of . Model-theoretically, we can say that in every model M, |𝒫(ω)M| = |M|. It will turn out to be more natural to prove things about 𝒫(ω) than .

Okay, we’re ready to make the continuum hypothesis false! We’ll do this by choosing a partially ordered set (P, ≤) whose extension as a Boolean algebra (B, ≤) has the property that no matter what M-generic filter G in B you choose, M[G] will interpret its existence as proof that |𝒫(ω)| ≥ ℵ2.

Making |𝒫(ω)| ≥ 2

Let P = { f ∈ M | f is a finite partial function from ω2M × ω to {0,1} }. Let’s have some examples of elements of P. The simplest is the empty function ∅. More complicated is the function {((ω+1, 13), 1)}. This function’s domain has just one element, the ordered pair (ω+1, 13), and this element is mapped to 1. The function {((14, 2), 0), ((ω1M•2, 2), 1)} is defined on two elements: (14, 2) and (ω1M•2, 2). And so on.

We’ll order P by reverse inclusion: f ≤ g iff f ⊇ g. Intuitively, f ≤ g if f is a function extension of g: f is defined everywhere that g is and they agree in those places. For instance, say f = {((ω, 12), 0)}, g = {((ω, 12), 0), ((13, 5), 1)}, h = {((ω, 12), 1)}, and p = {((ω, 11), 1)}. Check your understanding by verifying that (1) g ≤ f, (2) f and h are incomparable and have no common lower bound, and (3) f and p are incomparable but do have a common lower bound (what is it?).

The empty function ∅ is a subset of every function in P, meaning that ∅ is bigger than all functions. Thus ∅ is the top element of this partial order. And every f in P is finite, allowing a nice visualization what the partial order (P, ≤) looks like:

Now, G will be an M-generic ultrafilter in the Boolean extension (B, ≤) of (P, ≤). G is the set that we’re ultimately adding onto M, so we want to know some of its properties. In particular, how will we use G to show that |𝒫(ω)| = ℵ2?

What we’re going to do is construct a new set out of G as follows: first take the intersection of P with G. (remember that G is an ultrafilter in B, so it doesn’t only contain elements of P). P⋂G will be some set of finite partial functions from ω2M × ω to {0,1}. We’ll take the union of all these functions, and call the resulting set F. What we’ll prove of F is the following:

(1) F := ⋃(P⋂G) is a total function from ω2M × ω to {0,1}
(2) For every α, β ∈ ω2M, F(α, •) is a distinct function from F(β, •).

A word on the second clause: F takes as input two things: an element of ω2M and an element of ω. If we only give F its first input, then it becomes a function from ω to {0,1}. For α ∈ ω2M, we’ll give the name Fα to the function we get by feeding α to F. Our second clause says that all of these functions are pairwise distinct.

Now, the crucial insight is that each Fα corresponds to some subset of ω, namely {n ω | Fα(n) = 1}. So F defines |ω2M|-many distinct subsets of ω. So in M[G], it comes out as true that |𝒫(ω)M| ≥ |ω2M|. It’s also true in M[G] that |ω2M| = 2M, and so we get that |𝒫(ω)M| ≠ 1M. This is ¬CH!

Well… almost. There’s one final subtlety: |𝒫(ω)M| ≠ 1M is what ¬CH looks like in the model M. For ¬CH to be true in M[G], it must be that that |𝒫(ω)M[G]| ≠ ℵ1M[G]. So what if |𝒫(ω)M| ≠ |𝒫(ω)M[G]|, or 1M ≠ 1M[G]? This would throw a wrench into our proof: it would mean that M[G] believes that M’s version of 𝒫(ω) is bigger than M’s version of 1, but M[G] might not believe that its own version of 𝒫(ω) is bigger than its version of 1. This is the subject of cardinal collapse, which I will not be going into. However, M[G] does in fact believe that 𝒫(ω)M = 𝒫(ω)M[G] and that 1M1M[G].

Alright, so now all we need to do to show that M[G] believes ¬CH is to prove (1) that F is a total function from ω2M × ω to {0,1}, and (2) that for every α, β ∈ ω2M, Fα is distinct from Fβ. We do this in three steps:

  1. F is a function.
  2. F is total.
  3. For every α, β ∈ ω2M, Fα is distinct from Fβ.

Alright so how do we know that F is a function? Remember that F = ⋃(P⋂G). What if some of the partial functions in P⋂G are incompatible with each other? In this case, their union cannot be a function. So to prove step 1 we need to prove that all the functions in P⋂G are compatible. This follows easily from two facts: that G is an ultrafilter in B and P is dense in B. The argument: G is an ultrafilter, so for any f, g ∈ P there’s some element f∧g ∈ B that’s below both f and g. All we know about f∧g is that it’s an element of B, but we have no guarantee that it’s also a member of P, our set of partial functions. In other words, we can’t say for sure that f∧g is actually a partial function from ω2M × ω to {0,1}. But now we use the fact that P is dense in B! By the definition of density, since f∧g is an element of B, P must contain some h ≤ f∧g. Now by transitivity, h ≤ f and h ≤ g. So f and g have a common function extension, meaning they must be compatible functions! Pretty magical right?

So F is a function. But how do we know that it’s total? We prove this by looking closely at the dense subsets of P. In particular, for any α ∈ ω2M and any n ∈ ω, define Dα,n := {f ∈ P | (α, n) ∈ dom(f)}. This is a dense subset of P. Why? Well, regardless of what α and n are, any function f in P is either already defined on (α, n), in which case we’re done, or it’s not, in which case f has a function extension with (α, n) in its domain. So from any element of P, you can follow the order downwards until you find a function with (α, n) in its domain. Since Dα,n is dense in P and G is M-generic, G must have some element in common with Dα,n. Thus G contains an element f of P that has (α, n) in its domain. This f is one of the functions that we union over to get F, so F must have (α, n) in its domain as well! And since α and n were totally arbitrary, F must be total.

Finally, why are the “Fα”s pairwise distinct? Again we construct dense subsets for our purposes: for any α, β ∈ ω2M, define Dα,β := {f ∈ P | fα ≠ fβ}. This is clearly dense (you can always extend any f in P to make fα and fβ disagree somewhere). So G contains an element f of P for which fα ≠ fβ. And thus it must also be true of F that Fα ≠ Fβ!

And we’re done! We’ve shown that once we add G to M, we can construct a new set F = ⋃(P⋂G) such that F encodes |ω2M|-many distinct subsets of ω, and thus that M[G] ⊨ |𝒫(ω)| ≥ ℵ2.

Making |𝒫(ω)| ≥ ℵ420

What’s great is that this argument barely relied on the “2” in ω2M. We could just as easily have started with P = {f ∈ M | f is a finite partial function from ω420M × ω to {0,1}}, constructed an M-generic filter G in the Boolean extension of P, then defined F to be ⋃(P⋂G).

G is still an ultrafilter in B and P is still dense in B, so F is still a function.

Dα,n := {f ∈ P | (α, n) ∈ dom(f)} is still a dense subset of P for any α ∈ ω420M and any n ∈ ω, so F is still total.

And Dα,β := {f ∈ P | fα ≠ fβ} is still a dense subset of P for any α, β ∈ ω420M, so all the “Fα”s are pairwise distinct.

And now F encodes |ω420M|-many distinct subsets of ω! The final step, which I’m going to skip over again, is showing that M[G] believes that |ω420M| = |ω420M[G]|, i.e. that cardinal collapse doesn’t occur.

And there we have it: this choice of P gives us a new model M[G] of ZFC that believes that |𝒫(ω)| ≥ ℵ420! Note how easy and quick this argument is now that we’ve gone through the argument for how to make M[G] believe |𝒫(ω)| ≥ ℵ2. This is a great thing about forcing: once you really understand one application, other applications become immensely simpler and easier to understand.

Making |𝒫(ω)| = ℵ1

Okay, we’ve made CH false. Now let’s make it true! This time our choice for (P, ≤) will be the following:

P = {f ∈ M | f is an M-countable partial function from ω1M to 𝒫(ω)M}. Once more we order by reverse inclusion: f ≤ g iff f ⊇ g.

A crucial thing to notice is that I’ve merely required the functions in P to be M-countable, rather than countable. For a set to be M-countable is for there to exist an injection in M from set to ω. Any M-countable set is countable, but some countable sets may not be M-countable (like M itself!). In particular, we know that ω1M is actually a countable set, meaning that if we required true countability instead of just M-countability, then P would include some total functions! But M has no injection from ω1M to ω, so no function in P can be total. This will be important in a moment!

We get our M-generic ultrafilter G and define F := ⋂(P⋂G). Now we want to show that F is a total and surjective function from ω1M to 𝒫(ω). As before, we proceed in three steps:

  1. F is a function
  2. F is total
  3. F is surjective

Step 1: G is an ultrafilter and P is dense in G, so the same argument works to show that all elements of P⋂G are compatible: (1) ultrafilter implies that any f, g in P⋂G have a least upper bound f∧g in B, (2) density of P in G implies that f∧g has a lower bound h in P, so (3) f and g have a common function extension . Thus F is a function.

Step 2: Dα := {f ∈ P | α ∈ dom(f)} is dense in P for any α ∈ ω1M, so F is total.

Step 3: DA := {f ∈ P | A ∈ image(f)} is dense in P for any A 𝒫(ω)M. Why? No f in P is total, so any f in P can be extended by adding one more point (α, A) where α ∉ dom(f). Thus A ∈ image(F) for any A 𝒫(ω), so F is surjective.

So F is a total surjective function from ω1M to 𝒫(ω)M, meaning that in M[G] it’s true that |𝒫(ω)M| ≤ ℵ1M. And since ZFC proves that |𝒫(ω)| > ℵ0, it follows that M[G] ⊨ |𝒫(ω)| = ℵ1.

You might have noticed that each of these arguments could just as easily have been made if we had started out with defining P as the finite partial functions in M from ω1M to 𝒫(ω)M. This is true! If we had done this, then we still would have ended up proving that F was a total surjective function from ω1M to 𝒫(ω)M, and therefore that |𝒫(ω)M| = ℵ1M. However, this is where cardinal collapse rears its ugly head: the final step is to prove that |𝒫(ω)M| = |𝒫(ω)M[G]| and that 1M = 1M[G]. This requires that P be the countable partial functions rather than just the finite ones.

And with that final caveat, we see that M[G] believes that |𝒫(ω)| = ℵ1.

Making |𝒫(ω)| ≤ ℵ314

Let P = {f ∈ M | f is an M-countable partial function from ω314M to 𝒫(ω)} and order by reverse inclusion: f ≤ g iff f ⊇ g.

⋂(P⋂G) is again a total and surjective function from ω314M to 𝒫(ω), following the exact same arguments as in the previous section. Cardinal collapse doesn’t occur here either, so M[G] believes that |𝒫(ω)| ≤ ℵ314.

✯✯✯

We’ve proven the independence of the Continuum Hypothesis from ZFC! We’ve done more: we’ve also shown how to construct models of ZFC in which we have lots of control over the size of 𝒫(ω). Want a model of ZFC in which |𝒫(ω)| is exactly 1729? Just force with the right P and you’re good to go! There’s something a bit unsatisfactory about all this though, which is that in each application of forcing we’ve had to skim over some details about cardinal collapse that were absolutely essential to the proof going through. I hope to go into these details in a future post to close these last remaining gaps.

Forcing and the Independence of CH (Part 1)

Part 2 here.

Part 1: What is Forcing?

Forcing is a set-theoretic technique developed by Paul Cohen in the 1960s to prove the independence of the Continuum Hypothesis from ZFC. He won a Fields Medal as a result, and to this day it’s the only Fields Medal to be awarded for a work in logic. It’s safe to say that forcing is one of the most powerful techniques that set theorists have at their disposal as they explore the set theoretic multiverse.

Forcing has an intimidating reputation, which is somewhat deserved. The technique involves many moving parts and it takes some time to see how they all fit together. I found these three resources to be excellent for learning the basics of the technique:

Each of these sources takes a slightly different approach, and I think that they complement each other quite well. I’ve produced a high-level outline of the argument for why forcing works, as well as how to use forcing to prove the independence of the CH. I don’t feel quite competent to present all of forcing from the ground up, so this outline is not meant to be self-contained. In particular, I’m gonna assume you’re familiar with the basics of first order logic and ZFC and are comfortable with Boolean algebras. I’m also going to skim over the details of cardinal collapse, which are still fairly opaque to me.

Forcing in 5 lines!

  1. Fix a countable transitive model M ⊨ ZFC and a Boolean algebra (B, ≤) ∈ M.
  2. Define MB := { f ∈ M | f: MB ⇀ B } ⊆ M.
  3. Choose an M-generic ultrafilter G ⊆ B.
  4. For every n in MB, define valG(n) := { valG(m) | (m, b) ∈ n for some b in G }.
  5. Define M[G] := { valG(n) | n ∈ MB }. This is a countable transitive model of ZFC containing all of M as well as G!

We start with a countable transitive model M of ZFC, and then construct a larger countable transitive model M[G] of ZFC that contains a new set G. Let me say a few words on each step.

Step 1
1. Fix a countable transitive model M ⊨ ZFC and a Boolean algebra (B, ≤) ∈ M.

There’s not much to say about Step 1. I suppose it’s worth noting that we’re assuming the existence of a countable transitive model of ZFC, meaning that the results of forcing are all conditioned not just on the consistency of ZFC, but on the existence of transitive models of ZFC, which is significantly stronger. (The existence of countable transitive models is no stronger than the existence of transitive models, by the downwards Lowenheim-Skolem theorem.)

Moving on!

Step 2
2. Define MB := { f ∈ M | f: MB ⇀ B }

MB is sometimes called a Boolean-valued model of ZFC. The elements of MB are called “B-names” or “B-valued sets” or just “B-sets” (as I’ll call them). These B-sets are partial functions in M that go from MB to B. “Huh? Hold on,” I hear you saying, “Isn’t that a circular definition?”

It might look circular, but it’s not; it’s inductive. First of all, regardless of what MB is, the empty set counts as a partial function from MB to B: all its inputs are in MB, all its outputs are in B, and no input gets sent to multiple outputs! ∅ is also in M, so ∅ is an element of MB. But now that we know that ∅ is in MB, we can also construct the function {(∅, b)} for each b in B. This satisfies the criterion of “being a partial function from MB to B”, and it is in M because B is in M. So for any b in B, {(∅, b)} is in MB. And now we can construct more partial functions with these new elements of MB, like {(∅, b), ({(∅, b’)}, b’’)} and so on.

If you’re still not convinced that the definition of MB is sensible, we can define it without any apparent circularity, making the induction explicit:

  • V0 =
  • Vα+1 = { f ∈ M | f is a partial function from Vα to B } for any ordinal α in M.
  • Vλ = ⋃{Vα | α ∈ λ} for any limit ordinal λ in M.
  • Finally, MB = ⋃{Vα | α ∈ M}.

Okay, so now we know the definition of MB. But what is it? How do we intuitively think about the elements of MB? One way is to think of them as “fuzzy sets”, sets that contain each other to varying degrees. In this way of seeing things, the Boolean algebra B is our extension of the trivial Boolean algebra {True, False} to a more complicated Boolean algebra that allows intermediate truth values as well. So for instance, of the B-set {(∅, b)} we can say “it contains ∅ to degree b, and contains nothing else”. Of the B-set {(∅, b), ({(∅, b’)}, b’’)} we may say “it contains ∅ to degree b, and contains [the set that contains ∅ to degree b’] to degree b’’, and nothing else.” And so on.

Every Boolean algebra has a top element ⊤ corresponding to “definite truth” and a bottom element ⊥ corresponding to “definite falsity”, so our B-sets don’t have to be entirely fuzzy. For instance the B-set {(∅, ⊤)} definitely contains ∅ and definitely doesn’t contain anything else. It can be thought of as intuitively similar to the set {∅} in M. And the B-set {(∅, ⊥)} definitely doesn’t contain ∅, or anything else for that matter. This brings up an interesting question: how is the B-set {(∅, ⊥)} any different from the B-set ∅? They have the same intuitive properties: both contain nothing (or said differently, contain each thing to degree ⊥). Well that’s right! You can think of {(∅, ⊥)} and ∅ as two different names for the same idea. They are technically distinct as names, but in Steps 3 and 4 when we collapse MB back down to an ordinary model of ZFC, the distinction between them will vanish. (The details of how exactly we assign B-values to sentences about MB are interesting in their own right, and might deserve their own post.)

So intuitively what we’ve done is take our starting model M and produce a new structure that looks a lot like M in many ways except that it’s highly ambiguous: in addition to all the old sharp sets we have all these new fuzzy sets that are unsure about their members. In the next two steps we collapse all this ambiguity back down, erasing all of the fuzziness. It turns out that there are many ways to collapse the ambiguity (one for each ultrafilter in B in fact!), but we’ll want to choose our ultrafilter very carefully so that we (1) don’t end up back where we started, (2) end up with a model of ZFC, and (3) end up with an interesting model of ZFC.

Step 3
3. Choose an M-generic ultrafilter G ⊆ B.

There’s some terminology here that you might be unfamiliar with.

Firstly, what’s an ultrafilter? It’s a nonempty proper subset of B that’s closed upwards, closed under intersection, and contains one of b or ¬b for each b in B. I talked about ultrafilters here in the context of power sets, but it generalizes easily to Boolean algebras (and there are some pretty pictures to develop your intuitions a bit).

Okay, so that’s what ultrafilters are. What is it to be a generic ultrafilter? A generic ultrafilter is one that intersects every dense subset of B. What’s a dense subset of B? A dense subset of a Boolean algebra (B, ≤) is any subset of B\{⊥} that lower-bounds all of B\{⊥}. In other words, D is a dense subset of B if for every element of B\{⊥} there’s a lower element of D. Intuitively, D is like a foundation that the rest of B\{⊥} rests upon (where “upwards” corresponds to “greater than”). No matter where you are in B\{⊥}, you can follow the order downwards and eventually find yourself at an element of D. And a generic ultrafilter is an ultrafilter that shares something in common with each of these foundational subsets.Why have we excluded the bottom element of B, ⊥? If we hadn’t done so then the dense subsets would just be any and all sets containing ⊥! Then any generic ultrafilter would have to contain ⊥, but this is inconsistent with the definition of an ultrafilter!

Finally, what’s an M-generic ultrafilter? Recalling that a generic ultrafilter has to intersect every dense subset of B, an M-generic ultrafilter only needs to intersect every dense subset of B that’s in M.

I hear a possible objection! “But you already told us that B is in M! And since ZFC has the power set axiom, don’t we know that M must also contain all of B’s subsets? If so then M-genericity is no different from genericity!” The subtlety here is that the power set axiom guarantees us that M contains a set 𝒫(B) that contains all the subsets of B that are in M. Nowhere in ZFC is there a guarantee that every subset of B exists, and in fact such a guarantee is not possible in any first-order theory whatsoever! (Can you see how this follows from the downward Lowenheim-Skolem theorem?) The power set axiom doesn’t create any subsets of B, it merely collects together all the subsets of B that already exist in M.

Now, why do we care about G being M-generic? A few steps down the line we’ll see why M-genericity is such an important property, but in all honesty my intuition is murky here as to how to motivate it a priori. For now, take it on faith that M-genericity ends up being exactly the right property to require of G for our purposes. We’ll end up making extensive use of the fact that G intersects every dense subset of B in M.

Very keen readers might be wondering: how do we know that an M-generic ultrafilter even exists? This is where the countability of M saves us: since M is countable, it only contains countably many dense subsets of B: (D0, D1, D2, …). We construct G as follows: first we choose any b0 ∈ D0. Then for any n ∈ ℕ, since Dn+1 is dense in B, we can find a dn+1 ∈ Dn+1 such that dn+1 ≤ dn. Now we have an infinite descending chain of elements from dense subsets. Define G to be the upwards closure of {dn | n ∈ ℕ}, i.e. G = {b ∈ B | b ≥ dn for some n ∈ ℕ}. This is M-generic by construction, and it’s an ultrafilter: for any b ∈ B, either {b’ ∈ B | b’ ≤ b} or {b’ ∈ B | b’ ≤ ¬b} is dense, implying that G contains either b or ¬b. (Convince yourself that it’s also closed upwards and under ∧.)

Really important note: this proof of G’s existence relies on the countability of M. But from M’s perspective, it isn’t countable; i.e. M doesn’t believe that there’s a bijection that puts each set in correspondence with an element of omega. So M can’t prove the existence of G, and in fact G is a set that doesn’t exist within M at all.

Step 4
4. For every n in MB, define valG(n) := {valG(m) | (m, b) ∈ n for some b in G}.

Ok, next we use G to define a “valuation function” valG(•). This function takes the B-sets created in Step 2 and “collapses” them into ordinary sets. It’s defined inductively: the G-valuation of a B-set n is defined in terms of the G-valuations of the elements of n. (As a side note, the fact that this definition works relies on the transitivity of our starting model! Can you see why?)

Let’s work out some simple examples. Start with n = ∅. Then valG(∅) = {valG(m) | (m, b) ∈ ∅ for some b in G} = ∅, because there is no (m, b) in ∅. Thus G evaluates ∅ as ∅. So far so good! Now for a slightly trickier one: n = {(∅, b)}, where b is some element of B.

valG( {(∅,b)} ) = {valG(m) | (m, b) ∈ {(∅, b)} for some b in G} = {valG(m) | m = ∅ & b ∈ G}

There are two cases: either b is in G or b is not in G. In the first case, {(∅, b)} is evaluated to be {∅}. In the second, {(∅,b)} is evaluated to be {}. What’s the intuition here? G is a subset of B, and we can think of it as a criterion for determining which Boolean values will be collapsed to True. And since G is an ultrafilter, every Boolean value will either be collapsed to True (if b ∈ G) or to False (if ¬b ∈ G). Recall that we thought of {(∅, b)} as a B-set that “contained ∅ to degree b”. Our G-evaluation of this set removed all fuzziness: if b was included in G then we evaluate {(∅, b)} as actually containing ∅. And if not, then we said that it didn’t. Either way, all fuzziness has been removed and we’ve ended up with an ordinary set!

The same happens for every B-set. When we feed it into the function valG(•), we collapse all its fuzziness and end up with an ordinary set, whose members are determined by our choice of B.

Step 5
5. Define M[G] := {valG(n) | n ∈ MB}. M[G] is a countable transitive model of ZFC such that M ⊆ M[G] and G ∈ M[G].

Now we simply evaluate all the B-sets and collect them together, and give the result a name: M[G]. This part is easy: the hard work has already been done in defining MB and valG. What remains is to show that this new set is a countable transitive model of ZFC containing all of M as well as G. Let’s do this now.

First of all, why is M[G] countable? Well, M is countable and MB is a subset of M, so MB is countable. And valG is a surjective function from MB into M[G], so |M[G]| ≤ |MB|.

Second, why is M[G] transitive? This follows immediately from its definition: the elements of M[G] are G-valuations, and elements of G-valuations are themselves G-valuations. So elements of elements of M[G] are themselves elements of M[G]. That’s transitivity!

Third, why is M[G] a model of ZFC? Er, we’ll come back to this one.

Fourth, why is M ⊆ M[G]? To show this, we’ll take any arbitrary set x in M and show that it exists in M[G]. To do this, we first define a canonical name for x: a B-set Nx that acts as the surrogate of x in MB. Define Nx to be {(Ny, ⊤) | y ∈ x}. (Make sure that Nx is actually in MB!) What’s the G-valuation of Nx? It’s just x! Here’s a proof by ∈-induction:

First, N = {(Ny,⊤) | y ∈ ∅} = ∅.
So valG(N) = valG(∅) = {valG(m) | (m, b) ∈ ∅ for some b in G} = ∅.

Now, assume that valG(Ny) = y for every y ∈ x.
Then valG(Nx) = {valG(Ny) | (Ny, b) ∈ Nx for some b in G} = {y | (Ny, ⊤) ∈ Nx} = {y | y ∈ x} = x.

Thus by ∈-induction, for all x, valG(Nx) = x.

Finally, why is G ∈ M[G]? We prove this by finding a B-set that G-valuates to G itself. Define A := {(Nb, b) | b ∈ B}. Then valG(A) = { valG(m) | (m, b) ∈ A for some b in G } = {valG(Nb) | b ∈ G} = {b | b ∈ G} = G.

And there it is! We’re done!

Ha ha, tried to pull a fast one on you. We still have one crucial step remaining: proving that M[G] is actually a model of ZFC! As far as I know, there’s no concise way to do this… you have to actually go through each axiom of ZFC and verify that M[G] satisfies it. I’m not going to do that here, but I will provide proofs of four axioms to give you a flavor.

Infinity: M contains a set I satisfying the axiom of infinity and M ⊆ M[G]. So I ∈ M[G], and it still satisfies infinity (still contains ∅ and is closed under successor).

Pairing: Let x and y be any two sets in M[G]. Then for some n1 and n2 in MB, x = valG(n1) and y = valG(n2). Now define n := {(n1, ⊤), (n2, ⊤)} ∈ MB. Then valG(n) = {x, y}.

Union: Let x be any element of M[G]. Since M[G] is transitive, every element y of an element of x is in M[G]. So for each such y there’s a B-set ny such that valG(ny) = y. Define n := {(ny,⊤) | y ∈ z for some z ∈ x}. Then valG(n) = {valG(ny) | y ∈ z for some z ∈ x} = {y | y ∈ z for some z ∈ x} = ⋃x.

Comprehension: Let φ(y) be any first-order formula in the language of ZFC, and x ∈ M[G]. For every element y of x, let ny be any B-set that G-valuates to y. Define n := {(ny,⊤) | y ∈ x & φ(y)}. Then valG(n) = {y ∈ x | φ(y)}.

And I leave you to fill in the rest of the axioms of ZFC for yourself. Much of it looks very similar to the proofs of pairing, union, and comprehension: make the natural choice for a B-set which G-valuates to the particular set whose existence you’re trying to prove. And THEN you’re done!

✯✯✯

Okay! If you’ve made it this far, take a breather and congratulate yourself. You now understand how to adjoin a new set to an existing model of ZFC so long as this new set is an M-generic ultrafilter in a Boolean algebra B ∈ M). And this process works equally well no matter what Boolean algebra you pick!

In fact, the choice of the Boolean algebra in step 1 is the key degree of freedom we have in this whole process. B isn’t the set that we end up adjoining to M, and in fact B is always chosen to be an element of our starting model. But the fact that G is always chosen to be an M-generic filter in B means that the structure of B greatly influences what properties G has.

You might notice that we also technically have another degree of freedom in this process: namely step 3 where I said “choose an M-generic filter G in B”. While this is technically true, in practice much of forcing is about finding really clever choices of B so that any M-generic filter in B has whatever interesting property we’re looking for.

There’s another subtlety about the choice of B, namely that in applications of forcing one generally starts with a partially ordered set (P, ≤) and then extends it to a Boolean algebra B. A really cool result with an even cooler topological proof is that this extension is always possible. For any choice of P one can construct a complete Boolean algebra B such that P is order-isomorphic to a dense subset of B. (In a sentence, we turn P into a topological space whose open sets are the downwards-closed subsets and then let B be the set of regular open sets ordered by inclusion. More on this below.)

What’s great about this is that since P is dense in B, all dense subsets of P are also dense subsets of B. This means that we can analyze properties of our M-generic ultrafilter G in B solely by looking at the dense subsets of P! This means that for many purposes you can simply ignore the Boolean algebra B and let it do its work in the background, while focusing on the partially ordered set (P, ≤) you picked out. The advantage of this should be obvious: there are many more partially ordered sets than there are Boolean algebras, so we have much more freedom to creatively choose P.

Let’s summarize everything by going over the outline we started with and filling in a few details:

Forcing in slightly more than 5 lines

  1. Fix a countable transitive model M ⊨ ZFC and a partial order (P, ≤) ∈ M.
  2. Extend P to a Boolean algebra (B, ≤) ∈ M.
    1. Define T := {A ⊆ P | A is closed downwards}.
    2. Define B := {S ∈ T | Int(Cl(S)) = S}.
    3. (B, ⊆) is a Boolean algebra: ¬A = Int(Ac), A∧B = A⋂B, A∨B = Int(Cl(A∪B)).
    4. P is order-isomorphic to a dense subset of B
      1. For p ∈ P, define S(p) := {q ∈ P | q ≤ p} ∈ B.
      2. S is an order-isomorphism from P to B and {S(p) | p ∈ P} is dense in B.
  3. Define MB := { f ∈ M | f: MB ⇀ B }.
  4. Choose an M-generic ultrafilter G ⊆ B.
    1. Since M is countable, we can enumerate the dense subsets of B in M: (D0, D1, D2, …)
    2. Choose any b0 ∈ D0.
    3. For any n ∈ ℕ, since Dn+1 is dense in B, we can find a dn+1 ∈ Dn+1 s.t. dn+1 ≤ dn.
    4. G := { b ∈ B | b ≥ dn for some n ∈ ℕ } is an M-generic ultrafilter in B.
  5. For every n in MB, define valG(n) := { valG(m) | (m, b) ∈ n for some b in G }.
  6. Define M[G] := { valG(n) | n ∈ MB }.
    1. M[G] is a countable transitive model of ZFC.
    2. M ⊆ M[G]
      1. For any x ∈ M, define Nx := {(Ny,⊤) | y ∈ x} ∈ MB.
      2. Then valG(Nx) = x, so x ∈ M[G].
    3. G ∈ M[G]
      1. Define A := {(Nb, b) | b ∈ B} ∈ MB.
      2. valG(A) = G, so G ∈ M[G].

We are now ready to apply forcing to prove the independence of CH from ZFC! In Part 2 you will learn exactly what choice of P makes the cardinality of the continuum ≥ ℵ2 (thus making CH false) and what choice of P makes the cardinality of the continuum exactly ℵ1 (thus making CH true). In fact, I’ll do you one better: in Part 2 you’ll learn what to make P in order to make the cardinality of the continuum larger than ℵ69, ℵ420, or virtually any other cardinality you like!

End of Part 1. Part 2 here.

The Ultra Series: Guide

Assumed background knowledge: basic set theory lingo (∅, singleton, subset, power set, cardinality), what is first order logic (structures, universes, and interpretations), what are ℕ and ℝ, what’s the difference between countable and uncountable infinities, and what “continuum many” means.

1 Introduction
Here I give a high-level description of what an ultraproduct is, and provide a few examples. Skippable if you want to jump straight to the math!

2 Hypernaturals Simplified
Here you get a first glimpse of the hypernaturals. It’s a fuzzy glimpse from afar, and our first attempt to define them is overly simplified and imperfect. Nonetheless, we get some good intuitions for how hypernatural numbers are structured, before eventually confronting the problem at the core of the definition.

3 Hypernaturals in all their glory
We draw some pretty pictures and introduce the concept of an ultrafilter. The concept is put to work immediately, allowing us to give a full definition of the hypernaturals with no simplifications. The issues with the previous definition have now been patched, and the hypernaturals are a well-defined structure ripe to be explored.

4 Ultraproducts and Łoś’s theorem
We describe how to pronounce “Łoś”, define what an ultraproduct is, and see how the hypernaturals are actually just the ultraproduct of the naturals. And then we prove Łoś’s theorem!

5 Infinitely Large Primes
With the newfound power of Łoś’s theorem at our hands, we return to the realm of the hypernaturals and start exploring its structure. We describe some infinitely large prime numbers, and prove that there are infinitely many of them. We find more strange infinitely large hypernatural numbers in our exploration: numbers that can be divided by 2 ad infinitum, numbers that are divisible by every finite number, and more. We learn that there’s a subset of the hypernaturals that is arranged just like the positive rational numbers, but that the hypernaturals are not dense.

6 Ultraproducts and Compactness
We zoom out from the hypernaturals, and show that ultraproducts can be used to give the prettiest proof of the compactness theorem for first order logic. We prove it first for countable theories, and then for all theories. We then get a little wild and discuss some meta-logical results involving ultraproducts, definability, and compactness.

7 All About Countable Saturation
We now describe the most powerful property of ultraproducts: countable saturation. And then we prove it! With our new tool, we dive back into the hypernaturals to learn more about their structure. We show that for any countable set of hypernaturals, there’s a hypernaturals that’s divisible by them all, and see that this entails the existence of uncountably many hypernatural primes. We prove that the hypernaturals have uncountable cofinality and coinitiality. And from this we see that no two hypernaturals are countably infinitely far apart; all distances are finite or uncountable! We wrap up with a quick proof that ultraproducts are always either finite or uncountable, and a mind-blowing result that relates ultraproducts to the continuum hypothesis.

7.5 Shorter Proof of Countable Saturation
I give a significantly shorter and conceptually simpler proof of countable saturation than the previous post. Then I wax philosophical for a few minutes about constructivism in the context of ultraproduct-related proofs.

Ultraproducts and Łoś’s Theorem (Ultra Series 4)

Previous: Hypernaturals in all their glory

First things first, you’re probably asking yourself… how is Łoś pronounced?? I’m not the most knowledgeable when it comes to Polish pronunciation, but from what I’ve seen the Ł is like a “w”, the o is like the vowel in “thought”, and the ś is like “sh”. So it’s something like “wash”. (I think.)

Ok, on to the math! This is probably going to be the hardest post in this series, so I encourage you to read through it slowly and not give up if you start getting lost. Try to work out some examples for yourself; that’s often the best way to get a grasp on an abstract concept. (In general, my biggest tip for somebody starting to dive into serious mathematical content is to not read it like fiction! In fiction, short sentences can safely be read quickly. But math is a language in which complex ideas can be expressed very compactly. So fight your natural urge to rush through short technical sentences, and don’t feel bad about taking your time in parsing them!)

What the heck is an ultraproduct

Last post we defined an ultrafilter. Now let’s define an ultraproduct.

It turns out we’ve already seen an example of an ultraproduct: the hypernaturals. The hyperreals (denoted *ℝ) are another famous example of an ultraproduct, constructed from ℝ in exactly the same way as we obtained the hypernaturals *ℕ from ℕ. For any structure M whatsoever, there exists a “hyperstructure” *M obtained from M via an ultraproduct. So what exactly is an ultraproduct?

We’ll build up to it in a series of six steps.

(1) Choose an index set I.
(2) Choose a first-order language L and a family of L-structures indexed by I (Mi)i∈I.
(3) Define sequences of elements of these structures.
(4) Define an equivalence relation between these sequences.
(5) Construct the set of equivalence classes under this equivalence relation.
(6) Define the interpretations of the symbols of L in this set.

Let’s get started!

(1) First we select an index set I. This will be the set of indices we use when constructing our sequences of elements. If we want our sequences to be countably infinite, then we choose I = ℕ.

(2) Now, fix some first-order language L = <constants, relations, functions>. Consider any family of L-structures, indexed by I: (Mi)i∈I. For instance, if I = ℕ and L is the language of group theory <{e}, ∅, {⋅}>, then our family of L-structures might be (ℤ1, ℤ2, ℤ3, ℤ4, …), where ℤk is the integers-mod-k (i.e. the cyclic group of size k).

(3) Now we consider sequences of the elements of these structures. For instance, hyperreals are built from sequences of real numbers that look like (a0, a1, a2, a3, …). The set of indices used here is {0, 1, 2, 3, …}, i.e. ℕ.

If our index set isn’t countable, then it’s not possible anymore to visualize sequences like this. For instance, if I = ℝ, then our sequence will have an element for every real number. A more general way to formulate sequences is as maps from the index set I to the elements of the component models. The hyperreal sequence (a0, a1, a2, a3, …) can be thought of as a map a: ℕ → ℝ, where a(n) = an for each n ∈ ℕ.

In general, a sequence will be defined as follows:

f: I → U(Mi)i∈I such that f(i) ∈ Mi for each i ∈ I.

In the ℤk example, a sequence might look like (0, 1, 2, 3, 4, …). Note that 0 ∈ ℤ1, 1 ∈ ℤ2, 2 ∈ ℤ3, and so on. On the other hand (2, 1, 2, 3, 4, …) would not be a valid sequence, because there’s no 2 in ℤ1.

Now, consider the set of all sequences: { f: I → U(Mi)i∈I | f(i) ∈ Mi for each i ∈ I }. This set is called the direct product of (Mi)i∈I and is denoted Π(Mi)i∈I.

(4) We want to construct an equivalence relation on this set. We do so by first defining a free ultrafilter U on I. From the previous post, we know that every infinite set has a free ultrafilter on it, so as long as our index set is infinite, then we’re good to go.

With U in hand, we define the equivalence relation on Π(Mi)i∈I:

Let f and g be sequences (f,g: I → U(Mi)i∈I).
Then f ~ g if and only if { i ∈ I | f(i) = g(i) } ∈ U

You might be getting major deja-vu from the last post. Two sequences are said to be equivalent if the set of places-of-agreement is a member of the chosen ultrafilter. Since every free ultrafilter contains all cofinite sets, any two sequences that agree in all but finitely many places will be equivalent.

(5) Now all the pieces are in place. The ultraproduct of (Mi)i∈I with respect to U is the set of equivalence classes of Π(Mi)i∈I with respect to ~. This is typically written Π(Mi)i∈I/U. For a sequence a: I → U(Mi)i∈I we’ll denote the equivalence class it belongs to as [a].

Now, it’s not necessary for our indexed sequence (Mi)i∈I of L-structures to all be distinct. In fact, all the models can be the same, in which case we have (Mi)i∈I = (M)i∈I, and the ultraproduct Π(M)i∈I/U is called an ultrapower. The ultrapower of M with index set I and ultrafilter U can be written compactly as MI/U.

Some examples: The hypernaturals are the ultrapower ℕ/U = Π(ℕ)i∈ℕ/U where U is any free ultrafilter over ℕ. Similarly, the hyperreals are ℝ/U = Π(ℝ)i∈ℕ/U. The hyperintegers are ℤ/U. And so on.

(6) So far we’ve just defined the ultraproduct as a set (the set of equivalence classes of I-indexed sequences of elements from the models (Mi)i∈I). But we want the ultraproduct to have all the same structure as the models that we used as input. In other words, the ultraproduct of a bunch of L-structures will itself be an L-structure. To make this happen, we need to specify how the relation symbols and function symbols of L work in the ultraproduct model.

Here’s how it works. If f is a unary function symbol in the language L, then we define f on Π(Mi)i∈I/U by applying the function elementwise to the sequences. So:

f: Π(Mi)i∈I/U → Π(Mi)i∈I/U is defined as f([a])(i) = f(a(i)) for every i ∈ I.

What if f is a binary function symbol? Then:

f: (Π(Mi)i∈I/U)2 → Π(Mi)i∈I/U is defined as f([a], [b])(i) = f(a(i), b(i)) for every i ∈ I.

This generalizes in the obvious way to trinary function symbols, quaternary function symbols, and so on.

What about relations? Suppose R is a unary relation symbol in the language L. We need to define R on the ultraproduct Π(Mi)i∈I/U, and we do it as follows:

Π(Mi)i∈I/U ⊨ R([a]) if and only if { i ∈ I | Mi ⊨ R(a(i)) } ∈ U.

In other words, Π(Mi)i∈I/U affirms R([a]) if and only if the set of indices i such that Mi affirms R(a(i)) is in the ultrafilter. For example, if R holds for cofinitely many members of the sequence a, then R holds of [a].

If R is binary, we define it as follows:

Π(Mi)i∈I/U ⊨ R([a], [b]) if and only if { i ∈ I | Mi ⊨ R(a(i), b(i)) } ∈ U.

And again, this generalizes in the obvious way.

This fully defines the ultraproduct Π(Mi)i∈I/U as an L-structure! (If you’re thinking ‘what about constant symbols?’, remember that constants are just 0-ary functions)

Say that again, slower

That was really abstract, so let’s go through it again with the (hopefully now-familiar) example of the hypernaturals.

We start by defining an index set I. We choose I = ℕ.

Now define the language we’ll use. This will be the standard language of Peano arithmetic: one constant symbol (0), one relation symbol (<), and three function symbols (S, +, ×).

The family of structures in this language that we’ll consider (Mi)i∈I will just be a single structure repeated: for each i, Mi will be the L-structure ℕ (the natural numbers with 0, <, +, and × defined on it). So our family of structures is just (ℕ)i∈ℕ = (ℕ, ℕ, ℕ, ℕ, …).

Our sequences are functions from I to U(Mi)i∈I such that for each i∈I, f(i) ∈ Mi. For the hypernaturals, I and U(ℕ)i∈ℕ are both just ℕ, so our sequences are functions from ℕ to ℕ. We can represent the function f: ℕ → ℕ in the familiar way: (f(0), f(1), f(2), f(3), …).

The set of all sequences is the set of all functions from ℕ to ℕ. This is the direct product Π(ℕ)i∈ℕ = ℕ.

Now, we take any ultrafilter on I = ℕ. Call it U. We use U to define the equivalence relation on the direct product ℕ:

(a(0), a(1), a(2), …) ~ (b(0), b(1), b(2), …) if and only if { i ∈ ℕ | a(i) = b(i) } ∈ U

And taking equivalence classes of this relation, we’ve recovered our original definition of the hypernatural numbers! ℕ/U = *ℕ. Now we finish up by defining all functions and relations on *ℕ.

Functions are defined pointwise:

0 = [0, 0, 0, 0, …]
S[a(0), a(1), a(2), …] = [Sa(0), Sa(1), Sa(2), …]
[a(0), a(1), a(2), …] + [b(0), b(1), b(2), …] = [a(0) + b(0), a(1) + b(1), a(2) + b(2), …]
[a(0), a(1), a(2), …] ⋅ [b(0), b(1), b(2), …] = [a(0) ⋅ b(0), a(1) ⋅ b(1), a(2) ⋅ b(2), …]

We just have one relation symbol <, and relations are defined according to the ultrafilter:

[a(0), a(1), a(2), …] < [b(0), b(1), b(2), …] iff { i ∈ ℕ | ℕ ⊨ (a(i) < b(i)) } ∈ U

And we’re done!

Łoś’s theorem

Now we’re ready to prove Łoś’s theorem in its full generality. First, let’s state the result:

Fix any index set I, any language L, and any family of L-structures (Mi)i∈I. Choose a free ultrafilter U on I and construct the ultraproduct structure Π(Mi)i∈I/U. Łoś’s theorem says:

For every L-sentence φ, Π(Mi)i∈I/U ⊨ φ if and only if { i ∈ I | Mi ⊨ φ } ∈ U.

A special case of this is where our ultraproduct is an ultrapower of M, in which case it reduces to:

For every L-sentence φ, MI/U ⊨ φ if and only if M ⊨ φ

In other words, any ultrapower of M is elementary equivalent to M!

The proof is by induction on the set of all L-formulas.

Base case: φ is atomic

Atomic sentences are either of the form R(t1, …, tn) or (t1 = t2) for an n-ary relation symbol R and terms t1, … tn.

Suppose φ is R(t1, …, tn). This case is easy: it was literally the way we defined the interpretation of relation symbols in the ultraproduct model that Π(Mi)i∈I/U ⊨ R(t1, …, tn) if and only if {i ∈ I | Mi ⊨ R(t1, …, tn)} ∈ U.

Suppose φ is (t1 = t2). t1 and t2 are terms, so the ultraproduct model (Π(Mi)i∈I/U) interprets them as I-sequences, i.e. functions from I to U(Mi)i∈I such that t1(i) and t2(i) are both in Mi. We’ll write the denotations of t1 and t2 as [t1] and [t2]. Now, [t1] = [t2] iff { i ∈ I | t1 = t2 } ∈ U iff { i ∈ I | Mi ⊨ (t1 = t2) } ∈ U, which is what we want.

Inductive step: φ is ¬ψ, ψ∧θ, or ∃x ψ

Assume that Los’s theorem holds for ψ and θ. Now we must show that it holds for φ

Suppose φ is ¬ψ.

Then Π(Mi)i∈I/U ⊨ φ
iff Π(Mi)i∈I/U ⊭ ψ
iff { i ∈ I | Mi ⊨ ψ } ∉ U (by the inductive hypothesis)
iff { i ∈ I | Mi ⊨ ψ }c ∈ U (by the ultra property of U)
iff { i ∈ I | Mi ⊭ ψ } ∈ U

iff { i ∈ I | Mi ⊨ ¬ψ } ∈ U
iff { i ∈ I | Mi ⊨ φ } ∈ U

Suppose φ is ψ∧θ.

Then Π(Mi)i∈I/U ⊨ φ
iff Π(Mi)i∈I/U ⊨ ψ and Π(Mi)i∈I/U ⊨ θ
iff { i ∈ I | Mi ⊨ ψ } ∈ U and { i ∈ I | Mi ⊨ θ } ∈ U (by the inductive hypothesis)
iff { i ∈ I | Mi ⊨ ψ } ⋂ { i ∈ I | Mi ⊨ θ } ∈ U (by closure-under- of U)
iff { i ∈ I | Mi ⊨ ψ and Mi ⊨ θ } ∈ U
iff { i ∈ I | Mi ⊨ ψ∧θ } ∈ U
iff { i ∈ I | Mi ⊨ φ } ∈ U

Suppose φ is ∃x ψ.

Then Π(Mi)i∈I/U ⊨ φ
iff Π(Mi)i∈I/U ⊨ ∃x ψ
iff Π(Mi)i∈I/U ⊨ ψ(a) for some [a] ∈ Π(Mi)i∈I/U
iff { i ∈ I | Mi ⊨ ψ(a(i)) } ∈ U (by the inductive hypothesis)
iff { i ∈ I | Mi ⊨ ∃x ψ(x) } ∈ U

And that completes the proof! We don’t need to consider ∨, →, ↔, or ∀, because these can all be defined in terms of ¬, ∧, and ∃.

Now we know that the first-order properties of ultraproducts are tied closely to those of their component structures. The ultraproduct of any collection of two-element structures is itself a two-element structure. Same with the ultraproduct of any collection of structures, cofinitely many of which are two-element structures!

The ultraproduct of any collection of PA models is itself a PA model. The ultraproduct of any collection of groups is itself a group. But the ultraproduct of all finite groups need not itself be finite, because “I am finite” isn’t first-order expressible.

And in particular, an ultrapower of a structure M perfectly mimics ALL of the first-order properties of M!

Łoś’s theorem is an incredibly powerful tool we can wield to illuminate the strange structure of the hypernatural numbers. We’re now positioned to discover nonstandard prime numbers, infinitely even numbers, numbers that are divisible by every standard natural number, and infinitely large prime gaps. All of this (and more) in the next post!

Next: Weird nonstandard numbers

Hypernaturals in all their glory (Ultra Series 3)

Previous: Hypernaturals simplified

What is an ultrafilter? (with pretty pictures)

To define an ultrafilter we need to first define a filter. Here’s a pretty good initial intuition for what a filter is: a filter on a set X is a criterion for deciding which subsets of X are “large”. In other words, a filter provides us one way of conceptualizing the idea of large and small subsets, and it allows us to do so in a way that gives us more resolution than the cardinality approach (namely, assess size of sets just in terms of their cardinality). For example, in a countably infinite set X, the cofinite subsets of X (those that contain all but finitely many elements of X) have the same cardinality as the subsets of X that are infinite but not cofinite. But there’s some intuitive sense in which a set that contains all but finitely many things is larger than a set that leaves out infinitely many things. Filters allow us to capture this distinction.

Alright, so given a set X, a filter F on X is a collection of subsets of X (i.e. it’s a subset of 𝒫(X)) that satisfies the following four conditions:

(i) X ∈ F
(ii) ∅ ∉ F
(iii) If A ⊆ B and A ∈ F, then B ∈ F
(iv) If A ∈ F and B ∈ F, then A ⋂ B ∈ F

In other words, a filter on X is a set of subsets of X that contains X, doesn’t contain the empty set, and is closed under supersets and intersection. Note that a filter is also closed under union, because of (iii) (the union of A and B is a superset of A).

An ultrafilter is a filter with one more constraint, namely that for any subset of X, either that subset or its complement is in the filter.

(v) For any A ⊆ X, either A ∈ F or (X\A) ∈ F.

There’s a nice way to visualize filters and ultrafilters that uses the Hasse diagram of the power set of X. For a concrete example, let X = {a, b}. We can draw the power-set of X as follows:

We draw an arrow from A to B when A is a subset of B. Now, what are the possible filters on X? There are three, see if you can find them all before reading on.

Only two of these are ultrafilters. Which two?

Remember that for an ultrafilter U, every subset or its complement is in U. So an ultrafilter always contains half of all subsets. This gives an easy way to rule out the first one.

Another example: let X = {a, b, c}. Then the power-set of X looks like:

Note that we’ve left out some arrows, like the arrow from {a} to {a,b,c}. This is okay, because transitivity of the subset relation makes this arrow redundant. Anyway, what are some filters on X? Here are three of them:

Only one of these is an ultrafilter! You should be able to identify it pretty easily. See if you can pick out the other four filters, and identify which of them are ultrafilters (there should be two). And another exercise: why is the following not a filter?

Does it have any extension that’s an ultrafilter?

One thing to notice is that in all of these examples, when something is in the filter then everything it points to is also in the filter. This corresponds to ultrafilters being closed under supersets. Also, for any two things in the filter, their meet (their greatest lower bound; the highest set on the diagram that points to both of them) is also is the filter. This corresponds to closure under intersections.

Imagine that there is a stream flowing up the Hasse diagram through all the various paths represented by arrows. Choose any point on the diagram and imagine dripping green dye into the water at that point. The green color filters up through the diagram until it reaches the top. And everything that’s colored green is in the filter! This captures the idea that filters are closed under superset, but what about intersection? If X is finite, this corresponds to the dye all coming from a single source, rather than it being dripped in at multiple distinct points. The infinite case is a little trickier, as we’ll see shortly.

One other important thing to notice is that whenever we had an ultrafilter, it always contained a singleton. An ultrafilter that contains a singleton is called a principal ultrafilter, and an ultrafilter that doesn’t contain any singletons is called a free ultrafilter. So far we haven’t seen any free ultrafilters, and in fact as long as X is finite, any ultrafilter on X will be principal. (Prove this!) But the situation changes when X is an infinite set.

The Hasse diagram for an infinite set is a bit harder to visualize, since now we have uncountably many subsets. But let’s try anyway! What does the Hasse diagram of ℕ look like? Well, we know that ∅ is at the bottom and ℕ is at the top, so let’s start there.

Next we can draw all the singleton sets. ∅ points at all of these, so we’re not going to bother drawing each individual arrow.

Next we have all the pair sets, and then the triples. Each singleton points at infinitely many pairs, and each pair points at infinitely many triples.

And so on through all finite cardinalities.

Now what? We’ve only exhausted all the finite sets. We can now start from the top with the cofinite sets, those that are missing only finitely many things. First we have the sets that contain all but a single natural number:

Then the sets containing all but a pair of naturals, and so on through all the cofinite sets.

But we’re not done yet. We haven’t exhausted all of the subsets of ℕ; for instance the set of even numbers is neither finite nor cofinite. In fact, there are only countably many finite and cofinite sets, but there are uncountably many subsets of ℕ, so there must be a thick intermediate section of infinite sets that are not cofinite (i.e. infinite sets with infinite complements).

A sanity check that this diagram makes sense: start with a finite set and then add elements until you have a cofinite set. Between the finite set and the cofinite set there’s always an intermediate set that’s infinite but not cofinite. This matches with our image: any path from the finite to the cofinite passes through the middle section.

Now, what would a filter on the naturals look like on this diagram? If our filter is principal, then we can still roughly sketch it the same way as before:

How about an ultrafilter? Depends on whether it’s principal or free. Any principal ultrafilter must look like the third image above; it must start at the “finite” section and filter upwards (remember that principal means that it contains a singleton).

Any principal ultrafilter on ℕ can be written as { A ⊆ ℕ | n ∈ A } for some n ∈ ℕ.

What about free ultrafilters? A free ultrafilter contains no singletons. This implies that it contains no finite set. See if you can come up with a proof, and only then read on to see mine.

Suppose that U is a free ultrafilter on X and contains some finite set F. U is free, so it contains no singletons. So for every a ∈ F, the singleton {a} ∉ U. By ultra, X\{a} ∈ U. By closure-under-finite-intersection, the intersection of {X\{a} | a ∈ F} is in U. So X\F ∈ U. But now we have F ∈ U and X\F ∈ U, and their intersection is ∅. So ∅ ∈ U, contradicting filter.

So a free ultrafilter must contain no finite sets, meaning that it contains all the cofinite sets. Since it’s ultra, it also contains “half” of all the intermediate sets. So visually it’ll look something like:

That’s what a free ultrafilter on the naturals would look like if such a thing existed. But how do we know that any such object actually does exist? This is not so trivial, and in fact the proof of existence uses the axiom of choice. Here’s a short proof using Zorn’s Lemma (which is equivalent to choice in ZF).

Let F be any filter on X. Consider the set Ω of all filters on X that extend F. (Ω, ⊆) is a partially ordered set, and for any nonempty chain of filters C ⊆ Ω, the union of C is itself a filter on X. (Prove this!) The union of C is also an upper bound on C, meaning that every nonempty chain of filters has an upper bound. Now we apply Zorn’s Lemma to conclude that there’s a maximal filter U in Ω. Maximality of U means that U is not a subset of V for any V ∈ Ω.

Almost done! U is maximal, but is it an ultrafilter? Suppose not. Then there’s some A in X such that A ∉ U and (X\A) ∉ U. Simply extend U by adding in A and all supersets and intersections. This is a filter that extends F and contains U, contradicting maximality. So U is an ultrafilter on X!

Now, F was a totally arbitrary filter. So we’ve shown that every filter on X has an ultrafilter extension. Now let X be infinite and take the filter on X consisting of all cofinite subsets of X (this is called the Fréchet filter). Any ultrafilter extension of the Fréchet filter also contains all cofinite subsets of X, and thus contains no singletons. So it’s free! Thus any infinite set has a free ultrafilter.

Hypernatural numbers

Still with me? Good! Then you’re ready for the full definition of the hypernatural numbers, using ultrafilters. Take any free ultrafilter U on ℕ. U contains all cofinite sets and no finite sets, and is also decisive on all the intermediate sets. If you remember from the last post, this makes U a perfect fit for our desired “decisiveness criterion”.

Now consider the set of all countable sequences of natural numbers. Define the equivalence relation ~ on this set as follows:

(a1, a2, a3, …) ~ (b1, b2, b3, …) iff { k ∈ ℕ | ak = bk } ∈ U

Note the resemblance to our definition last post:

(a1, a2, a3, …) ~ (b1, b2, b3, …) iff { k ∈ ℕ | ak = bk } is cofinite

This previous definition corresponded to using the Fréchet filter for our criterion. But since it was not an ultrafilter, it didn’t suffice. Now, with an ultrafilter in hand, we get decisiveness!

Addition and multiplication on the hypernaturals is defined very easily:

[a1, a2, a3, …] + [b1, b2, b3, …] = [a1+b1, a2+b2, a3+b3, …]
[a1, a2, a3, …] ⋅ [b1, b2, b3, …] = [a1⋅b1, a2⋅b2, a3⋅b3, …]

Let’s now define < on the hypernaturals.

(a1, a2, a3, …) < (b1, b2, b3, …) if { k ∈ ℕ | ak = bk } ∈ U

The proof of transitivity in the previous post still works here. Now let’s prove that < is a total order.

Consider the following three sets:

X = { k ∈ ℕ | ak < bk }
Y = { k ∈ ℕ | ak > bk }
Z = { k ∈ ℕ | ak = bk }

The intersection of any pair of these sets is empty, meaning that at most one of them is in U. Could none of them be in U? Suppose X, Y, and Z are not in U. Then ℕ\X and ℕ\Y are in U. So (ℕ\X) ⋃ (ℕ\Y) is in U as well. But (ℕ\X) ⋃ (ℕ\Y) = Z! So Z is in U, contradicting our assumption.

So exactly one of these three sets is in U, meaning that a < b or b < a or a = b. This proves that using an ultrafilter really has fixed the problem we ran into previously. This problem was that the hypernaturals were quite different from the naturals in undesirable ways (like < not being a total order). The natural question to ask now is “Just how similar are the hypernaturals to the naturals?”

The answer is remarkable. It turns out that there are no first-order expressible differences between the naturals and the hypernaturals! Any first-order sentence that holds true of the natural numbers also holds true of the hypernatural numbers! This result is actually just one special case of an incredibly general result called Łoś’s theorem. And in the next post we are going to prove it!

Next up: Łoś’s theorem and ultraproducts!

Hypernaturals Simplified (Ultra Series 2)

Previous: Introduction

What is a hypernatural number? It is a collection of infinitely long sequences of natural numbers. More precisely, it is an equivalence class of these infinite sequences.

An equivalence class under what equivalence relation? This is a little tricky to describe.

I’ll start with a slight lie to simplify the story. When we see the trouble that results from our simple definition, I will reveal the true nature of the equivalence relation that gives the hypernaturals. In the process you’ll see how the notion of an ultrafilter naturally arises.

So, hypernaturals are all about infinite sequences of natural numbers. Some examples of these:

(0,1,2,3,4,5,6,7,8,9,…)
(0,1,0,2,0,3,0,4,0,5,…)
(1,2,1,2,1,2,1,2,1,2,…)
(0,2,4,6,8,10,12,14,…)
(3,1,4,1,5,9,2,6,5,3,…)

We’ll define an equivalence relation ~ between sequences as follows:

Let x and y be infinite sequences of natural numbers.
Then x ~ y iff x and y agree in all but finitely many places.

For example, (0,1,2,3,4,5,6,…) ~ (19,1,2,3,4,5,6,…), because these two sequences only disagree at one spot (the zero index).

(1,1,2,2,4,4,…) and (1,2,4,8,…) are not equivalent, because these sequences disagree at infinitely many indices (every index besides the zeroth index).

Same with (0,1,2,3,4,5,6,…) and (1,2,3,4,5,6,7,…); even though they look similar, these sequences disagree everywhere.

(2,4,6,8,10,12,14,…) and (2,0,6,0,10,0,14,…) are not equivalent, because these sequences disagree at infinitely many indices (every odd index).

One can easily check that ~ is an equivalence relation, and thus it partitions the set of sequences of naturals into equivalence classes. We’ll denote the equivalence class of the sequence (a1, a2, a3, …) as [a1, a2, a3, …]. These equivalence classes are (our first stab at the definition of) the hypernaturals!

For instance, the equivalence class of (0,0,0,0,0,…) contains (1,4,2,0,0,0,0,0,…), as well as (0,2,4,19,0,0,0,0,…), and every other sequence that eventually agrees with (0,0,0,0,…) forever. So all of these correspond to the same hypernatural number: [0,0,0,0,…]. This object is our first hypernatural number! It is in fact the hypernatural number that corresponds exactly to the ordinary natural number 0. In other words 0 = [0,0,0,0,…].

[1,1,1,1,…] is a distinct equivalence class from [0,0,0,0,…]. After all, the sequences (0,0,0,0,…) and (1,1,1,1,…) disagree everywhere. You might guess that [1,1,1,1,…] is the hypernatural analogue to the natural number 1, and you’d be right!

For any standard natural number N, the corresponding hypernatural number is [N,N,N,N,N,…], the equivalence class of the sequence consisting entirely of Ns.

Now consider the hypernatural [0,1,2,3,4,5,6,…]. Let’s call it K. Does K = N for any standard natural number N? In other words, is (0,1,2,3,4,5,6,…) ~ (N,N,N,N,N,…) true for any finite N? No! Whatever N you choose, it will only agree with (0,1,2,3,4,5,6,…) at one location. We need cofinite agreement, and here we have merely finite agreement. Not good enough! This means that K is our first nonstandard natural number!

How does K relate to the standard naturals in terms of order? We haven’t talked about how to define < on the hypernaturals yet, but it’s much the same as our definition of =.

[a1, a2, a3, …] = [b1, b2, b3, …]
iff
{ k∈ℕ | ak = bk } is cofinite

[a1, a2, a3, …] < [b1, b2, b3, …]
iff
{ k∈ℕ | ak < bk } is cofinite

Exercise: Verify that this is in fact a well-defined relation. Every equivalence class has many different possible representatives; why does the choice of representatives not matter for the purpose of describing order?

Now we can see that K > N for every standard N. Look at (0,1,2,3,4,5,…) and (N,N,N,N,…). The elements of the first sequence are only less than the elements of the second sequence at the first N indices. Then the elements of K are greater than the elements of N forever. So elements of K’s representative sequence are greater than elements of N’s representative sequence in a cofinite set of indices. Thus, K > N for every standard N. So K is an infinitely large number!

Here’s another one: K’ = [0,2,4,6,8,…]. You can see that K’ > K, because the elements of K’ are greater than those of K at all but one index (the first one). So we have another, bigger, infinite number.

Addition and multiplication are defined elementwise, so

K + K
= [0,1,2,3,4,…] + [0,1,2,3,4,…]
= [0+0, 1+1, 2+2, 3+3, 4+4, …]
= [0,2,4,6,8,…]
= K’

K’
= [0,2,4,6,8,…]
= [2⋅0, 2⋅1, 2⋅2, 2⋅3, 2⋅4, …]
= 2⋅[0,1,2,3,4,…]
= 2⋅K

Predictably, we get many many infinities. In fact, there are continuum many nonstandard hypernatural numbers!

Proof: we construct an injection f from ℝ to *ℕ. If x is a real number, then f(x) := [floor(x), floor(10x), floor(100x), floor(1000x), …]. For example, f(35.23957…) = [35,352,3523,35239,352395, …]. For any two distinct reals x and y, the sequences x and y will eventually disagree forever. So each real is mapped to a distinct hypernatural, meaning that there are no more reals than hypernaturals. At the same time, there are no more hypernaturals than reals, because there are only continuum many countable sequences of natural numbers. So |*ℕ| = |ℝ|.

It turns out that every nonstandard hypernatural number is also larger than every standard natural number. We’ll see why in a bit, but it’ll take a bit of subtlety that I’ve yet to introduce.

Now, > is transitive in ℕ. Is it also transitive in *ℕ? Yes! Suppose A > B and B > C. Choose any representative sequences (a1, a2, a3, …), (b1, b2, b3, …), and (c1, c2, c3, …) for A, B, and C. Then X = { k∈ℕ | ak > bk } and Y = { k∈ℕ | bk > ck } are both cofinite. The intersection of cofinite sets is also cofinite, meaning that X⋂Y = { k∈ℕ | ak > bk and bk > ck } = { k∈ℕ | ak > ck } is cofinite. So A > C!

It’s a good sign that > is transitive. But unfortunately, the story I’ve told you thus far starts to break down here. The greater-than relation is a total order on the natural numbers. For any naturals a and b, exactly one of the following is true: a = b, a > b, b > a. But this is not true of the hypernaturals!

Consider the two hypernatural numbers n = [0,1,0,1,0,1,…] and m = [1,0,1,0,1,0,…]. Are n and m equal? Clearly not; they disagree everywhere. So n ≠ m.

Is n > m? No. The set of indices where n’s sequence is greater than m’s sequence is {1, 3, 5, 7, …}, which is not cofinite.

So is m > n? No! The set of indices where m’s sequence is greater than n’s sequence is {0, 2, 4, 6, …}, which is also not cofinite!

So as we’ve currently defined the hypernatural numbers, the > relation is not a total relation on them. This might be fine for some purposes, but we’ll be interested in defining the hypernaturals to mirror the properties of the naturals as closely as possible. So we’ll have to tweak our definition of the hypernaturals. The tweak will occur way back at the start where we defined our equivalence relation on sequences of naturals.

Recall: we said that two sequences (a1, a2, a3, …) and (b1, b2, b3, …), are equivalent if they agree in all but finitely many places. Said another way: a ~ b if { k∈ℕ | ak = bk } is cofinite. We defined > similarly: a > b if the agreement set for > is cofinite.

The problem with this definition was that it wasn’t definitive enough. There are cases where the agreement set is neither cofinite nor finite. (Like in our previous example, where the agreement set was the evens.) In such cases, our system gives us no direction as to whether a > b or b > a. We need a criterion that still deals with all the cofinite and finite cases appropriately, but also gives us a definitive answer in every other case. In other words, for ANY possible set X of indices of agreement, either X or X’s complement must be considered “large enough” to decide in its favor.

For example, maybe we say that if { k∈ℕ | ak = bk } = {0,2,4,6,8,…}, then a > b. Now our criterion for whether a > b is: the set of indices for which ak = bk is either cofinite OR it’s the evens. This implies that [1,0,1,0,1,0,…] > [0,1,0,1,0,1,…].

Once we’ve made this choice, consistency forces us to also accept other sets besides the evens as decisive. For instance, now compare (0,0,1,0,1,0,…) and (0,1,0,1,0,1,…). The set of indices where the first is greater than the second is {2,4,6,8,…}. But notice that the first differs only cofinitely from (1,0,1,0,…), meaning that [0,0,1,0,1,0,…] = [1,0,1,0,1,0,…]. The conclusion is that [0,0,1,0,1,0,…] > [0,1,0,1,0,1,…], which says that the set of indices {2,4,6,8,…} must also be decisive. And in general, once we’ve accepted the evens as a decisive set of indices, we must also accept the evens minus any finite set.

The three criterion we’ve seen as desirable for what sets of indices will count as decisive are (1) includes all cofinite sets, (2) for any set X, either X or X’s complement is decisive, and (3) consistency. These requirements turn out to correspond perfectly to a type of mathematical object called a free ultra-filter!

In the next post, we will define ultrafilters and finalize our definition of the hypernatural numbers!

Even Numbers are Tautologies

To each atomic proposition P assign a natural number p. If that natural number is even, then its corresponding proposition is true. If the number is odd then the proposition is false.

If you have two numbers n and m and you multiply them, the result is even so long as either n or m is even. So multiplication is like or; P∨Q corresponds to pq.

Negation is like adding 1: even becomes odd and odd becomes even. So ¬P corresponds to p+1.

Consider addition: p+q is even if p and q are both even or both odd. So p+q is like the biconditional ↔.

Other connectives can be formed out of these. Take P∧Q: P∧Q is equivalent to ¬(¬P∨¬Q), which is (p+1)(q+1) + 1. So P∧Q corresponds to pq+p+q+2. P→Q is ¬P∨Q, which is (p+1)q.

The logical constants ⊤ and ⊥ correspond to numerical constants: ⊤ can be assigned 0 and ⊥ assigned 1, for instance.

Tautologies translate to algebraic expressions which are always even. For instance, P→P translates to (p+1)p, which is always even. P→(Q→P) translates to (p+1)(q+1)p, which is also always even. P∨¬P translates to p(p+1), always even.

Contradictions translate to algebraic expressions which are always odd. P∧¬P translates to (p+1)(p+2) + 1, which is always odd. And so on.

Inference rules can be understood as saying: if all the premises are even, then the conclusion will be even. Take conjunction elimination: P∧Q ⊨ P. This says that if (p+1)(q+1) + 1 is even then p is even, which ends up being right if you work it out. Modus ponens: P, P→Q ⊨ Q. This says that if p and (p+1)q are both even, then q is even. Again works out!

You can also work the other way: For any number p, p(p+1) is even. Translating into propositional logic, this says that P∨¬P is a tautology. We’ve proved the law of the excluded middle! It’s interesting to note that earlier we saw that P→P translates to (p+1)p. So in some sense the law of the excluded middle is just self-implication but with the two products reversed!

A Compact Proof of the Compactness Theorem

I’ve written up a proof of the compactness theorem before, but I recently looked it over and think that the proof can be expressed more, heh heh, compactly, than before.

So, what is the compactness theorem? It is the statement that if a set of statements is finitely satisfiable (each of its finite subsets has a model), then it’s satisfiable. The converse of this (a satisfiable set of statements is also finitely satisfiable) is trivial: a model of a set of sentences is also a model of every subset of that set. The following proof will be for propositional logic, but can be easily extended to first order logic.

The short version

Suppose that a set of sentences A is finitely satisfiable. We’ll extend A to a larger set B by giving A an “opinion” on every sentence. We can build up this extension in a series of stages: B0 is just A. Now, take any sentence a. If B0 ⋃ {a} is finitely satisfiable, define B1 to be B0 ⋃ {a}. Otherwise, define B1 to be B0 ⋃ {¬a}. Either way, B1 will be finitely satisfiable, because B0 cannot be inconsistent with both a and ¬a. When we’ve gone through every sentence, we take the union of all these extensions to form B. B is finitely satisfiable, since every finite subset of B is also a finite subset of one of the extensions that make it up. Now, we define a truth assignment V that assigns True to each propositional variable p if and only if p is in B. V satisfies B (which you can show by induction on sentences), and since A is a subset of B, V also satisfies A. So A is satisfiable.

The long(er) version

Suppose that a set of sentences A is finitely satisfiable. If a set of sentences A’ is finitely satisfiable, then for any sentence a, at least one of A’ ⋃ {a} and A’ ⋃ {¬a} is finitely satisfiable. If neither were, then we’d have two finite sets, one that entails ¬a and the other that entails a, and the union of these would be an unsatisfiable finite subset of A’.) So given any well-ordering of the sentences of the language, we can extend A one sentence at a time, maintaining finite satisfiability at each step. The union of all these extensions, call it B, is still finitely satisfiable, because any finite subset of B is also a finite subset of one of the extensions. B is also complete: for any sentence b either b ∈ B or ¬b ∈ B.

Now, define the truth assignment V as follows: for any atomic sentence b, V(b) = True iff b ∈ B. V satisfies B, which we prove by induction on sentences:

  • If b is an atomic sentence, then V(b) = True iff b ∈ B, by construction.
  • If b = ¬c for some c (for which V(c) = True iff c ∈ B), then V(b) = True iff V(c) = False iff c ∉ B iff b ∈ B.
  • If b = c ∧ d for some c and d that satisfy the induction hypothesis, then V(b) = V(c ∧ d) = True iff V(c) = V(d) = True iff c ∈ B and d ∈ B. If both c and d are in B, then ¬(c ∧ d) can’t be in B by finite satisfiability, so c ∧ d ∈ B by completeness of B. And if c ∧ d ∈ B, then neither ¬c nor ¬d can be in B by finite satisfiability. So c and d are both in B by completeness of B.

This proof by induction covers the connectives ¬ and ∧, with which you can express all other connectives, so this shows that V satisfies B. And since B is a superset of A, V satisfies A as well. This shows that any finitely satisfiable set of sentences is also satisfiable. Note that we used the axiom of choice implicitly with the instruction to well-order the language. As the language can be any size, this is equivalent to the well-ordering principle, which is equivalent in ZF to the axiom of choice. The same sort of assumption arises in the proof of the completeness theorems for first-order and propositional logics. If you reject choice, then you should also be skeptical that propositional logic and first-order logics are complete!

Defining uncountability

Most of the shocking results in mathematical logic have to do with how limited we are in terms of producing good deductive systems for mathematical concepts. (Good here means sound, complete, finitary, and computable.) No logic with a good deductive system can categorically define the natural numbers. Adding a “for finitely many” quantifier to first-order logic makes it impossible to have a good deductive system. Ditto for a “for all subsets of” quantifier. And so on.

Once you’ve accustomed yourself to the extreme logical limitations imposed on us, it comes as a bit of a shock when you realize that we still have the ability to describe ANY complicated mathematical concept. What I learned recently was that you can extend first-order logic by adding a “for uncountably many” quantifier, and still have a good deductive system for the logic! It says something that I was so impressed by this fairly moderate ability to distinguish between the countable and the uncountable. But it certainly does feel puzzling that while you can distinguish between the countable and the uncountable, you can’t distinguish between the finite and the infinite, which seems like a much more basic and fundamental division!

What’s even cooler is that the deductive system for this extended logic is extremely simple. It amounts to taking the axioms and inference rules of first-order logic, and then tacking on four additional axiom schemas. We’ll write “for uncountably many x, Φ(x)” as Ux Φ(x).

  1. ∀y∀z¬Ux (x = y ∨ x = z)
  2. ∀x (Φ(x) → Ψ(x)) → (Ux Φ(x) → Ux Ψ(x))
  3. Ux Φ(x) ↔ Uy Φ(y), where y isn’t free in Φ(x)
  4. Uy∃x Φ(x,y) → (∃xUy Φ(x,y) ∨ Ux∃y Φ(x,y))

Each of these axioms schemas is conceptually quite easy to make sense of. The first says that for any two elements y and z, there’s not uncountably many things that are each equal to one or the other. In other words, uncountable is bigger than two.

The second says that if the set of things that satisfy Φ is a subset of the set of things that satisfy Ψ, and uncountably many things satisfy Φ, then uncountably many things satisfy Ψ. In other words, if a set has an uncountable subset, then it must be uncountable itself.

The third is obvious enough that it doesn’t need explanation. The fourth, on the other hand, does. I find it helpful to think of Φ(x,y) as saying “x points at y.” Then Uy∃x Φ(x,y) becomes “uncountably many things are pointed at”, ∃xUy Φ(x,y) becomes “some person points at uncountably many things”, and Ux∃y Φ(x,y) becomes “there are uncountably many people pointing.”

So the fourth axiom just says that if uncountably many things are pointed at, then either somebody is pointing at uncountably many things, or there are uncountably many people pointing at things. Otherwise you’d only have countably many people pointing at countably many things each, and that’s not enough to get uncountably many things pointed at! So this fourth axiom can really be understood as saying that the union of countably many countable sets is countable.

✯✯✯

There’s a caveat to the above claims. This is that first-order logic extended by the “for uncountably many” quantifier only has a good deductive system IF the language is restricted to only allow countably many constant symbols. Here’s the reasoning:

First of all, if a logic has a sound, complete and finitary deductive system, then it is compact. Discussed here.

Second, if a logic L is compact and a set of sentences Σ in L has a countably infinite model, then Σ also has an uncountable model. Why? Simply append to L an uncountable set A of constant symbols, and add to Σ an uncountable set of sentences declaring that each of these constants refer to a distinct object. Now we have an uncountable model of Σ’s extension by compactness (the countably infinite model is a model of every finite subset of Σ’s extension), and this uncountable model must also satisfy Σ. (This is the only place where we make use of uncountably many constant symbols.)

Third, if a logic L is compact, and a set of sentences Σ in L has models of arbitrarily large finite size, then Σ has infinite models. The proof of this is virtually identical to the last paragraph; add infinitely many constant symbols and declare them all to refer to distinct objects. Every finite subset of this extension of Σ has a model, so by compactness the extension of Σ also has a model. This model is infinite by construction, and as a model of an extension of Σ must also be a model of Σ.

That sets up all the background model theory we need. Now, suppose that FOL+U had a good deductive system that applied even with uncountably many constant symbols. By soundness, completeness, and finiteness, FOL + U would be compact. Consider now the FOL+U theory T consisting of the single axiom ¬Ux (x = x). T cannot have uncountable models, by assumption that the quantifier Ux is really capturing the semantics of “for uncountably many x”.

But if T can’t have uncountable models, then it can’t have a countably infinite models either, by the second result above! So T doesn’t have any infinite models. And then by the third result, T cannot have models of arbitrarily large finite size! This means that asserting “there aren’t uncountably many objects”, we actually end up ruling out all countably infinite models, as well as all models larger than some finite size! This is inconsistent with the claim that the U quantifier is correctly capturing the semantics of “uncountably many”; after all, the negation of “for uncountably many” should be “for countably many”, not “for small enough finite”!

Summing up the argument:

  1. If a logic has a sound, complete, and finitary deductive system, then it is compact.
  2. If a logic L is compact, and a set of sentences Σ in L has a countable model, then Σ has an uncountable model.
  3. If a logic L is compact, and a set of sentences Σ in L has models of arbitrary finite size, then Σ has infinite models.
  4. FOL+U = First order logic + “for uncountably many” is sound, complete, and finitary.
  5. The FOL+U theory T = {¬Ux (x = x)} doesn’t have uncountable models.
  6. By 1 and 4, FOL+U is compact.
  7. By 2 and 6, if a set of sentences Σ in FOL+U has a countable model, then Σ has an uncountable model.
  8. Suppose T had countably infinite models. Then by 7, it would also have to have uncountable models, contradicting 5. So T doesn’t have countably infinite models.
  9. Suppose T had arbitrarily large finite models. Then by 3 and 6, T would have infinite models, contradicting 5 and 8. So T doesn’t have arbitrarily large finite models.
  10. Rephrasing 9, there’s some finite n such that T has no models of cardinality larger than n.

✯✯✯

What’s great about the logic FOL+U is that it allows you to talk about the property of uncountability, so long as your theories are all only countably large (which isn’t too bad a restriction). So for instance, consider the FOL+U theory consisting of the axioms of first-order Peano arithmetic plus one additional axiom: ¬Ux (x = x). This theory successfully rules out all uncountable models! This is something that you couldn’t do in ordinary first order logic, since it obeys the upward Lowenheim-Skolem property (a model of one infinite cardinality implies models of every larger infinite cardinality). The order type of all countable nonstandard models of PA is known (each is ω + ℤ⋅ℚ, a natural number line followed by countably many integer lines arranged like the rationals), and much of the difficulty of the model theory of PA involves the uncountable models. So this extension of PA is a big improvement over first-order PA!

Remember that ZFC has those pesky countable models, in which “the set of all subsets” turns out to be missing some subsets? We can get rid of all of those countable models with the FOL+U theory consisting of the axioms of ZFC + the additional axiom Ux (x = x). I’m pretty sure that this does not fix all the issues with correctly pinning down the notion of the power set — that would be too good to be true — but it is certainly an improvement!

A Coloring Problem Equivalent to the Continuum Hypothesis

Summary
Is there an infinity in between ℵ0 and 20? Is there a way to color the plane with countably many colors so that there are no monochromatic triangles lined up with the x and y axes? If these questions seem unrelated to you, then we had the same initial reaction. But it turns out that an intermediate infinity exists if and only if no such coloring exists.

CH and Coloring

Imagine taking the plane ℝ2 and coloring it entirely. You are allowed infinitely many colors, but only a countable infinity (so no using the whole continuous spectrum). Some example colorings:

The colorings don’t have to be contiguous on the plane like in the ones you just saw. They can have individual points of one color such that neighborhoods of arbitrarily small radius around them are colored differently. So we can get wacky colorings like:

Think about right triangles on the plane that line up with the x and y axes. For instance:

For each of these triangles, of the three sides one must be parallel to the x-axis and another to the y-axis. Now, for a given coloring, we can ask if there exist any triangles whose corners are all the same color. Importantly, we’re only interested in the three corner points, not the entire side length. For the earlier colorings we looked at, we can find such triangles fairly easily:

Even for our wacky coloring, it shouldn’t be too hard to find three points of the same color that meet the criterion.

Now here’s a question for you to ponder: is there any way to color the plane so that no such monochromatic right triangles exist?

I encourage you to pause here and try your best to come up with such a coloring. Use this as a chance to develop some intuitions on whether you think that a coloring like this should exist or not. It’s important to keep in mind that the amount of possible right triangles on the plane is 20, while the number of colors you have available to you is just ℵ0.

(…)

(pause for thought)

(…)

Now, I shall prove to you that the existence of such a coloring is equivalent to the Continuum Hypothesis! The rest of this post will be more involved, and even though I’ve tried to make it less scary with pictures aplenty, it will most likely take a slow and attentive read-through to grasp. But the proof is really cool, so it’s worth it. Oh and also, I have assumed some things as background knowledge for the sake of space. In particular, understanding the proof requires that you understand what ordinals are and why the set of all countable ordinals is the first uncountable ordinal.

If the Continuum Hypothesis is true, then a coloring with no monochromatic right triangles exists.

The proof outline: Consider the first uncountable ordinal, ω1, which is the set of all countable ordinals. We construct a coloring on ω1 × ω1 that satisfies the no-monochromatic-right-triangles property. If the Continuum Hypothesis is true, then |ω1| = |ℝ|, so we form a bijection from ω1 to ℝ. Finally, we use this bijection to map our coloring on ω1 × ω1 to a coloring on ℝ × ℝ, and show that coloring also satisfies the same property.

Alright, so consider ω1 × ω1. If we were to visualize it, it would look something this:

This “plane” is discrete, so in a certain sense it much more closely resembles ℕ2 than ℝ2. The set of colors we’ll be using will be labeled with integers. We split the colors into two categories, the negatives N and the positives P.

N = {-1, -2, -3, …}
P = {0, 1, 2, 3,…}

For each α in ω1, we construct two bijections f: α → N and g: α → P. These bijections exist because every element of ω1 is countable.

We use f and g to construct our coloring. For each α in ω1, we assign the color f(β) to (α, β) for each β < α, and we assign the color g(β) to (β, α), for each β ≤ α.

Since f and g are both bijections, no colors are repeated along the path from (α, 0) to (α, α) to (0, α). And we can easily show that no monochromatic right triangles exist under this coloring!

If a triangle crosses y = x, then it has at least one point colored from N and another from P. N and P are disjoint, so the triangle cannot be monochromatic.

And if a triangle doesn’t cross y = x, then it has two points lined up so as to have different colors:

So we have no monochromatic triangles! Now, assuming the Continuum Hypothesis, |ω1| = |ℝ|, so we can form a bijection M from ω1 to ℝ. We use M to map our original coloring to a new coloring on ℝ × ℝ. If (a, b) ∈ ω1 × ω1 is colored C, then we also color (M(a), M(b)) ∈ ℝ × ℝ. Under this map, non-monochromatic triangles stay non-monochromatic. So we’ve constructed a coloring on ℝ × ℝ with no monochromatic triangles!

If a coloring with no monochromatic right triangles exists, then the Continuum Hypothesis is true.

The proof outline here is to show that any infinite subset A of ℝ either has cardinality ℵ0 or 20. This shows that there are no intermediate cardinalities between these two.

Consider any infinite subset A ⊆ ℝ. For any a, define Ya to be the y-coordinates of points along the vertical line through (a, 0) that have no color repeats on that line. So Ya = {y ∈ ℝ | the color of (a, y) is different from (a, y’) for all y’ ≠ y}.

Notice that Ya must be countable, because all its elements have different colors and there are only countably many different colors to have. So |Ya| = ℵ0.

Define YA to be the union of Ya, for each a in A. Since each Ya is countable and A is at least ℵ0, |YA| ≤ |A|⋅ℵ0 = |A|. There are two cases: either YA = ℝ, or YA ≠ ℝ.

Case 1: YA = ℝ. In this case |ℝ| = |YA| ≤ |A|, so |A| ≥ |ℝ|. Since A is a subset of ℝ, this implies that |A| = |ℝ|.

Case 2: YA ≠ ℝ. In this case, there’s some real number y* in ℝ – YA. By definition y* is outside YA, so for each vertical line {x} × ℝ there is at least one point with the same color as (x, y*) besides itself.

Each (x, y*) with x ∈ A must have a different color. Proof: Suppose not. Then we have two elements of A, x1 and x2, such that the color of (x1, y*) is the same as the color of (x2, y*). By the last paragraph’s result, we can pick a third point (x1, y’) that has the same color as (x1, y*). These three points form a monochromatic right triangle, but we’ve assumed that’s impossible! Contradiction

So each point in A × {y} must have a different color. Thus, the cardinality of A × {y} must be countable. A is an infinite set, so |A| = ℵ0.

So in case one |A| = |ℝ| = 20, and in case two |A| = ℵ0. And this is true for every infinite subset A of ℝ. So every subset of ℝ is either finite, countable, or has the same cardinality as ℝ. And this is the continuum hypothesis!