A Compact Proof of the Compactness Theorem

I’ve written up a proof of the compactness theorem before, but I recently looked it over and think that the proof can be expressed more, heh heh, compactly, than before.

So, what is the compactness theorem? It is the statement that if a set of statements is finitely satisfiable (each of its finite subsets has a model), then it’s satisfiable. The converse of this (a satisfiable set of statements is also finitely satisfiable) is trivial: a model of a set of sentences is also a model of every subset of that set. The following proof will be for propositional logic, but can be easily extended to first order logic.

The short version

Suppose that a set of sentences A is finitely satisfiable. We’ll extend A to a larger set B by giving A an “opinion” on every sentence. We can build up this extension in a series of stages: B0 is just A. Now, take any sentence a. If B0 ⋃ {a} is finitely satisfiable, define B1 to be B0 ⋃ {a}. Otherwise, define B1 to be B0 ⋃ {¬a}. Either way, B1 will be finitely satisfiable, because B0 cannot be inconsistent with both a and ¬a. When we’ve gone through every sentence, we take the union of all these extensions to form B. B is finitely satisfiable, since every finite subset of B is also a finite subset of one of the extensions that make it up. Now, we define a truth assignment V that assigns True to each propositional variable p if and only if p is in B. V satisfies B (which you can show by induction on sentences), and since A is a subset of B, V also satisfies A. So A is satisfiable.

The long(er) version

Suppose that a set of sentences A is finitely satisfiable. If a set of sentences A’ is finitely satisfiable, then for any sentence a, at least one of A’ ⋃ {a} and A’ ⋃ {¬a} is finitely satisfiable. If neither were, then we’d have two finite sets, one that entails ¬a and the other that entails a, and the union of these would be an unsatisfiable finite subset of A’.) So given any well-ordering of the sentences of the language, we can extend A one sentence at a time, maintaining finite satisfiability at each step. The union of all these extensions, call it B, is still finitely satisfiable, because any finite subset of B is also a finite subset of one of the extensions. B is also complete: for any sentence b either b ∈ B or ¬b ∈ B.

Now, define the truth assignment V as follows: for any atomic sentence b, V(b) = True iff b ∈ B. V satisfies B, which we prove by induction on sentences:

  • If b is an atomic sentence, then V(b) = True iff b ∈ B, by construction.
  • If b = ¬c for some c (for which V(c) = True iff c ∈ B), then V(b) = True iff V(c) = False iff c ∉ B iff b ∈ B.
  • If b = c ∧ d for some c and d that satisfy the induction hypothesis, then V(b) = V(c ∧ d) = True iff V(c) = V(d) = True iff c ∈ B and d ∈ B. If both c and d are in B, then ¬(c ∧ d) can’t be in B by finite satisfiability, so c ∧ d ∈ B by completeness of B. And if c ∧ d ∈ B, then neither ¬c nor ¬d can be in B by finite satisfiability. So c and d are both in B by completeness of B.

This proof by induction covers the connectives ¬ and ∧, with which you can express all other connectives, so this shows that V satisfies B. And since B is a superset of A, V satisfies A as well. This shows that any finitely satisfiable set of sentences is also satisfiable. Note that we used the axiom of choice implicitly with the instruction to well-order the language. As the language can be any size, this is equivalent to the well-ordering principle, which is equivalent in ZF to the axiom of choice. The same sort of assumption arises in the proof of the completeness theorems for first-order and propositional logics. If you reject choice, then you should also be skeptical that propositional logic and first-order logics are complete!

ZFC, and getting the right answer by accident

There’s something I’m confused about with regards to ZFC. We already know that no first order theory allows you to perfectly capture the semantics of concepts like the natural numbers or finitely many. The best we can do if restricted to first-order logic (which is where most of modern mathematics tries to stay) is to have theories with surrogates for those concepts.

For instance, there’s this object called ω that stands in for the natural numbers in ZFC, even though there are all these models of ZFC in which ω contains many more objects besides the natural numbers (in fact, uncountably many in some models). And “X is finite” is replaced by “there’s a bijection from X to an element of ω.”

In formalizing these surrogate concepts, we make sure to not include any false statements about the concepts, which gives us a type of soundness of our surrogate concept. I.e., ZFC isn’t going to prove things about ω that are false of the set of natural numbers, because in one model ω is the set of natural numbers.

But it doesn’t give us completeness. We aren’t going to be able to prove ALL the true first-order sentences about the natural numbers, or about the concept of finiteness. (This is of course just a product of the first-order nature of the theories in which we’re defining these surrogates.)

So in each of these cases there will be true statements involving the concept that our theory will be agnostic about. We can look for “really good” surrogates, surrogates which allow us to prove most of the important and interesting true statements about the concept, and only fail in very abstract and unusual settings. The degree to which we can make good surrogates is the degree to which a first-order thinker can make sense of and usefully apply non-first-order concepts. (A first-order thinker being one that has no built-in understanding of non-first-order concepts.)

So one general question is: how good is a given surrogate? And another is, how do we know based on the axioms how good of a surrogate we’ll be getting? This is the thing I’m confused about.

In ZFC, there’s this weird phenomenon of the theory “getting the right answers accidentally.” It’s a little tough to put into words, but here’s an example:

ZFC can prove that |P(X)| > |X| for all X. So for instance ZFC can prove that |P(ω)| > |ω|. Meaning that ZFC can prove that the power set of the naturals is uncountable. But ZFC has countable models! (As does every first-order theory.) In those models, the power set of the naturals is NOT uncountable.

First order logic is sound, so what’s going on ISN’T that ZFC is proving a sentence that’s false in some of its models. It’s that the sentence is false in that model, if interpreted to be about the actual concept, and true in that model if interpreted to be about the surrogate concept. The surrogate for “P(ω) is uncountable” in ZFC is “there’s no bijection from P(ω) to ω”. And in the countable models of ZFC, the bijection that would prove the equinumerosity of ω and P(ω) is missing! So in those models, it’s actually TRUE that “there’s no bijection from P(ω) to ω”, even though P(ω) and ω really do have the same number of elements.

This is the sort of subtle nonsense you get used to in mathematical logic. Two accidents cancel out: first ZFC makes the mistake of having models where P(ω) is countable, and second it makes the mistake of losing track of the bijection from P(ω) and ω. And as a result, ZFC is able to prove the correct thing.

This seems really weird to me, and it’s an unsettling way for a supposed foundations of mathematics to be getting the right answers. This type of thing happens a lot, and it feels like we keep getting “lucky” in that our unintended interpretations of a sentence interfere with each other and cancel out their problematic features. It makes me wonder why we should be confident that ZFC will continue giving us the right answers (as opposed to being agnostic on important basic questions). And in fact we do have some examples of important and basic questions that ZFC is agnostic on, most dramatically that if |X| < |Y|, then |P(X)| < |P(Y)|!

It’s not that I doubt that ZFC’s surrogates for non-first order concepts end up allowing us to prove an enormous amount of the true facts about these concepts, because they clearly do. But is there some principled reason we can give for WHY the axioms of ZFC lead us to such a robust framework for mathematics in which we can prove many true statements?

One thing that suggests this is the power of the axiom of replacement. It’s just an extremely strong and useful axiom that allows us to prove a whole lot. But that doesn’t seem to help explain the “right-by-accident” phenomenon. So what does?

Polish Notation and Garden-Path Sentences

Polish notation is a mathematical notation system that allows you to eliminate parentheses without ambiguity. It’s called “Polish” because the name of its Polish creator, Jan Łukasiewicz, was too difficult for people to pronounce.

A motivating example: Suppose somebody says “p and q implies r”. There are two possible interpretations of this: “(p and q) implies r” and “p and (q implies r)”. The usual way to disambiguate these two is to simply add in parentheses like I just did. Another way is to set an order-of-operations convention, like that “and” always applies before “implies”. This is what’s used in basic algebra, and what allows you to write 2 + 2 ⋅ 4 without any fear that you’ll be interpreted as meaning (2 + 2) ⋅ 4.

Łukasiewicz’s method was to make all binary connectives into prefixes. So “A and B” because “and A B”, “P implies Q” becomes “implies P Q”, and so on. In this system, “(p and q) implies r” translates to “implies and p q r”, and “p and (q implies r)” translates to “and p implies q r”. Since the two expressions are different, there’s no need for parentheses! And in general, no ambiguity ever arises from lack of parentheses when using Polish notation.

If this is your first time encountering Polish notation, your first reaction might be to groan and develop a slight headache. But there’s something delightfully puzzling about reading an expression written in Polish notation and trying to understand what it means. Try figuring out what this means: “implies and not p or q s r”. Algebra can be written in Polish notation just as easily, removing the need for both parentheses AND order-of-operations. “2 + 2 = 4” becomes “+ 2 2 = 4”, or even better, “= + 2 2 4”.

Other binary connectives can be treated in Polish notation as well, creating gems like: “If and you’re happy you know it clap your hands!” “When life is what happens you’re busy making plans.” “And keep calm carry on.” “Therefore I think, I am.” (This last one is by of the author the Meditations). Hopefully you agree with me that these sentences have a nice ring to them, though the meaning is somewhat obscured.

But putting connectives in front of the two things being connected is not unheard of. Some examples in English: “ever since”, “because”, “nonwithstanding”, “whenever”, “when”, “until”, “unless”. Each of these connects two sentences, and yet can appear in front of both. When we hear a sentence like “Whenever he cheated on a test the professor caught him”, we don’t have any trouble parsing it. (And presumably you had no trouble parsing that entire last sentence either!) One could imagine growing up in a society where “and” and “or” are treated the same way as “ever since” and “until”, and perhaps in this society Polish notation would seem much more natural!

Slightly related to sentential connectives are verbs, which connect subjects and objects. English places its verbs squarely between the subject and the object, as does Chinese, French, and Spanish. But in fact the most common ordering is subject-object-verb! 45% of languages, including Hindi, Japanese, Korean, Latin, and Ancient Greek, use this pattern. So for instance, instead of “She burned her hand”, one would say “she her hand burned”. This is potentially weirder to English-speakers than Polish notation; it’s reverse Polish notation!

9% of languages use Polish notation for verbs (the verb-subject-object pattern). These include Biblical Hebrew, Arabic, Irish, and Filipino. In such languages, it would be grammatical to say “Loves she him” but not “She loves him”. (3% of languages are VOS – loves him she – 1% are OVS – him loves she – and just a handful are OSV – him she loves).

Let’s return to English. Binary prepositions like “until” appear out front, but they also swap the order of the two things that they connect. For instance, “Until you do your homework, you cannot go outside” is the same as “You cannot go outside until you do your homework”, not “You do your homework until you cannot go outside”, which sounds a bit more sinister.

I came up with some examples of sentences with several layers of these binary prepositions to see if the same type of confusion as we get when examining Polish notation for “and” or “implies” sets in here, and oh boy does it.

Single connective
Since when the Americans dropped the bomb the war ended, some claimed it was justified.

Two connectives, unlayered
Since when the Americans dropped the bomb the war ended, when some claimed it was an atrocity others argued it was justified.

Still pretty readable, no? Now let’s layer the connectives.

One layer
Whenever he was late she would weep.
She would weep whenever he was late.

Two layers
Since whenever he was late she would weep, he hurried over.
He hurried over, since she would weep whenever he was late.

Three layers
Because since whenever he was late she would weep he hurried over, he left his wallet at home.
He left his wallet at home, because he hurried over since she would weep whenever he was late.

Four layers
Because because since whenever he was late she would weep he hurried over he left his wallet at home, when he was pulled over the officer didn’t give him a ticket.
The officer didn’t give him a ticket when he was pulled over, because he left his wallet at home because he hurried over since she would weep whenever he was late.

Five layers
When he heard because because since whenever he was late she would weep he hurried over he left his wallet at home, when he was pulled over the officer didn’t give the man a ticket, the mayor was outraged at the lawlessness.
The mayor was outraged at the lawlessness when he heard the officer didn’t give the man a ticket when he was pulled over because he left his wallet at home because he hurried over since she would weep whenever he was late.

Read that last one out loud to a friend and see if they believes you that it makes grammatical sense! With each new layer, things become more and more… Polish. That is, indecipherable. (Incidentally, Polish is SVO just like English). Part of the problem is that when we have multiple layers like this, phrases that are semantically connected can become more and more distant in the sentence. It reminds me of my favorite garden-path sentence pattern:

The mouse the cat the dog chased ate was digested.
(The mouse that (the cat that the dog chased) ate) was digested.
The mouse (that the cat (that the dog chased) ate) was digested.

The phrases that are meant to be connected, like “the mouse” and “was digested” are sandwiched on either side of the sentence, and can be made arbitrarily distant by the addition of more “that the X verbed” clauses.

Does anybody know of any languages where “and” comes before the two conjuncts? What about “or”? English does this with “if”, so it might not be too much of a stretch.

Defining uncountability

Most of the shocking results in mathematical logic have to do with how limited we are in terms of producing good deductive systems for mathematical concepts. (Good here means sound, complete, finitary, and computable.) No logic with a good deductive system can categorically define the natural numbers. Adding a “for finitely many” quantifier to first-order logic makes it impossible to have a good deductive system. Ditto for a “for all subsets of” quantifier. And so on.

Once you’ve accustomed yourself to the extreme logical limitations imposed on us, it comes as a bit of a shock when you realize that we still have the ability to describe ANY complicated mathematical concept. What I learned recently was that you can extend first-order logic by adding a “for uncountably many” quantifier, and still have a good deductive system for the logic! It says something that I was so impressed by this fairly moderate ability to distinguish between the countable and the uncountable. But it certainly does feel puzzling that while you can distinguish between the countable and the uncountable, you can’t distinguish between the finite and the infinite, which seems like a much more basic and fundamental division!

What’s even cooler is that the deductive system for this extended logic is extremely simple. It amounts to taking the axioms and inference rules of first-order logic, and then tacking on four additional axiom schemas. We’ll write “for uncountably many x, Φ(x)” as Ux Φ(x).

  1. ∀y∀z¬Ux (x = y ∨ x = z)
  2. ∀x (Φ(x) → Ψ(x)) → (Ux Φ(x) → Ux Ψ(x))
  3. Ux Φ(x) ↔ Uy Φ(y), where y isn’t free in Φ(x)
  4. Uy∃x Φ(x,y) → (∃xUy Φ(x,y) ∨ Ux∃y Φ(x,y))

Each of these axioms schemas is conceptually quite easy to make sense of. The first says that for any two elements y and z, there’s not uncountably many things that are each equal to one or the other. In other words, uncountable is bigger than two.

The second says that if the set of things that satisfy Φ is a subset of the set of things that satisfy Ψ, and uncountably many things satisfy Φ, then uncountably many things satisfy Ψ. In other words, if a set has an uncountable subset, then it must be uncountable itself.

The third is obvious enough that it doesn’t need explanation. The fourth, on the other hand, does. I find it helpful to think of Φ(x,y) as saying “x points at y.” Then Uy∃x Φ(x,y) becomes “uncountably many things are pointed at”, ∃xUy Φ(x,y) becomes “some person points at uncountably many things”, and Ux∃y Φ(x,y) becomes “there are uncountably many people pointing.”

So the fourth axiom just says that if uncountably many things are pointed at, then either somebody is pointing at uncountably many things, or there are uncountably many people pointing at things. Otherwise you’d only have countably many people pointing at countably many things each, and that’s not enough to get uncountably many things pointed at! So this fourth axiom can really be understood as saying that the union of countably many countable sets is countable.

✯✯✯

There’s a caveat to the above claims. This is that first-order logic extended by the “for uncountably many” quantifier only has a good deductive system IF the language is restricted to only allow countably many constant symbols. Here’s the reasoning:

First of all, if a logic has a sound, complete and finitary deductive system, then it is compact. Discussed here.

Second, if a logic L is compact and a set of sentences Σ in L has a countably infinite model, then Σ also has an uncountable model. Why? Simply append to L an uncountable set A of constant symbols, and add to Σ an uncountable set of sentences declaring that each of these constants refer to a distinct object. Now we have an uncountable model of Σ’s extension by compactness (the countably infinite model is a model of every finite subset of Σ’s extension), and this uncountable model must also satisfy Σ. (This is the only place where we make use of uncountably many constant symbols.)

Third, if a logic L is compact, and a set of sentences Σ in L has models of arbitrarily large finite size, then Σ has infinite models. The proof of this is virtually identical to the last paragraph; add infinitely many constant symbols and declare them all to refer to distinct objects. Every finite subset of this extension of Σ has a model, so by compactness the extension of Σ also has a model. This model is infinite by construction, and as a model of an extension of Σ must also be a model of Σ.

That sets up all the background model theory we need. Now, suppose that FOL+U had a good deductive system that applied even with uncountably many constant symbols. By soundness, completeness, and finiteness, FOL + U would be compact. Consider now the FOL+U theory T consisting of the single axiom ¬Ux (x = x). T cannot have uncountable models, by assumption that the quantifier Ux is really capturing the semantics of “for uncountably many x”.

But if T can’t have uncountable models, then it can’t have a countably infinite models either, by the second result above! So T doesn’t have any infinite models. And then by the third result, T cannot have models of arbitrarily large finite size! This means that asserting “there aren’t uncountably many objects”, we actually end up ruling out all countably infinite models, as well as all models larger than some finite size! This is inconsistent with the claim that the U quantifier is correctly capturing the semantics of “uncountably many”; after all, the negation of “for uncountably many” should be “for countably many”, not “for small enough finite”!

Summing up the argument:

  1. If a logic has a sound, complete, and finitary deductive system, then it is compact.
  2. If a logic L is compact, and a set of sentences Σ in L has a countable model, then Σ has an uncountable model.
  3. If a logic L is compact, and a set of sentences Σ in L has models of arbitrary finite size, then Σ has infinite models.
  4. FOL+U = First order logic + “for uncountably many” is sound, complete, and finitary.
  5. The FOL+U theory T = {¬Ux (x = x)} doesn’t have uncountable models.
  6. By 1 and 4, FOL+U is compact.
  7. By 2 and 6, if a set of sentences Σ in FOL+U has a countable model, then Σ has an uncountable model.
  8. Suppose T had countably infinite models. Then by 7, it would also have to have uncountable models, contradicting 5. So T doesn’t have countably infinite models.
  9. Suppose T had arbitrarily large finite models. Then by 3 and 6, T would have infinite models, contradicting 5 and 8. So T doesn’t have arbitrarily large finite models.
  10. Rephrasing 9, there’s some finite n such that T has no models of cardinality larger than n.

✯✯✯

What’s great about the logic FOL+U is that it allows you to talk about the property of uncountability, so long as your theories are all only countably large (which isn’t too bad a restriction). So for instance, consider the FOL+U theory consisting of the axioms of first-order Peano arithmetic plus one additional axiom: ¬Ux (x = x). This theory successfully rules out all uncountable models! This is something that you couldn’t do in ordinary first order logic, since it obeys the upward Lowenheim-Skolem property (a model of one infinite cardinality implies models of every larger infinite cardinality). The order type of all countable nonstandard models of PA is known (each is ω + ℤ⋅ℚ, a natural number line followed by countably many integer lines arranged like the rationals), and much of the difficulty of the model theory of PA involves the uncountable models. So this extension of PA is a big improvement over first-order PA!

Remember that ZFC has those pesky countable models, in which “the set of all subsets” turns out to be missing some subsets? We can get rid of all of those countable models with the FOL+U theory consisting of the axioms of ZFC + the additional axiom Ux (x = x). I’m pretty sure that this does not fix all the issues with correctly pinning down the notion of the power set — that would be too good to be true — but it is certainly an improvement!