# First Order Truth Trees

The analytic tableaux (also called truth tree) style of proof system is really nice. The tree structure tends to resemble the actual process of mathematical reasoning better than the Hilbert-style linear proof systems, and I find that the formal proofs are much easier to follow. For first-order logic (without equality), the system involves four new inference rules to deal with quantifiers. There are a few different logically equivalent choices for these rules, but here is my favorite:

Universal instantiation ∀
∀x φ(x)

φ(t)
(where t is any closed term – any term without free variables)

Existential instantiation ∃
∃x φ(x)

φ(a)
(where a is any new constant)

Negated universal ¬∀
¬∀x φ

∃x ¬φ

Negated existential ¬∃
¬∃x φ

∀x ¬φ

The instantiation rules are actually quite interesting. Suppose that you’re working in the language of ZFC. ZFC has no constant or function symbols, only a single binary relation symbol: ∈, interpreted as “is an element of”. This seems to raise a problem for using the instantiation rules. For universal instantiation, we can only apply it to closed terms. But a language with no constants has no closed terms! And looking at existential instantiation, there’s a similar problem: we’re supposed to derive φ(a), where a is a new constant, but our language has no constants!

So what’s up? Does the standard first-order analytical tableaux approach not work for simple languages like ZFC? Insofar as ZFC is a shining beacon of what first-order logic can accomplish, this is a little disconcerting.

Actually, the instantiation rules work exactly as stated for ZFC. The subtlety in their interpretation is that when running proofs, you are allowed to (and in fact, often must) expand the language. So when you do existential instantiation, you don’t choose a constant symbol that already exists inside your language. Instead, you add a totally new constant symbol to your language, and then declare that φ holds of this symbol. Similarly, you can do universal instantiation in a language with no closed terms, simply by expanding the language to include some constants.

Now, if we’re expanding the language in the course of the proof, then who’s to say that in the end our proof will still be valid for our original language? Well, suppose we’re working in some minimal language L that has no constants. Let T be our background theory (the set of non-tautologies that we’re taking as assumptions). When we do existential instantiation, we always create a new constant. Let’s call it a. We then expand the language into L’ = L ⋃ {a}, and we expand our assumptions to T’ = T ⋃ {φ(a)}. When we do universal instantiation, we either use an existing closed term or create a new one. In the second case, we create a new constant b and form a new language L’ = L ⋃ {b}. We also expand our assumptions to a new set T’ = T ⋃ {φ(b)}.

The important thing to note is that we haven’t done anything that invalidates any models of T! If T originally contained a sentence of the form ∀x φ(x), then adding c and declaring φ(c) doesn’t conflict with this. And if T originally contained a sentence of the form ∃x φ(x), then in every model of T at least one thing satisfies φ. So when we add a new constant c, that constant can just refer back to any of these φ-satisfiers.

You might think: “But hold on! How can we be sure that it’s safe to just add new constants? Couldn’t we expand the domain too much, in a way that’s inconsistent with our original theory?” The answer to this is that the domain doesn’t have to expand to accommodate new constants! These constants can refer to existing elements of the domain. For instance, suppose T contains six existential statements and a sentence that says there are exactly five objects. Each time we run existential instantiation on one of the six existential sentences, we create a new constant. So we’ll get six new constants. But these constants can refer to the same value! And since our theory already says that there are five objects, the models of our expanded theory will also contain exactly five objects, meaning that in every model of our original theory, the new constants will refer to elements of the domain that already exist. No domain expansion!

Ok, but how do we know that there isn’t some very complicated sentence ψ that was originally true in every model of T, but becomes false in some models of T’? To fully prove that this never happens, we could do induction over all sentences in the language, showing that any sentence that is entailed by T is also entailed by T’. But considering the details of the expansion process, and trying to come up with ways that the expansion might fail, is a good way to get an intuition for why this proof system works.

I’ll close with a few examples of using analytic tableaux to prove basic results in PA and ZFC. Think of this as a proof of concept! (Each of these proofs also uses some rules regarding equality, which are obvious enough despite that I haven’t explicitly defined them).

First, we show that PA proves ∀x (0 + x = x). This is nontrivial, despite that ∀x (x + 0 = x) is an axiom!

Next, a proof from ZFC that the empty set exists and is unique:

Now a proof from PA that addition is commutative. This one is shockingly complicated for such a simple concept, and requires three uses of induction, but this goes to show how basic the axioms of PA really are! If you look closely, you’ll also notice that this proof replicates the proof of ∀x (0 + x = x) from above!

Final example, here’s a PA proof that no number is its own successor. This rules out nonstandard models of arithmetic with one nonstandard number!

# One nonstandard is worth infinitely many standards

Suppose that M is a nonstandard model of true arithmetic (the set of all first-order sentences in the language of PA that are true in the standard model of arithmetic, ℕ). Now, take any sentence φ(x) with one free variable. Suppose that there’s some nonstandard number k in M such that φ holds of k. Since k is larger than every standard natural, the following infinite set of sentences are all true:

∃x (x > 0 ∧ φ(x))
∃x (x > 1 ∧ φ(x))
∃x (x > 2 ∧ φ(x))

∃x (x > 1000000 ∧ φ(x))

Since these sentences are true in M and M is a model of true arithmetic, these sentences must also be true in the standard model ℕ. So it must be true for every standard natural that there’s a larger standard natural that satisfies φ. In other words, you can guarantee that there are infinitely many standard naturals that satisfy a property φ just by finding a single nonstandard number k that satisfies φ in a model of true arithmetic!

Furthermore, since in ℕ it is true that every standard natural has a larger standard natural satisfying φ, the sentence ∀x ∃y (y > x ∧ φ(y)) is true in ℕ. So this sentence must be true in every model of true arithmetic, including M! This means that just by finding a single nonstandard satisfying φ, you can immediately be sure that there are infinitely many standard numbers AND infinitely many nonstandard numbers (in every nonstandard model of TA!) satisfying φ. This is pretty dramatic!

As an example, consider the twin prime conjecture. We can construct the predicate isPrime(x) in first-order PA with the formula ∀y (∃z (y⋅z = x) → (y=1 ∨ y=x)). Then the predicate isTwinPrime(x) is just: isPrime(x) ∧ isPrime(x+2). Now the twin prime conjecture just says that ∀x ∃y (y > x ∧ isTwinPrime(y)), which is exactly the form we saw in the last paragraph! So to prove the twin prime conjecture, it suffices to demonstrate a single nonstandard twin prime in a model of true arithmetic.

# This post is lying to you

In classical logic there are exactly two truth values, True and False, and every sentence has exactly one of these truth values. Consider the Liar sentence: “This sentence is false.” If this sentence is True, then it must be False. And if it’s False, then it must not be False. If we apply classical logic to this sentence, then it must have one of the two truth values. But whichever way we go, we find ourselves in trouble. And so the Liar sentence glitches out classical logic and produces an error.

But not so fast. For the Liar sentence to glitch out classical logic, it must first be the case that the sentence can actually be produced in classical logic. We start with an English sentence “This sentence is not true” and reason intuitively about what classical logic would do with this sentence, but we must be able to form this sentence in classical logic in order for classical logic to have anything to say about it.

There are two major hurdles with importing “This sentence is not true” into classical logic: translating “this sentence” and translating “is not true”. The first requires us to be able to have self-referential sentences. It’s not so obvious to see how we accomplish this with the machinery of classical logic. The second requires us to have a truth predicate. This perhaps seems prima facie easier to do, but it turns out that it has its own host of issues.

Let’s deal with the second issue first: how do we make sense of “is not true”? We’ll start by looking at “is true” (once we have this, then we can just negate it to get “is not true”). So, how do we make sense of “is true”?

When we say “X is true”, we aren’t assigning equality between an object X and an object True. We’re instead attributing the property of truthiness to the sentence X. So “is true” seems to act kind of like a predicate in first-order logic. But there’s a problem. Predicates apply to objects in the domain of a model, not to sentences in the language itself. For instance, the predicate “is red” might apply to firetrucks and Clifford, as objects in the universe, but it’s not going to apply to sentences in the language.

With this in mind, it appears that “is true” is actually more like a modal operator. It applies to sentences, not objects. We could introduce a modal operator T such that TX is interpreted as “X is true”, and add an axiom schema that says for every sentence φ, φ ↔ Tφ. The problem with this is that we will never get self reference with this approach. We want to create a sentence P that says “P is not true”. We could try something like P ↔ ¬TP, but this ends up just being a false sentence, not paradoxical. What’s missing is that the sentence “P ↔ ¬TP” is not equivalent to the sentence P: they’re just different sentences.

So the modal approach to interpreting “is true” has failed for our purposes. It’s simply not subtle enough to allow us to express self-reference. So let’s return to the predicate approach. The problem was that predicates apply to objects, not sentences. But what if the sentences were themselves objects? Of course, the sentences cannot literally be objects: they are purely syntactical items, whereas objects exist in the semantics (the interpretation of the language). But each sentence could have some sort of representative object in the domain.

What Gödel showed is that this is indeed possible. He designed a coding technique such that every sentence in the language gets assigned a particular natural number, and no two sentences have the same number. And if sentences correspond to numbers, then properties of those sentences can be translated into properties of those numbers!

Now if our language is sufficiently expressive to talk about natural number arithmetic, then our sentences can express properties of other sentences! In other words, we want a theory in a logic that has ℕ as a model. And we also want it to be sufficiently expressive to be able to talk about properties of numbers, like “being prime” or “being twice-divisible by 7”. Then we can imagine a predicate True(x), such that True(x) is True if and only if the sentence encoded by the number x is True.

For notational convenience, we’ll write “the number that encodes the sentence P” as ⟦P⟧. Then what we want of our truth predicate is that for every sentence φ, True(⟦φ⟧) ↔ φ.

Now, returning to the Liar sentence, we’ve dealt with “is true”, but now have to deal with “this sentence”. Remember that we want a sentence φ that asserts that the truth predicate does not apply to itself. In other words, we want φ to be the same thing as ¬True(⟦φ⟧). But how can this be? Clearly these are two different sentences, no?

Well, it’s not so obvious that φ and ¬True(⟦φ⟧) are actually distinct sentences. Remember that ⟦φ⟧ is just some number. So the sentence φ might be ¬True(9129828475651384). This is only a genuine liar sentence if 9129828475651384 encodes the sentence φ.

So really what we need to do is to look for some natural number n such that the sentence encoded by n is exactly “¬True(n)”. This would be a sentence which if true must be false, and if false must be true. It’s not at all obvious that such a natural number exists. But in 1934, Carnap proved the diagonal lemma, the tool necessary to construct such a number.

The diagonal lemma says that in any theory that can express natural number arithmetic (specifically, a theory that can define all primitive recursive functions), and for every predicate P(x), there’s a sentence ψ such that ψ ↔ P(⟦ψ⟧) is provable. Let P(x) be equal to ¬True(x), and we get that there’s a sentence ψ such that ψ ↔ ¬True(⟦ψ⟧) is provable!

In other words, there’s a sentence ψ encoded by a number n, such that ψ is true if and only if ¬True(n). This is exactly the liar paradox! We’ve succeeded at sneaking in a contradiction to classical logic! So what does this mean? Is classical logic ultimately all inconsistent? Do we need to rebuild logic from the ground up?

Not quite! Notice that to actually get to the point where we could express the Liar sentence, we had to take on a lot of assumptions. Let’s list them all out:

1. Our language allows us to express natural number arithmetic.
2. Our theory of natural numbers is strong enough to allow Gödel coding.
3. Our theory of natural numbers is strong enough to express every primitive recursive function.
4. There is a truth predicate.

From these assumptions, we were able to prove an inconsistency. But this doesn’t mean that classical logic is therefore inconsistent! Rather, it means that any consistent theory has to violate at least one of these assumptions! In particular, if we have a consistent theory that allows us to do both Gödel coding and to express primitive recursive functions, then this theory cannot have a truth predicate!

It’s important to understand that #4 here really is an assumption. When I described a truth predicate, I said “we can imagine that such a predicate exists.” I never showed you how to explicitly construct it! We could always explicitly add in a truth predicate T to a theory of arithmetic, and then assert as axioms φ ↔ T(⟦φ⟧) for every sentence φ. And the above line of reasoning shows us that doing so will render our theory inconsistent. If we don’t explicitly add in a truth predicate, then we could try to construct it from primitive relation and function symbols of the language. But the above line of reasoning shows us that no matter how hard we try, we will never succeed in this construction!

It’s interesting to note that (2) and (3) are actually different assumptions. (3) implies (2), but (2) doesn’t imply (3). In other words, you can have very weak theories of arithmetic that are expressive enough to do Gödel coding, but not expressive enough to prove the diagonal lemma! The amazing feature of these theories is that it’s possible for them to prove their own consistency without becoming inconsistent!

Finally, notice that the diagonal lemma was quite a bit more powerful than what we strictly required for our reasoning above. In particular, it allowed us to talk about ANY predicate whatsoever, not just a truth predicate. Consider what happens when instead of using “is true” as our predicate, we use “is provable”. You might get a somewhat interesting result!

# Some half-baked thoughts on moral arbitrariness

I find that there’s often a crucial assumption implicit in discussions of abortion ethics. It comes up at mentions of when personhood arises in the development from zygote to fetus to baby. One person claims that some particular moment is the threshold at which personhood arises. The other points out that zooming in to that moment and looking at extremely nearby moments, we see no particular reason to privilege one over the others. This arbitrariness is taken to be a fatal blow for the account of personhood.

This raises an interesting question. Could the fundamental moral laws be arbitrary? By analogy, think about the laws of physics. The laws of physics contain certain parameters like the gravitational constant G and me/mp, the ratio of the mass of an electron to the mass of a proton, whose values are likely arbitrary to some degree. Even taking into account fine-tuning for life, it’s probable that the fine-tuning isn’t infinitely precise and there’ll be some level of arbitrariness in the 1000th decimal place value of G.

If the laws of physics can be arbitrary, why not the laws of morality? Perhaps there’s just one arbitrary point at which moral personhood emerges, and there’s not much motivation for that point over any other. How strange you consider this to be will likely depend on what your meta-ethical theory is. Trivially, if you don’t think there are moral facts at all, then this puzzle never even arises for you. If you think there are moral facts, but there’s somehow socially or biologically determined, then it’s not so puzzling that there would be arbitrariness in the moral facts. But if you’re a moral realist that believes in an objectively true set of laws governing morality, then this view starts to look strange.

Among moral objectivists, it seems to me like anti-Humeans would not be okay with arbitrariness in the laws of morality. In meta-ethics, anti-Humeans are those who believe that moral facts are intrinsically motivating. This doesn’t mesh well with arbitrariness. If the moral laws are arbitrary, then why should I follow them rather than a neighboring set of laws that work just as well? Almost by definition, arbitrariness in the moral laws implies a lack of motivation, both motivation for the letter of the laws and motivation to live by the laws. On the other hand, if one takes a Humean stance on meta-ethics, perhaps arbitrariness is not so puzzling.

Moral arbitrariness might also be troubling to divine command theorists, who believe that the moral rules are set by God. There’s something that seems quite strange about saying that God’s commands are arbitrary to some extent (though to be fair, I say this from a very atheistic perspective, so perhaps my intuitions differ from theists here). But if this feels strange, then why shouldn’t it feel just as strange to say that the laws of the physical universe are arbitrary? Presumably God also decided on the precise values of all the physical parameters, and there seems to be arbitrariness there. Is there something particularly troubling about the idea that God’s choice of moral laws is arbitary?

Moral arbitrariness seems like an inevitable consequence of most, maybe all, moral systems. A rights-based approach has to deal with tradeoffs between different rights: how severe a breach of bodily autonomy is severe enough that it’s better to violate a person’s right to life? Any binary account of personhood seems bound to end up drawing the line at some arbitrary point. And gradualist accounts of personhood come with their own type of arbitrariness: why should the curve of increasing personhood with time look like precisely this, rather than some other very similar curve? Virtue theoretic approaches talk about virtues arising in a happy medium between two vices (e.g. bravery arising between cowardice and foolhardiness), but where is the precise middle point? If one were to completely codify virtue ethics, they would have to say precisely what level of riskiness is bravery, and when it tips over into foolhardiness. But there will always be thought experiments that place you just barely on either side of this threshold and reveal that there is no apparent moral difference between one side and the other.

Perhaps the framework that has the least trouble with moral arbitrariness is consequentialism. Something like utilitarianism says that the threshold for when you should choose Act 1 over Act 2 is exactly when the expected net utility produced by Act 1 exceeds the expected net utility produced by Act 2 (where utility is something like “happiness minus sadness”). Unfortunately, I think that this approach runs in to problems as well. Happiness is not one-dimensional, and neither is suffering. How do you make different types of happiness commensurable? How many sips of hot chocolate are equivalent to a roller-coaster ride? How many minutes in front of a fire on a cold night are equivalent to the moment of insight when you solve a tough mathematical problem? I find it hard to imagine that non-arbitrary answers to these types of questions exist.

If it’s true that most all moral frameworks contain fundamental arbitrariness, as I believe it is, then I think that this turns into a powerful argument against many types of moral realism. If you’re an anti-Humean, then you have to either deny the arbitrariness or explain why arbitrary moral laws would be intrinsically motivating to us. If you think that God created the moral laws, then you have to reckon with the apparent arbitrariness of those laws. Presumably God always makes the optimal choice when one exists, but what does God do when faced with a choice where there is no optimum?

# Asymmetry between the infinite and finite

This is well-trod territory for this blog, but I want to give a little reminder of some of the subtleties involving infinity in logic. In particular, there’s a fun asymmetry between the finite and the infinite in terms of the expressive power of first order logic.

Consider the set of all first order sentences that are true in every infinite model. This includes any tautology (like P ∨ ¬ P), as well as some other sentences like ∃x ∃y (x ≠ y) (which says that there are at least two distinct things). Are there any finite models that satisfy this entire set of sentences?

Answer is no: for any finite n you can form a sentence that says “there are at least n things”. E.g. for n = 4 we say ∃x ∃y ∃z ∃w (x≠y ∧ x≠z ∧ x≠w ∧ y≠z ∧ y≠w ∧ z≠w). This basically says “there exists an x, y, z, and w that are all distinct”, i.e. there are at least four things. This can be easily generalized to any finite n. Sentences of this form all appear in our set, because they’re true in every infinite model. But of course, no finite model satisfies all of them. Any finite model contains only so many elements, say N, and the sentence which says “there are at least N+1 things” will be false in this model.

What about the reverse question? If you consider the set of all first order sentences that are true in every fiinite model, are there any infinite models that satisfy this set?

The answer is yes! If there weren’t, then you’d have a collection of sentences that suffice to rule out all infinite models while leaving all finite models. And ruling out all and only the infinite models of a theory turns out to be like a superpower that’s inconsistent with any logic that’s sound and complete like first-order logic. The reasoning, in ultra-condensed form: finitary proofs + soundness + completeness imply compactness, and compactness gives you a quick proof that any theory with arbitrarily large finite models has infinite models (add an infinity of constant symbols to your theory and declare all of them unequal. Since every finite subset of these declarations is satisfied by some model, compactness says that the whole lot is satisfied by some model, which is an infinite model of the starting theory).

This is an interesting asymmetry between the finite and the infinite. If you assert every sentence that’s true in all infinite models, you’ve automatically excluded all finite models. But if you assert every sentence that’s true in all finite models, you haven’t excluded all infinite models. A listener that patiently waited for you to finish saying all those infinitely many sentences might still take you to be talking about some infinite model. (And in fact, by the Lowenheim-Skolem theorem, they could understand you to be talking about a model of any infinite cardinality, which is pretty dramatic.)

TL;DR: There are no finite models that satisfy the set of all first-order sentences that are true in every infinite model. On the other hand, there are infinite models that satisfy the set of all first-order sentences that are true in every finite model.

# PA doesn’t trust its provability predicate

Here’s a fun thing: PA has a predicate Provable(n) such that when fed a number n, it returns true if and only if the sentence coded by that number is provable from PA. In other words:

PA ⊢ X iff PA ⊢ Provable(⟦X⟧)

Here we’re using two notational conveniences: ⟦X⟧ means “the number that is the Gödel code of the sentence X”, and PA ⊢ X means “PA proves X”.

Now, if PA can prove something, then it’s true in every model of PA. So perhaps we should expect that for every X, Provable(⟦X⟧) implies that X is true. Or more precisely, Provable(⟦X⟧) → X But PA can’t prove this! Löb’s theorem says that “Provable(⟦X⟧) → X” is provable only when X is provable. So if PA could prove that for every X, then it could also prove every X and would thus be inconsistent!

In other words, if PA is consistent then the following two things are both true:

(1) PA ⊢ X iff PA ⊢ Provable(⟦X⟧)
(2) PA ⊬ Provable(⟦X⟧) → X

By the deduction theorem for first order logic, this also implies that PA conjoined with the assumption that Provable(⟦X⟧) is not in general sufficient to prove X. To put the weirdness in non-technical terms, (2) says that PA doesn’t trust its provability predicate to ensure truth, while (1) says that what PA can prove about its provability predicate is exactly the true statements about provability.

What’s up with this? The explanation is kinda subtle: PA proves X if and only if PA proves Provable(⟦X⟧). But nonetheless “PA proves X” is not equivalent to Provable(⟦X⟧). “PA proves X” implies Provable(⟦X⟧), but it’s possible for Provable(⟦X⟧) to be true (just not provable!) despite PA not actually proving X.

To put it another way:

PA ⊢ X
iff
PA ⊢ Provable(⟦X⟧)
iff
Provable(⟦X⟧) is true in every model of PA

But it’s possible for Provable(⟦X⟧) to be true in some models of PA and false in others! “True in every model of PA” is much stronger than “true in at least one model of PA.”

Best way to think about this is probably in terms of nonstandard models. The statement Provable(⟦X⟧) is an existential statement: It says “there exists a number n that encodes a proof of X from PA”. In nonstandard models, there exist a bunch of nonstandard numbers, and some of these numbers trick our Gödel-encoding scheme, convincing the model that they are actually proofs. So it’s possible in a nonstandard model for Provable(⟦X⟧) to be true and X false. But for the sentences X for which this is true, PA doesn’t actually prove that they’re provable! Why? Because PA can only proves things that are true in every model, and Provable(⟦X⟧) is false in the standard model.

This holds quite generally. Suppose you have any theory T that can express arithmetic. If T is consistent, then no provability predicate can have all four of the following properties:

1. If T ⊢ X then T ⊢ Prov(X)
2. T ⊢ (Prov(X) → Prov(Prov(X)))
3. T ⊢ Prov(A→B) → (Prov(A) → Prov(B))
4. T ⊢ Prov(X) → X

Now, (1) through (3) jointly entail Löb’s theorem. This means that by adding (4) we allow T to prove every X, contradicting its consistency. So no theory of arithmetic can have a provability predicate that it trusts to ensure truth, so long as that provability predicate satisfies the first three desiderata!

# No more than half of anti-sets are big

Background on anti-set theory here.

Suppose that somebody tells you that they have in their hands a model of anti-set theory containing 50 anti-sets. The axiom of anti-infinity gives us some nice immediate restrictions on how large these anti-sets can be. As a quick reminder, the axiom of anti-infinity tells us that every anti-set X must contain another anti-set Y that has a successor that’s not in X. The successor of X is defined to be X ⋃ {X} (an anti-set containing everything in X plus X itself). One big difference between successors in ZFC and successors in anti-set-theory is that anti-sets aren’t always guaranteed to have successors. Another is that anti-sets can be their own successors (this happens if the anti-set contains itself).

Ok, so anti-infinity immediately tells us that there can be no universal set (i.e. a set containing the entire universe). Since every set X must contain a set that has a successor outside X, no set can contain everything, or else it would contain all the successors!

So there are no anti-sets of size 50. How about anti-sets of size 49? It’s recently been proved that there can only be at most 25 sets of size 49. And in general, in a universe of size N no more than half of the anti-sets can be size N-1. Let’s prove it!

First, a naming convention: in a universe of size N, let’s call any sets of size N-1 big sets, and any other sets small sets. Big sets are as big as an anti-set can be. To satisfy anti-infinity, any set X must contain an element Y that has a successor that’s not in X. We’ll call Y the anti-infinity-satisfier for X.

Now, some observations. Suppose that a big set X has a successor. This successor is either size N-1 (if X contains itself) or size N (if not). But no set is size N! So any big set that has a successor, must be its own successor. But this means that big sets cannot be anti-infinity-satisfiers. If a big set X was an anti-infinity-satisfier for some other set Y, then its successor could not be in Y. But its successor is X, and X is in Y!

What this means is that every set must contain a small set as its anti-infinity-satisfier. Now, consider any big set X. X must contain some small set z that serves as an anti-infinity-satisfier for it. But for z to serve as an anti-infinity-satisfier, z must have a successor that’s not in X. What this means is that X must be missing z’s successor.

In other words, every big set must be missing at least one small set’s successor. But remember, big sets are size N-1. This means that they contain every set in the universe besides one. So now we know that the “missing set” that characterizes each big set is always a successor of a small set!

Let’s say that our universe contains k small sets. Then there are at most k successors of small sets. Any big set must be the set of all sets besides one of these successors. Thus by extensionality, there are at most k big sets. Therefore there can never be more big sets than small sets!

And that’s the result: In any finite universe, at most half of the anti-sets are big.

This resolves a conjecture that I posted earlier as an open question: in an N-element model, are there always at least three small sets? The answer is yes. If we had fewer than three small sets, then we’d have to also have fewer than three big sets. This means that our universe is at most size four (two small sets and two big sets). But no four-element models exist! QED.

# A Dating Puzzle, and Bayesian Experimental Design

A puzzle for you!

A group of six men (A, B, C, D, E, and F) and six women (a, b, c, d, e, and f) are chosen so that everybody has a perfect match of the opposite gender. They are given the following clues:

1. E and e are a match.
2. A and c are a match.
3. The pairing (A,b), (B,a), (C,c), (D,d), (E,e), (F,f) contains exactly 2 matches.
4. The pairing (A,c), (B,d), (C,a), (D,b), (E,e), (F,f) contains exactly 2 matches.
5. The pairing (A,c), (B,f), (C,b), (D,d), (E,e), (F,a) contains exactly 3 matches.

If they can figure out everybody’s perfect matches, the group will get a million dollars to share! Can they figure it out?

If you figure that one out, then here’s the next level, with ten pairs and twelve clues!

A group of ten men (ABCDEFGHIJ) and ten women (abcdefghij) are chosen so that everybody has a perfect match of the opposite gender. They are given the following twelve clues:

1. F and f are not a match
2. J and h are not a match
3. B and e are not a match
4. D and d are a match
5. H and c are a match
6. The pairing (Ai Bb Ca Dd Ee Fc Gj Hg If Jh) contains exactly 2 matches.
7. The pairing (Af Be Cg Dj Eh Fd Gi Hc Ia Jb) contains exactly 4 matches.
8. The pairing (Af Be Ca Dd Ej Fh Gi Hb Ig Jc) contains exactly 2 matches.
9. The pairing (Aa Bc Ci Dd Eg Fj Gf Hb Ih Je) contains exactly 2 matches.
10. The pairing (Af Bi Ce Dd Eh Fg Gj Hc Ia Jb) contains exactly 5 matches.
11. The pairing (Af Ba Cb Dd Eh Fi Gj Hc Ie Jg) contains exactly 5 matches.
12. The pairing (Af Bi Ch Dd Eb Fg Gj Hc Ia Je) contains exactly 7 matches.

Can you help them get their million dollars?

✯✯✯

Some background on the puzzle:

I didn’t actually come up with it out of nowhere, I encountered in the wild! A few days ago I started watching a new Netflix reality show called “Are You The One?” The premise of the show: ten men and ten women are paired up via a matchmaking algorithm, and if they can all figure out their other halves after living together for a month, then they win one million dollars. It’s an extremely corny show, but there was one aspect of it which I found pretty interesting: it’s a perfect setting for Bayesian experimental design! Me being me, I spent the whole show thinking about the math of how contestants get information about their potential “perfect matches”.

Let’s start with a very basic look at the underlying math. Ten pairs, chosen from two groups of ten, gives ten factorial possible matchings. 10! is 3,628,800, which is a lot of possibilities. If you were to randomly choose the ten matches, with no background information, you’d have a .000028% chance of getting everything right. This is about as likely as you are to correctly guess the outcome of 22 tosses of an unbiased coin in a row!

Of course, the show isn’t so cruel as to force them to just guess with no information. For one thing, they have a whole month to get to know each other and gather lots of that subtle tricky-to-quantify social evidence. But more importantly, every episode they get two crucial pieces of information:

First, they get to choose one pair (a man and a woman) to go into the “Truth Booth”. The group is then informed whether these two are a genuine match or not.

And second, at the end of each episode the entire group gathers together, everybody pairs off with one another, and they are told how many pairs they got right (though not which ones). On the final episode, the matching they choose determines whether they get the million dollars or not.

I call the first type of test the individual pair test and the second the group test. And this is where the two types of clues in the above puzzles come from! The clues for the second puzzle are actually just a translation of the tests that the group decided to do in the first seven episodes. (So if you successfully solved it, then feel good about yourself for being the type of person that would have won the million.) Interestingly, it turns out that by the seventh episode they could had already figured out everybody’s perfect match, but it took them three more episodes to get to that point! Silly non-logically-omniscient humans.

Putting aside the puzzle we started with, the show setup naturally lends itself to some interesting questions. First of all, is there a strategy that guarantees that the group will get the million dollars by the tenth episode? In other words, is ten pair tests and nine group tests sufficient to go from 3,628,800 possibilities to 1?

I am not yet sure of the answer. The fact that the group in season one managed to narrow it down to a single possible world with only seven episodes seems like evidence that yes, those 19 tests do provide enough evidence. In addition, there have been eight seasons and in only one did the group fail to find the correct matches. (And in the season whose group failed, they started with eleven pairs – multiplying the number of possible worlds by 11 – and used two fewer pair tests than previous seasons.)

However, it’s worth keeping in mind that while the group of twenty individuals was far from logically omniscient, they did have a great wealth of additional social evidence, and that evidence may have allowed them to make choices for their tests that yielded much more expected information than the optimal test in the absence of social information. (I’m also suspicious that the producers might be a little more involved in the process than they seem. There are a few seasons where the group is consistently getting 2 or 3 matches until the last episode where they suddenly get everything right. This happens in season 3, and by my calculations there were still four possible worlds consistent with all their information by the time of the last matching ceremony!)

We can also ask the Bayesian experimental design question. What’s the optimal pair test/group test to perform, given the set of remaining possible worlds?

We can solve this fairly easily using a brute force approach. Consider the problem of finding the optimal pair test. First we actually generate the set of all possible worlds (where each world is a specification of all ten matches). Let N be the size of this set. Then we look through all possible pairs of individuals (i, j), and for each pair calculate the number of worlds in which this pair is a match. Call this quantity nij. Then the expected number of worlds remaining after performing a pair test on (i, j) is:

nij Pr(i and j are a match) + (N – nij) Pr(i and j are not a match) = nij2/N+ (N – nij)2/N

So we simply search for the pair (i, j) that minimizes nij2 + (N – nij)2. This is equivalent to maximizing nij (N – nij): the product of the number of worlds where (i, j) is a match and the number of worlds where (i, j) is not a match.

We do pretty much the same thing for the group test. But here’s where we run into a bit of trouble: though our algorithm is guaranteed to return an optimal test, the brute force approach has to search through 10!10! possibilities, and this is just not feasible. The time taken to solve the problem grows exponentially in the number of worlds, which grows exponentially in the number of individuals.

So we have another interesting question on our hands: Is there an efficient algorithm for calculating the optimal pair/group test? In the more general setting, where the tests are not restricted to just being pair tests or group tests, but can be a test of ANY boolean expression, this is the question of whether SAT can be solved efficiently. And given that SAT is known to be NP-complete (it was in fact the first problem to be proven NP-complete!), this more general question ends up being equivalent to whether P = NP!

# What precedent does the Trump Twitter ban set?

I don’t know if anybody needs me to tell them this, but Trump has recently been permanently banned from Twitter, following a violent attack on the capitol inspired by his conduct since his election loss. This decision by Twitter has inspired a lot of discussion about free speech and censorship, and I want to chime in with one small point about a common argument you hear in these discussions.

A lot of the anti-ban discussion that I’ve seen relates to fears about setting a bad precedent for future decisions by tech companies. I think that these are legitimate concerns, but that they are often stated over-confidently. When trying to judge the precedent that an action sets, there’s a type of scope ambiguity that arises. What exactly is the precedent that Twitter’s ban of Trump sets? Is it that any time social media authorities find somebody’s ideas repugnant enough they will ban them? That’s probably too broad, but somebody that thinks that this is the precedent will find it overly subjective and judge it to be overreach. Is the precedent that Twitter will ban the President of the United States if his conduct leads to an assault on the Capitol? That’s probably too narrow, but might be a standard we find acceptable.

Sometimes there’s less ambiguity, like when the action fits nicely within a predetermined paradigm for precedent-breaking (e.g., perhaps making a new amendment to the Constitution), and other times there’s more ambiguity. Sometimes actions come with a detailed description of why they’re being taken, and this can be some guide to what precedent we consider the action to be setting. But other times they don’t, and even in those cases that they do we might have reasons to distrust the description, or to think the precedent being set is actually different from the description.

I guess we could say: there’s the intent in the mind of the Twitter authorities, and there’s their actual statement of the reasons for the action. There’s the explicit policy they are citing for their action (in other cases, they might be creating a new policy), and there’s the way that that action is understood among the general public. What we really care about is the likely future consequences of the act, and all of these things are relevant to judging this to one degree or another. Framing it this way, it’s a very complicated prediction about the world that one makes when they say that the Trump Twitter ban is setting a bad precedent: it’s saying that the future actions of Twitter and other tech companies are worse than they would have been in the counterfactual world where they had held off on the ban. This is a claim I have a lot of uncertainty about, and I think probably most other people should as well.

# Divisibility Tricks and Orderly Numbers

A fun exercise is to think about the divisibility tricks you learned in grade school and generalize them to different bases. In base 10 we have:

N is divisible by 2 if its last digit is divisible by 2
N is divisible by 3 if the sum of its digits is divisible by 3
N is divisible by 4 if its last two digits are divisible by 4
N is divisible by 5 if its last digit is 0 or 5
N is divisible by 6 if N is divisible by 2 and 3
N is divisible by 7 if N = Xy for a single digit y and X – 2y is divisible by 7
N is divisible by 8 if its last three digits are divisible by 8
N is divisible by 9 if the sum of its digits is divisible by 9
N is divisible by 10 if its last digit is 0
N is divisible by 11 if the alternating sum of its digits is divisible by 11

There are a few natural categories of divisibility rules here. Divisibility by 3 and 9 are both tested by taking a sum of the digits of N. Divisibility by 11 is checked with an alternating sum. Divisibility by 2, 4, 5, and 8 is tested by looking at the last few digits of N. Divisibility by 6 is a combination of two other divisibility rules. And divisibility by 7 is in its own category.

Sum Test
To test divisibility of N by k, check if the sum of digits of N is divisible by k.

In base B, this test can be used whenever k divides B – 1. For example, in base B = 10, we use the sum test only to test divisibility by numbers that divide 9, i.e. 3 and 9.

Alternating Sum Test
To test divisibility of N by k, check if the alternating sum of digits of N is divisible by k.

In base B, this test can be used only if k divides B+1. For example, in base B = 10, we use the alternating sum test only to test divisibility by numbers that divide 11, i.e. 11.

Ending Test
To test divisibility of N by k, check if the last m digits of N is divisible by k.

In base B, this test can only be used if k divides B, or is a power of some number that divides B. For example, in base B = 10, we use the ending test only to test divisibility by numbers that divide 10, i.e. 2, 5, and 10, and powers of those numbers, i.e. 4, 8, 16, and 25.

Combination Tests
In any base B, to test divisibility of N by k = ij, where i and j are co-prime, it suffices to check divisibility by i and divisibility by j.

So for instance, divisibility by 6 can be checked by performing the tests for 2 and 3, but divisibility by 12 can’t be checked by performing the tests for 2 and 6.

Other Tests
The test for divisibility by 7 in base 10 doesn’t fit into any of the previous categories.

We’ll only be considering the first four categories.

There’s a nice way to see why these tests generalize in the way that they do. Let’s think about the sum test first.

Sum Test

We reason it through by induction. Suppose that we’re in base B, and checking divisibility by K < B. Our base case is that the sum of the digits of K is just K, which is divisible by K. For our inductive case, assume that every number n less than N, n is divisible by K if the digits of n add up to something divisible by K. Now, N = (N-K) + K, and (N-K) < N, so (N-K) satisfies the inductive hypothesis. This means that we just need to show that if we start with a number whose digits add to a multiple of K, and add K to this number, the sum of the digits of the resulting number will also be a multiple of K.

Break this into a few cases: First, adding K only affects the last digit. Then the last digit increases by K, so the sum of digits increases by K, and is thus still divisible by K.

What if adding K makes the last digit larger than B? Then we subtract B from the last digit, and add 1 to the second-to-last digit. So the net change in the digit sum is +1 + (K – B) = K – (B – 1)

But adding 1 to the second-to-last digit might also have resulted in a digit larger than B. In this case, we decrease the second-to-last digit is by B and add 1 to the the third-to-last digit. The net change here is -(B – 1).

If this causes the third-to-last digit to exceed B, then the same thing happens, and the net change in the sum of digits from those two is another -(B – 1)

In sum, this tells us that by adding K, the change in the digit sum is one of the following:

+K
+K – (B – 1)
+K – 2(B – 1)
+K – 3(B – 1)

In general, the change in the digit sum is +K – n(B – 1) for some n. If we now assume that (B-1) is a multiple of K, then the change in the digit-sum is also a multiple of K. And thus the resulting digit sum will be a multiple of K as well!

So for any K such that B – 1 is a multiple of K, if the digit sum of N is a multiple of K, then N is a multiple of K!

Alternating Sum Test

We can reason it through similarly for the alternating sum test. The possible changes to digits are:

+K
+1, +K – B
+1, +1 – B, +K – B
+1, +1 – B, +1 – B, +K – B
+1, +1 – B, +1 – B, +1 – B, +K – B

The alternating sum of this pattern, starting from the far right, gives:

K
(K – B) – 1
(K – B) – (1 – B) + 1
(K – B) – (1 – B) + (1 – B) – 1
(K – B) – (1 – B) + (1 – B) – (1 – B) + 1

The result of this sum is always either K or K – (B + 1). So as long as B+1 is a multiple of K, then the alternating sum will also change by a multiple of K!

Thus for any K such that B+1 is a multiple of K, if the alternating sum of the digits of N is a multiple of K, then N is a multiple of K.

If we disregard the “Other Tests” category, we can quite easily count for any given base how many digits will have sum, alternating sum, ending, or combination tests. People sometimes say that highly composite numbers would be ideal choices for numerical base systems, but those are really only good for ending tests, and not necessarily for sum or alternating sum tests.

I’ll define the notion of the orderliness of N as the amount of numbers 2 ≤ n < N that have a divisibility rule in base N, restricting divisibility rules to sum tests, alternating sum tests, ending tests, and combination tests.

Here’s a plot of the orderliness of bases from 3 to 100:

And from 3 to 1000:

The jaggedness of this graph suggests that some numbers make much better bases than their neighbors. By analogy to the notion of highly composite numbers, I’ll define highly ordered numbers as those whose orderliness is larger than all smaller numbers. These are numbers that would be ideal choices for numerical bases. The first few highly ordered numbers are:

3, 4, 5, 6, 8, 9, 10, 11, 14, 19, 20, 21, 29, 34, 49, 50, 55, 56, 69, 76, 90, 91, 99

Notice that 12 isn’t on this list, despite being highly composite! You might be able to guess why: though 12 has lots of factors, its neighbors 11 and 13 do not. On the other hand, 11 doesn’t have a lot of factors, but its neighbors 10 and 12 do, so it lands a spot on the list.

We could also measure orderability by what percentage of smaller numbers have rules for them. Here’s what that looks like: