What’s the probability that an election winner leads throughout the entire vote?

Try to answer the question in the title for yourself. An election has n votes for candidate A, m votes for candidate B, and no third candidate. Candidate A wins, so n > m. Assuming that the votes were equally likely to be counted in any order, what’s the probability that candidate A was always ahead? (Note that “always ahead” precludes the vote being tied at any point.)

My friend showed me this puzzle a couple of days ago and told me that there’s an extremely simple way to solve it. After trying a bunch of complicated things (including some wild Pascal’s triangle variants), I eventually relented and asked for the simple solution. Not only is there a short proof of the answer, but the answer is itself incredibly simple and elegant. See if you can do better than I did!




(Read on only after you’ve attempted the problem!)




So, here’s the solution. We start by considering the opposite of A always being ahead, which is that A is either behind or that the vote is tied at some point. Since we know that A eventually wins, A being behind at some point implies that at a later point A and B are tied. So the opposite of A always being ahead is really just that there is a tie at some point.

Pr(A is always ahead | n votes for A, m votes for B)
= 1 – Pr(A is behind at some point or tied at some point | n, m)
= 1 – Pr(There’s a tie at some point | n, m)

Now, let’s consider the probability of a tie at some point. There are two ways for this to happen: either the first vote is for A or the first vote is for B. The first vote being for B entails that there must be a tie at some point, since A must eventually pull ahead to win. This allows us to say the following:

Pr(Tie at some point | n, m)
= Pr(Tie at some point & A is first vote | n, m) + Pr(Tie at some point & B is first vote | n, m)
= Pr(Tie at some point & A is first vote | n, m) + Pr(B is first vote | n, m)
= Pr(Tie at some point & A is first vote | n, m) + m/(n+m)

Now, the final task is to figure out Pr(Tie at some point & A is first vote | n, m). If you haven’t yet figured it out, I encourage you to pause for a minute and think it over.



Alright, so here’s the trick. If there’s a tie at some point, then up to and including that point there are an equal number of votes for A and B. But this means that there are the same number of possible worlds in which A votes first as there are possible worlds in which B votes first! And this means that we can say the following:

Pr(Tie at some point & A is first vote | n, m)
= Pr(Tie at some point & B is first vote | n, m)

And this is exactly the probability we’ve already solved!

Pr(Tie at some point & B is first vote | n, m)
= Pr(B is first vote | n, m)
= m/(n+m)

And now we’re basically done!

Pr(Tie at some point | n, m)
= m/(n+m) + m/(n+m)
= 2m/(n+m)

Pr(A is always ahead | n, m)
= 1 – Pr(Tie at some point | n, m)
= 1 – 2m/(n+m)
= (n – m) / (n + m)

And there we have it: The probability that A is always ahead is just the difference in votes over the total number of votes! Beautiful, right?

We can even more elegantly express this as simply the percent of people that voted for A minus the percent that voted for B.

(n – m) / (n + m)
= n/(n+m) – m/(n+m)
=%A – %B

This tells us that even in the case where candidate A gets 75% of the vote, there’s still a 50/50 chance that they fall behind at some point!

An example of this: the recent election had Biden with 81,283,485 votes and Trump with 74,223,744 votes. Imagining that there were no third candidates, this would mean that Biden had 52.27% of the popular vote and Trump had 47.73%. And if we now pretend that the votes were equally likely to be counted in any order, then this tells us that there would only be a 9.54% chance that Biden would be ahead the entire time! Taking into account the composition of mail-in ballots, which were counted later, this means that Trump having an early lead was in fact exactly what we should have expected. The chance that Biden would have fallen behind at some point was likely quite a bit higher than 90.5%!

Quantum Chess

Recently I’ve been thinking about both quantum mechanics and chess, and came up with a chess variant that blends the two. The difference between ordinary chess and quantum chess is the ability to put your pieces in superpositions. Here’s how it works!


There are five modes of movement you can choose between in quantum chess: Move, Split, Merge, Collapse, and Branchmove.

Modes 1 and 2 are for non-superposed pieces, and modes 3, 4, and 5 are for non-superposed pieces.

Mode 1: Move

This mode allows you to move just as you would in an ordinary chess game.

Mode 2: Split

In the split mode, you can choose a non-superposed piece and split it between two positions. You can’t choose a position that is occupied, even by a superposed piece, meaning that splitting moves can never be attacks.

One powerful strategy is to castle into a superposition, bringing out both rooks and forcing your opponent to gamble on which side of the board to stage an attack on.

Mode 3: Merge

In mode 3, you can merge the two branches of one of your superposed pieces, recombining them onto a square that’s accessible from both branches.

You can’t merge to a position that’s only accessible from one of the two branches, and you can’t merge onto a square that’s occupied by one of your own superposed pieces, but merge moves can be attacks.

Mode 4: Collapse

Mode 4 is the riskiest mode. In this mode, you choose one of your superposed pieces and collapse it. There are two possible outcomes: First, it might collapse to the position you clicked on. In this case, you now have a choice to either perform an ordinary move…

… or to split it into a new superposition.

But if you get unlucky, then it will collapse to the position you didn’t select. In this case, your turn is over and it goes to your opponent.

Mode 5: Branch Move

Finally, in a branch move, you relocate just one branch of the superposition, without collapsing the wave-function or affecting the other branch.

Piece Interactions

Attacking a Superposed Piece

What happens if you attack a superposed piece? The result is that the superposed piece collapses. If the piece collapses onto the square you attacked, then you capture it.

But if it collapses onto the other branch of the superposition, then it is safe, and your piece moves harmlessly into the square you just attacked.

This means that attacking a superposed piece is risky! It’s possible for your attack to backfire, resulting in the attacker being captured next turn by the same piece it attacked.

It’s also possible for a pawn to move diagonally without taking a piece, in a failed attack.

Line of Sight

Superposed pieces block the lines of sight of both your pieces and your opponent’s pieces. This allows you to defend your pieces or block open files, without fully committing to defensive positions.

Winning the Game

To win quantum chess, you must actually take the opponent’s king. Let’s see why it’s not enough to just get the opponent into a position that would ordinarily be checkmate:

It’s blue’s turn now, and things look pretty lost. But look what blue can do:

Now the red queen has to choose one of the two targets to attack, and there’s a 50% chance that she gets it wrong, in which case the blue king can freely take the red queen, turning a sure loss into a draw!

So how can red get a guaranteed win? It takes patience. Rather than trying to attack one of the two squares, the red queen can hang back and wait for a turn.

Now the blue king has two choices: leave superposition, after which they can be taken wherever they go. Or move one branch of the superposition, but any possible branch move results in a safe shot at the king with the queen. This can be repeated until the king is taken.

And that’s quantum chess! I’ve played several games with friends, and each time have noticed interesting and surprising strategies arising. Let me leave you with a quantum chess puzzle. Here’s a position I found myself in, playing red:

What do you think was my best move here?

The Theory of Knots


A knot is a closed loop in three dimensional space that doesn’t intersect itself. Some examples:

Knot diagrams are what you just saw: two dimensional representations of the three dimensional objects. It should be obvious that any one knot has many different diagrams. This raises an immediate question: given two knot diagrams, do they represent the same knot?

We need to be clear on what we mean by “the same knot”, of course. The intuition here is fairly obvious: you can rotate, stretch, and wiggle any section of a knot without fundamentally changing it.

What you CANNOT do is cut and reconnect the knot or pass one section of a knot through another of its sections.

Looking back at our starting five example knots, ask yourself if any of them represent the same knot:

It might not be too obvious, but 2 and 4 represent the same knot, and so do 1 and 5! These equivalences (especially the equivalence between 1 and 5) should hopefully give a flavor for the difficulty of the general problem of determining equivalence between knots. While it’s sometimes possible to answer this by mere examination, it’s more often impossible. Even in cases where it seems very intuitively obvious (like, for instance, that the trefoil is not just an unknot in disguise), it’s hard to come up with a way to PROVE this fact.

When faced by a hard problem, we should begin thinking of the structure of a general solution. What we’d really like is a recipe for an algorithm that takes in two knot diagrams and decides whether they represent the same knots. (For economy of space and ease of reading, I’ll proceed to refer to “knot diagrams that represent equivalent knots” as “equivalent knot diagrams”.)

Here’s a naive first pass at a solution: Take either of the two knot diagrams you’re given. Start wiggling it around in all the allowed ways. At each step, check if you’ve obtained the other knot diagram. If you ever do, return True. Otherwise return False.

Maybe you see the problem here. In fact there are at least two major problems. (I said it was a naive first pass.)

The first big problem is in the “otherwise return False”. At what point in the algorithm do you actually reach this “otherwise” clause? Assuming that the two knot diagrams really do represent different knots, then you can keep manipulating the starting knot diagram in more and more complicated ways forever. At each stage you might get original knot diagrams, and at no stage are you justified in giving up and returning False. In other words, even if this algorithm worked as I described it, it could only ever decide equivalence of two equivalent knot diagrams. If handed two inequivalent knot diagrams, it would run forever. Thus this is really an algorithm for semi-deciding knot-diagram equivalence, not deciding it.

The second big problem is in the “start wiggling it around in all the allowed ways.” How would we describe the set of allowed transformations of a knot diagram to a computer? In particular, could we simplify down the allowed transformations to a finite set that would ensure that we really had done ALL possible manipulations? The answer to this is obviously no. Consider the following transformation:

Each transformation like this produces a new knot diagram, simply by wiggling one side in an arbitrary way. But how many possible wiggles are there? Clearly an uncountable infinity: you can do arbitrarily small disruptions centered on any point along the knot, leaving the rest undisturbed, and get a distinct knot diagram each time.

The “uncountable infinity” aspect of this second problem can actually be quickly resolved by sharpening our original question. Rather than regarding a knot diagram as any two-dimensional projection of a knot, we can think about a knot diagram as defined by its various crossings. For instance, we can label each uninterrupted “arc” in a knot diagram, as well as each crossing point.

We now produce an alternative description of knot diagrams, obtained by describing how each arc interacts with each crossing point.

Crucially, our new description is not affected by the uncountably many small wiggles that do not make any substantial changes to the knot diagram. I.e., they do not affect the way that the various arcs cross under or above each other. We call this “equivalence of knot diagrams up to planar isotopy”, where planar isotopy is meant to refer to changes in a knot that don’t affect the structure of the crossings.

That’s one helpful step towards fixing up our algorithm. However, we are still in need of a finite set of transformations that we are sure generates ALL possible transformations on knot diagrams. I encourage you to think about this problem for a moment, and come up with a set of transformations to knot diagrams that you think are sufficient to produce knot diagrams.



I’ll leave a little space for you to try to come up with an answer for yourself. It’s not an easy exercise, but might be fun to test your knot-theoretic intuitions on.



Reidemeister Moves

It turns out that a set of six very simple transformations suffices for this task, discovered by Kurt Reidemeister. There are three categories of transformations:

If two knot diagrams are equivalent, then there is a sequence of Reidemeister moves that transforms one to the other (up to planar isotopy).

It’s pretty fun to try to construct such sequences for oneself, and in doing so you’ll develop an intuition for why repeated Reidemeister moves suffice to generate all transformations. Try the earlier example of the two diagrams for the trefoil! I can do it in ten moves, can you do better?

So, this is great news! Our naive first attempt at building an algorithm to determine equivalence can be implemented (though it still only semi-decides equivalence rather than deciding it). Choose one of the diagrams, and run through all finite sequences of Reidemeister moves in any order. At each stage we check if we get the other diagram by application of these moves, and if so return True.

So with the help of the Reidemeister moves, we’ve successfully semidecided the equivalence problem. But it turns out that we can do better. Reidemeister moves not only allow us to determine that two diagrams are equivalent, but they lead us to a powerful tool to determine that two diagrams are inequivalent. The key concept here is invariants.


Imagine that we found some quantity Q, calculated from a knot diagram, such that no matter how we alter the knot diagram, that quantity never changes. Now if we’re given two knot diagrams D1 and D2, and find that Q(D1) ≠ Q(D2), then we have conclusively demonstrated the inequivalence of the two diagrams! Why? Because if there was some way to transform D1 into D2, and since every transformation holds fixed Q(D1), then Q(D2) would have to equal Q(D1)!

The hard part, of course, is finding invariants. But this is much easier when we have the help of Reidemeister moves! To show that Q(D) is an invariant quantity, all we need to do is show that the value of Q doesn’t change when we apply any Reidemeister move to the diagram D. This basically amounts to checking six simple cases, which is not too bad.

So, let’s go to our first invariant quantity: three-colorability.


Three-colorability is quite simple: a knot is three-colorable if it’s possible to color each knot such that at each crossing either all three arcs are the same color, or all three are different.

Notice that any knot can be trivially 3-colored by just letting every arc be the same color; to avoid this we add the requirement that for a 3-coloring to be valid, it must use at least two distinct colors.

It’s a fun exercise to show that this quantity is invariant using Reidemeister moves. The simplest case is the twist. Essentially all we need to do is show that if we apply a twist to a 3-colorable diagram, we can still 3-color the new diagram, AND that there’s a way to “untwist” a 3-colorable diagram so that we still retain 3-colorability. The proof is actually quite trivial; note that the crossing at any twist must be monochrome, since two of the involved arcs are actually the same.

Try convincing yourself that 3-colorabiity is preserved under the other Reidemeister moves!

So, 3-colorability is an invariant. There’s a very nice consequence of this. The trefoil is three-colorable, and the unknot (having only one arc) is not.

So this proves that the trefoil can not be unknotted! This was probably intuitively obvious from the outset, but we can now rest comfortably in the certainty of a mathematical proof.


A fun exercise: try to 3-color the figure-eight knot, and see why it’s not possible.

This shows that the figure-eight knot is not equivalent to the trefoil. But what about the unknot? Neither the unknot and the figure-eight knot are 3-colorable, and yet they are not equivalent! So 3-colorability, while quite cool, isn’t the whole story. However, there’s a nice generalization of 3-colorability to p-colorability for any prime p.

Think about colors as numbers. So our set of three colors from the last section will just be {0, 1, 2}. A coloring is then a function mapping arcs to {0, 1, 2}. This gives us an algebraic way of expressing the 3-colorability property:

If x, y, and z are the colors of the three arcs connecting at a crossing, with z being the arc that passes above the crossing, then 2z – x – y = 0 (mod 3). It’s easily checkable that this is equivalent to the condition that each crossing is either monochrome, or else the three arcs are different colors.

We generalize this to any prime p as follows:

Our set of colors will be {0, 1, …, p – 1}. If for each crossing, 2z – x – y = 0 mod p, and at least two colors are used, then the knot is p-colorable.

For each prime p, p-colorability is an invariant, meaning that we have now gone from one invariant quantity to a countable infinity of invariants! We also can now distinguish the figure-eight knot from the unknot: the first is 5-colorable, while the unknot is not.

Five-coloring the figure-eight – check for yourself that the algebraic condition (2z – x – y = 0 mod 5) is satisfied

An exercise: try 5-coloring the trefoil. Is it possible?

However, this is not the end of the story. For one thing, it’s not so easy to determine if an arbitrary knot is p-colorable. It’s doable, sure, by trying all the possible colorings, but we want a quicker method. This will be addressed in the next section.

Much more importantly, we’ll see in the next section that there are knot diagrams that have the same colorability profile (i.e. D1 is p-colorable iff D2 is p-colorable, for each p), and yet represent different knots. So we need an even more powerful invariant to discriminate these from one another.

Knot Determinant

Take any knot and label its arcs and crossings with natural numbers.

Then construct a matrix M, where the (i, j) component corresponds to arc i and crossing j. If arc i isn’t involved in crossing j, then Mij = 0. If arc i is involved in crossing j, and passes over it, then Mij = 2. And if arc i passes under crossing j, then Mij = -1.

Now, cross out any row and column of your choice, and take the determinant of the remaining matrix. Take its absolute value, and you have the knot determinant.

You know what I’m going to say next: the knot determinant is an invariant! Not only is it an invariant, but it also fully determines the colorability profile of a knot! The rule is that if a knot has knot determinant N, then for any prime p > 2, N is p-colorable if and only if p divides N.

For instance, say we calculate that a knot has determinant 33. We immediately conclude that the knot is 3-colorable, 11-colorable, and not p-colorable for any other prime.

This is pretty incredible. We now have a simple deterministic procedure for looking at a knot diagram and producing a number that gives us the entire colorability profile of the knot. But wait, there’s more!

Suppose we have two knot diagrams, one with determinant 15, and the other with determinant 45. These two indicate the same colorability profile: 3-colorable, 5-colorable, and nothing else. But since the determinants are different, the two knots are not equivalent! In other words, any two knots with different determinants that have all the same prime factors (but with possibly different powers of those prime factors) will have the exact same colorability profile, and will nonetheless be proven inequivalent by the knot determinant!

We’ve now reached the end of this dive into knot theory, but only because we’ve begun to near the end of my knowledge of the subject. By no means does the theory of knots end here. The knot determinant, amazing though it is, still does not fully pin down the different knots up to equivalence. There are distinct knots with the same determinant. If we were to dive further, we’d next look into a polynomial invariant, the Alexander polynomial. Going deeper, there’s hyperbolic invariants, involving the complement of a knot in hyperbolic space. Then there’s links, which are structures consisting of multiple knots linked together in some way, which have their own theory and periodic table. And we can even consider knotting spheres in four-dimensional space, or in general n-dimensional spheres in n+1-dimensional space! Plus there’s the operation of the knot sum, which we can use to combine distinct knots and get a new one. This operation is commutative and associative, and gives rise to the notion of prime and composite knots. And the coolness just keeps going on from here!

If nothing else, I hope I’ve convinced you that the subject of knots is of great mathematical interest and inspired you to look into it yourself. I’ll leave you with the first few rows of a (necessarily infinite) periodic table of knots, showing the prime knots up to seven crossings:

Polish Notation and Garden-Path Sentences

Polish notation is a mathematical notation system that allows you to eliminate parentheses without ambiguity. It’s called “Polish” because the name of its Polish creator, Jan Łukasiewicz, was too difficult for people to pronounce.

A motivating example: Suppose somebody says “p and q implies r”. There are two possible interpretations of this: “(p and q) implies r” and “p and (q implies r)”. The usual way to disambiguate these two is to simply add in parentheses like I just did. Another way is to set an order-of-operations convention, like that “and” always applies before “implies”. This is what’s used in basic algebra, and what allows you to write 2 + 2 ⋅ 4 without any fear that you’ll be interpreted as meaning (2 + 2) ⋅ 4.

Łukasiewicz’s method was to make all binary connectives into prefixes. So “A and B” because “and A B”, “P implies Q” becomes “implies P Q”, and so on. In this system, “(p and q) implies r” translates to “implies and p q r”, and “p and (q implies r)” translates to “and p implies q r”. Since the two expressions are different, there’s no need for parentheses! And in general, no ambiguity ever arises from lack of parentheses when using Polish notation.

If this is your first time encountering Polish notation, your first reaction might be to groan and develop a slight headache. But there’s something delightfully puzzling about reading an expression written in Polish notation and trying to understand what it means. Try figuring out what this means: “implies and not p or q s r”. Algebra can be written in Polish notation just as easily, removing the need for both parentheses AND order-of-operations. “2 + 2 = 4” becomes “+ 2 2 = 4”, or even better, “= + 2 2 4”.

Other binary connectives can be treated in Polish notation as well, creating gems like: “If and you’re happy you know it clap your hands!” “When life is what happens you’re busy making plans.” “And keep calm carry on.” “Therefore I think, I am.” (This last one is by of the author the Meditations). Hopefully you agree with me that these sentences have a nice ring to them, though the meaning is somewhat obscured.

But putting connectives in front of the two things being connected is not unheard of. Some examples in English: “ever since”, “because”, “nonwithstanding”, “whenever”, “when”, “until”, “unless”. Each of these connects two sentences, and yet can appear in front of both. When we hear a sentence like “Whenever he cheated on a test the professor caught him”, we don’t have any trouble parsing it. (And presumably you had no trouble parsing that entire last sentence either!) One could imagine growing up in a society where “and” and “or” are treated the same way as “ever since” and “until”, and perhaps in this society Polish notation would seem much more natural!

Slightly related to sentential connectives are verbs, which connect subjects and objects. English places its verbs squarely between the subject and the object, as does Chinese, French, and Spanish. But in fact the most common ordering is subject-object-verb! 45% of languages, including Hindi, Japanese, Korean, Latin, and Ancient Greek, use this pattern. So for instance, instead of “She burned her hand”, one would say “she her hand burned”. This is potentially weirder to English-speakers than Polish notation; it’s reverse Polish notation!

9% of languages use Polish notation for verbs (the verb-subject-object pattern). These include Biblical Hebrew, Arabic, Irish, and Filipino. In such languages, it would be grammatical to say “Loves she him” but not “She loves him”. (3% of languages are VOS – loves him she – 1% are OVS – him loves she – and just a handful are OSV – him she loves).

Let’s return to English. Binary prepositions like “until” appear out front, but they also swap the order of the two things that they connect. For instance, “Until you do your homework, you cannot go outside” is the same as “You cannot go outside until you do your homework”, not “You do your homework until you cannot go outside”, which sounds a bit more sinister.

I came up with some examples of sentences with several layers of these binary prepositions to see if the same type of confusion as we get when examining Polish notation for “and” or “implies” sets in here, and oh boy does it.

Single connective
Since when the Americans dropped the bomb the war ended, some claimed it was justified.

Two connectives, unlayered
Since when the Americans dropped the bomb the war ended, when some claimed it was an atrocity others argued it was justified.

Still pretty readable, no? Now let’s layer the connectives.

One layer
Whenever he was late she would weep.
She would weep whenever he was late.

Two layers
Since whenever he was late she would weep, he hurried over.
He hurried over, since she would weep whenever he was late.

Three layers
Because since whenever he was late she would weep he hurried over, he left his wallet at home.
He left his wallet at home, because he hurried over since she would weep whenever he was late.

Four layers
Because because since whenever he was late she would weep he hurried over he left his wallet at home, when he was pulled over the officer didn’t give him a ticket.
The officer didn’t give him a ticket when he was pulled over, because he left his wallet at home because he hurried over since she would weep whenever he was late.

Five layers
When he heard because because since whenever he was late she would weep he hurried over he left his wallet at home, when he was pulled over the officer didn’t give the man a ticket, the mayor was outraged at the lawlessness.
The mayor was outraged at the lawlessness when he heard the officer didn’t give the man a ticket when he was pulled over because he left his wallet at home because he hurried over since she would weep whenever he was late.

Read that last one out loud to a friend and see if they believes you that it makes grammatical sense! With each new layer, things become more and more… Polish. That is, indecipherable. (Incidentally, Polish is SVO just like English). Part of the problem is that when we have multiple layers like this, phrases that are semantically connected can become more and more distant in the sentence. It reminds me of my favorite garden-path sentence pattern:

The mouse the cat the dog chased ate was digested.
(The mouse that (the cat that the dog chased) ate) was digested.
The mouse (that the cat (that the dog chased) ate) was digested.

The phrases that are meant to be connected, like “the mouse” and “was digested” are sandwiched on either side of the sentence, and can be made arbitrarily distant by the addition of more “that the X verbed” clauses.

Does anybody know of any languages where “and” comes before the two conjuncts? What about “or”? English does this with “if”, so it might not be too much of a stretch.

A Self-Interpreting Book

A concept: a book that starts by assuming the understanding of the reader and using concepts freely, and as you go on it introduces a simple formal procedure for defining words. As you proceed, more and more words are defined in terms of the basic formal procedure, so that halfway through, half of the words being used are formally defined, and by the end the entire thing is formally defined. Once you’re read through the whole book, you can start it over and read from the beginning with no problem.

I just finished a set theory textbook that felt kind of like that. It started with the extremely sparse language of ZFC: first-order logic with a single non-logical symbol, ∈. So the alphabet of the formal language consisted of the following symbols: ∈ ( ) ∧ ∨ ¬ → ↔ ∀ ∃ x ‘. It could have even started with a sparser formal language if it was optimizing for alphabet economy: ∈ ( ∧ ¬ ∀ x ‘ would suffice. As time passed and you got through more of the book, more and more things were defined in terms of the alphabet of ZFC: subsets, ordered pairs, functions from one set to another, transitivity, partial orders, finiteness, natural numbers, order types, induction, recursion, countability, real numbers, and limits. By the last chapter it was breathtaking to read a sentence filled with complex concepts and realize that every single one of these concepts was ultimately grounded in this super simple formal language we started with, with a finitistic sound and complete system of rules for how to use each one.

But could it be possible to really fully define ALL the terms used by the end of the book? And even if it were, could the book be written in such a way as to allow an alien that begins understanding nothing of your language to read it and, by the end, understand everything in the book? Even worse, what if the alien not only understands nothing of your language, but starts understanding nothing of the concepts involved? This might be a nonsensical notion; an alien that can read a book and do any level of sophisticated reasoning but doesn’t understand concepts like “and” and “or“.

One way that language is learned is by “pointing”: somebody asks me what a tree is, so I point to some examples of trees and some examples of non-trees, clarifying which is and which is not. It would be helpful if in this book we could point to simple concepts by means of interactive programs. So, for instance, an e-book where an alien reading the book encounters some exceedingly simple programs that they can experiment with, putting in inputs and seeing what results. So for instance, we might have a program that takes as input either 00, 01, 10, or 11, and outputs the ∧ operation applied to the two input digits. Nothing else would be allowed as inputs, so after playing with the program for a little bit you learn everything that it can do.

One feature of such a book would be that it would probably use nothing above first-order logical concepts. The reason is that the semantics of second-order logic cannot be captured by any sound and complete proof system, meaning that there’s no finitistic set of rules one could explain to an alien so that they know how to use the concepts involved correctly. Worse, the set of second-order tautologies is not even recursively enumerable (worse than the set of first-order tautologies, which is merely undecidable), so no amount of pointing-to-programs would suffice. First-order ZFC can define a lot, but can it define enough to write a book on what it can define?

Transfinite Nim: uncomputable games and games whose winner depends on the Continuum Hypothesis

In the game of Nim, you start with piles of various (whole number) heights. Each step, a player chooses one pile and shrinks it by some non-zero amount. Once a pile’s height has been shrunk to zero, it can no longer be selected by a player for shrinking. The winner of the game is the one that takes the last pile to zero.

Here’s a sample game of Nim:

Starting state
3, 2
After Frank’s move
2, 2
After Marie’s move
2, 1
After Frank’s move
0, 1
After Marie’s move
0, 0

Marie takes the last pile to zero, so she is the winner. Frank’s second-to last move was a big mistake; by reducing the first pile from 2 to 0, he left the only remaining pile free to be taken by Marie. In a game of Nim, you should never leave only one pile remaining at the end of your turn. If Frank had instead shrunk the first pile from 2 to 1, then the state of the piles would be (1, 1). Marie would be forced to shrink one of the two piles to zero, leaving Frank to take the final pile and win.

The strategy of Nim with two piles is extremely simple: in your turn you should always even out the two piles if possible. This is only possible if the heights are different at the start of your turn. See if you can figure out why this strategy guarantees a win!

Transfinite Nim is a version of Nim where the piles are allowed to take infinite ordinal values. So for instance, a game might have the starting position:

Starting state
ω2 + ω, ω1 + ε0

If Marie is moving first, then can she guarantee a win? What move should she make?

It turns out that the strategy for two-pile Transfinite Nim is exactly the same as for Finite Nim. Marie has a guaranteed win, because the two piles are different values. Each move she’ll just even the piles out. So for her first move, she should do the following:

Starting state
ω2 + ω, ω1 + ε0
After Marie’s move
ω2 + ω, ω2 + ω

No matter what Frank does next, Marie can just “copy” that move on the other pile, guaranteeing that Marie always has a move as long as Frank does. This proves that Marie must have the last move, and therefore win.

One important feature of Transfinite Nim is that even though we’re dealing with infinitely large piles, every game can only last finitely long. In other words, Frank has no strategy for delaying his loss infinitely long, and thus forcing a sort of “stalemate by exhaustion.” This is because the ordinals are well-ordered, and any decreasing sequence of well-ordered items must terminate. (Why? Just consider the definition of a well-ordered set: every subset has a least element. If the game were to continue infinitely long, each step decreasing the state but never terminating, then the sequence of states would be a subset of the ordinals which has no least element!)

Although the strategy of Transfinite Nim is no more interesting than Finite Nim, the game does have some interesting features that it inherits from the ordinals. There are sets of ordinal numbers such that the ordering between them is uncomputable (i.e. that no Turing machine can take as input any two ordinals from that set and correctly assess whether one is larger than the other). For such sets, the ability to compute a winning strategy is called into question.

For instance, the set of all countable ordinals is uncomputable. The quick proof is that there are uncountably many countable ordinals – otherwise in ZFC the set of countable ordinals would itself be a countable ordinal and would thus contain itself – and any Turing machine can only compare countably many things.

The smallest uncomputable ordinal (which, in ZFC, is exactly the set of all computable ordinals) is called the Church Kleene ordinal and written ω1CK. Imagine the starting state of the game is two different ordinals that are both larger than ω1CK. If you’re moving first, then you have to determine which of the two ordinals is larger, in order to even them out. But this is not in general possible! So even if you go first and the two piles are different sizes, you might not be able to guarantee a win.

Suppose Marie is allowed uncomputable strategies, and Frank is only allowed computable strategies. Suppose further that the starting state involves two ordinals A and B, both larger than the Church-Kleene, and that the ordinals are expressed in some standard notation (so that you can’t write the same ordinal two different ways). There are a few cases.

Case 1: A = B, Marie goes first.
Marie decreases one of the two ordinals. Despite not being able to compute the order on the ordinals, Frank can just mimic her move. This will continue until Frank wins.

Case 2: A = B, Frank goes first.
Frank decreases one of the two ordinals, and Marie mimics. Marie eventually wins.

Case 3: A ≠ B, Marie goes first.
Marie can tell which of the ordinals is larger, and decreases that one to even out the two piles. Marie wins.

Case 4: A ≠ B, Frank goes first.
Frank can’t tell which of the ordinals is larger and can’t try to even them out, as doing so might result in an invalid move (trying to increase the smaller pile to the height of the larger one). So Frank does some random move, after which Marie is able to even out the two piles. Marie wins.

Finally, here’s a starting state for a game of Transfinite Nim:

ω1, ℶ1

ω1 is the first uncountable ordinal, and ℶ1 is the first ordinal with continuum cardinality. Frank goes first. Does he have a winning strategy?

It depends on whether ω1 = ℶ1. If the two are equal, then Frank can’t win, because he’s starting with two even piles. And if ω1 < ℶ1, then Marie can’t win, because Frank can decrease the ℶ1 pile to ω1.

If we suppose that the players must be able to prove a move’s validity in ZFC before playing that move, then the first player couldn’t decrease the ℶ1 pile to ω1. The first player still has to do something, and whatever he does will change the state to two ordinals that are comparable by ZFC.

What about larger starting ordinals whose size comparison is independent of ZFC, like ω15 and ℶ15? If the new state after the first player’s move move also involves two ordinals whose size comparison is independent of ZFC, then the second player will also be unable to even them out. This continues until one of them eventually decreases a pile to an ordinal whose size is comparable by ZFC to the other pile. So the winner will depend on who knows more pairs of ordinals less than the starting values with values that ZFC can’t compare. In fact, each player wants to force the other player to make the values ZFC-comparable, so they’ll be able to even the piles out on their turn.

The Anti-Set Program

In ZFC set theory, we specify a collection of sentences within a first-order language to count as our axioms. The models of this collection of sentences are the set-theoretic universes (and many of these models are “unintended” – pesky perversions in which the set of naturals ω is uncountably large, as one example – but we’ll put this aside for this post). Most of the axioms act as constraints that all sets must follow. For instance, the axiom of pairing says that “For any sets x and y, there must exist another set containing as elements just x and y and nothing else”. This is an axiom that begins with universal quantification over the sets of the universe, and then states some requirement that must hold of all these sets.

The anti-set program is what you get when you take each of these restriction axioms, and negate the restriction it imposes on all sets. So, for instance, the axiom of anti-pairing says that for any sets x and y, there must NOT exist any set {x, y}. Contrast this with the simple negation of pairing, which would tell us only that there exist two sets x and y such that their pair doesn’t exist. The anti-axiom is much stronger than the negated axiom, in that it requires NO pairs to exist.

Not all axioms begin with universal quantifiers, in particular the axiom of infinity, which simply asserts that a set exists that satisfies a certain property. To form the axiom of anti-infinity, we simply negate the original axiom (so that no sets with that property exist).

As it turns out, the anti-set program, if applied to ALL the axioms of ZFC, ends in disaster and paradox. In particular, a contradiction can be derived from anti-comprehension, from anti-replacement, and from anti-extensionality. We don’t handle these cases the same way. Anti-comprehension and anti-replacement are simply discarded, being too difficult to patch. By contrast, anti-extensionality is replaced by ordinary extensionality. What’s up with that? The philosophical justification is simply that extensionality, being the most a priori of the bunch, is needed to justify us calling the objects in our universe sets at all.

There’s one last consideration we must address, which regards the axiom of anti-choice. I am currently uncertain as to whether adding this axiom makes the theory inconsistent. One thing that is currently known about the axiom of anti-choice is that with its addition, out go all finite models (there’s a really pretty proof of this that I won’t include here). In the rest of this post, I will be excluding anti-choice from the axioms and only exploring models of anti-ZF.

With that background behind us, let me list the axioms of anti-ZF.

∀x∀y∀z∃w (w ∈ z ∧ w ≠ x ∧ w ≠ y)
No set is the pair of two others.

∀x∀y∃z (z ∈ y ∧ ¬∃w (z ∈ w ∧ w ∈ x))
No set is the union of another.

∀x∀y∃z (z ∈ y ∧ ∃w (w ∈ z ∧ w ∉ x))
No set contains all existing subsets of another.

∀x∀y (y ∈ x → ∃z (z ∈ x ∧ z ∈ y))
Every set’s members have at least one element in common with it.

∀x∃y ((y is empty ∧ y ∉ x) ∨ (y ∈ x ∧ ∃z (z = S(y) ∧ z ∉ x)))
Every set either doesn’t contain all empty sets, or has an element whose successor is outside the set.

If the axioms of ZF are considered to be a maximally nice setting for mathematics, then perhaps the axioms of anti-ZF can be considered to be maximally bad for mathematics.

We need to now address some issues involving the axiom of anti-infinity. First, the abbreviations used: “y is empty” is shorthand for “∀z (z ∉ y)” and “z = S(y)” is shorthand for “∀w (w ∈ z ↔ (w ∈ y ∨ w = y))” (i.e. z = y ⋃ {y}).

Second, it turns out that from the other axioms we can prove that there are no empty sets. So the first part of the disjunction is always false, meaning that the second part must always be true. Thus we can simplify the axiom of anti-infinity to the following statement, which is logically equivalent in the context of the other axioms:

∀x∃y (y ∈ x ∧ ∃z (z = S(y) ∧ z ∉ x))
Every set has an element whose successor is outside the set.

Let’s now prove some elementary consequences of these axioms.


>> There are no empty sets.
Anti-Union ⊢ ¬∃x∀y (y ∉ x)
Suppose ∃x∀y (y ∉ x). Call this set ∅. So ∀y (y ∉ ∅) Then ⋃∅ = ∅. But this implies that the union of ∅ exists. This contradicts anti-union.

>> There are no one-element sets.
Anti-Pairing ⊢ ¬∃x∃y∀z (z ∈ x ↔ z = y)
Suppose that ∃x∃y∀z (z ∈ x ↔ z = y). Then ∃x∃y∀z (z ∈ x ↔ (z = y ∨ z = y)). But this is a violation of anti-pairing, as then x would be the pair of y and y. Contradiction.

>> There are no two-element sets.
Anti-Pairing ⊢ ¬∃x∃y∃z∀w (w ∈ x ↔ (w = y ∨ w = z))
Suppose that ∃x∃y∃z∀w (w ∈ x ↔ (w = y ∨ w = z)). Then w is the pair of y and z, so anti-pairing is violated. Contradiction.

>> No models of anti-ZF have just N-element sets (for any finite N).
Suppose that a model of anti-ZF had only N-element sets. Take any of these sets and call it X. By anti-infinity, X must contain an element with a successor that is outside the set. Call this element Y and its successor S(Y). Y cannot be its own successor, as then its successor would be inside the set. This means that Y ∉ Y. Also, Y is an N-element set by assumption. But since Y ≠ S(Y), S(Y) must contain all the elements of Y in addition to Y itself. So S(Y) contains N+1 elements. Contradiction.

>> There is no set of all sets.
Suppose that there is such a set, and call it X. By Anti-Infinity, X must contain an element Y with a successor that’s outside of X. But no set is outside of X, by assumption! Contradiction.

>> No N-element model of anti-ZF can have N-1 sets that each contain N-1 elements.
Suppose that this were true. Consider any of these sets that contain N-1 elements, and call it X. By the same argument made two above, this set must contain an element whose successor contains N elements. But there are only N sets in the model, so this is a set of all sets! But we already know that no such set can exist. Contradiction.

>> Every finite set of disjoint sets must be its own choice function.
By a choice function for a set X, I mean a set that contains exactly one element in common with each element of X. Suppose that X is a finite set of disjoint sets. Let’s give the elements of X names: A1, A2, A3, …, AN. By anti-foundation, each An must contain at least one element of X. Since the Ans are disjoint, these elements cannot be the same. Also, since there are only N elements of X, no An can contain more than one element of X. So each element of X contains exactly one element of X. Thus X is a choice function for X!

The next results come from a program I wrote that finds models of anti-ZF of a given size.

>> There are no models of size one, two, three or four.

>> There are exactly two non-isomorphic models of size five.

Here are pictures of the two:

Now, some conjectures! I’m pretty sure of each of these, especially Conjecture 2, but haven’t been able to prove them.

Conjecture 1: There’s always a set that contains itself.

Conjecture 2: There can be no sets of disjoint sets.

Conjecture 3: In an N-element model, there are never less than three sets with fewer than N-1 elements.

How to draw lambda diagrams

If you don’t want spoilers for my puzzle a few days ago, don’t read ahead!

I think lambda diagrams are extremely cool, and haven’t seen any detailed description on how they work online. I’ll start by showing some very simple examples of lambda diagrams, and then build up to more complicated ones.

First of all, what are lambda diagrams? They are pictorial representations of lambda expressions, and hence count as a pictorial system for a large portion of mathematics. I will assume that you understand how lambda calculus works for this post, and if you aren’t familiar then you can do no better than this video for a basic introduction.

The Basics

Let’s start simple: the lambda expression for “True”: λx.λy.x

Each time we see a new bound variable, we draw a horizontal line to represent that variable. So parsing from left to right, we start with a horizontal line for x.

Next is “λy.“, so we insert another horizontal line for y (below the first horizontal line).

Next we have an x in the function body, which we represent by a vertical line leaving the x line:

And there we have it, the lambda diagram for True!

Now let’s look at False: λx.λy.y

We start the same way as before, with two horizontal lines for our variables x and y:

But now we have a y in our function body, not an x. So we draw our vertical line leaving the y line instead of the x line:

And that’s the diagram for False! The actual diagram needn’t have the labels “x” and “y” in it, because our convention that we draw lines for new variables below existing horizontal lines uniquely specifies which lines stand for which variables.

Let’s now do a slightly more complicated function: λx.λy.y x

As before, we start with our two variables, x and y.

Now we have a y in the function body, so we add a vertical line leaving the y bar:

Next is an x, which is being fed as input to the y. We draw this as follows:

Two things have happened here: First I’ve moved the original vertical line for y over to the left to make space. Next, I’ve represented “feeding x to y as input” as a line leaving the x bar and terminating at the vertical line for y.

Take a guess what λx.λy.x y would look like!

Here it is:

Next, let’s look at λx.λy.x x y (which is incidentally the “or” function).

We start with the introduction of the variables x and y.

Next, an x:

And another x:

And finally a y:

Notice that this y connects below the first connection, to the original branch. What would it mean if it were connected to the second branch?

As you can see, this would indicate that y is first fed to x, and then the result of THAT is fed to x.

With these principles, you should now be able to draw out diagrams for much more complicated lambda expressions. Try this one: λx.λy.λz.z (y x) (y z x) x

Here it is!

Make sure that this makes sense before moving on!

Lambda Expressions within Lambda Expressions

Next, we’ll look at how to deal with lambda expressions that contain new lambda expressions within their function body. Here’s a simple example: λx.λy.x (λz.z) y

Everything up until the third λ is the same as always:

Now, to deal with our new variable z, we draw a new horizontal line. But importantly, this horizontal line comes after the vertical line for the x that has already been used!

After the “λz.” we have a “z“, so we draw a line from the z bar to our original vertical line leaving x.

And finally, after all of this we feed y to the original vertical line:

Try this one: λx.λy.x (λz.z z) (λw.λv.v (w w)) y

And here’s another one, where the inside function uses outside variables: λx.λy.x (λz.y z)

Alright, the final category you need to know to understand how to draw any lambda diagram whatsoever is…

Function Application

Suppose we have the lambda expression (λx.λy.x) (λz.z z). We first draw the lambda diagrams for each of the two component expressions side by side:

And now we simply attach the second to the first, indicating that the entire second lambda expression is fed as input to the first!

Here’s another example for you to try: (λx.x) (λy.λz.z z y) (λw.λv.v (v w))

And one more example, this one using all the tricks we’ve seen so far: (λx.x (λy.λz.z y y) (λw.λv.w (v x))) (λu.u u)

Beta Reduction

The final thing to know about lambda diagrams is how to actually do computations with them. The basic idea is that when we have one lambda diagram fed as input to another, we can substitute the “input” diagram for each vertical line leaving the topmost horizontal bar, and then erase this bar. Let’s look at some examples:

You can also do function application within larger lambda diagrams. Take a look at the following example, which is a calculation that shows that the successor of zero is one:

The first beta reduction here is just like the previous ones we’ve seen. But the second one does a substitution within the main lambda expression, as does the third. This works in much the same way as the earlier reductions we saw, the primary difference being that references to variables outside the scope of the function being applied must be maintained. You can see this in the final step above, where we remove the line representing the variable y, and attach the line connecting to it to the line representing the variable x.

Now for the fun part! I’ve found some great animations of calculations using lambda diagrams on Youtube. Each of them is using just the rules I’ve described above, nothing more! And I must say that the music is delightful. Take a look!


The Pictorial Mathematics of Klak-Adbmal

In the distant land of Klak-Adbmal, cut off from the rest of civilization, mathematicians have developed their own distinctive form of math. It is entirely pictorial; every mathematical concept is translated as a diagram consisting of criss-crossing horizontal and vertical lines, and calculations are manipulations of these diagrams.

Translators have so far been able to figure out the symbols for a few basic mathematical concepts, but there are many remaining gaps in our ability to translate Klak-Adbmalist mathematics. This image shows the symbols we’ve figured out so far, as well as a step-by-step calculation of the successor of 1 equaling 2.

The Klak-Adbmalians are very secretive and slow to give up their secrets. Can you figure out the answers to the two questions above on the right?

It’s worth noting that just as there are multiple ways that ordinary mathematicians can write a given number (2 ⋅ 5 = 8 + 2 = 10),  there are many different pictures for any given concept. If you need more assistance, you can take a look at this recently discovered drawing of a Klak-Adbmalian calculation proving that True Or False is True.

And one final hint: Recently we’ve received verification that the following two calculations are aways valid, no matter what Klak-Adbmalist pictures replace the red and green boxes:

Logic on Planet Zorko

A group of Zorkan mathematicians are sitting around having a conversation in a language that you are unfamiliar with. You are listening in with a translator. This translator is an expert in formal logic, and has decided to play the following game with you. He says:

“After listening to the full conversation, I will translate all the sentences that were said for you. But I won’t translate them into English; I want something more universal. Instead, I will choose a formal language that captures the mathematical content of all the sentences said, while leaving out the vagaries and subtleties of the Zorkan language. I will describe to you the semantics of the formal language I choose, if you don’t already know it.”

“Furthermore,” (says the translator) “I happen to be intimately familiar with Zorkan society and culture. The Zorkans are having a discussion about one particular mathematical structure, and I know which one that is. The mathematicians are all fantastically precise reasoners, such that none of them ever says a sentence that is false of the structure that they are discussing.”

(So for instance if they are talking about the natural numbers, then no mathematician will say “0 = 1”, and if they are talking about abelian groups, then no mathematician will say “∃x∃y (xy ≠ yx)”. But they could say “∃x∃y (xy ≠ yx)” if they are talking about non-abelian groups.)

You know nothing about Zorkan psychology, besides that the Zorkan way of life is so utterly foreign to you that you cannot reliably assume that the mathematical structures that come most naturally to you will also come naturally to them. It might be, for instance, that nonstandard models of arithmetic are much more intuitive to them than the natural numbers. You cannot assume that the structure they are discussing is the one that you think is “most natural”; you can only conclude this if one of them says a sentence that is true of that model and no others.

The conversation finishes, and you are tasked with answering the following two questions:

(1) What structure are they talking about?
(2) Can you come up with a verification procedure for the mathematicians’ sentences (including possible future sentences they might say on the topic)?

So, that’s the setup. Now, the question I want you to consider is the following: Suppose that the structure that the mathematicians have in mind is actually the natural numbers. Is there some conversation, any conversation at all (even allowing infinitely long conversations, and uncomputable conversations – conversations which cannot be produced as the output of any Turing machine), that the mathematicians could have, and some translation of this conversation, such that you can successfully answer both (1) and (2)? If so, what is that conversation? And if not, then why not?


Let’s work out some simple examples.

Example 1

Suppose the conversation is translated into a propositional language with three atomic propositions {P, Q, R}.

Mathematician A: “P ∨ Q”
Mathematician B: “(Q ∨ R) → (¬P)”
Mathematician C: “R”

From this conversation, you can deduce that the model they are talking about is the one that assigns “False” to P, “True” to Q, and “True” to R.

M: {P is false, Q is true, R is true}

This is the answer to the question 1!

As for the second question, we want to know if there’s some general procedure that produces all the future statements the mathematicians could make. For instance, the set generated by our procedure should include (Q ∧ R) but not (Q ∧ P).

It turns out that such a procedure does exist, and is not too difficult to write out and implement.

Example 2

Take the above conversation and modify it slightly:

Mathematician A: “P ∨ Q”
Mathematician B: “(Q ∨ R) → (¬P)”
Mathematician C: “¬R”

If you work it out, you’ll see that question 1 can no longer be answered unambiguously. The problem is that there are multiple models of the sentences that the mathematicians are saying:

M1: {P is false, Q is true, R is false}
M2: {P is true, Q is false, R is false}

So even though they have one particular structure in mind, you don’t have enough information from their conversation to figure out exactly what that structure is.

Now let’s think about the answer to question 2. We don’t know whether the mathematicians are thinking about M1 or M2, and M1 and M2 differ in what truth value they assign the proposition P. So we can’t construct an algorithm that will generate the set of all their possible future statements, as this would require us to know, in particular, whether P is true or false in the model that they have in mind.

We might suspect that this holds true generally: if you can’t answer question 1, then you won’t be able to answer question 2 either. But we might also wonder: if we can answer question 1, then can we also always answer question 2?

The answer is no, as the next example will show.

Example 3

For this conversation, the translation is in second-order logic. This will allow us to talk about more interesting mathematical structures than before; namely, structures that have a domain of objects on which functions and predicates can act. In particular, we’re in a second-order language with one constant symbol “c” and one function symbol “f”. Here’s the conversation:

Mathematician A: ¬∃x (f(x) = c)
Mathematician B: ¬∃x∃y ((f(x) = f(y)) ∧ ¬(x = y))
Mathematician C: ∀R (R(c) ∧ ∀x(R(x) → R(f(x))) → ∀x R(x))

Notice that the only truly second-order sentence is the third one, in which we quantify over a predicate variable R rather than an individual variable x, y, z, …. But the second-order status of this sentence it makes it that the translator could not have possibly translated this conversation into a first-order language, much less a propositional language.

This time, questions 1 and 2 are much harder to answer than before. But if you work it out, you’ll see that there is exactly one mathematical structure that satisfies all three of the mathematicians’ statements. And that structure is the natural numbers!

So, we know exactly what structure the mathematicians have in mind. But can we also answer question 2 in the positive? Can we produce some verification procedure that will allow us to generate all the future possible sentences the mathematicians could say? Unfortunately, the answer is no. There is no sound and complete proof system for second-order logic, so in particular, we have no general algorithm for producing all the truths in this second order language. So sad.

Example 4

Now let’s move to first-order logic for our final example. The language of translation will be a first order language with a constant symbol for every natural number {0,1,2,3,…}, function symbols for ordinary arithmetic {+, ×}, and relation symbols for orders {≥}

Imagine that the conversation consists of literally all the first-order sentences in the language that are true of the natural numbers. Anything which you can say in the language, and which is true as a statement about ℕ, will be said at some point. This will obviously be a very long conversation, and in fact infinitely long, but that’s fine. It will include sentences like “0 ≠ 1”, “0 ≠ 2”, “0 ≠ 3”, and so on.  (These Zorkans are extremely thorough.)

Given this conversation, can we answer (1) and (2)? Take a guess; the answer may surprise you!

It turns out that even though we can answer (2) positively – we can actually produce an algorithm that will generate one-by-one all the possible future statements of the mathematicians (which really means all the sentences in the language that are true of the natural numbers), we cannot answer (1) positively! There are multiple distinct mathematical structures that are compatible with the entirety of true statements about natural numbers in the language. Earlier we hypothesized that any time we have a negative answer to (1), we will also have a negative answer to (2). But this is not true! We can verify all the true statements about natural numbers in the language… without even knowing that we’re actually talking about the natural numbers! This is an important and unintuitive consequence of the expressive limitations (and in particular, of the compactness) of first-order logic.

The Takeaway

We had an example where we could answer both (1) and (2) for a simple mathematical structure (a model of propositional logic). And we saw examples for natural numbers where we could answer (1) but not (2), as well as examples where we could answer (2) but not (1). But we haven’t yet seen an example for natural numbers where we had both (1) and (2). This is no coincidence!

It is actually a consequence of the theorem I proved and discussed in my last post that no such such conversation can exist. When structures at least as complicated as the natural numbers are being discussed in some language (call it L), you cannot simultaneously (1) know for sure what structure is being talked about and (2) have an algorithmic verification system for L-sentences about the structure.