Divisibility Tricks and Orderly Numbers

A fun exercise is to think about the divisibility tricks you learned in grade school and generalize them to different bases. In base 10 we have:

N is divisible by 2 if its last digit is divisible by 2
N is divisible by 3 if the sum of its digits is divisible by 3
N is divisible by 4 if its last two digits are divisible by 4
N is divisible by 5 if its last digit is 0 or 5
N is divisible by 6 if N is divisible by 2 and 3
N is divisible by 7 if N = Xy for a single digit y and X – 2y is divisible by 7
N is divisible by 8 if its last three digits are divisible by 8
N is divisible by 9 if the sum of its digits is divisible by 9
N is divisible by 10 if its last digit is 0
N is divisible by 11 if the alternating sum of its digits is divisible by 11

There are a few natural categories of divisibility rules here. Divisibility by 3 and 9 are both tested by taking a sum of the digits of N. Divisibility by 11 is checked with an alternating sum. Divisibility by 2, 4, 5, and 8 is tested by looking at the last few digits of N. Divisibility by 6 is a combination of two other divisibility rules. And divisibility by 7 is in its own category.

Sum Test
To test divisibility of N by k, check if the sum of digits of N is divisible by k.

In base B, this test can be used whenever k divides B – 1. For example, in base B = 10, we use the sum test only to test divisibility by numbers that divide 9, i.e. 3 and 9.

Alternating Sum Test
To test divisibility of N by k, check if the alternating sum of digits of N is divisible by k.

In base B, this test can be used only if k divides B+1. For example, in base B = 10, we use the alternating sum test only to test divisibility by numbers that divide 11, i.e. 11.

Ending Test
To test divisibility of N by k, check if the number formed by the last m digits of N is divisible by k, for a suitable m.

In base B, this test can be used whenever k divides some power of B; for instance, whenever k divides B or is a power of some number that divides B. For example, in base B = 10, we use the ending test only to test divisibility by numbers that divide 10, i.e. 2, 5, and 10, and powers of those numbers, e.g. 4, 8, 16, and 25.

Combination Tests
In any base B, to test divisibility of N by k = ij, where i and j are co-prime, it suffices to check divisibility by i and divisibility by j.

So for instance, divisibility by 6 can be checked by performing the tests for 2 and 3, but divisibility by 12 can’t be checked by performing the tests for 2 and 6, since 2 and 6 aren’t co-prime.

Other Tests
The test for divisibility by 7 in base 10 doesn’t fit into any of the previous categories.

We’ll only be considering the first four categories.
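To make the first three categories concrete, here’s a minimal Python sketch of what the tests look like in an arbitrary base B. The function names are my own, and the ending test is implemented via the condition that k divides B^m, which covers the divides-B and power-of-a-divisor cases described above; combination tests just amount to running these on co-prime factors.

```python
def digits(N, B):
    """Digits of N in base B, least significant digit first."""
    ds = []
    while N:
        N, d = divmod(N, B)
        ds.append(d)
    return ds or [0]

def sum_test(N, k, B):
    """Valid whenever k divides B - 1."""
    assert (B - 1) % k == 0
    return sum(digits(N, B)) % k == 0

def alternating_sum_test(N, k, B):
    """Valid whenever k divides B + 1."""
    assert (B + 1) % k == 0
    alt = sum(d if i % 2 == 0 else -d for i, d in enumerate(digits(N, B)))
    return alt % k == 0

def ending_test(N, k, B, m):
    """Valid whenever k divides B**m: look only at the last m base-B digits."""
    assert B**m % k == 0
    return (N % B**m) % k == 0

# Base 10 sanity checks:
print(sum_test(123456, 3, 10))            # True: 1+2+3+4+5+6 = 21
print(alternating_sum_test(121, 11, 10))  # True: 1 - 2 + 1 = 0
print(ending_test(1016, 8, 10, 3))        # True: 016 is divisible by 8
# And in base 16, where B - 1 = 15, the sum test works for 3 and 5:
print(sum_test(0x2D, 5, 16))              # True: 0x2D = 45, hex digit sum 2 + 13 = 15
```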

There’s a nice way to see why these tests generalize in the way that they do. Let’s think about the sum test first.

Sum Test

We reason it through by induction. Suppose that we’re in base B, and checking divisibility by K < B. Our base case is that the sum of the digits of K is just K, which is divisible by K. For our inductive case, assume that every number n less than N is divisible by K if the digits of n add up to something divisible by K. Now, N = (N-K) + K, and (N-K) < N, so (N-K) satisfies the inductive hypothesis. This means that we just need to show that if we start with a number whose digits add to a multiple of K, and add K to this number, the sum of the digits of the resulting number will also be a multiple of K.

Break this into a few cases. First, suppose adding K only affects the last digit. Then the last digit increases by K, so the sum of digits increases by K, and is thus still divisible by K.

What if adding K pushes the last digit to B or beyond? Then we subtract B from the last digit, and add 1 to the second-to-last digit. So the net change in the digit sum is +1 + (K – B) = K – (B – 1).

But adding 1 to the second-to-last digit might also push that digit up to B. In this case, we decrease the second-to-last digit by B and add 1 to the third-to-last digit. The net change here is -(B – 1).

If this causes the third-to-last digit to reach B, then the same thing happens, and the net change in the sum of digits from those two is another -(B – 1).

In sum, this tells us that by adding K, the change in the digit sum is one of the following:

+K
+K – (B – 1)
+K – 2(B – 1)
+K – 3(B – 1)

In general, the change in the digit sum is +K – n(B – 1) for some n. If we now assume that (B-1) is a multiple of K, then the change in the digit-sum is also a multiple of K. And thus the resulting digit sum will be a multiple of K as well!

So for any K such that B – 1 is a multiple of K, if the digit sum of N is a multiple of K, then N is a multiple of K!

Alternating Sum Test

We can reason it through similarly for the alternating sum test. The possible changes to digits are:

+K
+1, +K – B
+1, +1 – B, +K – B
+1, +1 – B, +1 – B, +K – B
+1, +1 – B, +1 – B, +1 – B, +K – B

The alternating sum of this pattern, starting from the far right, gives:

K
(K – B) – 1
(K – B) – (1 – B) + 1
(K – B) – (1 – B) + (1 – B) – 1
(K – B) – (1 – B) + (1 – B) – (1 – B) + 1

The result of this sum is always either K or K – (B + 1). So as long as B+1 is a multiple of K, then the alternating sum will also change by a multiple of K!

Thus for any K such that B+1 is a multiple of K, if the alternating sum of the digits of N is a multiple of K, then N is a multiple of K.

If we disregard the “Other Tests” category, we can quite easily count, for any given base, how many numbers will have sum, alternating sum, ending, or combination tests. People sometimes say that highly composite numbers would be ideal choices for numerical base systems, but those are really only good for ending tests, and not necessarily for sum or alternating sum tests.

I’ll define the notion of the orderliness of N as the number of integers n with 2 ≤ n < N that have a divisibility rule in base N, restricting divisibility rules to sum tests, alternating sum tests, ending tests, and combination tests.
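Here’s a small Python sketch of one way to operationalize this count. I’m reading the rules as: n has a rule in base N exactly when each prime-power factor of n admits a sum test, an alternating sum test, or an ending test, with combination tests then coming for free from the co-prime prime-power pieces. Exact counts may differ from the plots and list below depending on how the definitions are operationalized, so treat this as a sketch of the idea rather than the canonical computation.

```python
def prime_power_factors(n):
    """The prime-power factorization of n, as (prime, prime**exponent) pairs."""
    factors, p = [], 2
    while p * p <= n:
        if n % p == 0:
            q = 1
            while n % p == 0:
                n //= p
                q *= p
            factors.append((p, q))
        p += 1
    if n > 1:
        factors.append((n, n))
    return factors

def has_rule(n, base):
    """Does n have a sum, alternating sum, ending, or combination test in this base?"""
    return all(
        (base - 1) % q == 0     # sum test for the prime-power factor q
        or (base + 1) % q == 0  # alternating sum test
        or base % p == 0        # ending test, since q divides base**exponent
        for p, q in prime_power_factors(n)
    )

def orderliness(base):
    return sum(has_rule(n, base) for n in range(2, base))

# Orderliness of bases 3 through 12 under this reading:
print([orderliness(b) for b in range(3, 13)])
```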

Here’s a plot of the orderliness of bases from 3 to 100:

And from 3 to 1000:

The jaggedness of this graph suggests that some numbers make much better bases than their neighbors. By analogy to the notion of highly composite numbers, I’ll define highly ordered numbers as those whose orderliness is larger than all smaller numbers. These are numbers that would be ideal choices for numerical bases. The first few highly ordered numbers are:

3, 4, 5, 6, 8, 9, 10, 11, 14, 19, 20, 21, 29, 34, 49, 50, 55, 56, 69, 76, 90, 91, 99

Notice that 12 isn’t on this list, despite being highly composite! You might be able to guess why: though 12 has lots of factors, its neighbors 11 and 13 do not. On the other hand, 11 doesn’t have a lot of factors, but its neighbors 10 and 12 do, so it lands a spot on the list.

We could also measure orderliness by what percentage of smaller numbers have rules for them. Here’s what that looks like:

What’s the probability that an election winner leads throughout the entire vote?

Try to answer the question in the title for yourself. An election has n votes for candidate A, m votes for candidate B, and no third candidate. Candidate A wins, so n > m. Assuming that the votes were equally likely to be counted in any order, what’s the probability that candidate A was always ahead? (Note that “always ahead” precludes the vote being tied at any point.)

My friend showed me this puzzle a couple of days ago and told me that there’s an extremely simple way to solve it. After trying a bunch of complicated things (including some wild Pascal’s triangle variants), I eventually relented and asked for the simple solution. Not only is there a short proof of the answer, but the answer is itself incredibly simple and elegant. See if you can do better than I did!

(…)

(…)

(…)

(Read on only after you’ve attempted the problem!)

(…)

(…)

(…)

So, here’s the solution. We start by considering the opposite of A always being ahead, which is that A is either behind or that the vote is tied at some point. Since we know that A eventually wins, A being behind at some point implies that at a later point A and B are tied. So the opposite of A always being ahead is really just that there is a tie at some point.

Pr(A is always ahead | n votes for A, m votes for B)
= 1 – Pr(A is behind at some point or tied at some point | n, m)
= 1 – Pr(There’s a tie at some point | n, m)

Now, let’s consider the probability of a tie at some point. There are two ways for this to happen: either the first vote is for A or the first vote is for B. The first vote being for B entails that there must be a tie at some point, since A must eventually pull ahead to win. This allows us to say the following:

Pr(Tie at some point | n, m)
= Pr(Tie at some point & A is first vote | n, m) + Pr(Tie at some point & B is first vote | n, m)
= Pr(Tie at some point & A is first vote | n, m) + Pr(B is first vote | n, m)
= Pr(Tie at some point & A is first vote | n, m) + m/(n+m)

Now, the final task is to figure out Pr(Tie at some point & A is first vote | n, m). If you haven’t yet figured it out, I encourage you to pause for a minute and think it over.

(…)

(…)

Alright, so here’s the trick. If there’s a tie at some point, then up to and including the first tie there are an equal number of votes for A and B. Swapping the A-votes and B-votes up to that point gives a one-to-one correspondence between vote orderings with a tie whose first vote is for A and vote orderings with a tie whose first vote is for B. And this means that we can say the following:

Pr(Tie at some point & A is first vote | n, m)
= Pr(Tie at some point & B is first vote | n, m)

And this is exactly the probability we’ve already solved!

Pr(Tie at some point & B is first vote | n, m)
= Pr(B is first vote | n, m)
= m/(n+m)

And now we’re basically done!

Pr(Tie at some point | n, m)
= m/(n+m) + m/(n+m)
= 2m/(n+m)

Pr(A is always ahead | n, m)
= 1 – Pr(Tie at some point | n, m)
= 1 – 2m/(n+m)
= (n – m) / (n + m)

And there we have it: The probability that A is always ahead is just the difference in votes over the total number of votes! Beautiful, right?

We can even more elegantly express this as simply the percent of people that voted for A minus the percent that voted for B.

(n – m) / (n + m)
= n/(n+m) – m/(n+m)
= %A – %B

This tells us that even in the case where candidate A gets 75% of the vote, there’s still a 50/50 chance that the race is tied or A falls behind at some point!
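If you want to sanity-check the formula numerically, here’s a quick Monte Carlo sketch; the function name and vote counts are purely illustrative.

```python
import random

def always_ahead_probability(n, m, trials=100_000):
    """Estimate P(A is strictly ahead throughout the count) by shuffling ballots."""
    ballots = [1] * n + [-1] * m          # +1 for a vote for A, -1 for B
    wins = 0
    for _ in range(trials):
        random.shuffle(ballots)
        lead = 0
        for v in ballots:
            lead += v
            if lead <= 0:                 # tied or behind at some point
                break
        else:
            wins += 1
    return wins / trials

# A gets 75% of 40 votes: the formula predicts (30 - 10) / 40 = 0.5
print(always_ahead_probability(30, 10))   # should come out near 0.5
```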

An example of this: the recent election had Biden with 81,283,485 votes and Trump with 74,223,744 votes. Imagining that there were no third candidates, this would mean that Biden had 52.27% of the popular vote and Trump had 47.73%. And if we now pretend that the votes were equally likely to be counted in any order, then this tells us that there would only be a 4.54% chance that Biden would be ahead the entire time! Taking into account the composition of mail-in ballots, which were counted later, this means that Trump having an early lead was in fact exactly what we should have expected. The chance that Biden would have been tied or behind at some point was likely quite a bit higher than 95.5%!

Quantum Chess

Recently I’ve been thinking about both quantum mechanics and chess, and came up with a chess variant that blends the two. The difference between ordinary chess and quantum chess is the ability to put your pieces in superpositions. Here’s how it works!

Movement

There are five modes of movement you can choose between in quantum chess: Move, Split, Merge, Collapse, and Branchmove.

Modes 1 and 2 are for non-superposed pieces, and modes 3, 4, and 5 are for superposed pieces.

Mode 1: Move

This mode allows you to move just as you would in an ordinary chess game.

Mode 2: Split

In the split mode, you can choose a non-superposed piece and split it between two positions. You can’t choose a position that is occupied, even by a superposed piece, meaning that splitting moves can never be attacks.

One powerful strategy is to castle into a superposition, bringing out both rooks and forcing your opponent to gamble on which side of the board to stage an attack on.

Mode 3: Merge

In mode 3, you can merge the two branches of one of your superposed pieces, recombining them onto a square that’s accessible from both branches.

You can’t merge to a position that’s only accessible from one of the two branches, and you can’t merge onto a square that’s occupied by one of your own superposed pieces, but merge moves can be attacks.

Mode 4: Collapse

Mode 4 is the riskiest mode. In this mode, you choose one of your superposed pieces and collapse it. There are two possible outcomes: First, it might collapse to the position you clicked on. In this case, you now have a choice to either perform an ordinary move…

… or to split it into a new superposition.

But if you get unlucky, then it will collapse to the position you didn’t select. In this case, your turn is over and it goes to your opponent.

Mode 5: Branch Move

Finally, in a branch move, you relocate just one branch of the superposition, without collapsing the wave-function or affecting the other branch.
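To make the five modes concrete, here’s one possible bare-bones representation in Python. The class name, method names, and square encoding are my own, not part of any official implementation; it only tracks where a piece might be, using the 50/50 collapse rule described above.

```python
import random

class QuantumPiece:
    def __init__(self, kind, square):
        self.kind = kind
        self.squares = [square]          # one square: classical; two squares: superposed

    def is_superposed(self):
        return len(self.squares) == 2

    # Mode 1: ordinary move (non-superposed pieces only).
    def move(self, target):
        assert not self.is_superposed()
        self.squares = [target]

    # Mode 2: split a classical piece between two unoccupied squares.
    def split(self, target_a, target_b):
        assert not self.is_superposed()
        self.squares = [target_a, target_b]

    # Mode 3: merge both branches onto a square reachable from each branch.
    def merge(self, target):
        assert self.is_superposed()
        self.squares = [target]

    # Mode 4: collapse; lands on the chosen square with probability 1/2.
    def collapse(self, chosen):
        assert self.is_superposed() and chosen in self.squares
        outcome = random.choice(self.squares)
        self.squares = [outcome]
        return outcome == chosen         # True means you get a follow-up move

    # Mode 5: branch move; relocate one branch without collapsing the other.
    def branch_move(self, branch, target):
        assert self.is_superposed()
        self.squares[self.squares.index(branch)] = target

# Illustrative usage: a rook "castled" into superposition.
rook = QuantumPiece("rook", "a1")
rook.split("d1", "f1")
print(rook.squares)                      # ['d1', 'f1']
print(rook.collapse("d1"))               # True or False, 50/50
```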

Piece Interactions

Attacking a Superposed Piece

What happens if you attack a superposed piece? The result is that the superposed piece collapses. If the piece collapses onto the square you attacked, then you capture it.

But if it collapses onto the other branch of the superposition, then it is safe, and your piece moves harmlessly into the square you just attacked.

This means that attacking a superposed piece is risky! It’s possible for your attack to backfire, resulting in the attacker being captured next turn by the same piece it attacked.

It’s also possible for a pawn to move diagonally without taking a piece, in a failed attack.

Line of Sight

Superposed pieces block the lines of sight of both your pieces and your opponent’s pieces. This allows you to defend your pieces or block open files, without fully committing to defensive positions.

Winning the Game

To win quantum chess, you must actually take the opponent’s king. Let’s see why it’s not enough to just get the opponent into a position that would ordinarily be checkmate:

It’s blue’s turn now, and things look pretty lost. But look what blue can do:

Now the red queen has to choose one of the two targets to attack, and there’s a 50% chance that she gets it wrong, in which case the blue king can freely take the red queen, turning a sure loss into a draw!

So how can red get a guaranteed win? It takes patience. Rather than trying to attack one of the two squares, the red queen can hang back and wait for a turn.

Now the blue king has two choices: collapse out of superposition, after which it can be taken wherever it ends up, or move one branch of the superposition, in which case any possible branch move leaves the queen with a safe shot at the king. This can be repeated until the king is taken.

And that’s quantum chess! I’ve played several games with friends, and each time have noticed interesting and surprising strategies arising. Let me leave you with a quantum chess puzzle. Here’s a position I found myself in, playing red:

What do you think was my best move here?

Biblical inerrancy

A puzzling phenomenon is the existence of Biblical inerrantists. It seems to me to be impossible to have both (1) carefully read the Bible and (2) come to the conclusion that it’s inerrant. Bart Ehrman talks about a possible explanation for this phenomenon in the order in which people read the Gospels: if you read the Gospels all the way through, one at a time, rather than reading them simultaneously, side by side, then it’s much easier for you to fail to notice all of the discrepancies. And boy are there discrepancies!

Virtually no story that appears in multiple Gospels is identical in the different tellings. That’s not hyperbole; literally from the story of Jesus’s birth all the way to his crucifixion and resurrection, there are unambiguous contradictions to be found the entire way. I don’t think that these contradictions make it rationally impossible to be a Christian, but they certainly do make it rationally impossible to be a Christian of an inerrantist tradition. And more liberal Christians face a serious challenge of their own: how do they justify placing such enormous stock in the wording of a text that is known to be error-ridden?

There are just too many examples of blatant contradictions to go through them all. It’s a remarkable fact that many Christians that have read these stories throughout their lives are completely unaware that they disagree with one another! What I want to do here is just to pick one of the most well-known stories, the empty tomb. As I go through the story as each Gospel tells it, if at any moment you feel skeptical of what I’m saying, just go look at the verses being cited for yourself! The source of all the following quotes is the New International Version (NIV).

I’ve copied the entire “empty tomb” story as it’s told in each of the Gospels and highlighted the differences.

Now, let’s test your reading comprehension! In the story of the empty tomb, how many women come to the tomb, one, two, or three? Depends on who you read! According to Mark, it was three (the two Marys and Salome). According to Matthew, it’s two (the two Marys). According to John it’s just Mary Magdalene. And according to Luke it’s some unspecified number more than 1.

How many entities do they encounter at the tomb, and are they ordinary men or angels? According to Mark, they see one young man already inside the tomb. Luke says that after they enter, two men suddenly appear beside them. Matthew describes a violent earthquake preceding the arrival of an angel from heaven, while still outside the tomb. And John describes two angels inside the tomb (seen by Mary from the outside of the tomb). What’s more, in John and Matthew the woman/women see and talk to Jesus at the tomb! You have to agree that this is a pretty noteworthy thing for Mark and Luke to leave out.

Ok, how about the stone in front of the entrance? When the women/woman arrive(s), is the stone already rolled away from the tomb (as in Mark, Luke, and John), or is it moved later (as in Matthew)?

When the one/two men/angels speak to the woman/women, do they say that Jesus will meet the disciples in Galilee? In Mark and Matthew, yes. But in Luke and John there’s no mention of this! And in fact, in Luke the disciples don’t go to Galilee to meet Jesus at all! Jesus appears to the disciples in Jerusalem and tells them to stay there, which they do until the end of the gospel (Luke 24:51-53). He ascends right outside of Jerusalem (Acts 1:9, 12).

When the woman/women leave the tomb, do they describe what they saw to the disciples or not? According to Luke, Matthew, and John, yes. But not according to Mark; in Mark, the women flee and in their fear “say nothing to anyone”!

Did Peter ever visit the tomb? Not in Matthew or Mark, but in Luke and John he does. Is he by himself or with another disciple? Luke says he’s by himself, John describes another disciple with him.

So much for the empty tomb! This level of contradiction is not special to this story. Think about Jesus’s death. When did he die? This is one of the most blatant contradictions in the Bible, because both John and Mark take great pains to explicitly lay out their chronology. According to Mark, Jesus and his disciples have their last supper on the evening of the Passover (Mark 14:12-17), and the following morning he is taken to be crucified (Mark 15:1). In John there is no last supper! John explicitly states in John 19:14 that Jesus is taken away for crucifixion on “the day of Preparation of the Passover”, that is, the day before Passover!

This business about the last supper is actually pretty interesting; in John there is no last supper, but the author still manages to fit in some of the dinner table discussion early in the narrative. In Matthew, Mark, and Luke, it is during the last supper that Jesus says that the bread is his body (Mark 14:22, Matthew 26:26, Luke 22:19) and the wine his blood (Mark 14:24, Matthew 26:28, Luke 22:20). In John, these things are said some 12 chapters before his arrest and crucifixion (John 6:32-58). The context is ENTIRELY different within John; there, it comes up after the miracle where he multiplies the loaves of bread and fish. His disciples talk to him about this miracle, and he responds with the famous line “I am the bread of life”. This upsets some of the Jews around him, and starts an argument about how Jesus can BE the bread, in response to which Jesus doubles down and says “I am the living bread that came down from heaven” and “Very truly I tell you, unless you eat the flesh of the Son of Man and drink his blood, you have no life in you.” Thus the origin of this doctrine. Given the importance that Catholics place on the wording here, it should be disturbing to them that there’s such dramatic disagreement between the Gospels on the context in which he said it!

Let’s now quickly run through some more Biblical contradictions. Who was the father of Joseph, Jesus’s father? According to Matthew 1:16, “Jacob begat Joseph, the husband of Mary”, whereas according to Luke 3:23, Joseph is said to be “the son of Heli”. The genealogies presented in Matthew and Luke are virtually in complete disagreement starting two generations up from Jesus. Apologists will often argue that one of the two is presenting the maternal lineage rather than the paternal line, but this is far from obvious when you look at the wording, which is specifically about Joseph’s father not Mary’s (plus the fact that in both genealogies, the entire rest of the list follows only the paternal line.)

In Mark 5:21-24, Jairus comes to Jesus before his daughter dies and asks him to heal her (“My little daughter is dying. Please come and put your hands on her so that she will be healed and live”), but in Matthew 9:18-20 the daughter has already died by the time Jairus comes to Jesus (“My daughter has just died. But come and put your hand on her, and she will live”).

In Mark 15:37-39, the curtain that separates the holy of holies from the rest of the temple rips in two after Jesus dies. And in Luke 23:45-46, it rips before.

In Matthew 2:1-23, Joseph, Mary, and Jesus flee to Egypt (250 miles away) after Jesus’s birth, where they stay until King Herod dies, after which they resettle in Nazareth. But in Luke 2:1-40, Joseph and the fam do their rites of purification in Bethlehem after the birth and return to Nazareth directly, 41 days after Jesus’s birth.

Also in Luke 2, it is described that Joseph and Mary travel from Galilee to Bethlehem for a census declared by decree by Caesar Augustus to be “taken of the entire Roman world.” The problem with this is that we have good historical records of Caesar Augustus, and no such census took place!

One final one: in Mark 2:25-26, Jesus references an Old Testament passage about David unlawfully eating consecrated bread “in the days of Abiathar the high priest.” There’s a big problem with this: Jesus made a mistake! In the Old Testament passage, Abiathar wasn’t the high priest! The high priest was Ahimelech, whose son Abiathar would much later become high priest (1 Sam 21:1-7). So Christians have a choice to make between either Jesus not knowing his Old Testament or Mark not being an inerrant recording of Jesus’s sayings.

✯✯✯

All these contradictions are begging for an explanation. Is one or more of the authors lying? Not necessarily. Lying implies intention, and it’s worth keeping in mind the timeline of the Bible. Jesus is purported to have lived from 0 to 30 AD. Scholars unanimously agree that the earliest of the Gospels is Mark, and that it was originally written around 70 AD. Next were Matthew and Luke, both around 80-85 AD, and finally came John, around 90-95 AD. That’s a gap of 40 to 65 years from the events that are described! What’s more, the authors of Mark, Matthew, and John were almost certainly not the actual historical Mark, Matthew, and John (for a bunch of reasons I won’t get into now, most obviously that these texts were written in Greek by highly educated individuals and all three of these individuals were uneducated and would not have known Greek). And of course, Luke wasn’t a disciple and never met Jesus personally.

So the first texts that are written are from non-eyewitnesses recording an oral tradition that had started forty to sixty-five years before! In a forty-year game of telephone, nobody needs to have lied in order for the original story to become warped beyond recognition. Anybody that doubts that stories can become so dramatically altered over time need only think about the many Trump supporters that to this day insist that Trump’s inauguration had more attendees than Obama’s, despite LITERAL TIME LAPSE FOOTAGE of the entire thing and photographs all throughout. In one poll, Trump and Clinton voters were handed these two photos, one from Obama’s inauguration and the other from Trump’s:

And guess what? 15% of Trump voters said that the left photo has more people! Suffice it to say, in the presence of emotionally charged topics like religion and politics, human brains start to act funny. Put this in context: this is an event from four years ago that we have video records of. And it’s somehow supposed to be unimaginable that 40 years of word-of-mouth transmission by religious believers made any significant changes to the original stories?

It’s even worse than this. These first texts are not what our modern Gospels are based off of, simply because we don’t have any copies of these first texts! The first copies of the texts that we possess come over A HUNDRED YEARS LATER, meaning that we have more than a hundred years of scribes making copies of copies of copies of the original texts. We know for a fact that these scribes were not perfect copyists, from the thousands of copies of the Gospels we possess, which abound in mistakes ranging from small spelling errors to entire missing stories or new stories that had never appeared before. I’m sure that you know the story of the adulteress whom Jesus defends, saying “he that is without sin among you, let him first cast a stone at her.” Did you know that this story appears in none of our earliest copies of the Gospels? Scholars unanimously agree that this story was added by scribes hundreds of years after the original writing, both because it literally doesn’t appear in copies earlier than that and also because the writing style is different from the rest of John.

So it seems to me like there really is no mystery here once you learn about the actual history of the texts. There are contradictions in the Bible because the Bible is an extremely imperfect copy of a copy of a copy of a … copy of a text written by non-eyewitnesses that heard stories told by people who had heard stories told by… people who had heard stories told by eyewitnesses to the events.

Formal Semantics 1: Historical Prelude and Compositionality

English is really complicated. For a long time, logicians looking at natural languages believed that there could be no formal system detailing their grammar and semantics. They resigned themselves to extremely simple idealized fragments of English, like propositional logic (formalizing “and”, “not”, and “or”) and first-order logic (formalizing “every”, “some”, and “is”).

The slogan of the time was “ordinary language has no logic” (Bertrand Russell and Peter Strawson). Chomsky famously argued that the languages invented by logicians were too artificial and entirely unlike natural languages, and that therefore the methods of logicians simply couldn’t be applied to this more complex realm.

This attitude has changed over time. Perhaps the most important figure in the “logic of natural language” movement is Richard Montague, a student of the giant of logic Alfred Tarski. The first line of his paper English as a Formal Language reads “I reject the contention that an important theoretical difference exists between formal and natural languages”, and he follows this up by more or less single-handedly inventing formal semantics, now a thriving field. Hilariously, Montague apparently saw this work as child’s play, writing:

I (…) sat down one day and proceeded to do something that I previously regarded, and continue to regard, as both rather easy and not very important — that is, to analyze ordinary language.

(This had to hit hard for linguists of his time.)

Alright, enough prologue. In the next few posts I want to describe a naive first pass at formalizing a fairly substantial fragment of English, modeled off of Montague semantics. The key concept throughout will be the notion of compositionality, which I’ll briefly describe now.

Compositionality

Compositionality is all about how to construct the meaning of phrases from their smaller components. Take a sentence like “The cat sat on the mat.” The meaning of this sentence clearly has something to do with the meanings of “the cat” and “sat on the mat”. Similarly, the meaning of “sat on the mat” must have something to do with the meanings of “sat”, “on”, “the”, and “mat”.

The compositionality thesis says that this is all that determines the meaning of “the cat sat on the mat.” In other words, the meaning of any phrase is a function of the meanings of the individual words within it. These meanings are composed together in some way to form the meaning of the sentence as a whole.

The natural question that arises now is, what is the nature of this composition? Take a very simple example: “Epstein died.” According to compositionality, the meaning of “Epstein died” depends only on the meanings of “Epstein” and “died”. That seems pretty reasonable. What about: “Epstein died suspiciously”? How do we compose the meanings of the individual words when there are three?

One proposal is to compose all three simultaneously. That’s possible, but a simpler framework would have us build up the meanings of our sentences iteratively, composing two units of meaning at a time until we’ve generated the entire sentence’s meaning.

Let me now introduce some notation that allows us to say this compactly. If X is some word, phrase, or sentence, we’ll denote the meaning of X as ⟦X⟧. Then the principle of binary compositionality is just that there’s some function F such that ⟦X Y⟧ = F(⟦X⟧, ⟦Y⟧).

There are two major questions that arise at this point.

First, in which order should we compose our units of meaning? Should we combine “Epstein” with “died” first, and then combine that with “suspiciously”? Or should it be “Epstein” and “suspiciously” first, then that with “died”? Or should we combine “Epstein” with the combination of “suspiciously” and “died”?

One might suggest here that the order actually doesn’t matter; no matter what order we combine the meanings in, we should still get the same meaning. The problem with this is that “The Clintons killed Epstein” has a different meaning than “Epstein killed the Clintons.” If order of composition didn’t matter, then we’d expect these to mean the same thing.

Second, how exactly does composing two meanings work? Is there a single rule for composition, or are there multiple different rules that apply in different contexts? It would be most elegant if we could find a single universal rule for generating meanings of complicated phrases from simple ones, but maybe that’s overambitious.

For instance, you might model the meaning of “died” as a set of objects, namely all those objects that died at some moment in the past, and the meaning of “Epstein” as one particular object in the universe. Then we might have our composition rule be the following: ⟦Epstein died⟧ will be a truth value, and it will be True if and only if the object denoted by “Epstein” is within the set of objects denoted by “died”. So in this framework, ⟦X Y⟧ = True if and only if ⟦X⟧ ∈ ⟦Y⟧.

This works nicely for “Epstein died”. But what about “Epstein died suspiciously”? Now we have two compositions to do, and the order of composition will matter. The problem is that no matter how we compose things, it seems not to work. Suppose that we combine “died” and “suspiciously” first, then combine “Epstein” with that. Using our model, ⟦died suspiciously⟧ will be True if and only if ⟦died⟧ ∈ ⟦suspiciously⟧, which is already a little bit weird. But even worse, ⟦Epstein died suspiciously⟧ will be True if and only if ⟦Epstein⟧ ∈ ⟦died suspiciously⟧. But what would it mean for the object denoted by “Epstein” to be an element of a truth value? It looks like in this framework, most three-word sentences end up becoming vacuously false.

Anyway, the last two paragraphs only show us that one particular attempt to formalize composition fails to be universal. It doesn’t show that it’s impossible in general. In fact, we’ll end up doing pretty well with a small set of composition rules centered around function application. The idea can be very simply phrased as: ⟦X Y⟧ = ⟦X⟧(⟦Y⟧). And in particular, the meaning of “Epstein died suspiciously” will be ⟦suspiciously⟧(⟦died⟧)(⟦Epstein⟧). And that’s enough warm-up! Next we’ll explore this idea further and dive into our Montague-style system.
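As a tiny preview of how that can look in practice, here’s a toy Python sketch of the function-application idea. The particular denotations (a set of things that died, a set of suspicious deaths) are made up purely for illustration, not part of Montague’s system.

```python
from typing import Callable

Entity = str
Predicate = Callable[[Entity], bool]

# [[Epstein]]: an entity.
epstein: Entity = "epstein"

# [[died]]: a one-place predicate, i.e. a function from entities to truth values.
died_set = {"epstein", "socrates"}
def died(x: Entity) -> bool:
    return x in died_set

# [[suspiciously]]: a function from predicates to predicates.
# Here it simply intersects the predicate with a made-up set of suspicious deaths.
suspicious_deaths = {"epstein"}
def suspiciously(pred: Predicate) -> Predicate:
    return lambda x: pred(x) and x in suspicious_deaths

# [[Epstein died]] = [[died]]([[Epstein]])
print(died(epstein))                     # True

# [[Epstein died suspiciously]] = [[suspiciously]]([[died]])([[Epstein]])
print(suspiciously(died)(epstein))       # True
print(suspiciously(died)("socrates"))    # False
```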

Two more short and sweet proofs of propositional compactness

Proof 1 (countable language)

Let S be a countable set of sentences in a propositional language with atomic sentences p0, p1, p2, ….  Assume that S is finitely satisfiable. We want to build a truth assignment V that satisfies S.

We’ll assign truth values to the atomic sentences one at a time. Vn will be the partial truth assignment after assigning the first n truth values. If W agrees with Vn on its domain, then call W an extension of Vn.

We’ll now prove by induction that for each n, every finite subset of S is satisfied by some extension of Vn.

First of all, V0 is just the empty set, and every function is an extension of the empty set. So the hypothesis follows trivially from the finite satisfiability of S. 

Now, assume that every finite subset of S is satisfied by some extension of Vn. We’ll show that the same holds of Vn+1.

Suppose not. Then both of the following must be true:
(1) Some finite subset S0 of S is not satisfied by any extension of Vn ⋃ {(pn, T)}
(2) Some finite subset S1 of S is not satisfied by any extension of Vn ⋃ {(pn, F)}

S0 ⋃ S1 is not satisfied by any extension of Vn ⋃ {(pn, T)}, or any extension of Vn ⋃ {(pn, F)}. So it’s not satisfied by any extension of Vn, contradicting the inductive hypothesis since S0 ⋃ S1 is a finite set. This proves that every finite subset of S is satisfied by some extension of Vn, for each n.

Define V = ⋃ {Vn : n ∈ ω}. We now show that V satisfies S. Consider any sentence φ ∈ S. φ contains a finite number of atomic sentences, so there’s some n large enough that Vn assigns truth values to all sentence letters in φ. {φ} is a finite subset of S, so some extension of Vn satisfies it. Every extension of Vn agrees on the assignments to the atomic sentences that appear in φ. So since some extension of Vn makes φ true, all extensions of Vn must make φ true. In particular, V makes φ true.

Proof 2 (any language)

Suppose S is a finitely satisfiable set of sentences, and let L be the set of atomic sentences.

A partial function g: L → {T, F} is called good if each finite subset of S is satisfied by some extension of g.

The good partial functions are a poset under ⊆, and since S is finitely satisfiable, ∅ is good. Furthermore, the union of any chain of good functions is a good function. Thus every chain has an upper bound, namely its union.

By Zorn’s lemma, there’s a maximal good function g. Since g is maximal, the domain of g is all of L. (If some atomic sentence p were missing from the domain of g, then at least one of g ⋃ {(p, T)} and g ⋃ {(p, F)} would be good, by the same argument as in Proof 1, and would properly extend g.)

We now show that g satisfies every element of S.
… Suppose φ ∈ S.
… Since g is good, it has an extension satisfying φ.
… But g already has all of L as its domain, so g satisfies φ.

So S is satisfiable!

The Theory of Knots

Knots

A knot is a closed loop in three dimensional space that doesn’t intersect itself. Some examples:

Knot diagrams are what you just saw: two dimensional representations of the three dimensional objects. It should be obvious that any one knot has many different diagrams. This raises an immediate question: given two knot diagrams, do they represent the same knot?

We need to be clear on what we mean by “the same knot”, of course. The intuition here is fairly obvious: you can rotate, stretch, and wiggle any section of a knot without fundamentally changing it.

What you CANNOT do is cut and reconnect the knot or pass one section of a knot through another of its sections.

Looking back at our starting five example knots, ask yourself if any of them represent the same knot:

It might not be too obvious, but 2 and 4 represent the same knot, and so do 1 and 5! These equivalences (especially the equivalence between 1 and 5) should hopefully give a flavor for the difficulty of the general problem of determining equivalence between knots. While it’s sometimes possible to answer this by mere examination, it’s more often impossible. Even in cases where it seems very intuitively obvious (like, for instance, that the trefoil is not just an unknot in disguise), it’s hard to come up with a way to PROVE this fact.

When faced by a hard problem, we should begin thinking of the structure of a general solution. What we’d really like is a recipe for an algorithm that takes in two knot diagrams and decides whether they represent the same knots. (For economy of space and ease of reading, I’ll proceed to refer to “knot diagrams that represent equivalent knots” as “equivalent knot diagrams”.)

Here’s a naive first pass at a solution: Take either of the two knot diagrams you’re given. Start wiggling it around in all the allowed ways. At each step, check if you’ve obtained the other knot diagram. If you ever do, return True. Otherwise return False.

Maybe you see the problem here. In fact there are at least two major problems. (I said it was a naive first pass.)

The first big problem is in the “otherwise return False”. At what point in the algorithm do you actually reach this “otherwise” clause? Assuming that the two knot diagrams really do represent different knots, then you can keep manipulating the starting knot diagram in more and more complicated ways forever. At each stage you might produce knot diagrams you’ve never seen before, and at no stage are you justified in giving up and returning False. In other words, even if this algorithm worked as I described it, it could only ever decide equivalence of two equivalent knot diagrams. If handed two inequivalent knot diagrams, it would run forever. Thus this is really an algorithm for semi-deciding knot-diagram equivalence, not deciding it.

The second big problem is in the “start wiggling it around in all the allowed ways.” How would we describe the set of allowed transformations of a knot diagram to a computer? In particular, could we simplify down the allowed transformations to a finite set that would ensure that we really had done ALL possible manipulations? The answer to this is obviously no. Consider the following transformation:

Each transformation like this produces a new knot diagram, simply by wiggling one side in an arbitrary way. But how many possible wiggles are there? Clearly an uncountable infinity: you can do arbitrarily small disruptions centered on any point along the knot, leaving the rest undisturbed, and get a distinct knot diagram each time.

The “uncountable infinity” aspect of this second problem can actually be quickly resolved by sharpening our original question. Rather than regarding a knot diagram as any two-dimensional projection of a knot, we can think about a knot diagram as defined by its various crossings. For instance, we can label each uninterrupted “arc” in a knot diagram, as well as each crossing point.

We now produce an alternative description of knot diagrams, obtained by describing how each arc interacts with each crossing point.

Crucially, our new description is not affected by the uncountably many small wiggles that do not make any substantial changes to the knot diagram. I.e., they do not affect the way that the various arcs cross under or above each other. We call this “equivalence of knot diagrams up to planar isotopy”, where planar isotopy is meant to refer to changes in a knot that don’t affect the structure of the crossings.

That’s one helpful step towards fixing up our algorithm. However, we are still in need of a finite set of transformations that we are sure generates ALL possible transformations on knot diagrams. I encourage you to think about this problem for a moment, and come up with a set of transformations that you think suffices to generate every allowed manipulation of a knot diagram.

(…)

(…)

I’ll leave a little space for you to try to come up with an answer for yourself. It’s not an easy exercise, but might be fun to test your knot-theoretic intuitions on.

(…)

(…)

Reidemeister Moves

It turns out that a set of six very simple transformations suffices for this task, discovered by Kurt Reidemeister. There are three categories of transformations:

If two knot diagrams are equivalent, then there is a sequence of Reidemeister moves that transforms one to the other (up to planar isotopy).

It’s pretty fun to try to construct such sequences for oneself, and in doing so you’ll develop an intuition for why repeated Reidemeister moves suffice to generate all transformations. Try the earlier example of the two diagrams for the trefoil! I can do it in ten moves, can you do better?

So, this is great news! Our naive first attempt at building an algorithm to determine equivalence can be implemented (though it still only semi-decides equivalence rather than deciding it). Choose one of the diagrams, and run through all finite sequences of Reidemeister moves in any order. At each stage we check if we get the other diagram by application of these moves, and if so return True.

So with the help of the Reidemeister moves, we’ve successfully semidecided the equivalence problem. But it turns out that we can do better. Reidemeister moves not only allow us to determine that two diagrams are equivalent, but they lead us to a powerful tool to determine that two diagrams are inequivalent. The key concept here is invariants.

Invariants

Imagine that we found some quantity Q, calculated from a knot diagram, such that no matter how we alter the knot diagram, that quantity never changes. Now if we’re given two knot diagrams D1 and D2, and find that Q(D1) ≠ Q(D2), then we have conclusively demonstrated the inequivalence of the two diagrams! Why? Because if there were some way to transform D1 into D2, then since every transformation holds Q fixed, Q(D2) would have to equal Q(D1)!

The hard part, of course, is finding invariants. But this is much easier when we have the help of Reidemeister moves! To show that Q(D) is an invariant quantity, all we need to do is show that the value of Q doesn’t change when we apply any Reidemeister move to the diagram D. This basically amounts to checking six simple cases, which is not too bad.

So, let’s go to our first invariant quantity: three-colorability.

3-Colorability

Three-colorability is quite simple: a knot diagram is three-colorable if it’s possible to color each of its arcs such that at each crossing either all three arcs are the same color, or all three are different.

Notice that any knot can be trivially 3-colored by just letting every arc be the same color; to avoid this we add the requirement that for a 3-coloring to be valid, it must use at least two distinct colors.

It’s a fun exercise to show that this quantity is invariant using Reidemeister moves. The simplest case is the twist. Essentially all we need to do is show that if we apply a twist to a 3-colorable diagram, we can still 3-color the new diagram, AND that there’s a way to “untwist” a 3-colorable diagram so that we still retain 3-colorability. The proof is actually quite trivial; note that the crossing at any twist must be monochrome, since two of the involved arcs are actually the same.

Try convincing yourself that 3-colorability is preserved under the other Reidemeister moves!

So, 3-colorability is an invariant. There’s a very nice consequence of this. The trefoil is three-colorable, and the unknot (having only one arc) is not.

So this proves that the trefoil can not be unknotted! This was probably intuitively obvious from the outset, but we can now rest comfortably in the certainty of a mathematical proof.

p-Colorability

A fun exercise: try to 3-color the figure-eight knot, and see why it’s not possible.

This shows that the figure-eight knot is not equivalent to the trefoil. But what about the unknot? Neither the unknot nor the figure-eight knot is 3-colorable, and yet they are not equivalent! So 3-colorability, while quite cool, isn’t the whole story. However, there’s a nice generalization of 3-colorability to p-colorability for any prime p.

Think about colors as numbers. So our set of three colors from the last section will just be {0, 1, 2}. A coloring is then a function mapping arcs to {0, 1, 2}. This gives us an algebraic way of expressing the 3-colorability property:

If x, y, and z are the colors of the three arcs connecting at a crossing, with z being the arc that passes above the crossing, then 2z – x – y = 0 (mod 3). It’s easily checkable that this is equivalent to the condition that each crossing is either monochrome, or else the three arcs are different colors.

We generalize this to any prime p as follows:

Our set of colors will be {0, 1, …, p – 1}. If for each crossing, 2z – x – y = 0 mod p, and at least two colors are used, then the knot is p-colorable.
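Checking this condition by brute force is straightforward (if slow). Here’s a minimal Python sketch; the encoding of a diagram as a list of crossings (x, y, z), with x and y the under-arcs and z the over-arc, is my own choice.

```python
from itertools import product

def is_p_colorable(num_arcs, crossings, p):
    """Brute-force check: some coloring uses at least two colors and
    satisfies 2z - x - y = 0 (mod p) at every crossing."""
    for coloring in product(range(p), repeat=num_arcs):
        if len(set(coloring)) < 2:
            continue                          # must use at least two colors
        if all((2 * coloring[z] - coloring[x] - coloring[y]) % p == 0
               for (x, y, z) in crossings):
            return True
    return False

# Trefoil with arcs 0, 1, 2: at crossing i, arc i passes over the other two.
trefoil = [(1, 2, 0), (2, 0, 1), (0, 1, 2)]
print(is_p_colorable(3, trefoil, 3))          # True
print(is_p_colorable(3, trefoil, 5))          # False
```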

For each prime p, p-colorability is an invariant, meaning that we have now gone from one invariant quantity to a countable infinity of invariants! We also can now distinguish the figure-eight knot from the unknot: the first is 5-colorable, while the unknot is not.

Five-coloring the figure-eight – check for yourself that the algebraic condition (2z – x – y = 0 mod 5) is satisfied

An exercise: try 5-coloring the trefoil. Is it possible?

However, this is not the end of the story. For one thing, it’s not so easy to determine if an arbitrary knot is p-colorable. It’s doable, sure, by trying all the possible colorings, but we want a quicker method. This will be addressed in the next section.

Much more importantly, we’ll see in the next section that there are knot diagrams that have the same colorability profile (i.e. D1 is p-colorable iff D2 is p-colorable, for each p), and yet represent different knots. So we need an even more powerful invariant to discriminate these from one another.

Knot Determinant

Take any knot and label its arcs and crossings with natural numbers.

Then construct a matrix M, where the (i, j) component corresponds to arc i and crossing j. If arc i isn’t involved in crossing j, then Mij = 0. If arc i is involved in crossing j, and passes over it, then Mij = 2. And if arc i passes under crossing j, then Mij = -1.

Now, cross out any row and column of your choice, and take the determinant of the remaining matrix. Take its absolute value, and you have the knot determinant.

You know what I’m going to say next: the knot determinant is an invariant! Not only is it an invariant, but it also fully determines the colorability profile of a knot! The rule is that if a knot has knot determinant N, then for any prime p > 2, the knot is p-colorable if and only if p divides N.

For instance, say we calculate that a knot has determinant 33. We immediately conclude that the knot is 3-colorable, 11-colorable, and not p-colorable for any other prime.
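Here’s a small numpy sketch of the recipe, checked on the trefoil. The particular labeling (arc i passes over crossing i and under the other two) is my assumption; the matrix conventions and the row/column deletion are exactly as described above.

```python
import numpy as np

def knot_determinant(M):
    """M[i][j] is 2 if arc i passes over crossing j, -1 if it passes under,
    and 0 otherwise. Delete one row and one column, then take |det|."""
    reduced = np.delete(np.delete(np.array(M), 0, axis=0), 0, axis=1)
    return round(abs(np.linalg.det(reduced)))

trefoil = [
    [ 2, -1, -1],
    [-1,  2, -1],
    [-1, -1,  2],
]
# Determinant 3: the trefoil is 3-colorable and not p-colorable for any other odd prime.
print(knot_determinant(trefoil))   # 3
```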

This is pretty incredible. We now have a simple deterministic procedure for looking at a knot diagram and producing a number that gives us the entire colorability profile of the knot. But wait, there’s more!

Suppose we have two knot diagrams, one with determinant 15, and the other with determinant 45. These two indicate the same colorability profile: 3-colorable, 5-colorable, and nothing else. But since the determinants are different, the two knots are not equivalent! In other words, any two knots with different determinants that have all the same prime factors (but with possibly different powers of those prime factors) will have the exact same colorability profile, and will nonetheless be proven inequivalent by the knot determinant!

We’ve now reached the end of this dive into knot theory, but only because we’ve begun to near the end of my knowledge of the subject. By no means does the theory of knots end here. The knot determinant, amazing though it is, still does not fully pin down the different knots up to equivalence. There are distinct knots with the same determinant. If we were to dive further, we’d next look into a polynomial invariant, the Alexander polynomial. Going deeper, there’s hyperbolic invariants, involving the complement of a knot in hyperbolic space. Then there’s links, which are structures consisting of multiple knots linked together in some way, which have their own theory and periodic table. And we can even consider knotting spheres in four-dimensional space, or in general n-dimensional spheres in n+1-dimensional space! Plus there’s the operation of the knot sum, which we can use to combine distinct knots and get a new one. This operation is commutative and associative, and gives rise to the notion of prime and composite knots. And the coolness just keeps going on from here!

If nothing else, I hope I’ve convinced you that the subject of knots is of great mathematical interest and inspired you to look into it yourself. I’ll leave you with the first few rows of a (necessarily infinite) periodic table of knots, showing the prime knots up to seven crossings:

A Compact Proof of the Compactness Theorem

I’ve written up a proof of the compactness theorem before, but I recently looked it over and think that the proof can be expressed more, heh heh, compactly, than before.

So, what is the compactness theorem? It is the statement that if a set of statements is finitely satisfiable (each of its finite subsets has a model), then it’s satisfiable. The converse of this (a satisfiable set of statements is also finitely satisfiable) is trivial: a model of a set of sentences is also a model of every subset of that set. The following proof will be for propositional logic, but can be easily extended to first order logic.

The short version

Suppose that a set of sentences A is finitely satisfiable. We’ll extend A to a larger set B by giving A an “opinion” on every sentence. We can build up this extension in a series of stages: B0 is just A. Now, take any sentence a. If B0 ⋃ {a} is finitely satisfiable, define B1 to be B0 ⋃ {a}. Otherwise, define B1 to be B0 ⋃ {¬a}. Either way, B1 will be finitely satisfiable, because B0 cannot be inconsistent with both a and ¬a. When we’ve gone through every sentence, we take the union of all these extensions to form B. B is finitely satisfiable, since every finite subset of B is also a finite subset of one of the extensions that make it up. Now, we define a truth assignment V that assigns True to each propositional variable p if and only if p is in B. V satisfies B (which you can show by induction on sentences), and since A is a subset of B, V also satisfies A. So A is satisfiable.

The long(er) version

Suppose that a set of sentences A is finitely satisfiable. If a set of sentences A’ is finitely satisfiable, then for any sentence a, at least one of A’ ⋃ {a} and A’ ⋃ {¬a} is finitely satisfiable. (If neither were, then we’d have two finite sets, one that entails ¬a and the other that entails a, and the union of these would be an unsatisfiable finite subset of A’.) So given any well-ordering of the sentences of the language, we can extend A one sentence at a time, maintaining finite satisfiability at each step. The union of all these extensions, call it B, is still finitely satisfiable, because any finite subset of B is also a finite subset of one of the extensions. B is also complete: for any sentence b either b ∈ B or ¬b ∈ B.

Now, define the truth assignment V as follows: for any atomic sentence b, V(b) = True iff b ∈ B. V satisfies B, which we prove by induction on sentences:

  • If b is an atomic sentence, then V(b) = True iff b ∈ B, by construction.
  • If b = ¬c for some c (for which V(c) = True iff c ∈ B), then V(b) = True iff V(c) = False iff c ∉ B iff b ∈ B.
  • If b = c ∧ d for some c and d that satisfy the induction hypothesis, then V(b) = V(c ∧ d) = True iff V(c) = V(d) = True iff c ∈ B and d ∈ B. If both c and d are in B, then ¬(c ∧ d) can’t be in B by finite satisfiability, so c ∧ d ∈ B by completeness of B. And if c ∧ d ∈ B, then neither ¬c nor ¬d can be in B by finite satisfiability. So c and d are both in B by completeness of B.

This proof by induction covers the connectives ¬ and ∧, with which you can express all other connectives, so this shows that V satisfies B. And since B is a superset of A, V satisfies A as well. This shows that any finitely satisfiable set of sentences is also satisfiable. Note that we used the axiom of choice implicitly with the instruction to well-order the language. As the language can be any size, this is equivalent to the well-ordering principle, which is equivalent in ZF to the axiom of choice. The same sort of assumption arises in the proof of the completeness theorems for first-order and propositional logics. If you reject choice, then you should also be skeptical that propositional logic and first-order logics are complete!

ZFC, and getting the right answer by accident

There’s something I’m confused about with regards to ZFC. We already know that no first order theory allows you to perfectly capture the semantics of concepts like the natural numbers or finitely many. The best we can do if restricted to first-order logic (which is where most of modern mathematics tries to stay) is to have theories with surrogates for those concepts.

For instance, there’s this object called ω that stands in for the natural numbers in ZFC, even though there are all these models of ZFC in which ω contains many more objects besides the natural numbers (in fact, uncountably many in some models). And “X is finite” is replaced by “there’s a bijection from X to an element of ω.”

In formalizing these surrogate concepts, we make sure to not include any false statements about the concepts, which gives us a type of soundness of our surrogate concept. I.e., ZFC isn’t going to prove things about ω that are false of the set of natural numbers, because in one model ω is the set of natural numbers.

But it doesn’t give us completeness. We aren’t going to be able to prove ALL the true first-order sentences about the natural numbers, or about the concept of finiteness. (This is of course just a product of the first-order nature of the theories in which we’re defining these surrogates.)

So in each of these cases there will be true statements involving the concept that our theory will be agnostic about. We can look for “really good” surrogates, surrogates which allow us to prove most of the important and interesting true statements about the concept, and only fail in very abstract and unusual settings. The degree to which we can make good surrogates is the degree to which a first-order thinker can make sense of and usefully apply non-first-order concepts. (A first-order thinker being one that has no built-in understanding of non-first-order concepts.)

So one general question is: how good is a given surrogate? And another is, how do we know based on the axioms how good of a surrogate we’ll be getting? This is the thing I’m confused about.

In ZFC, there’s this weird phenomenon of the theory “getting the right answers accidentally.” It’s a little tough to put into words, but here’s an example:

ZFC can prove that |P(X)| > |X| for all X. So for instance ZFC can prove that |P(ω)| > |ω|. Meaning that ZFC can prove that the power set of the naturals is uncountable. But ZFC has countable models! (As does every first-order theory.) In those models, the power set of the naturals is NOT uncountable.

First order logic is sound, so what’s going on ISN’T that ZFC is proving a sentence that’s false in some of its models. It’s that the sentence is false in that model, if interpreted to be about the actual concept, and true in that model if interpreted to be about the surrogate concept. The surrogate for “P(ω) is uncountable” in ZFC is “there’s no bijection from P(ω) to ω”. And in the countable models of ZFC, the bijection that would prove the equinumerosity of ω and P(ω) is missing! So in those models, it’s actually TRUE that “there’s no bijection from P(ω) to ω”, even though P(ω) and ω really do have the same number of elements.

This is the sort of subtle nonsense you get used to in mathematical logic. Two accidents cancel out: first ZFC makes the mistake of having models where P(ω) is countable, and second it makes the mistake of losing track of the bijection from P(ω) and ω. And as a result, ZFC is able to prove the correct thing.

This seems really weird to me, and it’s an unsettling way for a supposed foundation of mathematics to be getting the right answers. This type of thing happens a lot, and it feels like we keep getting “lucky”: our unintended interpretations of a sentence interfere with each other and cancel out their problematic features. It makes me wonder why we should be confident that ZFC will continue giving us the right answers (as opposed to being agnostic on important basic questions). And in fact we do have some examples of important and basic questions that ZFC is agnostic on, most dramatically whether |X| < |Y| implies |P(X)| < |P(Y)|!

It’s not that I doubt that ZFC’s surrogates for non-first-order concepts end up allowing us to prove an enormous number of true facts about these concepts, because they clearly do. But is there some principled reason we can give for WHY the axioms of ZFC lead us to such a robust framework for mathematics, one in which we can prove so many true statements?

One candidate answer is the power of the axiom of replacement: it’s an extremely strong and useful axiom that allows us to prove a whole lot. But that doesn’t seem to explain the “right-by-accident” phenomenon. So what does?

Polish Notation and Garden-Path Sentences

Polish notation is a mathematical notation system that allows you to eliminate parentheses without ambiguity. It’s called “Polish” because the name of its Polish creator, Jan Łukasiewicz, was too difficult for people to pronounce.

A motivating example: Suppose somebody says “p and q implies r”. There are two possible interpretations of this: “(p and q) implies r” and “p and (q implies r)”. The usual way to disambiguate these two is to simply add in parentheses like I just did. Another way is to set an order-of-operations convention, like that “and” always applies before “implies”. This is what’s used in basic algebra, and what allows you to write 2 + 2 ⋅ 4 without any fear that you’ll be interpreted as meaning (2 + 2) ⋅ 4.

Łukasiewicz’s method was to make all binary connectives into prefixes. So “A and B” becomes “and A B”, “P implies Q” becomes “implies P Q”, and so on. In this system, “(p and q) implies r” translates to “implies and p q r”, and “p and (q implies r)” translates to “and p implies q r”. Since the two expressions are different, there’s no need for parentheses! And in general, no ambiguity ever arises from lack of parentheses when using Polish notation.

If this is your first time encountering Polish notation, your first reaction might be to groan and develop a slight headache. But there’s something delightfully puzzling about reading an expression written in Polish notation and trying to understand what it means. Try figuring out what this means: “implies and not p or q s r”. Algebra can be written in Polish notation just as easily, removing the need for both parentheses AND order-of-operations. “2 + 2 = 4” becomes “+ 2 2 = 4”, or even better, “= + 2 2 4”.
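
To see concretely why no ambiguity can arise, here is a minimal sketch of the left-to-right reading procedure in Python (the code and its names, like ARITY and parse, are my own illustration, not anything standard): every symbol announces how many arguments it takes, and you recurse exactly that many times.

ARITY = {"not": 1, "and": 2, "or": 2, "implies": 2}

def parse(tokens):
    # Consume one Polish-notation expression from the front of the token list
    # and return a fully parenthesized infix rendering of it.
    head = tokens.pop(0)
    n = ARITY.get(head, 0)  # propositional letters like p, q, r take no arguments
    if n == 0:
        return head
    args = [parse(tokens) for _ in range(n)]
    if n == 1:
        return f"({head} {args[0]})"
    return f"({args[0]} {head} {args[1]})"

print(parse("implies and not p or q s r".split()))
# prints: (((not p) and (q or s)) implies r)

Running it on the puzzle above spoils the answer, so don’t look at the last line until you’ve tried it yourself.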

Other binary connectives can be treated in Polish notation as well, creating gems like: “If and you’re happy you know it clap your hands!” “When life is what happens you’re busy making plans.” “And keep calm carry on.” “Therefore I think, I am.” (This last one is by of the author the Meditations). Hopefully you agree with me that these sentences have a nice ring to them, though the meaning is somewhat obscured.

But putting connectives in front of the two things being connected is not unheard of. Some examples in English: “ever since”, “because”, “notwithstanding”, “whenever”, “when”, “until”, “unless”. Each of these connects two sentences, and yet can appear in front of both. When we hear a sentence like “Whenever he cheated on a test the professor caught him”, we don’t have any trouble parsing it. (And presumably you had no trouble parsing that entire last sentence either!) One could imagine growing up in a society where “and” and “or” are treated the same way as “ever since” and “until”, and perhaps in this society Polish notation would seem much more natural!

Slightly related to sentential connectives are verbs, which connect subjects and objects. English places its verbs squarely between the subject and the object, as do Chinese, French, and Spanish. But in fact the most common ordering is subject-object-verb! 45% of languages, including Hindi, Japanese, Korean, Latin, and Ancient Greek, use this pattern. So for instance, instead of “She burned her hand”, one would say “she her hand burned”. This is potentially weirder to English-speakers than Polish notation; it’s reverse Polish notation!

9% of languages use Polish notation for verbs (the verb-subject-object pattern). These include Biblical Hebrew, Arabic, Irish, and Filipino. In such languages, it would be grammatical to say “Loves she him” but not “She loves him”. (3% of languages are VOS, “loves him she”; 1% are OVS, “him loves she”; and just a handful are OSV, “him she loves”.)

Let’s return to English. Binary prepositions like “until” appear out front, but they also swap the order of the two things that they connect. For instance, “Until you do your homework, you cannot go outside” is the same as “You cannot go outside until you do your homework”, not “You do your homework until you cannot go outside”, which sounds a bit more sinister.

I came up with some examples of sentences with several layers of these binary prepositions, to see whether the same type of confusion sets in here as when we examine Polish notation for “and” or “implies”. And oh boy, does it.

Single connective
Since when the Americans dropped the bomb the war ended, some claimed it was justified.

Two connectives, unlayered
Since when the Americans dropped the bomb the war ended, when some claimed it was an atrocity others argued it was justified.

Still pretty readable, no? Now let’s layer the connectives.

One layer
Whenever he was late she would weep.
She would weep whenever he was late.

Two layers
Since whenever he was late she would weep, he hurried over.
He hurried over, since she would weep whenever he was late.

Three layers
Because since whenever he was late she would weep he hurried over, he left his wallet at home.
He left his wallet at home, because he hurried over since she would weep whenever he was late.

Four layers
Because because since whenever he was late she would weep he hurried over he left his wallet at home, when he was pulled over the officer didn’t give him a ticket.
The officer didn’t give him a ticket when he was pulled over, because he left his wallet at home because he hurried over since she would weep whenever he was late.

Five layers
When he heard because because since whenever he was late she would weep he hurried over he left his wallet at home, when he was pulled over the officer didn’t give the man a ticket, the mayor was outraged at the lawlessness.
The mayor was outraged at the lawlessness when he heard the officer didn’t give the man a ticket when he was pulled over because he left his wallet at home because he hurried over since she would weep whenever he was late.

Read that last one out loud to a friend and see if they believe you that it makes grammatical sense! With each new layer, things become more and more… Polish. That is, indecipherable. (Incidentally, Polish is SVO just like English.) Part of the problem is that when we have multiple layers like this, phrases that are semantically connected can become more and more distant in the sentence. It reminds me of my favorite garden-path sentence pattern:

The mouse the cat the dog chased ate was digested.
(The mouse that (the cat that the dog chased) ate) was digested.
The mouse (that the cat (that the dog chased) ate) was digested.

The phrases that are meant to be connected, like “the mouse” and “was digested”, end up at opposite ends of the sentence, and can be pushed arbitrarily far apart by adding more “that the X verbed” clauses.
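
To illustrate how mechanical this is, here’s a toy generator (my own construction; the function name embed and its arguments are made up for illustration) that builds these center-embedded sentences to any depth:

def embed(nouns, verbs, final):
    # nouns are listed outermost-first; verbs[i] is what nouns[i+1] did to nouns[i]
    subjects = " ".join(f"the {n}" for n in nouns)
    actions = " ".join(reversed(verbs))
    return " ".join(part for part in (subjects, actions, final) if part) + "."

print(embed(["mouse", "cat", "dog"], ["ate", "chased"], "was digested"))
# the mouse the cat the dog chased ate was digested.
print(embed(["mouse", "cat", "dog", "boy"], ["ate", "chased", "walked"], "was digested"))
# the mouse the cat the dog the boy walked chased ate was digested.

Each extra noun/verb pair pushes “the mouse” and “was digested” one clause further apart.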

Does anybody know of any languages where “and” comes before the two conjuncts? What about “or”? English does this with “if”, so it might not be too much of a stretch.