Polish Notation and Garden-Path Sentences

Polish notation is a mathematical notation system that allows you to eliminate parentheses without ambiguity. It’s called “Polish” because the name of its Polish creator, Jan Łukasiewicz, was too difficult for people to pronounce.

A motivating example: Suppose somebody says “p and q implies r”. There are two possible interpretations of this: “(p and q) implies r” and “p and (q implies r)”. The usual way to disambiguate these two is to simply add in parentheses like I just did. Another way is to set an order-of-operations convention, such as “and” always applying before “implies”. This is what’s used in basic algebra, and what allows you to write 2 + 2 ⋅ 4 without any fear that you’ll be interpreted as meaning (2 + 2) ⋅ 4.

Łukasiewicz’s method was to make all binary connectives into prefixes. So “A and B” becomes “and A B”, “P implies Q” becomes “implies P Q”, and so on. In this system, “(p and q) implies r” translates to “implies and p q r”, and “p and (q implies r)” translates to “and p implies q r”. Since the two expressions are different, there’s no need for parentheses! And in general, no ambiguity ever arises from lack of parentheses when using Polish notation.

If this is your first time encountering Polish notation, your first reaction might be to groan and develop a slight headache. But there’s something delightfully puzzling about reading an expression written in Polish notation and trying to understand what it means. Try figuring out what this means: “implies and not p or q s r”. Algebra can be written in Polish notation just as easily, removing the need for both parentheses AND order-of-operations. “2 + 2 = 4” becomes “+ 2 2 = 4”, or even better, “= + 2 2 4”.
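In fact, parsing Polish notation is mechanical enough to fit in a few lines of code. Here’s a minimal sketch in Python (the token names are my own choice) that turns a prefix expression back into fully parenthesized form; since every connective has a fixed arity, a single left-to-right pass is all it takes:

```python
ARITY = {"implies": 2, "and": 2, "or": 2, "not": 1}

def parse(tokens):
    """Consume one prefix expression; return (parenthesized form, leftover tokens)."""
    head, rest = tokens[0], tokens[1:]
    if head not in ARITY:              # an atom like 'p', 'q', 'r'
        return head, rest
    args = []
    for _ in range(ARITY[head]):       # fixed arity makes parsing deterministic
        arg, rest = parse(rest)
        args.append(arg)
    return "(" + " ".join([head] + args) + ")", rest

print(parse("implies and p q r".split())[0])   # (implies (and p q) r)
print(parse("and p implies q r".split())[0])   # (and p (implies q r))
# Try it on the puzzle above: "implies and not p or q s r"
```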

Other binary connectives can be treated in Polish notation as well, creating gems like: “If and you’re happy you know it clap your hands!” “When life is what happens you’re busy making plans.” “And keep calm carry on.” “Therefore I think, I am.” (This last one is by of the author the Meditations). Hopefully you agree with me that these sentences have a nice ring to them, though the meaning is somewhat obscured.

But putting connectives in front of the two things being connected is not unheard of. Some examples in English: “ever since”, “because”, “notwithstanding”, “whenever”, “when”, “until”, “unless”. Each of these connects two sentences, and yet can appear in front of both. When we hear a sentence like “Whenever he cheated on a test the professor caught him”, we don’t have any trouble parsing it. (And presumably you had no trouble parsing that entire last sentence either!) One could imagine growing up in a society where “and” and “or” are treated the same way as “ever since” and “until”, and perhaps in this society Polish notation would seem much more natural!

Slightly related to sentential connectives are verbs, which connect subjects and objects. English places its verbs squarely between the subject and the object, as do Chinese, French, and Spanish. But in fact the most common ordering is subject-object-verb! 45% of languages, including Hindi, Japanese, Korean, Latin, and Ancient Greek, use this pattern. So for instance, instead of “She burned her hand”, one would say “she her hand burned”. This is potentially weirder to English-speakers than Polish notation; it’s reverse Polish notation!

9% of languages use Polish notation for verbs (the verb-subject-object pattern). These include Biblical Hebrew, Arabic, Irish, and Filipino. In such languages, it would be grammatical to say “Loves she him” but not “She loves him”. (3% of languages are VOS – loves him she – 1% are OVS – him loves she – and just a handful are OSV – him she loves).

Let’s return to English. Binary connectives like “until” appear out front, but they also swap the order of the two things they connect. For instance, “Until you do your homework, you cannot go outside” is the same as “You cannot go outside until you do your homework”, not “You do your homework until you cannot go outside”, which sounds a bit more sinister.

I came up with some examples of sentences with several layers of these connectives, to see whether the same type of confusion we get when examining Polish notation for “and” or “implies” sets in here. And oh boy, does it.

Single connective
Since when the Americans dropped the bomb the war ended, some claimed it was justified.

Two connectives, unlayered
Since when the Americans dropped the bomb the war ended, when some claimed it was an atrocity others argued it was justified.

Still pretty readable, no? Now let’s layer the connectives.

One layer
Whenever he was late she would weep.
She would weep whenever he was late.

Two layers
Since whenever he was late she would weep, he hurried over.
He hurried over, since she would weep whenever he was late.

Three layers
Because since whenever he was late she would weep he hurried over, he left his wallet at home.
He left his wallet at home, because he hurried over since she would weep whenever he was late.

Four layers
Because because since whenever he was late she would weep he hurried over he left his wallet at home, when he was pulled over the officer didn’t give him a ticket.
The officer didn’t give him a ticket when he was pulled over, because he left his wallet at home because he hurried over since she would weep whenever he was late.

Five layers
When he heard because because since whenever he was late she would weep he hurried over he left his wallet at home, when he was pulled over the officer didn’t give the man a ticket, the mayor was outraged at the lawlessness.
The mayor was outraged at the lawlessness when he heard the officer didn’t give the man a ticket when he was pulled over because he left his wallet at home because he hurried over since she would weep whenever he was late.

Read that last one out loud to a friend and see if they believe you that it makes grammatical sense! With each new layer, things become more and more… Polish. That is, indecipherable. (Incidentally, Polish is SVO just like English). Part of the problem is that when we have multiple layers like this, phrases that are semantically connected can become more and more distant in the sentence. It reminds me of my favorite garden-path sentence pattern:

The mouse the cat the dog chased ate was digested.
(The mouse that (the cat that the dog chased) ate) was digested.
The mouse (that the cat (that the dog chased) ate) was digested.

The phrases that are meant to be connected, like “the mouse” and “was digested”, end up at opposite ends of the sentence, and can be made arbitrarily distant by the addition of more “that the X verbed” clauses.

Does anybody know of any languages where “and” comes before the two conjuncts? What about “or”? English does this with “if”, so it might not be too much of a stretch.

A Self-Interpreting Book

A concept: a book that starts by assuming the understanding of the reader and using concepts freely, and as you go on it introduces a simple formal procedure for defining words. As you proceed, more and more words are defined in terms of the basic formal procedure, so that halfway through, half of the words being used are formally defined, and by the end the entire thing is formally defined. Once you’ve read through the whole book, you can start it over and read from the beginning with no problem.

I just finished a set theory textbook that felt kind of like that. It started with the extremely sparse language of ZFC: first-order logic with a single non-logical symbol, ∈. So the alphabet of the formal language consisted of the following symbols: ∈ ( ) ∧ ∨ ¬ → ↔ ∀ ∃ x ′. It could have even started with a sparser formal language if it was optimizing for alphabet economy: ∈ ( ) ∧ ¬ ∀ x ′ would suffice. As time passed and you got through more of the book, more and more things were defined in terms of the alphabet of ZFC: subsets, ordered pairs, functions from one set to another, transitivity, partial orders, finiteness, natural numbers, order types, induction, recursion, countability, real numbers, and limits. By the last chapter it was breathtaking to read a sentence filled with complex concepts and realize that every single one of these concepts was ultimately grounded in this super simple formal language we started with, with a finitistic sound and complete system of rules for how to use each one.
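For a flavor of how the definitional tower gets started, here are the first few rungs, written in the usual way (these are the standard textbook definitions, not necessarily the exact ones that book uses):

x ⊆ y is shorthand for ∀z (z ∈ x → z ∈ y)
⟨a, b⟩ is shorthand for {{a}, {a, b}} (the Kuratowski pair)
“f is a function from X to Y” unpacks to: f ⊆ X × Y, and for every x ∈ X there is exactly one y with ⟨x, y⟩ ∈ f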

But could it be possible to really fully define ALL the terms used by the end of the book? And even if it were, could the book be written in such a way as to allow an alien that begins understanding nothing of your language to read it and, by the end, understand everything in the book? Even worse, what if the alien not only understands nothing of your language, but starts out understanding nothing of the concepts involved? This might be a nonsensical notion; an alien that can read a book and do any level of sophisticated reasoning, but doesn’t understand concepts like “and” and “or”.

One way that language is learned is by “pointing”: somebody asks me what a tree is, so I point to some examples of trees and some examples of non-trees, indicating which are which. It would be helpful if in this book we could point to simple concepts by means of interactive programs. So, for instance, an e-book where an alien reading the book encounters some exceedingly simple programs that they can experiment with, putting in inputs and seeing what results. For instance, we might have a program that takes as input either 00, 01, 10, or 11, and outputs the ∧ operation applied to the two input digits. Nothing else would be allowed as input, so after playing with the program for a little bit you learn everything that it can do.
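Here’s a sketch of what one of these pointing-programs might look like (the function name and error message are my own inventions, of course):

```python
def and_gate(bits):
    """Accepts exactly '00', '01', '10', or '11'; rejects everything else."""
    if bits not in ("00", "01", "10", "11"):
        raise ValueError("only two binary digits are allowed")
    return str(int(bits[0]) & int(bits[1]))

# The alien can exhaust the program's entire behavior in four experiments:
for inp in ("00", "01", "10", "11"):
    print(inp, "->", and_gate(inp))    # 0, 0, 0, 1
```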

One feature of such a book would be that it would probably use nothing above first-order logical concepts. The reason is that the semantics of second-order logic cannot be captured by any sound and complete proof system, meaning that there’s no finitistic set of rules one could explain to an alien so that they know how to use the concepts involved correctly. Worse, the set of second-order tautologies is not even recursively enumerable (worse than the set of first-order tautologies, which is merely undecidable), so no amount of pointing-to-programs would suffice. First-order ZFC can define a lot, but can it define enough to write a book on what it can define?

Transfinite Nim: uncomputable games and games whose winner depends on the Continuum Hypothesis

In the game of Nim, you start with piles of various (whole number) heights. Each step, a player chooses one pile and shrinks it by some non-zero amount. Once a pile’s height has been shrunk to zero, it can no longer be selected by a player for shrinking. The winner of the game is the one that takes the last pile to zero.

Here’s a sample game of Nim:

Starting state: 3, 2
After Frank’s move: 2, 2
After Marie’s move: 2, 1
After Frank’s move: 0, 1
After Marie’s move: 0, 0

Marie takes the last pile to zero, so she is the winner. Frank’s last move was a big mistake: by reducing the first pile from 2 to 0, he left the only remaining pile free to be taken by Marie. In a game of Nim, you should never leave only one pile remaining at the end of your turn. If Frank had instead shrunk the first pile from 2 to 1, then the state of the piles would be (1, 1). Marie would be forced to shrink one of the two piles to zero, leaving Frank to take the final pile and win.

The strategy of Nim with two piles is extremely simple: on your turn, you should always even out the two piles if possible. This is only possible if the heights are different at the start of your turn. See if you can figure out why this strategy guarantees a win!
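If you’d rather have a computer confirm the claim, here’s a quick brute-force sketch (my own, for small finite piles) showing that the player to move wins exactly when the two piles are unequal:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def first_player_wins(a, b):
    """True if the player about to move can force a win from piles (a, b)."""
    if a == b == 0:
        return False   # no move available: the previous player took the last pile
    moves = [(x, b) for x in range(a)] + [(a, y) for y in range(b)]
    return any(not first_player_wins(*m) for m in moves)

# The 'even them out' rule: the mover wins exactly when the piles differ.
assert all(first_player_wins(a, b) == (a != b)
           for a in range(10) for b in range(10))
```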

Transfinite Nim is a version of Nim where the piles are allowed to take infinite ordinal values. So for instance, a game might have the starting position:

Starting state: ω² + ω, ω₁ + ε₀

If Marie is moving first, then can she guarantee a win? What move should she make?

It turns out that the strategy for two-pile Transfinite Nim is exactly the same as for Finite Nim. Marie has a guaranteed win, because the two piles are different values. Each move she’ll just even the piles out. So for her first move, she should do the following:

Starting state: ω² + ω, ω₁ + ε₀
After Marie’s move: ω² + ω, ω² + ω

No matter what Frank does next, Marie can just “copy” that move on the other pile, guaranteeing that Marie always has a move as long as Frank does. This proves that Marie must have the last move, and therefore win.

One important feature of Transfinite Nim is that even though we’re dealing with infinitely large piles, every game can only last finitely long. In other words, Frank has no strategy for delaying his loss infinitely long, and thus forcing a sort of “stalemate by exhaustion.” This is because the ordinals are well-ordered, and any decreasing sequence of well-ordered items must terminate. (Why? Just consider the definition of a well-ordered set: every subset has a least element. If the game were to continue infinitely long, each step decreasing the state but never terminating, then the sequence of states would be a subset of the ordinals which has no least element!)

Although the strategy of Transfinite Nim is no more interesting than Finite Nim, the game does have some interesting features that it inherits from the ordinals. There are sets of ordinal numbers such that the ordering between them is uncomputable (i.e. no Turing machine can take as input any two ordinals from that set and correctly assess whether one is larger than the other). For such sets, the ability to compute a winning strategy is called into question.

For instance, the set of all countable ordinals is uncomputable. The quick proof is that there are uncountably many countable ordinals – otherwise in ZFC the set of countable ordinals would itself be a countable ordinal and would thus contain itself – and any Turing machine can only compare countably many things.

The smallest uncomputable ordinal (which, in ZFC, is exactly the set of all computable ordinals) is called the Church-Kleene ordinal and written ω₁^CK. Imagine the starting state of the game is two different ordinals that are both larger than ω₁^CK. If you’re moving first, then you have to determine which of the two ordinals is larger, in order to even them out. But this is not in general possible! So even if you go first and the two piles are different sizes, you might not be able to guarantee a win.

Suppose Marie is allowed uncomputable strategies, and Frank is only allowed computable strategies. Suppose further that the starting state involves two ordinals A and B, both larger than the Church-Kleene ordinal, and that the ordinals are expressed in some standard notation (so that you can’t write the same ordinal in two different ways). There are a few cases.

Case 1: A = B, Marie goes first.
Marie decreases one of the two ordinals. Despite not being able to compute the order on the ordinals, Frank can just mimic her move. This will continue until Frank wins.

Case 2: A = B, Frank goes first.
Frank decreases one of the two ordinals, and Marie mimics. Marie eventually wins.

Case 3: A ≠ B, Marie goes first.
Marie can tell which of the ordinals is larger, and decreases that one to even out the two piles. Marie wins.

Case 4: A ≠ B, Frank goes first.
Frank can’t tell which of the ordinals is larger and can’t try to even them out, as doing so might result in an invalid move (trying to increase the smaller pile to the height of the larger one). So Frank does some random move, after which Marie is able to even out the two piles. Marie wins.

Finally, here’s a starting state for a game of Transfinite Nim:

ω₁, ℶ₁

ω₁ is the first uncountable ordinal, and ℶ₁ is the first ordinal with continuum cardinality. Frank goes first. Does he have a winning strategy?

It depends on whether ω₁ = ℶ₁, which is precisely the Continuum Hypothesis! If the two are equal, then Frank can’t win, because he’s starting with two even piles. And if ω₁ < ℶ₁, then Marie can’t win, because Frank can decrease the ℶ₁ pile to ω₁.

If we suppose that the players must be able to prove a move’s validity in ZFC before playing that move, then the first player couldn’t decrease the ℶ₁ pile to ω₁. The first player still has to do something, and whatever he does will change the state to two ordinals that are comparable by ZFC.

What about larger starting ordinals whose size comparison is independent of ZFC, like ω₁₅ and ℶ₁₅? If the new state after the first player’s move also involves two ordinals whose size comparison is independent of ZFC, then the second player will also be unable to even them out. This continues until one of them eventually decreases a pile to an ordinal whose size is comparable by ZFC to the other pile. So the winner will depend on who knows more pairs of ordinals less than the starting values whose sizes ZFC can’t compare. In fact, each player wants to force the other player to make the values ZFC-comparable, so that they’ll be able to even the piles out on their turn.

The Anti-Set Program

In ZFC set theory, we specify a collection of sentences within a first-order language to count as our axioms. The models of this collection of sentences are the set-theoretic universes (and many of these models are “unintended” – pesky perversions in which the set of naturals ω is uncountably large, as one example – but we’ll put this aside for this post). Most of the axioms act as constraints that all sets must follow. For instance, the axiom of pairing says that “For any sets x and y, there must exist another set containing as elements just x and y and nothing else”. This is an axiom that begins with universal quantification over the sets of the universe, and then states some requirement that must hold of all these sets.

The anti-set program is what you get when you take each of these restriction axioms, and negate the restriction it imposes on all sets. So, for instance, the axiom of anti-pairing says that for any sets x and y, there must NOT exist any set {x, y}. Contrast this with the simple negation of pairing, which would tell us only that there exist two sets x and y such that their pair doesn’t exist. The anti-axiom is much stronger than the negated axiom, in that it requires NO pairs to exist.

Not all axioms begin with universal quantifiers; the axiom of infinity, for instance, simply asserts that a set satisfying a certain property exists. To form the axiom of anti-infinity, we simply negate the original axiom (so that no sets with that property exist).

As it turns out, the anti-set program, if applied to ALL the axioms of ZFC, ends in disaster and paradox. In particular, a contradiction can be derived from anti-comprehension, from anti-replacement, and from anti-extensionality. We don’t handle these cases the same way. Anti-comprehension and anti-replacement are simply discarded, being too difficult to patch. By contrast, anti-extensionality is replaced by ordinary extensionality. What’s up with that? The philosophical justification is simply that extensionality, being the most a priori of the bunch, is needed to justify us calling the objects in our universe sets at all.

There’s one last consideration we must address, which regards the axiom of anti-choice. I am currently uncertain as to whether adding this axiom makes the theory inconsistent. One thing that is currently known about the axiom of anti-choice is that with its addition, out go all finite models (there’s a really pretty proof of this that I won’t include here). In the rest of this post, I will be excluding anti-choice from the axioms and only exploring models of anti-ZF.

With that background behind us, let me list the axioms of anti-ZF.

Anti-Pairing
∀x∀y∀z∃w (w ∈ z ∧ w ≠ x ∧ w ≠ y)
No set is the pair of two others.

Anti-Union
∀x∀y∃z (z ∈ y ∧ ¬∃w (z ∈ w ∧ w ∈ x))
No set is the union of another.

Anti-Powerset
∀x∀y∃z (z ∈ y ∧ ∃w (w ∈ z ∧ w ∉ x))
No set contains only subsets of another.

Anti-Foundation
∀x∀y (y ∈ x → ∃z (z ∈ x ∧ z ∈ y))
Every set’s members have at least one element in common with it.

Anti-Infinity
∀x∃y ((y is empty ∧ y ∉ x) ∨ (y ∈ x ∧ ∃z (z = S(y) ∧ z ∉ x)))
Every set either fails to contain some empty set, or has an element whose successor is outside the set.

If the axioms of ZF are considered to be a maximally nice setting for mathematics, then perhaps the axioms of anti-ZF can be considered to be maximally bad for mathematics.

We need to now address some issues involving the axiom of anti-infinity. First, the abbreviations used: “y is empty” is shorthand for “∀z (z ∉ y)” and “z = S(y)” is shorthand for “∀w (w ∈ z ↔ (w ∈ y ∨ w = y))” (i.e. z = y ⋃ {y}).

Second, it turns out that from the other axioms we can prove that there are no empty sets. So the first part of the disjunction is always false, meaning that the second part must always be true. Thus we can simplify the axiom of anti-infinity to the following statement, which is logically equivalent in the context of the other axioms:

Anti-Infinity
∀x∃y (y ∈ x ∧ ∃z (z = S(y) ∧ z ∉ x))
Every set has an element whose successor is outside the set.

Let’s now prove some elementary consequences of these axioms.

Theorems

>> There are no empty sets.
Anti-Union ⊢ ¬∃x∀y (y ∉ x)
Suppose ∃x∀y (y ∉ x). Call this set ∅, so ∀y (y ∉ ∅). Then ⋃∅ = ∅, which means that ∅ is the union of a set. This contradicts anti-union.

>> There are no one-element sets.
Anti-Pairing ⊢ ¬∃x∃y∀z (z ∈ x ↔ z = y)
Suppose that ∃x∃y∀z (z ∈ x ↔ z = y). Then ∃x∃y∀z (z ∈ x ↔ (z = y ∨ z = y)). But this is a violation of anti-pairing, as then x would be the pair of y and y. Contradiction.

>> There are no two-element sets.
Anti-Pairing ⊢ ¬∃x∃y∃z∀w (w ∈ x ↔ (w = y ∨ w = z))
Suppose that ∃x∃y∃z∀w (w ∈ x ↔ (w = y ∨ w = z)). Then x is the pair of y and z, so anti-pairing is violated. Contradiction.

>> No models of anti-ZF have just N-element sets (for any finite N).
Suppose that a model of anti-ZF had only N-element sets. Take any of these sets and call it X. By anti-infinity, X must contain an element with a successor that is outside the set. Call this element Y and its successor S(Y). Y cannot be its own successor, as then its successor would be inside the set. This means that Y ∉ Y. Also, Y is an N-element set by assumption. But since Y ≠ S(Y), S(Y) must contain all the elements of Y in addition to Y itself. So S(Y) contains N+1 elements. Contradiction.

>> There is no set of all sets.
Suppose that there is such a set, and call it X. By Anti-Infinity, X must contain an element Y with a successor that’s outside of X. But no set is outside of X, by assumption! Contradiction.

>> No N-element model of anti-ZF can have N-1 sets that each contain N-1 elements.
Suppose that this were true. Consider any of these sets that contain N-1 elements, and call it X. By the same argument as two theorems above, this set must contain an element whose successor contains N elements. But there are only N sets in the model, so this is a set of all sets! But we already know that no such set can exist. Contradiction.

>> Every finite set of disjoint sets must be its own choice function.
By a choice function for a set X, I mean a set that contains exactly one element in common with each element of X. Suppose that X is a finite set of disjoint sets. Let’s give the elements of X names: A1, A2, A3, …, AN. By anti-foundation, each An must contain at least one element of X. Since the Ans are disjoint, these elements cannot be the same. Also, since there are only N elements of X, no An can contain more than one element of X. So each element of X contains exactly one element of X. Thus X is a choice function for X!

The next results come from a program I wrote that finds models of anti-ZF of a given size.
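I don’t have the author’s program, but here’s a minimal brute-force sketch of the idea: treat a candidate model of size n as an arbitrary membership relation on {0, …, n−1}, and check each anti-axiom (using the simplified anti-infinity from above, and leaving out anti-choice) directly. It counts raw models rather than isomorphism classes, and the 2^(n²) search loop makes it practical only for very small n:

```python
from itertools import product

def is_anti_zf_model(n, mem):
    """mem(w, x) == True means 'w ∈ x' on the universe U = {0, ..., n-1}."""
    U = range(n)
    ext = lambda x: frozenset(w for w in U if mem(w, x))   # elements of x
    succ = lambda y: ext(y) | {y}                          # S(y) = y ∪ {y}, extensionally

    anti_pairing = all(any(mem(w, z) and w != x and w != y for w in U)
                       for x, y, z in product(U, repeat=3))
    anti_union = all(any(mem(z, y) and not any(mem(z, w) and mem(w, x) for w in U)
                         for z in U)
                     for x, y in product(U, repeat=2))
    anti_powerset = all(any(mem(z, y) and any(mem(w, z) and not mem(w, x) for w in U)
                            for z in U)
                        for x, y in product(U, repeat=2))
    anti_foundation = all(not mem(y, x) or any(mem(z, x) and mem(z, y) for z in U)
                          for x, y in product(U, repeat=2))
    anti_infinity = all(any(mem(y, x) and
                            any(ext(z) == succ(y) and not mem(z, x) for z in U)
                            for y in U)
                        for x in U)
    return (anti_pairing and anti_union and anti_powerset
            and anti_foundation and anti_infinity)

def count_models(n):
    """Brute force over all membership relations on n elements."""
    count = 0
    for bits in product([False, True], repeat=n * n):
        if is_anti_zf_model(n, lambda w, x: bits[w * n + x]):
            count += 1
    return count

for n in range(1, 5):
    print(n, "->", count_models(n))    # expect 0 models for each of these sizes
```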

>> There are no models of size one, two, three or four.

>> There are exactly two non-isomorphic models of size five.

Here are pictures of the two:

Now, some conjectures! I’m pretty sure of each of these, especially Conjecture 2, but haven’t been able to prove them.

Conjecture 1: There’s always a set that contains itself.

Conjecture 2: There can be no sets of disjoint sets.

Conjecture 3: In an N-element model, there are never fewer than three sets with fewer than N-1 elements.

How to draw lambda diagrams

If you don’t want spoilers for my puzzle a few days ago, don’t read ahead!

I think lambda diagrams are extremely cool, and I haven’t seen any detailed description online of how they work. I’ll start by showing some very simple examples of lambda diagrams, and then build up to more complicated ones.

First of all, what are lambda diagrams? They are pictorial representations of lambda expressions, and hence count as a pictorial system for a large portion of mathematics. I will assume that you understand how lambda calculus works for this post, and if you aren’t familiar then you can do no better than this video for a basic introduction.

The Basics

Let’s start simple: the lambda expression for “True”: λx.λy.x

Each time we see a new bound variable, we draw a horizontal line to represent that variable. So parsing from left to right, we start with a horizontal line for x.

Next is “λy.“, so we insert another horizontal line for y (below the first horizontal line).

Next we have an x in the function body, which we represent by a vertical line leaving the x line:

And there we have it, the lambda diagram for True!

Now let’s look at False: λx.λy.y

We start the same way as before, with two horizontal lines for our variables x and y:

But now we have a y in our function body, not an x. So we draw our vertical line leaving the y line instead of the x line:

And that’s the diagram for False! The actual diagram needn’t have the labels “x” and “y” in it, because our convention that we draw lines for new variables below existing horizontal lines uniquely specifies which lines stand for which variables.

Let’s now do a slightly more complicated function: λx.λy.y x

As before, we start with our two variables, x and y.

Now we have a y in the function body, so we add a vertical line leaving the y bar:

Next is an x, which is being fed as input to the y. We draw this as follows:

Two things have happened here: First I’ve moved the original vertical line for y over to the left to make space. Next, I’ve represented “feeding x to y as input” as a line leaving the x bar and terminating at the vertical line for y.

Take a guess what λx.λy.x y would look like!

Here it is:

Next, let’s look at λx.λy.x x y (which is incidentally the “or” function).

We start with the introduction of the variables x and y.

Next, an x:

And another x:

And finally a y:

Notice that this y connects below the first connection, to the original branch. What would it mean if it were connected to the second branch?

As you can see, this would indicate that y is first fed to x, and then the result of THAT is fed to x.

With these principles, you should now be able to draw out diagrams for much more complicated lambda expressions. Try this one: λx.λy.λz.z (y x) (y z x) x

Here it is!

Make sure that this makes sense before moving on!

Lambda Expressions within Lambda Expressions

Next, we’ll look at how to deal with lambda expressions that contain new lambda expressions within their function body. Here’s a simple example: λx.λy.x (λz.z) y

Everything up until the third λ is the same as always:

Now, to deal with our new variable z, we draw a new horizontal line. But importantly, this horizontal line comes after the vertical line for the x that has already been used!

After the “λz.” we have a “z”, so we draw a line from the z bar to our original vertical line leaving x.

And finally, after all of this we feed y to the original vertical line:

Try this one: λx.λy.x (λz.z z) (λw.λv.v (w w)) y

And here’s another one, where the inside function uses outside variables: λx.λy.x (λz.y z)

Alright, the final category you need to know to understand how to draw any lambda diagram whatsoever is…

Function Application

Suppose we have the lambda expression (λx.λy.x) (λz.z z). We first draw the lambda diagrams for each of the two component expressions side by side:

And now we simply attach the second to the first, indicating that the entire second lambda expression is fed as input to the first!

Here’s another example for you to try: (λx.x) (λy.λz.z z y) (λw.λv.v (v w))

And one more example, this one using all the tricks we’ve seen so far: (λx.x (λy.λz.z y y) (λw.λv.w (v x))) (λu.u u)

Beta Reduction

The final thing to know about lambda diagrams is how to actually do computations with them. The basic idea is that when we have one lambda diagram fed as input to another, we can substitute the “input” diagram for each vertical line leaving the topmost horizontal bar, and then erase this bar. Let’s look at some examples:
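For comparison, here’s the same substitution operation carried out on the underlying lambda terms rather than on the diagrams, as a rough Python sketch (the tuple encoding is my own, and to keep things short I ignore variable capture by assuming all bound variables have distinct names):

```python
# Terms: ('var', name) | ('lam', name, body) | ('app', function, argument)

def substitute(term, name, value):
    kind = term[0]
    if kind == "var":
        return value if term[1] == name else term
    if kind == "lam":   # stop if the binder shadows the name being replaced
        return term if term[1] == name else ("lam", term[1], substitute(term[2], name, value))
    return ("app", substitute(term[1], name, value), substitute(term[2], name, value))

def reduce_once(term):
    """Perform one normal-order beta step; return None if already in normal form."""
    if term[0] == "app" and term[1][0] == "lam":
        return substitute(term[1][2], term[1][1], term[2])
    if term[0] == "app":
        for i in (1, 2):
            r = reduce_once(term[i])
            if r is not None:
                return ("app", r, term[2]) if i == 1 else ("app", term[1], r)
        return None
    if term[0] == "lam":
        r = reduce_once(term[2])
        return None if r is None else ("lam", term[1], r)
    return None

# (λx.λy.x) (λz.z z) reduces in one step to λy.(λz.z z):
TRUE = ("lam", "x", ("lam", "y", ("var", "x")))
ARG = ("lam", "z", ("app", ("var", "z"), ("var", "z")))
print(reduce_once(("app", TRUE, ARG)))
```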

You can also do function application within larger lambda diagrams. Take a look at the following example, which is a calculation that shows that the successor of zero is one:

The first beta reduction here is just like the previous ones we’ve seen. But the second one does a substitution within the main lambda expression, as does the third. This works in much the same way as the earlier reductions we saw, the primary difference being that references to variables outside the scope of the function being applied must be maintained. You can see this in the final step above, where we remove the line representing the variable y, and attach the line connecting to it to the line representing the variable x.

Now for the fun part! I’ve found some great animations of calculations using lambda diagrams on YouTube. Each of them is using just the rules I’ve described above, nothing more! And I must say that the music is delightful. Take a look!

Beautiful!

The Pictorial Mathematics of Klak-Adbmal

In the distant land of Klak-Adbmal, cut off from the rest of civilization, mathematicians have developed their own distinctive form of math. It is entirely pictorial; every mathematical concept is translated as a diagram consisting of criss-crossing horizontal and vertical lines, and calculations are manipulations of these diagrams.

Translators have so far been able to figure out the symbols for a few basic mathematical concepts, but there are many remaining gaps in our ability to translate Klak-Adbmalist mathematics. This image shows the symbols we’ve figured out so far, as well as a step-by-step calculation of the successor of 1 equaling 2.

The Klak-Adbmalians are very secretive and slow to give up their secrets. Can you figure out the answers to the two questions above on the right?

It’s worth noting that just as there are multiple ways that ordinary mathematicians can write a given number (2 ⋅ 5 = 8 + 2 = 10),  there are many different pictures for any given concept. If you need more assistance, you can take a look at this recently discovered drawing of a Klak-Adbmalian calculation proving that True Or False is True.

And one final hint: Recently we’ve received verification that the following two calculations are always valid, no matter what Klak-Adbmalist pictures replace the red and green boxes:

Logic on Planet Zorko

A group of Zorkan mathematicians are sitting around having a conversation in a language that you are unfamiliar with. You are listening in with a translator. This translator is an expert in formal logic, and has decided to play the following game with you. He says:

“After listening to the full conversation, I will translate all the sentences that were said for you. But I won’t translate them into English; I want something more universal. Instead, I will choose a formal language that captures the mathematical content of all the sentences said, while leaving out the vagaries and subtleties of the Zorkan language. I will describe to you the semantics of the formal language I choose, if you don’t already know it.”

“Furthermore,” (says the translator) “I happen to be intimately familiar with Zorkan society and culture. The Zorkans are having a discussion about one particular mathematical structure, and I know which one that is. The mathematicians are all fantastically precise reasoners, such that none of them ever says a sentence that is false of the structure that they are discussing.”

(So for instance if they are talking about the natural numbers, then no mathematician will say “0 = 1”, and if they are talking about abelian groups, then no mathematician will say “∃x∃y (xy ≠ yx)”. But they could say “∃x∃y (xy ≠ yx)” if they are talking about non-abelian groups.)

You know nothing about Zorkan psychology, besides that the Zorkan way of life is so utterly foreign to you that you cannot reliably assume that the mathematical structures that come most naturally to you will also come naturally to them. It might be, for instance, that nonstandard models of arithmetic are much more intuitive to them than the natural numbers. You cannot assume that the structure they are discussing is the one that you think is “most natural”; you can only conclude this if one of them says a sentence that is true of that model and no others.

The conversation finishes, and you are tasked with answering the following two questions:

(1) What structure are they talking about?
(2) Can you come up with a verification procedure for the mathematicians’ sentences (including possible future sentences they might say on the topic)?

So, that’s the setup. Now, the question I want you to consider is the following: Suppose that the structure that the mathematicians have in mind is actually the natural numbers. Is there some conversation, any conversation at all (even allowing infinitely long conversations, and uncomputable conversations – conversations which cannot be produced as the output of any Turing machine), that the mathematicians could have, and some translation of this conversation, such that you can successfully answer both (1) and (2)? If so, what is that conversation? And if not, then why not?

✯✯✯

Let’s work out some simple examples.

Example 1

Suppose the conversation is translated into a propositional language with three atomic propositions {P, Q, R}.

Mathematician A: “P ∨ Q”
Mathematician B: “(Q ∨ R) → (¬P)”
Mathematician C: “R”

From this conversation, you can deduce that the model they are talking about is the one that assigns “False” to P, “True” to Q, and “True” to R.

M: {P is false, Q is true, R is true}

This is the answer to question 1!
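You can confirm this by brute force over all eight truth assignments (a quick sketch; changing the last conjunct from R to not R reproduces the two-model ambiguity of the next example):

```python
from itertools import product

def implies(a, b):
    return (not a) or b

# Check every truth assignment against the three statements:
for P, Q, R in product([False, True], repeat=3):
    if (P or Q) and implies(Q or R, not P) and R:
        print(P, Q, R)    # only (False, True, True) survives
```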

As for the second question, we want to know if there’s some general procedure that produces all the future statements the mathematicians could make. For instance, the set generated by our procedure should include (Q ∧ R) but not (Q ∧ P).

It turns out that such a procedure does exist, and is not too difficult to write out and implement.
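Here’s a minimal sketch of the heart of such a procedure (the tuple encoding of sentences is my own): decide any candidate sentence by evaluating it in M, then enumerate sentences in order of size and keep the true ones.

```python
M = {"P": False, "Q": True, "R": True}

def holds(s, M):
    """Evaluate a sentence encoded as ('atom', name), ('not', s),
    ('and', s, t), ('or', s, t), or ('implies', s, t)."""
    op = s[0]
    if op == "atom":    return M[s[1]]
    if op == "not":     return not holds(s[1], M)
    if op == "and":     return holds(s[1], M) and holds(s[2], M)
    if op == "or":      return holds(s[1], M) or holds(s[2], M)
    if op == "implies": return (not holds(s[1], M)) or holds(s[2], M)

assert holds(("and", ("atom", "Q"), ("atom", "R")), M)        # Q ∧ R: in the set
assert not holds(("and", ("atom", "Q"), ("atom", "P")), M)    # Q ∧ P: excluded
```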

Example 2

Take the above conversation and modify it slightly:

Mathematician A: “P ∨ Q”
Mathematician B: “(Q ∨ R) → (¬P)”
Mathematician C: “¬R”

If you work it out, you’ll see that question 1 can no longer be answered unambiguously. The problem is that there are multiple models of the sentences that the mathematicians are saying:

M1: {P is false, Q is true, R is false}
M2: {P is true, Q is false, R is false}

So even though they have one particular structure in mind, you don’t have enough information from their conversation to figure out exactly what that structure is.

Now let’s think about the answer to question 2. We don’t know whether the mathematicians are thinking about M1 or M2, and M1 and M2 differ in what truth value they assign the proposition P. So we can’t construct an algorithm that will generate the set of all their possible future statements, as this would require us to know, in particular, whether P is true or false in the model that they have in mind.

We might suspect that this holds true generally: if you can’t answer question 1, then you won’t be able to answer question 2 either. But we might also wonder: if we can answer question 1, then can we also always answer question 2?

The answer is no, as the next example will show.

Example 3

For this conversation, the translation is in second-order logic. This will allow us to talk about more interesting mathematical structures than before; namely, structures that have a domain of objects on which functions and predicates can act. In particular, we’re in a second-order language with one constant symbol “c” and one function symbol “f”. Here’s the conversation:

Mathematician A: ¬∃x (f(x) = c)
Mathematician B: ¬∃x∃y ((f(x) = f(y)) ∧ ¬(x = y))
Mathematician C: ∀R ((R(c) ∧ ∀x (R(x) → R(f(x)))) → ∀x R(x))

Notice that the only truly second-order sentence is the third one, in which we quantify over a predicate variable R rather than an individual variable x, y, z, …. But the second-order status of this sentence means that the translator could not possibly have translated this conversation into a first-order language, much less a propositional one.

This time, questions 1 and 2 are much harder to answer than before. But if you work it out, you’ll see that there is exactly one mathematical structure (up to isomorphism) that satisfies all three of the mathematicians’ statements. And that structure is the natural numbers!

So, we know exactly what structure the mathematicians have in mind. But can we also answer question 2 in the positive? Can we produce some verification procedure that will allow us to generate all the future possible sentences the mathematicians could say? Unfortunately, the answer is no. There is no sound and complete proof system for second-order logic, so in particular, we have no general algorithm for producing all the truths in this second order language. So sad.

Example 4

Now let’s move to first-order logic for our final example. The language of translation will be a first-order language with a constant symbol for every natural number {0, 1, 2, 3, …}, function symbols for ordinary arithmetic {+, ×}, and a relation symbol for order {≥}.

Imagine that the conversation consists of literally all the first-order sentences in the language that are true of the natural numbers. Anything which you can say in the language, and which is true as a statement about ℕ, will be said at some point. This will obviously be a very long conversation, and in fact infinitely long, but that’s fine. It will include sentences like “0 ≠ 1”, “0 ≠ 2”, “0 ≠ 3”, and so on.  (These Zorkans are extremely thorough.)

Given this conversation, can we answer (1) and (2)? Take a guess; the answer may surprise you!

It turns out that even though we can answer (2) positively – we can actually produce an algorithm that will generate one-by-one all the possible future statements of the mathematicians (which really means all the sentences in the language that are true of the natural numbers) – we cannot answer (1) positively! There are multiple distinct mathematical structures that are compatible with the entirety of true statements about natural numbers in the language. Earlier we hypothesized that any time we have a negative answer to (1), we will also have a negative answer to (2). But this is not true! We can verify all the true statements about natural numbers in the language… without even knowing that we’re actually talking about the natural numbers! This is an important and unintuitive consequence of the expressive limitations (and in particular, of the compactness) of first-order logic.

The Takeaway

We had an example where we could answer both (1) and (2) for a simple mathematical structure (a model of propositional logic). And we saw examples for natural numbers where we could answer (1) but not (2), as well as examples where we could answer (2) but not (1). But we haven’t yet seen an example for natural numbers where we had both (1) and (2). This is no coincidence!

It is actually a consequence of the theorem I proved and discussed in my last post that no such conversation can exist. When structures at least as complicated as the natural numbers are being discussed in some language (call it L), you cannot simultaneously (1) know for sure what structure is being talked about and (2) have an algorithmic verification system for L-sentences about the structure.

A Gödelian Logic Puzzle

There’s an island on which there live exactly two types of people: truthers and liars. Truthers always say true statements, and liars always say false statements. One day a brilliant logician comes to visit the island. The logician knows all of the above-stated facts about the island. It also happens that the logician is a perfectly sound reasoner – he never proves anything that is false.

The logician encounters an individual named ‘Jal’ that lives on the island. The logician knows that Jal lives on the island, and so is either a truther or a liar. Now, Jal makes a statement from which it logically follows that Jal is a truther. But the logician could never possibly prove that Jal is a truther! (Remember, we never asserted that the logician proves all true things, just that the logician proves only true things). What type of statement could accomplish this?

This puzzle is from a paper by Raymond Smullyan on mathematical logic. Try to answer it for yourself before reading on!

(…)

Alright, so here’s one possible answer. Jal could say to the logician: “You will never prove that I am a truther.” I claim that this sentence logically entails that Jal is a truther, and yet the logician cannot possibly prove it.

First of all, why does it entail that Jal is a truther? Let’s prove it by contradiction. Suppose that Jal is not a truther. Then, since Jal is either a truther or a liar, Jal must be a liar. That means that every statement Jal makes must be false. So in particular, Jal’s statement that “you will never prove that I am a truther” must be false. This entails that the logician must eventually prove that Jal is a truther. But we assumed that Jal isn’t a truther! So the logician must eventually prove a falsehood. But remember, we assumed that our logician’s proofs were always sound, so that he will never prove a falsehood. So we have a contradiction.

Therefore, Jal is a truther.

Now, why can the logician not prove that Jal is a truther? This can be seen very straightforwardly: we just proved that Jal is a truther, which means that all of Jal’s statements must be true. So in particular, Jal’s statement that “you will never prove that I am a truther” must be true. So in other words, it’s true that the logician will never prove that Jal is a truther!

So there you have it, a statement that appears to satisfy both of the criteria!

But now the next question I have for you is a bit trickier. It appears from the line of reasoning above that we have just proven that Jal is a truther. So why couldn’t the logician just run through that exact same line of reasoning? It appears to be perfectly valid, and to use nothing more advanced than basic predicate logic.

But if the logician does go through that line of reasoning, then he will conclude that Jal is a truther, which will make Jal’s statement false, which is a contradiction! So we’ve gone from something which was maybe just unintuitive to an actual paradox. Can you see how to resolve this paradox? (Again, see if you can figure it out yourself before reading on!)

(…)

Okay, so here’s the resolution. If we say that the logician can go through the same line of reasoning as us, then we reach a contradiction (that a truther tells a false statement). So we must deny that the logician can go through the same line of reasoning as us. But why not? As I said above, the reasoning is nothing more complicated than basic predicate logic. So it’s not that we’re using some magical supernatural rules of inference that no mortal logician could get his hands on. It must be that one of the assumptions we used in the argument is an assumption that the logician cannot use.

So look back through the argument, and carefully consider each of the assumptions we used:

First of all, why does it entail that Jal is a truther? Let’s prove it by contradiction. Suppose that Jal is not a truther. Then, since Jal is either a truther or a liar, Jal must be a liar. That means that every statement Jal makes must be false. So in particular, Jal’s statement that “you will never prove that I am a truther” must be false. This entails that the logician must eventually prove that Jal is a truther. But we assumed that Jal isn’t a truther! So the logician must eventually prove a falsehood. But remember, we assumed that our logician’s proofs were always sound, so that he will never prove a falsehood. So we have a contradiction.

In order, we made use of the assumptions that (1) Jal is either a truther or a liar, (2) every statement made by a liar is false, and (3) the logician is a sound reasoner.

I told you at the beginning that facts (1) and (2) are known to the logician, but I did not say the same of (3)! The logician can only run through this argument if he knows that he is a sound reasoner (that he only proves true things). And this is the problem assumption, which must be rejected.

It’s not that no logician can actually ever be sound (a logician who only ever reasons in first order logic and nothing more fancy would be sound). It’s that the logician, though he really is sound, cannot know himself to be sound. In other words, no sound system can prove its own soundness!

This is very similar to Gödel’s second incompleteness theorem. The only proof system which can assert its own consistency is an inconsistent proof system, and the only type of logician that can prove his own soundness will end up being unsound. Here’s the argument that the logician might make if they believe in their own soundness:

Supposing Jal is a liar, then his statement is false, so I will eventually prove that he is a truther. But then I’d have proven something false, which I know I can never do, so Jal must not be a liar. So he must be a truther.

Since the logician has now produced a proof that Jal is a truther, Jal’s statement is false. This means that Jal cannot be a truther, so the logician has proven a false statement!

Crazy conditionals

It’s well known that the material implication → of propositional logic does not do a perfect job of capturing what we mean when we make “if… then…” statements in English. The usual examples of failure rest on the fact that any material conditional with a false antecedent is vacuously true (so “if 2 is odd then 2 is even” turns out to be true). But over time, philosophers have come up with a whole lot of different ways in which → can catch us by surprise.

Here’s a list of some such cases. In each case, I will present an argument using if…then… statements that is clearly invalid, but which is actually valid in propositional logic if the if…then… statements are translated as the material conditional!

1. Harper

If I put sugar into my coffee, it will taste fine.
Therefore, if I put sugar and motor oil into my coffee, it will taste fine.

S → F
(S ∧ M) → F

2. Distributivity

If I pull both switch A and switch B, the engine will start.
Therefore, either the engine will start if I pull switch A or the engine will start if I pull switch B.

(A ∧ B) → S
(A → S) ∨ (B → S)

3. Transitivity

If Biden dies before the election, Trump will win.
If Trump wins the election, Biden will retire to his home.
Therefore, if Biden dies before the election, Biden will retire to his home.

B → T
T → R
B → R

4. Principle of Explosion

Either zombies will rise from the ground if I bury a chicken head in my backyard, or zombies will rise from the ground if I don’t bury a chicken head in my backyard.

(B → D) ∨ (¬B → D) is a tautology

5. Contraposition

If I buy a car, I won’t buy a Pontiac.
Therefore, if I buy a Pontiac, I won’t buy a car.

C → ¬P
P → ¬C

6. Simplification

If John is in London then he’s in England, and if he’s in Paris then he’s in France.
Therefore, either (1) if John’s in London he’s in France or (2) if John’s in Paris then he’s in England.

(L → E) ∧ (P → F)
(L → F) ∨ (P → E)

7. False Antecedent

It’s not the case that if God exists then human life is a product of random chance.
Therefore, God exists.

¬(G → C)
G

8. True Consequent

If I will have eternal life if I believe in God, then God must exist.
I do not believe in God.
Therefore, God exists.

(B → E) → G
¬B
G

You can check for yourself that each of these is logically valid! Can you figure out what’s going wrong in each case?
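Here’s one mechanical way to do that check (a sketch, using case 8 as the example): enumerate all truth assignments and confirm that the conclusion is true whenever every premise is.

```python
from itertools import product

def implies(a, b):
    return (not a) or b

# Case 8 ("True Consequent"): premises (B → E) → G and ¬B; conclusion G.
valid = all(G
            for B, E, G in product([False, True], repeat=3)
            if implies(implies(B, E), G) and not B)
print(valid)    # True: whenever the premises hold, so does the conclusion
```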

A Dice Puzzle

Today I have a wonderfully counterintuitive puzzle to share!

You and a friend each throw a die. Each of you can see how your own die landed, but not how your friend’s die landed. Each of you is to guess how the other’s die landed. If you both guess correctly, then you each get a reward. But if only one of you guesses correctly, neither of you gets anything.

The two die rolls are independent and you are not allowed to communicate with your friend after the dice have been thrown, though you can coordinate beforehand. Given this, you would expect that you each have a 1 in 6 chance of guessing the other’s roll correctly, coming out to a total chance of 1 in 36 of getting the reward.

The question is: Is it possible to do any better?

Answer below, but only read on after thinking about it for yourself!

 

(…)

 

(…)

 

(Spoiler space)

 

(…)

 

(…)

 

The answer is that remarkably, yes, you can do better! In fact, you can get your chance of getting the reward as high as 1 in 6. This should seem totally crazy. You and your friend each have zero information about how the other’s die landed. So certainly each of you has a 1 in 6 chance of guessing correctly. The only way for the chance of both guessing correctly to differ from 1 in 36 would be if the two guesses being correct were somehow dependent on each other. But the two die rolls are independent of one another, and no communication of any kind is allowed once the dice have been rolled! So from where does the dependence come? Sure, you can coordinate beforehand, but it’s hard to imagine how this could help out.

It turns out that the coordination beforehand does in fact make a huge difference. Here’s the strategy that both can adopt in order to get a 1 in 6 chance of getting the reward: each guesses that the other’s die landed the same way that their own die landed. So if my die lands 3, I guess that my friend’s die landed 3 as well. This strategy succeeds whenever the dice actually do land the same way. And what’s the chance of this? 6 out of 36, or 1 out of 6!

1 1       2 1       3 1       4 1       5 1       6 1
1 2       2 2       3 2       4 2       5 2       6 2
1 3       2 3       3 3       4 3       5 3       6 3
1 4       2 4       3 4       4 4       5 4       6 4
1 5       2 5       3 5       4 5       5 5       6 5
1 6       2 6       3 6       4 6       5 6       6 6
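The six matching pairs along the diagonal of the grid above are exactly the winning outcomes. A quick Monte Carlo check of the strategy (a sketch):

```python
import random

def trial():
    mine, theirs = random.randint(1, 6), random.randint(1, 6)
    # Agreed strategy: each player guesses that the other's die matches their own.
    my_guess, their_guess = mine, theirs
    return my_guess == theirs and their_guess == mine   # both right ⟺ dice match

trials = 1_000_000
wins = sum(trial() for _ in range(trials))
print(wins / trials)    # ≈ 0.1667, i.e. about 1 in 6
```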