The Hypergame Paradox

December 15, 2021April 16, 2026 ~ ~ 2 Comments

Credit to Joel David Hamkins, who I heard discussing this paradox on an episode of the podcast My Favorite Theorem.

Define a finite game to be any two-player turn-based game such that every possible playthrough ends after finitely many turns. For example, tic-tac-toe is a finite game because every game ends in at most nine turns.. So is chess with the 50-move-rule enforced (if 50 moves are taken without any pawn advances or captures, then the game ends in a draw). In both these examples, there’s an upper bound to how long the game can last, but this is not required. A game of transfinite Nim would count as finite; every game lasts for only finitely many turns, even though there is no upper bound on the number of turns it takes.

Now, consider the game Hypergame. To play Hypergame, Player 1 begins by choosing any finite game G. Then Player 2 plays the first move of G, Player 1 plays the second move of G, and so on until the game is completed. (Since Player 1 chose a finite game, this will always happen after some finite amount of time.)

Is Hypergame a finite game? Yes, we can easily see that it must be. Whatever game Player 1 chooses will be over after n steps, for some finite n. So that playthrough of Hypergame will have taken n+1 steps.

But if Hypergame is a finite game, then it is a valid choice for the first move of Hypergame! So we can now imagine the following playthrough of Hypergame:

A Troubling Playthrough of Hypergame
Player 1: For the finite game that we shall play, I pick Hypergame.
Player 2: Hm, okay. So now I’m playing the first move of Hypergame. So I must now choose any finite game. I’ll choose Hypergame!
Player 1: Alright so I’m again playing the first move of this new game of Hypergame. I’ll choose Hypergame again.
Player 2: And I choose Hypergame again as well.
So on forever…

At no point does either player violate the rules of Hypergame. And yet, we ended up with an infinite playthrough of Hypergame, which we proved was impossible! So we have a contradiction. What is the resolution?

✵✵✵

Here’s one possible resolution, analogous to the resolutions of similar set-theoretic paradoxes.

We can think about a game as a directed rooted tree. The vertices of the tree correspond to game states, and the edges correspond to the allowed moves. The root of the tree corresponds to the starting game state, and Player 1 gets to choose which edge to travel along first. From the new vertex, Player 2 decides the next edge to travel along. And so on. The tree’s leaves correspond to ending states of the game, and each leaf is labelled according to which player won in that ending state.

In this framework, what is a finite game? As I defined it above, a finite game is just any directed rooted tree such that every path starting at the root ends at a leaf after passing through finitely many edges. This corresponds perfectly to the idea that every possible playthrough of the game takes only finitely many turns. Notice that a finite game is not necessarily a finite tree! The game tree of a finite game is only finite if it’s also finitely branching. In other words, for a game to have a finite tree requires not just that every playthrough is finitely long, but that each player always has only finitely many choices on their turn.

For instance, the game tree of Hypergame is not a finite tree, because Player 1 has infinitely many possible finite games to choose from on his first turn. How big exactly is the game tree of Hypergame? We know that we have to have a vertex corresponding to the start of any finite game, so it must be at least as large as the set of all finite games. But how large is this set?

This is where we run into problems. The game tree of a finite game can be arbitrarily large. Consider the game which starts by Player 1 choosing any real number and then immediately losing. The height of the game tree is 1, but its width is the cardinality of the continuum. Similarly for any set X we can find a game whose tree has cardinality |X|. This means that there are finite games of arbitrary cardinalities. But then as a corollary to the nonexistence of a largest cardinality, we know that there is no set of all finite games! And this implies that Hypergame has no game tree! More precisely, there is no set corresponding to the game tree of Hypergame as we defined it.

Couldn’t we instead think about Hypergame as a proper class? Sure! But then when we choose a finite game in our first move, we couldn’t be picking Hypergame, as the property “is a finite game” would only apply to sets and not proper classes. This means that we can’t actually select Hypergame as our first move! And so we avoid the paradoxical conclusion that we can keep picking Hypergame ad infinitum.

I find it quite fascinating that the seemingly innocent notion of a finite game can lead us into paradoxes involving proper classes!

Formal Semantics 1: Historical Prelude and Compositionality

November 16, 2020 ~ ~ 1 Comment

English is really complicated. For a long time, logicians looking at natural languages believed that there could be no formal system detailing their grammar and semantics. They resigned themselves to extremely simple idealized fragments of English, like propositional logic (formalizing “and”, “not”, and “or”) and first-order logic (formalizing “every”, “some”, and “is”).

The slogan of the time was “ordinary language has no logic” (Bertrand Russell and Peter Strawson). Chomsky famously argued that the languages invented by logicians were too artificial and entirely unlike natural languages, and that therefore the methods of logicians simply couldn’t be applied to this more complex realm.

This attitude has changed over time. Perhaps the most important figure in the “logic of natural language” movement is Richard Montague, a student of the giant of logic Alfred Tarski. The first line of his paper English as a Formal Language reads “I reject the contention that an important theoretical difference exists between formal and natural languages”, and he follows this up by more or less single-handedly invented formal semantics, now a thriving field. Hilariously, Montague apparently saw this work as child’s play, writing:

I (…) sat down one day and proceeded to do something that I previously regarded, and continue to regard, as both rather easy and not very important — that is, to analyze ordinary language.

(This had to hit hard for linguists of his time.)

Alright, enough prologue. In the next few posts I want to describe a naive first pass at formalizing a fairly substantial fragment of English, modeled off of Montague semantics. The key concept throughout will be the notion of compositionality, which I’ll briefly describe now.

Compositionality

Compositionality is all about how to construct the meaning of phrases from their smaller components. Take a sentence like “The cat sat on the mat.” The meaning of this sentence clearly has something to do with the meanings of “the cat” and “sat on the mat”. Similarly, the meaning of “sat on the mat” must have something to do with the meanings of “sat”, “on”, “the”, and “mat”.

The compositionality thesis says that this is all that determines the meaning of “the cat sat on the mat.” In other words, the meaning of any phrase is a function of the meanings of the individual words within it. These meanings are composed together in some way to form the meaning of the sentence as a whole.

The natural question that arises now is, what is the nature of this composition? Take a very simple example: “Epstein died.” According to compositionality, the meaning of “Epstein died” depends only on the meanings of “Epstein” and “died”. That seems pretty reasonable. What about: “Epstein died suspiciously”? How do we compose the meanings of the individual words when there are three?

One proposal is to compose all three simultaneously. That’s possible, but a simpler framework would have us build up the meanings of our sentences iteratively, composing two units of meaning at a time until we’ve generated the entire sentence’s meaning.

Let me now introduce some notation that allows us to say this compactly. If X is some word, phrase, or sentence, we’ll denote the meaning of X as ⟦X⟧. Then the principle of binary compositionality is just that there’s some function F such that ⟦X Y⟧ = F(⟦X⟧, ⟦Y⟧).

There’s two major questions that arise at this point.

First, in which order should we compose our units of meaning? Should we combine “Epstein” with “died” first, and then combine that with “suspiciously”? Or should it be “Epstein” and “suspiciously” first, then that with “died”? Or should we combine “Epstein” with the combination of “suspiciously” and “died”?

One might suggest here that the order actually doesn’t matter; no matter what order we combine the meanings in, we should still get the same meaning. The problem with this is that “The Clintons killed Epstein” has a different meaning than “Epstein killed the Clintons.” If order of composition didn’t matter, then we’d expect these to mean the same thing.

Second, how exactly does composing two meanings work? Is there a single rule for composition, or are there multiple different rules that apply in different contexts? It would be most elegant if we could find a single universal rule for generating meanings of complicated phrases from simple ones, but maybe that’s overambitious.

For instance, you might model the meaning of “died” as a set of objects, namely all those objects that died at some moment in the past, and the meaning of “Epstein” as one particular object in the universe. Then we might have our composition rule be the following: ⟦Epstein died⟧ will be a truth value, and it will be True if and only if the object denoted by “Epstein” is within the set of objects denoted by “died”. So in this framework, ⟦X Y⟧ = True if and only if ⟦X⟧ ∈ ⟦Y⟧.

This works nicely for “Epstein died”. But what about “Epstein died suspiciously”? Now we have two compositions to do, and the order of composition will matter. The problem is that no matter how we compose things, it seems not to work. Suppose that we combine “died” and “suspiciously” first, then combine “Epstein” with that. Using our model, ⟦died suspiciously⟧ will be True if and only if ⟦died⟧ ∈ ⟦suspiciously⟧, which is already a little bit weird. But even worse, ⟦Epstein died suspiciously⟧ will be True if and only if ⟦Epstein⟧ ∈ ⟦died suspiciously⟧. But what would it mean for the object denoted by “Epstein” to be an element of a truth value? It looks like in this framework, most three-word sentences end up becoming vacuously false.

Anyway, the last two paragraphs only show us that one particular attempt to formalize composition fails to be universal. It doesn’t show that it’s impossible in general. In fact, we’ll end up doing pretty well with a small set of composition rules centered around function application. The idea can be very simply phrased as: ⟦X Y⟧ = ⟦X⟧(⟦Y⟧). And in particular, the meaning of “Epstein died suspiciously” will be ⟦suspiciously⟧(⟦died⟧)(⟦Epstein⟧). And that’s enough warm-up! Next we’ll explore this idea further and dive into our Montague-style system.

Polish Notation and Garden-Path Sentences

September 4, 2020 ~ ~ 3 Comments

Polish notation is a mathematical notation system that allows you to eliminate parentheses without ambiguity. It’s called “Polish” because the name of its Polish creator, Jan Łukasiewicz, was too difficult for people to pronounce.

A motivating example: Suppose somebody says “p and q implies r”. There are two possible interpretations of this: “(p and q) implies r” and “p and (q implies r)”. The usual way to disambiguate these two is to simply add in parentheses like I just did. Another way is to set an order-of-operations convention, like that “and” always applies before “implies”. This is what’s used in basic algebra, and what allows you to write 2 + 2 ⋅ 4 without any fear that you’ll be interpreted as meaning (2 + 2) ⋅ 4.

Łukasiewicz’s method was to make all binary connectives into prefixes. So “A and B” because “and A B”, “P implies Q” becomes “implies P Q”, and so on. In this system, “(p and q) implies r” translates to “implies and p q r”, and “p and (q implies r)” translates to “and p implies q r”. Since the two expressions are different, there’s no need for parentheses! And in general, no ambiguity ever arises from lack of parentheses when using Polish notation.

If this is your first time encountering Polish notation, your first reaction might be to groan and develop a slight headache. But there’s something delightfully puzzling about reading an expression written in Polish notation and trying to understand what it means. Try figuring out what this means: “implies and not p or q s r”. Algebra can be written in Polish notation just as easily, removing the need for both parentheses AND order-of-operations. “2 + 2 = 4” becomes “+ 2 2 = 4”, or even better, “= + 2 2 4”.

Other binary connectives can be treated in Polish notation as well, creating gems like: “If and you’re happy you know it clap your hands!” “When life is what happens you’re busy making plans.” “And keep calm carry on.” “Therefore I think, I am.” (This last one is by of the author the Meditations). Hopefully you agree with me that these sentences have a nice ring to them, though the meaning is somewhat obscured.

But putting connectives in front of the two things being connected is not unheard of. Some examples in English: “ever since”, “because”, “nonwithstanding”, “whenever”, “when”, “until”, “unless”. Each of these connects two sentences, and yet can appear in front of both. When we hear a sentence like “Whenever he cheated on a test the professor caught him”, we don’t have any trouble parsing it. (And presumably you had no trouble parsing that entire last sentence either!) One could imagine growing up in a society where “and” and “or” are treated the same way as “ever since” and “until”, and perhaps in this society Polish notation would seem much more natural!

Slightly related to sentential connectives are verbs, which connect subjects and objects. English places its verbs squarely between the subject and the object, as does Chinese, French, and Spanish. But in fact the most common ordering is subject-object-verb! 45% of languages, including Hindi, Japanese, Korean, Latin, and Ancient Greek, use this pattern. So for instance, instead of “She burned her hand”, one would say “she her hand burned”. This is potentially weirder to English-speakers than Polish notation; it’s reverse Polish notation!

9% of languages use Polish notation for verbs (the verb-subject-object pattern). These include Biblical Hebrew, Arabic, Irish, and Filipino. In such languages, it would be grammatical to say “Loves she him” but not “She loves him”. (3% of languages are VOS – loves him she – 1% are OVS – him loves she – and just a handful are OSV – him she loves).

Let’s return to English. Binary prepositions like “until” appear out front, but they also swap the order of the two things that they connect. For instance, “Until you do your homework, you cannot go outside” is the same as “You cannot go outside until you do your homework”, not “You do your homework until you cannot go outside”, which sounds a bit more sinister.

I came up with some examples of sentences with several layers of these binary prepositions to see if the same type of confusion as we get when examining Polish notation for “and” or “implies” sets in here, and oh boy does it.

Single connective
Since when the Americans dropped the bomb the war ended, some claimed it was justified.

Two connectives, unlayered
Since when the Americans dropped the bomb the war ended, when some claimed it was an atrocity others argued it was justified.

Still pretty readable, no? Now let’s layer the connectives.

One layer
Whenever he was late she would weep.
She would weep whenever he was late.

Two layers
Since whenever he was late she would weep, he hurried over.
He hurried over, since she would weep whenever he was late.

Three layers
Because since whenever he was late she would weep he hurried over, he left his wallet at home.
He left his wallet at home, because he hurried over since she would weep whenever he was late.

Four layers
Because because since whenever he was late she would weep he hurried over he left his wallet at home, when he was pulled over the officer didn’t give him a ticket.
The officer didn’t give him a ticket when he was pulled over, because he left his wallet at home because he hurried over since she would weep whenever he was late.

Five layers
When he heard because because since whenever he was late she would weep he hurried over he left his wallet at home, when he was pulled over the officer didn’t give the man a ticket, the mayor was outraged at the lawlessness.
The mayor was outraged at the lawlessness when he heard the officer didn’t give the man a ticket when he was pulled over because he left his wallet at home because he hurried over since she would weep whenever he was late.

Read that last one out loud to a friend and see if they believes you that it makes grammatical sense! With each new layer, things become more and more… Polish. That is, indecipherable. (Incidentally, Polish is SVO just like English). Part of the problem is that when we have multiple layers like this, phrases that are semantically connected can become more and more distant in the sentence. It reminds me of my favorite garden-path sentence pattern:

The mouse the cat the dog chased ate was digested.
(The mouse that (the cat that the dog chased) ate) was digested.
The mouse (that the cat (that the dog chased) ate) was digested.

The phrases that are meant to be connected, like “the mouse” and “was digested” are sandwiched on either side of the sentence, and can be made arbitrarily distant by the addition of more “that the X verbed” clauses.

Does anybody know of any languages where “and” comes before the two conjuncts? What about “or”? English does this with “if”, so it might not be too much of a stretch.

A Self-Interpreting Book

August 23, 2020August 23, 2020 ~ ~ 3 Comments

A concept: a book that starts by assuming the understanding of the reader and using concepts freely, and as you go on it introduces a simple formal procedure for defining words. As you proceed, more and more words are defined in terms of the basic formal procedure, so that halfway through, half of the words being used are formally defined, and by the end the entire thing is formally defined. Once you’re read through the whole book, you can start it over and read from the beginning with no problem.

I just finished a set theory textbook that felt kind of like that. It started with the extremely sparse language of ZFC: first-order logic with a single non-logical symbol, ∈. So the alphabet of the formal language consisted of the following symbols: ∈ ( ) ∧ ∨ ¬ → ↔ ∀ ∃ x ‘. It could have even started with a sparser formal language if it was optimizing for alphabet economy: ∈ ( ∧ ¬ ∀ x ‘ would suffice. As time passed and you got through more of the book, more and more things were defined in terms of the alphabet of ZFC: subsets, ordered pairs, functions from one set to another, transitivity, partial orders, finiteness, natural numbers, order types, induction, recursion, countability, real numbers, and limits. By the last chapter it was breathtaking to read a sentence filled with complex concepts and realize that every single one of these concepts was ultimately grounded in this super simple formal language we started with, with a finitistic sound and complete system of rules for how to use each one.

But could it be possible to really fully define ALL the terms used by the end of the book? And even if it were, could the book be written in such a way as to allow an alien that begins understanding nothing of your language to read it and, by the end, understand everything in the book? Even worse, what if the alien not only understands nothing of your language, but starts understanding nothing of the concepts involved? This might be a nonsensical notion; an alien that can read a book and do any level of sophisticated reasoning but doesn’t understand concepts like “and” and “or“.

One way that language is learned is by “pointing”: somebody asks me what a tree is, so I point to some examples of trees and some examples of non-trees, clarifying which is and which is not. It would be helpful if in this book we could point to simple concepts by means of interactive programs. So, for instance, an e-book where an alien reading the book encounters some exceedingly simple programs that they can experiment with, putting in inputs and seeing what results. So for instance, we might have a program that takes as input either 00, 01, 10, or 11, and outputs the ∧ operation applied to the two input digits. Nothing else would be allowed as inputs, so after playing with the program for a little bit you learn everything that it can do.

One feature of such a book would be that it would probably use nothing above first-order logical concepts. The reason is that the semantics of second-order logic cannot be captured by any sound and complete proof system, meaning that there’s no finitistic set of rules one could explain to an alien so that they know how to use the concepts involved correctly. Worse, the set of second-order tautologies is not even recursively enumerable (worse than the set of first-order tautologies, which is merely undecidable), so no amount of pointing-to-programs would suffice. First-order ZFC can define a lot, but can it define enough to write a book on what it can define?

Finiteness can’t be captured in a sound, complete, finitary proof system

June 8, 2020April 16, 2026 ~ ~ 3 Comments

Consider the sentence “This blog has finitely many posts.” Do you understand what this sentence means? Then no set of rules (even infinite, even uncomputable!) can model your reasoning. This claim may sound shocking, but it can be justified on solid metamathematical grounds.

Another example: the sentence “There are finitely many planets in the universe.” You don’t have to think it’s true, you just have to think you understand what it means. What’s the common theme? It’s the notion of there being ‘finitely many’ of some class of objects. Let’s imagine building a language that has the expressive resources of first-order logic (which are quite modest), plus an additional quantifier F, whose semantics are given by the following rule: Fx φ(x) is satisfied by a model iff there are only finitely many objects in that model that satisfy φ(x).

It turns out that any language consisting of first order logic plus the quantifier F can’t be axiomatized in any sound, complete, and finitary proof system. Notice that effective enumerability of the rules of the proof system is not a requirement here! So long as the language is strong enough to express the semantics of {∧, ¬, ∀, F, variables x_n, and relations R_n}, no set of sentences and sentence-manipulation rules in that language will suffice to capture these semantics.

Here’s a proof: consider the first-order theory of Peano arithmetic. This theory has nonstandard models (as any theory of arithmetic must have in a logic that is compact). All of these nonstandard models have the following feature: that there are numbers that are larger than infinitely many numbers. Think about what this means: this is a common feature of all nonstandards, so if we could write a sentence to express this feature then we could rule them all out! This is where the quantifier F steps in. With F, we can write the following simple sentence and add it to PA as an axiom:

∀x Fy (y < x)

In English: every number has finitely many numbers less than it. And with that, we’ve ruled out all nonstandard models! So now we have a theory that is categorical for ℕ. And that’s a big deal, metamathematically speaking!

Why? Well, as I’ve talked about in a previous post, you can prove some pretty strong limitative results about any logic that can produce a theory that’s categorical for ℕ. In particular, if we can produce such a theory then its logic cannot be compact. Quick proof: suppose a set of sentences Σ models ℕ. Add to Σ a constant c and the axioms “c ≠ 0”, “c ≠ 1”, “c ≠ 2”, and so on, and call this new set Σ’. Every finite subset of Σ’ models ℕ. So by compactness, Σ’ has a model. But this model is nonstandard – it contains numbers that aren’t natural numbers. And since Σ is a subset of Σ’, any model of Σ’ is also a model of Σ.

So compactness implies that no theory is categorical for ℕ. But compactness follows from the following three properties: a sound and complete proof system (Σ ⊢ α if and only if Σ ⊨ α), and that all proofs are only finitely long (try expressing this property without F!). Quick proof: If a set of sentences is finitely satisfied, then every finite subset of it has a model (by definition), so no finite subset of it can be refuted (by soundness), so the entire set can’t be refuted (by finite proofs), so the entire set is satisfied (by completeness).

So soundness + completeness + finiteness ⇒ compactness ⇒ the existence of nonstandard models of arithmetic in any theory that models ℕ. Which means that the semantics of F cannot be captured in any sound, complete, and finite proof system!

Take your pick: either you don’t really understand the semantics of the “finitely many” quantifier F, or no set of rules (not even requiring this set to be finite or computable) can fully capture your reasoning in finite-length proofs.

More information about related extensions of first-order logic and their limitations can be found here. The result I describe here is a rephrasing of results discussed there.

Meaning ain’t in the brain

April 26, 2020April 26, 2020 ~ ~ Leave a comment

I don’t know if there’s a name for the position that the meanings of our terms is pinned down by facts about the brain. The closest I know is semantic internalism, but a semantic internalist could think that meaning is pinned down by facts about qualia, which happen to not be facts about the brain. So I’ll make up a name for this position: call it physicalist semantic internalism.

Now, here’s an argument against physicalist semantic internalism that seems totally right to me.

What I mean by “second-order logical concepts” is the concepts of “and”, “or”, “not”, second-order quantifiers (“for all” and “for some”, ranging over not just objects but properties of objects), and the notions of functions, relations, and concepts.

The semantics of second order logic captures what I mean when I use second-order logical concepts.
No finite set of rules (and correspondingly no finite machine) can pin down the semantics of second order logic.
So no finite machine pins down what I mean when I use second-order logical concepts.
My brain is a finite machine.
So my brain does not pin down what I mean when I use second-order logical concepts.

And here’s another argument along similar lines:

The truth values of sentences about integers are determined by what we mean by integers.
The statement of the satisfiability of each Diophantine equation has a determinate truth value.
The statement of the satisfiability of each Diophantine equation is a statement about integers.
So the satisfiability of each Diophantine equation is fixed by what we mean by integers.
No finite machine can fix the satisfiability of each Diophantine equation.
Our brain is a finite machine.
So the meaning of integers is not contained in the brain.

On philosophical progress

April 3, 2020April 3, 2020 ~ ~ 2 Comments

A big question in the philosophy of philosophy is whether philosophers make progress over time. One relevant piece of evidence that gets brought up in these discussions is the lack of consensus on age old questions like free will, normative ethics, and the mind body problem. If a discipline is progressing steadily towards truth with time, the argument goes, then we should expect that questions that have been discussed for thousands of years should be more or less settled by now. After all, that is what we see in the hard sciences; there are no lingering disputes over the validity of vitalism or the realm of applicability of Newtonian mechanics.

There are a few immediate responses to this line of argument. It might be that the age old questions of philosophy are simply harder than the questions that get addressed by physicists or biologists. “Harder” doesn’t mean “requires more advanced mathematics to grapple with” here, but something more like “it’s unclear what even would count as a conclusive argument for one position or another, and therefore much less clear how to go about building consensus.” Try to imagine what sort of argument would convince you of the nonexistence of libertarian free will with the same sort of finality as a demonstration of time dilation convinces you of the inadequacy of nonrelativistic mechanics.

A possible rejoinder at this point would be to take after the logical positivists and deny the meaningfulness or at least truth-aptness of the big questions of philosophy as a whole. This may go too far; it may well be that a query is meaningful but, due to certain epistemic limitations of ours, forever beyond our ability to decide. (We know for sure that such queries can exist, due to Gödelian discoveries in mathematics. For instance, we know of the existence of a series of numbers that are perfectly well defined, but for which no algorithm can exist to enumerate all of them. The later numbers in this sequence will forever be a mystery to us, and not for lack of meaningfulness.)

I think that the roughly correct position to take is that science is largely about examining empirical facts-of-the-matter, whereas philosophy is largely about analyzing and refining our conceptual framework. While we have a fairly clear set of standards for how to update theories about the empirical world, we are not in possession of such a set of standards for evaluating different conceptual frameworks. The question of “what really are the laws governing the behavior of stuff out there” has much clearer truth conditions than a question like “what is the best way to think about the concepts of right and wrong”; i.e. It’s clearer what counts as a good answer and what counts as a bad answer.

When we’re trying to refine our concepts, we are taking into account our pre-theoretical intuitions (e.g. any good theory of the concept of justice must have something to do with our basic intuitive conception of justice). But we’re not just satisfied to describe the concept solely as the messy inconsistent bundle of intuitions that constitute our starting position on it. We also aim to describe the concept simply, by developing a “theory of justice” that relies on a small set of axioms and from which (the hope is) the rest of our conclusions about justice follow. We want our elaboration of the concept to be consistent, in that we shouldn’t simultaneously affirm that A is an instance of the concept and that A is not an instance of the concept. Often we also want our theory to be precise, even when the concept itself has vague boundaries.

Maybe there are other standards besides these, intuitiveness, simplicity, consistency, and precision. And the application of these standards is very rarely made explicit. But one thing that’s certain is that different philosophers have different mixes of these values. One philosopher might value simplicity more or less than another, and it’s not clear that one of them is doing something wrong by having different standards. Put another way, I’m not convinced that there is one unique right set of standards for conceptual refinement.

We may want to be subjectivists to some degree about philosophy, and say that there are a range of rationally permissible standards for conceptual refinement, none better than any other. This would have the result that on some philosophical questions, multiple distinct answers may be acceptable but some crazy enough answers are not. Maybe compatibilism and nihilism are acceptable stances on free will but libertarianism is not. Maybe dualism and physicalism are okay but not epiphenomenalism. And so on.

This view allows for a certain type of philosophical progress, namely the gradual ruling out of some philosophical positions as TOO weird. It also allows for formation of consensus, through the discovery of philosophical positions that are the best according to all or most of the admissible sets of standards. I think that one example of this would be the relatively recent rise of Bayesian epistemology in philosophy of science, and in particular the Bayesian view of scientific evidence as being quantified by the Bayes factor. In brief, what does it mean to say that an observation O gives evidence for a hypothesis H? The Bayesian not only has an answer to this, but to the more detailed question of to what degree O gives evidence for H. The quantity is cr(O | H) / cr(O), where cr(.) is a credence function encoding somebody’s beliefs before observing O. If this quantity is equal to 1, then O is no evidence for H. If it is greater than 1, then O is evidence for H. And if it’s less than 1, then O is evidence against H.

Not everything in Bayesian epistemology is perfectly uncontroversial, but I would argue that on this particular issue – the issue of how to best formalize the notion of scientific evidence – the Bayesian definition survives all its challenges unscathed. What are some other philosophical questions on which you think there has been definite progress?

Logic on Planet Zorko

March 28, 2020April 16, 2026 ~ ~ 5 Comments

A group of Zorkan mathematicians are sitting around having a conversation in a language that you are unfamiliar with. You are listening in with a translator. This translator is an expert in formal logic, and has decided to play the following game with you. He says:

“After listening to the full conversation, I will translate all the sentences that were said for you. But I won’t translate them into English; I want something more universal. Instead, I will choose a formal language that captures the mathematical content of all the sentences said, while leaving out the vagaries and subtleties of the Zorkan language. I will describe to you the semantics of the formal language I choose, if you don’t already know it.”

“Furthermore,” (says the translator) “I happen to be intimately familiar with Zorkan society and culture. The Zorkans are having a discussion about one particular mathematical structure, and I know which one that is. The mathematicians are all fantastically precise reasoners, such that none of them ever says a sentence that is false of the structure that they are discussing.”

(So for instance if they are talking about the natural numbers, then no mathematician will say “0 = 1”, and if they are talking about abelian groups, then no mathematician will say “∃x∃y (xy ≠ yx)”. But they could say “∃x∃y (xy ≠ yx)” if they are talking about non-abelian groups.)

You know nothing about Zorkan psychology, besides that the Zorkan way of life is so utterly foreign to you that you cannot reliably assume that the mathematical structures that come most naturally to you will also come naturally to them. It might be, for instance, that nonstandard models of arithmetic are much more intuitive to them than the natural numbers. You cannot assume that the structure they are discussing is the one that you think is “most natural”; you can only conclude this if one of them says a sentence that is true of that model and no others.

The conversation finishes, and you are tasked with answering the following two questions:

(1) What structure are they talking about?
(2) Can you come up with a verification procedure for the mathematicians’ sentences (including possible future sentences they might say on the topic)?

So, that’s the setup. Now, the question I want you to consider is the following: Suppose that the structure that the mathematicians have in mind is actually the natural numbers. Is there some conversation, any conversation at all (even allowing infinitely long conversations, and uncomputable conversations – conversations which cannot be produced as the output of any Turing machine), that the mathematicians could have, and some translation of this conversation, such that you can successfully answer both (1) and (2)? If so, what is that conversation? And if not, then why not?

✯✯✯

Let’s work out some simple examples.

Example 1

Suppose the conversation is translated into a propositional language with three atomic propositions {P, Q, R}.

Mathematician A: “P ∨ Q”
Mathematician B: “(Q ∨ R) → (¬P)”
Mathematician C: “R”

From this conversation, you can deduce that the model they are talking about is the one that assigns “False” to P, “True” to Q, and “True” to R.

M: {P is false, Q is true, R is true}

This is the answer to the question 1!

As for the second question, we want to know if there’s some general procedure that produces all the future statements the mathematicians could make. For instance, the set generated by our procedure should include (Q ∧ R) but not (Q ∧ P).

It turns out that such a procedure does exist, and is not too difficult to write out and implement.

Example 2

Take the above conversation and modify it slightly:

Mathematician A: “P ∨ Q”
Mathematician B: “(Q ∨ R) → (¬P)”
Mathematician C: “¬R”

If you work it out, you’ll see that question 1 can no longer be answered unambiguously. The problem is that there are multiple models of the sentences that the mathematicians are saying:

M₁: {P is false, Q is true, R is false}
M₂: {P is true, Q is false, R is false}

So even though they have one particular structure in mind, you don’t have enough information from their conversation to figure out exactly what that structure is.

Now let’s think about the answer to question 2. We don’t know whether the mathematicians are thinking about M₁ or M₂, and M₁ and M₂ differ in what truth value they assign the proposition P. So we can’t construct an algorithm that will generate the set of all their possible future statements, as this would require us to know, in particular, whether P is true or false in the model that they have in mind.

We might suspect that this holds true generally: if you can’t answer question 1, then you won’t be able to answer question 2 either. But we might also wonder: if we can answer question 1, then can we also always answer question 2?

The answer is no, as the next example will show.

Example 3

For this conversation, the translation is in second-order logic. This will allow us to talk about more interesting mathematical structures than before; namely, structures that have a domain of objects on which functions and predicates can act. In particular, we’re in a second-order language with one constant symbol “c” and one function symbol “f”. Here’s the conversation:

Mathematician A: ¬∃x (f(x) = c)
Mathematician B: ¬∃x∃y ((f(x) = f(y)) ∧ ¬(x = y))
Mathematician C: ∀R (R(c) ∧ ∀x(R(x) → R(f(x))) → ∀x R(x))

Notice that the only truly second-order sentence is the third one, in which we quantify over a predicate variable R rather than an individual variable x, y, z, …. But the second-order status of this sentence it makes it that the translator could not have possibly translated this conversation into a first-order language, much less a propositional language.

This time, questions 1 and 2 are much harder to answer than before. But if you work it out, you’ll see that there is exactly one mathematical structure that satisfies all three of the mathematicians’ statements. And that structure is the natural numbers!

So, we know exactly what structure the mathematicians have in mind. But can we also answer question 2 in the positive? Can we produce some verification procedure that will allow us to generate all the future possible sentences the mathematicians could say? Unfortunately, the answer is no. There is no sound and complete proof system for second-order logic, so in particular, we have no general algorithm for producing all the truths in this second order language. So sad.

Example 4

Now let’s move to first-order logic for our final example. The language of translation will be a first order language with a constant symbol for every natural number {0,1,2,3,…}, function symbols for ordinary arithmetic {+, ×}, and relation symbols for orders {≥}

Imagine that the conversation consists of literally all the first-order sentences in the language that are true of the natural numbers. Anything which you can say in the language, and which is true as a statement about ℕ, will be said at some point. This will obviously be a very long conversation, and in fact infinitely long, but that’s fine. It will include sentences like “0 ≠ 1”, “0 ≠ 2”, “0 ≠ 3”, and so on. (These Zorkans are extremely thorough.)

Given this conversation, can we answer (1) and (2)? Take a guess; the answer may surprise you!

It turns out that even though we can answer (2) positively – we can actually produce an algorithm that will generate one-by-one all the possible future statements of the mathematicians (which really means all the sentences in the language that are true of the natural numbers), we cannot answer (1) positively! There are multiple distinct mathematical structures that are compatible with the entirety of true statements about natural numbers in the language. Earlier we hypothesized that any time we have a negative answer to (1), we will also have a negative answer to (2). But this is not true! We can verify all the true statements about natural numbers in the language… without even knowing that we’re actually talking about the natural numbers! This is an important and unintuitive consequence of the expressive limitations (and in particular, of the compactness) of first-order logic.

The Takeaway

We had an example where we could answer both (1) and (2) for a simple mathematical structure (a model of propositional logic). And we saw examples for natural numbers where we could answer (1) but not (2), as well as examples where we could answer (2) but not (1). But we haven’t yet seen an example for natural numbers where we had both (1) and (2). This is no coincidence!

It is actually a consequence of the theorem I proved and discussed in my last post that no such such conversation can exist. When structures at least as complicated as the natural numbers are being discussed in some language (call it L), you cannot simultaneously (1) know for sure what structure is being talked about and (2) have an algorithmic verification system for L-sentences about the structure.

Crazy conditionals

March 10, 2020 ~ ~ 3 Comments

It’s well known that the material implication → of propositional logic does not do a perfect job of capturing what we mean when we make “if… then…” statements in English. The usual examples of failure rest on the fact that any material conditional with a false antecedent is vacuously true (so “if 2 is odd then 2 is even” turns out to be true). But over time, philosophers have come up with a whole lot of different ways in which → can catch us by surprise.

Here’s a list of some such cases. In each case, I will present an argument using if…then… statements that is clearly invalid, but which is actually valid in propositional logic if the if…then… statements are translated as the material conditional!

1. Harper

If I put sugar into my coffee, it will taste fine.
Therefore, if I put sugar and motor oil into my coffee, it will taste fine.

S → F
(S ∧ M) → F

2. Distributivity

If I pull both switch A and switch B, the engine will start.
Therefore, either the engine will start if I pull switch A or the engine will start if I pull switch B.

(A ∧ B) → S
(A → S) ∨ (B → S)

3. Transitivity

If Biden dies before the election, Trump will win.
If Trump wins the election, Biden will retire to his home.
Therefore, if Biden dies before the election, Biden will retire to his home.

B → T
T → R
B → R

4. Principle of Explosion

Either zombies will rise from the ground if I bury a chicken head in my backyard, or zombies will rise from the ground if I don’t bury a chicken head in my backyard.

(B → D) ∨ (¬B → D) is a tautology

5. Contraposition

If I buy a car, I won’t buy a Pontiac.
Therefore, if I buy a Pontiac, I won’t buy a car.

C → ¬P
P → ¬C

6. Simplification

If John is in London then he’s in England, and if he’s in Paris then he’s in France.
Therefore, either (1) if John’s in London he’s in France or (2) if John’s in Paris then he’s in England.

(L → E) ∧ (P → F)
(L → F) ∨ (P → E)

7. False Antecedent

It’s not the case that if God exists then human life is a product of random chance.
Therefore, God exists.

¬(G → C)
G

8. True Consequent

If I will have eternal life if I believe in God, then God must exist.
I do not believe in God.
Therefore, God exists.

(B → E) → G
~B
G

You can check for yourself that each of these is logically valid! Can you figure out what’s going wrong in each case?

Describing the world

February 16, 2020February 16, 2020 ~ ~ 1 Comment

Wittgenstein starts his Tractatus Philosophicus with the following two sentences.

1. The world is everything that is the case.

1.1 The world is the totality of facts, not of things.

Let’s take him up on this suggestion and see how far we get. In the process, we’ll discover some deep connections to theorems in mathematical logic, as well as some fascinating limitations on the expressive powers of propositional and first order logic.

We start out with a set of atomic propositions. For a very simple world, we might only need a finite number of these: “Particle 1 out of 3 has property 1 out of 50”, “Particle 2 of 3 has property 17 out of 50”, and so on. More realistically, the set of atomic propositions will be infinite (countable if the universe doesn’t have any continuous properties, and uncountable otherwise).

For simplicity, we’ll imagine labeling our set of atomic propositions P₁, P₂, P₃, and so on (even though this entails that there are at most countably many, nothing important will rest on this assumption.) We combine these atomic propositions with the operators of propositional logic {(, ), ¬, ∧, ∨, →}. This allows us to build up more complicated propositions, like ((P₇∧P₂)→(¬P₁₃)). This will be the language that we use to describe the world.

Now, the way that the world is is just a consistent assignment of truth values to the set of all grammatical sentences in our language. For example, one simple assignment of truth values is the one that assigns “True” to all atomic propositions. Once we’ve assigned truth values to all the atomic propositions, we get the truth values for the rest of the set of grammatical sentences for free, by the constraint that our truth assignment be consistent. (For instance, if P₁ and P₂ are both true, then (P₁∧P₂) must also be true.)

Alright, so the set of ways the world could be corresponds to the set of truth assignments over our atomic propositions. The final ingredient is the notion that we can encode our present knowledge of the world as a set of sentences. Maybe we know by observation that P₅ is true, and either P₂ or P₃ is true but not both. Then to represent this state of knowledge, we can write the following set of sentences:

{P₅, (P₂∨P₃), ¬(P₂∧P₃)}

Any set of sentences picks out a set of ways the world could be, such that each of these possible worlds is compatible with that knowledge. If you know nothing at all, then the set of sentences representing your knowledge will be the empty set {}, and the set of possible worlds compatible with your knowledge will be the set of all possible worlds (all possible truth assignments). On the other extreme, you might know the truth values of every atomic proposition, in which case your state of knowledge uniquely picks out one possible world.

In general, as you add more sentences to your knowledge-set, you cut out more and more possible worlds. But this is not always true! Ask yourself what the set of possible worlds corresponding to the set {(P₁∨¬P₁), (P₂∨¬P₂), (P₃∨¬P₃)} is. Since each of these sentences is a tautology, no possible worlds are eliminated! So our set of possible worlds is still the set of all worlds.

Now we get to an interesting question: clearly for any knowledge-set of sentences, you can express a set of possible worlds consistent with that knowledge set. But is it the case that for any set of possible worlds, you can find a knowledge-set that uniquely picks it out? If I hand you a set of truth assignment functions and ask you to tell me a set of propositions which are consistent with that set of worlds and ONLY that set, is that always possible? Essentially, what we’re asking is if all sets of possible worlds are describable.

We’ve arrived at the main point of this essay. Take a minute to ponder this and think about whether it’s possible, and why/why not! For clarification, each sentence can only be finitely long. But! You’re allowed to include an infinity of sentences.

(…)

(Spoiler-hiding space…)

(…)

If there were only a finite number of atomic propositions, then you could pick out any set of possible worlds with just a single sentence in conjunctive normal form. But when we start talking about an infinity of atomic propositions, it turns out that it is not always possible! There are sets of possible worlds that are literally not describable, even though our language includes the capacity to describe each of those words and we’re allowed to include an infinite set of sentences.

There’s a super simple proof of this. Let’s give a name to the cardinality of the set of sentences: call it K. (We’ve been tacitly acting as if the cardinality is countable this whole time, but that doesn’t actually matter.) What’s the cardinality of the set of all truth assignments?

Well, each truth assignment is a function from all sentences to {True, False}. And there are 2^K such assignments. 2^K is strictly larger than K, so there are more possible worlds than there are sentences. Now, the cardinality of the set of sets of sentences is also 2^K. But the set of SETS of truth of assignments is 2^{2^K}!

What this means is that we can’t map sets of sentences onto sets of truth assignments without leaving some things out! This proof carries over to predicate logic as well. The language for both propositional and predicate logic is unable to express all sets of possible worlds corresponding to that language!

I love this result. It’s the first hint in mathematical logic that syntax and semantics can come apart.

That result is the climax of this post. What I want to do with the rest of this post is to actually give an explicit example of a set of truth assignments that are “indescribable” by any set of sentences, and to prove it. Warning: If you want to read on, things will get a bit more technical from here.

Alright, so we’ll use a shortcut to denote truth assignments. A truth assignment will be written as a string of “T”s and “F”s, where the nth character corresponds to how the truth assignment evaluates P_n. So the all-true truth assignment will just be written “TTTTTT…” and the all-false truth assignment will be written “FFFFF…”. The truth assignment corresponding to P₁ being false and everything else true will be written “FTTTTT…”. And so on.

Now, here’s our un-describable set of truth assignments. {“FFFFFF…”, “TFFFFF…”, “TTFFFF…”, “TTTFFF…”, …}. Formally, define V_n to be the truth assignment that assigns “True” to every atomic proposition up to and including P_n, and “False” to all others. Now our set of truth assignments is just {V_n | n ∈ ℕ}.

Let’s prove that no set of sentences uniquely picks out this set of truth assignments. We prove by contradiction. Suppose that we could find a set of sentences that uniquely pick out these truth assignments and none other. Let’s call this set A. Construct a new set of sentences A’ by appending all atomic propositions A: A’ = A ∪ {P₁, P₂, P₃, …}.

Is there any truth assignment that is consistent with all of A’? Well, we can answer this by using the Compactness Theorem: A’ has a truth assignment if and only if every finite subset of A’ has a truth assignment. But every finite subset of A’ involves sentences from A (which are consistent with Vn for each n by assumption), and a finite number of atomic propositions. Since each finite subset of A’ is only asserting the truth of a finite number of atomic sentences, we can always find a truth assignment V_k in our set that is consistent with it, by choosing one that switches to “False” long after the last atomic proposition that is asserted by our finite subset.

This means that each finite subset of A’ is consistent with at least one of our truth assignments, which means that A’ is consistent with at least one of our truth assignments. But A’ involves the assertion that all atomic propositions are true! The only truth assignment that is consistent with this assertion is the all-true assignment! And is that truth assignment in our set? No! And there we have it, we’ve reached our contradiction!

We cannot actually describe a set of possible worlds in which either all atomic propositions are false, or only the first is true, or only the first two are true, or only the first three are true, and so on forever. But this might prompt the question: didn’t you just describe it? How did you do that, if it’s impossible? Well, technically I didn’t describe it. I just described the first four possibilities and then said “and so on forever”, assuming that you knew what I meant. To have actually fully pinned down this set of possible worlds, I would have had to continue with this sentence forever. And importantly, since this sentence is a disjunction, I could not split this infinite sentence into an infinite set of finite sentences. This fundamental asymmetry between ∨ and ∧ is playing a big role here: while an infinite conjunction can be constructed by simply putting each clause in the conjunction as a separate sentence, an infinite disjunction cannot be. This places a fundamental limit on the ability of a language with only finite sentences to describe the world.

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31