Introduction to Mathematical Logic (Part 1)

Mathematical logic is the study of the type of reasoning we perform when we do mathematics, and the attempt to formulate a general language as the setting in which all mathematics is done. In essence, it is an attempt to form a branch of mathematics from which all other branches of mathematics emerge as special cases.

You might think that when speaking at this level of abstraction, there will be nothing general and interesting to say. After all, we’re trying to prove statements not within a particular domain of mathematics, but theorems that are true across a wide swath of mathematics, potentially encompassing all of it.

The surprising and amazing thing is that this is not the case. It turns out that there are VERY general and VERY surprising things you can discover by looking at the logical language of mathematics, a host of results going by names like the Completeness Theorem, the Incompleteness Theorem, the Compactness Theorem, the Löwenheim-Skolem Theorem, and so on. These results inevitably have a great deal of import for our attitudes towards the foundations of mathematics, since they generally establish limitations or demonstrate eccentricities in the types of things that we can say in the language of mathematics.

My goal in this post is to provide a soft introduction to the art of dealing in mathematics at this level of ultimate abstraction, and then to present some of the most strange things that we know to be true. I think that this is a subject that’s sorely missing this type of soft introduction, and hope I can convey some of the subject’s awesomeness!

— — —

To start out with, why think that there is any subject matter to be explored here? Different branches of mathematics sometimes appear to be studying completely different types of structures. I remember an anecdote from an old math professor of mine, who worked within one very narrow and precisely defined area of number theory, and who told me that when she goes to talks that step even slightly outside her area of specialty, the content of the lectures quickly becomes incomprehensible to her. Why think that there is such a common language of mathematics, if specialists in mathematics can’t even understand each other when talking between fields?

The key thing to notice here is that although different fields of mathematics are certainly wildly different in many ways, there nevertheless remain certain fundamental features that are shared in all fields. Group theorists, geometers, and number theorists will all accept the logical inference rule of modus ponens (if P is true and P implies Q, then Q is true), but none of them will accept its converse (if Q is true and P implies Q, then P is true). No matter what area of mathematics you study, you will accept that if P(x) is true for all x, then it is true for any particular x you choose. And so on. These similarities may seem obvious and trivial, but they HAVE to be obvious and trivial to be things that every mathematician agrees on. The goal, then, is to formalize a language that has these fundamental inference rules and concepts built in, and that has many special cases to account for the differences between domains of math, specified by some parameters that are freely chosen by any user of the system.

There are actually several distinct systems that attempt to accomplish this task. Generally speaking, there are three main branches, in order of increasing expressive power: propositional logic, first order (predicate) logic, and second order logic.

Reasoning In Zeroth Order

Let’s start with propositional logic, sometimes called “zeroth order logic.” Propositional logic is the framework developed to deal with the validity of the following types of arguments:

Argument 1

  1. 2+2=4.
  2. If 2+2=4, then 1+3=4.
  3. So 1+3=4.

Argument 2

  1. The Riemann hypothesis is false and P = NP.
  2. So P = NP.

Notice that it doesn’t matter if our premises are true or not. Logical validity doesn’t care about this, it just cares that the conclusions really do follow from the premises. This is a sign of the great generality at which we’re speaking. We’re perfectly fine with talking about a mathematical system in which the Riemann hypothesis is false, or in which 2+2 is not 4, just so long as we accept the logical implications of our assumptions.

Propositional logic can express the validity of these arguments by formalizing rules about valid uses of the concepts ‘and’, ‘if…then…’, ‘or’, and so on. It remains agnostic to the subject matter being discussed by not fully specifying the types of sentences that are allowed to be used in the language. Instead, any particular user of the language can choose their set of propositions that they want to speak about.

To flesh this out more, propositional logic fulfills the following three roles:

  1. Defines an alphabet of symbols.
  2. Specifies a set of rules for which strings are grammatical and which are not.
  3. Details rules for how to infer new strings from existing strings.

In more detail:

  1. The set of symbols in propositional logic is split into two categories: logical symbols and what I’ll call “fill-in-the-blank” symbols. The logical symbols are (, ), ∧, ∨, ¬, and →. The fill-in-the-blank symbols represent specific propositions that are specified by any particular user of the logic.
  2. Some strings are sensible and others not. For example, the string “P∧∧” will be considered to be nonsensical, while “P∧Q” will not. Some synonyms for sensible strings are well-formed formulas (WFFs), grammatical sentences, and truth-apt sentences. There is a nice way to inductively generate the set of all WFFs: Any proposition is a WFF, and for any two WFFs F and F’, the following are also WFFs: (F∧F’), (F∨F’), ¬F, (F→F’). (See the sketch just after this list for this generation process spelled out in code.)
  3. These include rules like modus ponens (from P and P→Q, derive Q), conjunction elimination (from P∧Q, derive P), double negation elimination (from ¬¬P, derive P), and several more. They are mechanical rules that tell you how to start with one set of strings and generate new ones in a logically valid way, such that if the starting strings are true then the derived ones must also be true. There are several different but equivalent formulations of the rules of inference in propositional logic.
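To make the inductive generation of WFFs concrete, here is a minimal Python sketch (my own illustration, not part of any standard presentation): starting from the bare propositions, each round adds every formula you can build with one more connective.

    # Toy illustration: generate all WFFs of a propositional language up to a
    # given nesting depth, following the inductive scheme described above.
    def generate_wffs(propositions, depth):
        wffs = set(propositions)              # base case: every proposition is a WFF
        for _ in range(depth):
            new = set()
            for f in wffs:
                new.add("¬" + f)              # ¬F is a WFF
                for g in wffs:
                    new.add("(" + f + "∧" + g + ")")   # (F∧F')
                    new.add("(" + f + "∨" + g + ")")   # (F∨F')
                    new.add("(" + f + "→" + g + ")")   # (F→F')
            wffs |= new
        return wffs

    print(sorted(generate_wffs({"P", "Q"}, 1)))
    # includes strings like '(P∧Q)', '(P→Q)', '¬P', but never 'P∧∧'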

A propositional language fills in the blanks in the logic. Say that I want to talk about two sentences using propositional logic: “The alarm is going off”, “A robber has broken in.” For conciseness, we’ll abbreviate these two propositions as A for alarm and R for robber. All I’ll do to specify my language is to say “I have two propositions: {A, R}”

The next step is to fill in some of the details about the relationships between the propositions in my language. This is done by supplementing the language with a set of axioms, and we call the resulting constrained structure a propositional theory. For instance, in my example above we might add the following axioms:

  1. A→R
  2. A∨R

In plain English, these axioms tell us that (1) if the alarm is going off, then a robber has broken in, and (2) an alarm is going off or a robber has broken in.

Finally, we talk about the models of our theory. Notice that up until now, we haven’t talked at all about whether any statements are true or false, just about syntactic properties like “The string P¬ is not grammatical” and about what strings follow from each other. Now we interpret our theory by seeing what possible assignments of truth values to the WFFs in our language are consistent with our axioms and logical inference rules. In our above example, there are exactly two interpretations:

Model 1: A is true, R is true
Model 2: A is false, R is true

These models can be thought of as the possible worlds that are consistent with our axioms. In one of them, the alarm has gone off and a robber has broken in, and in the other, the alarm hasn’t gone off and the robber has broken in.

Notice that R turns out true in both models. When a formula F is true in all models of a theory T, we say that the theory semantically entails F, and write this as T ⊨ F. When a formula can be proven from the axioms of the theory using the rules of inference given by the logic, then we say that the theory syntactically entails F, and write this as T ⊢ F.
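To see semantic entailment in action, here is a minimal brute-force sketch in Python (my own illustration): enumerate every assignment of truth values to {A, R}, keep the ones that satisfy the axioms, and check which formulas come out true in all of them.

    from itertools import product

    # Brute-force model enumeration for the theory with axioms A→R and A∨R.
    propositions = ["A", "R"]

    def satisfies_axioms(v):
        a, r = v["A"], v["R"]
        return ((not a) or r) and (a or r)    # A→R  and  A∨R

    models = [dict(zip(propositions, values))
              for values in product([True, False], repeat=len(propositions))
              if satisfies_axioms(dict(zip(propositions, values)))]

    print(models)                       # the two models listed above
    print(all(m["R"] for m in models))  # True: the theory semantically entails R
    print(all(m["A"] for m in models))  # False: A is not semantically entailed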

This distinction between syntax and semantics is really important, and will come back to us in later discussion of several important theorems (notably the completeness and incompleteness theorems). To give a sneak peek: above we found that R was semantically entailed by our theory. If you’re a little familiar with propositional logic, you might have also realized that R can be proven from the axioms. In general, syntactic truths will always be semantic truths (if you can prove something, then it must be true in all models, or else the models would be inconsistent. But a model is by definition a consistent assignment of truth values to all WFFs). But a natural question is: are all semantic consequences of a theory also syntactic consequences? That is, are all universal truths of the theory (things that are true in every model of the theory) provable from the theory?


If the answer is yes, then we say that our logic is complete. And it turns out that the answer is yes, for propositional logic. Whether more complex logics are complete turns out to be a more interesting question. More on this later.

This four-step process I just laid out (logic to language to theory to model) is a general pattern we’ll see over and over again. In general, we have the following division of labor between the four concepts:

  1. Logic: The logic tells us the symbols we may use (including some fill-in-the-blank categories of symbols), the rules of grammar, and a set of inference rules for deriving new strings from an existing set.
  2. Language: The language fills in the blanks in our logic, fully specifying the set of symbols we will be using.
  3. Theory: The theory adds axioms to the language.
  4. Model: A model is an assignment of truth values to all WFFs in the language, consistent with the axioms and the inference rules of our logic.

It’s about time that we apply this four-step division to a more powerful logic. Propositional logic is pretty weak. Not much interesting math can be done in a purely propositional language, and it’s wildly insufficient to capture our notion of logically valid reasoning. Consider, for example, the following argument:

  1. Socrates is a man.
  2. All men are mortal.
  3. So, Socrates is mortal.

This is definitely a valid argument. No rational agent could agree that 1 and 2 are true, and yet deny the truth of 3. But can we represent the validity of this argument in propositional logic? No!

Consider that the three sentences “Socrates is a man”, “All men are mortal”, and “Socrates is mortal” are distinct propositions, and the relationships between them are too subtle for propositional logic to capture. Propositional logic can’t see that the first proposition is asserting the membership of Socrates to a general class of things (“men”), and that the second proposition is then making a statement about a universal property of things in this class. It just sees two distinct propositions. To propositional logic, this argument just looks like

  1. P
  2. Q
  3. Therefore, R

But this is not logically valid! We could make it valid by adding as a premise the sentence (P∧Q)→R, which corresponds to the English sentence “If Socrates is a man and all men are mortal, then Socrates is mortal.” But this should be seen as a tautology, something that is provable in any first order theory that contains the propositions P, Q, and R, not a required additional assumption. Worse, if somebody came along and stated the proposition A = “Aristotle is a man”, then we would need a whole ‘nother assumption to assert that Aristotle is also mortal! And in general, for any individual instance of this argument, we’d need an independent explanation for its validity. This is not parsimonious, and is indicative that propositional logic is missing something big.

Missing what? To understand why this argument is valid, you must be able to reason about objects, properties, and quantification. This is why we must move on to an enormously more powerful and interesting logic: first order logic.
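(To preview the notation introduced below: with a constant socrates and predicates Man and Mortal, names of my own choosing just for illustration, the argument becomes “from Man(socrates) and ∀x (Man(x) → Mortal(x)), derive Mortal(socrates)”, and that inference is valid in any first order theory whatsoever.)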

Reasoning In First Order

First order logic is a logic, so it must fill the same three roles as we saw propositional logic did above. Namely, it must define the alphabet, the grammar, and the inference rules.

Symbols
Logical: ¬ ∧ ∨ → ( ) ∀ ∃ =
Variables: x y z w …
Constants: ______
Predicates: ______
Functions: ______

That’s right, in first order we have three distinct categories of fill-in-the-blank symbols. Intuitively, constants will be names that refer to objects, predicates will be functions from objects to truth values, and functions will take objects to objects. To take an everyday example, if the objects in consideration are people, then we might take ‘a’ to be a constant referring to a person named Alex, ’T’ to be a predicate representing ‘is tall’, and ‘f’ to be a function representing “the father of”. So T(a) is either true or false, while f(a) doesn’t have a truth value (it just refers to another object). But T(f(a)) does have a truth value, because it represents the sentence “Alex’s father is tall.”

Next, we define well-formed formulas. This process is more complicated than it was for propositional logic, because we have more types of things than we did before, but it’s not too bad. We start by defining a “term”. The set of terms is inductively generated by the following scheme: All constants and variables are terms, and any function of terms is itself a term. Intuitively, the set of terms is the set of objects that our language is able to “point at”.

With the concept of terms in hand, we can define WFFs through a similar inductive scheme: Any predicate of terms is a WFF. And for any WFFs F and F’, (F∧F’), (F∨F’), (¬F), (F→F’), ∀x F, ∃x F are all WFFs. The details of this construction are not actually that important, I just think it’s nice how you can generate all valid first order formulas from fairly simple rules.

Good! We have an alphabet, a grammar, and now all we need from our logic is a set of inference rules. It turns out that this set is just going to be the inference rules from propositional logic, plus some new ones:

Quantifier elimination: From ∀x P(x), derive P(t) (for any term t and predicate P)
Quantifier introduction: From P(t), derive ∃x P(x) (for any term t and predicate P)

That’s it, we’ve defined first order logic! Now let’s talk about a first order language. Just like before, the language is just obtained by filling in the blanks left open by our logic. So for instance, we might choose the following language:

Constants: a
Functions: f
Predicates: none

In specifying our function, we have to say exactly what type of function it is. Functions can take as inputs a single object (“the father of”) or multiple objects (“the nearest common ancestor of”), and this will make a difference to how they are treated. So for simplicity, let’s say that our function f just takes in a single object.

A first order theory will simply be a language equipped with some set of axioms. Using our language above, we might have as axioms:

  1. ∀x (f(x) ≠ x)
  2. ∀x (f(x) ≠ a)

In plain English, we’re saying that f never takes any object to itself or to a.

And lastly, we get to the models of a first order theory. There’s an interesting difference between models here and models in propositional logic, which is that to specify a first order model, you need to first decide on the size of your set of objects (the size of the ‘universe’, as it’s usually called), and then find a consistent assignment of truth values to all propositions about objects in this universe.

So, for instance, we can start searching for models of our theory above by starting with models with one object, then two objects, then three, and so on. We’ll draw little diagrams below in which points represent objects, and the arrow represents the action of the function f on an object.

  1. No 1-element universe.
  2. No 2-element universe.
  3. 3-element universe.
  4. 4-element universes.
  5. And so on…

It’s a fun little exercise to go through these cases and see if you can figure out why there are no models of size 1 or 2, or why there is essentially only one model of size 3. Notice, by the way, that in some of the 4-element models, we have an object for which there is no explicit term! We can’t get there just using our constant and our functions. Of course, in this case we can still be clever with our quantifiers to talk about the Nameless One indirectly (for instance, in these models we can refer to the object that is not equal to a, f(a), or f(f(a))). But in general, the set of all things in the universe is not going to be the same as the set of things in the universe that we can name.
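If you’d rather let a computer do this casework, here is a minimal brute-force sketch in Python (my own illustration): for each universe size n, it tries every choice of the constant a and every function f from the universe to itself, and keeps the combinations satisfying both axioms.

    from itertools import product

    # Brute-force search for models of the theory with axioms
    #   ∀x (f(x) ≠ x)   and   ∀x (f(x) ≠ a)
    # over universes {0, 1, ..., n-1}.
    def models_of_size(n):
        universe = range(n)
        found = []
        for a in universe:
            # every function f: universe → universe, encoded as its tuple of outputs
            for f in product(universe, repeat=n):
                if all(f[x] != x for x in universe) and all(f[x] != a for x in universe):
                    found.append((a, f))
        return found

    for n in range(1, 5):
        print(n, len(models_of_size(n)))
    # prints 0 for n = 1 and n = 2, and nonzero counts for n = 3 and n = 4
    # (these are labeled models; the size-3 ones are all the same model up to relabeling)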

Here’s a puzzle for you: Can you make a first order sentence that says “I have exactly four objects”? That is, can you craft a sentence that, if added as an axiom to our theory, will rule out all models besides the ones that have exactly four elements?

(…)

(Think about it for a moment before moving on…)

(…)

Here’s how to do it for a universe with two objects (as well as a few other statements of interest). The case of four objects follows pretty quickly from this.

  • “I have at most two objects” = ∃x∃y∀z (z=x ∨ z=y)
  • “I have exactly two objects” = ∃x∃y (x≠y ∧ ∀z (z=x ∨ z=y))
  • “I have at least two objects” = ∃x∃y (x≠y)
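Following the same pattern, “I have exactly four objects” would be something like ∃x∃y∃z∃w (x≠y ∧ x≠z ∧ x≠w ∧ y≠z ∧ y≠w ∧ z≠w ∧ ∀v (v=x ∨ v=y ∨ v=z ∨ v=w)). (This is just one way of writing it; I’m spelling it out since the post leaves it as an exercise.)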

Another puzzle: How would you say “I have an infinity of objects”, ruling out all finite models?

(…)

(Think about it for a moment before moving on…)

(…)

This one is trickier. One way to do it is to introduce an infinite axiom schema: “I have at least n objects” for each n.
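Spelled out, the n-th axiom in the schema looks something like ∃x₁∃x₂…∃xₙ (x₁≠x₂ ∧ x₁≠x₃ ∧ … ∧ xₙ₋₁≠xₙ), asserting the existence of n pairwise distinct objects (I’m using subscripted variables here just to write the general case compactly).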

This brings up an interesting point: theories with infinitely many axioms are perfectly permissible for us. This is a choice that we make; we could perfectly well disallow them and end up with a different and weaker system of logic. How about infinitely long sentences? Are those allowed? No, not in any of the logics we’re talking about here. A logic in which infinite sentences are allowed is called an infinitary logic, and I don’t know too much about such systems (besides that propositional, first, and second order logics are not infinitary).

Okay… so how about infinitely long derivations? Are those allowed? No, we won’t allow those either. This one is easier to justify, because if infinite derivations were allowed, then we could prove any statement P simply by the following argument: “P, because P, because P, because P, because …”. Each step logically follows from the previous one, and in any finite system we’d eventually bottom out and realize that the proof has no basis, but in a naive infinite-proof system, we couldn’t see this.

Alright, one last puzzle. How would you form the sentence “I have a finite number of objects”? I.e., suppose you want to rule out all infinite models but keep the finite ones. How can you do it?

(…)

(Think about it for a moment before moving on…)

(…)

This one is especially tricky, because it turns out to be impossible! (Sorry about that.) You can prove, and we will prove in a little bit, that no first order axiom (or even infinite axiom schema) is in general capable of restricting us to finite models. We have run up against our first interesting expressive limitation of first-order logic!

Okay, let’s now revisit the question of completeness that I brought up earlier. Remember, a logic is complete if for any theory in that logic, the theory’s semantic implications are the same as its syntactic implications (all necessary truths are provable). Do you think that first order logic is complete?

The answer: Yes! Kurt Gödel proved that it is in his 1929 doctoral dissertation. Anything that is true in all models of a first order theory, can be proven from the axioms of that theory (and vice versa). This is a really nice feature to have in a logic. It’s exactly the type of thing that David Hilbert was hoping would be true of mathematics in general. (“We must know. We will know!”) But this hope was dashed by the same Kurt Gödel as above, in his infamous incompleteness theorems.

There will be a lot more to say about that in the future, but I’ll stop this post for now. Next time, we’ll harness the power of first order logic to create a first order model of number theory! This will give us a chance to apply some powerful results in mathematical logic, and to discover surprising truths about the logic’s limitations.

Thinking in first order: Are the natural numbers countable?

Here’s a weird fact about first-order logic: there is no first-order theory that has as a model the natural numbers and only the natural numbers. Any first-order theory talking about N can only say things that are also consistent with weird other systems that include uncountable infinities of nonstandard numbers. Here’s the proof:

Take first order Peano arithmetic (PA). We know that the natural numbers N are a model of PA, because all the axioms of PA are true of N. Now add a constant c to your language, and adjoin an infinite axiom schema: ‘c > 0’, ‘c > S0’, ‘c > SS0’, and so on. Call this new theory PA(c).

The natural numbers are clearly no longer a model of PA(c), because PA(c) says that there’s a number that’s larger than all natural numbers, which is false in N. But we can prove that PA(c) still has a model! We prove this using the Compactness Theorem, which tells us that a theory has a model if every finite subset of its axioms has a model. Consider any finite subset of the axioms of PA(c). In any such subset, there are only finitely many statements of the form c > X, so c is only required to be larger than finitely many numbers. But then this is always going to be satisfied by N! For any finite collection of natural numbers, you can always find a natural number larger than all of them, and we can simply interpret c as such a number. So N, with c suitably interpreted, is a model of every finite subset of the axioms of PA(c). But then, by compactness, since every finite subset of the axioms of PA(c) has a model, PA(c) has a model!

Final step in the proof: PA(c) has a model, and this model is definitely not the natural numbers. Call it a nonstandard model. But now note that PA(c) was obtained from PA just by adding axioms. Adding axioms never increases the set of models; it only narrows them down. So if the nonstandard model is a model of PA(c), then it must also be a model of PA! And there it is, we’ve proved that first order Peano arithmetic has models besides the natural numbers.

Notice, also, that it didn’t matter that we started with first order Peano arithmetic. We could have started with any first order theory that has N as a model, added the constant c, and adjoined the infinite axiom schema saying that c is larger than every natural number. So what we’ve actually proven is that no first order theory can “pin down” the natural numbers: any first order theory that has N as a model also has weird extra models containing numbers greater than all natural numbers. Furthermore, you can run the exact same type of proof again, adjoining a new constant that’s larger than all natural numbers and also larger than c, and by compactness show that the result has a model. So any first order theory that has N as a model also has a model containing the natural numbers, plus a number larger than all natural numbers, plus a number that’s larger than even that. And you can keep doing this, adding an uncountable infinity of such constants, and end up proving that if your favored theory has N as a model, then it also has a model whose size is an uncountable infinity. In fact, any first order theory that has N as a model also has models of every infinite cardinality.

This is super wild, and really important. Why does it matter that we can’t pin down the natural numbers using any first order theory? Well, think about the set of statements that are true of N, but false of these nonstandard models. These include “there is no number larger than all natural numbers” and “there is a countable infinity of natural numbers”. These statements are not true in all models of any first order theory of N. And if they’re false in some models, then they can’t be proven from the axioms of the theory! (If the theory could prove a statement that was false in one of its models, then the model wouldn’t have been a model of the theory in the first place; models are consistent assignments of truth values to the sentences of the theory.)

In other words, no first order theory of the natural numbers can prove that the natural numbers are countable, or that no number is greater than all natural numbers. If you were a being that could only think in first-order sentences, then you would be unable to conclude these basic facts. To say these things, you need to go up a level to second-order logic, and quantify over properties in addition to objects. Even weirder, if you could only think in first-order logic, you wouldn’t even be able to talk about the natural numbers. No matter how hard you tried to say what you meant by the natural numbers, you would always fail to pin down just the natural numbers. There’d always be room for ambiguity, and everything you said and thought could be equally well interpreted as being about some bizarre non-standard model.

Extending this one step further, there’s a theorem in mathematical logic called the Löwenheim-Skolem Theorem. This theorem generalizes what we showed above about the natural numbers: any first order theory that has a countably infinite model also has models of every infinite cardinality. So no first order theory can prove a statement that is true of countably infinite sets but false of uncountably infinite sets. And actually, the theorem is even stronger than that: any theory that has a model of any infinite cardinality must also have models of all infinite cardinalities! So, for instance, any first order theory of the real numbers has a model that is the size of N! To first order logic, there is no expressible difference between the different infinite cardinalities. The statements “the size of this set is countable infinity” or “the size of this set is uncountable infinity” can’t be said in any first-order language, as doing so would cut out the models of other cardinalities, which the Löwenheim-Skolem Theorem tells us can’t happen.

The Central Paradox of Statistical Mechanics: The Problem of The Past

This is the third part in a three-part series on the foundations of statistical mechanics.

  1. The Necessity of Statistical Mechanics for Getting Macro From Micro
  2. Is The Fundamental Postulate of Statistical Mechanics A Priori?
  3. The Central Paradox of Statistical Mechanics: The Problem of The Past

— — —

What I’ve argued for so far is the following set of claims:

  1. To successfully predict the behavior of macroscopic systems, we need something above and beyond the microphysical laws.
  2. This extra thing we need is the fundamental postulate of statistical mechanics, which assigns a uniform distribution over the region of phase space consistent with what you know about the system. This postulate allows us to prove all the things we want to say about the future, such as “gases expand”, “ice cubes melt”, “people age” and so on.
  3. This fundamental postulate is not justifiable on a priori grounds, as it is fundamentally an empirical claim about how frequently different micro states pop up in our universe. Different initial conditions give rise to different such frequencies, so that a claim to a priori access to the fundamental postulate is a claim to a priori access to the precise details of the initial condition of the universe.

 There’s just one problem with all this… apply our postulate to the past, and everything breaks.

 Notice that I said that the fundamental postulate allows us to prove all the things we want to say about the future. That wording was chosen carefully. What happens if you try to apply the microphysical laws + the fundamental postulate to predict the past of some macroscopic system? It turns out that all hell breaks loose. Gases spontaneously contract, ice cubes form from puddles of water, and brains pop out of thermal equilibrium.

Why does this happen? Very simply, we start with two fully time reversible premises (the microphysical laws and the fundamental postulate). We apply them to our present knowledge of some state, the description of which does not specify a special time direction. So any conclusion we get must as a matter of logic be time reversible as well! You can’t start with premises that treat the past as the mirror image of the future, and using just the rules of logical equivalence derive a conclusion that treats the past as fundamentally different from the future. And what this means is that if you conclude that entropy increases towards the future, then you must also conclude that entropy increases towards the past. Which is to say that we came from a higher entropy state, and ultimately (over a long enough time scale and insofar as you think that our universe is headed to thermal equilibrium) from thermal equilibrium.

Let’s flesh this argument out a little more. Consider a half-melted ice cube sitting in the sun. The microphysical laws + the fundamental postulate tell us that the region of phase space consisting of states in which the ice cube is entirely melted is much much much larger than the region of phase space in which it is fully unmelted. So much larger, in fact, that it’s hard to express using ordinary English words. This is why we conclude that any trajectory through phase space that passes through the present state of the system (the half-melted cube) is almost certainly going to quickly move towards the regions of phase space in which the cube is fully melted. But for the exact same reason, if we look at the set of trajectories that pass through the present state of the system, the vast vast vast majority of them will have come from the fully-melted regions of phase space. And what this means is that the inevitable result of our calculation of the ice cube’s history will be that a few moments ago it was a puddle of water, and then it spontaneously solidified and formed into a half-melted ice cube.

This argument generalizes! What’s the most likely past history of you, according to statistical mechanics? It’s not that the solar system coalesced from a haze of gases strewn through space by a past supernova, such that a planet would form in the Goldilocks zone and develop life, which would then gradually evolve through natural selection to the point where you are sitting in whatever room you’re sitting in reading this post. This trajectory through phase space is enormously unlikely. The much much much more likely past trajectory of you through phase space is that a little while ago you were a bunch of particles dispersed through a universe at thermal equilibrium, which happened to spontaneously coalesce into a brain that has time to register a few moments of experience before dissipating back into chaos. “What about all of my memories of the past?” you say. As it happens, the most likely explanation of these memories is not that they are veridical records of real happenings in the universe, but that they are illusions, manufactured from randomness.

Basically, if you buy everything I’ve argued in the first two parts, then you are forced to conclude that the universe is most likely near thermal equilibrium, with your current experience of it arising as a spontaneous dip in entropy, just enough to produce a conscious brain but no more. There are at least two big problems with this view.

Problem 1: This conclusion is, we think, extremely empirically wrong! The ice cube in front of you didn’t spontaneously form from a puddle of water, uncracked eggs weren’t a moment ago scrambled, and your memories are to some degree veridical. If you really believe that you are merely a spontaneous dip in entropy, then your prediction for the next minute will be the gradual dissolution of your brain and loss of consciousness. Now, wait a minute and see if this happens. Still here? Good!

Problem 2: The conclusion cannot be simultaneously believed and justified. If you think that you’re a thermal fluctuation, then you shouldn’t credit any of your memories as telling you anything about the world. But then your whole justification for coming to the conclusion in the first place (the experiments that led us to conclude that physics is time-reversible and that the fundamental postulate is true) is undermined! Either you believe it without justification, or you don’t believe it despite justification. Said another way, no reflective equilibrium exists at an entropy minimum. David Albert calls this peculiar epistemic state cognitively unstable, as it’s not clear where exactly it should leave you.

Reflect for a moment on how strange of a situation we are in here. Starting from very basic observations of the world, involving its time-reversibility on the micro scale and the increase in entropy of systems, we see that we are inevitably led to the conclusion that we are almost certainly thermal fluctuations, brains popping out of the void. I promise you that no trick has been pulled here, this really is the state of the philosophy of statistical mechanics! The big issue is how to deal with this strange situation.

One approach is to say the following: Our problem is that our predictions work towards the future but not the past. So suppose that we simply add as a new fundamental postulate the proposition that long long ago the universe had an incredibly low entropy. That is, suppose that instead of just starting with the microphysical laws and the fundamental postulate of statistical mechanics, we added a third claim: the Past Hypothesis.

The Past Hypothesis should be understood as an augmentation of our Fundamental Postulate. Taken together, the two postulates say that our probability distribution over possible microstates should not be uniform over phase space. Instead, it should be what you get when you take the uniform distribution, and then condition on the distant past being extremely low entropy. This process of conditioning clearly preferences one direction of time over the other, and so the symmetry is broken.

 It’s worth reflecting for a moment on the strangeness of the epistemic status of the Past Hypothesis. It happens that we have over time accumulated a ton of observational evidence for the occurrence of the Big Bang. But none of this evidence has anything to do with our reasons for accepting the Past Hypothesis. If we buy the whole line of argument so far, our conclusion that something like a Big Bang occurred becomes something that we are forced to believe for deep logical reasons, on pain of cognitive instability and self-undermining belief. Anybody that denies that the Big Bang (or some similar enormously low-entropy past state) occurred has to contend with their view collapsing in self-contradiction upon observing the physical laws!

Is The Fundamental Postulate of Statistical Mechanics A Priori?

This is the second part in a three-part series on the foundations of statistical mechanics.

  1. The Necessity of Statistical Mechanics for Getting Macro From Micro
  2. Is The Fundamental Postulate of Statistical Mechanics A Priori?
  3. The Central Paradox of Statistical Mechanics: The Problem of The Past

— — —

The fantastic empirical success of the fundamental postulate gives us a great amount of assurance that the postulate is a good one. But it’s worth asking whether that’s the only reason that we should like this postulate, or if it has some solid a priori justification. The basic principle of “when you’re unsure, just distribute credences evenly over phase space” certainly strikes many people as highly intuitive and justifiable on a priori grounds. But there are some huge problems with this way of thinking, one of which I’ve already hinted at. Here’s a thought experiment that illustrates the problem.

There is a factory in your town that produces cubic boxes. All you know about this factory is that the boxes that they produce all have a volume between 0 m³ and 1 m³. You are going to be delivered a box produced by this factory, and are asked to represent your state of knowledge about the box with a probability distribution. What distribution should you use?

Suppose you say “I should be indifferent over all the possible boxes. So I should have a uniform distribution over the volumes from 0 m³ to 1 m³.” This might seem reasonable at first blush. But what if somebody else said “Yes, you should be indifferent over all the possible boxes, but actually the uniform distribution should be over the side lengths from 0 m to 1 m, not volumes.” This would be a very different probability distribution! For example, if the probability that the side length is greater than 0.5 m is 50%, then the probability that the volume is greater than (0.5)³ m³ = 1/8 m³ is also 50%! Uniform over side length is not the same as uniform over volume (or surface area, for that matter). Now, how do you choose between a uniform distribution over volumes and a uniform distribution over side lengths? After all, you know nothing about the process that the factory is using to produce the boxes, and whether it is based off of volume or side length (or something else); all you know is that all boxes are between 0 m³ and 1 m³.
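Here is a minimal numerical sketch of the difference, in Python (my own illustration, with a made-up sample size): under the uniform-over-side-length distribution the box’s volume exceeds 1/8 m³ about half the time, while under the uniform-over-volume distribution it does so about 7/8 of the time.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000

    # "Indifferent" over side lengths: side ~ Uniform(0 m, 1 m)
    sides = rng.uniform(0.0, 1.0, n)
    volumes_from_sides = sides ** 3

    # "Indifferent" over volumes: volume ~ Uniform(0 m^3, 1 m^3)
    volumes = rng.uniform(0.0, 1.0, n)

    # Probability that the volume exceeds 1/8 m^3 under each choice of "indifference":
    print((volumes_from_sides > 1/8).mean())  # ≈ 0.5
    print((volumes > 1/8).mean())             # ≈ 0.875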

The lesson of this thought experiment is that the statement we started with (“I should be indifferent over all possible boxes”) was actually not even well-defined. There’s not just one unique measure over a continuous space, and in general the notion that “all possibilities are equally likely” is highly language-dependent.

The exact same applies to phase space, as position and momentum are continuous quantities. Imagine that somebody instead of talking about phase space, only talked about “craze space”, in which all positions become positions cubed, and all momentum values become natural logs of momentum. This space would still contain all possible microstates of your system. What’s more, the fundamental laws of nature could be rewritten in a way that uses only craze space quantities, not phase space quantities. And needless to say, being indifferent over phase space would not be the same as being indifferent over craze space.

Spend enough time looking at attempts to justify a unique interpretation of the statement “All states are equally likely”, when your space of states is a continuous infinity, and you’ll realize that all such attempts are deeply dependent upon arbitrary choices of language. The maximum information entropy probability distribution is afflicted with the exact same problem, because the entropy of your distribution is going to depend on the language you’re using to describe it! The entropy of a distribution in phase space is NOT the same as the entropy of the equivalent distribution transformed to craze space.

Let’s summarize this section. If somebody tells you that the fundamental postulate says that all microstates compatible with what you know about the macroscopic features of your system are equally likely, the proper response is something like “Equally likely? That sounds like you’re talking about a uniform distribution. But uniform over what? Oh, position and momentum? Well, why’d you make that choice?” And if they point out that the laws of physics are expressed in terms of position and momentum, you just disagree and say “No, actually I prefer writing the laws of physics in terms of position cubed and log momentum!” (Substitute in any choice of monotonic functions).

If they object on the grounds of simplicity, point out that position and momentum are only simple as measured from a standpoint that takes them to be the fundamental concepts, and that from your perspective, getting position and momentum requires applying complicated inverse transformations to your monotonic transformation of the chosen coordinates.

And if they object on the grounds of naturalness, the right response is probably something like “Tell me more about this ’naturalness’. How do you know what’s natural or unnatural? It seems to me that your choice of what physical concepts count as natural is a manifestation of deep selection pressures that push any beings whose survival depends on modeling and manipulating their surroundings towards forming an empirically accurate model of the macroscopic world. So that when you say that position is more natural than log(position), what I hear is that the fundamental postulate is a very useful tool. And you can’t use the naturalness of the choice of position to justify the fundamental postulate, when your perception of the naturalness of position is the result of the empirical success of the fundamental postulate!”

In my judgement, none of the a priori arguments work, and fundamentally the reason is that the fundamental postulate is an empirical claim. There’s no a priori principle of rationality that tells us that boxes of gases tend to equilibrate, because you can construct a universe whose initial microstate is such that its entire history is one of entropy radically decreasing, gases concentrating, eggs unscrambling, ice cubes unmelting, and so on. Why is this possible? Because it’s consistent with the microphysical laws that the universe started in an enormously low entropy configuration, so it’s gotta also be consistent with the microphysical laws for the entire universe to spend its entire lifetime decreasing in entropy. The general principle is: If you believe that something is physically possible, then you should believe its time-inverse is possible as well.

Let’s pause and take stock. What I’ve argued for so far is the following set of claims:

  1. To successfully predict the behavior of macroscopic systems, we need something above and beyond the microphysical laws.
  2. This extra thing we need is the fundamental postulate of statistical mechanics, which assigns a uniform distribution over the region of phase space consistent with what you know about the system. This postulate allows us to prove all the things we want to say about the future, such as “gases expand”, “ice cubes melt”, “people age” and so on.
  3. This fundamental postulate is not justifiable on a priori grounds, as it is fundamentally an empirical claim about how frequently different microstates pop up in our universe. Different initial conditions give rise to different such frequencies, so that a claim to a priori access to the fundamental postulate is a claim to a priori access to the precise details of the initial condition of the universe.

There’s just one problem with all this… apply our postulate to the past, and everything breaks.

Up next: Why does statistical mechanics give crazy answers about the past? Where did we go wrong?

The Necessity of Statistical Mechanics for Getting Macro From Micro

This is the first part in a three-part series on the foundations of statistical mechanics.

  1. The Necessity of Statistical Mechanics for Getting Macro From Micro
  2. Is The Fundamental Postulate of Statistical Mechanics A Priori?
  3. The Central Paradox of Statistical Mechanics: The Problem of The Past

— — —

Let’s start this out with a thought experiment. Imagine that you have access to the exact fundamental laws of physics. Suppose further that you have unlimited computing power, for instance, you have an oracle that can instantly complete any computable task. What then do you know about the world?

The tempting answer: Everything! But of course, upon further consideration, you are missing a crucial ingredient: the initial conditions of the universe. The laws themselves aren’t enough to tell you about your universe, as many different universes are compatible with the laws. By specifying the state of the universe at any one time (which incidentally does not have to be an “initial” time), though, you should be able to narrow down this set of compatible universes. So let’s amend our question:

Suppose that you have unlimited computing power, that you know the exact microphysical laws, and that you know the state of the universe at some moment. Then what do you know about the world?

The answer is: It depends! What exactly do you know about the state of the universe? Do you know its exact microstate? As in, do you know the position and momentum of every single particle in the universe? If so, then yes, the entire past and future of the universe are accessible to you. But suppose that instead of knowing the exact microstate, you only have access to a macroscopic description of the universe. For example, maybe you have a temperature map as well as a particle density function over the universe. Or perhaps you know the exact states of some particles, just not all of them.

Well, if you only have access to the macrostate of the system (which, notice, is the epistemic situation that we find ourselves in, being that full access to the exact microstate of the universe is as technologically remote as can be), then it should be clear that you can’t specify the exact microstate at all other times. This is nothing too surprising or interesting… starting with imperfect knowledge you will not arrive at perfect knowledge. But we might hope that in the absence of a full description of the microstate of the universe at all other times, you could at least give a detailed macroscopic description of the universe at other times.

That is, here’s what seems like a reasonable expectation: If I had infinite computational power, knew the exact microphysical laws, and knew, say, that a closed box was occupied by a cloud of noninteracting gas in its corner, then I should be able to draw the conclusion that “The gas will disperse.” Or, if I knew that an ice cube was sitting outdoors on a table in the sun, then I should be able to apply my knowledge of microphysics to conclude that “The ice cube will melt”. And we’d hope that in addition to being able to make statements like these, we’d also be able to produce precise predictions for how long it would take for the gas to become uniformly distributed over the box, or for how long it would take for the ice cube to melt.

Here is the interesting and surprising bit. It turns out that this is in principle impossible to do. Just the exact microphysical laws and an infinity of computing power is not enough to do the job! In fact, the microphysical laws will in general tell us almost nothing about the future evolution or past history of macroscopic systems!

Take this in for a moment. You might not believe me (especially if you’re a physicist). For one thing, we don’t know the exact form of the microphysical laws. It would seem that such a bold statement about their insufficiencies would require us to at least first know what they are, right? No, it turns out that the statement that microphysics is far too weak to tell us about the behavior of macroscopic systems holds for an enormously large class of possible laws of physics, a class that we are very sure that our universe belongs to.

Let’s prove this. We start out with the following observation that will be familiar to physicists: the microphysical laws appear to be time-reversible. That is, it appears to be the case that for every possible evolution of a system compatible with the laws of physics, the time-reverse of that evolution (obtained by simply reversing the trajectories of all particles) is also perfectly compatible with the laws of physics.*

This is surprising! Doesn’t it seem like there are trajectories that are physically possible for particles to take, such that their time reverse is physically impossible? Doesn’t it seem like classical mechanics would say that a ball sitting on the ground couldn’t suddenly bounce up to your hand? An egg unscramble? A gas collect in the corner of a room? The answer to all of the above is no. Classical mechanics, and fundamental physics in general, admits the possibilities of all these things. A fun puzzle for you is to think about why the first example (the ball initially at rest on the ground bouncing up higher and higher until it comes to rest in your hand) is not a violation of the conservation of energy.

Now here’s the argument: Suppose that you have a box that you know is filled with an ideal gas at equilibrium (uniformly spread through the volume). There are many many (infinitely many) microstates that are compatible with this description. We can conclusively say that in 15 minutes the gas will still be dispersed only if all of these microstates, when evolved forward 15 minutes, end up dispersed.

But, and here’s the crucial step, we also know that there exist very peculiar states (such as the macrostate in which all the gas particles have come together to form a perfect statuette of Michael Jackson) such that these states will in 15 minutes evolve to the dispersed state. And by time reversibility, this tells us that there is another perfectly valid history of the gas that starts uniformly dispersed and evolves over 15 minutes into a perfect statuette of Michael Jackson. That is, if we believe that complicated configurations of gases disperse, and believe that physics is time-reversible, then we must also believe that there are microstates compatible with dispersed states of gas that will in the next moment coalesce into some complicated configuration.

  1. A collection of gas shaped exactly like Michael Jackson will disperse uniformly across its container.
  2. Physics is time reversible.
  3. So uniformly dispersed gases can coalesce into a collection of gases shaped exactly like Michael Jackson.

At this point you might be thinking “yeah, sure, microphysics doesn’t in principle rule out the possibility that a uniformly dispersed gas will coalesce into Michael Jackson, or any other crazy configuration. But who cares? It’s so incredibly unlikely!” To which the response is: Yes, exactly, it’s extremely unlikely. But nothing in the microphysical laws says this! Look as hard as you can at the laws of motion; you will not find a probability distribution over the likelihood of the different microstates compatible with a given macrostate. And indeed, different initial conditions of the universe will give different such frequency distributions! To make any statements about the relative likelihood of some microstates over others, you need some principle above and beyond the microphysical laws.

To summarize. All that microphysics + infinite computing power allows you to say about a macrostate is the following: Here are all the microstates that are compatible with that macrostate, and here are all the past and future histories of each of these microstates. And given time reversibility, these future histories cover an enormously diverse set of predictions about the future, from “the gas will disperse” to “the gas will form into a statuette of Michael Jackson”. To get reasonable predictions about how the world will actually behave, we need some other principle, a principle that allows us to disregard these “perverse” microstates. And microphysics contains no such principle.

Statistical mechanics is thus the study of the necessary augmentation to a fundamental theory of physics that allows us to make predictions about the world, given that we are not in the position to know its exact microstate. This necessary augmentation is known as the fundamental postulate of statistical mechanics, and it takes the form of a probability distribution over microstates. Some people describe the postulate as saying “all microstates being equally likely”, but that phrasing is a big mistake, as the sentence “all states are equally likely” is not well defined over a continuous set of states. (More on that in a bit.) To really understand the fundamental postulate, we have to introduce the notion of phase space.

The phase space for a system is a mathematical space in which every point represents a full specification of the positions and momenta of all particles in the system. So, for example, a system consisting of 1000 classical particles swimming around in an infinite universe would have 6000 degrees of freedom (three position coordinates and three momentum coordinates per particle). Each of these degrees of freedom is isomorphic to the real numbers. So phase space for this system must be ℝ^6000, and a point in phase space is a specification of the values of all 6000 degrees of freedom. In general, for N classical particles, phase space is ℝ^6N.
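As a toy illustration in Python (my own, with made-up random numbers), a microstate of N classical particles can be stored as a single point in ℝ^6N, and the time-reverse of a microstate, used in the argument above, is obtained by simply flipping the momenta:

    import numpy as np

    N = 1000                                  # number of classical particles
    rng = np.random.default_rng(0)

    # One point in phase space: 3 position + 3 momentum coordinates per particle.
    positions = rng.normal(size=(N, 3))
    momenta = rng.normal(size=(N, 3))
    microstate = np.concatenate([positions.ravel(), momenta.ravel()])
    print(microstate.shape)                   # (6000,) — a point in R^6000

    # The time-reverse of this microstate: same positions, reversed momenta.
    time_reversed = np.concatenate([positions.ravel(), -momenta.ravel()])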

With the concept of phase space in hand, we can define the fundamental postulate of statistical mechanics. This is: the probability distribution over microstates compatible with a given macrostate is uniform over the corresponding volume of phase space.
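Written out (in my own notation, which the post itself doesn’t use), the postulate says: if Γ_M is the region of phase space compatible with your macrostate M, then the probability density over microstates x is p(x) = 1/Vol(Γ_M) for x in Γ_M, and 0 elsewhere, where Vol denotes phase-space volume.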

It turns out that if you just measure the volume of the “perverse states” in phase space, you find that they make up approximately 0% of the volume of compatible microstates in phase space. This of course allows us to say of perverse states, “Sure they’re there, and technically it’s possible that my system is in such a state, but it’s so incredibly unlikely that it makes virtually no impact on my prediction of the future behavior of my system.” And indeed, when you start going through the math and seeing the way that systems most likely evolve given the fundamental postulate, you see that the predictions you get match beautifully with our observations of nature.

Next time: What is the epistemic status of the fundamental postulate? Do we have good a priori reasons to believe it?

— — —

* There are some subtleties here. For one, we think that there actually is a very small time asymmetry in the weak nuclear force. And some collapse interpretations of quantum mechanics have the collapse of the wave function as an irreversible process, although Everettian quantum mechanics denies this. For the moment, let’s disregard all of that. The time asymmetry in the weak nuclear force is not going to have any relevant effect on the proof made here, besides making it uglier and more complicated. What we need is technically not exact time-reversibility, but very-approximate time-reversibility. And that we have. Collapsing wave functions are a more troubling business, and are a genuine way out of the argument made in this post.

A Cognitive Instability Puzzle, Part 2

This is a follow-up to this previous post, in which I present three unusual cases of belief updating. Read it before you read this.

I find these cases very puzzling, and I don’t have a definite conclusion for any of them. They share some deep similarities. Let’s break all of them down into their basic logical structure:

Joe
Joe initially believes in classical logic and is certain of some other stuff, call it X.
An argument A exists that concludes that X can’t be true if classical logic is true.
If Joe believes classical logic, then he believes A.
If Joe believes intuitionist logic, then he doesn’t believe A.

Karl
Karl initially believes in God and is certain of some other stuff about evil, call it E.
An argument A exists that concludes that God can’t exist if E is true.
If Karl believes in God, then he believes A.
If Karl doesn’t believe in God, then he doesn’t believe A.

Tommy
Tommy initially believes in her brain’s reliability and is certain of some other stuff about her experiences, call it Q.
An argument A exists that concludes that her brain can’t be reliable if Q is true.
If Tommy believes in her brain’s reliability, then she believes A.
If Tommy doesn’t believe in her brain’s reliability, then she doesn’t believe A.

First of all, note that all three of these cases are ones in which Bayesian reasoning won’t work. Joe is uncertain about the law of the excluded middle, without which you don’t have probability theory. Karl is uncertain about the meaning of the term ‘evil’, such that the same proposition switches from being truth-apt to being meaningless when he updates his beliefs. Probability theory doesn’t accommodate such variability in its language. And Tommy is entertaining a hypothesis according to which she no longer accepts any deductive or inductive logic, which is inconsistent with Bayesianism in an even more extreme way than in Joe’s case.

The more important general theme is that in all three cases, the following two things are true: 1) If an agent believes A, then they also believe an argument that concludes -A. 2) If that agent believes -A, then they don’t believe the argument that concludes -A.

Notice that if an agent initially doesn’t believe A, then they have no problem. They believe -A, and also happen to not believe that specific argument concluding -A, and that’s fine! There’s no instability or self-contradiction there whatsoever. So that’s really not where the issue lies.

The mystery is the following: If the only reason that an agent changed their mind from A to -A is the argument that they no longer buy, then what should they do? Once they’ve adopted the stance that A is false, should they stay there, reasoning that if they accept A they will be led to a contradiction? Or should they jump back to A, reasoning that the initial argument that led them there was flawed?

Said another way, should they evaluate the argument against A from their own standards, or from A’s standards? If they use their own standards, then they are in an unstable position, where they jump back and forth between A and -A. And if they always use A’s standards… well, then we get the conclusion that Tommy should believe herself to be a Boltzmann brain. In addition, if they are asked why they don’t believe A, then they find themselves in the weird position of giving an explanation in terms of an argument that they believe to be false!
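To see the instability concretely, here is a tiny toy sketch (entirely my own illustration, not anything from the cases themselves) of the “evaluate by your own current standards” policy, with conditions 1) and 2) above hard-coded: believing A makes the agent accept the argument and flip to -A, and believing -A makes them reject the argument and flip back.

```python
# Toy model of the "evaluate the argument by your current standards" policy.
# By conditions 1) and 2): the argument against A is accepted exactly when
# the agent currently believes A.

def accepts_argument_against_a(believes_a: bool) -> bool:
    return believes_a

def update(believes_a: bool) -> bool:
    # Accepting the argument forces the agent to drop A; rejecting it leaves
    # them with no reason to doubt A, so they return to believing A.
    return not believes_a if accepts_argument_against_a(believes_a) else True

belief = True  # start out believing A
history = [belief]
for _ in range(6):
    belief = update(belief)
    history.append(belief)

print(history)  # [True, False, True, False, True, False, True] -- endless oscillation
```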

I find myself believing that either Joe should be an intuitionist, Karl an atheist, and Tommy a radical skeptic, OR Joe should be a classical logician, Karl a theist, and Tommy a believer in her brain’s reliability. That is, there don’t seem to be any disanalogies between these three cases significant enough to warrant concluding one thing in one case and the opposite in another.

Logic, Theism, and Boltzmann Brains: On Cognitively Unstable Beliefs

First case

Propositional logic accepts that the proposition A ∨ -A (either A or not-A) is necessarily true. This is called the law of the excluded middle. Intuitionist logic differs in that it denies this axiom.

Suppose that Joe is a believer in propositional logic (but also reserves some credence for intuitionist logic). Joe also believes a set of other propositions, whose conjunction we’ll call X, and has total certainty in X.

One day Joe discovers that a contradiction can be derived from X, in a proof that uses the law of the excluded middle. Since Joe is certain that X is true, he knows that X isn’t the problem, and instead it must be the law of the excluded middle. So Joe rejects the law of the excluded middle and becomes an intuitionist.

The problem is, as an intuitionist, Joe now no longer accepts the validity of the argument that starts at X and concludes -X! Why? Because it uses the law of the excluded middle, which he doesn’t accept.

Should Joe believe in propositional logic or intuitionism?

Second case

Karl is a theist. He isn’t absolutely certain that theism is correct, but holds a majority of his credence in theism (and the rest in atheism). Karl is also 100% certain in the following claim: “If atheism is true, then the concept of ‘evil’ is meaningless”, and believes that logically valid arguments cannot be made using meaningless concepts.

One day somebody presents the problem of evil to Karl, and he sees it as a crushing objection to theism. He realizes that theism, plus some other beliefs about evil that he’s 100% confident in, leads to a contradiction. So since he can’t deny these other beliefs, he is led to atheism.

The problem is, as an atheist, Karl no longer accepts the validity of the argument that starts at theism and concludes atheism! Why? Because the argument relies on the concept of ‘evil’, which he is now certain is meaningless and thus cannot be used in logically valid arguments.

Should Karl be a theist or an atheist?

Third case

Tommy is a scientist, and she believes that her brain is reliable. By this, I mean that she trusts her ability to reason both deductively and inductively. However, she isn’t totally certain about this, and holds out a little credence for radical skepticism. She is also totally certain about the content of her experiences, though not its interpretation (i.e. if she sees red, she is 100% confident that she is experiencing red, although she isn’t necessarily certain about what in the external world is causing the experience).

One day Tommy discovers that reasoning deductively and inductively from her experiences leads her to a model of the world that entails that her brain is actually a quantum fluctuation blipping into existence outside the event horizon of a black hole. She realizes that this means that with overwhelmingly high probability, her brain is not reliable and is just producing random noise uncorrelated with reality.

The problem is, if Tommy believes that her brain is not reliable, then she can no longer accept the validity of the argument that led her to this position! Why? Well, she no longer trusts her ability to reason deductively or inductively. So she can’t accept any argument, let alone this particular one.

What should Tommy believe?

— — —

How are these three cases similar and different? If you think that Joe should be an intuitionist, or Karl an atheist, then should Tommy believe herself to be a black hole brain? Because it turns out that many cosmologists have found themselves to be in a situation analogous to Case 3! (Link.) I have my own thoughts on this, but I won’t share them for now.

Philosophers of religion are religious. Why?

In 2009, David Bourget and David Chalmers organized a massive survey of over 3000 professional philosophers, grad students, and undergrads, asking them questions about all things philosophical and compiling the results. The results are broken down by area of specialization, age, race, gender, and everything else you might be interested in.

Here’s a link to the paper, and here to a listing of all survey results.

This is basically my favorite philosophy paper to read, and I find myself going back to look at the results all the time. I’d love to see an updated version of this survey, done ten years later, to see how things have changed (if at all).

There’s a whole lot I could talk about regarding this paper, but today I’ll just focus on one really striking result. Take a look at the following table from the paper:

[Table from the paper: the questions answered most differently by specialists and non-specialists]

What’s shown are the questions that were answered most differently by specialists and non-specialists. At the very top of the list, with a discrepancy more than double the second highest, is the question of God’s existence. 86.78% of non-specialists said yes to atheism, while only 20.87% of philosophers of religion did. This is fascinating to me.

Here are two narratives one could construct to make sense of these results.

Narrative One

Philosophers who specialize in philosophy of religion probably select that specialization because they have a religious bias. A philosophically-minded devout Catholic is much more likely to go into philosophy of religion than, say, philosophy of language. And similarly, an atheistic philosopher would have less interest than a religious philosopher in studying philosophy of religion, given that they don’t even believe in the existence of the primary object of study. So the result of the survey is exactly what you’d expect from the selection bias inherent in the specialization.

Narrative Two

Philosophers, like everybody else, are vulnerable to a presumption in favor of the beliefs of their society. Academics in general are quite secular, and in many quarters religion is treated as a product of a bygone age. So it’s only natural that philosophers that haven’t looked too deeply into the issue come out believing basically what the high-status individuals in their social class believe. But philosophers of religion, on the other hand, are those that have actually looked most closely and carefully at the arguments for and against atheism, and this gives them the ability to transcend their cultural bias and recognize the truth of religion.

As an atheist, it’s perhaps not surprising that my immediate reaction to seeing this result was something like Narrative One. And upon reflection, that still seems like the more likely explanation to me. But to a religious person, I’m sure that Narrative Two would seem like the obvious explanation. This, by the way, is what should happen from a Bayesian perspective. If two theories equally well explain some data, then the one with a higher prior should receive a larger credence bump than the one with a lower prior (although their odds ratio should stay fixed).
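To make that Bayesian point concrete, here’s a minimal sketch with made-up numbers (my own illustration): two narratives that explain the survey result equally well, plus a catch-all alternative that explains it poorly. The higher-prior narrative gets the larger absolute boost in credence, but the odds ratio between the two narratives doesn’t budge.

```python
# Illustrative priors and likelihoods only -- nothing here is estimated from data.
priors      = {"narrative_one": 0.5, "narrative_two": 0.2, "other": 0.3}
likelihoods = {"narrative_one": 0.8, "narrative_two": 0.8, "other": 0.1}

unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(unnormalized.values())
posteriors = {h: unnormalized[h] / total for h in unnormalized}

print(posteriors["narrative_one"] - priors["narrative_one"])  # ~0.18 absolute gain
print(posteriors["narrative_two"] - priors["narrative_two"])  # ~0.07 absolute gain
print(priors["narrative_one"] / priors["narrative_two"],
      posteriors["narrative_one"] / posteriors["narrative_two"])  # odds ratio: 2.5 both times
```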

Ultimately, which of these stories is right? I don’t know. Perhaps both are right to some degree. But I think that it illustrates the difficulty of adjudicating questions of expertise. Accusations of bias are quite easy to make, and can be hard to actually get to the bottom of. That said, it’s definitely possible to evaluate the first narrative, just by empirically looking at the reasons that philosophers of religion entered the field. If somebody knows of such a study, please comment or send me a message! The results of a study like this could end up having a huge effect on my attitude towards questions of religion’s rationality.

Imagine that it turned out that most philosophers of religion were atheists when they entered the field, and only became religious after diving deep into the arguments. This is not what I’d expect to find, but if it were the case, it would serve as a super powerful argument against atheism for me.

Why do prediction markets work?

Is there a paradox in the continued existence of prediction markets? Recently I’ve been wondering this. Let me start with a little background for those that are unfamiliar with the concept of prediction markets.

Prediction markets are markets that allow you to bet on the outcomes of real-life events. This gives financial incentives to predict accurately, and as such the market price of a given bet reflects a kind of aggregate credence for that event occurring. There’s a whole bunch of results, theoretical and applied, that indicate that prediction markets serve to give robustly accurate probability estimates for real-world events.

Here’s a great paper by Robin Hanson about a political system based on prediction markets, named futarchy. Essentially, the idea is that voters determine a nation’s values, so as to generate some average national welfare metric, and then betting markets are used to decide policy. Some quotes:

On info-failures as a primary problem for democracy

According to many experts in economics and development, governments often choose policies that are “inefficient” in the sense that most everyone could expect to gain from other feasible policies. Many other kinds of experts also see existing policies as often clearly inferior to known alternatives.

If inferior policies would not have been adopted had most everyone known they are inferior, and if someone somewhere knew or could have learned that they are inferior, then we can blame inferior policies on a failure of our “info” institutions. By “info” here I just mean clues and analysis that should change our beliefs. Our info institutions are those within which we induce, express, and evaluate the acquiring and sharing of info. They include public relations teams, organized interest groups, news media, conversation forums, think tanks, universities, journals, elite committees, and state agencies. Inferior policies happen because our info institutions fail to induce people to acquire and share relevant info with properly-motivated decision makers.

[…]

Where might we find better info institutions? According to most experts in economics and finance, speculative markets are exemplary info institutions. That is, active speculative markets do very well at inducing people to acquire info, share it via trades, and collect that info into consensus prices that persuade wider audiences. This great success suggests that we should consider augmenting our political info institutions with speculative market institutions. That is, perhaps we should encourage people to create, trade in, and heed policy-relevant speculative markets, instead of discouraging such markets as we do today via anti-gambling laws.

Laying out the proposal

In futarchy, democracy would continue to say what we want, but betting markets would now say how to get it. That is, elected representatives would formally define and manage an after-the-fact measurement of national welfare, while market speculators would say which policies they expect to raise national welfare. The basic rule of government would be:

    • When a betting market clearly estimates that a proposed policy would increase expected national welfare, that proposal becomes law.

Futarchy is intended to be ideologically neutral; it could result in anything from an extreme socialism to an extreme minarchy, depending on what voters say they want, and on what speculators think would get it for them.

Futarchy seems promising if we accept the following three assumptions:

  • Democracies fail largely by not aggregating available information.
  • It is not that hard to tell rich happy nations from poor miserable ones.
  • Betting markets are our best known institution for aggregating information.

On the success of prediction markets

Betting markets, and speculative markets more generally, seem to do very well at aggregating information. To have a say in a speculative market, you have to “put your money where your mouth is.” Those who know they are not relevant experts shut up, and those who do not know this eventually lose their money, and then shut up. Speculative markets in essence offer to pay anyone who sees a bias in current market prices to come and correct that bias.

Speculative market estimates are not perfect. There seems to be a long-shot bias when there are high transaction costs, and perhaps also excess volatility in long term aggregate price movements. But such markets seem to do very well when compared to other institutions. For example, racetrack market odds improve on the predictions of racetrack experts, Florida orange juice commodity futures improve on government weather forecasts, betting markets beat opinion polls at predicting U.S. election results, and betting markets consistently beat Hewlett Packard official forecasts at predicting Hewlett Packard printer sales. In general, it is hard to find information that is not embodied in market prices.

On the possibility of manipulation of prediction markets

We want policy-related info institutions to resist manipulation, that is, to resist attempts to influence policy via distorted participation. Speculative markets do well here because they deal well with “noise trading,” that is, trading for reasons other than info about common asset values. When other traders can’t predict noise trading exactly, they compensate for its expected average by an opposite average trade, and compensate for its expected variation by trading more, and by working harder to find relevant info. Theory says that if trader risk-aversion is mild, and if more effort gives more info, then increased noise trading increases price accuracy. And in fact, the most accurate real speculative markets tend to be those with the most noise trading.

What do noise traders have to do with manipulators? Manipulators, who trade hoping to distort prices, are noise traders, since they trade for reasons other than asset value info. Thus adding manipulators to speculative markets doesn’t reduce average price accuracy. This has been verified in theory, in laboratory experiments, and in the field.

Futarchy remains for me one of the coolest and most exciting ideas I’ve heard in political philosophy, and prediction markets fascinate me. But for today, I have the following question about their feasibility:

If the only individuals able to consistently profit off the prediction market are the best predictors, then why wouldn’t the bottom 50% of predictors continuously drop out as they lose money on the market? If they do, then as the population of market participants dwindles you would end up with a small pool of really good predictors, each of whom sometimes gets lucky and makes money and sometimes is unlucky and loses some. On average, these people won’t be able to make money any more (as the ability to make money relies on the participation of inferior predictors in the market), so they’ll drop out as well.
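Here’s a very crude simulation of the first stage of that dynamic (my own toy model with made-up parameters, not a model of any real market): traders have differing skill, each round they pair off and bet against each other, the more skilled trader wins more often, and anyone who goes broke leaves.

```python
import random

random.seed(0)

# Each trader has a fixed "skill" and a bankroll; higher skill wins pairwise bets more often.
traders = [{"skill": random.random(), "bankroll": 100.0} for _ in range(1000)]
stake = 10.0

for _ in range(200):
    random.shuffle(traders)
    for a, b in zip(traders[0::2], traders[1::2]):
        p_a_wins = a["skill"] / (a["skill"] + b["skill"])  # toy win probability
        winner, loser = (a, b) if random.random() < p_a_wins else (b, a)
        transfer = min(stake, loser["bankroll"])
        winner["bankroll"] += transfer
        loser["bankroll"] -= transfer
    traders = [t for t in traders if t["bankroll"] > 0]  # the broke drop out

print(len(traders), sum(t["skill"] for t in traders) / len(traders))
# Typically only a fraction of the original 1000 remain, and their average skill
# is noticeably above the starting average of about 0.5.
```

The zero-sum pairwise betting and fixed stake are obviously gross simplifications; the point is just that the losers exit first and the surviving pool gets sharper.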

If this line of reasoning is right, then it seems like prediction markets should inevitably collapse as their user base drops out. Why, then, do sites like PredictIt keep functioning?

One possibility is that there’s something wrong with the argument. This is honestly where most of my credence lies; tons of smart people endorse the idea, and this seems like too obvious and central a flaw in the concept for them all to have missed. If this argument isn’t wrong, though, then we have an interesting phenomenon to explain.

One explanation that came to my mind is that the continued survival of prediction markets is only possible because of a bug in human psychology, namely, a lack of epistemic humility. People are on average overly confident in their beliefs, and so uninformed people will continue confidently betting on propositions, even when they are generally betting against individuals with greater expertise.

Is this really what’s going on? I’m not sure. I would be surprised if humans were actually overconfident enough to continue betting on a market that they are consistently losing money on. Maybe they’d find some way to rationalize dropping out of the market that doesn’t amount to them admitting “My opinion is not worth as much as I thought it was”, but surely they would eventually stop betting after enough losses (putting aside whatever impulses drive people to gamble on guaranteed negative-expected-value games until they lose all their money). On the other hand, it could be that the traffic of less-informed individuals does not consist of the same people betting over and over, but is instead a constant stream of new sheep coming in to be exploited by the more knowledgeable. What do you think? How do you explain this?

A Talmudic Probability Puzzle

Today we’ll take a little break from the more intense abstract math stuff I’ve been doing, and do a quick dive into a fun probabilistic puzzle I found on the internet.

Background for the puzzle: In Ancient Israel, there was a court of 23 wise men, known as the Sanhedrin, that tried important cases. If you were being tried for a crime by the Sanhedrin and a majority of them found you guilty, you were convicted. But there was an interesting twist on this! According to the Talmud (Tractate Sanhedrin: Folio 17a), if the Sanhedrin unanimously found you guilty, you were to be acquitted.

If the Sanhedrin unanimously find [the accused] guilty, he is acquitted. Why? — Because we have learned by tradition that sentence must be postponed till the morrow in hope of finding new points in favour of the defence. But this cannot be anticipated in this case.

Putting aside the dubious logic of this rule, it gives rise to an interesting probability puzzle with a counterintuitive answer. Imagine that an accused murderer has been brought before the Sanhedrin, and that the evidence is strong enough that no judge has any doubt in their mind about his guilt. Each judge obviously wants the murderer to be convicted, and would ordinarily vote guilty. But under this Talmudic rule, they have to worry about the prospect of all of them voting guilty and thereby letting him off scot-free!

So: If a probability p can be chosen such that each and every judge votes to convict with probability p, and to acquit with probability 1 – p, which p will give them the highest probability of ultimately convicting the guilty man?

Furthermore, imagine that the number of judges is not 23, but some arbitrarily high number. As the number of judges goes to infinity, what does p converge to?

I want you to think about this for a minute and test your intuitions before moving on.

 

(…)

 

 

(…)

 

 

(…)

 

So, it turns out that the optimal p for 23 judges is actually ≈ 75.3%. And as the number of judges goes to infinity? The optimal value of p converges to…

80%!

This was a big shock to me. I think the natural first thought is that when you have thousands and thousands of judges, you only need a minuscule chance for any one judge to vote ‘acquit’ in order to ensure a majority and prevent him from getting off free. So I initially guessed that p would be something like 99%, and would converge to 100% in the limit of infinite judges.

But this is wrong! And of the small sample of mathematically gifted friends I posed this question to, most guessed the same as me.

There’s clearly a balance going on between the risk of only a minority voting to convict and the risk of a unanimous vote to convict. For small p, the first of these is ~1 and the second is ~0, and for p ~ 1, the first is ~0 and the second ~1. It seems that we naturally overemphasize the danger of the minority vote and underemphasize the danger of the unanimous vote, which is why the intuitive guess of p ≈ 99% comes out far too high.
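If you want to check the numbers yourself, here’s a short brute-force sketch (my own, using the simple Binomial(n, p) model of the votes implied by the setup): the conviction probability is the chance of a strict guilty majority minus the chance of a unanimous guilty vote, and a grid search over p lands near 75% for 23 judges, creeping toward 80% as the number of judges grows.

```python
from math import comb

def conviction_probability(n: int, p: float) -> float:
    # Convict iff a strict majority votes guilty but the vote is not unanimous.
    majority = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(majority, n))

def optimal_p(n: int, grid_size: int = 2000) -> float:
    # Brute-force grid search over p in (0, 1).
    return max((i / grid_size for i in range(1, grid_size)),
               key=lambda p: conviction_probability(n, p))

for n in (23, 101, 501):
    print(n, round(optimal_p(n), 3))
# Gives roughly 0.75 for n = 23, with the optimum drifting toward 0.8 as n grows.
```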

Here are some plots of the various relevant values, for different numbers of judges:

[Plots of Pr(conviction) as a function of p, for 10, 23, 100, and 150 judges]

One thing to notice is that as the number of judges gets larger, the graph’s peak becomes more and more of a plateau. And in the limit of infinite judges, you can show that the graph is actually just a simple step function: Pr(conviction) = 0 if p < .5, and 1 if p > .5. This means that while yes, technically, 80% is the optimal value, you can do pretty much equally well by choosing any value of p greater than 50%.
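For what it’s worth, here’s a quick sketch of why the step function appears, using the same binomial model of the votes: if K is the number of guilty votes out of n judges, then Pr(conviction) = Pr(K > n/2) – p^n. For any fixed p strictly between 1/2 and 1, the law of large numbers pushes Pr(K > n/2) to 1 while p^n goes to 0, so the conviction probability tends to 1; for any fixed p below 1/2, the majority term itself tends to 0.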

My challenge to you is to come up with some justification for the value 80%. Good luck!