Thinking in first order: Are the natural numbers countable?

Here’s a weird fact about first-order logic: there is no first-order theory that has as a model the natural numbers and only the natural numbers. Any first-order theory talking about N can only say things that are also consistent with weird other systems that include uncountable infinities of nonstandard numbers. Here’s the proof:

Take first order Peano arithmetic (PA). We know that the natural numbers N are a model of PA, because all the axioms of PA are true of N. Now add a constant c to your language, and adjoin an infinite axiom schema: ‘c > 0’, ‘c > S0’, ‘c > SS0’, and so on. Call this new theory PA(c).

The natural numbers are clearly no longer a model of PA(c), because PA(c) says that there’s a number that’s larger than all natural numbers, which is false in N. But we can prove that PA(c) still has a model! We prove this using the Compactness Theorem, which tells us that a theory has a model if every finite subset of its axioms has a model. Consider any finite subset of the axioms of PA(c). Any such subset contains only finitely many statements of the form ‘c > X’, so it only requires c to be larger than finitely many numbers. And that is always satisfiable in N: for any finite collection of natural numbers, you can find a natural number larger than all of them and interpret c as that number. So N (with a suitable interpretation of c) is a model of every finite subset of the axioms of PA(c). But then, by compactness, since every finite subset of the axioms of PA(c) has a model, PA(c) has a model!
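To write the construction out explicitly (using n̄ as shorthand for the numeral SS…S0 with n occurrences of S, a notation I’m introducing just for this display):

\[
\mathrm{PA}(c) \;=\; \mathrm{PA} \,\cup\, \{\, c > \bar{n} \;:\; n \in \mathbb{N} \,\}.
\]

Any finite subset of PA(c) mentions only finitely many axioms of the form c > n̄; if m is the largest such n, then N with c interpreted as m + 1 satisfies that subset, and compactness does the rest.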

Final step in the proof: PA(c) has a model, and this model is definitely not the natural numbers. Call it a nonstandard model. But now note that PA(c) was obtained from PA just by adding axioms. Adding axioms never enlarges the class of models; it only narrows it down. So if the nonstandard model is a model of PA(c), then it must also be a model of PA! And there it is: we’ve proved that first order Peano arithmetic has models other than the natural numbers.

Notice, also, that it didn’t matter that we started with first order Peano arithmetic. We could have started with any first order theory that has N as a model, added the constant c, and adjoined the infinite axiom schema saying that c is larger than every natural number. So what we’ve actually proven is that no first order theory can “pin down” the natural numbers: any first order theory that has N as a model also has weird extra models containing numbers greater than all natural numbers. Furthermore, you can run the exact same type of proof again, adjoining a new constant that’s larger than all natural numbers and also larger than c, and by compactness show that the result has a model. So any first order theory that has N as a model also has a model containing the natural numbers, plus a number larger than all of them, plus a number larger than even that. And you can keep going: adjoin an uncountable infinity of new constants, each required to be distinct from all the others, and compactness (which only ever has to check finitely many axioms at a time) still guarantees a model. The upshot is that if your favored theory has N as a model, then it also has a model of uncountable size. And in fact, any first order theory that has N as a model has models of every infinite cardinality.

This is super wild, and really important. Why does it matter that we can’t pin down the natural numbers using any first order theory? Well, think about the set of statements that are true of N, but false of these nonstandard models. These include “there is no number larger than all natural numbers” and “there are a countable infinity of natural numbers”. These statements are not true in all models of any first order theory of N. And if they’re false in some models, then they can’t be proven from the axioms of the theory! (A model makes true every sentence that the theory proves; so if a sentence is false in even one model of the theory, the theory cannot prove it.)

In other words, no first order theory of the natural numbers can prove that the natural numbers are countable, or that no number is greater than all natural numbers. If you were a being that could only think in first-order sentences, then you would be unable to conclude these basic facts. To say these things, you need to go up a level to second-order logic, and quantify over properties in addition to objects. Even weirder, if you could only think in first-order logic, you wouldn’t even be able to talk about the natural numbers. No matter how hard you tried to say what you meant by the natural numbers, you would always fail to pin down just the natural numbers. There’d always be room for ambiguity, and everything you said and thought could be equally well interpreted as being about some bizarre non-standard model.

Extending this one step further, there’s a theorem in mathematical logic called the Löwenheim-Skolem Theorem. It generalizes what we showed above about the natural numbers: any first order theory (in a countable language, which covers everything we’ve been discussing) that has a countably infinite model also has models of every infinite cardinality. So no first order theory can prove a statement that is true of countably infinite sets but false of uncountably infinite sets. And actually, the theorem is even stronger than that: any such theory that has a model of any infinite cardinality must have models of all infinite cardinalities! So, for instance, any first order theory of the real numbers has a model that is the size of N! To first order logic, there is no expressible difference between the different infinite cardinalities. The statements “the size of this set is countable infinity” and “the size of this set is uncountable infinity” can’t be said in any first-order language, as doing so would cut out the models of other cardinalities, which the Löwenheim-Skolem Theorem tells us can’t happen.
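For reference, here is the precise statement being appealed to, combining the upward and downward versions for a theory T in a countable language (the case relevant to arithmetic):

\[
T \text{ has an infinite model} \;\Longrightarrow\; \text{for every infinite cardinal } \kappa,\; T \text{ has a model of cardinality exactly } \kappa.
\]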

The Central Paradox of Statistical Mechanics: The Problem of The Past

This is the third part in a three-part series on the foundations of statistical mechanics.

  1. The Necessity of Statistical Mechanics for Getting Macro From Micro
  2. Is The Fundamental Postulate of Statistical Mechanics A Priori?
  3. The Central Paradox of Statistical Mechanics: The Problem of The Past

— — —

What I’ve argued for so far is the following set of claims:

  1. To successfully predict the behavior of macroscopic systems, we need something above and beyond the microphysical laws.
  2. This extra thing we need is the fundamental postulate of statistical mechanics, which assigns a uniform distribution over the region of phase space consistent with what you know about the system. This postulate allows us to prove all the things we want to say about the future, such as “gases expand”, “ice cubes melt”, “people age” and so on.
  3. This fundamental postulate is not justifiable on a priori grounds, as it is fundamentally an empirical claim about how frequently different microstates pop up in our universe. Different initial conditions give rise to different such frequencies, so that a claim to a priori access to the fundamental postulate is a claim to a priori access to the precise details of the initial condition of the universe.

 There’s just one problem with all this… apply our postulate to the past, and everything breaks.

 Notice that I said that the fundamental postulate allows us to prove all the things we want to say about the future. That wording was chosen carefully. What happens if you try to apply the microphysical laws + the fundamental postulate to predict the past of some macroscopic system? It turns out that all hell breaks loose. Gases spontaneously contract, ice cubes form from puddles of water, and brains pop out of thermal equilibrium.

Why does this happen? Very simply, we start with two fully time-reversible premises (the microphysical laws and the fundamental postulate). We apply them to our present knowledge of some state, a description which does not itself pick out a special direction of time. So any conclusion we get must, as a matter of logic, be time-reversible as well! You can’t start with premises that treat the past as the mirror image of the future and, using only valid rules of inference, derive a conclusion that treats the past as fundamentally different from the future. And what this means is that if you conclude that entropy increases towards the future, then you must also conclude that entropy increases towards the past. Which is to say that we came from a higher entropy state, and ultimately (over a long enough time scale, and insofar as you think that our universe is headed to thermal equilibrium) from thermal equilibrium.

Let’s flesh this argument out a little more. Consider a half-melted ice cube sitting in the sun. The microphysical laws + the fundamental postulate tell us that the region of phase space consisting of states in which the ice cube is entirely melted is much much much larger than the region of phase space in which it is fully unmelted. So much larger, in fact, that it’s hard to express using ordinary English words. This is why we conclude that any trajectory through phase space that passes through the present state of the system (the half-melted cube) is almost certainly going to quickly move towards the regions of phase space in which the cube is fully melted. But for the exact same reason, if we look at the set of trajectories that pass through the present state of the system, the vast vast vast majority of them will have come from the fully-melted regions of phase space. And what this means is that the inevitable result of our calculation of the ice cube’s history will be that a few moments ago it was a puddle of water, and then it spontaneously solidified and formed into a half-melted ice cube.

This argument generalizes! What’s the most likely past history of you, according to statistical mechanics? It’s not that the solar system coalesced from a haze of gases strewn through space by a past supernova, such that a planet would form in the Goldilocks zone and develop life, which would then gradually evolve through natural selection to the point where you are sitting in whatever room you’re sitting in reading this post. This trajectory through phase space is enormously unlikely. The much much much more likely past trajectory of you through phase space is that a little while ago you were a bunch of particles dispersed through a universe at thermal equilibrium, which happened to spontaneously coalesce into a brain that has time to register a few moments of experience before dissipating back into chaos. “What about all of my memories of the past?” you say. As it happens, the most likely explanation of these memories is not that they are veridical records of real happenings in the universe, but that they are illusions, manufactured from randomness.

Basically, if you buy everything I’ve argued in the first two parts, then you are forced to conclude that the universe is most likely near thermal equilibrium, with your current experience of it arising as a spontaneous dip in entropy, just enough to produce a conscious brain but no more. There are at least two big problems with this view.

Problem 1: This conclusion is, we think, extremely empirically wrong! The ice cube in front of you didn’t spontaneously form from a puddle of water, uncracked eggs weren’t a moment ago scrambled, and your memories are to some degree veridical. If you really believe that you are merely a spontaneous dip in entropy, then your prediction for the next minute will be the gradual dissolution of your brain and loss of consciousness. Now, wait a minute and see if this happens. Still here? Good!

Problem 2: The conclusion cannot be simultaneously believed and justified. If you think that you’re a thermal fluctuation, then you shouldn’t credit any of your memories as telling you anything about the world. But then your whole justification for coming to the conclusion in the first place (the experiments that led us to conclude that physics is time-reversible and that the fundamental postulate is true) is undermined! Either you believe it without justification, or you don’t believe it despite the justification. Said another way, no reflective equilibrium exists at an entropy minimum. David Albert calls this peculiar epistemic state cognitively unstable, as it’s not clear where exactly it should leave you.

Reflect for a moment on how strange of a situation we are in here. Starting from very basic observations of the world, involving its time-reversibility on the micro scale and the increase in entropy of systems, we are inevitably led to the conclusion that we are almost certainly thermal fluctuations, brains popping out of the void. I promise you that no trick has been pulled here; this really is the state of the philosophy of statistical mechanics! The big issue is how to deal with this strange situation.

One approach is to say the following: our problem is that our predictions work towards the future but not the past. So suppose that we simply add as a new fundamental postulate the proposition that long, long ago the universe had incredibly low entropy. That is, suppose that instead of just starting with the microphysical laws and the fundamental postulate of statistical mechanics, we add a third claim: the Past Hypothesis.

The Past Hypothesis should be understood as an augmentation of our Fundamental Postulate. Taken together, the two postulates say that our probability distribution over possible microstates should not be uniform over phase space. Instead, it should be what you get when you take the uniform distribution, and then condition on the distant past being extremely low entropy. This process of conditioning clearly preferences one direction of time over the other, and so the symmetry is broken.
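One way to write the amended postulate down (the notation here is mine, not standard): with Γ_now the region of phase space compatible with your current macroscopic knowledge, Γ_PH the very-low-entropy region posited for the early universe, and φ₋T the backward evolution of a microstate to that early time,

\[
\rho(x) \;\propto\; \mathbf{1}\!\left[x \in \Gamma_{\text{now}}\right]\cdot \mathbf{1}\!\left[\varphi_{-T}(x) \in \Gamma_{\text{PH}}\right],
\]

i.e. the uniform distribution over currently-compatible microstates, restricted to those whose backward evolution passes through the low-entropy past. That second factor is what breaks the symmetry between the two directions of time.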

It’s worth reflecting for a moment on the strangeness of the epistemic status of the Past Hypothesis. It happens that we have over time accumulated a ton of observational evidence for the occurrence of the Big Bang. But none of this evidence has anything to do with our reasons for accepting the Past Hypothesis. If we buy the whole line of argument so far, our conclusion that something like a Big Bang occurred becomes something that we are forced to believe for deep logical reasons, on pain of cognitive instability and self-undermining belief. Anybody who denies that the Big Bang (or some similar enormously low-entropy past state) occurred has to contend with their view collapsing in self-contradiction upon observing the physical laws!

Is The Fundamental Postulate of Statistical Mechanics A Priori?

This is the second part in a three-part series on the foundations of statistical mechanics.

  1. The Necessity of Statistical Mechanics for Getting Macro From Micro
  2. Is The Fundamental Postulate of Statistical Mechanics A Priori?
  3. The Central Paradox of Statistical Mechanics: The Problem of The Past

— — —

The fantastic empirical success of the fundamental postulate gives us a great amount of assurance that the postulate is a good one. But it’s worth asking whether that’s the only reason we should like this postulate, or if it has some solid a priori justification. The basic principle of “when you’re unsure, just distribute credences evenly over phase space” certainly strikes many people as highly intuitive and justifiable on a priori grounds. But there are some huge problems with this way of thinking, one of which I’ve already hinted at. Here’s a thought experiment that illustrates the problem.

There is a factory in your town that produces cubic boxes. All you know about this factory is that the boxes that they produce all have a volume between 0 m³ and 1 m³. You are going to be delivered a box produced by this factory, and are asked to represent your state of knowledge about the box with a probability distribution. What distribution should you use?

Suppose you say “I should be indifferent over all the possible boxes. So I should have a uniform distribution over the volumes from 0 m³ to 1 m³.” This might seem reasonable at first blush. But what if somebody else said “Yes, you should be indifferent over all the possible boxes, but actually the uniform distribution should be over the side lengths from 0 m to 1 m, not volumes.” This would be a very different probability distribution! For example, under the uniform-over-side-lengths distribution, the probability that the side length is greater than 0.5 m is 50%, so the probability that the volume is greater than (0.5)³ = 1/8 m³ is also 50%; under the uniform-over-volumes distribution, the probability that the volume is greater than 1/8 m³ is 87.5%. Uniform over side length is not the same as uniform over volume (or surface area, for that matter). Now, how do you choose between a uniform distribution over volumes and a uniform distribution over side lengths? After all, you know nothing about the process that the factory is using to produce the boxes, and whether it is based off of volume or side length (or something else); all you know is that every box has a volume between 0 m³ and 1 m³.
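If you want to see the clash concretely, here’s a tiny Monte Carlo sketch of the two priors (my own illustration, not part of the original thought experiment):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Prior 1: side length uniform on [0, 1] m; the volume is side**3.
side = rng.uniform(0, 1, n)
vol_from_side = side**3

# Prior 2: volume uniform on [0, 1] m^3.
vol_uniform = rng.uniform(0, 1, n)

# The "same" indifference gives different answers to the same question,
# depending on which variable you spread it over.
print(np.mean(vol_from_side > 1/8))  # ~0.50 (volume > 1/8 iff side > 1/2)
print(np.mean(vol_uniform > 1/8))    # ~0.875
```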

The lesson of this thought experiment is that the statement we started with (“I should be indifferent over all possible boxes”) was actually not even well-defined. There’s not just one unique measure over a continuous space, and in general the notion that “all possibilities are equally likely” is highly language-dependent.

The exact same applies to phase space, as position and momentum are continuous quantities. Imagine that somebody instead of talking about phase space, only talked about “craze space”, in which all positions become positions cubed, and all momentum values become natural logs of momentum. This space would still contain all possible microstates of your system. What’s more, the fundamental laws of nature could be rewritten in a way that uses only craze space quantities, not phase space quantities. And needless to say, being indifferent over phase space would not be the same as being indifferent over craze space.

Spend enough time looking at attempts to justify a unique interpretation of the statement “All states are equally likely”, when your space of states is a continuous infinity, and you’ll realize that all such attempts are deeply dependent upon arbitrary choices of language. The maximum information entropy probability distribution is afflicted with the exact same problem, because the entropy of your distribution is going to depend on the language you’re using to describe it! The entropy of a distribution in phase space is NOT the same as the entropy of the equivalent distribution transformed to craze space.
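Here’s a quick numerical illustration of that last point (again a sketch of my own): the uniform distribution on [0, 1] has differential entropy 0, but re-expressing the very same distribution in a “craze” coordinate y = x³ changes the entropy, via the change-of-variables formula h(Y) = h(X) + E[log |dy/dx|].

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 1_000_000)  # "position", uniform on [0, 1]

# Differential entropy of the uniform distribution on [0, 1] is 0.
h_x = 0.0

# Describe the same distribution in the "craze" coordinate y = x**3.
# Change of variables for a monotonic map: h(Y) = h(X) + E[log |dy/dx|].
h_y = h_x + np.mean(np.log(3 * x**2))

print(h_x)  # 0.0
print(h_y)  # ~ -0.90 (analytically ln 3 - 2): same ignorance, different entropy
```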

Let’s summarize this section. If somebody tells you that the fundamental postulate says that all microstates compatible with what you know about the macroscopic features of your system are equally likely, the proper response is something like “Equally likely? That sounds like you’re talking about a uniform distribution. But uniform over what? Oh, position and momentum? Well, why’d you make that choice?” And if they point out that the laws of physics are expressed in terms of position and momentum, you just disagree and say “No, actually I prefer writing the laws of physics in terms of position cubed and log momentum!” (Substitute in any choice of monotonic functions).

If they object on the grounds of simplicity, point out that position and momentum are only simple as measured from a standpoint that takes them to be the fundamental concepts, and that from your perspective, getting position and momentum requires applying complicated inverse transformations to your monotonic transformation of the chosen coordinates.

And if they object on the grounds of naturalness, the right response is probably something like “Tell me more about this ’naturalness’. How do you know what’s natural or unnatural? It seems to me that your choice of what physical concepts count as natural is a manifestation of deep selection pressures that push any beings whose survival depends on modeling and manipulating their surroundings towards forming an empirically accurate model of the macroscopic world. So that when you say that position is more natural than log(position), what I hear is that the fundamental postulate is a very useful tool. And you can’t use the naturalness of the choice of position to justify the fundamental postulate, when your perception of the naturalness of position is the result of the empirical success of the fundamental postulate!”

In my judgement, none of the a priori arguments work, and fundamentally the reason is that the fundamental postulate is an empirical claim. There’s no a priori principle of rationality that tells us that boxes of gases tend to equilibrate, because you can construct a universe whose initial microstate is such that its entire history is one of entropy radically decreasing, gases concentrating, eggs unscrambling, ice cubes unmelting, and so on. Why is this possible? Because it’s consistent with the microphysical laws that the universe started in an enormously low entropy configuration, so it’s gotta also be consistent with the microphysical laws for the entire universe to spend its entire lifetime decreasing in entropy. The general principle is: If you believe that something is physically possible, then you should believe its time-inverse is possible as well.

Let’s pause and take stock. What I’ve argued for so far is the following set of claims:

  1. To successfully predict the behavior of macroscopic systems, we need something above and beyond the microphysical laws.
  2. This extra thing we need is the fundamental postulate of statistical mechanics, which assigns a uniform distribution over the region of phase space consistent with what you know about the system. This postulate allows us to prove all the things we want to say about the future, such as “gases expand”, “ice cubes melt”, “people age” and so on.
  3. This fundamental postulate is not justifiable on a priori grounds, as it is fundamentally an empirical claim about how frequently different microstates pop up in our universe. Different initial conditions give rise to different such frequencies, so that a claim to a priori access to the fundamental postulate is a claim to a priori access to the precise details of the initial condition of the universe.

There’s just one problem with all this… apply our postulate to the past, and everything breaks.

Up next: Why does statistical mechanics give crazy answers about the past? Where did we go wrong?

The Necessity of Statistical Mechanics for Getting Macro From Micro

This is the first part in a three-part series on the foundations of statistical mechanics.

  1. The Necessity of Statistical Mechanics for Getting Macro From Micro
  2. Is The Fundamental Postulate of Statistical Mechanics A Priori?
  3. The Central Paradox of Statistical Mechanics: The Problem of The Past

— — —

Let’s start this out with a thought experiment. Imagine that you have access to the exact fundamental laws of physics. Suppose further that you have unlimited computing power, for instance, you have an oracle that can instantly complete any computable task. What then do you know about the world?

The tempting answer: Everything! But of course, upon further consideration, you are missing a crucial ingredient: the initial conditions of the universe. The laws themselves aren’t enough to tell you about your universe, as many different universes are compatible with the laws. By specifying the state of the universe at any one time (which incidentally does not have to be an “initial” time), though, you should be able to narrow down this set of compatible universes. So let’s amend our question:

Suppose that you have unlimited computing power, that you know the exact microphysical laws, and that you know the state of the universe at some moment. Then what do you know about the world?

The answer is: It depends! What exactly do you know about the state of the universe? Do you know its exact microstate? As in, do you know the position and momentum of every single particle in the universe? If so, then yes, the entire past and future of the universe are accessible to you. But suppose that instead of knowing the exact microstate, you only have access to a macroscopic description of the universe. For example, maybe you have a temperature map as well as a particle density function over the universe. Or perhaps you know the exact states of some particles, just not all of them.

Well, if you only have access to the macrostate of the system (which, notice, is the epistemic situation that we find ourselves in, being that full access to the exact microstate of the universe is as technologically remote as can be), then it should be clear that you can’t specify the exact microstate at all other times. This is nothing too surprising or interesting… starting with imperfect knowledge you will not arrive at perfect knowledge. But we might hope that in the absence of a full description of the microstate of the universe at all other times, you could at least give a detailed macroscopic description of the universe at other times.

That is, here’s what seems like a reasonable expectation: If I had infinite computational power, knew the exact microphysical laws, and knew, say, that a closed box was occupied by a cloud of noninteracting gas in its corner, then I should be able to draw the conclusion that “The gas will disperse.” Or, if I knew that an ice cube was sitting outdoors on a table in the sun, then I should be able to apply my knowledge of microphysics to conclude that “The ice cube will melt”. And we’d hope that in addition to being able to make statements like these, we’d also be able to produce precise predictions for how long it would take for the gas to become uniformly distributed over the box, or for how long it would take for the ice cube to melt.

Here is the interesting and surprising bit. It turns out that this is in principle impossible to do. Just the exact microphysical laws and an infinity of computing power is not enough to do the job! In fact, the microphysical laws will in general tell us almost nothing about the future evolution or past history of macroscopic systems!

Take this in for a moment. You might not believe me (especially if you’re a physicist). For one thing, we don’t know the exact form of the microphysical laws. It would seem that such a bold statement about their insufficiencies would require us to at least first know what they are, right? No; it turns out that the statement that microphysics is far too weak to tell us about the behavior of macroscopic systems holds for an enormously large class of possible laws of physics, a class that we are very sure our universe belongs to.

Let’s prove this. We start out with the following observation that will be familiar to physicists: the microphysical laws appear to be time-reversible. That is, it appears to be the case that for every possible evolution of a system compatible with the laws of physics, the time-reverse of that evolution (obtained by simply reversing the trajectories of all particles) is also perfectly compatible with the laws of physics.*

This is surprising! Doesn’t it seem like there are trajectories that are physically possible for particles to take, such that their time-reverse is physically impossible? Doesn’t it seem like classical mechanics would say that a ball sitting on the ground couldn’t suddenly bounce up to your hand? That an egg couldn’t unscramble? That a gas couldn’t collect in the corner of a room? The answer to all of the above is no. Classical mechanics, and fundamental physics in general, admits the possibility of all these things. A fun puzzle for you is to think about why the first example (the ball initially at rest on the ground bouncing up higher and higher until it comes to rest in your hand) is not a violation of the conservation of energy.
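Here’s a small numerical illustration of that time-reversibility, in case it helps build intuition. It’s a toy sketch of my own (particles in a harmonic trap, integrated with velocity Verlet, a time-reversible scheme), not anything from the argument itself: run the dynamics forward, flip every velocity, run forward again for the same amount of time, and you retrace your steps back to the starting state.

```python
import numpy as np

def accel(x):
    return -x  # harmonic restoring force per unit mass: a = -x

def verlet(x, v, dt, steps):
    # Velocity Verlet: a standard time-reversible integrator.
    x, v = x.copy(), v.copy()
    a = accel(x)
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt**2
        a_new = accel(x)
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
    return x, v

rng = np.random.default_rng(0)
x0 = rng.normal(size=(100, 3))  # positions of 100 particles
v0 = rng.normal(size=(100, 3))  # their velocities

# Run forward, then flip every velocity and run "forward" again.
x1, v1 = verlet(x0, v0, dt=0.01, steps=5000)
x2, v2 = verlet(x1, -v1, dt=0.01, steps=5000)

# We land back on the time-reverse of the initial state, up to
# floating-point roundoff: same positions, opposite velocities.
print(np.max(np.abs(x2 - x0)), np.max(np.abs(v2 + v0)))
```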

Now here’s the argument: Suppose that you have a box that you know is filled with an ideal gas at equilibrium (uniformly spread through the volume). There are many many (infinitely many) microstates that are compatible with this description. We can conclusively say that in 15 minutes the gas will still be dispersed only if all of these microstates, when evolved forward 15 minutes, end up dispersed.

But, and here’s the crucial step, we also know that there exist very peculiar states (such as the macrostate in which all the gas particles have come together to form a perfect statuette of Michael Jackson) such that these states will in 15 minutes evolve to the dispersed state. And by time reversibility, this tells us that there is another perfectly valid history of the gas that starts uniformly dispersed and evolves over 15 minutes into a perfect statuette of Michael Jackson. That is, if we believe that complicated configurations of gases disperse, and believe that physics is time-reversible, then you must also believe that there are microstates compatible with dispersed states of gas that will in the next moment coalesce into some complicated configuration.

  1. A collection of gas shaped exactly like Michael Jackson will disperse uniformly across its container.
  2. Physics is time reversible.
  3. So uniformly dispersed gases can coalesce into a collection of gases shaped exactly like Michael Jackson.

At this point you might be thinking “yeah, sure, microphysics doesn’t in principle rule out the possibility that a uniformly dispersed gas will coalesce into Michael Jackson, or any other crazy configuration. But who cares? It’s so incredibly unlikely!” To which the response is: Yes, exactly, it’s extremely unlikely. But nothing in the microphysical laws says this! Look as hard as you can at the laws of motion; you will not find a probability distribution over the likelihood of the different microstates compatible with a given macrostate. And indeed, different initial conditions of the universe will give different such frequency distributions! To make any statements about the relative likelihood of some microstates over others, you need some principle above and beyond the microphysical laws.

To summarize. All that microphysics + infinite computing power allows you to say about a macrostate is the following: Here are all the microstates that are compatible with that macrostate, and here are all the past and future histories of each of these microstates. And given time reversibility, these future histories cover an enormously diverse set of predictions about the future, from “the gas will disperse” to “the gas will form into a statuette of Michael Jackson”. To get reasonable predictions about how the world will actually behave, we need some other principle, a principle that allows us to disregard these “perverse” microstates. And microphysics contains no such principle.

Statistical mechanics is thus the study of the necessary augmentation to a fundamental theory of physics that allows us to make predictions about the world, given that we are not in the position to know its exact microstate. This necessary augmentation is known as the fundamental postulate of statistical mechanics, and it takes the form of a probability distribution over microstates. Some people describe the postulate as saying “all microstates being equally likely”, but that phrasing is a big mistake, as the sentence “all states are equally likely” is not well defined over a continuous set of states. (More on that in a bit.) To really understand the fundamental postulate, we have to introduce the notion of phase space.

The phase space for a system is a mathematical space in which every point represents a full specification of the positions and momenta of all particles in the system. So, for example, a system consisting of 1000 classical particles swimming around in an infinite universe would have 6000 degrees of freedom (three position coordinates and three momentum coordinates per particle). Each of these degrees of freedom is isomorphic to the real numbers. So phase space for this system must be ℝ^6000, and a point in phase space is a specification of the values of all 6000 degrees of freedom. In general, for N classical particles, phase space is ℝ^(6N).

With the concept of phase space in hand, we can define the fundamental postulate of statistical mechanics. This is: the probability distribution over microstates compatible with a given macrostate is uniform over the corresponding volume of phase space.
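Written as a formula (with Γ_M standing for the region of phase space compatible with the macrostate M, notation I’m introducing just for this display), the postulate says:

\[
\rho(q, p) \;=\;
\begin{cases}
\dfrac{1}{\operatorname{vol}(\Gamma_M)} & \text{if } (q, p) \in \Gamma_M,\\[4pt]
0 & \text{otherwise.}
\end{cases}
\]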

It turns out that if you just measure the volume of the “perverse states” in phase space, you end up finding that it composes approximately 0% of the volume of compatible microstates in phase space. This of course allows us to say of perverse states, “Sure they’re there, and technically it’s possible that my system is in such a state, but it’s so incredibly unlikely that it makes virtually no impact on my prediction of the future behavior of my system.” And indeed, when you start going through the math and seeing the way that systems most likely evolve given the fundamental postulate, you see that the predictions you get match beautifully with our observations of nature.

Next time: What is the epistemic status of the fundamental postulate? Do we have good a priori reasons to believe it?

— — —

* There are some subtleties here. For one, we think that there actually is a very small time asymmetry in the weak nuclear force. And some collapse interpretations of quantum mechanics have the collapse of the wave function as an irreversible process, although Everettian quantum mechanics denies this. For the moment, let’s disregard all of that. The time asymmetry in the weak nuclear force is not going to have any relevant effect on the proof made here, besides making it uglier and more complicated. What we need is technically not exact time-reversibility, but very-approximate time-reversibility. And that we have. Collapsing wave functions are a more troubling business, and are a genuine way out of the argument made in this post.

A Cognitive Instability Puzzle, Part 2

This is a follow-up to this previous post, in which I present three unusual cases of belief updating. Read it before you read this.

I find these cases very puzzling, and I don’t have a definite conclusion for any of them. They share some deep similarities. Let’s break all of them down into their basic logical structure:

Joe
Joe initially believes in classical logic and is certain of some other stuff, call it X.
An argument A exists that concludes that X can’t be true if classical logic is true.
If Joe believes classical logic, then he believes A.
If Joe believes intuitionist logic, then he doesn’t believe A.

Karl
Karl initially believes in God and is certain of some other stuff about evil, call it E.
An argument A exists that concludes that God can’t exist if E is true.
If Karl believes in God, then he believes A.
If Karl doesn’t believe in God, then he doesn’t believe A.

Tommy
Tommy initially believes in her brain’s reliability and is certain of some other stuff about her experiences, call it Q.
An argument A exists that concludes that her brain can’t be reliable if Q is true.
If Tommy believes in her brain’s reliability, then she believes A.
If Tommy doesn’t believe in her brain’s reliability, then she doesn’t believe A.

First of all, note that all three of these cases are ones in which Bayesian reasoning won’t work. Joe is uncertain about the law of the excluded middle, without which you don’t have probability theory. Karl is uncertain about the meaning of the term ‘evil’, such that the same proposition switches from being truth-apt to being meaningless when he updates his beliefs. Probability theory doesn’t accommodate such variability in its language. And Tommy is entertaining a hypothesis according to which she no longer accepts any deductive or inductive logic, which is inconsistent with Bayesianism in an even more extreme way than Joe’s.

The more important general theme is that in all three cases, the following two things are true: 1) If an agent believes A, then they also believe an argument that concludes -A. 2) If that agent believes -A, then they don’t believe the argument that concludes -A.

Notice that if an agent initially doesn’t believe A, then they have no problem. They believe -A, and also happen to not believe that specific argument concluding -A, and that’s fine! There’s no instability or self-contradiction there whatsoever. So that’s really not where the issue lies.

The mystery is the following: If the only reason that an agent changed their mind from A to -A is the argument that they no longer buy, then what should they do? Once they’ve adopted the stance that A is false, should they stay there, reasoning that if they accept A they will be led to a contradiction? Or should they jump back to A, reasoning that the initial argument that led them there was flawed?

Said another way, should they evaluate the argument against A from their own standards, or from A’s standards? If they use their own standards, then they are in an unstable position, where they jump back and forth between A and -A. And if they always use A’s standards… well, then we get the conclusion that Tommy should believe herself to be a Boltzmann brain. In addition, if they are asked why they don’t believe A, then they find themselves in the weird position of giving an explanation in terms of an argument that they believe to be false!

I find myself believing that either Joe should be an intuitionist, Karl an atheist, and Tommy a radical skeptic, OR Joe should be a classical logician, Karl a theist, and Tommy a believer in her brain’s reliability. That is, it seems like there aren’t any significant enough disanalogies between these three cases to warrant concluding one thing in one case and then going the other direction in another.

Logic, Theism, and Boltzmann Brains: On Cognitively Unstable Beliefs

First case

Propositional logic accepts that the proposition A ∨ ¬A is necessarily true. This is called the law of the excluded middle. Intuitionist logic differs in that it denies this axiom.

Suppose that Joe is a believer in propositional logic (but also reserves some credence for intuitionist logic). Joe also believes a set of other propositions, whose conjunction we’ll call X, and has total certainty in X.

One day Joe discovers that a contradiction can be derived from X, in a proof that uses the law of the excluded middle. Since Joe is certain that X is true, he knows that X isn’t the problem, and instead it must be the law of the excluded middle. So Joe rejects the law of the excluded middle and becomes an intuitionist.

The problem is, as an intuitionist, Joe now no longer accepts the validity of the argument that starts at X and concludes -X! Why? Because it uses the law of the excluded middle, which he doesn’t accept.

Should Joe believe in propositional logic or intuitionism?

Second case

Karl is a theist. He isn’t absolutely certain that theism is correct, but holds a majority of his credence in theism (and the rest in atheism). Karl is also 100% certain in the following claim: “If atheism is true, then the concept of ‘evil’ is meaningless”, and believes that logically valid arguments cannot be made using meaningless concepts.

One day somebody presents the problem of evil to Karl, and he sees it as a crushing objection to theism. He realizes that theism, plus some other beliefs about evil that he’s 100% confident in, leads to a contradiction. So since he can’t deny these other beliefs, he is led to atheism.

The problem is, as an atheist, Karl no longer accepts the validity of the argument that starts at theism and concludes atheism! Why? Because the argument relies on the concept of ‘evil’, and he is now certain that this concept is meaningless, and thus cannot be used in logically valid arguments.

Should Karl be a theist or an atheist?

Third case

Tommy is a scientist, and she believes that her brain is reliable. By this, I mean that she trusts her ability to reason both deductively and inductively. However, she isn’t totally certain about this, and holds out a little credence for radical skepticism. She is also totally certain about the content of her experiences, though not their interpretation (i.e. if she sees red, she is 100% confident that she is experiencing red, although she isn’t necessarily certain about what in the external world is causing the experience).

One day Tommy discovers that reasoning deductively and inductively from her experiences leads her to a model of the world that entails that her brain is actually a quantum fluctuation blipping into existence outside the event horizon of a black hole. She realizes that this means that with overwhelmingly high probability, her brain is not reliable and is just producing random noise uncorrelated with reality.

The problem is, if Tommy believes that her brain is not reliable, then she can no longer accept the validity of the argument that led her to this position! Why? Well, she no longer trusts her ability to reason deductively or inductively. So she can’t accept any argument, let alone this particular one.

What should Tommy believe?

— — —

How are these three cases similar and different? If you think that Joe should be an intuitionist, or Karl an atheist, then should Tommy believe herself to be a black hole brain? Because it turns out that many cosmologists have found themselves to be in a situation analogous to Case 3! (Link.) I have my own thoughts on this, but I won’t share them for now.

Philosophers of religion are religious. Why?

In 2009, David Bourget and David Chalmers organized a massive survey of over 3000 professional philosophers, grad students, and undergrads, asking them questions about all things philosophical and compiling the results. The results are broken down by area of specialization, age, race, gender, and everything else you might be interested in.

Here’s a link to the paper, and here to a listing of all survey results.

This is basically my favorite philosophy paper to read, and I find myself going back to look at the results all the time. I’d love to see an updated version of this survey, done ten years later, to see how things have changed (if at all).

There’s a whole lot I could talk about regarding this paper, but today I’ll just focus on one really striking result. Take a look at the following table from the paper:

[Table from the survey paper: the questions answered most differently by specialists and non-specialists.]

What’s shown are the questions that were answered most differently by specialists and non-specialists. At the very top of the list, with a discrepancy more than double the second highest, is the question of God’s existence. 86.78% of non-specialists said yes to atheism, while by contrast only 20.87% of philosophers of religion said yes to atheism. This is fascinating to me.

Here are two narratives one could construct to make sense of these results.

Narrative One

Philosophers that specialize in philosophy of religion probably select that specialization because they have a religious bias. A philosophically-minded devout Catholic is much more likely to go into philosophy of religion than, say, philosophy of language. And similarly, an atheistic philosopher would have less interest than a religious philosopher in studying philosophy of religion, given that they don’t even believe in the existence of its primary object of study. So the result of the survey is exactly what you’d expect from the selection bias inherent in the specialization.

Narrative Two

Philosophers, like everybody else, are vulnerable to a presumption in favor of the beliefs of their society. Academics in general are quite secular, and in many quarters religion is treated as a product of a bygone age. So it’s only natural that philosophers that haven’t looked too deeply into the issue come out believing basically what the high-status individuals in their social class believe. But philosophers of religion, on the other hand, are those that have actually looked most closely and carefully at the arguments for and against atheism, and this gives them the ability to transcend their cultural bias and recognize the truth of religion.

As an atheist, it’s perhaps not surprising that my immediate reaction to seeing this result was something like Narrative One. And upon reflection, that still seems like the more likely explanation to me. But to a religious person, I’m sure that Narrative Two would seem like the obvious explanation. This, by the way, is what should happen from a Bayesian perspective. If two theories equally well explain some data, then the one with a higher prior should receive a larger credence bump than the one with a lower prior (although their odds ratio should stay fixed).

Ultimately, which of these stories is right? I don’t know. Perhaps both are right to some degree. But I think that it illustrates the difficulty of adjudicating expertise questions. Accusations of bias are quite easy to make, and can be hard to actually get to the bottom of. That said, it’s definitely possible to evaluate the first narrative, just by empirically looking at the reasons that philosophers of religion entered the field. If somebody knows of such a study, comment it or send me a message please! The results of a study like this could end up having a huge effect on my attitude towards questions of religion’s rationality.

Imagine that it turned out that most philosophers of religion were atheists when they entered the field, and only became religious after diving deep into the arguments. This is not what I’d expect to find, but if it was the case, it would serve as a super powerful argument against atheism for me.

Why do prediction markets work?

Is there a paradox in the continued existence of prediction markets? Recently I’ve been wondering this. Let me start with a little background for those that are unfamiliar with the concept of prediction markets.

Prediction markets are markets that allow you to bet on the outcomes of real-life events. This gives financial incentives to predict accurately, and as such the market price of a given bet reflects a kind of aggregate credence for that event occurring. There’s a whole bunch of results, theoretical and applied, that indicate that prediction markets serve to give robustly accurate probability estimates for real-world events.

Here’s a great paper by Robin Hanson about a political system based on prediction markets, named futarchy. Essentially, the idea is that voters determine a nation’s values, so as to generate some average national welfare metric, and then betting markets are used to decide policy. Some quotes:

On info-failures as a primary problem for democracy

According to many experts in economics and development, governments often choose policies that are “inefficient” in the sense that most everyone could expect to gain from other feasible policies. Many other kinds of experts also see existing policies as often clearly inferior to known alternatives.

If inferior policies would not have been adopted had most everyone known they are inferior, and if someone somewhere knew or could have learned that they are inferior, then we can blame inferior policies on a failure of our “info” institutions. By “info” here I just mean clues and analysis that should change our beliefs. Our info institutions are those within which we induce, express, and evaluate the acquiring and sharing of info. They include public relations teams, organized interest groups, news media, conversation forums, think tanks, universities, journals, elite committees, and state agencies. Inferior policies happen because our info institutions fail to induce people to acquire and share relevant info with properly-motivated decision makers.

[…]

Where might we find better info institutions? According to most experts in economics and finance, speculative markets are exemplary info institutions. That is, active speculative markets do very well at inducing people to acquire info, share it via trades, and collect that info into consensus prices that persuade wider audiences. This great success suggests that we should consider augmenting our political info institutions with speculative market institutions. That is, perhaps we should encourage people to create, trade in, and heed policy-relevant speculative markets, instead of discouraging such markets as we do today via anti-gambling laws.

Laying out the proposal

In futarchy, democracy would continue to say what we want, but betting markets would now say how to get it. That is, elected representatives would formally define and manage an after-the-fact measurement of national welfare, while market speculators would say which policies they expect to raise national welfare. The basic rule of government would be:

    • When a betting market clearly estimates that a proposed policy would increase expected national welfare, that proposal becomes law.

Futarchy is intended to be ideologically neutral; it could result in anything from an extreme socialism to an extreme minarchy, depending on what voters say they want, and on what speculators think would get it for them.

Futarchy seems promising if we accept the following three assumptions:

  • Democracies fail largely by not aggregating available information.
  • It is not that hard to tell rich happy nations from poor miserable ones.
  • Betting markets are our best known institution for aggregating information.

On the success of prediction markets

Betting markets, and speculative markets more generally, seem to do very well at aggregating information. To have a say in a speculative market, you have to “put your money where your mouth is.” Those who know they are not relevant experts shut up, and those who do not know this eventually lose their money, and then shut up. Speculative markets in essence offer to pay anyone who sees a bias in current market prices to come and correct that bias.

Speculative market estimates are not perfect. There seems to be a long-shot bias when there are high transaction costs, and perhaps also excess volatility in long term aggregate price movements. But such markets seem to do very well when compared to other institutions. For example, racetrack market odds improve on the predictions of racetrack experts, Florida orange juice commodity futures improve on government weather forecasts, betting markets beat opinion polls at predicting U.S. election results, and betting markets consistently beat Hewlett Packard official forecasts at predicting Hewlett Packard printer sales. In general, it is hard to find information that is not embodied in market prices.

On the possibility of manipulation of prediction markets

We want policy-related info institutions to resist manipulation, that is, to resist attempts to influence policy via distorted participation. Speculative markets do well here because they deal well with “noise trading,” that is, trading for reasons other than info about common asset values. When other traders can’t predict noise trading exactly, they compensate for its expected average by an opposite average trade, and compensate for its expected variation by trading more, and by working harder to find relevant info. Theory says that if trader risk-aversion is mild, and if more effort gives more info, then increased noise trading increases price accuracy. And in fact, the most accurate real speculative markets tend to be those with the most noise trading.

What do noise traders have to do with manipulators? Manipulators, who trade hoping to distort prices, are noise traders, since they trade for reasons other than asset value info. Thus adding manipulators to speculative markets doesn’t reduce average price accuracy. This has been verified in theory, in laboratory experiments, and in the field.

Futarchy remains for me one of the coolest and most exciting ideas I’ve heard in political philosophy, and prediction markets fascinate me. But for today, I have the following question about their feasibility:

If the only individuals who are able to consistently profit off the prediction market are the best predictors, then why wouldn’t the bottom 50% of predictors continuously drop out as they lose money on the market? If they did, then as the population of market participants dwindles, you would end up with a small fraction of really good predictors, each of whom sometimes gets lucky and makes money and sometimes is unlucky and loses some. On average, these people won’t be able to make money any more (as the ability to make money relies on the participation of inferior predictors in the market), so they’ll drop out as well.

If this line of reasoning is right, then it seems like prediction markets should inevitably collapse as their user base drops out. Why, then, do sites like PredictIt keep functioning?

One possibility is that there’s something wrong with the argument. This is honestly where most of my credence lies; tons of smart people endorse the idea, and this seems like too obvious and central a flaw in the concept for them all to have missed. If the argument isn’t wrong, though, then we have an interesting phenomenon to explain.

One explanation that came to my mind is that the continued survival of prediction markets is only possible because of a bug in human psychology, namely, a lack of epistemic humility. People are on average overly confident in their beliefs, and so uninformed people will continue confidently betting on propositions, even when they are generally betting against individuals with greater expertise.

Is this really what’s going on? I’m not sure. I would be surprised if humans were actually overconfident enough to continue betting on a market that they are consistently losing money on. Maybe they’d find some way to rationalize dropping out of the market that doesn’t amount to them admitting “My opinion is not worth as much as I thought it was”, but surely they would eventually stop betting after enough losses (putting aside whatever impulses drive people to gamble on guaranteed negative-expected-value games until they lose all their money). On the other hand, it could be that the traffic of less-informed individuals does not consist of the same individuals betting over and over, but is instead a constant stream of new sheep coming in to be exploited by those more knowledgeable. What do you think? How do you explain this?

A Talmudic Probability Puzzle

Today we’ll take a little break from the more intense abstract math stuff I’ve been doing, and do a quick dive into a fun probabilistic puzzle I found on the internet.

Background for the puzzle: In Ancient Israel, there was a court of 23 wise men that tried important cases, known as the Sanhedrin. If you were being tried for a crime by the Sanhedrin and a majority of them found you guilty, you were convicted. But there was an interesting twist on this! According to the Talmud (Tractate Sanhedrin: Folio 17a), if the Sanhedrin unanimously found you guilty, you were to be acquitted.

If the Sanhedrin unanimously find [the accused] guilty, he is acquitted. Why? — Because we have learned by tradition that sentence must be postponed till the morrow in hope of finding new points in favour of the defence. But this cannot be anticipated in this case.

Putting aside the dubious logic of this rule, it gives rise to an interesting probability puzzle with a counterintuitive answer. Imagine that an accused murderer has been brought before the Sanhedrin, and that the evidence is strong enough that no judge has any doubt in their mind about his guilt. Each judge obviously wants the murderer to be convicted, and would ordinarily vote to convict. But under this Talmudic rule, they need to worry about the prospect of all of them voting guilty and thereby letting him off scot-free!

So: if each judge independently votes to convict with probability p and to acquit with probability 1 – p, which value of p gives them the highest probability of ultimately convicting the guilty man?

Furthermore, imagine that the number of judges is not 23, but some arbitrarily high number. As the number of judges goes to infinity, what does p converge to?

I want you to think about this for a minute and test your intuitions before moving on.

 

(…)

 

 

(…)

 

 

(…)

 

So, it turns out that the optimal p for 23 judges is actually ≈ 75.3%. And as the number of judges goes to infinity? The optimal value of p converges to…

80%!

This was a big shock to me. I think the natural first thought is that when you have thousands and thousands of judges, you only need a minuscule chance for any one judge to vote ‘acquit’ in order to ensure a majority and prevent him from getting off free. So I initially guessed that p would be something like 99%, and would converge to 100% in the limit of infinite judges.

But this is wrong! And of the small sample of mathematically gifted friends I asked this question to, most guessed the same as me.

There’s clearly a balance going on between two risks: the risk that only a minority votes to convict (so he isn’t convicted) and the risk that the vote to convict is unanimous (so the rule acquits him). For small p, the first of these is ~1 and the second is ~0, and for p near 1, the first is ~0 and the second ~1. It seems that we naturally overemphasize the danger of falling short of a majority, and underemphasize the danger of the unanimous vote.
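To make this tradeoff concrete, here is a small Python sketch (my own tooling, not anything from the original puzzle) that just sums the relevant binomial probabilities: under the rule as stated, conviction happens exactly when a strict majority, but not all, of the n judges vote guilty. A brute-force grid search then locates the best p.

from math import comb

def conviction_probability(p, n):
    # Probability that a strict majority of the n judges votes guilty,
    # but not all of them (a unanimous guilty vote means acquittal).
    majority = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(majority, n))

def best_p(n, steps=2000):
    # Brute-force search over a grid of p values in (0, 1).
    return max((i / steps for i in range(1, steps)),
               key=lambda p: conviction_probability(p, n))

for n in (23, 101, 501):
    p_star = best_p(n)
    print(f"n = {n}: best p ≈ {p_star:.3f}, P(conviction) ≈ {conviction_probability(p_star, n):.4f}")

For 23 judges the search lands a little above 77%, and the maximizer drifts toward 80% as the bench grows, while the peak of the curve gets flatter and flatter.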

Here are some plots of the various relevant values, for different numbers of judges:

[Plots for 10 judges, 23 judges, 100 judges, and 150 judges.]

One thing to notice is that as the number of judges gets larger, the graph’s peak becomes more and more of a plateau. And in the limit of infinite judges, you can show that the graph becomes a simple step function: Pr(conviction) = 0 if p < .5, and 1 if p > .5. (For any fixed p > .5, the law of large numbers makes a guilty majority essentially certain, while the probability of unanimity, p^n, goes to 0.) This means that while, yes, technically 80% is the optimal value, you can do pretty much equally well by choosing any value of p greater than 50%.

My challenge to you is to come up with some justification for the value 80%. Good luck!

Galois’ Unsolvable Polynomials

Galois’ process for finding out if a polynomial is solvable by radicals (i.e., if you can write its roots using a finite number of rational numbers and the symbols +, -, ×, /, and √) is actually surprisingly simple to describe. Here’s a quick 4-step summary of the process:

  1. Take a polynomial p(x).
  2. Define E  to be the smallest field that contains all rational numbers, as well as all the roots of p(x). This field is called the splitting field of p(x).
  3. Define Gal(E/Q) to be the set of all isomorphisms from E to itself that hold fixed all rational numbers. This set is a group, and is called the Galois group of p(x).
  4. p(x) is solvable by radicals if and only if Gal(E/Q) is a solvable group.

The proof of (4) is pretty complicated (much more involved than the proof I gave in the last post). Galois’ method is also a little weaker, in that it only rules out expressing the roots with +, -, ×, /, and √, whereas the last post ruled out expressions using √ together with any univalent function (exp, log, sin, or whatever). However, it lets you do more than just show that some degree-five polynomials are not solvable by radicals: it lets you determine, for any given polynomial, whether it is solvable. The process also gives you a good deal of insight into the nature of the roots of the polynomial you’re studying.

Now, the crucial step in this 4-step process is (4), which equates solvable polynomials with solvable groups. There are a few different but equivalent definitions of solvable groups:

Composition Series Definition

  • A composition series of G is a chain of subgroups of G in which each subgroup is a maximal proper normal subgroup of the next one (maximal meaning that no proper normal subgroup of the next one strictly contains it).
    • {1} = G0 ⊲ G1 ⊲ ⋯ ⊲ Gn = G.
  • A finite group G is solvable if and only if the factors of its composition series are all cyclic of prime order (i.e. if for all k, Gk+1/Gk ≅ ℤp for some prime p).

Derived Series Definition

  • The derived series of G is the series of subgroups where each subgroup is the commutator subgroup of the previous subgroup.
    • ⋯ ⊴ [[G, G], [G, G]] ⊴ [G, G] ⊴ G.
  • G is solvable if and only if this series eventually reaches the trivial group.

Subnormal Series Definition

  • A subnormal series of G is a series of subgroups of G where each subgroup is a normal subgroup of the previous subgroup, and which terminates in the trivial group.
    • {1} = G0 ⊲ G1 ⊲ ⋯ ⊲ Gn = G.
  • G is solvable if and only if it has a subnormal series all of whose factors are abelian.
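To see the derived-series criterion in action, here is a short SymPy sketch (my choice of tool, not something from the post) that walks down the derived series of S4 and S5:

from sympy.combinatorics.named_groups import SymmetricGroup

# The derived series G ⊵ [G,G] ⊵ [[G,G],[G,G]] ⊵ ... ; the group is solvable
# exactly when this chain reaches the trivial group.
for n in (4, 5):
    G = SymmetricGroup(n)
    orders = [H.order() for H in G.derived_series()]
    print(f"S{n}: orders along the derived series = {orders}, solvable? {G.is_solvable}")

For S4 the chain shrinks all the way down to the trivial group, while for S5 it gets stuck at A5 (order 60), which is exactly why S5 is not solvable. The same is_solvable property makes it easy to spot-check several of the facts listed below.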

Solvability is an interesting group-theoretic property aside from its application in analyzing polynomial roots. Here are some things we know about solvable groups, in order of generally decreasing power:

  • Every group of odd order is solvable.
  • Every abelian group is solvable.
  • Every group of prime power order is solvable. (i.e. if |G| = p^n for prime p and any n)
  • Every group whose order is a product of two prime powers is solvable. (i.e. if |G| = p^n q^m for primes p, q and any n, m)
  • Every group whose Sylow subgroups are cyclic is solvable.
  • No finite non-abelian simple groups are solvable.
  • If H is a normal subgroup of G, and both H and G/H are solvable, then G is solvable.
  • Sn is not solvable for any n ≥ 5.
  • Dihedral groups are all solvable.

A fun exercise is to see how many of these you can prove. (Some are much easier than others, and in fact the first one took 255 pages to prove!)

It turns out that most groups are solvable. In fact, the smallest non-solvable group is A5, which has 60 elements. Here are the first few numbers in the sequence of orders for which non-solvable groups exist:

  • 60, 120, 168, 180, 240, 300, 336, 360, 420, 480, 504, 540, 600, 660, 672, 720, 780, 840, 900, 960, …

Determining the elements of Gal(E/ℚ)

Practically, the hardest part of the above 4-step process is determining what the Galois group is for a given polynomial. Usually one knows very little about the roots of the polynomial in question, and therefore also knows little about its splitting field. So how to determine the Galois group (the set of automorphisms of the splitting field that fix ℚ) without even knowing what the splitting field is? Well, it turns out that there are some really useful general tricks.

For one: Sometimes you can look closely at the polynomial and discover some simple algebraic relations that must hold among the roots. For instance, say p(x) = x^4 + 2x^2 – 1. Since p(x) contains only even powers of x, for any root r of p(x), -r must also be a root. This means that the roots of p(x) can be written {A, -A, B, -B} for some complex numbers A and B. And by the properties of homomorphisms, whichever root X a member of the Galois group maps A to, it must map -A to -X. Another way to say this is that the Galois group of an even polynomial (of degree at least 4) is never the full group of permutations of the roots, Sn, since evenness imposes this restriction on the allowed automorphisms.
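To see this pairing concretely, here is a tiny NumPy check (purely numerical, and my own addition) of the roots of x^4 + 2x^2 – 1:

import numpy as np

# Roots of x^4 + 2x^2 - 1, coefficients listed from the leading term down.
# Substituting t = x^2 gives t = -1 ± sqrt(2), so one pair of roots is real
# and the other is purely imaginary -- and they come in {A, -A, B, -B} pairs.
roots = np.roots([1, 0, 2, 0, -1])
print(np.sort_complex(np.round(roots, 6)))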

A superb trick related to this is to look at the algebraic relations between the roots modulo a prime. In general, you can take a complicated irreducible polynomial and break it into smaller irreducible factors modulo a prime p. Dedekind’s theorem tells us that if none of these factors repeat, then the Galois group contains a permutation of the roots whose cycle type is given by the degrees of the factors.

Example 1

  • Let p(x) = x^2 – 2.
  • The splitting field E of p(x) is clearly ℚ(√2) = {a + b√2 for all a, b in ℚ}.
  • Gal(E/ℚ) = the set of automorphisms on E that hold fixed all rationals.
  • If f is in Gal(E/ℚ), then f(√2)^2 = f(√2 × √2) = f(2) = 2. So f(√2) = ±√2. So there are two automorphisms in Gal(E/ℚ): f(a + b√2) = a + b√2 and f(a + b√2) = a – b√2.
  • So Gal(E/ℚ) ≅ ℤ2.
  • Since ℤ2 is solvable, p(x) is solvable by radicals (as was obvious from the outset).

Example 2

Let p(x) = x^6 + x^4 + x + 3. Factoring it modulo a few small primes, we can write:

  1. p(x) = (x + 1)(x^2 + x + 1)(x^3 + x + 1) mod 2
  2. p(x) = x(x + 2)(x^4 + x^3 + 2x^2 + 2x + 2) mod 3
  3. p(x) = (x + 3)^2 (x^4 + 4x^3 + 3x^2 + x + 2) mod 5
  4. p(x) = (x^2 + 5x + 2)(x^4 + 2x^3 + 3x^2 + 2x + 5) mod 7
  5. p(x) = (x + 6)(x^5 + 5x^4 + 4x^3 + 9x^2 + x + 6) mod 11
  6. p(x) = (x^2 + 8x + 1)(x^2 + 9x + 10)(x^2 + 9x + 12) mod 13

From this and Dedekind’s theorem we can conclude:

  1. Gal(E/ℚ) contains a permutation of cycle type (1,2,3)
  2. Gal(E/ℚ) contains a permutation of cycle type (1,1,4): a 4-cycle
  3. Repeated factor of x+3, so we don’t learn anything
  4. Gal(E/ℚ) contains a permutation of cycle type (2,4)
  5. Gal(E/ℚ) contains a permutation of cycle type (1,5): a 5-cycle
  6. Gal(E/ℚ) contains a permutation of cycle type (2,2,2)

This automatically gives us a lot of insight into Gal(E/ℚ)! We have permutations of the form:

  1. (a b)(c d e)
  2. (a b c d)
  3. N/A
  4. (a b)(c d e f)
  5. (a b c d e)
  6. (a b)(c d)(e f)

We can go further and show that the only group of permutations of the roots containing all of these cycle types is S6. (Since p(x) is irreducible, the Galois group is transitive on the six roots; a transitive group on six points that contains a 5-cycle is primitive, and a primitive group that contains a transposition must be the full symmetric group. We get a 5-cycle from #5, and cubing #1 gives a transposition.) And since S6 is not solvable (its commutator subgroup is A6, whose commutator subgroup is itself, so the derived series never reaches the trivial group), the polynomial x^6 + x^4 + x + 3 is not solvable by radicals.
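If you would rather not do the modular arithmetic by hand, the factorizations above can be reproduced with a short SymPy sketch (again my own tooling, separate from the Magma route mentioned next):

from sympy import symbols, factor_list, degree

x = symbols('x')
p = x**6 + x**4 + x + 3

for q in (2, 3, 5, 7, 11, 13):
    _, factors = factor_list(p, x, modulus=q)
    # Each entry is (irreducible factor, multiplicity); the multiset of factor
    # degrees is the cycle type Dedekind's theorem hands us, provided no factor repeats.
    degrees = sorted(degree(f, x) for f, mult in factors for _ in range(mult))
    repeated = any(mult > 1 for _, mult in factors)
    print(f"mod {q}: factor degrees {degrees}" + ("  (repeated factor)" if repeated else ""))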

One final note: There is a fantastic computational algebra system, known as Magma, designed to solve problems in high-level mathematics. There’s an online Magma calculator here which is free to use. To use it to find the Galois group of the above polynomial (for example), you type:

P<x>:=PolynomialRing(Rationals());
GaloisGroup(x^6+x^4+x+3);

The Inverse Galois Problem

Now we get to the most tantalizing part!

Instead of starting with a polynomial p(x) and finding its Galois group, one could equally well start with a group G and ask whether G is the Galois group of any polynomial with rational coefficients! If so, we say that G is realizable, and that it is realized by that polynomial.

Whether every finite group is realizable turns out to be one of the big unsolved questions in mathematics. We’d like to prove that for every finite group G you can find a polynomial with coefficients from ℚ whose Galois group is G, but in general it’s not known whether this is possible! This paper lists the non-abelian simple groups of cardinality less than 100 million that are currently not known to be realizable.

We do know the answer for some general categories of groups. Here are some things that are known, in order of decreasing strength.

  • All solvable groups are realizable.
  • All finite abelian groups are realizable.
  • All the symmetric and alternating groups are realizable.
  • All cyclic groups are realizable (special case of finite abelian groups being realizable)
  • 25 of the 26 sporadic groups are known to be realizable (the missing sporadic group is the Mathieu group M23, whose realizability remains an open question).
    • Amazingly, this implies the existence of a polynomial with rational coefficients whose Galois group is the Monster group!