This is the second part in a three-part series on the foundations of statistical mechanics.

- The Necessity of Statistical Mechanics for Getting Macro From Micro
- Is The Fundamental Postulate of Statistical Mechanics A Priori?
- The Central Paradox of Statistical Mechanics: The Problem of The Past

— — —

The fantastic empirical success of the fundamental postulate gives us a great amount of assurance that the postulate is good one. But it’s worth asking whether that’s the *only* reason that we should like this postulate, or if it has some solid a priori justification. The basic principle of “when you’re unsure, just distribute credences evenly over phase space” certainly strikes many people as highly intuitive and justifiable on a priori grounds. But there are some huge problems with this way of thinking, one of which I’ve already hinted at. Here’s a thought experiment that illustrates the problem.

There is a factory in your town that produces cubic boxes. All you know about this factory is that the boxes that they produce all have a volume between 0 m^{3} and 1 m^{3}. You are going to be delivered a box produced by this factory, and are asked to represent your state of knowledge about the box with a probability distribution. What distribution should you use?

Suppose you say “I should be indifferent over all the possible boxes. So I should have a uniform distribution over the volumes from 0 m^{3} to 1 m^{3}.” This might seem reasonable at first blush. But what if somebody else said “Yes, you should be indifferent over all the possible boxes, but actually the uniform distribution should be over the *side lengths* from 0 m to 1 m, not volumes.” This would be a very different probability distribution! For example, if the probability that the side length is greater than .5 m is 50%, then the probability that the volume is greater than (.5)^{3} = 1/8 is also 50%! Uniform over side length is not the same as uniform over volume (or surface area, for that matter). Now, how do you choose between a uniform distribution over volumes and a uniform distribution over side lengths? After all, you know nothing about the process that the factory is using to produce the boxes, and whether it is based off of volume or side length (or something else); all you know is that all boxes are between 0 m^{3} and 1 m^{3}.

The lesson of this thought experiment is that the statement we started with (“I should be indifferent over all possible boxes”) was actually not even well-defined. There’s not just one unique measure over a continuous space, and in general the notion that “all possibilities are equally likely” is highly language-dependent.

The *exact* same applies to phase space, as position and momentum are continuous quantities. Imagine that somebody instead of talking about phase space, only talked about “craze space”, in which all positions become positions cubed, and all momentum values become natural logs of momentum. This space would still contain all possible microstates of your system. What’s more, the fundamental laws of nature could be rewritten in a way that uses only craze space quantities, not phase space quantities. And needless to say, being indifferent over phase space would *not* be the same as being indifferent over craze space.

Spend enough time looking at attempts to justify a unique interpretation of the statement “All states are equally likely”, when your space of states is a continuous infinity, and you’ll realize that all such attempts are deeply dependent upon arbitrary choices of language. The maximum information entropy probability distribution is afflicted with the exact same problem, because the entropy of your distribution is going to depend on the language you’re using to describe it! The entropy of a distribution in phase space is NOT the same as the entropy of the equivalent distribution transformed to craze space.

Let’s summarize this section. If somebody tells you that the fundamental postulate says that all microstates compatible with what you know about the macroscopic features of your system are equally likely, the proper response is something like “Equally likely? That sounds like you’re talking about a uniform distribution. But uniform over what? Oh, position and momentum? Well, why’d you make that choice?” And if they point out that the laws of physics are expressed in terms of position and momentum, you just disagree and say “No, actually I prefer writing the laws of physics in terms of position cubed and log momentum!” (Substitute in any choice of monotonic functions).

If they object on the grounds of simplicity, point out that position and momentum are only simple as measured from a standpoint that takes them to be the fundamental concepts, and that from your perspective, getting position and momentum requires applying complicated inverse transformations to your monotonic transformation of the chosen coordinates.

And if they object on the grounds of naturalness, the right response is probably something like “Tell me more about this ’naturalness’. How do you know what’s natural or unnatural? It seems to me that your choice of what physical concepts count as natural is a manifestation of deep selection pressures that push any beings whose survival depends on modeling and manipulating their surroundings towards forming an empirically accurate model of the macroscopic world. So that when you say that position is more natural than log(position), what I hear is that the fundamental postulate is a very useful tool. And you can’t use the naturalness of the choice of position to justify the fundamental postulate, when your perception of the naturalness of position is the *result* of the empirical success of the fundamental postulate!”

In my judgement, none of the a priori arguments work, and fundamentally the reason is that the fundamental postulate is an empirical claim. There’s no a priori principle of rationality that tells us that boxes of gases tend to equilibrate, because you can construct a universe whose initial microstate is such that its entire history is one of entropy radically decreasing, gases concentrating, eggs unscrambling, ice cubes unmelting, and so on. Why is this possible? Because it’s consistent with the microphysical laws that the universe started in an enormously low entropy configuration, so it’s gotta also be consistent with the microphysical laws for the entire universe to spend its entire lifetime decreasing in entropy. The general principle is: If you believe that something is physically possible, then you should believe its time-inverse is possible as well.

Let’s pause and take stock. What I’ve argued for so far is the following set of claims:

- To successfully predict the behavior of macroscopic systems, we need something above and beyond the microphysical laws.
- This extra thing we need is the fundamental postulate of statistical mechanics, which assigns a uniform distribution over the region of phase space consistent with what you know about the system. This postulate allows us to prove all the things we want to say about the future, such as “gases expand”, “ice cubes melt”, “people age” and so on.
- This fundamental postulate is not justifiable on a priori grounds, as it is fundamentally an empirical claim about how frequently different microstates pop up in our universe. Different initial conditions give rise to different such frequencies, so that a claim to a priori access to the fundamental postulate is a claim to a priori access to the precise details of the initial condition of the universe.

There’s just one problem with all this… apply our postulate to the past, and everything breaks.

Up next: Why does statistical mechanics give crazy answers about the past? Where did we go wrong?

Some approaches get the postulates of thermodynamics from the postulates of quantum mechanics, see Zurek’s derivation of the Born’s probability rule for some of those references.

I know a little about Zurek and envariance, but had never seen this derivation! Thanks for the reference, quite cool. That said, just getting the Born rule is not enough to get you all of statistical mechanics; you also need a separate postulate for how to place a probability distribution over a set of microstates compatible with macroscopic knowledge.