The Central Paradox of Statistical Mechanics: The Problem of The Past

This is the third part in a three-part series on the foundations of statistical mechanics.

  1. The Necessity of Statistical Mechanics for Getting Macro From Micro
  2. Is The Fundamental Postulate of Statistical Mechanics A Priori?
  3. The Central Paradox of Statistical Mechanics: The Problem of The Past

— — —

What I’ve argued for so far is the following set of claims:

  1. To successfully predict the behavior of macroscopic systems, we need something above and beyond the microphysical laws.
  2. This extra thing we need is the fundamental postulate of statistical mechanics, which assigns a uniform distribution over the region of phase space consistent with what you know about the system. This postulate allows us to prove all the things we want to say about the future, such as “gases expand”, “ice cubes melt”, “people age” and so on.
  3. This fundamental postulate is not justifiable on a priori grounds, as it is fundamentally an empirical claim about how frequently different microstates pop up in our universe. Different initial conditions give rise to different such frequencies, so that a claim to a priori access to the fundamental postulate is a claim to a priori access to the precise details of the initial condition of the universe.

 There’s just one problem with all this… apply our postulate to the past, and everything breaks.

 Notice that I said that the fundamental postulate allows us to prove all the things we want to say about the future. That wording was chosen carefully. What happens if you try to apply the microphysical laws + the fundamental postulate to predict the past of some macroscopic system? It turns out that all hell breaks loose. Gases spontaneously contract, ice cubes form from puddles of water, and brains pop out of thermal equilibrium.

 Why does this happen? Very simply, we start with two fully time reversible premises (the microphysical laws and the fundamental postulate). We apply them to our present knowledge of some state, the description of which does not specify a special time direction. So any conclusion we get must as a matter of logic be time reversible as well! You can’t start with premises that treat the past as the mirror image of the future, and using just the rules of logical equivalence derive a conclusion that treats the past as fundamentally different from the future. And what this means is that if you conclude that entropy increases towards the future, then you must also conclude that entropy increases towards the past. Which is to say that we came from a higher entropy state, and ultimately (over a long enough time scale and insofar as you think that our universe is headed to thermal equilibrium) from thermal equilibrium.

Let’s flesh this argument out a little more. Consider a half-melted ice cube sitting in the sun. The microphysical laws + the fundamental postulate tell us that the region of phase space consisting of states in which the ice cube is entirely melted is much much much larger than the region of phase space in which it is fully unmelted. So much larger, in fact, that it’s hard to express using ordinary English words. This is why we conclude that any trajectory through phase space that passes through the present state of the system (the half-melted cube) is almost certainly going to quickly move towards the regions of phase space in which the cube is fully melted. But for the exact same reason, if we look at the set of trajectories that pass through the present state of the system, the vast vast vast majority of them will have come from the fully-melted regions of phase space. And what this means is that the inevitable result of our calculation of the ice cube’s history will be that a few moments ago it was a puddle of water, and then it spontaneously solidified and formed into a half-melted ice cube.
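
To see the symmetry in miniature, here is a toy illustration of my own (not a model of any real physical system): a reversible, deterministic dynamics given by a fixed random permutation of a finite state space, coarse-grained into macrostates. Conditioning uniformly on a low-entropy macrostate and stepping one tick forward and one tick backward raises the average entropy in both directions.

```python
import random
from math import comb, log

random.seed(0)

N_BITS = 16

# Reversible, deterministic "microphysics": a fixed random permutation of the state space.
perm = list(range(2 ** N_BITS))
random.shuffle(perm)
inverse = [0] * len(perm)
for s, t in enumerate(perm):
    inverse[t] = s

def macrostate(s):
    # Coarse-grain a microstate by its number of set bits.
    return bin(s).count("1")

def boltzmann_entropy(m):
    # Entropy of a macrostate = log of the number of microstates it contains.
    return log(comb(N_BITS, m))

def mean_entropy(states):
    return sum(boltzmann_entropy(macrostate(s)) for s in states) / len(states)

# "Fundamental postulate": uniform over all microstates in a low-entropy macrostate.
current = [s for s in range(2 ** N_BITS) if macrostate(s) == 2]

print("entropy now:              ", round(mean_entropy(current), 2))
print("average entropy, one step forward: ", round(mean_entropy([perm[s] for s in current]), 2))
print("average entropy, one step backward:", round(mean_entropy([inverse[s] for s in current]), 2))
# Both the forward step and the backward step land, on average, in much larger
# macrostates: the uniform-over-the-current-macrostate premise is blind to the
# direction of time.
```

Nothing in the setup distinguishes the permutation from its inverse, which is exactly the point: any asymmetry between past and future has to come from somewhere else.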

This argument generalizes! What’s the most likely past history of you, according to statistical mechanics? It’s not that the solar system coalesced from a haze of gases strewn through space by a past supernova, such that a planet would form in the Goldilocks zone and develop life, which would then gradually evolve through natural selection to the point where you are sitting in whatever room you’re sitting in reading this post. This trajectory through phase space is enormously unlikely. The much much much more likely past trajectory of you through phase space is that a little while ago you were a bunch of particles dispersed through a universe at thermal equilibrium, which happened to spontaneously coalesce into a brain that has time to register a few moments of experience before dissipating back into chaos. “What about all of my memories of the past?” you say. As it happens, the most likely explanation of these memories is not that they are veridical copies of real happenings in the universe, but that they are illusions manufactured from randomness.

Basically, if you buy everything I’ve argued in the first two parts, then you are forced to conclude that the universe is most likely near thermal equilibrium, with your current experience of it arising as a spontaneous dip in entropy, just enough to produce a conscious brain but no more. There are at least two big problems with this view.

Problem 1: This conclusion is, we think, extremely empirically wrong! The ice cube in front of you didn’t spontaneously form from a puddle of water, uncracked eggs weren’t a moment ago scrambled, and your memories are to some degree veridical. If you really believe that you are merely a spontaneous dip in entropy, then your prediction for the next minute will be the gradual dissolution of your brain and loss of consciousness. Now, wait a minute and see if this happens. Still here? Good!

Problem 2: The conclusion cannot be simultaneously believed and justified. If you think that you’re a thermal fluctuation, then you shouldn’t credit any of your memories as telling you anything about the world. But then your whole justification for coming to the conclusion in the first place (the experiments that led us to conclude that physics is time-reversible and that the fundamental postulate is true) is undermined! Either you believe it without justification, or you don’t believe it despite having justification. Said another way, no reflective equilibrium exists at an entropy minimum. David Albert calls this peculiar epistemic state cognitively unstable, as it’s not clear where exactly it should leave you.

Reflect for a moment on how strange of a situation we are in here. Starting from very basic observations of the world, involving its time-reversibility on the micro scale and the increase in entropy of systems, we see that we are inevitably led to the conclusion that we are almost certainly thermal fluctuations, brains popping out of the void. I promise you that no trick has been pulled here, this really is the state of the philosophy of statistical mechanics! The big issue is how to deal with this strange situation.

One approach is to say the following: Our problem is that our predictions work towards the future but not the past. So suppose that we simply add as a new fundamental postulate the proposition that long long ago the universe had an incredibly low entropy. That is, suppose that instead of just starting with the microphysical laws and the fundamental postulate of statistical mechanics, we add a third claim: the Past Hypothesis.

The Past Hypothesis should be understood as an augmentation of our Fundamental Postulate. Taken together, the two postulates say that our probability distribution over possible microstates should not be uniform over phase space. Instead, it should be what you get when you take the uniform distribution, and then condition on the distant past being extremely low entropy. This process of conditioning clearly preferences one direction of time over the other, and so the symmetry is broken.

 It’s worth reflecting for a moment on the strangeness of the epistemic status of the Past Hypothesis. It happens that we have over time accumulated a ton of observational evidence for the occurrence of the Big Bang. But none of this evidence has anything to do with our reasons for accepting the Past Hypothesis. If we buy the whole line of argument so far, our conclusion that something like a Big Bang occurred becomes something that we are forced to believe for deep logical reasons, on pain of cognitive instability and self-undermining belief. Anybody that denies that the Big Bang (or some similar enormously low-entropy past state) occurred has to contend with their view collapsing in self-contradiction upon observing the physical laws!

Is The Fundamental Postulate of Statistical Mechanics A Priori?

This is the second part in a three-part series on the foundations of statistical mechanics.

  1. The Necessity of Statistical Mechanics for Getting Macro From Micro
  2. Is The Fundamental Postulate of Statistical Mechanics A Priori?
  3. The Central Paradox of Statistical Mechanics: The Problem of The Past

— — —

The fantastic empirical success of the fundamental postulate gives us a great amount of assurance that the postulate is a good one. But it’s worth asking whether that’s the only reason that we should like this postulate, or if it has some solid a priori justification. The basic principle of “when you’re unsure, just distribute credences evenly over phase space” certainly strikes many people as highly intuitive and justifiable on a priori grounds. But there are some huge problems with this way of thinking, one of which I’ve already hinted at. Here’s a thought experiment that illustrates the problem.

There is a factory in your town that produces cubic boxes. All you know about this factory is that the boxes that they produce all have a volume between 0 m³ and 1 m³. You are going to be delivered a box produced by this factory, and are asked to represent your state of knowledge about the box with a probability distribution. What distribution should you use?

Suppose you say “I should be indifferent over all the possible boxes. So I should have a uniform distribution over the volumes from 0 m³ to 1 m³.” This might seem reasonable at first blush. But what if somebody else said “Yes, you should be indifferent over all the possible boxes, but actually the uniform distribution should be over the side lengths from 0 m to 1 m, not volumes.” This would be a very different probability distribution! For example, if the probability that the side length is greater than 0.5 m is 50%, then the probability that the volume is greater than (0.5)³ = 1/8 m³ is also 50%, whereas a distribution that is uniform over volumes would assign that event a probability of 87.5%. Uniform over side length is not the same as uniform over volume (or surface area, for that matter). Now, how do you choose between a uniform distribution over volumes and a uniform distribution over side lengths? After all, you know nothing about the process that the factory is using to produce the boxes, and whether it is based off of volume or side length (or something else); all you know is that every box has a volume between 0 m³ and 1 m³.
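
Here is a quick numerical illustration of the disagreement (a sketch of my own, simulating hypothetical boxes rather than saying anything about the actual factory):

```python
import random

random.seed(0)
N = 100_000

# Prior 1: indifferent over side lengths -- side length uniform on [0, 1] m.
p_from_sides = sum(random.random() ** 3 > 1 / 8 for _ in range(N)) / N

# Prior 2: indifferent over volumes -- volume uniform on [0, 1] m^3.
p_from_volumes = sum(random.random() > 1 / 8 for _ in range(N)) / N

print("P(volume > 1/8) if uniform over side lengths:", round(p_from_sides, 3))    # about 0.5
print("P(volume > 1/8) if uniform over volumes:     ", round(p_from_volumes, 3))  # about 0.875
```

Both priors claim to be “indifferent over all possible boxes,” yet they assign very different probabilities to the same event.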

The lesson of this thought experiment is that the statement we started with (“I should be indifferent over all possible boxes”) was actually not even well-defined. There’s not just one unique measure over a continuous space, and in general the notion that “all possibilities are equally likely” is highly language-dependent.

The exact same applies to phase space, as position and momentum are continuous quantities. Imagine that somebody instead of talking about phase space, only talked about “craze space”, in which all positions become positions cubed, and all momentum values become natural logs of momentum. This space would still contain all possible microstates of your system. What’s more, the fundamental laws of nature could be rewritten in a way that uses only craze space quantities, not phase space quantities. And needless to say, being indifferent over phase space would not be the same as being indifferent over craze space.

Spend enough time looking at attempts to justify a unique interpretation of the statement “All states are equally likely”, when your space of states is a continuous infinity, and you’ll realize that all such attempts are deeply dependent upon arbitrary choices of language. The maximum information entropy probability distribution is afflicted with the exact same problem, because the entropy of your distribution is going to depend on the language you’re using to describe it! The entropy of a distribution in phase space is NOT the same as the entropy of the equivalent distribution transformed to craze space.
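
The entropy point can be checked numerically too. Here is a small sketch of my own: take the uniform (maximum-entropy) distribution over a position-like coordinate x on [0, 1], re-express the very same distribution in the “craze” coordinate y = x³, and estimate the differential entropy of each.

```python
import random
from math import log

random.seed(0)
N = 100_000
xs = [random.random() for _ in range(N)]

# X uniform on [0, 1]: differential entropy h(X) = 0.
h_x = 0.0

# "Craze" coordinate y = x^3.  Change of variables: h(Y) = h(X) + E[log|dy/dx|].
h_y = h_x + sum(log(3 * x ** 2) for x in xs) / N   # analytically, ln(3) - 2, about -0.90

print("entropy of the distribution in x-coordinates:", round(h_x, 3))
print("entropy of the same distribution in y = x^3 :", round(h_y, 3))
# The very same state of knowledge has different entropies in the two coordinate
# systems, so "pick the maximum-entropy distribution" gives different answers
# depending on which coordinates you maximize in.
```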

Let’s summarize this section. If somebody tells you that the fundamental postulate says that all microstates compatible with what you know about the macroscopic features of your system are equally likely, the proper response is something like “Equally likely? That sounds like you’re talking about a uniform distribution. But uniform over what? Oh, position and momentum? Well, why’d you make that choice?” And if they point out that the laws of physics are expressed in terms of position and momentum, you just disagree and say “No, actually I prefer writing the laws of physics in terms of position cubed and log momentum!” (Substitute in any choice of monotonic functions).

If they object on the grounds of simplicity, point out that position and momentum are only simple as measured from a standpoint that takes them to be the fundamental concepts, and that from your perspective, getting position and momentum requires applying complicated inverse transformations to your monotonic transformation of the chosen coordinates.

And if they object on the grounds of naturalness, the right response is probably something like “Tell me more about this ’naturalness’. How do you know what’s natural or unnatural? It seems to me that your choice of what physical concepts count as natural is a manifestation of deep selection pressures that push any beings whose survival depends on modeling and manipulating their surroundings towards forming an empirically accurate model of the macroscopic world. So that when you say that position is more natural than log(position), what I hear is that the fundamental postulate is a very useful tool. And you can’t use the naturalness of the choice of position to justify the fundamental postulate, when your perception of the naturalness of position is the result of the empirical success of the fundamental postulate!”

In my judgement, none of the a priori arguments work, and fundamentally the reason is that the fundamental postulate is an empirical claim. There’s no a priori principle of rationality that tells us that boxes of gases tend to equilibrate, because you can construct a universe whose initial microstate is such that its entire history is one of entropy radically decreasing, gases concentrating, eggs unscrambling, ice cubes unmelting, and so on. Why is this possible? Because it’s consistent with the microphysical laws that the universe started in an enormously low entropy configuration, so it’s gotta also be consistent with the microphysical laws for the entire universe to spend its entire lifetime decreasing in entropy. The general principle is: If you believe that something is physically possible, then you should believe its time-inverse is possible as well.

Let’s pause and take stock. What I’ve argued for so far is the following set of claims:

  1. To successfully predict the behavior of macroscopic systems, we need something above and beyond the microphysical laws.
  2. This extra thing we need is the fundamental postulate of statistical mechanics, which assigns a uniform distribution over the region of phase space consistent with what you know about the system. This postulate allows us to prove all the things we want to say about the future, such as “gases expand”, “ice cubes melt”, “people age” and so on.
  3. This fundamental postulate is not justifiable on a priori grounds, as it is fundamentally an empirical claim about how frequently different microstates pop up in our universe. Different initial conditions give rise to different such frequencies, so that a claim to a priori access to the fundamental postulate is a claim to a priori access to the precise details of the initial condition of the universe.

There’s just one problem with all this… apply our postulate to the past, and everything breaks.

Up next: Why does statistical mechanics give crazy answers about the past? Where did we go wrong?

A Cognitive Instability Puzzle, Part 2

This is a follow-up to the previous post, in which I present three unusual cases of belief updating. Read that post before you read this one.

I find these cases very puzzling, and I don’t have a definite conclusion for any of them. They share some deep similarities. Let’s break all of them down into their basic logical structure:

Joe
Joe initially believes in classical logic and is certain of some other stuff, call it X.
An argument A exists that concludes that X can’t be true if classical logic is true.
If Joe believes classical logic, then he believes A.
If Joe believes intuitionist logic, then he doesn’t believe A.

Karl
Karl initially believes in God and is certain of some other stuff about evil, call it E.
An argument A exists that concludes that God can’t exist if E is true.
If Karl believes in God, then he believes A.
If Karl doesn’t believe in God, then he doesn’t believe A.

Tommy
Tommy initially believes in her brain’s reliability and is certain of some other stuff about her experiences, call it Q.
An argument A exists that concludes that her brain can’t be reliable if Q is true.
If Tommy believes in her brain’s reliability, then she believes A.
If Tommy doesn’t believe in her brain’s reliability, then she doesn’t believe A.

First of all, note that all three of these cases are ones in which Bayesian reasoning won’t work. Joe is uncertain about the law of the excluded middle, without which you don’t have probability theory. Karl is uncertain about the meaning of the term ‘evil’, such that the same proposition switches from being truth-apt to being meaningless when he updates his beliefs. Probability theory doesn’t accommodate such variability in its language. And Tommy is entertaining a hypothesis according to which she no longer accepts any deductive or inductive logic, which is inconsistent with Bayesianism in an even more extreme way than Joe.

The more important general theme is that in all three cases, the following two things are true: 1) If an agent believes A, then they also believe an argument that concludes -A. 2) If that agent believes -A, then they don’t believe the argument that concludes -A.

Notice that if an agent initially doesn’t believe A, then they have no problem. They believe -A, and also happen to not believe that specific argument concluding -A, and that’s fine! There’s no instability or self-contradiction there whatsoever. So that’s really not where the issue lies.

The mystery is the following: If the only reason that an agent changed their mind from A to -A is the argument that they no longer buy, then what should they do? Once they’ve adopted the stance that A is false, should they stay there, reasoning that if they accept A they will be led to a contradiction? Or should they jump back to A, reasoning that the initial argument that led them there was flawed?

Said another way, should they evaluate the argument against A from their own standards, or from A’s standards? If they use their own standards, then they are in an unstable position, where they jump back and forth between A and -A. And if they always use A’s standards… well, then we get the conclusion that Tommy should believe herself to be a Boltzmann brain. In addition, if they are asked why they don’t believe A, then they find themselves in the weird position of giving an explanation in terms of an argument that they believe to be false!

I find myself believing that either Joe should be an intuitionist, Karl an atheist, and Tommy a radical skeptic, OR Joe a classical logician, Karl a theist, and Tommy a believer in the reliability of her own brain. That is, it seems like there aren’t any significant enough disanalogies between these three cases to warrant concluding one thing in one case and then going the other direction in another.

Logic, Theism, and Boltzmann Brains: On Cognitively Unstable Beliefs

First case

Classical propositional logic accepts that the proposition A ∨ -A is necessarily true. This is called the law of the excluded middle. Intuitionist logic differs in that it does not accept this axiom.

Suppose that Joe is a believer in propositional logic (but also reserves some credence for intuitionist logic). Joe also believes a set of other propositions, whose conjunction we’ll call X, and has total certainty in X.

One day Joe discovers that a contradiction can be derived from X, in a proof that uses the law of the excluded middle. Since Joe is certain that X is true, he knows that X isn’t the problem, and instead it must be the law of the excluded middle. So Joe rejects the law of the excluded middle and becomes an intuitionist.

The problem is, as an intuitionist, Joe now no longer accepts the validity of the argument that starts at X and concludes -X! Why? Because it uses the law of the excluded middle, which he doesn’t accept.

Should Joe believe in propositional logic or intuitionism?

Second case

Karl is a theist. He isn’t absolutely certain that theism is correct, but holds a majority of his credence in theism (and the rest in atheism). Karl is also 100% certain in the following claim: “If atheism is true, then the concept of ‘evil’ is meaningless”, and believes that logically valid arguments cannot be made using meaningless concepts.

One day somebody presents the problem of evil to Karl, and he sees it as a crushing objection to theism. He realizes that theism, plus some other beliefs about evil that he’s 100% confident in, leads to a contradiction. So since he can’t deny these other beliefs, he is led to atheism.

The problem is, as an atheist, Karl no longer accepts the validity of the argument that starts at theism and concludes atheism! Why? Because the arguments rely on using the concept of ‘evil’, and he is now certain that this concept is meaningless, and thus cannot be used in logically valid arguments.

Should Karl be a theist or an atheist?

Third case

Tommy is a scientist, and she believes that her brain is reliable. By this, I mean that she trusts her ability to reason both deductively and inductively. However, she isn’t totally certain about this, and holds out a little credence for radical skepticism. She is also totally certain about the content of her experiences, though not its interpretation (i.e. if she sees red, she is 100% confident that she is experiencing red, although she isn’t necessarily certain about what in the external world is causing the experience).

One day Tommy discovers that reasoning deductively and inductively from her experiences leads her to a model of the world that entails that her brain is actually a quantum fluctuation blipping into existence outside the event horizon of a black hole. She realizes that this means that with overwhelmingly high probability, her brain is not reliable and is just producing random noise uncorrelated with reality.

The problem is, if Tommy believes that her brain is not reliable, then she can no longer accept the validity of the argument that led her to this position! Why? Well, she no longer trusts her ability to reason deductively or inductively. So she can’t accept any argument, let alone this particular one.

What should Tommy believe?

— — —

How are these three cases similar and different? If you think that Joe should be an intuitionist, or Karl an atheist, then should Tommy believe herself to be a black hole brain? Because it turns out that many cosmologists have found themselves to be in a situation analogous to Case 3! (Link.) I have my own thoughts on this, but I won’t share them for now.

How will quantum computing impact the world?

A friend of mine recently showed me an essay series on quantum computers. These essays are fantastically well written and original, and I highly encourage anybody with the slightest interest in the topic to check them out. They are also interesting to read from a pedagogical perspective, as experiments in a new style of teaching (self-described as an “experimental mnemonic medium”).

There’s one particular part of the post which articulated the potential impact of quantum computing better than I’ve seen it articulated before. Reading it has made me update some of my opinions about the way that quantum computers will change the world, and so I want to post that section here with full credit to the original authors Michael Nielsen and Andy Matuschak. Seriously, go to the original post and read the whole thing! You won’t regret it.

No, really, what are quantum computers good for?

It’s comforting that we can always simulate a classical circuit – it means quantum computers aren’t slower than classical computers – but doesn’t answer the question of the last section: what problems are quantum computers good for? Can we find shortcuts that make them systematically faster than classical computers? It turns out there’s no general way known to do that. But there are some interesting classes of computation where quantum computers outperform classical.

Over the long term, I believe the most important use of quantum computers will be simulating other quantum systems. That may sound esoteric – why would anyone apart from a quantum physicist care about simulating quantum systems? But everybody in the future will (or, at least, will care about the consequences). The world is made up of quantum systems. Pharmaceutical companies employ thousands of chemists who synthesize molecules and characterize their properties. This is currently a very slow and painstaking process. In an ideal world they’d get the same information thousands or millions of times faster, by doing highly accurate computer simulations. And they’d get much more useful information, answering questions chemists can’t possibly hope to answer today. Unfortunately, classical computers are terrible at simulating quantum systems.

The reason classical computers are bad at simulating quantum systems isn’t difficult to understand. Suppose we have a molecule containing n atoms – for a small molecule, n may be 1, for a complex molecule it may be hundreds or thousands or even more. And suppose we think of each atom as a qubit (not true, but go with it): to describe the system we’d need 2^n different amplitudes, one amplitude for each n-bit computational basis state, e.g., |010011…>.

Of course, atoms aren’t qubits. They’re more complicated, and we need more amplitudes to describe them. Without getting into details, the rough scaling for an n-atom molecule is that we need k^n amplitudes, where k is some constant. The value of k depends upon context – which aspects of the atom’s behavior are important. For generic quantum simulations k may be in the hundreds or more.

That’s a lot of amplitudes! Even for comparatively simple atoms and small values of n, it means the number of amplitudes will be in the trillions. And it rises very rapidly, doubling or more for each extra atom. If k = 100, then even n = 10 atoms will require 100 million trillion amplitudes. That’s a lot of amplitudes for a pretty simple molecule.

The result is that simulating such systems is incredibly hard. Just storing the amplitudes requires mindboggling amounts of computer memory. Simulating how they change in time is even more challenging, involving immensely complicated updates to all the amplitudes.

Physicists and chemists have found some clever tricks for simplifying the situation. But even with those tricks simulating quantum systems on classical computers seems to be impractical, except for tiny molecules, or in special situations. The reason most educated people today don’t know simulating quantum systems is important is because classical computers are so bad at it that it’s never been practical to do. We’ve been living too early in history to understand how incredibly important quantum simulation really is.

That’s going to change over the coming century. Many of these problems will become vastly easier when we have scalable quantum computers, since quantum computers turn out to be fantastically well suited to simulating quantum systems. Instead of each extra simulated atom requiring a doubling (or more) in classical computer memory, a quantum computer will need just a small (and constant) number of extra qubits. One way of thinking of this is as a loose quantum corollary to Moore’s law:

The quantum corollary to Moore’s law: Assuming both quantum and classical computers double in capacity every few years, the size of the quantum system we can simulate scales linearly with time on the best available classical computers, and exponentially with time on the best available quantum computers.

In the long run, quantum computers will win, and win easily.

The punchline is that it’s reasonable to suspect that if we could simulate quantum systems easily, we could greatly speed up drug discovery, and the discovery of other new types of materials.

I will risk the ire of my (understandably) hype-averse colleagues and say bluntly what I believe the likely impact of quantum simulation will be: there’s at least a 50 percent chance quantum simulation will result in one or more multi-trillion dollar industries. And there’s at least a 30 percent chance it will completely change human civilization. The catch: I don’t mean in 5 years, or 10 years, or even 20 years. I’m talking more over 100 years. And I could be wrong.

What makes me suspect this may be so important?

For most of history we humans understood almost nothing about what matter is. That’s changed over the past century or so, as we’ve built an amazingly detailed understanding of matter. But while that understanding has grown, our ability to control matter has lagged. Essentially, we’ve relied on what nature accidentally provided for us. We’ve gotten somewhat better at doing things like synthesizing new chemical elements and new molecules, but our control is still very primitive.

We’re now in the early days of a transition where we go from having almost no control of matter to having almost complete control of matter. Matter will become programmable; it will be designable. This will be as big a transition in our understanding of matter as the move from mechanical computing devices to modern computers was for computing. What qualitatively new forms of matter will we create? I don’t know, but the ability to use quantum computers to simulate quantum systems will be an essential part of this burgeoning design science.

Quantum computing for the very curious
(Andy Matuschak and Michael Nielsen)
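
One way to get a feel for the scaling claim in the quoted passage is to run the arithmetic yourself. Here is a rough sketch of my own (k = 100 is the illustrative value from the quote; the 16 bytes per complex amplitude is my assumption):

```python
# Rough classical memory cost of storing k^n amplitudes for an n-atom molecule,
# assuming each complex amplitude takes 16 bytes (two 64-bit floats).
BYTES_PER_AMPLITUDE = 16

def classical_cost(n_atoms, k=100):
    amplitudes = k ** n_atoms
    return amplitudes, amplitudes * BYTES_PER_AMPLITUDE

for n in (2, 5, 10):
    amplitudes, memory_bytes = classical_cost(n)
    print(f"n = {n:2d} atoms: {amplitudes:.2e} amplitudes, ~{memory_bytes / 1e12:.2e} TB")

# n = 10 already gives 1e20 amplitudes (the "100 million trillion" in the quote),
# which is hopeless classically, whereas each extra simulated atom costs a quantum
# computer only a roughly constant number of extra qubits.
```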

The EPR Paradox

The Paradox

I only recently realized how philosophical the original EPR paper was. It starts out by providing a sufficient condition for something to be an “element of reality”, and proceeds from there to try to show the incompleteness of quantum mechanics. Let’s walk through this argument here:

The EPR Reality Condition: If at time t we can know the value of a measurable quantity with certainty without in any way disturbing the system, then there is an element of reality corresponding to that measurable quantity at time t. (i.e. this is a sufficient condition for a measurable property of a system at some moment to be an element of the reality of that system at that moment.)

Example 1: If you measure an electron spin to be up in the z direction, then quantum mechanics tells you that you can predict with certainty that the spin in the z direction will be up at any future measurement. Since you can predict this with certainty, there must be an element of reality corresponding to the electron z-spin after you have measured it to be up the first time.

Example 2: If you measure an electron spin to be up in the z-direction, then QM tells you that you cannot predict the result of measuring the spin in the x-direction at a later time. So the EPR reality condition does not entail that the x-spin is an element of the reality of this electron. It also doesn’t entail that the x-spin is NOT an element of the reality of this electron, because the EPR reality condition is merely a sufficient condition, not a necessary condition.

Now, what does the EPR reality condition have to say about two particles with entangled spins? Well, suppose the state of the system is initially

|Ψ> = (|↑↓> – |↓↑>) / √2

This state has the unusual property that it has the same form no matter what basis you express it in. You can show for yourself that in the x-spin basis, the state is equal to

|Ψ> = (|→←> – |←→>) / √2
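
If you want to carry out that check (a standard exercise, written here in the same notation as above), express the x-spin states in terms of the z-spin states, |→> = (|↑> + |↓>) / √2 and |←> = (|↑> – |↓>) / √2, and expand:

|→←> = (|↑↑> – |↑↓> + |↓↑> – |↓↓>) / 2
|←→> = (|↑↑> + |↑↓> – |↓↑> – |↓↓>) / 2

Subtracting and dividing by √2 gives (|→←> – |←→>) / √2 = –(|↑↓> – |↓↑>) / √2 = –|Ψ>, which is the same physical state as |Ψ> up to an irrelevant overall phase.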

Now, suppose that you measure the first electron in the z-basis and find it to be up. If you do this, then you know with certainty that the other electron will be measured to be down, since the two spins in this state are perfectly anti-correlated. This means that after measuring electron 1 in the z-basis, the EPR reality condition says that electron 2 has a definite z-spin as an element of reality.

What if you instead measure the first electron in the x-basis and find it to be right? Well, then you know with certainty that the second electron will be measured to be left, and the EPR reality condition tells you that electron 2 has a definite x-spin as an element of reality.

Okay, so we have two claims:

  1. That after measuring the z-spin of electron 1, electron 2 has a definite z-spin, and
  2. that after measuring the x-spin of electron 1, electron 2 has a definite x-spin.

But notice that these two claims are not necessarily inconsistent with the quantum formalism, since they refer to the state of the system after a particular measurement. What’s required to bring out a contradiction is a further assumption, namely the assumption of locality.

For our purposes here, locality just means that it’s possible to measure the spin of electron 1 in such a way as to not disturb the state of electron 2. This is a really weak assumption! It’s not saying that any time you measure the spin of electron 1, you will not have disturbed electron 2. It’s just saying that it’s possible in principle to set up a measurement of the first electron in such a way as to not disturb the second one. For instance, take electrons 1 and 2 to opposite sides of the galaxy, seal them away in totally closed off and causally isolated containers, and then measure electron 1. If you agree that this should not disturb electron 2, then you agree with the assumption of locality.

Now, with this additional assumption, Einstein Podolsky and Rosen realized that our earlier claims (1) and (2) suddenly come into conflict! Why? Because if it’s possible to measure the z-spin of electron 1 in a way that doesn’t disturb electron 2 at all, then electron 2 must have had a definite z-spin even before the measurement of electron 1!

And similarly, if it’s possible to measure the x-spin of electron 1 in a way that doesn’t disturb electron 2, then electron 2 must have had a definite x-spin before the first electron was measured!

What this amounts to is that our two claims become the following:

  1. Electron 2 has a definite z-spin at time t before the measurement.
  2. Electron 2 has a definite x-spin at time t before the measurement.

And these two claims are in direct conflict with quantum theory! Quantum mechanics refuses to assign a simultaneous x and z spin to an electron, since these are incompatible observables. This entails that if you buy into locality and the EPR reality condition, then you must believe that quantum mechanics is an incomplete description of nature, or in other words that there are elements of reality that cannot be described by quantum mechanics.

The Resolution(s)

Our argument rested on two premises: the EPR reality condition and locality. Its conclusion was that quantum mechanics was incomplete. So naturally, there are three possible paths you can take to respond: accept the conclusion, deny the second premise, or deny the first premise.

To accept the conclusion is to agree that quantum mechanics is incomplete. This is where hidden variable approaches fall, and was the path that Einstein dearly hoped would be vindicated. For complicated reasons that won’t be covered in this post, but which I talk about here, the prospects for any local realist hidden variables theory (which was what Einstein wanted) look pretty dim.

To deny the second premise is to say that in fact, measuring the spin of the first electron necessarily disturbs the state of the second electron, no matter how you set things up. This is in essence a denial of locality, since the two electrons can be space-like separated, meaning that this disturbance must have propagated faster than the speed of light. This is a pretty dramatic conclusion, but is what orthodox quantum mechanics in fact says. (It’s implied by the collapse postulate.)

To deny the first premise is to say that in fact there can be some cases in which you can predict with certainty a measurable property of a system, but where nonetheless there is no element of reality corresponding to this property. I believe that this is where Many-Worlds falls, since measurement of z-spin doesn’t result in an electron in an unambiguous z-spin state, but in a combined superposition of yourself, your measuring device, the electron, and the environment. Needless to say, in this complicated superposition there is no definite fact about the z-spin of the electron.

I’m a little unsure about where to place psi-epistemic approaches like Quantum Bayesianism, which resolve the paradox by treating the wave function not as a description of reality, but solely as a description of our knowledge. In this way of looking at things, it’s not surprising that learning something about an electron at one place can instantly tell you something about an electron at a distant location. This does not imply any faster-than-light communication, because all that’s being described is the way that information-processing occurs in a rational agent’s brain.

Visualizing Special Relativity

I’ve been thinking a lot about special relativity recently, and wrote up a fun program for visualizing some of its stranger implications. Before going on to these visualizations, I want to recommend the Youtube channel MinutePhysics, which made a fantastic primer on the subject. I’ll link the first few of these here, as they might help with understanding the rest of the post. I highly recommend the entire series, even if you’re already pretty familiar with the subject.

Now, on to the pretty images! I’m still trying to determine whether it’s possible to embed applets in my posts, so that you can play with the program for yourself. Until I figure that out, GIFs will have to suffice.

[GIF: lots of particles]

Let me explain what’s going on in the image.

First of all, the vertical direction is time (up is the future, down is the past), and the horizontal direction is space (which is 1D for simplicity). What we’re looking at is the universe as described by an observer at a particular point in space and time. The point that this observer is at is right smack-dab in the center of the diagram, where the two black diagonal lines meet. These lines represent the observer’s light cone: the paths through spacetime that would be taken by beams of light emitted in either direction. And finally, the multicolored dots scattered in the upper quadrant represent other spacetime events in the observer’s future.

Now, what is being varied is the velocity of the observer. Again, keep in mind that the observer is not actually moving through time in this visualization. What is being shown is the way that other events would be arranged spatially and temporally if the observer had different velocities.

Take a second to reflect on how you would expect this diagram to look classically. Obviously the temporal positions of events would not depend upon your velocity. What about the spatial positions of events? Well, if you move to the right, events in your future and to the right of you should be nearer to you than they would be had you not been in motion. And similarly, events in your future left should be further to the left. We can easily visualize this by plugging in the classical Galilean transformation:

[GIF: classical transformation]

Just as we expected, time positions stay constant and spatial positions shift according to your velocity! Positive velocity (moving to the right) moves future events to the left, and negative velocity moves them to the right. Now, technically this image is wrong. I’ve kept the light paths constant, but even these would shift under the classical transformation. In reality we’d get something like this:

[GIF: classical transformation, corrected]

Of course, the empirical falsity of this prediction that the speed of light should vary according to your own velocity is what drove Einstein to formulate special relativity. Here’s what happens with just a few particles when we vary the velocity:

[GIF: RGB transform]

What I love about this is how you can see so many effects in one short gif. First of all, the speed of light stays constant. That’s a good sign! A constant speed of light is pretty much the whole point of special relativity. Secondly, and incredibly bizarrely, the temporal positions of objects depend on your velocity!! Objects to your future right don’t just get further away spatially when you move away from them, they also get further away temporally!

Another thing that you can see in this visualization is the relativity of simultaneity. When the velocity is zero, Red and Blue are at the same moment of time. But if our velocity is greater than zero, Red falls behind Blue in temporal order. And if we travel at a negative velocity (to the left), then we would observe Red as occurring after Blue in time. In fact, you can find a velocity that makes any two of these three points simultaneous!
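
Here is a minimal sketch of the transformation behind these pictures (my own code, not the program that generated the GIFs, and the event coordinates are made up for illustration), in units where c = 1:

```python
from math import sqrt

def boost(t, x, v):
    # Coordinates of the event (t, x) as seen by an observer moving at velocity v (c = 1).
    gamma = 1 / sqrt(1 - v ** 2)
    return gamma * (t - v * x), gamma * (x - v * t)

# Three pairwise spacelike-separated events in the observer's future (illustrative values).
events = {"Red": (1.0, -0.5), "Green": (0.8, 0.0), "Blue": (1.0, 0.8)}

for v in (-0.5, -0.2, 0.0, 0.2, 0.5):
    order = sorted(events, key=lambda name: boost(*events[name], v)[0])
    print(f"v = {v:+.1f}: temporal order = {'-'.join(order)}")

# Two spacelike-separated events (t1, x1) and (t2, x2) become simultaneous for the
# observer moving at v = (t1 - t2) / (x1 - x2): here Red and Blue are simultaneous
# at v = 0, Red and Green at v = -0.4, and Green and Blue at v = +0.25.
```

With these made-up coordinates the printout runs through the same four orderings discussed below: Red-Green-Blue, Green-Red-Blue, Green-Blue-Red, and Blue-Green-Red.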

This leads to the next observation we can make: The temporal order of events is relative! The orderings of events that you can observe include Red-Green-Blue, Green-Red-Blue, Green-Blue-Red, and Blue-Green-Red. See if you can spot them all!

This is probably the most bonkers consequence of special relativity. In general, we cannot say without ambiguity that Event A occurred before or after Event B. The notion of an objective temporal ordering of events simply must be discarded if we are to hold onto the observation of a constant speed of light.

Are there any constraints on the possible temporal orderings of events? Or does special relativity commit us to having to say that from some valid frames of reference, the basketball going through the net preceded the throwing of the ball? Well, notice that above we didn’t get all possible orders… in particular we didn’t have Red-Blue-Green or Blue-Red-Green. It turns out that in general, there are some constraints we can place on temporal orderings.

Just for fun, we can add in the future light cones of each of the three events:

[GIF: RGB with light cones]

Two things to notice: First, all three events are outside each other’s light cones. And second, no event ever crosses over into another event’s light cone. This makes some intuitive sense, and gives us an invariant that holds true in all reference frames: events that are outside each other’s light cones from one perspective are outside each other’s light cones from all perspectives. Same thing for events that are inside each other’s light cones.

Conceptually, events being inside each others’ light cones corresponds to them being in causal contact. So another way we can say this is that all observers will agree on what the possible causal relationships in the universe are. (For the purposes of this post, I’m completely disregarding the craziness that comes up when we consider quantum entanglement and “spooky action at a distance.”) 

Now, is it ever possible for events in causal contact to switch temporal order upon a change in reference frame? Or, in other words, could effects precede their causes? Let’s look at a diagram in which one event is contained inside the light cone of another:

[GIF: RGB causal]

Looking at this visualization, it becomes quite obvious that this is just not possible! Blue is fully contained inside the future light cone of Red, and no matter what frame of reference we choose, it cannot escape this. Even though we haven’t formally proved it, I think that the visualization gives the beginnings of an intuition about why this is so. Let’s postulate this as another absolute truth: If Event A is contained within the light cone of Event B, all observers will agree on the temporal order of the two events. Or, in plainer language, there can be no controversy over whether a cause precedes its effects.
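
One way to make this precise (a standard argument, sketched rather than fully proved here): the Lorentz transformation preserves the spacetime interval between two events, s² = c²Δt² – Δx², and it sends the time separation to Δt′ = γ(Δt – vΔx/c²). If one event lies inside the other’s light cone, the interval is timelike: c|Δt| > |Δx|. Making Δt′ equal to zero would then require a speed |v| = c²|Δt| / |Δx| > c, which no observer can reach. Since Δt′ can never pass through zero as the velocity varies, its sign, and with it the temporal order of the two events, is the same in every frame.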

I’ll leave you with some pretty visualizations of hundreds of colorful events transforming as you change reference frames:

[GIF: pretty transforms]

And finally, let’s trace out the set of possible space-time locations of each event.

[GIF: hyperbolas]


Try to guess what geometric shape these paths are! (They’re not parabolas.) Hint.

 

Bad Science Reporting

(Sorry, I know I promised to describe an experiment that would give evidence for different theories of consciousness, but I want to do a quick rant about something else first. The consciousness/anthropics post is coming soon.)

From The Economist: Spare the rod: Spanking makes your children stupid

The article cites “nearly 30 studies from various countries” that “show that children who are regularly spanked become more aggressive themselves” and are “more likely to be depressed or take drugs.” And most relevant to the title of the article, another large study found that “young children in homes with little or no spanking showed swifter cognitive development than their peers.”

Now ask yourself, is it likely that the studies they are citing actually provide evidence for the causal claim they are using to motivate their parenting advice?

To do so, you would need a study involving some type of intervention in parenting behavior (either natural or experimental). That is, you would want a group of researchers who randomly select a group of parents, and tell half of them to beat their kids and the other half not to. Now, do you suspect that these are the types of studies the Economist is citing?? I think not. (I hope not…)

Maybe they got lucky and found a historical circumstance in which there was a natural intervention (one not induced by the experimenters but by some natural phenomenon), and found data about outcomes before and after this intervention. But for this, we’d have to find an occasion where suddenly a random group of parents were forced to stop or start beating their children, while another random group kept at it without changing their behavior. Maybe we could find something like this (like if there were two nearby towns with very similar parenting habits, and one of them suddenly enacted legislation banning corporal punishment), but it seems pretty unlikely. And indeed, if you look at the studies themselves, you find that they are just standard correlational studies.

Maybe they were able to control for all the major confounding variables, and thus get at the real causal relationship? But… really? All the major confounding variables? There are a few ways to do this (like twin studies), but the studies cited don’t do this.

Now, maybe you’re thinking I’m nitpicking. Sure, the studies only find correlations between corporal punishment and outcomes, but isn’t the most reasonable explanation of this that corporal punishment is causing those outcomes?

Judith Rich Harris would disagree. Her famous book The Nurture Assumption looked at the best research attempting to study the causal effects of parenting style on later-life outcomes and found astoundingly little evidence for any at all. That is, when you really are able to measure how much parenting style influences kids’ outcomes down the line, it’s hard to make out any effect in most areas. And anyway, regardless of the exact strength of the effect of parenting style on later-life outcomes, one thing that’s clear to me is that it is not as strong as we might intuitively suspect. We humans are very good at observing correlation and assuming causation, and can often be surprised at what we find when rigorous causal studies are done.

Plus, it’s not too hard to think of alternative explanations for the observed correlations. To name one, we know that intelligence is heritable, that intelligence is highly correlated with positive life outcomes, and that poor people practice corporal punishment more than rich people. Given these three facts, we’d actually be more surprised if we found no correlation between intelligence and corporal punishment, even if we had no belief that the latter causes the former. And another: aggression is heritable and correlated with general antisocial behavior, which is in turn correlated with negative life outcomes. And I’m sure you can come up with more.

This is not really the hill I want to die on. I agree that corporal punishment is a bad strategy for parenting. But this is not because of a strong belief that in the end spanking leads to depression, drug addiction, and stupidity. I actually suspect that in the long run, spanking is pretty nearly net neutral; the effects probably wash out like most everything else in a person’s childhood. (My prior in this is not that strong and could easily be swayed by seeing actual causal studies that report the opposite.) There’s a much simpler reason to not hit your kids: that it hurts them! Hurting children is bad, so hitting your kids is bad; QED.

Regardless, what does matter to me is good science journalism, especially when it involves giving behavioral advice on the basis of a misleading interpretation of the science. I used to rail against the slogan “correlation does not imply causation”, as, in fact, in some cases correlational data can prove causal claims. But I now have a better sense of why this slogan is so important to promulgate. The cases where correlation proves causation are a tiny subset of the cases where correlation is claimed to prove causation by overenthusiastic science reporters unconcerned with the dangers of misleading their audience. I can’t tell you how often I see pop-science articles making exactly this mistake to very dramatic effect (putting more books in your home will raise your child’s IQ! Climate change is making suicide rates rise!! Eating yogurt causes cancer!!!)

So this is a PSA. Watch out for science reporting that purports to demonstrate causation. Ask yourself how researchers could have established these causal claims, and whether or not it seems plausible that they did so. And read the papers themselves. You might have to work through some irritating academese, but the scientists themselves typically do a good job making disclaimers like NOTICE THAT WE HAVEN’T ACTUALLY DEMONSTRATED A CAUSAL LINK HERE. (These are then often conveniently missed by the journalists reporting on them.)

For example, in the introduction to the paper cited by the Economist the authors write that “we should be very careful about drawing any causal conclusions here, even when there are robust associations. It is very likely that there will be other factors associated with both spanking and child outcomes. If certain omitted variables are correlated with both, we may confound the two effects, that is, inappropriately attribute an effect to spanking. For example, parents who spank their children may be weaker parents overall, and spanking is simply one way in which this difference in parenting quality manifests itself.”

That is a very explicit disclaimer to miss before going on to write a headline that gives parenting advice relying on a causal interpretation of the data!

Some simple probability puzzles

(Most of these are taken from Ian Hacking’s Introduction to Probability and Inductive Logic.)

  1. About as many boys as girls are born in hospitals. Many babies are born every week at City General. In Cornwall, a country town, there is a small hospital where only a few babies are born every week.

    Define a normal week as one where between 45% and 55% of babies are female. An unusual week is one where more than 55% or less than 45% are girls.

    Which of the following is true:
    (a) Unusual weeks occur equally often at City General and at Cornwall.
    (b) Unusual weeks are more common at City General than at Cornwall.
    (c) Unusual weeks are more common at Cornwall than at City General.

  2. Pia is 31 years old, single, outspoken, and smart. She was a philosophy major. When a student, she was an ardent supporter of Native American rights, and she picketed a department store that had no facilities for nursing mothers.

    Which of the following statements are most probable? Which are least probable?

    (a) Pia is an active feminist.
    (b) Pia is a bank teller.
    (c) Pia works in a small bookstore.
    (d) Pia is a bank teller and an active feminist.
    (e) Pia is a bank teller and an active feminist who takes yoga classes.
    (f) Pia works in a small bookstore and is an active feminist who takes yoga classes.

  3. You have been called to jury duty in a town with only green and blue taxis. Green taxis dominate the market, with 85% of the taxis on the road.

    On a misty winter night a taxi sideswiped another car and drove off. A witness said it was a blue cab. This witness is tested under similar conditions, and gets the color right 80% of the time.

    You conclude about the sideswiping taxi:
    (a) The probability that it is blue is 80%.
    (b) It is probably blue, but with a lower probability than 80%.
    (c) It is equally likely to be blue or green.
    (d) It is more likely than not to be green.

  4. You are a physician. You think that it’s quite likely that a patient of yours has strep throat. You take five swabs from the throat of this patient and send them to a lab for testing.

    If the patient has strep throat, the lab results are right 70% of the time. If not, then the lab is right 90% of the time.

    The test results come back: YES, NO, NO, YES, NO

    You conclude:
    (a) The results are worthless.
    (b) It is likely that the patient does not have strep throat.
    (c) It is slightly more likely than not that the patient does have strep throat.
    (d) It is very much more likely than not that the patient does have strep throat.

  5. In a certain country, every family wants a boy. Each family keeps having babies till a boy is born. What is the expected ratio of boys to girls in the country?
  6.  Answer the following series of questions:

    If you flip a fair coin twice, do you have the same chance of getting HH as you have of getting HT?

    If you flip the coin repeatedly until you get HH, does this result in the same average number of flips as if you repeat until you get HT?

    If you flip it repeatedly until either HH emerges or HT emerges, is either outcome equally likely?

    You play a game with a friend in which you each choose a sequence of three possible flips (e.g. HHT and TTH). You then flip the coin repeatedly until one of the two patterns emerges, and whoever’s pattern appears first wins the game. You get to see your friend’s choice of pattern before deciding yours. Are you ever able to bias the game in your favor?

    Are you always able to bias the game in your favor?
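
Before turning to the solutions, here is a minimal simulation sketch (my own, not part of Hacking’s puzzles) that you can use to check your answers to question 6 empirically:

```python
import random

random.seed(0)

def flips_until(pattern, trials=20_000):
    """Average number of fair-coin flips until `pattern` (e.g. 'HT') first appears."""
    total = 0
    for _ in range(trials):
        seq = ""
        while not seq.endswith(pattern):
            seq += random.choice("HT")
        total += len(seq)
    return total / trials

def first_to_appear(mine, theirs, trials=20_000):
    """Fraction of games in which `mine` shows up before `theirs` does."""
    wins = 0
    for _ in range(trials):
        seq = ""
        while True:
            seq += random.choice("HT")
            if seq.endswith(mine):
                wins += 1
                break
            if seq.endswith(theirs):
                break
    return wins / trials

print("mean flips until HH:", flips_until("HH"))                     # about 6
print("mean flips until HT:", flips_until("HT"))                     # about 4
print("P(HH appears before HT):", first_to_appear("HH", "HT"))       # about 0.5
print("P(THH appears before HHT):", first_to_appear("THH", "HHT"))   # about 0.75
```

The last line illustrates the answer to the game question: going second, you can always pick a pattern that beats your friend’s (here THH beats HHT roughly three times out of four).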

 

Solutions (and lessons)

  1. The correct answer is (c): unusual weeks occur more often at Cornwall than at City General. Even though the chance of a boy is the same at Cornwall as it is at City General, the percentage of boys fluctuates more from week to week at the smaller hospital (for N births a week, the typical deviation of the percentage goes like 1/sqrt(N)). Indeed, if you think about an extreme case where Cornwall has only one birth a week, then every week will be an unusual week (100% boys or 0% boys). (A quick numerical check of this solution, and of solutions 3 and 4, appears at the end of the post.)
  2. There is room to debate the exact answer but whatever it is, it has to obey some constraints. Namely, the most probable statement cannot be (d), (e), or (f), and the least probable statement cannot be (a), (b), or (c). Why? Because of the conjunction rule of probability: each of (d), (e), and (f) is a conjunction that includes one of (a), (b), or (c), and a conjunction can never be more probable than any of its conjuncts: P(A & B) ≤ P(A).

    It turns out that most people violate this constraint. Many people answer that (f) is the most probable description, and (b) is the least probable. This result is commonly interpreted to reveal a cognitive bias known as the representativeness heuristic – essentially, that our judgements of likelihood are done by considering which descriptions most closely resemble the known facts. In this case, (f) sounds the most representative of Pia as she was described, even though as a conjunction it can be no more probable than (c) on its own.

    Another factor to consider is that prior to considering the evidence, your odds on a given person being a bank teller as opposed to working in a small bookstore should be heavily weighted towards her being a bank teller. There are just far more bank tellers than small bookstore workers (maybe a factor of around 20:1). This does not necessarily mean that (b) is more likely than (c), but it does mean that the evidence must discriminate strongly enough against her being a bank teller so as to overcome the prior odds.

    This leads us to another lesson: don’t neglect the base rate. It is easy to ignore the prior odds when it feels like we have strong evidence (Pia’s age, her personality, her major, etc.). But the base rates for small bookstore workers and bank tellers are very relevant to the final judgement.

  3. The correct answer is (d) – it is more likely than not that the sideswiper was green. This is a basic case of base rate neglect – many people would see that the witness is right 80% of the time and conclude that the witness’s testimony has an 80% chance of being correct. But this is ignoring the prior odds on the content of the witness’s testimony.

    In this case, there were prior odds of 17:3 (85%:15%) in favor of the taxi being green. The evidence had a strength of 1:4 (20%:80%), resulting in the final odds being 17:12 in favor of the taxi being green. Translating from odds to probabilities, we get a roughly 59% chance of the taxi having been green.

    We could have concluded (d) much more quickly by just comparing the prior probability (85% for green) with the reliability of the evidence (80% for blue), and noticing that the evidence is not strong enough to make blue more likely than green: with prior g for green and witness accuracy a, the posterior favors green exactly when g(1 − a) > (1 − g)a, which simplifies to g > a, and here 85% > 80%. Being able to quickly translate between statistics and conclusions is a valuable skill to foster. (The odds arithmetic for this solution and the next is sketched in code after these solutions.)

  4. The right answer is (d). We calculate this just like we did the last time:

    The results were YES, NO, NO, YES, NO.

    Each YES provides evidence with strength 7:1 (70%/10%) in favor of strep, and each NO provides evidence with strength 1:3 (30%/90%).

    So our strength of evidence is 7:1 ⋅ 1:3 ⋅ 1:3 ⋅ 7:1 ⋅ 1:3 = 49:27, or roughly 1.81:1 in favor of strep. This might be a little surprising… we got more NOs than YESs and the NO was correct 90% of the time for people without strep, compared to the YES being correct only 70% of the time in people with strep.

    Since the evidence is in favor of strep, and we started out already thinking that strep was quite likely, in the end we should be very convinced that they have strep. If our prior on the patient having strep was 75% (3:1 odds), then our probability after getting evidence will be 84% (49:9 odds).

    Again, surprising! A patient who sees these results and hears the doctor declare that they strengthen the case for strep might object that this seems irrational. But the doctor would be right!

  5. Supposing as before that the chance of any given birth being a boy is equal to the chance of it being a girl, we end up concluding…

    The expected ratio of boys to girls in the country is 1:1! That is, this strategy doesn’t allow the families to “cheat” – it has no impact at all on the ratio. Why? I’ll leave this one for you to figure out (a simulation sketch appears after these solutions).


    This is important because it applies to the problem of p-hacking. Imagine that researchers repeatedly run studies until they get the results they like, and publish only those. Now suppose instead that every researcher in the world is required to publish every study they run. Can they still bias the published evidence in favor of the results they like by choosing when to stop? No! Even though they always stop as soon as they get the result they want, the aggregate of their published studies is unbiased evidence. They can’t game the system!

  6.  Answers, in order:

    If you flip a fair coin twice, do you have the same chance of getting HH as you have of getting HT? (Yes)

    If you flip it repeatedly until you get HH, does this result in the same average number of flips as if you repeat until you get HT? (No)

    If you flip it repeatedly until either HH emerges or HT emerges, are the two outcomes equally likely? (Yes)

    You play a game with a friend in which you each choose a sequence of three coin flips (e.g. HHT and TTH). You then flip a coin repeatedly until one of the two patterns emerges, and whoever’s pattern appears first wins the game. You get to see your friend’s choice of pattern before deciding yours. Are you ever able to bias the game in your favor? (Yes)

    Are you always able to bias the game in your favor? (Yes!)

    Here’s a wiki page with a good explanation of this: LINK. A table from that page illustrating a winning strategy for any choice your friend makes (a simulation sketch also follows these solutions):

    1st player’s choice    2nd player’s choice    Odds in favour of 2nd player
    HHH                    THH                    7 to 1
    HHT                    THH                    3 to 1
    HTH                    HHT                    2 to 1
    HTT                    HHT                    2 to 1
    THH                    TTH                    2 to 1
    THT                    TTH                    2 to 1
    TTH                    HTT                    3 to 1
    TTT                    HTT                    7 to 1
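
To make the 1/sqrt(N) point in solution 1 concrete, here is a minimal simulation sketch. The specific numbers (15 and 45 births per week, and “unusual” meaning more than 60% boys) are illustrative assumptions rather than figures from the original question, and the function name is mine.

```python
import random

rng = random.Random(0)

def fraction_unusual_weeks(births_per_week, num_weeks=100_000, threshold=0.60):
    """Fraction of simulated weeks in which more than `threshold` of the babies are boys."""
    unusual = 0
    for _ in range(num_weeks):
        boys = sum(rng.random() < 0.5 for _ in range(births_per_week))
        if boys / births_per_week > threshold:
            unusual += 1
    return unusual / num_weeks

# The smaller hospital has noticeably more unusual weeks than the larger one.
print(fraction_unusual_weeks(births_per_week=15))
print(fraction_unusual_weeks(births_per_week=45))
```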
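
Here is a minimal sketch of the odds arithmetic used in solutions 3 and 4. The priors and likelihood ratios are exactly the ones quoted above; the helper names (update_odds, odds_to_probability) are just illustrative.

```python
from fractions import Fraction

def update_odds(prior_odds, likelihood_ratios):
    """Multiply prior odds by a sequence of likelihood ratios."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds

def odds_to_probability(odds):
    """Convert odds in favor of a hypothesis into a probability."""
    return odds / (1 + odds)

# Solution 3: prior odds of green are 85:15; "says blue" is 20:80 evidence for green.
green = update_odds(Fraction(85, 15), [Fraction(20, 80)])
print(green, float(odds_to_probability(green)))        # 17/12, ~0.586

# Solution 4: prior odds of strep are 3:1; each YES is 7:1 evidence, each NO is 1:3.
results = ["YES", "NO", "NO", "YES", "NO"]
strep = update_odds(Fraction(3, 1),
                    [Fraction(7, 1) if r == "YES" else Fraction(1, 3) for r in results])
print(strep, float(odds_to_probability(strep)))        # 49/9, ~0.845
```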
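
And a minimal Monte Carlo sketch of solution 5, assuming each birth is independently a boy or a girl with probability 1/2 and that every family stops at its first boy:

```python
import random

rng = random.Random(0)

def simulate_country(num_families):
    """Each family has children until its first boy, then stops."""
    boys = girls = 0
    for _ in range(num_families):
        while rng.random() < 0.5:   # girl: keep having children
            girls += 1
        boys += 1                   # boy: stop
    return boys, girls

boys, girls = simulate_country(1_000_000)
print(boys / girls)   # hovers around 1.0
```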
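
Finally, a simulation sketch of the coin-pattern claims in solution 6, assuming a fair coin: the average waiting times for HH and HT differ, and in a head-to-head race the second player’s pattern from the table above wins far more often than not.

```python
import random

rng = random.Random(0)

def flips_until(pattern):
    """Flip a fair coin until `pattern` appears; return the number of flips used."""
    seq = ""
    while not seq.endswith(pattern):
        seq += rng.choice("HT")
    return len(seq)

def race(pattern_a, pattern_b):
    """Flip a fair coin until one of the two patterns appears; return the winner."""
    seq = ""
    while True:
        seq += rng.choice("HT")
        if seq.endswith(pattern_a):
            return pattern_a
        if seq.endswith(pattern_b):
            return pattern_b

N = 100_000
print(sum(flips_until("HH") for _ in range(N)) / N)             # ~6 flips on average
print(sum(flips_until("HT") for _ in range(N)) / N)             # ~4 flips on average
print(sum(race("THH", "HHT") == "THH" for _ in range(N)) / N)   # ~0.75, matching the HHT-vs-THH row
```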

Explanation is asymmetric

We all regularly reason in terms of the concept of explanation, but we rarely think hard about what exactly we mean by it. What constitutes a scientific explanation? In this post, I’ll point out some features of explanation that may not be immediately obvious.

Let’s start with one account of explanation that should seem intuitively plausible. This is the idea that to explain X to a person is to give that person some information I that would have allowed them to predict X.

For instance, suppose that Janae wants an explanation of why Ari is not pregnant. Once we tell Janae that Ari is a biological male, she is satisfied and feels that the lack of pregnancy has been explained. Why? Well, because had Janae known that Ari was a male, she would have been able to predict that Ari would not get pregnant.

Let’s call this the “predictive theory of explanation.” On this view, explanation and prediction go hand-in-hand. When somebody learns a fact that explains a phenomenon, they have also learned a fact that allows them to predict that phenomenon.

 To spell this out very explicitly, suppose that Janae’s state of knowledge at some initial time is expressed by

K1 = “Males cannot get pregnant.”

At this point, Janae clearly cannot conclude anything about whether Ari is pregnant. But now Janae learns a new piece of information, and her state of knowledge is updated to

K2 = “Ari is a male & males cannot get pregnant.”

Now Janae is warranted in adding the deduction

K’ = “Ari cannot get pregnant”

This suggests that the added information explains Ari’s non-pregnancy for the same reason that it allows the deduction of Ari’s non-pregnancy.

Now, let’s consider a problem with this view: the problem of relevance.

Suppose a man named John is not pregnant, and somebody explains this with the following two premises:

  1. People who take birth control pills almost certainly don’t get pregnant.
  2. John takes birth control pills regularly.

Now, these two premises do successfully predict that John will not get pregnant. But the fact that John takes birth control pills regularly gives no explanation at all of his lack of pregnancy. Naively applying the predictive theory of explanation gives the wrong answer here.

You might also have been suspicious of the predictive theory of explanation on the grounds that it relies on purely logical deduction and a binary conception of knowledge, leaving no room for the uncertainty inherent in scientific reasoning. We can fix this by saying something like the following:

What it is to explain X to somebody that knows K is to give them information I such that

(1) P(X | K) is small, and
(2) P(X | K, I) is large.

“Small” and “large” here are intentionally vague; it wouldn’t make sense to draw a precise line in the probabilities.

The idea here is that explanations are good insofar as they make their explanandum sufficiently likely (condition 2), where it would have been insufficiently likely without them (condition 1).

We can think of this as a correlational account of explanation. It attempts to root explanations in sufficiently strong correlations.

First of all, we can notice that this doesn’t suffer from the problem of irrelevant information: probabilistic irrelevance shows up as conditional independence between variables. In the birth control example, P(John is not pregnant | K) is already large once K includes the fact that John is a man, so the information about the pills fails condition (1) and adds nothing. So maybe this is a good definition of scientific explanation?

Unfortunately, this “correlational account of explanation” has its own problems.

Take the following example.

[Figure: a flagpole of height H casting a shadow of length L, with the sun at angle of elevation θ.]

This flagpole casts a shadow of length L because of the angle of elevation of the sun and the height of the flagpole (H). In other words, we can explain the length of the shadow with the following pieces of information:

I1 =  “The angle of elevation of the sun is θ”
I2 = “The height of the flagpole is H”
I3 = Details involving the rectilinear propagation of light and the formation of shadows

Both the predictive and the correlational theories of explanation work fine here. If somebody wanted an explanation for why the shadow’s length is L, then telling them I1, I2, and I3 would suffice. Why? Because I1, I2, and I3 jointly allow us to predict the shadow’s length! Easy.

X = “The length of the shadow is L.”
(I1 & I2 & I3) ⇒ X
So I1 & I2 & I3 explain X.

And similarly, P(X | I1 & I2 & I3) is large, and P(X) is small. So on the correlational account, the information given explains X.

But now, consider the following argument:

(I1 & I3 & X) ⇒ I2
So I1 & I3 & X explain I2.

The predictive theory of explanation applies here. If we know the length of the shadow and the angle of elevation of the sun, we can deduce the height of the flagpole. And the correlational account tells us the same thing.
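
Concretely, under the idealized geometry assumed in I3 (sun at elevation θ, flat ground), the deduction in each direction is the same bit of trigonometry, just rearranged:

tan θ = H / L,   so   L = H / tan θ   and, equally,   H = L · tan θ

As far as the logic and the probabilities are concerned, H and L stand on a completely equal footing.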

But it’s clearly wrong to say that the explanation for the height of the flagpole is the length of the shadow!

What this reveals is an asymmetry in our notion of explanation. If somebody already knows how light propagates and also knows θ, then telling them H explains L. But telling them L does not explain H!

In other words, the correlational theory of explanation fails, because correlation possesses symmetry properties that explanation does not.

This thought experiment also points the way to a more complete account of explanation. Namely, the relevant asymmetry between the length of the shadow and the height of the flagpole is one of causality. The reason why the height of the flagpole explains the shadow length, but not vice versa, is that the flagpole is the cause of the shadow and not the reverse.

In other words, what this reveals to us is that scientific explanation is fundamentally about finding causes, not merely prediction or statistical correlation. This causal theory of explanation can be summarized as follows:

An explanation of A is a description of its causes that renders it intelligible.

More explicitly, an explanation of A (relative to background knowledge K) is a set of causes of A that render A intelligible to a rational agent who knows K.