One interesting fact about Bayesianism is that you should never be able to predict beforehand the direction in which a piece of evidence will, on average, change your credences.

For example, sometimes I think that the more I study economics, the more I will become impressed by the power of free markets, and that therefore I will become more of a capitalist. Another example is that I often think that over time I will become more and more moderate in many ways. (This is partially just because of induction on the direction of my change: it seems like when I’ve held extreme beliefs in the past, these beliefs have tended to become more moderate over time.)

Now, in these two cases I might be right as a matter of psychology. It might be the case that if I study economics, I’ll be more likely to end up a supporter of capitalism than if I don’t. It might be the case that as time passes, I’ll become increasingly moderate until I die out of boredom with my infinitely bland beliefs.

But even if I’m right about these things as a matter of psychology, I’m wrong about them as a matter of probability theory. It should *not* be the case that I can predict beforehand how my beliefs will change, for if it were, then why wait to change them? If I am really convinced that studying economics will make me a hardcore capitalist, then I should update in favor of capitalism in advance.

I think that this is a pretty common human mistake to make. Part of an explanation for this might be based on the argumentative theory of reason: the idea that human rationality evolved as a method of winning arguments, not necessarily of being well-calibrated in our pursuit of truth. If this is right, then it would make sense that we would only want to hold strong beliefs when we can confidently display strong evidence for them. It’s not that easy to convincingly argue for your position when part of your justification for it is something like “Well I don’t have the evidence yet, but I’ve got a pretty strong hunch that it’ll come out in this direction.”

Another factor might be the way that personality changes influence belief changes. It might be that when we say “I will become more moderate in beliefs as I age”, we’re really saying something like “My personality will become less contrarian as I age.” There’s still some type of mistake here, but it has to do with an overly strong influence of personality on beliefs, not the epistemic principle in question here.

Suppose that you have in front of you a coin with some unknown bias X. Your beliefs about the bias of the coin are captured in a probability distribution over possible values of X, P(X). Now you flip the coin and observe whether it lands heads or tails. If it lands H, your new state of belief is P(X | H). If it lands T, your new state of belief is P(X | T). So before you observe the coin flip, your new *expected* distribution over the possible biases is just the weighted sum of these two:

Posterior is P(X | H) with probability P(H)

Posterior is P(X | T) with probability P(T)

Thus, expected posterior is P(X | H) P(H) + P(X | T) P(T)

= P(X, H) + P(X, T)

= P(X)

Our expected posterior distribution is exactly the same as our prior distribution! In other words, you can’t anticipate any particular change in your prior distribution. This makes some intuitive sense… if you knew beforehand that your prior would change in a particular direction, then you should have *already changed* in that direction!
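As a rough numerical check of the identity above, here is a minimal sketch that discretizes the bias X on a grid. The grid size and the shape of the prior are arbitrary choices made for illustration, not anything from the argument itself; any normalized prior should give the same result up to floating-point rounding.

```python
# Grid over possible biases X. The prior shape below is an arbitrary
# illustrative assumption; any normalized prior would do.
N = 1000
xs = [i / N for i in range(N + 1)]
weights = [x**2 * (1 - x) for x in xs]
total = sum(weights)
prior = [w / total for w in weights]

# P(H) = sum over x of P(H | x) P(x), with P(H | x) = x for a coin of bias x.
p_heads = sum(x * p for x, p in zip(xs, prior))
p_tails = 1.0 - p_heads

# Bayes' rule for each possible outcome of the flip.
post_heads = [x * p / p_heads for x, p in zip(xs, prior)]
post_tails = [(1 - x) * p / p_tails for x, p in zip(xs, prior)]

# Weight each posterior by the probability of observing that outcome.
expected_posterior = [
    p_heads * ph + p_tails * pt for ph, pt in zip(post_heads, post_tails)
]

# Largest pointwise difference between expected posterior and prior.
max_gap = max(abs(e - p) for e, p in zip(expected_posterior, prior))
print(max_gap)
```

The printed gap should be at the level of floating-point noise, since each grid point contributes x P(x) + (1 - x) P(x) = P(x).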

In general: Take any hypothesis H and some piece of relevant evidence that will either turn out E or -E. Suppose you have some prior credence P(H) in this hypothesis before observing either E or -E. Your expected final credence is just:

P(H | E) P(E) + P(H | -E) P(-E)

Which of course is just another way of writing the prior credence P(H).

P(H) = P(H | E) P(E) + P(H | -E) P(-E)

Eliezer Yudkowsky has named this idea “conservation of expected evidence”: for any possible expectation of evidence you might receive, you should have an equal and opposite expectation of evidence in the other direction. It should never be the case that a hypothesis is confirmed no matter what evidence you receive. If E counts as evidence for H, then -E should count against it. And if you have a strong expectation of weak evidence E, then you should have a weak expectation of strong evidence -E. (In the coin example, if you strongly expect to see heads, then heads is weak evidence, and correspondingly you should have a weak expectation of seeing tails, which would be strong evidence.)
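To make the "strong expectation of weak evidence" trade-off concrete, here is a small sketch using exact fractions. The helper name `posteriors` and the particular numbers (a 1/2 prior, likelihoods of 95% and 85%) are made up for illustration.

```python
from fractions import Fraction


def posteriors(prior_h, p_e_given_h, p_e_given_not_h):
    """Return P(E), P(H | E), P(H | -E) for a binary hypothesis and binary evidence."""
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    post_e = p_e_given_h * prior_h / p_e
    post_not_e = (1 - p_e_given_h) * prior_h / (1 - p_e)
    return p_e, post_e, post_not_e


# Hypothetical numbers: E is very likely either way, so it is weak evidence.
prior_h = Fraction(1, 2)
p_e, post_e, post_not_e = posteriors(prior_h, Fraction(95, 100), Fraction(85, 100))

shift_up = post_e - prior_h        # small shift, expected with high probability
shift_down = post_not_e - prior_h  # large shift, expected with low probability
expected_shift = p_e * shift_up + (1 - p_e) * shift_down

print(p_e, shift_up, shift_down, expected_shift)  # 9/10 1/36 -1/4 0
```

With these numbers you expect E with probability 9/10 and it would move you up by only 1/36, while -E has probability 1/10 and would move you down by 1/4. The two probability-weighted shifts are +1/40 and -1/40, which cancel exactly.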

This is a pretty powerful idea, and I find myself applying it as a thinking tool pretty regularly. (And Yudkowsky’s writings on LessWrong are chock-full of this type of probability-theoretic wisdom; I highly recommend them.)

Today I was thinking about how we could determine, if not the average *change* in beliefs, the average *amount* that beliefs change. That is, while you can’t say beforehand that you will become more confident that a hypothesis is true, you can still say something about how much you expect your confidence to change *as an absolute value*.

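Let’s stick with our simple coin-tossing example for illustration. Our prior distribution over possible biases is P(X). This distribution has some characteristic mean µ and standard deviation σ. Since P(H | X) = X, we have P(H) = µ and P(T) = 1 - µ. We can also describe the mean of each possible posterior distribution:

µ_H = E[X | H] = E[X²] / µ = µ + σ²/µ

µ_T = E[X | T] = E[X(1 - X)] / (1 - µ) = µ - σ²/(1 - µ)

Now we can look at how far away each of these updated means is from the original mean:

µ_H - µ = σ²/µ

µ_T - µ = -σ²/(1 - µ)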

We want to average these differences, weighted by how likely we think we are to observe them. But if we did this, we’d just get zero. Why? Conservation of expected evidence rearing its head again! The average of the differences in means is the average amount that you think that your expectation of heads will move, and this cannot be nonzero.

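What we want is a quantity that captures the absolute distance between the new means and the original mean. The standard way of doing this is to take the square root of the probability-weighted average of the squared differences:

∆µ = √( P(H) (E[X | H] - µ)² + P(T) (E[X | T] - µ)² )

For the coin, P(H) = µ and P(T) = 1 - µ, and the two shifts in the mean work out to σ²/µ and -σ²/(1 - µ). This gives us:

∆µ = √( σ⁴/µ + σ⁴/(1 - µ) ) = σ² / √( µ (1 - µ) )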

This gives us a measure of how strong of a belief update we should expect to receive. I haven’t heard very much about this quantity (the square root of the weighted sum of the squares of the changes in means for all possible evidential updates), but it seems pretty important.
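Here is a minimal sketch of this quantity for a discrete prior over the coin’s bias. It computes the root-mean-square shift directly from the two posteriors and compares it with σ² / √(µ(1 - µ)), which is what the definition works out to for a single coin flip. The function name `update_magnitude` and the three-point prior are hypothetical choices for illustration.

```python
import math


def update_magnitude(xs, prior):
    """Return (rms_shift, closed_form) for one coin flip under a discrete prior.

    xs    -- possible values of the bias X
    prior -- probabilities P(X = x), summing to 1
    """
    mu = sum(x * p for x, p in zip(xs, prior))
    var = sum((x - mu) ** 2 * p for x, p in zip(xs, prior))
    p_heads, p_tails = mu, 1.0 - mu

    # Means of the posteriors P(X | H) and P(X | T), via Bayes' rule.
    mean_heads = sum(x * (x * p / p_heads) for x, p in zip(xs, prior))
    mean_tails = sum(x * ((1 - x) * p / p_tails) for x, p in zip(xs, prior))

    # Probability-weighted root-mean-square shift of the mean.
    rms_shift = math.sqrt(
        p_heads * (mean_heads - mu) ** 2 + p_tails * (mean_tails - mu) ** 2
    )
    closed_form = var / math.sqrt(mu * (1.0 - mu))
    return rms_shift, closed_form


# Hypothetical three-point prior over the bias.
xs = [0.2, 0.5, 0.9]
prior = [0.3, 0.4, 0.3]
rms, closed_form = update_magnitude(xs, prior)
print(rms, closed_form)
```

The two printed numbers should agree up to rounding. Unlike the signed average of the shifts, this quantity is positive whenever the prior has nonzero variance.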

Notice also that this scales with the variance on our prior distribution. This makes a lot of sense, because a small variance on your prior implies a high degree of confidence, which entails a weak belief update. Similarly, a large variance on your prior implies a weakly held belief, and thus a strong belief update.

Let’s see what this measure gives for some simple distributions. First, the uniform distribution, which has µ = 1/2 and σ² = 1/12:

∆µ = σ² / √( µ (1 - µ) ) = (1/12) / √(1/4) = 1/6

In this case, the predicted change in mean is exactly right! If we get H, our new mean becomes 2/3 (a change of +1/6) and if we get T, our new mean becomes 1/3 (a change of -1/6). Either way, the mean value of our distribution shifts by 1/6, just as our calculation predicted!

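Let’s imagine you start maximally uncertain about the bias (the uniform prior) and then observe a H. Your new distribution is P(X | H) = 2X, which has µ = 2/3 and σ² = 1/18. Now with this new distribution, what is your expected magnitude of belief change?

∆µ = σ² / √( µ (1 - µ) ) = (1/18) / √(2/9) = 1/(6√2) ≈ 0.118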

Notice that this magnitude of belief change is smaller than for the original distribution. Once again, this makes a lot of sense: after getting more information, your beliefs become more ossified and less mutable.

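In general, we can calculate ∆µ for the distribution that results from observing n heads and m tails, starting with a flat prior. That distribution is Beta(n + 1, m + 1), with µ = (n + 1)/(n + m + 2) and σ² = (n + 1)(m + 1) / ((n + m + 2)² (n + m + 3)), so:

∆µ = √( (n + 1)(m + 1) ) / ( (n + m + 2)(n + m + 3) )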

Asymptotically as n and m go to infinity (as you flip the coin arbitrarily many times), this relation becomes

∆µ ≈ √(nm) / (n + m)²

Which clearly goes to zero, and pretty quickly as well. This amounts to the statement that as you get lots and lots of information, each subsequent piece of information matters comparatively less.
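To see the decay numerically, here is a small sketch. It assumes the standard result that a flat prior updated on n heads and m tails gives a Beta(n + 1, m + 1) distribution, and it uses the textbook mean and variance of a Beta distribution. The function name `beta_update_magnitude` is made up for illustration.

```python
import math


def beta_update_magnitude(n, m):
    """Expected size of the next update after n heads and m tails from a flat prior.

    Assumes the current distribution over the bias is Beta(n + 1, m + 1).
    """
    a, b = n + 1, m + 1
    mu = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return var / math.sqrt(mu * (1 - mu))


# Equal numbers of heads and tails, with more and more flips.
for flips in (0, 2, 20, 200, 2000):
    n = m = flips // 2
    print(flips, beta_update_magnitude(n, m))
```

With no data this gives 1/6, matching the uniform case, and after a single H it gives the smaller value 1/(6√2). For balanced sequences the value shrinks roughly in proportion to 1/(n + m).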