Bayes’ rule is a pretty simple piece of mathematics, and it’s extraordinary to me how much deep insight you can get by looking closely at it and considering its implications.

## Principle 1: The surprisingness of an observation is proportional to the amount of evidence it provides.

Evidence that you expect to observe is weak evidence, while evidence that is unexpected is strong evidence.

This follows directly from Bayes’ theorem:

$$P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}$$

If E is very unexpected, then P(E) is very small. Since P(E) sits in the denominator, this puts upward pressure on the posterior probability, entailing a large belief update. If E is thoroughly unsurprising, then P(E) is near 1, and this upward pressure is absent.

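A more precise way to say this is to talk about how surprising evidence is *given a particular theory*. Dividing both sides of Bayes’ theorem by P(H) gives:

$$\frac{P(H \mid E)}{P(H)} = \frac{P(E \mid H)}{P(E)}$$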

On the left is a term that (1) is large when E provides strong evidence for H, (2) is near zero when it provides strong evidence against H, and (3) is near 1 when it provides weak evidence regarding H.

On the right is a term that (1) is large if E is very unsurprising given H, (2) is near zero when E is very surprising given H, and (3) is near 1 when E is not made much more surprising or unsurprising by H.

What we get is that (1) E provides strong evidence for H when E is very unsurprising given H, (2) E provides strong evidence against H when it is very surprising given H, and (3) E provides weak evidence regarding H when it is not much more surprising or unsurprising given H.
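To make the three cases concrete, here is a minimal Python sketch. The function name `update_factor` and all the probabilities are made up for illustration, and it assumes a single binary hypothesis, so that P(E) can be computed from P(E|H), P(E|not H), and the prior.

```python
def update_factor(prior, p_e_given_h, p_e_given_not_h):
    """Return P(E|H) / P(E), which equals P(H|E) / P(H)."""
    # Law of total probability over H and not-H.
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h / p_e


prior = 0.1  # illustrative prior credence in H

# (P(E|H), P(E|not H)) for each of the three cases; values are hypothetical.
cases = {
    "E unsurprising given H, surprising otherwise": (0.9, 0.05),
    "E surprising given H, unsurprising otherwise": (0.01, 0.9),
    "E equally likely either way": (0.5, 0.5),
}

for label, (p_e_h, p_e_not_h) in cases.items():
    factor = update_factor(prior, p_e_h, p_e_not_h)
    posterior = factor * prior
    print(f"{label}: factor = {factor:.3f}, posterior = {posterior:.4f}")
```

With these numbers the factor is well above 1 in the first case, close to zero in the second, and 1 in the third, matching cases (1), (2), and (3) above.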

This makes a lot of sense when you think it through. Theories that make strong and surprising predictions that turn out to be right are given more evidential weight than theories that make weak and unsurprising predictions.

## Principle 2: Conservation of expected evidence

I stole the name of this principle from Eliezer Yudkowsky, who wrote about it in an essay of the same name.

The idea here is that for any expectation you have of receiving evidence for a belief, you should have an equal and opposite expectation of receiving evidence against it. It cannot be the case that all possible observations support a theory. If some observations support a theory, then there must be other observations that undermine it. And the precise *amount* by which these observations undermine the theory, weighted by how likely they are, exactly balances the expected evidential support for the theory.

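Proof of this: by the law of total probability,

$$P(H) = P(H \mid E)\, P(E) + P(H \mid \neg E)\, P(\neg E)$$

Since P(E) + P(¬E) = 1, we can subtract P(H) from both sides and regroup:

$$\big(P(H \mid E) - P(H)\big)\, P(E) + \big(P(H \mid \neg E) - P(H)\big)\, P(\neg E) = 0$$

The first term is the change in credence in H after observing E, weighted by the probability of observing E, and the second is the change in credence in H after observing ¬E, weighted by the probability of observing ¬E. Thus the expected change in credence is exactly zero.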
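Here is a quick numerical check of this, as a sketch rather than anything definitive. The helper names `posteriors` and `expected_shift` and the specific probabilities are hypothetical, and the setup assumes one binary hypothesis H and one binary observation (E or not E).

```python
def posteriors(prior, p_e_given_h, p_e_given_not_h):
    """Return (P(E), P(H|E), P(H|not E)) for a binary H and binary E."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    p_h_given_e = p_e_given_h * prior / p_e
    p_h_given_not_e = (1 - p_e_given_h) * prior / (1 - p_e)
    return p_e, p_h_given_e, p_h_given_not_e


def expected_shift(prior, p_e_given_h, p_e_given_not_h):
    """Probability-weighted change in credence, summed over E and not E."""
    p_e, up, down = posteriors(prior, p_e_given_h, p_e_given_not_h)
    return (up - prior) * p_e + (down - prior) * (1 - p_e)


prior = 0.5  # illustrative values only
p_e, p_h_e, p_h_not_e = posteriors(prior, 0.99, 0.90)

print(f"P(E) = {p_e:.3f}")
print(f"shift if E:     {p_h_e - prior:+.4f}")
print(f"shift if not E: {p_h_not_e - prior:+.4f}")
print(f"expected shift: {expected_shift(prior, 0.99, 0.90):.2e}")
```

With these numbers, E is expected about 94.5% of the time and nudges the credence from 0.5 to roughly 0.52, while not E is rare and drops it to roughly 0.09. The two shifts, weighted by their probabilities, cancel up to floating-point error.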

Putting these together, we see that a strong expectation corresponds to weak evidence, and this strong expectation of weak evidence also corresponds to a *weak* expectation of *strong* evidence!