Two principles of Bayesian reasoning

Bayes’ rule is a pretty simple piece of mathematics, and it’s extraordinary to me the amount of deep insight that can be plumbed by looking closely at it and considering its implications.

Principle 1: The more surprising an observation is, the more evidence it provides.

Evidence that you expect to observe is weak evidence, while evidence that is unexpected is strong evidence.

This follows directly from Bayes’ theorem:

\[ P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)} \]

If E is very unexpected, then P(E) is very small. This puts an upwards pressure on the posterior probability, entailing a large belief update. If E is thoroughly unsurprising, then P(E) is near 1, which means that this upward pressure is not there.
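To make this concrete, here is a minimal sketch of the effect of P(E) on the size of the update. The prior and likelihood values are hypothetical, chosen only for illustration, and the function name `posterior` is my own.

```python
def posterior(prior, likelihood, p_evidence):
    """Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / p_evidence


prior = 0.1       # P(H), hypothetical value
likelihood = 0.9  # P(E|H), hypothetical value

# Coherence requires P(E) >= P(E|H) * P(H) = 0.09, so both choices are valid.
surprising = posterior(prior, likelihood, p_evidence=0.1)    # E was unexpected
unsurprising = posterior(prior, likelihood, p_evidence=0.9)  # E was expected

print(f"surprising E:   {prior:.2f} -> {surprising:.2f}")
print(f"unsurprising E: {prior:.2f} -> {unsurprising:.2f}")
```

With these numbers the unexpected observation moves the credence from 0.1 to roughly 0.9, while the expected observation leaves it at roughly 0.1.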

A more precise way to say this is to talk about how surprising evidence is given a particular theory.

\[ \frac{P(H \mid E)}{P(H)} = \frac{P(E \mid H)}{P(E)} \]

On the left is a term that (1) is large when E provides strong evidence for H, (2) is near zero when it provides strong evidence against H, and (3) is near 1 when it provides weak evidence regarding H.

On the right is a term that (1) is large if E is very unsurprising given H, (2) is near zero when E is very surprising given H, and (3) is near 1 when E is not made much more surprising or unsurprising by H.

What we get is that (1) E provides strong evidence for H when H makes E much less surprising than it would otherwise be, (2) E provides strong evidence against H when H makes E much more surprising than it would otherwise be, and (3) E provides weak evidence regarding H when H makes E neither much more nor much less surprising.

This makes a lot of sense when you think it through. Theories that make strong and surprising predictions that turn out to be right are given more evidential weight than theories that make weak and unsurprising predictions.
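The three cases can be sketched numerically. This is only an illustration: the probabilities are hypothetical, the name `update_factor` is my own, and P(E) is computed from the law of total probability.

```python
def update_factor(prior, p_e_given_h, p_e_given_not_h):
    """Return P(E|H) / P(E), which equals P(H|E) / P(H)."""
    # Law of total probability: P(E) = P(E|H) P(H) + P(E|not H) P(not H).
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h / p_e


prior = 0.2  # P(H), hypothetical value

strong_for = update_factor(prior, 0.9, 0.05)      # H makes E much less surprising
strong_against = update_factor(prior, 0.01, 0.5)  # H makes E much more surprising
weak = update_factor(prior, 0.5, 0.5)             # H makes no difference to E

for label, factor in [("for", strong_for), ("against", strong_against), ("weak", weak)]:
    print(f"{label:8s} factor = {factor:.3f}, posterior = {factor * prior:.3f}")
```

The factor comes out well above 1 in the first case, close to zero in the second, and equal to 1 in the third, matching the three regimes described above.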

Principle 2: Conservation of expected evidence

I stole the name of this principle from Eliezer Yudkowsky, who wrote about it in an essay of the same name.

The idea here is that for any expectation you have of receiving evidence for a belief, you should have an equal and opposite expectation of receiving evidence against that belief. It cannot be the case that all possible observations support a theory. If some observations support a theory, then there must be other observations that undermine it. And the amount by which those observations are expected to undermine the theory exactly balances the expected evidential support for it.

Proof of this:

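\[ P(E)\,\bigl[P(H \mid E) - P(H)\bigr] + P(\neg E)\,\bigl[P(H \mid \neg E) - P(H)\bigr] \]
\[ = P(H, E) + P(H, \neg E) - P(H)\,\bigl[P(E) + P(\neg E)\bigr] = P(H) - P(H) = 0 \]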

The first term is the change in credence in H after observing E, weighted by the probability of observing E, and the second is the change in credence in H after observing -E (that is, not observing E), weighted by the probability of -E. Thus, the expected change in credence is exactly zero.

Putting these together, we see that a strong expectation corresponds to weak evidence, and this strong expectation of weak evidence also corresponds to a weak expectation of strong evidence!
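The balance can be checked numerically. The sketch below uses hypothetical probabilities in which E is strongly expected, and the function name `expected_change` is my own.

```python
def expected_change(prior, p_e_given_h, p_e_given_not_h):
    """Return P(E), the credence shift on E, the shift on not-E, and their expectation."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    post_e = p_e_given_h * prior / p_e                # P(H|E)
    post_not_e = (1 - p_e_given_h) * prior / (1 - p_e)  # P(H|not E)
    shift_e = post_e - prior
    shift_not_e = post_not_e - prior
    total = p_e * shift_e + (1 - p_e) * shift_not_e
    return p_e, shift_e, shift_not_e, total


# Hypothetical numbers: E is very likely whether or not H is true.
p_e, shift_e, shift_not_e, total = expected_change(
    prior=0.5, p_e_given_h=0.99, p_e_given_not_h=0.9
)

print(f"P(E) = {p_e:.3f}, shift on E = {shift_e:+.3f}")
print(f"P(not E) = {1 - p_e:.3f}, shift on not-E = {shift_not_e:+.3f}")
print(f"expected change = {total:.2e}")
```

Here E is strongly expected and shifts the credence only slightly upward, while not-E is rarely expected and shifts it sharply downward. Weighted by their probabilities, the two shifts cancel up to floating-point rounding.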
