Noisy Evidence

Scope insensitivity is a cognitive bias that involves a failure to internalize the true scale of quantities. Some of the most striking and frankly depressing examples of this phenomenon involve altruistic behavior, where people care just as much about a cause regardless of how many lives are concerned. In some cases, increasing numbers of affected people result in decreasing willingness to pay.

This issue arises when quantitative metrics don’t line up with our intuitive metrics – 10 billion doesn’t feel 1000 times larger than 10 million. One solution that is sometimes possible is to change the numerical scale you are working with, so that the true scale matches the intuitive one.

This is a large part of what I think is great about the notion of evidence as noise.

Humans have scope insensitivity with respect to very large and very small probabilities. 99.99% doesn’t feel that different to us from 99.9999%. But they are extremely different. The amount of evidence required to push you from 99.99% to 99.9999% is the same as the amount of evidence that would have pushed you from 9% to 91%. There is a big difference between 99.99% and 99.9999% in terms of the state of knowledge represented.

The problem is that as the probability approaches 100%, the number looks to us like it is barely budging. This can be fixed by making our scale logarithmic. We do this by first converting our probabilities to odds ratios (so 50% becomes 1:1 odds, 75% becomes 3:1 odds, etc), and then taking a logarithm. This is exactly analogous to the decibel scale for noise, so this is called the decibel (dB) scale for evidence.

Probability of A = P(A)
Odds of A = P(A) / P(~A)
Decibel strength of A = 10 · log10(P(A) / P(~A))
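
Here is a minimal Python sketch of these conversions. The function names `prob_to_db` and `db_to_prob` are my own for illustration, and the inputs are assumed to lie strictly between 0 and 1 (the scale runs off to infinity at the endpoints).

```python
import math

def prob_to_db(p):
    """Convert a probability (strictly between 0 and 1) to decibels of evidence."""
    odds = p / (1 - p)
    return 10 * math.log10(odds)

def db_to_prob(db):
    """Convert decibels of evidence back to a probability."""
    odds = 10 ** (db / 10)
    return odds / (1 + odds)

# Probabilities that look close together can sit far apart on the dB scale.
for p in (0.5, 0.75, 0.9999, 0.999999):
    print(f"P = {p:<9} -> {prob_to_db(p):6.2f} dB")
```

A probability of 50% lands at 0 dB, and each extra factor of ten in the odds adds another 10 dB.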

Very strong evidence is very noisy, and weak evidence is silent, barely affecting our beliefs. This is also nice because Bayes’ rule becomes additive:

Posterior Odds Ratio = Likelihood Ratio · Prior Odds Ratio
O(T | E) = L(E | T) · O(T)
becomes…
OdB(T | E) = LdB(E | T) + OdB(T)

If your evidence E is equally likely whether or not the theory T is true, then L(E | T) = 1 and LdB(E | T) = 0. Thus you add 0, and end up with the same odds as you started with.
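
To check that the additive form really is the same rule, here is a small sketch that updates once by multiplying odds and once by adding decibels. The prior odds of 1:4 and the likelihood ratio of 8 are made-up numbers chosen only for illustration.

```python
import math

def to_db(ratio):
    """Express an odds ratio or likelihood ratio in decibels."""
    return 10 * math.log10(ratio)

# Illustrative values only (not from any real data).
prior_odds = 1 / 4        # 1:4 odds for the theory T
likelihood_ratio = 8      # evidence E is 8 times likelier if T is true

# Multiplicative form: O(T | E) = L(E | T) * O(T)
posterior_odds = likelihood_ratio * prior_odds

# Additive form: OdB(T | E) = LdB(E | T) + OdB(T)
posterior_db = to_db(likelihood_ratio) + to_db(prior_odds)

print(f"multiplicative: {to_db(posterior_odds):.3f} dB")
print(f"additive:       {posterior_db:.3f} dB")

# Evidence that is equally likely either way contributes nothing.
uninformative_db = to_db(1)
print(f"uninformative evidence: {uninformative_db:.1f} dB")
```

Both routes give the same posterior, which is just the identity log(ab) = log(a) + log(b) at work.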

Theories that are very high or very low in credence are very noisy, while those that are around 50% are silent.

Now what’s the difference between 99.99% and 99.9999%?

99.99% = 9999:1 = 40 dB
99.9999% = 999999:1 = 60 dB

A 20 dB difference in strength of belief is a lot easier to wrap your head around than a 0.0099% difference!

In addition, equally strong evidence always looks equally strong when expressed in dB, while it can look increasingly weak when expressed in probabilities.

For example, imagine that somebody comes up to you and claims to be able to read your mind. To test her, you ask her to tell you what number between 1 and 10 is in your head right now. If she gets this right, then this counts as 10 decibels of evidence for her psychic abilities.

L(correct | psychic) = P(correct | psychic) ÷ P(correct | not psychic)
≈ 100% / 10% = 10

10 log₁₀(10) = 10 dB

So if your previous belief in her psychic abilities was at -50 decibels (100,000:1 odds against), then it should now be at -40 decibels (10,000:1 odds against).

The same calculation would tell you that another successful test would nudge you another +10 dB, from -40 to -30. Extrapolation seems to indicate that you should be pretty much agnostic as to whether or not she is psychic after four more such successful tests, and a strong believer after only eight total tests.

Initial strength of belief = -50 dB
First test gives evidence of +10 dB
New strength of belief = -40 dB
Four more tests give total evidence of +40 dB
New strength of belief = 0 dB
Three more tests give total evidence of +30 dB
Final strength of belief = +30 dB (99.9%)
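
The tally above can be reproduced in a few lines of Python. This sketch takes every successful test at face value as +10 dB of evidence, which is exactly the assumption the tally makes; the variable names are mine.

```python
def db_to_prob(db):
    """Convert decibels of evidence back to a probability."""
    odds = 10 ** (db / 10)
    return odds / (1 + odds)

belief_db = -50.0             # initial strength of belief in "psychic"
evidence_per_test_db = 10.0   # assumed: each correct guess counts as +10 dB

history = [belief_db]
for test in range(1, 9):      # eight successful tests in a row
    belief_db += evidence_per_test_db
    history.append(belief_db)
    print(f"after test {test}: {belief_db:+.0f} dB "
          f"(P = {db_to_prob(belief_db):.5f})")
```

Because the update is additive, the belief climbs in equal 10 dB steps even though the corresponding probabilities barely move at first and then shoot up around 0 dB.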

This example actually gets things wrong in a very important way. Eight tests like the ones I described are probably not sufficient to establish psychic abilities. This is a little off topic, but it is useful to go into as a demonstration of how naive usage of Bayes’ rule can lead you off the rails.

Where we went wrong was in the very first step, in calculating the decibel strength of the evidence.

L(correct | psychic) = P(correct | psychic) ÷ P(correct | not psychic)
≈ 100% / 10% = 10

The presumption behind this calculation is that if she were psychic, then she would almost definitely be able to get the number right (≈ 100%), but if not, then she would have a random shot (10%). But “psychic” and “random” are not the only two theories! For instance, maybe the apparent psychic has actually just figured out a masterful method for reading subtle facial movements to work out the number you are thinking of, rather than actually being able to look into your mind.

The face-reading hypothesis seems unlikely, but probably less so than true mind-reading abilities. Let’s give it a decibel score of -20 (corresponding to an initial credence of about 1%). This should barely factor into our initial calculation, so let’s suppose that +10 dB is the actual strength of evidence for psychic abilities.

Now OdB(psychic) goes from -50 dB to -40 dB, and OdB(face-reading) goes from -20 dB to -10 dB. They have both gotten more likely, because they both successfully predicted the outcome! For the second test, face-reading will therefore have a bigger effect on the calculation. I’ll skip the algebra and just present the new strengths of evidence for the second test:

LdB(correct | psychic) = 7 dB
LdB(correct | face-reading) = 10 dB

Notice that the evidence is now weaker for the “psychic” hypothesis, because it has a more likely competing hypothesis. The evidence is still equally strong for face-reading, on the other hand, because its competing hypothesis (that she is psychic) is still very weak.

So we update again!

Psychic: -40 dB to -33 dB (.05%)
Face-reading: -10 dB to 0 dB (50%)

Now the face-reading hypothesis is 50% – apparently equally likely to be true and false! This will sway the strength of the evidence for the ‘psychic’ hypothesis even more on the third trial:

LdB(correct | psychic) = 3 dB
LdB(correct | face-reading) = 10 dB

Now with such a likely alternative explanation, the evidence is even weaker than previously for the psychic hypothesis. After our third trial, our beliefs will update as follows:

Psychic: -33 dB to -30 dB (.1%)
Face-reading: 0 dB to 10 dB (90%)

As you can see, the face-reading hypothesis takes off, while the psychic hypothesis ends up staying stuck around .1%.
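
For anyone who wants to check the algebra, here is a rough sketch of the three-hypothesis update in Python. It rests on assumptions that I am adding for the sake of the example: exactly one of “psychic”, “face-reading” and “random” is true, and both a psychic and a face-reader would get the number right essentially every time. The names are illustrative.

```python
import math

def prob_to_db(p):
    """Convert a probability to decibels of evidence."""
    return 10 * math.log10(p / (1 - p))

def db_to_prob(db):
    """Convert decibels of evidence back to a probability."""
    odds = 10 ** (db / 10)
    return odds / (1 + odds)

# Assumption: the three hypotheses are mutually exclusive and exhaustive.
belief = {"psychic": db_to_prob(-50), "face-reading": db_to_prob(-20)}
belief["random"] = 1 - sum(belief.values())

# Assumed P(correct guess | hypothesis).
likelihood = {"psychic": 1.0, "face-reading": 1.0, "random": 0.1}

def update(current):
    """One Bayesian update on a correct guess, over all hypotheses at once."""
    unnormalized = {h: current[h] * likelihood[h] for h in current}
    total = sum(unnormalized.values())
    return {h: v / total for h, v in unnormalized.items()}

trajectory = [belief]
for trial in range(1, 4):
    new_belief = update(belief)
    for h in ("psychic", "face-reading"):
        before, after = prob_to_db(belief[h]), prob_to_db(new_belief[h])
        print(f"trial {trial}, {h:12}: {before:+6.1f} dB -> {after:+6.1f} dB "
              f"(evidence {after - before:+.1f} dB)")
    belief = new_belief
    trajectory.append(belief)
```

The exact figures should come out within about a decibel of the rounded ones I quoted, and the shrinking evidence for “psychic” falls out automatically: the better face-reading explains the correct guesses, the less surprising each new correct guess is if she is not psychic.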

I’ll talk more about this in a post tomorrow, in which I show how the exact same simple error in our first argument is being made in fine-tuning arguments for God!
