Patterns of inductive inference

I’m currently reading through Judea Pearl’s wonderful book Probabilistic Reasoning in Intelligent Systems. It’s chock-full of valuable insights into the subtle patterns involved in inductive reasoning.

Here are some of the patterns of reasoning described in Chapter 1, roughly ordered from most intuitive to least. Any good system of inductive inference should be able to accommodate all of the following.

Abduction:

If A implies B, then finding that B is true makes A more likely.

Example: If fire implies smoke, smoke suggests fire.
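
To see why this follows from Bayes’ rule, here’s a minimal numeric check (the probabilities are made up for illustration):

```python
# Prior and likelihoods are invented for illustration.
p_fire = 0.01             # P(fire)
p_smoke_given_fire = 1.0  # "fire implies smoke"
p_smoke_given_no_fire = 0.05

# Law of total probability: P(smoke)
p_smoke = (p_smoke_given_fire * p_fire
           + p_smoke_given_no_fire * (1 - p_fire))

# Bayes' rule: P(fire | smoke)
p_fire_given_smoke = p_smoke_given_fire * p_fire / p_smoke

print(p_fire)              # 0.01
print(p_fire_given_smoke)  # ~0.168: seeing smoke raised the probability of fire
```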

Asymmetry of inference:

There are two types of inference that function differently: predictive and diagnostic. Predictive inference reasons from causes to consequences, whereas diagnostic (explanatory) inference reasons from consequences to causes.

Example: Seeing fire suggests that there is smoke (predictive). Seeing smoke suggests that there is a fire (diagnostic).
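
Both directions can be computed from the same joint distribution; they just run Bayes’ rule in opposite directions, and they generally come with different strengths. A small sketch using the same made-up numbers as above:

```python
# Joint distribution over (fire, smoke), consistent with the numbers above.
joint = {
    (True,  True):  0.0100,  # fire and smoke
    (True,  False): 0.0000,  # fire without smoke (ruled out)
    (False, True):  0.0495,  # smoke without fire
    (False, False): 0.9405,  # neither
}

def cond(target_idx, target_val, given_idx, given_val):
    """P(var[target_idx] = target_val | var[given_idx] = given_val)."""
    num = sum(p for k, p in joint.items()
              if k[target_idx] == target_val and k[given_idx] == given_val)
    den = sum(p for k, p in joint.items() if k[given_idx] == given_val)
    return num / den

print(cond(1, True, 0, True))  # predictive: P(smoke | fire) = 1.0
print(cond(0, True, 1, True))  # diagnostic: P(fire | smoke) ≈ 0.168
```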

Induced Dependency:

If you know A, then learning B can suggest C where it wouldn’t have if you hadn’t known A.

Example: Ordinarily, burglaries and earthquakes are unrelated. But if you know that your alarm is going off, then whether or not there was an earthquake is relevant to whether or not there was a burglary.
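
This is easy to verify by brute-force enumeration in a toy model. The conditional probabilities below are invented for illustration:

```python
from itertools import product

# Illustrative numbers, not from the book: burglary and earthquake are
# independent a priori, and either one can set off the alarm.
p_b, p_e = 0.01, 0.02
p_alarm = {  # P(alarm | burglary, earthquake)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}

def p_burglary(alarm=None, earthquake=None):
    """P(burglary | observations) by brute-force enumeration."""
    weight = {True: 0.0, False: 0.0}
    for b, e in product([True, False], repeat=2):
        if earthquake is not None and e != earthquake:
            continue
        w = (p_b if b else 1 - p_b) * (p_e if e else 1 - p_e)
        if alarm is not None:
            w *= p_alarm[(b, e)] if alarm else 1 - p_alarm[(b, e)]
        weight[b] += w
    return weight[True] / (weight[True] + weight[False])

print(p_burglary())                             # 0.01 (the prior)
print(p_burglary(earthquake=True))              # 0.01 (unrelated a priori)
print(p_burglary(alarm=True))                   # ~0.58
print(p_burglary(alarm=True, earthquake=True))  # ~0.03: now they interact
```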

Correlated Evidence:

Upon discovering that multiple pieces of evidence have a common origin, the credibility of the hypothesis they support should be decreased.

Example: You learn from a radio report, a TV report, and a newspaper report that thousands died. You then learn that all three reports got their information from the same source. This decreases the credibility of the claim that thousands died.
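
One rough way to model this: independent witnesses multiply their likelihood ratios, while three reports relaying a single source contribute only one likelihood ratio between them. With made-up numbers:

```python
# Each witness reports "thousands died" with probability 0.9 if it's true
# and 0.3 if it's false; the prior is 0.5. Numbers are invented.
prior = 0.5
lr_per_witness = 0.9 / 0.3  # likelihood ratio of one independent report

def posterior(likelihood_ratio):
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)

print(posterior(lr_per_witness ** 3))  # ≈ 0.964: three independent witnesses
print(posterior(lr_per_witness ** 1))  # 0.75: three relays of one source
```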

Explaining away:

Finding a second explanation for an item of data makes the first explanation less credible. If A and B are both possible explanations of C, and C is true, then finding that B is true makes A less credible.

Example: Finding that my light bulb emits red light makes it less credible that the red-hued object in my hand is truly red.
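
The same enumeration trick shows the effect numerically. Here A is “the object is red,” B is “the bulb emits red light,” and the shared observation is “the object looks red” (probabilities invented):

```python
# A = "the object is red", B = "the bulb emits red light"; both explain the
# observation "the object looks red". Probabilities are invented.
p_a, p_b = 0.5, 0.1
p_looks_red = {  # P(looks red | A, B)
    (True, True): 1.0, (True, False): 1.0,
    (False, True): 0.9, (False, False): 0.01,
}

def p_object_red(bulb_red=None):
    """P(A | looks red, and optionally B) by enumeration."""
    weight = {True: 0.0, False: 0.0}
    for a in (True, False):
        for b in (True, False):
            if bulb_red is not None and b != bulb_red:
                continue
            weight[a] += ((p_a if a else 1 - p_a)
                          * (p_b if b else 1 - p_b)
                          * p_looks_red[(a, b)])
    return weight[True] / (weight[True] + weight[False])

print(p_object_red())               # ≈ 0.91: the object probably is red
print(p_object_red(bulb_red=True))  # ≈ 0.53: the red bulb explains it away
```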

Rule of the hypothetical middle:

If two diametrically opposed assumptions impart two different degrees of belief onto a proposition Q, then the unconditional degree of belief should be somewhere between the two.

Example: The plausibility of an animal being able to fly is somewhere between the plausibility of a bird flying and the plausibility of a non-bird flying.
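
This is just the law of total probability: the unconditional belief is a weighted average of the two conditional beliefs, so it must land between them. A quick check with made-up numbers:

```python
# Made-up numbers: birds usually fly, non-birds usually don't.
p_bird = 0.3
p_fly_given_bird, p_fly_given_not_bird = 0.9, 0.1

# Law of total probability: a weighted average of the two conditionals.
p_fly = p_fly_given_bird * p_bird + p_fly_given_not_bird * (1 - p_bird)
print(p_fly)  # 0.34, which lies between 0.1 and 0.9
```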

Defeaters or Suppressors:

Even if, as a general rule, B is more likely given A, this does not necessarily mean that learning A makes B more credible. There may be other elements in your knowledge base K that explain A away. In fact, learning A might even make B less credible (Simpson’s paradox). In other words, updating beliefs must involve searching your entire knowledge base for defeaters of general rules, including ones not directly inferentially connected to the evidence you receive.

Example 1: Learning that the ground is wet does not permit us to increase the certainty of “It rained”, because the knowledge base might contain “The sprinkler is on.”
Example 2: You have kidney stones and are seeking treatment. You additionally know that Treatment A makes you more likely to recover from kidney stones than Treatment B. But if you also have the background information that your kidney stones are large, then your recovery under Treatment A becomes less credible than under Treatment B.
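
Simpson’s paradox is easy to reproduce with small tables of counts. The numbers below are invented to match the direction of the example: Treatment A wins overall, yet Treatment B wins within each stone-size group:

```python
# Hypothetical counts, invented for illustration: (recovered, total).
data = {
    ("A", "small"): (90, 100),
    ("A", "large"): (10, 20),
    ("B", "small"): (19, 20),
    ("B", "large"): (60, 100),
}

def rate(treatment, size=None):
    """Recovery rate, overall or within one stone-size stratum."""
    pairs = [v for (t, s), v in data.items()
             if t == treatment and (size is None or s == size)]
    recovered = sum(r for r, _ in pairs)
    total = sum(n for _, n in pairs)
    return recovered / total

print(rate("A"), rate("B"))                    # 0.833 vs 0.658: A wins overall
print(rate("A", "large"), rate("B", "large"))  # 0.50 vs 0.60: B wins for large
print(rate("A", "small"), rate("B", "small"))  # 0.90 vs 0.95: B wins for small
```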

Non-Transitivity:

Even if A suggests B and B suggests C, this does not necessarily mean that A suggests C.

Example 1: Your card being an ace suggests it is an ace of clubs. If your card is an ace of clubs, then it is a club. But if it is an ace, this does not suggest that it is a club.
Example 2: If the sprinkler was on, then the ground is wet. If the ground is wet, then it rained. But it’s not the case that if the sprinkler was on, then it rained.
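
The card example can be checked exhaustively over a 52-card deck:

```python
from fractions import Fraction

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(r, s) for r in ranks for s in suits]  # 52 cards

def p(event, given=lambda card: True):
    """P(event | given), by counting cards."""
    pool = [c for c in deck if given(c)]
    return Fraction(sum(1 for c in pool if event(c)), len(pool))

is_ace = lambda c: c[0] == "A"
is_ace_of_clubs = lambda c: c == ("A", "clubs")
is_club = lambda c: c[1] == "clubs"

print(p(is_ace_of_clubs), p(is_ace_of_clubs, given=is_ace))  # 1/52 -> 1/4: raised
print(p(is_club), p(is_club, given=is_ace_of_clubs))         # 1/4  -> 1:   raised
print(p(is_club), p(is_club, given=is_ace))                  # 1/4  -> 1/4: unchanged
```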

Non-detachment:

Just learning that a proposition has changed in credibility is not enough to analyze the effects of the change; the reason for the change in credibility is relevant.

Example: You get a phone call telling you that your alarm is going off. Worried about a burglar, you head towards your home. On the way, you hear a radio announcement of an earthquake near your home. This makes it more credible that your alarm really is going off, but less credible that there was a burglary. In other words, your alarm going off decreased the credibility of a burglary, because it happened as a result of the earthquake, whereas typically an alarm going off would make a burglary more credible.

✯✯✯

All of these patterns should make a lot of sense to you when you give them a bit of thought. It turns out, though, that accommodating them in a system of inference is no easy matter.

Pearl distinguishes between extensional and intensional systems, and talks about the challenges for each approach. Extensional systems (including fuzzy logic and non-monotonic logic) focus on extending the truth values of propositions from {0, 1} to a continuous range of uncertainty [0, 1], and then modifying the rules according to which propositions combine (for instance, “A & B” receives the truth value min(t(A), t(B)) in some extensional systems and t(A)·t(B) in others, where t(·) denotes a proposition’s degree of truth). The locality and simplicity of these combination rules turn out to be their primary failing; they lack the subtlety and nuance required to capture the complicated reasoning patterns above. Their syntactic simplicity makes them easy to work with, but curses them with semantic sloppiness.

On the other hand, intensional systems (like probability theory) assign degrees of plausibility to entire world-states rather than to individual propositions. This allows for the nuance required to capture all of the above patterns, but it comes at the cost of a huge blow-up in complexity: true, perfect Bayesianism is ridiculously computationally infeasible, as belief updating blows up exponentially in the number of atomic propositions. Thus, intensional systems are semantically clear, but syntactically messy.

A good summary of this from Pearl (p. 12):

We have seen that handling uncertainties is a rather tricky enterprise. It requires a fine balance between our desire to use the computational permissiveness of extensional systems and our ability to refrain from committing semantic sins. It is like crossing a minefield on a wild horse. You can choose a horse with good instincts, attach certainty weights to it and hope it will keep you out of trouble, but the danger is real, and highly skilled knowledge engineers are needed to prevent the fast ride from becoming a disaster. The other extreme is to work your way by foot with a semantically safe intensional system, such as probability theory, but then you can hardly move, since every step seems to require that you examine the entire field afresh.

The challenge for extensional systems is to accommodate the nuance of correct inductive reasoning.

The challenge for intensional systems is to maintain their semantic clarity while becoming computationally feasible.

Pearl solves the second challenge by supplementing Bayesian probability theory with causal networks that give information about the relevance of propositions to each other, drastically simplifying the tasks of inference and belief propagation.
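
Some back-of-the-envelope arithmetic shows how much that structure buys: a full joint distribution over n binary variables needs 2^n − 1 independent numbers, while a network in which each variable depends directly on at most k others needs at most n·2^k:

```python
def full_joint_params(n):
    """Independent numbers in a full joint over n binary variables."""
    return 2 ** n - 1

def network_params(n, k):
    """Upper bound when each variable has at most k parents in the network."""
    return n * 2 ** k  # one conditional table per variable

for n in (10, 20, 30):
    print(n, full_joint_params(n), network_params(n, k=3))
# 10         1023    80
# 20      1048575   160
# 30   1073741823   240
```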

One more insight from Chapter 1 of the book… Pearl describes four primitive qualitative relationships in everyday reasoning: likelihood, conditioning, relevance, and causation. I’ll give an example of each, and how they are symbolized in Pearl’s formulation.

1. Likelihood (“Tim is more likely to fly than to walk.”)
P(A)

2. Conditioning (“If Tim is sick, he can’t fly.”)
P(A | B)

3. Relevance (“Whether Tim flies depends on whether he is sick.”)
P(A | B) ≠ P(A)

4. Causation (“Being sick caused Tim’s inability to fly.”)
P(A | do(B))

The challenge is to find a formalism that fits all four of these, while remaining computationally feasible.
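
To illustrate the gap between conditioning and causation, here is a sketch of the seeing/doing distinction in a two-variable toy model (rain causes wet ground, with invented numbers). Observing the effect is evidence about the cause; intervening on the effect is not:

```python
# rain -> wet ground, with invented numbers.
p_rain = 0.2
p_wet_given_rain = {True: 0.9, False: 0.1}  # P(wet | rain)

# Seeing: condition on wet ground using Bayes' rule.
p_wet = (p_wet_given_rain[True] * p_rain
         + p_wet_given_rain[False] * (1 - p_rain))
p_rain_given_wet = p_wet_given_rain[True] * p_rain / p_wet

# Doing: do(wet) severs the rain -> wet mechanism, so rain keeps its prior.
p_rain_given_do_wet = p_rain

print(p_rain_given_wet)     # ≈ 0.69: observing wet ground is evidence of rain
print(p_rain_given_do_wet)  # 0.2: hosing down the ground tells us nothing
```

The difference between these two quantities is exactly the kind of information that likelihoods and conditional probabilities alone cannot express, which is why causation needs its own primitive.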
