Plausible reasoning, unlike logical deduction, is sensitive not only to the information at hand but also to the query process by which the information was obtained.
Judea Pearl, Probabilistic Reasoning in Intelligent Systems
This quote points to an interesting feature of inductive reasoning that’s worth unpacking. It also hints at why formalizing induction is a more complex affair than formalizing deduction.
A very simple example of this:
You spread a rumor to your neighbor N. A few days later you hear the same rumor from another neighbor N’. Should you increase your belief in the rumor now that N’ repeats it, or should you first determine whether N’ heard it from N?
Clearly, if the only source of information for N’ was N, then your belief should not change. But if N’ independently confirmed the validity of the rumor, you have good reason to increase your belief in it.
In general, when you have both top-down (predictive) and bottom-up (explanatory/diagnostic) inferences in evidential reasoning, it is important to be able to trace back queries. If not, one runs the risk of engaging in circular reasoning.
So far this is all fairly obvious. Now here’s an example that’s more subtle.
Three prisoners problem
Three prisoners have been tried for murder, and their verdicts will be read tomorrow morning. Only one will be declared guilty, and the other two will be declared innocent.
Before sentencing, Prisoner A asks the guard (who knows which prisoner will be declared guilty) to do him a favor and give a letter to one of the other two prisoners who will be released (since only one person will be declared guilty, Prisoner A knows that at least one of the other two prisoners will be released). The guard does so, and later, Prisoner A asks him which of the two prisoners (B or C) he gave the letter to. The guard responds, “I gave the letter to Prisoner B.”
Now Prisoner A reasons as follows:
“Previously, my chances of being declared guilty were one in three. Now that I know that B will be released, only C and I remain as candidates for being declared guilty. So now my chances are one in two.”
Is this wrong?
Denote “A is guilty” as GA, and “B is innocent” as IB. Now, since GA → IB, we have that P(IB | GA) = 1. This tells us that we can write
P(GA | IB) = P(IB | GA) P(GA) / P(IB)
= P(GA) / P(IB) = ⅓ / ⅔ = ½
The problem with this argument is that we have excluded some of the context of the guard’s response, namely, that the guard could only have answered “I gave the letter to Prisoner B” or “I gave the letter to Prisoner C.” In other words, conditioning on the bare proposition “Prisoner B will be declared innocent” leads to the wrong conclusion about the probability of A’s guilt.
Let’s instead condition on IB’ = “Guard says that B will be declared innocent.” Now we get
P(GA | IB’) = P(IB’ | GA) P(GA) / P(IB’) = ½ ⋅ ⅓ / ½ = ⅓.
It’s not sufficient to just condition on what the guard said. We must consider the range of possible statements that the guard could have made.
In general, it is not enough to assess the impact of the propositions implied by the information we actually received. We must also consider what other information we could have received.
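To see the difference between the two updates concretely, here is a short Python sketch that enumerates the problem exactly. The 50/50 split when A is guilty is an assumption: that the guard chooses fairly between B and C.

```python
from fractions import Fraction

# Exact enumeration of the three prisoners problem.
# Assumption: when A is guilty, the guard picks between B and C
# with probability 1/2 each.
outcomes = []  # (guilty prisoner, who the guard names, probability)
prior = Fraction(1, 3)
outcomes.append(("A", "B", prior * Fraction(1, 2)))
outcomes.append(("A", "C", prior * Fraction(1, 2)))
outcomes.append(("B", "C", prior))  # B guilty: only C can get the letter
outcomes.append(("C", "B", prior))  # C guilty: only B can get the letter

# Naive update: condition on the proposition "B is innocent".
p_b_innocent = sum(p for g, _, p in outcomes if g != "B")
naive = sum(p for g, _, p in outcomes if g == "A") / p_b_innocent

# Correct update: condition on the event "the guard says B".
p_says_b = sum(p for _, s, p in outcomes if s == "B")
correct = sum(p for g, s, p in outcomes if g == "A" and s == "B") / p_says_b

print(naive)    # 1/2
print(correct)  # 1/3
```

The enumeration makes the asymmetry visible: the event “guard says B” has probability ½, not ⅔, because when A is guilty the guard only names B half the time.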
Things get clearer if we consider a similar thought experiment.
1000 prisoners problem
You are one of 1000 prisoners awaiting sentencing, with the knowledge that only one of you will be declared guilty. You come across a slip of paper from the court listing 998 prisoners, each name marked ‘innocent’. You look through all 998 names and find that your name is missing.
This should worry you greatly – your chances of being declared guilty have gone from 1 in 1000 to 1 in 2.
But imagine that you now see the query that produced the list.
Query: “Print the names of any 998 innocent right-handed prisoners.”
If you are the only left-handed prisoner, then you should thank your lucky stars. Why? Because now that you know that the query couldn’t have produced your name, the fact that it didn’t gives you no information. In other words, your chances have gone from 1 in 2 back to 1 in 1000.
In this example you can see very clearly why information about the possible outputs of a query is relevant to how we should update on the actual output of the query. We must know the process by which we attain information in order to be able to accommodate that information into our beliefs.
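Both versions of this update can be worked out exactly with a short sketch. One modeling assumption: for the query that could have named you, the list omits one of the 999 eligible innocents uniformly at random.

```python
from fractions import Fraction

N = 1000
p_guilty = Fraction(1, N)
p_innocent = 1 - p_guilty

# Query 1: "print any 998 innocent prisoners" (could have named you).
# If you are innocent, the list omits exactly one of the 999 innocents;
# assuming that choice is uniform, it omits you with probability 1/999.
p_absent_if_innocent = Fraction(1, N - 1)
p_absent_if_guilty = Fraction(1)  # a guilty name is never printed
posterior_open = (p_guilty * p_absent_if_guilty) / (
    p_guilty * p_absent_if_guilty + p_innocent * p_absent_if_innocent
)

# Query 2: "print any 998 innocent right-handed prisoners", where you
# are the only left-handed prisoner: your name is absent either way.
posterior_lefty = (p_guilty * 1) / (p_guilty * 1 + p_innocent * 1)

print(posterior_open)   # 1/2
print(posterior_lefty)  # 1/1000
```

The same absent name moves your credence to ½ under one query and leaves it at 1/1000 under the other.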
But now what if we don’t have this information? Suppose that you only run into the list of prisoners, and have no additional knowledge about how it was produced. Well, then we must consider all possible queries that might have produced this output!
This is no small matter.
For simplicity, let’s reconsider the simpler example with just three prisoners: Prisoners A, B, and C. Imagine that you are Prisoner A.
You come across a slip of paper from the court containing the statement I, where
I = “Prisoner B will be declared innocent”.
Now, we must assess the impact of I on the proposition GA = “Prisoner A is guilty.”
P(GA | I) = P(I | GA) P(GA) / P(I) = ⅓ P(I | GA) / P(I)
The result of this calculation depends upon P(I | GA), or in other words, how likely it is that the slip would declare Prisoner B innocent, given that you are guilty. This depends on the query process, and can vary anywhere from 0 to 1. Let’s just give this variable a name: P(I | GA) = p.
We’ll also need to know two other probabilities: (1) that the slip declares B innocent given that B is guilty, and (2) that it declares B innocent given that C is guilty. We’ll assume that the slip cannot be lying (i.e. that the first of these is zero), and name the second probability q = P(I | GC).
P(I | GA) = p (slip could declare either B or C innocent)
P(I | GB) = 0 (slip could declare either A or C innocent)
P(I | GC) = q (slip could declare either A or B innocent)
Now we have
P(GA | I) = ⅓ p / P(I)
= ⅓ p / [P(I | GA) P(GA) + P(I | GB) P(GB) + P(I | GC) P(GC)]
= ⅓ p / (⅓ p + 0 + ⅓ q)
= p/(p + q)
How do we assess this value, given that p and q are unknown? The Bayesian solution is to treat the probabilities p and q as random variables, and specify probability distributions over their possible values: f(p) and g(q). These distributions should contain all of your prior knowledge about the plausible queries that might have produced I.
The final answer is obtained by integrating over all possible values of p and q.
P(GA | I) = E[p/(p + q)]
= ∫ p/(p + q) f(p) g(q) dp dq
Supposing that our distributions over p and q are maximally uncertain (uniform over [0, 1], so that f(p) = g(q) = 1), the final answer we obtain is
P(GA | I) = ∫ p/(p + q) dp dq
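By the symmetry between p and q, this double integral evaluates to ½: the integrals of p/(p + q) and q/(p + q) over the unit square are equal and sum to 1. A quick Monte Carlo sketch agrees:

```python
import random

random.seed(0)

# Monte Carlo estimate of E[p/(p + q)] under independent uniform
# priors on p and q (the "maximally uncertain" case).
n = 200_000
total = 0.0
for _ in range(n):
    p, q = random.random(), random.random()
    total += p / (p + q)
estimate = total / n
print(estimate)  # close to 0.5
```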
Now suppose that we know that the slip could not declare A (yourself) innocent (as we do in the original three prisoners problem). Then we know that q = 1 (since if C is guilty and A couldn’t be on the slip, B is the only possible choice). This gives us
P(GA | I) = ∫ p/(p + 1) f(p) dp
If we are maximally uncertain about the value of p, we obtain
P(GA | I) = ∫ p/(p + 1) dp
= 1 – ln(2)
If, on the other hand, we are sure that the value of p is 50% (i.e., we know that in the case that A is guilty, the guard chooses randomly between B and C), we obtain
P(GA | I) = .5/(.5 + 1) = ⅓
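Both answers for the q = 1 case are easy to check numerically. The sketch below estimates the uniform-prior integral ∫ p/(p + 1) dp with a simple midpoint rule and compares it against the closed forms:

```python
import math

# Numeric check of the two answers in the case q = 1 (the slip could
# not have named A). Case 1: p uniform on [0, 1], via a midpoint rule.
n = 100_000
total = 0.0
for i in range(n):
    p = (i + 0.5) / n  # midpoint of the i-th subinterval of [0, 1]
    total += p / (p + 1)
uniform_case = total / n

# Case 2: p known to be exactly 1/2 (the guard flips a fair coin).
point_case = 0.5 / (0.5 + 1)

print(round(uniform_case, 4))     # 0.3069, i.e. 1 - ln(2)
print(round(point_case, 4))       # 0.3333, i.e. 1/3
print(round(1 - math.log(2), 4))  # 0.3069
```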
We’ve re-obtained our initial result! Interestingly, we can see that being maximally uncertain about the guard’s procedure for choosing between B and C gives a different answer than knowing that the guard chooses totally randomly between B and C.
Notice that this is true even though these reflect the same expectation of what choice the guard will make!
I.e., in both cases (total uncertainty about p, and knowledge that p is exactly .5), we should have 50% credence in the guard choosing B. This gives us some insight into the importance of considering different types of uncertainty when doing induction, which is a topic for another post.