Previous: Screening off and explaining away

A look at admission statistics at a college reveals that women are less likely to be admitted to graduate programs than men. A closer investigation reveals that in fact when the data is broken down into individual department data, women are more likely to be admitted than men. Does this sound impossible to you? It happened at UC Berkeley in 1973.

When two treatments are tested on a group of patients with kidney stones, Treatment A turns out to lead to worse recovery rates than Treatment B. But when the patients are divided according to the size of their kidney stone, it turns out that *no matter how large their kidney stone,* Treatment A *always does better* than Treatment B. Is this a logical contradiction? Nope, it happened in 1986!

What’s going on here? How can we make sense of this apparently inconsistent data? And most importantly, what conclusions do we draw? Is Berkeley biased against women or men? Is Treatment A actually more effective or less effective than Treatment B?

In this post, we’ll apply what we’ve learned about causal modeling to be able to answer these questions.

*******

Quine gave the following categorization of types of paradoxes: veridical paradoxes (those that *seem* wrong but are actually correct), falsidical paradoxes (those that seem wrong and actually *are* wrong), and antinomies (those that are premised on common forms of reasoning and end up deriving a contradiction).

Simpson’s paradox is in the first category. While it seems impossible, it actually *is* possible, and it happens all the time. Our first task is to explain away the apparent falsity of the paradox.

Let’s look at some actual data on the recovery rates for different treatments of kidney stones.

| | Treatment A | Treatment B |
| --- | --- | --- |
| All patients | 78% (273/350) | 83% (289/350) |

The percentages represent the number of patients that recovered, out of all those that were given the particular treatment. So 273 patients recovered out of the 350 patients given Treatment A, giving us 78%. And 289 patients recovered out of the 350 patients given Treatment B, giving 83%.

At this point we’d be tempted to proclaim that B is the better treatment. But if we now break down the data and divide up the patients by kidney stone size, we see:

| | Treatment A | Treatment B |
| --- | --- | --- |
| Small stones | 93% (81/87) | 87% (234/270) |
| Large stones | 73% (192/263) | 69% (55/80) |

And here the paradoxical conclusion falls out! If you have small stones, Treatment A looks better for you. And if you have large stones, Treatment A looks better for you. So no matter *what* size kidney stones you have, Treatment A is better!

And yet, amongst all patients, Treatment B has a higher recovery rate.

Small stones: A better than B

Large stones: A better than B

All sizes: B better than A

I encourage you to check out the numbers for yourself, in case you still don’t believe this.
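If you'd rather let a computer do the checking, here is a quick sketch in Python, with the counts taken straight from the tables above:

```python
# Raw counts from the kidney stone study: (recovered, total) per cell
data = {
    ("A", "small"): (81, 87),
    ("A", "large"): (192, 263),
    ("B", "small"): (234, 270),
    ("B", "large"): (55, 80),
}

def rate(treatment, size=None):
    """Recovery rate for a treatment, optionally restricted to one stone size."""
    cells = [(r, n) for (t, s), (r, n) in data.items()
             if t == treatment and (size is None or s == size)]
    recovered = sum(r for r, n in cells)
    total = sum(n for r, n in cells)
    return recovered / total

# A beats B within each subgroup...
assert rate("A", "small") > rate("B", "small")
assert rate("A", "large") > rate("B", "large")
# ...and yet B beats A overall. That's Simpson's paradox.
assert rate("A") < rate("B")
```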

*******

The simplest explanation for what’s going on here is that we are treating conditional probabilities like they are joint probabilities. Let’s look again at our table, and express the meaning of the different percentages more precisely.

| | Treatment A | Treatment B |
| --- | --- | --- |
| Small stones | P(Recovery \| Small stones & Treatment A) | P(Recovery \| Small stones & Treatment B) |
| Large stones | P(Recovery \| Large stones & Treatment A) | P(Recovery \| Large stones & Treatment B) |
| Everybody | P(Recovery \| Treatment A) | P(Recovery \| Treatment B) |

Our paradoxical result is the following:

P(Recovery | Small stones & Treatment A) > P(Recovery | Small stones & Treatment B)

P(Recovery | Large stones & Treatment A) > P(Recovery | Large stones & Treatment B)

P(Recovery | Treatment A) < P(Recovery | Treatment B)

But this is no paradox at all! There is no law of probability that tells us:

If P(A | B & C) > P(A | B & ~C)

and P(A | ~B & C) > P(A | ~B & ~C),

then P(A | C) > P(A | ~C)

There *is*, however, a law of probability that tells us:

If P(A & B | C) > P(A & B | ~C)

and P(A & ~B | C) > P(A & ~B | ~C),

then P(A | C) > P(A | ~C)
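This second law is just the law of total probability plus termwise addition of inequalities. A one-line derivation:

```latex
P(A \mid C) = P(A \wedge B \mid C) + P(A \wedge \lnot B \mid C)
            > P(A \wedge B \mid \lnot C) + P(A \wedge \lnot B \mid \lnot C)
            = P(A \mid \lnot C)
```

The first step holds because B and ~B partition the space; the inequality is just the sum of the two assumed inequalities. No analogous decomposition exists for the conditional probabilities, which is why the first "law" fails.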

And if we represented the data in terms of these *joint* probabilities (probability of recovery AND small stones given Treatment A, for example) instead of *conditional* probabilities, we’d find that the probabilities add up nicely and the paradox vanishes.

| | Treatment A | Treatment B |
| --- | --- | --- |
| Small stones | 23% (81/350) | 67% (234/350) |
| Large stones | 55% (192/350) | 16% (55/350) |
| All patients | 78% (273/350) | 83% (289/350) |

It is in this sense that the paradox arises from improper treatment of conditional probabilities as joint probabilities.
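A quick Python sketch (using the counts from the tables above) makes the contrast concrete: the joint probabilities obey the law of total probability and sum to the overall recovery rates, while the conditional probabilities sum to nothing meaningful.

```python
# Joint probabilities P(Recovery & StoneSize | Treatment), from the table above.
# Each denominator is the 350 patients who received that treatment.
joint = {
    "A": {"small": 81 / 350, "large": 192 / 350},
    "B": {"small": 234 / 350, "large": 55 / 350},
}

# Law of total probability: P(R | T) = P(R & S | T) + P(R & ~S | T)
for t, overall in [("A", 273 / 350), ("B", 289 / 350)]:
    assert abs(sum(joint[t].values()) - overall) < 1e-12

# The conditional probabilities P(R | S & T) have no such additivity:
cond_A = {"small": 81 / 87, "large": 192 / 263}
print(sum(cond_A.values()))  # about 1.66 -- not the probability of anything
```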

*******

This tells us *why* we got a paradoxical result, but isn’t quite fully satisfying. We still want to know, for instance, whether we should give somebody with small kidney stones Treatment A or Treatment B.

The fully satisfying answer comes from causal modeling. The causal diagram we will draw will have three variables, A (which is true if you receive Treatment A and false if you receive Treatment B), S (which is true if you have small kidney stones and false if you have large), and R (which is true if you recovered).

Our causal diagram should express that there is some causal relationship between the treatment you receive (A) and whether you recover (R). It should also show a causal relationship between the size of your kidney stone (S) and your recovery, as the data indicates that larger kidney stones make recovery less likely.

And finally, it should show a causal arrow from the size of the kidney stone to the treatment that you receive. This final arrow comes from the fact that more people with large stones were given Treatment A than Treatment B, and more people with small stones were given Treatment B than Treatment A.

This gives us the following diagram:

The values of P(S), P(A | S), and P(A | ~S) were calculated from the table we started with. For instance, the value of P(S) was calculated by adding up all the patients that had small kidney stones, and dividing by the total number of patients in the study: (87 + 270) / 700 = 51%.

Now, we want to know if P(R | A) > P(R | ~A) (that is, if recovery is more likely given Treatment A than given Treatment B).

If we *just* look at the conditional probabilities given by our first table, then we are taking into account *two* sources of dependency between treatment type and recovery. The first is the direct causal relationship, which is what we *want* to know. The second is the spurious correlation between A and R as a result of the common cause S.

Here the red arrows represent “paths of dependency” between A and R. For example, since those with small stones are more likely to get Treatment B, and are *also* more likely to recover, this will result in a spurious correlation between Treatment B and recovery.

So how do we determine the actual *non-spurious* causal dependency between A and R?

Easy!

If we *observe* the value of S, then we screen A off from R through S! This removes the spurious correlation, and leaves us with just the causal relationship that we want.

What this *means* is that the true nature of the relationship between treatment type and recovery can be determined by breaking down the data in terms of kidney stone size. Looking back at our original data:

| Recovery rate | Treatment A | Treatment B |
| --- | --- | --- |
| Small stones | 93% (81/87) | 87% (234/270) |
| Large stones | 73% (192/263) | 69% (55/80) |
| All patients | 78% (273/350) | 83% (289/350) |

This corresponds to looking at the data divided up by size of stones, and not the data on all patients. And since for each stone size category, Treatment A was more effective than Treatment B, this is the true causal relationship between A and R!

*******

A nice feature of the framework of causal modeling is that there are often multiple ways to think about the same problem. So instead of thinking about this in terms of *screening off* the spurious correlation through observation of S, we could also think in terms of causal interventions.

In other words, to determine the true nature of the causal relationship between A and R, we want to *intervene* on A, and see what happens to R.

This corresponds to calculating if P(R | do A) > P(R | do ~A), rather than if P(R | A) > P(R | ~A).

Intervention on A gives us the new diagram:

With this diagram, we can calculate:

P(R | do A)

= P(R & S | do A) + P(R & ~S | do A)

= P(S) * P(R | A & S) + P(~S) * P(R | A & ~S)

= 51% * 93% + 49% * 73%

= 83.2%

And…

P(R | do ~A)

= P(R & S | do ~A) + P(R & ~S | do ~A)

= P(S) * P(R | ~A & S) + P(~S) * P(R | ~A & ~S)

= 51% * 87% + 49% * 69%

= 78.2%

Now not only do we see that Treatment A is better than Treatment B, but we get the exact *amount* by which it is better – it improves recovery chances by about 5 percentage points!
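The adjustment formula above is easy to sketch in Python. (The subgroup recovery rates here are kept unrounded, so the results differ very slightly from the 83.2% and 78.2% computed from rounded percentages in the text.)

```python
# Adjustment formula: P(R | do(T)) = sum over s of P(S = s) * P(R | T, S = s)
p_small = (87 + 270) / 700            # P(S) = 0.51

# P(Recovery | Treatment, StoneSize), straight from the subgroup table
p_recover = {
    ("A", "small"): 81 / 87,
    ("A", "large"): 192 / 263,
    ("B", "small"): 234 / 270,
    ("B", "large"): 55 / 80,
}

def p_recovery_do(treatment):
    """P(R | do(treatment)): subgroup rates averaged with weights P(S), P(~S)."""
    return (p_small * p_recover[(treatment, "small")]
            + (1 - p_small) * p_recover[(treatment, "large")])

print(p_recovery_do("A"))   # about 0.83
print(p_recovery_do("B"))   # about 0.78
```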

Next, we’re going to go kind of crazy with Simpson’s paradox and show how to construct an infinite chain of Simpson’s paradoxes.

Fantastic paper on all of this here.


Next: Iterated Simpson’s paradox
