Opt-out organ donation

(Mostly interested in this for two reasons: (1) the research in cognitive science about default effects and other unintuitive cognitive biases and (2) the adequacy implications of the lack of implementation of this policy)

In the United States, around 95% of the population approves of organ donation, while only 54% have granted permission for their organs to be used after death. Surveys in the UK indicate that around 90% approve of organ donation, but only 25% of the population is registered on the Organ Donor Register. Many other countries show similar patterns.

When polled, the reasons given for not explicitly registering for organ donation are things like laziness, confusion about the process and unwillingness to think about death.

And it’s actually worse than this – many countries have ‘soft’ organ-donation policies, meaning that family members can override the wishes of the deceased. Families are more likely to veto the decision to donate than the decision to not donate, further decreasing the number of organs available for transplant.

And this number really really matters. There are over 100,000 people in need of a life-saving organ transplant in the United States, and over seven thousand people died last year while waiting – around 20 people every day. And in the UK and the US, the gap between available organs and patients awaiting transplantation is only growing.


Psychologists have studied the effects of default options on expressed preferences. One experiment told subjects to imagine that they had just moved to a new state, and that they had to decide whether or not to be organ donors. Some subjects were told that the default was to be an organ donor, and their choice was to confirm or change that status. Others were told the opposite – that the default was to not be an organ donor. The results were dramatic: about twice as many people became donors when donation was the default as when it was not. The simple framing effect of “confirm the default or change?” had the power to cut organ donations in half.

The real-world equivalent of this is whether a country has an opt-in or opt-out organ donation system. The UK and the US have an opt-in system, which means that the default choice is to not be an organ donor. Other countries, like Austria, Belgium, Spain and Sweden, have an opt-out system.

This difference in policy produces huge differences in the percentage of the population that consents to organ donation. When Austria and Belgium changed from an opt-in to an opt-out system, donation rates more than doubled. When Singapore changed to opt-out, its donation rates more than sextupled. And comparisons between countries with different policies are similarly impressive. Germany and Austria, similar countries in many ways except for their donation policies, show a difference in effective consent rates of almost 88 percentage points.

Consider for a moment how strange this is. In the United States, all it takes to become an organ donor is to check a box when registering for a driver’s license at the DMV. Can it really be that a simple difference in whether the box means “become an organ donor” or “stop being an organ donor” is preventing millions of people from becoming organ donors? Classical economics would certainly not predict this – it is presumed that if somebody has a preference about whether or not to be an organ donor, a tiny difference in framing should not have such huge effects on their behavior.

But apparently the answer is that yes, these tiny differences do matter. And our strange little human quirks can be hugely important in deciding on how to make effective policy.


Ultimately, we are left with an adequacy question. Opt-out organ donation policies seem to me like low-hanging policy fruit. If policy-makers care to eliminate thousands of needless deaths, and are aware of these policies, then why aren’t they already implemented in the US and the UK?

The spiritual and the scientific

There’s an Isaac Asimov quote that I love. It goes:

When people thought the Earth was flat, they were wrong. When people thought the Earth was spherical, they were wrong. But if you think that thinking the Earth is spherical is just as wrong as thinking the Earth is flat, then your view is wronger than both of them put together.

I was recently reminded of this because I’m at an ashram this week, and in one of the talks, a swami brought up his beef with science.

He talked about how science is just another form of faith, and that therefore our intuition is a perfectly valid guide to understanding the universe. After all, all of our past scientific theories have turned out to be wrong, so we should expect that our current theories will also turn out to be wrong.

Thus the Asimov.


For various reasons, I’m often in spiritual places surrounded by spiritual people. These are the types of people that say “I believe in all religions” and go to yoga retreats and read books about sacred healing and ancient wisdom. When I’m at these places, people sometimes find out that I’m a physics student who is interested in things like Science and Rationality. The types of responses I get are interesting.

Usually the people I talk to are enthusiastic and eager to talk about the most recent scientific discoveries they’ve heard of. They’re also quick to point out that Science can’t tell us everything, and after all there are the virtues of faith to be considered. Other times I feel a subtle shift in attitude. This might be paranoia, but it’s as if I’ve been registered as somebody belonging to the Other Team.

And after all, important swamis declare that science is just another form of faith, and spiritual people nod knowingly. And the Deepak Chopras of the world declare with relish that science cannot tell us objective truths, and that scientists are arrogant and dogmatic.

This is all very weird to me. Science is our best systematized attempt to understand the world we live in and to unearth the general principles that guide this world. Great scientists are guided by a fascination with the order of the universe and wonder at its comprehensibility. At their root they want to understand, in Einstein’s words, the mind of God.

And the spiritual tell me that “spiritual” means something like “interested in pondering the nature of reality at a deep level and appreciating the awe-inspiring and profound aspects of existence.”

If this is how I should understand these terms, then spirituality and science are two things that should definitely definitely not be enemies. In fact, if “spiritual” meant what the spiritual claim it means, then the best spiritual seekers should be the same people as the best scientists.

Look at this quote from Carl Sagan:

Science is not only compatible with spirituality; it is a profound source of spirituality. When we recognize our place in an immensity of light years and in the passage of ages, when we grasp the intricacy, beauty and subtlety of life, then that soaring feeling, that sense of elation and humility combined, is surely spiritual.

And from Neil deGrasse Tyson:

It’s quite literally true that we are star dust, in the highest exalted way one can use that phrase. I bask in the majesty of the cosmos.

Not only are we in the universe, the universe is in us. I don’t know of any deeper spiritual feeling than what that brings upon me.

Are these not expressions of an utmost appreciation for the spiritual, as defined above? Why don’t the spiritual embrace Neil deGrasse Tyson and his scientific colleagues with open arms as fellow earnest truth-seekers, and marvel at the beauty of the universe together? I mean, just look at the man – he’s practically overflowing with the type of joy and curiosity that the spiritual should love!

The spiritual will tell me: “Yes, some of the greatest scientists are very spiritual. Look at Einstein! He said that science without religion is lame, and that all serious scientists recognize a Spirit in the laws of nature! Science at its best can be and should be a deeply spiritual enterprise. But unfortunately, a lot of scientists out there are just too close-minded. This is why the spiritual can sometimes sound anti-science, because the scientists of the world dogmatically reject our reasonable beliefs, like that the spiritually enlightened can read minds and make objects levitate, or that the stars are sending us secret messages about our romantic prospects and whether we should change jobs, or that playing cards thrown randomly onto the ground can accurately tell us our future!”

Yes, scientists can be dogmatic, because scientists are humans. But it strikes me that part of the reason the spiritual claim that scientists are especially dogmatic has something to do with the fact that scientists have repeatedly studied and disproved common spiritual beliefs and practices. More importantly, many of these beliefs are in direct conflict with the known laws of nature. As the saying goes: keep your mind open – but not so open that your brains fall out.

The spiritual: “But science too often tries to go too far and dismiss those things which it doesn’t understand!”

What, like the possible physical effects that the stars could have on the paths that our lives take? Or like the effects of diluting a chemical compound until not a single molecule remains on the potency of the final product as a medical instrument? Or the ways that the lines on your palm form, that really really have nothing to do with how rich you’ll be or how many kids you’ll end up having?

No, this won’t do. Science does not understand everything. There are plenty of mysteries out there, and we love that there are. They give scientists employment! But scientists are certainly not in the business of blindly dismissing those things that they actually do not understand.

Besides, are scientists really all that dogmatic? Look at the history of the scientific worldview. Consensus theories are constantly revised and replaced as we make the long march towards understanding reality. Some of the strongest scientific consensuses are only a few decades old! Scientists are constantly updating and refurnishing their view of reality as the evidence changes.

Perfectly? No! But I’d hazard a guess that they do so better than the average person. Why? For one thing, they have a career incentive to do so. A scientist who sticks to the old phlogiston theory of combustion can’t get published, and a scientist who discovers damning evidence of the falsity of an important consensus gets tenure, pay raises, and respect from their colleagues. The incentive structure of science is set up to reward those who can avoid becoming stuck in dogmatic patterns of belief.


Physicist and philosopher Tim Maudlin has observed that truth-seeking enterprises tend to be uniform across space and to vary across time. Ask a biologist in Bengal what they think about the structure of DNA, and you’ll get pretty much the same answer as a biologist at Oxford. And when new evidence comes in, the beliefs of scientists shift fairly uniformly.

Ask a spiritual seeker in India what they think about Shamanic healing, and you’ll likely get a different answer from a spiritual seeker in the UK.

Yes, science has problems and is definitely not perfect. But we’re not comparing it to an ideal perfected version of science conducted by perfect Bayesian epistemologists with infinite computing power, we’re comparing it to humanity’s status quo. With rampant climate change denial, young Earth creationism, disbelief in evolution and anti-vaccination conspiracies, it’d be hard to convince me that scientists are much worse than the average Joe at avoiding patterns of dogmatic thought.

I just don’t buy that high epistemic standards and a regard for truth are the reason that the spiritual dismiss science. I’ve met too many spiritual people eager to have their charts read by astrologers or obtain homeopathic sugar pills or communicate with invisible spirits. And I don’t buy that scientists are not actually honest truth-seekers trying to understand the world.

Which is why I think that the word spiritual doesn’t actually mean what the spiritual claim it means. I’m not being a linguistic prescriptivist here; I’m saying that the definition that spiritual people provide of spirituality is the motte, and the bailey is something else, something that is apparently hostile to science and friendly to all sorts of pseudoscientific ideas.

The bailey is where the fertile and valuable ideological land is, and the motte is the easily defensible position that spiritual people can retreat to when their beliefs are questioned. The bailey is not actually fundamentally about the urge to understand nature. It’s not actually about the same type of wonder and joy that a scientist gets when they understand some important piece of how the world works. Based on many of the interactions I’ve had with self-identified spiritual people, I would define it as something like “belief in the existence of some phenomenon for which there is no evidence, or evidence against, like Reiki, crystal healing, tarot cards, etc.”


Looking at what I’ve written so far, it sounds like I see nothing but conflict between spirituality and science. This is not so. I have focused on the aspects of spirituality that do come into conflict with science, mostly because I think that these play a large role in the anti-scientific attitudes among the spiritual. The spiritual are quite friendly towards science when it supports their beliefs.

And it often does! There are spiritual practices that science has found to be genuinely beneficial, more than predicted by placebo effects, and beneficial in many of the ways that the spiritual claim them to be. Meditation and yoga come to mind. Mindfulness practices also have an impressive evidence base. And things like a belief in a higher power and spiritual experiences can be genuinely uplifting and transformative.

I’ve talked about spiritual people as if they were all the same, harboring irrational beliefs and anti-scientific attitudes. But plenty of spiritual people I meet are genuinely appreciative of the sciences, and want their world-view to be as fully supported by the scientific evidence as possible. Some are even scientists themselves!

And anti-science attitudes are not at all ubiquitous across spiritual traditions. Buddhism is often praised for its friendliness towards the sciences, and its scientific approach to belief formation. The Dalai Lama says things like:

If scientific analysis were conclusively to demonstrate certain claims in Buddhism to be false, then we must accept the findings of science and abandon those claims.

I don’t know enough about the Dalai Lama’s personal epistemic habits to be confident that this is more than nice-sounding words. How does he think that this attitude affects Buddhist views on karma and reincarnation, for instance?

It is much easier to proclaim a science-friendly attitude than it is to actually accept the tough implications of such an attitude on beliefs central to one’s ideology. But attitudes like this seem like the right way forward in reconciling the actual meaning of spirituality with the meaning that the spiritual seem to want it to have.

Comprehensibility of the Complex

(Some speculative rambling about stuff I’ve been thinking about recently.)

There’s a fallacy that I have committed hundreds of times, and that I have only really recently internalized as a fallacy. Perhaps it is not a fallacy, but a confused pattern of thought. In any case, I’ll call it “the incomprehensibility of the complex.”

Here’s the context in which I would make the mistake:

Somebody brings up some political or economic question, say “Should we have left Iraq?” or “Should we raise the minimum wage?”

This sparks a fierce debate. Somebody says that removing the troops left the region defenseless against takeover by extremist groups, or that extra wages given to workers go back into the economy and stimulate it. Another objects that our troops were ultimately the source of the instability, or cites the broken-window fallacy.

And I would think: “The world is crazily complicated. Physicists can barely understand complex atoms. Now scale that complexity up to interactions between hundreds of millions of humans, each one a system of a hundred trillion trillion atoms. This should put into perspective the proper degree of epistemic humility we should hold when discussing the minimum wage.”

Basically: If we can’t understand atoms, then we sure as hell can’t understand economic systems or international relations.

Observing that this is a bad argument is not too profound or interesting.

What’s interesting to me is the fact that this is a bad argument. That is, the fact that we can scale up the complexity of the system we are studying by a factor of 10^30, squint our eyes, and then get to work at creating fantastically simple and accurate models of the system. This is absolutely insane, and tells us something about the type of universe that we live in.


Recently I watched a lecture on Marginal Revolution University about gun buyback programs and slave redemption policies. The gist of it is this:

Starting in 1993, some humanitarian groups got it into their heads that they could save Sudanese slaves by buying them from their owners and then freeing them. This maybe sounds like a good idea, until you learn about supply and demand curves.

In truth, what the slave redeemers ended up doing was increasing demand for slaves, resulting in new slaves being captured and tens of thousands of dollars ending up in the hands of slave-owners. Fresh revenue funded weapons purchases, further enabling slave traders to raid villages and capture new slaves.

(By the way, some charity groups still do this)

A similar thing can happen with gun buyback programs. These programs buy guns in large quantities from gun owners in order to melt them down, the thought being that this will get guns off the street. The effect of this?

Well, the gun producers thank their new customers for the money and start manufacturing more guns to supply their larger customer base. In some cases violent crime rates jumped, and a study measuring whether these programs decrease violent crime rates overall found no statistically significant effects.

Now, I’m ashamed to say that these programs initially seemed like fine ideas to me. This is really a statement of my failure to have internalized how supply and demand curves work. In my defense, this is not always a totally horrible policy idea. When demand is much more elastic than supply, the price of the good will jump and many of the original buyers will be priced out of the market. In other words, if the producers have a harder time scaling up their operations than the consumers have buying less of the good, then the world will actually end up freer of slaves/guns.

But that is not how these markets actually work. Demand for guns is in fact less elastic than supply of guns, so the gun nuts are barely affected and the ungun-nuts are handing over free money to the gun manufacturers.

[Figure: Gun Buybacks]

And one more example from Marginal Revolution. Sorry, but we’re on the topic of unintuitive basic econ and it’s just too good to leave out.

In 1990 the United States passed a policy that applied a tax on luxury goods like yachts. The idea, it seems, was, “The federal budget deficit is too high, and if we tax the rich on their fancy luxury goods, we can reduce the deficit without really hurting anybody.” Sounds good, yes?

But what actually happened was that as the price of yachts increased, rich people bought less, and thousands of laborers in the yacht industry lost their jobs. When all was said and done, the government ended up paying more in increased unemployment benefits than they gained in tax revenue from the policy! The government quickly wised up and repealed the tax a few years after it was put in place.

How to understand this? Easy! Draw a graph of supply and demand. Which curve is steeper? Well, rich people can fairly easily spend their money differently if yacht prices increase. They care far less about forgoing a yacht than the workers care about the wages they earn building it.

So the yacht-buyers will more easily leave the market than the yacht-producers, which means the demand for yachts is more elastic than the supply, which means that the producers are hurt more by the tax.

[Figure: Luxury Tax]

The point is, the model works! It makes weird-sounding and unintuitive predictions, and it turns out to be right. Literally just draw two lines and assess their relative slopes, and you can understand why a tax will sometimes burn consumers and other times burn producers. (You can also do better than the US government in 1990 apparently, but maybe this shouldn’t be surprising)

A simple model of our economy as a bunch of supply and demand curves with varying elasticities has enormous explanatory power. This is a breathtakingly simple model of a breathtakingly complex system. And it tells us something important about the world that it works at all.
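To make that claim concrete, here is a toy linear supply-and-demand model in Python. The curve parameters and the tax are made-up illustrative numbers, not real yacht-market data; the point is only to show how the burden of a per-unit tax falls on whichever side of the market is less elastic:

```python
# Toy linear market: demand Q = a - b*p_buyer, supply Q = c + d*p_seller.
# With a per-unit tax t, buyers pay p_seller + t. Solving
# a - b*(p + t) = c + d*p gives p_seller = (a - c - b*t) / (b + d).

def equilibrium(a, b, c, d, tax=0.0):
    """Return (buyer price, seller price, quantity) at equilibrium."""
    p_seller = (a - c - b * tax) / (b + d)
    return p_seller + tax, p_seller, c + d * p_seller

# Demand much more elastic than supply (buyers walk away easily,
# producers can't easily retool), as in the yacht story: b = 4 vs d = 1.
no_tax = equilibrium(100, 4.0, 10, 1.0)
taxed = equilibrium(100, 4.0, 10, 1.0, tax=10.0)

buyer_price_rise = taxed[0] - no_tax[0]    # buyers pay 2 more per unit
seller_price_drop = no_tax[1] - taxed[1]   # sellers receive 8 less per unit

print(buyer_price_rise, seller_price_drop)  # → 2.0 8.0
```

Swapping the slopes (b = 1, d = 4) flips the burden onto the buyers, which is the general pattern: the less elastic side of the market bears more of the tax.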

Okay, enough fun with econ. All of this was just to say that I feel thoroughly rebutted in my old view that things like interactions of humans are too complex to be understood by anybody. So we have our mystery: how does simplicity arise out of complexity?

Here’s my attempt at an answer: simplicity arises when the universe is playing an optimization game with a simple target.

If every few seconds God scanned the universe, erased the least macroscopically circular shapes, and duplicated the rest, then you would quickly expect the universe to consist of only circles. More to the point, it would quickly become possible to accurately model the universe as a bunch of circles of various sizes at various locations.

The clearest real world example of something like this is natural selection. Natural selection is a process that is optimizing biological systems for a simple target – reproductive fitness. It kills off variation and only lets those few forms that are able to reproduce successfully survive into the next generation.

In this sense, natural selection prunes down the complexity of the world, replacing the incomprehensible with the comprehensible. What was initially a high-entropy system, describable only at the level of fundamental physics, becomes a low-entropy system, describable by a few simple biological principles. Instead of having to describe the organism in full glorious detail at the level of quarks and electrons, we just need to explain how it won the optimization game of natural selection.

Gravity gives us another example of an optimization game our universe plays. Once you get enough mass in one place, gravity will crush it inward towards the center of mass, gradually inching diverse macroscopic shapes towards sphericity.


Which is why every large object you’ll see in the sky looks perfectly spherical. Any large objects that started off clunky and non-spherical were ruthlessly optimized into sphericity. (Actually they are oblate spheroids, but that’s because technically the optimization game they’re playing is gravity + angular momentum)

So why do supply and demand curves do a great job at predicting interactions between massive numbers of humans? The implied answer is that humans are the result of an optimization game that has made our behaviors simply describable in terms of supply and demand curves.

What exactly does this mean? Perhaps a trait that enhances reproductive fitness in organisms like us is the cognitive skill to make tradeoffs between different desires, and this gives rise to some type of universal comparison metric between very different goods. Now we can sensibly say things like “I want ice cream less than I want to enjoy a beautiful sunset. Except orange custard chocolate chip ice cream. I’d trade off the sunset for orange custard chocolate chip ice cream any day.”

Then somebody comes along with a bright idea called ‘money’, and suddenly we have a great generalization about human behavior: “Everybody wants more money.” From this, some basic notions like a downward-sloping demand curve, an upward-sloping supply curve, and a push towards equilibrium follow quite nicely. And we have a crazily simple high-level explanation of the crazily complex phenomenon of human interaction.

Correlation and causation

Previous: Causal intervention

I’m feeling a bit uninspired today, so what I am going to do is take the path of least resistance. Instead of giving a thoughtful discussion of the merits and faults of the slogan “Correlation does not imply causation”, I’ll just disprove it with a counterexample.

We have some condition C. This condition affects some members of our population. We want to know if gender (A) and race (B) play a causal role in the incidence of this condition.

Some starting causal assumptions: Gender does not cause race. Race does not cause gender. And the condition does not cause either gender or race.

First we go search for numbers to determine possible correlations between gender and the condition, or race and the condition. Here’s what we find:

P(A & B & C) = 2%
P(A & B & ~C) = 3%
P(A & ~B & C) = 18%
P(A & ~B & ~C) = 27%
P(~A & B & C) = 0.5%
P(~A & B & ~C) = 4.5%
P(~A & ~B & C) = 4.5%
P(~A & ~B & ~C) = 40.5%

Alright, now what are the possible causal structures of race, gender, and condition consistent with our starting assumptions? There are four: neither A nor B causes C, only B causes C, only A causes C, and both cause C.

[Figure: the four causal models of A, B, and C]

Each of these causal models makes precise, empirical predictions about what sort of correlations we should expect to find. The first model tells us not to expect any correlations whatsoever – each of the variables should vary independently in the population. The second says that A and C will be independent, and B and C will not be. Etc.

We can test all of these straightforwardly: Is it true that P(A & C) = P(A) * P(C)? And is it true that P(B & C) = P(B) * P(C)? We calculate:

P(A & C) = 2% + 18% = 20%
P(B & C) = 2% + .5% = 2.5%

P(A) = 2% + 3% + 18% + 27% = 50%
P(B) = 2% + 3% + .5% + 4.5% = 10%
P(C) = 2% + 18% + .5% + 4.5% = 25%

P(A) * P(C) is 12.5%, and P(B) * P(C) is 2.5%.

So… our third model is correct! We have determined causation from correlation! So much for the famous slogan.
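These sums are easy to check mechanically. Here is a short Python sketch (the dictionary encoding is mine) that recomputes the marginals from the joint distribution above and runs both independence tests:

```python
# Recompute the independence tests from the joint distribution P(A, B, C)
# given above, keyed by the truth values (a, b, c).
joint = {
    (True,  True,  True):  0.020,
    (True,  True,  False): 0.030,
    (True,  False, True):  0.180,
    (True,  False, False): 0.270,
    (False, True,  True):  0.005,
    (False, True,  False): 0.045,
    (False, False, True):  0.045,
    (False, False, False): 0.405,
}

NAMES = ("A", "B", "C")

def marginal(**fixed):
    """Sum the joint distribution over all outcomes matching `fixed`."""
    return sum(p for outcome, p in joint.items()
               if all(outcome[NAMES.index(k)] == v for k, v in fixed.items()))

p_a, p_b, p_c = marginal(A=True), marginal(B=True), marginal(C=True)
p_ac = marginal(A=True, C=True)
p_bc = marginal(B=True, C=True)

# A and C are dependent: P(A & C) = 0.20 but P(A) * P(C) = 0.125.
print(p_ac, p_a * p_c)
# B and C are independent: P(B & C) = 0.025 = P(B) * P(C).
print(p_bc, p_b * p_c)
```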


The studious reader will object that the only way we determined causation from correlation in this case is that we started with causal assumptions. This is correct, at least in part. If we had started with no causal assumptions, we still would have found that race and gender are independent. But we would not have been able to determine the direction of our causal arrows.

Here’s a general principle: Purely observational data (read: correlations) cannot on its own tell you the direction of causation. Even this is not fully correct: there are special situations, called natural experiments, in which purely observational data can tell you the direction of causation. We’ll save this discussion for later.

Another studious reader will object: But this is a threadbare notion of causation! On this view, causation is really just statistical dependence!

They are wrong. A causal diagram tells you two things. First, it tells you what correlations you should expect to observe in observational data. But second, it tells you what to expect when you intervene and perform experiments on your variables. This second feature packs in the rest of the intuitive substance of causality.

One final skeptic will point out: Even if we accept your causal assumptions, we cannot truly say that we have ruled out all other causal models. For instance, what if gender does not actually cause the condition, but both gender and the condition are the result of some hidden common cause? This new causal diagram is not ruled out by the data, as one still expects to see a correlation between gender and condition.

They are correct. I am being a little sly in ignoring these subtleties, but only because they are beside the main point. Which is that causal diagrams are empirically falsifiable, even from purely correlational data. The sense in which the slogan “Correlation does not imply causation” is correct is that not literally every possible causal model can be eliminated just by observations of correlation. Some causal diagrams truly are empirically indistinguishable. But this doesn’t make causality any more mysterious or un-probeable with the scientific method. We can simply run experiments to deal with the remaining possibilities.

Here are three general ways that you can falsify causal diagrams:

  1. Through observations of correlation or lack of correlation between variables.
  2. Through relevant background information (like temporal order or impossibility of physical interaction between variables)
  3. Through experimental interventions, in which you fix some variables and observe what happens to the others.

Next we’ll discuss some of the useful conceptual tools that arise from this notion of causality.


Next: Screening off and explaining away

Causal Intervention

Previous post: Causal arrows

Let’s quickly review the last post. A causal diagram for two variables A and B tells us how to factor the joint probability distribution P(A & B). The rule we use is that for each variable, we calculate its probability conditional upon all of its parent nodes. This can easily be generalized to any number of variables.

Quick exercises: See if you understand why the following are true.

1. If the causal relationships between three variables A, B, and C are:

A → B → C

Then P(A & B & C) = P(A) · P(B | A) · P(C | B).

2. If the causal relationships are:

A ← B → C

Then P(A & B & ~C) = P(A | B) · P(B) · P(~C | B).

3. If the causal relationships are:

A → B ← C

Then P(~A & ~B & C) = P(~A) · P(~B | ~A & C) · P(C)

Got it? Then you’re ready to move on!
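As a quick sanity check of exercise 1, here is a Python sketch of the factorization rule for the chain A → B → C. The conditional probabilities are made up for illustration; they are not from this post:

```python
# Chain A -> B -> C: P(A & B & C) = P(A) * P(B | A) * P(C | B).
P_A = 0.5                              # P(A = True)
P_B_GIVEN_A = {True: 0.9, False: 0.2}  # P(B = True | A)
P_C_GIVEN_B = {True: 0.8, False: 0.1}  # P(C = True | B)

def joint(a, b, c):
    """Joint probability of the outcome (a, b, c) under the chain model."""
    pa = P_A if a else 1 - P_A
    pb = P_B_GIVEN_A[a] if b else 1 - P_B_GIVEN_A[a]
    pc = P_C_GIVEN_B[b] if c else 1 - P_C_GIVEN_B[b]
    return pa * pb * pc

# The eight outcome probabilities must sum to 1 for this to be a
# legitimate distribution -- and they do.
total = sum(joint(a, b, c)
            for a in (True, False)
            for b in (True, False)
            for c in (True, False))
print(total)
```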


Two people are debating a causal question. One of them says that the rain causes the sidewalk to get wet. The other one says that the sidewalk being wet causes the rain. We can express their debate as:

A → B  or  A ← B

We’ve already seen that the probability distributions that correspond to these causal models are empirically indistinguishable. So how do we tell who’s right?

Easy! We go outside with a bucket of water and splash it on the sidewalk. Then we check and see if it’s raining. Another day, we apply a high-powered blow-drier to the sidewalk and check if it’s raining.

We repeat this a bunch of times at random intervals, and see if we find that splashing the sidewalk makes it any more likely to rain than blow-drying the sidewalk. If so, then we know that sidewalk-wetness causes rain, not the other way around.

This is the process of intervention. When we intervene on a variable, we set it to some desired value and see what happens. Let’s express this with our diagrams.

When we splash the sidewalk with water, what we are in essence doing is setting the variable B (“The sidewalk is wet”) to true. And when we blow-dry the sidewalk, we are setting the variable B to false. Since we are now the complete determinant of the value of B, all causal arrows pointing towards B must be erased. So:

A → B becomes A  B (the arrow into B is erased)

A ← B stays A ← B (the arrow out of B is untouched)

And now our intervened-upon distributions are empirically distinguishable!

The person who thinks that sidewalk-wetness causes rain expects to find a probabilistic dependence between A and B when we intervene. In particular, they expect that it will be more likely to rain when you splash the sidewalk than when you blow-dry it.

And the person who thinks that rain causes sidewalk-wetness expects to find no probabilistic dependence between A and B. They’ll expect that it is equally likely to be raining if you’re splashing the sidewalk as if you’re blow-drying it.


This is how to determine the direction of causal arrows using causal models. The key insight here is that a causal model tells you what happens when you perform interventions.

The rule is: Causal intervention on a variable X is represented by erasing all incoming arrows to X and setting its value to its intervened value.

I’ll introduce one last concept here before we move on to the next post: the causal conditional probability.

In our previous example, we talked about the probability that it rains, given that you splash the sidewalk. This is clearly different from the probability that it rains, given that the sidewalk is wet. So we give it a new name.

Normal conditional probability = P(A | B) = probability that it rains given that the sidewalk is wet

Causal conditional probability = P(A | do B) = probability that it rains given that you splash the sidewalk.

The causal conditional probability of A given B is just the probability of A given that you intervene on B and set it to “True”. And P(A | do ~B) is the probability of A given that you intervene on B and set it to “False”.
If we find that P(A | do B) = P(A | do ~B), then we have ruled out A < B.
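Here is a minimal Python sketch contrasting the two kinds of conditional probability for the model A > B ("rain causes sidewalk-wetness"). The numbers P(A) = 50%, P(B | A) = 98%, P(B | ~A) = 10% are the ones used in the Causal Arrows post; the variable names are mine.

```python
# Contrast P(A | B) with P(A | do B) under the model A > B.
p_a = 0.5
p_b_given_a = 0.98
p_b_given_not_a = 0.10

# Observational: P(A | B) = P(A & B) / P(B)
p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a   # 0.54
p_a_given_b = (p_a * p_b_given_a) / p_b                 # ~0.907

# Interventional: do(B) erases the arrow into B, so A keeps its
# unconditional probability: P(A | do B) = P(A | do ~B) = P(A).
p_a_do_b = p_a
p_a_do_not_b = p_a

print(p_a_given_b)   # ~0.907: a wet sidewalk is strong evidence of rain
print(p_a_do_b)      # splashing the sidewalk doesn't make it rain
```

Seeing a wet sidewalk raises the probability of rain to about 91%, but *setting* the sidewalk's wetness leaves it at 50% – exactly the pattern that picks out A > B over A < B.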


Previous: Causal arrows

Next: Correlation and causation

Causal Arrows

Previous post: Preliminaries

Let’s start discussing causality. The first thing I want to get across is that causal models tell us how to factor joint probability distributions.

Let’s say that we want to express a causal relationship between some variable A and another variable B. We’ll draw it this way:

A > B

Let’s say that A = “It is raining”, and B = “The sidewalk is wet.”

Let’s assign probabilities to the various possibilities.

P(A & B) = 49%
P(A & ~B) = 1%
P(~A & B) = 5%
P(~A & ~B) = 45%

This is the joint probability distribution for our variables A and B. It tells us that it rains about half the time, that the sidewalk is almost always wet when it rains, and the sidewalk is rarely wet when it doesn’t rain.

Factorizations of a joint probability distribution express the joint probabilities in terms of a product of probabilities for each variable. Any given probability distribution may have multiple equivalent factorizations. So, for instance, we can factor our distribution like this:

Factorization 1:
P(A) = 50%
P(B | A) = 98%
P(B | ~A) = 10%

And we can also factor our distribution like this:

Factorization 2:
P(B) = 54%
P(A | B) = 90.741%
P(A | ~B) = 2.174%

You can check for yourself that these factorizations are equivalent to our starting joint probability distribution by using the relationship between joint probabilities and conditional probabilities. For example, using Factorization 1:

P(A & ~B)
= P(A) · P(~B | A)
= 50% · 2%
= 1%

Just as expected! If any of this is confusing to you, go back to my last post.
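If you'd rather let a machine do the checking, here is a quick Python sketch (names are mine) that rebuilds all four joint probabilities from each factorization:

```python
# The joint distribution from the post.
joint = {("A", "B"): 0.49, ("A", "~B"): 0.01,
         ("~A", "B"): 0.05, ("~A", "~B"): 0.45}

# Factorization 1: P(A) = 50%, P(B | A) = 98%, P(B | ~A) = 10%
p_a, p_b_a, p_b_na = 0.50, 0.98, 0.10
f1 = {("A", "B"): p_a * p_b_a,
      ("A", "~B"): p_a * (1 - p_b_a),
      ("~A", "B"): (1 - p_a) * p_b_na,
      ("~A", "~B"): (1 - p_a) * (1 - p_b_na)}

# Factorization 2: P(B) = 54%, P(A | B) = 49/54, P(A | ~B) = 1/46
p_b, p_a_b, p_a_nb = 0.54, 0.49 / 0.54, 0.01 / 0.46
f2 = {("A", "B"): p_b * p_a_b,
      ("A", "~B"): (1 - p_b) * p_a_nb,
      ("~A", "B"): p_b * (1 - p_a_b),
      ("~A", "~B"): (1 - p_b) * (1 - p_a_nb)}

for world in joint:
    assert abs(f1[world] - joint[world]) < 1e-9
    assert abs(f2[world] - joint[world]) < 1e-9
print("both factorizations match the joint distribution")
```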


Let’s rewind. What does any of this have to do with causality? Well, the diagram we drew above, in which rain causes sidewalk-wetness, instructs us as to how we should factor our joint probability distribution.

Here are the rules:

  1. If node X has no incoming arrows, you express its probability as P(X).
  2. If a node does have incoming arrows, you express its probability as conditional upon the values of its parent nodes – those from which the arrows originate.

Let’s look back at our diagram for rain and sidewalk-wetness.

A > B

Which representation do we use?

A has no incoming arrows, so we express its probability unconditionally: P(A).

B has one incoming arrow from A, so we express its probability as conditional upon the possible values of A. That is, we use P(B | A) and P(B | ~A).

Which means that we use Factorization 1!

Say that instead somebody tells you that they think the causal relationship between rain and sidewalk-wetness goes the other way. I.e., they believe that the correct diagram is:

A < B

Which factorization would they use?


So causal diagrams tell us how to factor a probability distribution over multiple variables. But why does this matter? After all, two different factorizations of a single probability distribution are empirically equivalent. Doesn’t this mean that “A causes B” and “B causes A” are empirically indistinguishable?

Two responses: First, this is only one component of causal models. Other uses of causal models that we will see in the next post will allow us to empirically determine the direction of causation.

And second: in fact, some causal diagrams can be empirically distinguished.

Say that somebody proclaims that there are no causal links between rain and sidewalk-wetness. We represent this as follows:

A   B
What does this tell us about how to express our probability distribution?

Well, A has no incoming arrows, so we use P(A). B has no incoming arrows, so we use P(B).

So let’s say we want to know the chance that it’s raining AND the sidewalk is wet. According to the diagram, we’ll calculate this in the following way:

P(A & B) = P(A) · P(B)

But wait! Let’s look back at our initial distribution:

P(A & B) = 49%
P(A & ~B) = 1%
P(~A & B) = 5%
P(~A & ~B) = 45%

Is it possible to get all of these values from just our two values P(A) and P(B)? No! (proof below)

In other words, our data rules out this causal model.
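Here is a numeric sketch of that check in Python (not the formal proof itself): the no-link diagram forces P(A & B) = P(A) · P(B), and our data won't allow it.

```python
# The four joint probabilities from the data.
p_ab, p_a_nb, p_na_b, p_na_nb = 0.49, 0.01, 0.05, 0.45

# Marginals, via P(A) = P(A & B) + P(A & ~B) and likewise for B.
p_a = p_ab + p_a_nb    # 0.50
p_b = p_ab + p_na_b    # 0.54

# The no-link model predicts P(A & B) = 0.5 * 0.54 = 0.27,
# but the data says P(A & B) = 0.49.
print(p_a * p_b)
assert abs(p_a * p_b - p_ab) > 0.2   # the no-link model is ruled out
```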

A   B   (crossed out)


To summarize: a causal diagram over two variables A and B tells you how to calculate things like P(A & B). It says that you break it into the probabilities of the individual propositions, and that the probability for each variable should be conditional on the possible values of its parents.

Next we’ll look at how we can empirically distinguish between A > B and A < B.

Proof of dependence

Previous post: Preliminaries

Next post: Causal intervention

Causality: Preliminaries


One revolution in my thinking was Bayesianism – applying probability theory to beliefs. This has been thoroughly covered in self-contained series at all levels of accessibility elsewhere.

A more recent revolution in my thinking is causal modeling – using graphical networks to model causal relationships. There appears to be a lack of good online explanations of these tools for reasoning, so it seems worthwhile to create one.

My goal here is not to make you an expert in all things causal, but to pass on the key insights that have modified my thinking. Let’s get started!


Much of the framework of causal modeling relies on an understanding of probability theory. So in this first post, I’ll establish the basics that will be used in later posts. If you know how to factor a joint probability distribution, then you can safely skip this.

We’ll label propositions like “The movie has started” with the letters A, B, C, etc. Probability theory is about assigning probabilities to these propositions. A probability is a value between 0 and 1, where 0 is complete confidence that the statement is false and 1 is complete confidence that it is true.

Some notation:

The probability of A = P(A)
The negation of A = ~A
The joint probability that both A and B are true = P(A & B)
The conditional probability of A, given that B is true = P(A | B)

There are just five important things you need to know in order to understand the following posts:

  1. P(A & B) = P(A | B) · P(B)
  2. P(A) + P(~A) = 1
  3. A and B are independent if and only if P(A | B) = P(A). Otherwise, A and B are called dependent.
  4. A joint probability distribution over statements is an assignment of probabilities to all possible truth-values of those statements.
  5. A factorization of a joint probability distribution is a way to break down the joint probabilities into products of probabilities of individual statements.

#1 should make some sense. To see how likely it is that A and B are both true, you can first calculate how likely it is that A is true given that B is true, then multiply by the chance that B is true. You can think of this as breaking a question about the probability of both A and B into two questions:

1. In a world in which B is true, how likely is it that A is true?
and 2. How likely is it that we are in that world where B is true?

#2 is just the idea that a proposition must be either true or false, and not both. This is the type of thing that sounds trivial, but ends up being extremely important for manipulating probabilities. For instance, it is also true that a proposition must be true or false and not both, given some other proposition. This means that the conditional probabilities P(B | A) and P(~B | A) must sum to 1 as well. From this we find that P(A) = P(A & B) + P(A & ~B). We’ll use this last identity often.
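A two-line numeric check of that last identity, as a Python sketch. The values of P(A) and P(B | A) below are made up; any numbers in [0, 1] work.

```python
# Check P(A) = P(A & B) + P(A & ~B) with arbitrary made-up values.
p_a = 0.7
p_b_given_a = 0.3

p_a_and_b = p_a * p_b_given_a            # rule #1
p_a_and_not_b = p_a * (1 - p_b_given_a)  # rule #1, with P(~B | A) from rule #2

print(p_a_and_b + p_a_and_not_b)         # recovers P(A)
```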

#3 is a definition of the terms dependence and independence. If two statements are independent, then the truth of one makes no difference to the probability of the other.  It also follows from #1 that if A and B are independent, then P(A & B) = P(A) · P(B). A lot of analysis of causality will be done by looking at probabilistic dependencies, so make sure that this makes sense.

I’ll explain #4 with a simple example. The possible truth-values of two variables A and B are the following:

Both are true: A & B
A is true, and B is false: A & ~B
A is false and B is true: ~A & B
Both are false: ~A & ~B

To specify the joint distribution, we assign probabilities to each of these. For instance:

P(A & B) = .25
P(A & ~B) = .25
P(~A & B) = .30
P(~A & ~B) = .20

In this case, the joint distribution is a set of four different joint probabilities.

And finally, #5 is a definition of factorization. We turn joint distributions into products of individual probabilities by using #1. For instance, one factorization of the joint distribution over A and B uses:

P(A & B) = P(A) · P(B | A)
P(A & ~B) = P(A) · P(~B | A)
P(~A & B) = P(~A) · P(B | ~A)
P(~A & ~B) = P(~A) · P(~B | ~A)

We can see that in order to express all four joint probabilities, we need to know the values of six probabilities. But as a result of #2, we only need to know three of them to find all six. If we specify P(A), P(B | A), and P(B | ~A), then we know the values of P(~A), P(~B | A) and P(~B | ~A). These three probabilities are the factors in our factorization.

P(A)
P(B | A)
P(B | ~A)

One last thing to notice is that our joint distribution of A and B could have been factored in another way. This comes from the fact that we could use #1 to break down P(A & B), or equivalently to break down P(B & A). If we had done the second, then our factors would be P(B), P(A | B), and P(A | ~B).
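For concreteness, here is the other-way factorization computed in Python from the example distribution above (a sketch; names are mine):

```python
# The example joint distribution from earlier in this post.
joint = {("A", "B"): 0.25, ("A", "~B"): 0.25,
         ("~A", "B"): 0.30, ("~A", "~B"): 0.20}

# Factors the other way around: P(B), P(A | B), P(A | ~B).
p_b = joint[("A", "B")] + joint[("~A", "B")]       # 0.55
p_a_given_b = joint[("A", "B")] / p_b              # ~0.4545
p_a_given_not_b = joint[("A", "~B")] / (1 - p_b)   # ~0.5556

# Rebuilding one joint probability from the factors:
print(p_b * p_a_given_b)   # matches P(A & B) = 0.25, up to float rounding
```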

And that’s everything!



We’ll apply all this by looking at one factorization of a joint probability distribution over three statements. With three statements, there are eight possible worlds:

A & B & C        A & B & ~C
A & ~B & C       A & ~B & ~C
~A & B & C       ~A & B & ~C
~A & ~B & C       ~A & ~B & ~C

The joint distribution over A, B and C is an assignment of probabilities to each of these worlds.

P(A & B & C)        P(A & B & ~C)
P(A & ~B & C)       P(A & ~B & ~C)
P(~A & B & C)       P(~A & B & ~C)
P(~A & ~B & C)       P(~A & ~B & ~C)

To factor our joint distribution, we just use Idea #1 twice, treating “B & C” as a single statement the first time:

P(A & B & C)
= P(A | B&C) · P(B&C)
= P(A | B&C) · P(B | C) · P(C)

This tells us that the factors we need to specify are:

P(C),
P(B | C), P(B | ~C),
P(A | B & C), P(A | B & ~C), P(A | ~B & C), and P(A | ~B & ~C)
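Here is a Python sketch that builds all eight joint probabilities from factors of this shape, as P(x | y & z) · P(y | z) · P(z). The factor values below are made up, since the post doesn't specify any:

```python
# Made-up factor values for illustration.
p_c = 0.4
p_b_given = {True: 0.7, False: 0.2}                    # P(B | C), P(B | ~C)
p_a_given = {(True, True): 0.9, (True, False): 0.6,    # P(A | B & C), P(A | B & ~C)
             (False, True): 0.3, (False, False): 0.1}  # P(A | ~B & C), P(A | ~B & ~C)

joint = {}
for a in (True, False):
    for b in (True, False):
        for c in (True, False):
            pc = p_c if c else 1 - p_c
            pb = p_b_given[c] if b else 1 - p_b_given[c]
            pa = p_a_given[(b, c)] if a else 1 - p_a_given[(b, c)]
            joint[(a, b, c)] = pa * pb * pc

# A legitimate joint distribution must sum to 1 over the eight worlds.
print(round(sum(joint.values()), 10))   # 1.0
```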


One last application, this time with actual numbers. Let’s revisit our earlier distribution:

P(A & B) = .25
P(A & ~B) = .25
P(~A & B) = .3
P(~A & ~B) = .2

To factor this distribution, we must find P(A), P(B | A), and P(B | ~A).

We’ll start by finding P(A) using #2.

Since P(B) + P(~B) = 1, P(A & B) + P(A & ~B) = P(A).

This means that P(A) = .5

We can now use #1 to find our remaining two numbers.

Plugging in values to P(A & B) = P(A) · P(B | A) and P(~A & B) = P(~A) · P(B | ~A), we have:

.25 = .5 · P(B | A)
.3 = .5 · P(B | ~A)

Therefore, P(B | A) = .5 and P(B | ~A) = .6
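The same worked example as a Python sketch (names are mine), recovering the three factors from the joint distribution:

```python
# The joint distribution from the worked example.
joint = {("A", "B"): 0.25, ("A", "~B"): 0.25,
         ("~A", "B"): 0.30, ("~A", "~B"): 0.20}

# Idea #2: P(A) = P(A & B) + P(A & ~B)
p_a = joint[("A", "B")] + joint[("A", "~B")]      # 0.5

# Idea #1: P(B | A) = P(A & B) / P(A), and likewise given ~A
p_b_given_a = joint[("A", "B")] / p_a             # 0.5
p_b_given_not_a = joint[("~A", "B")] / (1 - p_a)  # 0.6

print(p_a, p_b_given_a, p_b_given_not_a)
```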

Next: Causal arrows

Low-hanging policy fruit

(Note: none of this is original, just a repackaging of others’ ideas)

A question of great practical importance is “What can I do to improve the world?”

In this post I want to talk about a different but related question: How confident should I be that my ideas for how to improve the world are actually good ideas?

The cynic says something like: “Don’t be naive. The real world is complicated, and there’s almost surely some complex reason that you don’t understand that would make your idea fail spectacularly upon attempted implementation. Besides, there are millions of people out there that are smarter and more knowledgeable than you, and some of them have most likely already thought of your idea. Maybe, if you’re really lucky or really really bright, you might have one or two truly original and not terrible ideas in your life, but I wouldn’t bet on it.”

I don’t want to straw-man this perspective, because I think it is right in some really important ways. There is a sense in which perceived low-hanging policy fruit is similar to perceived $100 bills lying in the middle of the sidewalk – if they weren’t some kind of trick, you’d better believe that somebody else would have picked them up by now.

And yet…


Overfishing removes tens of billions of dollars from global GDP every year. It permanently destroys fish stocks, the livelihoods of fishermen, and seaside communities. And, well… we’ve known how to solve this problem for decades. It’s a classic tragedy of the commons. The standard solutions that you’ll find in an introductory economics textbook are: privatize the common resource, regulate the market through legally enforceable agreements, or tax/subsidize the market to incentivize sustainable fishing.

These are not just good in theory – they actually work. Catch share programs like individual transferable quotas (ITQs) are clever combinations of privatization and regulation – there is a legally enforced fishing quota, and individual fishermen own percentages of this quota. These policies have been tried in about a hundred fisheries, and when they are tried they not only stop the trend of overfishing but even reverse it.

Regulation through marine protection programs that temporarily halt activity in heavily fished areas to let them recover could save up to $920 billion of otherwise lost value by 2050. And in 2010, researchers studying subsidies to fisheries found that “the single action of eliminating fuel subsidies could potentially be the most influential factor in stemming the trend of overfishing”.

All of these solutions are perfectly obvious and commonsensical. Want to end overfishing? Stop subsidizing the overfishers and tax them instead, enforce sustainable fishing practices, and protect overfished areas. So if you thought that by applying some basic economics and common sense, you could do better than most of the world’s governments and fisheries over the last century, you would be completely correct.

But then we come back to our $100 bill thought experiment and the cynical argument. Surely the world consists of people that are plenty incentivized to save marine ecosystems, bring in billions of dollars to the country’s economy, and save the livelihoods of fishermen. And surely some of those people know about the policies I’ve just described and have the power to implement them. But overfishing continues as ever, destroying fish populations and draining money from the economy. So what gives?


It’s not that the cynic is wrong, it’s just that there is more to be said.

I want to develop the $100 bill analogy some more. When in fact should we expect that you could successfully discover a real $100 bill lying on the floor? Here are some questions whose answers would be important to know:

  1. Are there other people that can see the $100 bill?
  2. Do they realize what it is (that is, money)?
  3. Do they want money?
  4. Could they take the $100 bill?

If the answer is “yes” to all four questions, then the bill is probably a realistic sidewalk painting, or a hallucination, or a prank bill on a fishing line held by some impish teenagers in the nearby bushes. If others can see the $100 bill, know what it is, want it, and are capable of taking it, then it’s probably going to be picked up very quickly.

On the other hand, if the answer is “no” to any of the questions, then the bill is likely real and soon to be yours. All you need is one break in the chain of conditions for the conclusion to not obtain. So, for instance, if everybody is blind, it’s not too surprising that the bill is lying there. Similarly for a society in which nobody has ever seen paper money, or the people are all ascetics, or they are all incapable of bending over to pick it up.


Let’s bring this back to our starting question. Say that you’ve thought of an apparently brilliant policy P that solves an important issue I. When should we expect that P is actually a solution to I? Here are the analogous four questions you should ask yourself:

  1. Could other people have thought of P?
  2. Would they be able to tell if it were a solution to I?
  3. Do they want to solve I?
  4. Could they implement P?

We can call this our taxonomy of inadequacy, if we feel fancy. If all of these questions are answered in the affirmative, then we should expect that the policy would have already been implemented if it were actually a solution to I.

At the risk of being redundant, here’s an image:

(Image: a Venn diagram of four circles – “could have thought of P”, “could tell it solves I”, “wants to solve I”, and “could implement P”.)

The intersection of these four circles is the set of people that you’d expect to have implemented P, if it were actually a good solution to I. The larger this set is, the more suspicious you should be of your idea.


So let’s apply this!

Why is overfishing not solved? Probably because of #4.

People in positions of influence know how to stop overfishing and some would even like to do so. But they have to worry about the influence of the fishing lobby, as well as their approval ratings among the coastal communities that would be temporarily disadvantaged by policies like fishing quotas, marine protected areas, and higher taxes. Sure it’d be better for everybody in the long run, but voters have a hard time accepting short-term losses for long-term gains. So although we can all see ridiculously low-hanging policy fruit, and enough of us care about solving the problem, nobody in power is actually able to implement these policies due to the nature of the system they exist in.

Another example! Everybody agrees that first-past-the-post (FPTP) voting is about the worst voting system out there. It encourages gerrymandering, dooms third parties, and forces smart voters to vote against their preferences. And we know of more sane voting systems! So why are we still stuck with our horrible system?

Well, those that are currently in power are exactly those that have benefited from FPTP. And if a third party came into being that wanted to change the voting system… well FPTP dooms third parties. So we’re stuck. In terms of our taxonomy of inadequacy, this is a combination of #3 and #4 – those that are in power don’t want to change the voting system, and those that want to change the voting system are unable to get in power.


What I like about this way of thinking is that you can start with “Hey, this sounds like a good way to solve problem X!” and end up understanding the deep structure of our society, seeing the way that this gigantic beast we call civilization functions and the inadequacies that result.

There’s a lot more to be said about this, but I will leave it for future posts.