Societal Failure Modes

December 29, 2017 (updated March 15, 2018) ~ squarishbracket

(Nothing original, besides potentially this specific way of framing the concepts. This post started off short and ended up wayyy too long, and I don’t have the proper level of executive control to make myself shorten it significantly. So sorry, you’re stuck with this!)

Noam Chomsky in a recent interview said about the Republican Party:

I mean, has there ever been an organization in human history that is dedicated, with such commitment, to the destruction of organized human life on Earth? Not that I’m aware of. Is the Republican organization – I hesitate to call it a party – committed to that? Overwhelmingly. There isn’t even any question about it.

And later in the same interview:

… extermination of the species is very much an – very much an open question. I don’t want to say it’s solely the impact of the Republican Party – obviously, that’s false – but they certainly are in the lead in openly advocating and working for destruction of the human species.

In Chomsky’s mind, members of the Republican Party apparently sit in dark rooms scheming about how best to destroy all that is good and sacred.

I just watched the most recent Star Wars movie, and was struck by a sense of some relationship between the sentiment being expressed by Chomsky here and a statement made by Supreme Leader Snoke:

The seed of the Jedi Order lives. As long as he does, hope lives within the galaxy. I thought you would be the one to snuff it out.

There’s a really easy pattern of thought to fall into, which is something like “When things go wrong, it’s because of evil people doing evil things.”

It’s a really tempting idea. It diagnoses our societal problems as a simple “good guys vs bad guys” story – easy to understand and to convince others of. And it comes with an automatic solution, one that is very intuitive, simple, and highly self-gratifying: “Get rid of the bad guys, and just let us good guys make all the decisions!”

I think that the prevalence of this sort of story in the entertainment industry gives us some sort of evidence of its memetic power as a go-to explanation for problems. Think about how intensely the movie industry is optimizing for densely packed megadoses of gratifying storylines, visual feasts, appealing characters, and all the rest. The degree to which two and a half hours can be packed with constant intense emotional stimulation is fairly astounding.

Given this competitive market for appealing stories, it makes sense that we’d expect to gain some level of insight into the types of memes that we are most vulnerable to by looking at those types of stories and plot devices that appear over and over again. And this meme in particular, the theme of “social problems are caused by evil people,” is astonishingly universal across entertainment.

***

That this meme is wrong is the first of two big insights that I’ve been internalizing more and more in the past year. These are:

1. When stuff goes wrong, or the world seems like it’s stuck in shitty and totally repairable ways, the only explanation is not evil people. In fact, this is often the least helpful explanation.
2. Talking about the “motives” of an institution can be extremely useful. These motives can overpower the motives of the individuals that make up that institution, making them more or less irrelevant. In this way, we can end up with a description of institutions with weird desires and inclinations that are totally distinct from those of the people that make them up, and yet the institutions are in charge of what actually happens in the world.

On the second insight first: this is a sense in which institutions can be very very powerful. It’s not just the sense of powerful that means “able to implement lots of large-scale policies and cause lots of big changes”. It’s more like “able to override the desires of individuals within your range of influence, manipulating and bending them to your will.”

I was talking to my sister and her fiancé, both law students, about the US judicial system, and Supreme Court justices in particular. I wanted to understand what it is that really constrains the decisions of these highest judicial authorities: what forces result in Justice Ginsburg writing the particular decision that she ends up writing?

What they ended up concluding is that there are essentially no such external forces.

Sure, there are ways in which Supreme Court justices can lose their jobs in principle, but this has never actually happened. And Congress can and sometimes does override Supreme Court decisions on statutory issues by passing new legislation, but this doesn’t generally give the Justices any reason to write their decisions differently.

What guides Justice Ginsburg is what she believes is right – her ideology – and perhaps her legacy. In other words, purely internal forces. I wanted to think of other people in positions that allow similar degrees of power in ability to enact social change, and failed.

The first sense of power as “able to cause lots of things to happen” really doesn’t align with the second sense of “free from external constraints on your decision-making”. An autocratic ruler might be plenty powerful in terms of ability to decide economic policy or assassinate journalists or wage war on neighboring states, but is highly constrained in his decisions by a tight incentive structure around what allows him to keep doing these things.

On the other hand, a Supreme Court justice could have total power to do whatever she personally desires, but never do anything remarkable or make any significant long-term impact on society.

The fact that this is so rare – that we could only think of a single example of a position like this – tells us about the way that powerful institutions are able to warp and override the individual motivations of the humans that compose them.

The rest of this post is on the first insight, about the idea that social problems are often not caused by evil people. There are two general things to say about evil people:

1. I think that it’s often the case that “evil people” is a very surface-level explanation, able to capture some aspects of reality and roughly get at the problem, but not touching anywhere near the roots of the issue. One example of this may be when you ask people what the cause of the 2007 financial crisis was, and they go on about greedy bankers destroying America with their insatiable thirst for wealth.
   While they might be landing on some semblance of truth there, they are really missing a lot of important subtlety in terms of the incentive structures of financial institutions, and how they led the bankers to behave in the way that they did. They are also very naturally led to unproductive “solutions” to the problems – what do we do, ban greed? No more bankers? Chuck capitalism? (Viva la revolución?) If you try to explain things on the deeper level of the incentive structures that led to “greedy banker” behavior, then you stand a chance of actually understanding how to solve the root problem and prevent it from recurring.
2. Appeals to “evil people” can only explain a small proportion of the problems that we actually see in the world. There are a massive number of ways in which groups of human beings, all good people not trying to cause destruction and chaos or extinguish the last lights of hope in the universe, can end up steering themselves into highly suboptimal and unfortunate states.

My main goal in this post is to try to taxonomize these different causes of civilizational failure.

Previously I gave a barebones taxonomy of some of the reasons that low-hanging policy fruits might be left unplucked. Here I want to give a more comprehensive list.

***

I think a useful way to frame these issues is in terms of Nash equilibria. The worst-case scenario is where there are Pareto improvements all around us, and yet none of these improvements correspond to worlds that are in a Nash equilibrium. These are cases where the prospect of improvement seems fairly hopeless without a significant restructuring of our institutions.

Slightly better scenarios are where we have improvements that do correspond to a world in a Nash equilibrium, but we just happen to be stuck in a worse Nash equilibrium. So to start with, we have:

The better world is not in a Nash equilibrium
The better world is in a Nash equilibrium

I think that failures of the first kind are very commonly made amongst bright-eyed idealists trying to imagine setting up their perfect societies.

These types of failures correspond to questions like “okay, so once you’ve set up your perfect world, how will you assure that it stays that way?” and can be spotted in plans that involve steps like “well, I’m just assuming that all the people in my world are kind enough to not follow their incentives down this obvious path to failure.”

Nash equilibria correspond to stable societal setups. Any societal setup that is not in a Nash equilibrium can fairly quickly be expected to degenerate into some actually stable societal setup.

The ways in which a given societal setup fails to be stable can be quite subtle and non-obvious, which I suspect is why this step is so often overlooked by reformers that think they see obvious ways to improve the world.

One of my favorite examples of this is the makeup problem. It starts with the following assumptions: (1) makeup makes people more attractive (which they want to be), and (2) an individual’s attractiveness is valued relative to the individuals around them.

Let’s now consider two societies, a makeup-free society and a makeup-ubiquitous society. In both societies, everybody’s relative attractiveness is the same, which means that nobody is better off or worse off in one society over another on the basis of their attractiveness.

But the society in which everybody wears makeup is worse for everybody, because everybody has to spend a little bit of their money buying makeup. In other words, the makeup-free world represents a Pareto improvement over the makeup-ubiquitous world.

What’s worse, the makeup-free world is not in a Nash equilibrium, and the makeup-ubiquitous society is!

We can see this by imagining a society that starts makeup-free, and looking at the incentives of an individual within that society. This individual only stands to gain by wearing makeup, because she becomes more attractive relative to everybody else. So she buys makeup. Everybody else reasons the same way, so the makeup-free society quickly degenerates into its equilibrium version, the makeup-ubiquitous society.

Sure, she can see that if everybody reasons this way, then she will be worse off (she will have spent her money and gained nothing from it). But this reasoning does not help her. Why? Because regardless of what everybody else does, she is still better off wearing makeup.

If nobody wears makeup, then her relative attractiveness rises if she wears makeup. And if everybody else wears makeup, then her relative attractiveness rises if she wears makeup. It’s just that it’s rising from a lower starting point.

So no matter what society we start in, we end up in the suboptimal makeup-ubiquitous society. (I have to point out here that this is assuming a standard causal decision theory framework, which I think is wrong. Timeless decision theory will object to this line of reasoning, and will be able to maintain a makeup-free equilibrium.)
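To make the argument concrete, here is a minimal Python sketch of the makeup game. The payoff function, the `BOOST` and `COST` numbers, and the four-person society are all made up for illustration; the only things taken from the argument above are that attractiveness is relative and that makeup costs money. The sketch brute-forces every combination of choices and keeps the ones where nobody gains by unilaterally switching.

```python
from itertools import product

BOOST = 2.0  # assumed attractiveness gain from wearing makeup
COST = 1.0   # assumed price of makeup (below BOOST, so wearing it is tempting)


def payoff(i, profile):
    """Relative attractiveness of player i minus what she spent on makeup.

    profile is a tuple of 0/1 choices, where 1 means "wears makeup".
    """
    others = [a for j, a in enumerate(profile) if j != i]
    relative = BOOST * profile[i] - BOOST * sum(others) / len(others)
    return relative - COST * profile[i]


def is_nash(profile):
    """True if no single player gains by unilaterally flipping her choice."""
    for i in range(len(profile)):
        flipped = list(profile)
        flipped[i] = 1 - flipped[i]
        if payoff(i, tuple(flipped)) > payoff(i, profile):
            return False
    return True


def nash_equilibria(n):
    """All pure-strategy Nash equilibria of the n-player makeup game."""
    return [p for p in product((0, 1), repeat=n) if is_nash(p)]


n = 4
makeup_free = (0,) * n
makeup_everywhere = (1,) * n
print("equilibria:", nash_equilibria(n))
print("payoffs, makeup-free:", [payoff(i, makeup_free) for i in range(n)])
print("payoffs, makeup-ubiquitous:", [payoff(i, makeup_everywhere) for i in range(n)])
```

Under these assumed numbers the only profile that survives the check is the one where everybody wears makeup, even though every individual's payoff is lower there than in the makeup-free profile.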

We want to say “but just in this society assume that everybody is a good enough person to recognize the problem with makeup-wearing, and doesn’t do so!”

But that’s missing the entire point of civilization building – dealing with the fact that we will end up leaving non-Nash-equilibrium societal setups and degenerating in unexpected ways.

This failure mode arises because of the nature of positional goods, which are exactly what they sound like. In our example, attractiveness is a positional good, because your attractiveness is determined by looking at your position with respect to all other individuals (and yes this is a bit contrived and no I don’t think that attractiveness is purely positional, though I think that this is in part an actual problem).

To some degree, prices are also a positional good. If all prices fell tomorrow, then everybody would quickly end up with the same purchasing power as they had yesterday. And if everybody got an extra dollar to spend tomorrow, then prices would rise in response, the value of their money would decrease, and nobody would be better off (there are a lot of subtleties that make this not actually totally true, but let’s set that aside for the sake of simplicity).

Positional goods are just one example where we can naturally end up with our desired societies not being Nash equilibria.

The more general situation is just bad incentive structures, whereby individuals are incentivized to defect against a benevolent order, and society tosses and turns and settles at the nearest Nash equilibrium.

The better world is not a Nash equilibrium
- Positional goods
- Bad incentive structures
The better world is a Nash equilibrium

***

If the better world is in a Nash equilibrium, then we can actually imagine this world coming into being and not crumbling into a degenerate cousin-world. If a magical omniscient society-optimizing God stepped in and rearranged things, then they would likely stay that way, and we’d end up with a stable and happier world.

But there are a lot of reasons why all of us that are not magical society-optimizing Gods can do very little to make the changes that we desire. Said differently, there are many ways in which current Nash equilibria can do a great job of keeping us stuck in the existing system.

Three basic types of problems are (1) where the decision makers are not incentivized to implement better policies, (2) where valuable information fails to reach decision makers, and (3) where decision makers do have the right incentives and information, but fail because of coordination problems.

The better world is not a Nash equilibrium
- Positional goods
- Bad incentive structures
The better world is a Nash equilibrium
- You can’t reach it because you’re stuck in a lesser Nash equilibrium.
  - Lack of incentives in decision makers
  - Asymmetric information
  - Coordination problems

Lack of incentives in decision makers can take many forms. The most famous of these occurs when policies result in externalities. This is essentially just where decision-makers do not absorb some of the consequences of a policy.

Negative externalities help to explain why behaviors that are net negative to society exist and continue (resulting in things like climate change and overfishing, for example), and positive externalities help to explain why some behaviors that would be net positive for society are not happening.

An even worse case of misalignment of incentives would be where the positive consequences on society would be negative consequences on decision-makers, or vice-versa. Our first-past-the-post voting system might be an example of this – abandoning FPTP would be great exactly because it allows us to remove the current set of decision-makers and replace them with a better set. This would be great for us, but not so great for them.

I’m not aware of a name for this class of scenarios, and will just call it ‘perverse incentives.’

I think that this is also where the traditional concept of “evil people” would lie – evil people are those whose incentives are dramatically misaligned. This could mean that they are apathetic towards societal improvements, but typically fiction’s common conception of villains is individuals actively trying to harm society.

Lack of liquidity is another potential source of absent incentives. This is where there are plenty of individuals that do have the right incentives, but there is not enough freedom for them to actually make significant changes.

An example of this could be if a bunch of individuals all had the same idea for a fantastic new app that would perform some missing social function, and all know how to make the app, but are barred by burdensome costs of actually entering the market and getting the app out there.

The app will not get developed and society will be worse off, as a result of the difficulty in converting good app ideas to cash.

Lack of incentives in decision makers
- Misalignment of incentives
  - Externalities
  - Perverse incentives
    - Evil people
  - Lack of liquidity

***

Asymmetric information is a well-known phenomenon that can lead societies into ruts. The classic example of this is the lemons problem. There are versions of asymmetric information problems in the insurance market, the housing market, the health care market and the charity market.

This deserves its own category because asymmetric information can bar progress, even when decision-makers have good incentives and important good policy ideas are out there.

Lack of incentives in decision makers
- Misalignment of incentives
  - Externalities
  - Perverse incentives
    - Evil people
  - Lack of liquidity
- Asymmetric information

And of course, there are coordination problems. The makeup example given earlier is an example of a coordination problem – if everybody could successfully coordinate and avoid the temptation of makeup, then they’d all end up better off. But since each individual is incentivized to defect, the coordination attempts will break down.

Coordination problems generally occur when you have multi-step or multi-factor decision processes. I.e. when the decision cannot be unilaterally made by a single individual, and must be done as a cooperative effort between groups of individuals operating under different incentive structures.

A nice clear example of this comes from Eliezer Yudkowsky, who imagines a hypothetical new site called Danslist, designed to be a competitor to Craigslist.

Danslist is better than Craigslist in every way, and everybody would prefer that it was the site in use. The problem is that Craigslist is older, so everybody is already on that site.

Buyers will only switch to Danslist if there are enough sellers there, and sellers will only switch to Danslist if there are enough buyers there. This makes the decision to switch to Danslist a decision that is dependent on two factors, the buyers and the sellers.

More generally, an N-factor market is one where there are N different incentive structures that must interact for action to occur. In N-factor markets, the larger N is, the more difficult it is to make good decisions happen.

This is really important, because when markets are stuck in this way, inefficiencies arise and people can profit off of the sub-optimality of the situation.

So Craigslist can charge more than Danslist, while offering a worse service, as long as this doesn’t provide sufficient incentive for enough people to switch over.
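The two-factor trap can be sketched as a toy best-response model in Python. Everything numeric here is an assumption for illustration (the per-match values of each site, and the rule that a user compares expected value given the share of the other side already on Danslist); none of it comes from Yudkowsky's example beyond the basic setup of buyers and sellers who each need the other to move first.

```python
CRAIGSLIST_VALUE = 1.0  # assumed value of a match on the older site
DANSLIST_VALUE = 2.0    # assumed value of a match on the better site


def prefers_danslist(other_side_share):
    """A user picks the site with the larger expected value, given the share
    of the other side of the market that is already on Danslist."""
    on_danslist = DANSLIST_VALUE * other_side_share
    on_craigslist = CRAIGSLIST_VALUE * (1 - other_side_share)
    return on_danslist > on_craigslist


def settle(buyer_share, seller_share, rounds=20):
    """Let buyers and sellers best-respond to each other simultaneously.

    Stops when the shares no longer change, or when the round limit is hit.
    """
    for _ in range(rounds):
        new_buyers = 1.0 if prefers_danslist(seller_share) else 0.0
        new_sellers = 1.0 if prefers_danslist(buyer_share) else 0.0
        if (new_buyers, new_sellers) == (buyer_share, seller_share):
            break
        buyer_share, seller_share = new_buyers, new_sellers
    return buyer_share, seller_share


print(settle(0.0, 0.0))  # everybody starts on Craigslist
print(settle(0.2, 0.2))  # a few early adopters on both sides
print(settle(0.5, 0.5))  # half of each side has already moved
```

With these made-up values, both "everyone on Craigslist" and "everyone on Danslist" are resting points. A small group of early adopters drifts back to Craigslist, and the better site only takes over once enough of both sides have moved at the same time.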

Yudkowsky also talks about Elsevier as an instance of this. Elsevier is a profiteer that captured several large and prestigious scientific journals and jacked up subscription prices. While researchers, universities, and readers could in principle just unanimously switch their publication patterns to non-Elsevier journals, this involves solving a fairly tough coordination problem. (It has happened a few times.)

One solution to coordination problems is an ability to credibly pre-commit. So if everybody in the makeup-ubiquitous world was able to sign a magical agreement that truly and completely credibly bound their future actions in a way that they couldn’t defect from, then they could end up in a better world.

When individuals cannot credibly pre-commit, then this naturally results in coordination problems.

And finally, there are other weird reasons that are harder to categorize for why we end up stuck in bad Nash equilibria.

For instance, a system in which politicians respond to the wills of voters and are genuinely accountable to them seems like a system with a nicely aligned incentive structure.

But if for some reason, the majority of the public resists policies that will actually improve their lives, or push policies that will hurt them, then this system will still end up in a failure mode. Perhaps this failure mode is not best expressed as a Nash equilibrium, as there is a sense in which voters do have the incentive to switch to a more sensible view, but I will express it as such regardless.

This looks to me like what is happening with popular opinion about minimum wage laws.

Huge numbers of people support minimum wage laws, including those that may actually lose their jobs as the result of those laws. While I’m aware that there isn’t a strong consensus among economists as to the real effects of a moderate minimum-wage increase, it is striking to me that so many people are so convinced that it can only be net positive for them, when there is plenty of evidence that it may not be.

Another instance of this is the idea of “wage stickiness”.

This is the idea that employers are more likely to fire their workers than to lower their wages, resulting in an artificial “stickiness” to the current wages. The proposed reason for why this is so is that worker morale is hurt more by decreased wages than by coworkers being fired.

Sticky wages are especially bad when you take into account inflation effects. If an economy has an inflation rate of 10%, then an employer that keeps her employees’ wages constant is in effect cutting their wages by roughly 10%. Even if she raises their wages by 5%, they’re still losing money!

And if the economy enters a recession, with say an inflation rate of -5%, then an employer will have to cut wages by 5% in order to stay at the market equilibrium. But since wages are sticky and her workers won’t realize that they are actually not losing any money despite the wage cut, she will be more likely to fire workers instead.
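The arithmetic behind these cases is short enough to check directly. The sketch below uses the standard relation between nominal and real changes, real = (1 + nominal) / (1 + inflation) - 1; the function name and the particular scenarios are just illustrative choices.

```python
def real_wage_change(nominal_change, inflation):
    """Fractional change in purchasing power for a given nominal wage change
    and inflation rate, both written as fractions (0.05 means 5%)."""
    return (1 + nominal_change) / (1 + inflation) - 1


scenarios = [
    (0.00, 0.10),    # wages frozen during 10% inflation
    (0.05, 0.10),    # 5% raise during 10% inflation
    (-0.05, -0.05),  # 5% pay cut during 5% deflation
    (0.00, -0.05),   # wages frozen during 5% deflation
]
for nominal, inflation in scenarios:
    real = real_wage_change(nominal, inflation)
    print(f"nominal {nominal:+.0%}, inflation {inflation:+.0%} -> real {real:+.1%}")
```

A frozen wage under 10% inflation works out to a real cut of about 9.1% (the 10% figure is the usual first-order approximation), and a 5% raise is still a real cut of about 4.5%. In the deflation case the 5% nominal cut leaves purchasing power exactly unchanged, while a frozen wage is a real raise of about 5.3%, which is the gap the employer ends up closing by firing people instead.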

A friend described to me an interaction he had had with a coworker at a manufacturing plant. My friend had been recently hired in the same position as this man, and was receiving the minimum wage at 5 dollars an hour.

His coworker was telling him about how he was being paid so much, because he had been working there so many years and was constantly getting pay raises. He was mortified when he compared wages with my friend, and found that they were receiving the exact same amount.

Status quo bias is another important effect to keep in mind here. Individuals are likely to favor the current status quo, for no reason besides that it is the status quo. This type of effect can add to political inertia and further entrench society in a suboptimal Nash equilibrium.

I’ll just lump all of these effects in as “Stupidity & cognitive biases.”

***

I want to close by adding a third category that I’ve been starting to suspect is more important than I previously realized. This is:

The better world is in a Nash equilibrium, and you can reach it, and you will reach it, just WAIT a little bit.

I add this because I sometimes forget that society is a massive complicated beast with enormous inertia behind its existing structure, and that just because some favored policy of yours has not yet been fully implemented everywhere, this does not mean that there is a deep underlying unsolvable problem.

So, for instance, one time I puzzled for a couple weeks about why, given the apparently low cost of ending global poverty forever, it still exists.

Aren’t there enough politicians that are aware of the low cost? And aren’t they sufficiently motivated to pick up the windfall of public support and goodwill that they would surely get? (To say nothing of massively improving the world.)

Then I watched Hans Rosling’s 2008 lecture “Don’t Panic” (which, by the way, should be required watching for everyone) and realized that global poverty is actually being ended, just slowly and gradually.

The UN set a goal in 2000 to cut extreme poverty in half by 2015, and hit it five years ahead of schedule. The follow-up goal, set in 2015, is to end extreme poverty completely by 2030.

We’re on course to see the end of extreme poverty; it’ll just take a few more years. And after all, it should be expected that raising an entire segment of the world’s population above the poverty line will take some time.

So in this case, the answer to my question of “Why is this problem not being solved, if solutions exist?” was actually “Um, it is being solved, you’re just impatient.”

And earlier I wrote about overfishing and the ridiculously obvious solutions to the problem. I concluded by pessimistically noting that the fishing lobby has a significant influence over policy makers, which is why the problem cannot be solved.

While the premise is true, it is in fact the case that ITQ policies are being adopted in more and more fisheries, the Northwest Atlantic cod fisheries are being revived as a result of marine protection policies, and governments are making real improvements along this front.

This is a nice optimistic note to end on – the idea that not everything is a horrible unsolvable trap and that we can and do make real progress.

***

So we have:

The better world is not a Nash equilibrium
- Positional goods
- Bad incentive structures
The better world is a Nash equilibrium
- You can’t reach it because you’re stuck in a lesser Nash equilibrium.
  - Lack of incentives in decision makers
    - Misalignment of incentives
      - Externalities
      - Perverse incentives
      - Lack of liquidity
  - Asymmetric information
  - Coordination problems
    - Multi-factor markets
    - Multi-step decision processes
    - Inability to pre-commit
  - Stupidity & cognitive biases
- You can and will reach it, just be patient.

I don’t think that this overall layout is perfect, or completely encompasses all failure modes of society. But I suspect that it is along the right lines of how to think about these issues. I’ve had conversations where people will say things like “Society would be better if we just got rid of all money” or “If somebody could just remove all those darned Republicans from power, imagine how much everything would improve” or “If I was elected dictator-for-life, I could fix all the world’s problems.”

I think that people that think this way are often really missing the point. It’s dead easy to look at the world’s problems, find somebody or something to point at and blame, and proclaim that removing them will fix everything. But the majority of the work you need to do to actually improve society involves answering really hard questions like “Am I sure that I haven’t overlooked some way in which my proposed policy degenerates into a suboptimal Nash equilibrium? What types of incentive structures naturally arise if I modify society in this way? How could somebody actually make this societal change from within the current system?”

That’s really the goal of this taxonomy: to give a sense of what the right questions to be asking are.

(More & better reading along these same lines here and here.)

Dialogue: Why you should one-box in Newcomb’s problem

December 24, 2017 (updated March 2, 2018) ~ squarishbracket

(Nothing original here, just my presentation of the most interesting arguments I’ve seen on the various sides)

Newcomb’s problem: You find yourself in a room with two boxes in it. Box #1 is clear, and you can see $10,000 inside. Box #2 is opaque. A loud voice announces to you: “Box 2 has either 1 million dollars inside of it or nothing. You have a choice: Either you take just Box 2 by itself, or you take both Box 1 and Box 2.”

As you’re reaching forward to take both boxes, the voice declares: “Wait! There’s a catch.

Sometime before you entered the room, a Predictor with enormous computing power scanned you, made an incredibly detailed simulation of you, and used it to make a prediction about what decision you would make. The Predictor has done similar simulations many times in the past, and has never been wrong. If the Predictor predicted that you would take just Box 2, then it filled up the box with 1 million dollars. And if the Predictor predicted that you would take both boxes, then it left Box 2 empty. Now you may make your choice.”

The most initially intuitive answer to most people is to take both boxes. Here’s the strongest argument for why this makes sense, presented by Claus the causal thinker.

Claus: “The Predictor has already made its prediction and fixed the contents of the box. So we know for sure that my decision can’t possibly have any impact on whether Box 2 is full or empty. And in either case, I am better off taking both boxes than just one! Think about it like this: whether I one-box or two-box, I still end up taking Box 2. So let’s consider Box 2 taken – I have no choice in the matter. Now the only real question is if I’m also going to take Box 1. And Box 1 has $10,000 inside it! I can see it right there! My choice is really whether to take the free $10,000 or not, and I’d be a fool to leave it behind.”

***

Claus makes a very convincing argument. On his calculation, the expected value of two-boxing is strictly greater than the expected value of one-boxing, regardless of what probabilities he puts on the second box being empty. We’ll call this the dominance argument.

But Claus is making a fundamental error in his calculation. Let’s let a different type of decision theorist named Eve interrogate Claus.

Eve: “So, Claus, I’m curious about how you arrived at your answer. You say that your decision about whether or not to take Box 1 can’t possibly impact the contents of Box 2. I think that I agree with this. But do you agree that if you don’t take Box 1, it is more likely that Box 2 has a million dollars inside it?”

Claus: “I can’t see how that could be the case. The box’s contents are already fixed. How could my decision about something entirely causally unrelated make it any more likely that the contents are one way or the other?”

Eve: “Well, it’s not actually that unusual. There are plenty of things that are correlated without any direct causal impact between them. For example, say that a certain gene causes you to be a good juggler, but also causes a high chance of a certain disease. In this case, juggling ability and incidence of the disease will end up being correlated in the population, even though neither one is directly causing the other. And if you’re a good juggler, then you should be more worried that maybe you also have the disease!”

Claus: “Sure, but I don’t see how that case is anything like this one…”

Eve: “The two cases are actually structurally identical! Let me draw some causal diagrams…” (Claus rummages around for paper and a pencil)

[Figure: causal diagrams for the juggling example and for Newcomb’s problem]

Eve: “In our disease example, we have a common cause (the gene) that is directly causally linked to both the disease and to being a good juggler. So the “disease” variable and the “good juggler” variable are dependent because of the “gene” variable. In your Newcomb problem, the common cause is your past self at the moment that the Predictor scanned you. This common cause is directly linked to both your decision to one-box or two-box in the present, and to the contents of the box. Which means that in the exact same way that being a good juggler makes you more likely to have the disease, two-boxing makes you more likely to end up with an empty Box 2! The two cases are exactly analogous!”

Claus: “Hmm, that all seems correct. But even if my decision to take Box 1 isn’t independent of the contents of Box 2, this doesn’t necessarily mean that I shouldn’t still take both.”

Eve: “Right! But it does invalidate your dominance argument, which implicitly rested on the assumption that you could treat the contents of the box as if they were unaffected by your action. While your actions do not strictly speaking causally affect the contents of the box, they do change the likelihoods of the different possible contents! So there is a real sense in which your actions do statistically affect the contents of the box, even though they don’t causally affect them. Anyway, we can just calculate the actual expected values and see whether one-boxing or two-boxing comes out ahead.”

Eve writes out some expected utility calculations:

[Image: Newcomb vs Juggling 2]

Eve: “So you see, it actually turns out to always be better to one-box than to two-box!”
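Eve's scribbles aren't reproduced in the text, so here is a rough sketch of the kind of calculation she is describing. The dollar amounts are borrowed from the figures used later in this post ($10,000 always in Box 1, $1,000,000 in Box 2 only if one-boxing was predicted). The `accuracy` parameter and the function names are made up for illustration.

```python
# Assumed payoffs, matching the dollar amounts that appear later in the post.
SMALL = 10_000       # always sitting in Box 1
BIG = 1_000_000      # in Box 2 only if the Predictor predicted one-boxing


def edt_expected_values(accuracy):
    """Expected payoff of each action, conditioning on the action taken.

    `accuracy` is the assumed probability that the Predictor's prediction
    matches what you actually end up doing.
    """
    one_box = accuracy * BIG + (1 - accuracy) * 0
    two_box = accuracy * SMALL + (1 - accuracy) * (BIG + SMALL)
    return one_box, two_box


def break_even_accuracy():
    """Accuracy at which one-boxing and two-boxing have equal expected value."""
    # Solve accuracy*BIG == accuracy*SMALL + (1 - accuracy)*(BIG + SMALL).
    return (BIG + SMALL) / (2 * BIG)


for accuracy in (0.5, 0.9, 0.99999):
    one_box, two_box = edt_expected_values(accuracy)
    print(f"accuracy={accuracy}: one-box={one_box:,.0f}, two-box={two_box:,.0f}")
```

On these assumed numbers, one-boxing comes out ahead whenever the Predictor is right more than 50.5% of the time, so Eve's "always" holds for any reasonably reliable Predictor.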

Claus: “Hmm, I guess you’re right. Okay never mind, I guess that I’ll one-box. Thanks!”

***

Claus goes away for a while, and comes back a more sophisticated causal thinker.

Claus: “Hey, remember that I agreed that my decision and the contents of Box 2 are actually dependent upon each other, just not causally?”

Eve: “Yes.”

Claus: “Well, I do still agree with that. But I am also still a two-boxer. I’ll explain – would you hand me that paper?”

Claus scribbles a few equations beneath Eve’s diagrams.

[Image: EDT vs CDT.png]

Claus: “When you calculated the expected values of one-boxing and two-boxing, you implicitly used Equation (1). Let’s call this equation the “Evidential Decision Algorithm.” You summed over all the possible consequences of your actions, and multiplied the values of each consequence by the conditional probability of that consequence, given the decision.”

Eve: “Yes…”

Claus: “Well, I have a different way to calculate expected values! It’s Equation (2), and I call it the “Causal Decision Algorithm.” I also sum over all possible consequences, but I multiply the value of each consequence by its causal conditional probability, not its ordinary conditional probability! And when you calculate the expected value, it turns out to be larger for two-boxing!”
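A minimal sketch of how Claus's calculation might go, under the assumption that Box 1 always holds $10,000 and Box 2 holds $1,000,000 or nothing. The key move is that the probability of a full Box 2 is treated as fixed no matter which action is chosen, since the choice cannot causally change it. The name `p_full` is hypothetical and stands for whatever credence Claus has that the box was already filled.

```python
SMALL = 10_000       # assumed contents of Box 1
BIG = 1_000_000      # assumed contents of Box 2 when it is full


def cdt_expected_values(p_full):
    """Expected payoffs when the action is treated as a causal intervention.

    `p_full` is the probability that Box 2 is already full. It is the same
    for both actions, because choosing cannot causally alter the contents.
    """
    one_box = p_full * BIG
    two_box = p_full * BIG + SMALL
    return one_box, two_box


for p_full in (0.0, 0.5, 1.0):
    one_box, two_box = cdt_expected_values(p_full)
    print(f"p_full={p_full}: one-box={one_box:,.0f}, two-box={two_box:,.0f}")
```

Whatever value `p_full` takes, two-boxing comes out exactly $10,000 ahead in this sketch, which is just the dominance argument restated as arithmetic.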

Eve: “Hmm, doesn’t this seem a little arbitrary? Maybe a little ad-hoc?”

Claus: “Not at all! The point of rational decision-making is to choose the decision that causes the best outcomes. What we should be interested in is only the causal links between our decisions and their possible consequences, not the spurious correlations.”

Eve: “Hmm, I can see how that makes sense…”

Claus: “Here, let’s look back at your earlier example about juggling and disease. I agree with what you said that if you observe that you’re a good juggler, you should be worried that you have the disease. But imagine that instead of just observing whether or not you’re a good juggler, you get to decide whether or not to be a good juggler. Say that you can decide to spend many hours training your juggling, and at the end of that process you know that you’ll be a good juggler. Now, according to your decision theory, deciding to train to become a good juggler puts you at a higher risk for having the disease. But that’s ridiculous! We know for sure that your decision to become a good juggler does not make you any more likely to have the disease. Since you’re deciding what actions to take, you should treat your decisions like causal interventions, in which you set the decision variable to one value or another and in the process break all other causal arrows directed at it. And that’s why you should be using the causal conditional probability, not the ordinary conditional probability!”
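To make Claus's distinction concrete, here is a small sketch with invented numbers for the gene, juggling, and disease probabilities (none of them come from the dialogue). It computes the chance of disease two ways: after merely observing that you are a good juggler, and after deciding to train, which is modeled as an intervention that cuts the arrow from the gene to juggling ability.

```python
# All probabilities below are made up for illustration.
P_GENE = 0.1
P_JUGGLER_GIVEN_GENE = {True: 0.8, False: 0.1}
P_DISEASE_GIVEN_GENE = {True: 0.5, False: 0.01}


def p_gene(has_gene):
    """Prior probability of having (or not having) the gene."""
    return P_GENE if has_gene else 1 - P_GENE


def p_disease_given_observed_juggler():
    """P(disease | juggler): observing juggling skill is evidence about the gene."""
    joint = {g: p_gene(g) * P_JUGGLER_GIVEN_GENE[g] for g in (True, False)}
    p_juggler = sum(joint.values())
    return sum(joint[g] / p_juggler * P_DISEASE_GIVEN_GENE[g] for g in (True, False))


def p_disease_given_do_juggler():
    """P(disease | do(juggler)): training tells you nothing about the gene."""
    # The intervention breaks the gene -> juggler arrow, so the gene keeps its prior.
    return sum(p_gene(g) * P_DISEASE_GIVEN_GENE[g] for g in (True, False))


print("observed juggler:", round(p_disease_given_observed_juggler(), 4))
print("trained juggler: ", round(p_disease_given_do_juggler(), 4))
```

With these numbers, observing that you juggle well raises the disease probability well above the base rate, while choosing to train leaves it at the base rate, which is the gap between ordinary and causal conditional probabilities that Claus is pointing at.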

Eve: “Huh. What you’re saying does have some intuitive appeal. But now I’m starting to think that there is an important difference that we both missed between the juggling example and Newcomb’s problem.”

Eve draws two more diagrams on a new page.

[Image: Newcomb vs Juggling 3.png]

Eve: “In the juggling case, it makes sense to describe your decision to become a good juggler or not as a causal intervention, because this decision is not part of the chain of causes leading from your genes to your juggling ability – it’s a separate cause, independent of whether or not you have the gene. But in Newcomb’s problem, your decision to one-box or two-box exists along the path of the causal arrow from your past character to your current action! The Predictor predicted every part of you, including the part of you that’s thinking about what action you’re going to take. So while modeling your decision as a causal intervention in the juggling example makes sense, doing so in Newcomb’s case is just empirically wrong! Whatever part of your brain ends up deciding to “intervene” and two-box, the Predictor predicted that this would happen! By the nature of the problem, any way in which you attempt to intervene on your decision will inevitably not actually be a causal intervention.”

***

(Tim, a new type of decision theorist, appears in a puff of smoke)

Claus and Eve: “Gasp! Who are you?”

Tim: “I’m Tim, a new type of decision theorist! And I’m here to say that you’re both wrong!”

Claus and Eve: “Gasp!”

Tim: “I’ll explain with a thought experiment. You both know the prisoner’s dilemma, right? Two prisoners each get to make a choice either to cooperate or defect. The best outcome for each one is that they defect and the other prisoner cooperates, the second best outcome is that both cooperate, the second worst is that they both defect, and the worst is that they cooperate and the other prisoner defects. Famously, two rational agents in a prisoner’s dilemma will end up both defecting, because defecting dominates cooperating as a strategy. If the other prisoner defects, you’re better off defecting, and if the other prisoner cooperates, you’re better off defecting. So you should defect.”

Claus and Eve: “Yes, that seems right…”

Tim: “Well, first of all notice that two rational agents end up behaving in a sub-optimal way. They would both be better off if they each cooperated. But apparently, being ‘rational’ in this case entails ending up worse off. This should be a little unusual to you if you think that rational decision-making is about optimizing your outcomes. But now consider this variant: now you are in a prisoner’s dilemma with an exact clone of yourself. You have identical brains, have lived identical lives, and are now making this decision in identical settings. Now what do you do?”

Claus: “Well, on my decision theory, it’s still the case that I can’t causally affect my clone with my decision. This means that when I treat my decision as an intervention, I won’t end up making the probability that my clone defects given that I defect any higher. So defecting still dominates cooperating as a strategy. I defect!”

Eve: “Well, my answer depends on the set-up of the problem. If there’s some common cause that explains why my clone and I are identical (like maybe we were both manufactured in a twin-clone-making factory), then our decisions will be dependent. If I defect, then my clone will certainly defect, and if I cooperate, then my clone will cooperate. So my algorithm will tell me that cooperation maximizes expected utility.”

Tim: “There is no common cause. It’s by an insanely unlikely coincidence that you and your clone happen to have the same brains and to have lived the same lives. Until this moment, the two of you have been completely causally cut off from each other, with no possibility of any type of causal relationship.”

Eve: “Okay, then I gotta agree with Claus. With no possible common cause and no causal intermediaries, my decision can’t affect my clone’s decision, causally or statistically. So I’ll defect too.”

Tim: “You’re both wrong. Both of you end up defecting, along with your clones, and everybody is worse off. Look, both of you ended up concluding that your decision and the decision of your clone cannot be correlated, because there are no causal connections to generate that correlation. But you and your clone are completely physically identical. Every atom in your brain is in a functionally identical spot as the atoms in your clone’s brain. Are you determinists?”

Eve: “Well, in quantum mechanics -”

Tim: “Forget quantum mechanics! For the purpose of this thought experiment, you exist in a completely deterministic world, where the same initial conditions lead to the same final conditions in every case, always. You and your clone are in identical initial conditions. So your final condition – that is, your decision about whether to cooperate or defect, must be the same. In the setup as I’ve described it, it is logically impossible that you defect and your clone cooperates, or that you cooperate and your clone defects.”

Claus: “Yes, I think you’re right… but then how do we represent this extra dependence in our diagrams? We can’t draw any causal links connecting the two, so how can we express the logical connection between our actions?”

Tim: “I don’t really care how you represent it in your diagram. Maybe draw a special fancy common cause node with special fancy causal arrows that can’t be broken towards both your decision and your clone’s decision.”

[Image: TDT Clones.png]

Tim: “The point is: there are really only two possible worlds. In World 1, you defect and your clone defects. In World 2, you cooperate and your clone cooperates. Which world would you rather be in?”

Claus and Eve: “World 2.”

Tim: “Good! So you’ll both cooperate. Now, what if the clone is not exactly identical to you? Let’s say that your clone only ends up doing the same thing as you 99.999% of the time. Now what do you do?”

Claus: “Well, if it’s no longer logically impossible for my clone to behave differently from me, then maybe I should defect again?”

Tim: “Do you really want a decision theory that has a discontinuous jump in your behavior from a 99.999% chance to a 100% chance? I mean, I’ve told you that the chance that the clone gives a different answer than your answer is .001%! Rational agents should take into account all of their information, not only selective pieces of it. Either you ignore this information and end up worse off, or you take it into account and win!”

Eve: “Okay, yes, it seems reasonable to still expect a 99.999% chance of identical choices in this case. So we should cooperate again. But what does all of this have to do with Newcomb’s problem?”

Tim: “It relates to your answers to Newcomb’s problem in two ways. First, it shows that both of your decision algorithms are wrong! They are failing to take into account that extra logical dependency between actions and consequences that we drew with fancy arrows. And second, Newcomb’s problem is virtually identical to the prisoner’s dilemma with a clone!”

Eve and Claus: “Huh?”

Tim: “Here, let’s modify the prisoner’s dilemma in the following way: If you both cooperate, then you get one million dollars. If you cooperate and your clone defects, you get $0. If you defect and your clone cooperates, you get $1,010,000. And if you both defect, then you get $10,000. Now “cooperating” is the same as one-boxing, and “defecting” is the same as two-boxing!”
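Tim's claim that the two games match can be checked mechanically. The sketch below assumes the standard Newcomb setup with $10,000 always in Box 1 and $1,000,000 in Box 2 only when one-boxing was predicted. The dictionary and function names are invented for illustration.

```python
SMALL = 10_000       # assumed contents of Box 1
BIG = 1_000_000      # assumed contents of Box 2 if one-boxing was predicted

# Tim's modified prisoner's dilemma, keyed by (my move, clone's move).
PD_PAYOFF = {
    ("cooperate", "cooperate"): 1_000_000,
    ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 1_010_000,
    ("defect", "defect"): 10_000,
}

# Tim's proposed translation between the two problems.
TO_NEWCOMB = {"cooperate": "one-box", "defect": "two-box"}


def newcomb_payoff(action, prediction):
    """Payoff in Newcomb's problem for a given action and prediction."""
    box2 = BIG if prediction == "one-box" else 0
    box1 = SMALL if action == "two-box" else 0
    return box1 + box2


def same_game():
    """True if every prisoner's-dilemma payoff equals its Newcomb counterpart."""
    return all(
        newcomb_payoff(TO_NEWCOMB[me], TO_NEWCOMB[clone]) == payoff
        for (me, clone), payoff in PD_PAYOFF.items()
    )


print(same_game())
```

In this mapping the clone's move plays the role of the Predictor's prediction, so the four cells of the payoff table line up one for one.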

Eve: “But hold on, does the logical dependency between my actions and my clone’s actions really carry over to Newcomb’s problem? Like, it’s not logically impossible that I two-box and the box has a million dollars in it, right?”

Tim: “It is with a perfect Predictor, yes! Remember, the Predictor works by creating a perfect simulation of you and seeing what it does. This means that your decision to one-box or to two-box is logically dependent on the Predictor’s prediction of what you do (and thus the contents of the box) in the exact same way that your decision to cooperate is logically dependent on your clone’s decision to cooperate!”

[Image: TDT]

Claus: “Yes, I see. So with a perfect Predictor, there are really only two worlds to consider: one in which I one-box and get a million dollars, and another in which I two-box and get just $10,000. And of course I prefer the first, so I should one-box.”

Tim: “Exactly! And if the Predictor is not perfectly accurate, and is only right 99.999% of the time…”

Eve: “Well, then there’s still only a .001% chance that I two-box and get an extra million bucks. So, I’m still much better off if I one-box than if I two-box.”

Tim: “Yep! It sounds like we’re all on the same page then. There’s a logical dependence between your action and the contents of the box that you are rationally required to take into account, and when you do take it into account, you end up seeing that one-boxing is the rational action.”

***

The decision theory that “Tim” is using is called timeless decision theory. It’s also been variously called functional decision theory, logical decision theory, and updateless decision theory.

Timeless decision theory ends up better off in Newcomb-like problems, invariably walking away with 1 million dollars instead of $10,000. It also does better than evidential decision theory (Eve’s theory) and causal decision theory (Claus’s theory) at prisoner’s-dilemmas-with-a-clone. These are fairly contrived problems, and it’d be easy for Eve or Claus to just deny that these problems have any real-world application.

But timeless decision theorists also cooperate with each other in ordinary prisoner’s dilemmas. They have a much easier time with coordination problems in general. They do better in bargaining problems. And they can’t be blackmailed in a large general class of situations. It’s harder to write these results off as strange quirks that don’t relate to real life.

A society of TDTs wouldn’t be plagued with doubts about the rationality of voting, wouldn’t find themselves stuck in as many sub-optimal Nash equilibria, and would look around and see a lot fewer civilizational inadequacies and low-hanging policy fruit than we currently have. This is what’s most interesting to me about TDT – that it gives a foundation for rational decision-making that seems like it has potential for solving real civilizational problems.

Opt-out organ donation

December 24, 2017 / March 2, 2018 ~ squarishbracket

(Mostly interested in this for two reasons: (1) the research in cognitive science about default effects and other unintuitive cognitive biases and (2) the adequacy implications of the lack of implementation of this policy)

In the United States, around 95% of the population approves of organ donation, while only 54% have granted permission for their organs to be used after death. Surveys in the UK indicate that the percentage that approve of organ donation is around 90%, but only 25% of the population is registered on the Organ Donation Registry. Many other countries have similar patterns.

When polled, the reasons given for not explicitly registering for organ donation are things like laziness, confusion about the process and unwillingness to think about death.

And it’s actually worse than this – many countries have ‘soft’ organ-donation policies, meaning that family members can override the wishes of the deceased. Families are more likely to veto the decision to donate than the decision to not donate, further decreasing the number of organs available for transplant.

And this number really really matters. There are over 100,000 people in need of a life-saving organ transplant in the United States, and over seven thousand people died last year while waiting. This amounts to about 20 people every day. And in the UK and the US, the gap between available organs and patients awaiting transplantation is only growing.

***

Psychologists have studied the effects of default options on expressed preferences. One experiment told subjects to imagine that they had just moved to a new state, and that they had to decide whether or not to be organ donors. Some subjects were told that the default was to be an organ donor, and their choice was to confirm or change that status. Others were told the opposite – that the default was to not be an organ donor. The results were dramatic: about two times more people became donors when this was the default than when it was not. The simple framing effect of “confirm the default or change?” had the power to cut organ donations in half.

The real-world equivalent of this is whether a country has an opt-in or opt-out organ donation system. The UK and the US have an opt-in system, which means that the default choice is to not be an organ donor. Other countries, like Austria, Belgium, Spain and Sweden, have an opt-out system.

This difference in policy produces huge differences in the percentage of the population that consents to organ donation. When Austria and Belgium changed from an opt-in to an opt-out system, donation rates more than doubled. When Singapore changed to opt-out, their donation rates more than sextupled. And comparisons between countries that have different policies are similarly impressive. Germany and Austria, similar countries in many ways except for their donation policy, showed a difference of almost 88 percentage points in effective consent rates.

Consider for a moment how strange this is. In the United States, all it requires to become an organ donor is to check a box when registering for a driver’s license at the DMV. Can it really be that a simple difference in whether the box means “become an organ donor” or “stop being an organ donor” is preventing millions of people from becoming organ donors? Classical economics would certainly not predict this – it is presumed that if somebody has a preference about whether or not to be an organ donor, a tiny difference in framing should not have such huge effects on their behavior.

But apparently the answer is that yes, these tiny differences do matter. And our strange little human quirks can be hugely important in deciding on how to make effective policy.

***

Ultimately, we are left with an adequacy question. Opt-out organ donation policies seem to me like low-hanging policy fruit. If policy-makers care to eliminate thousands of needless deaths, and are aware of these policies, then why aren’t they already implemented in the US and the UK?

The spiritual and the scientific

December 23, 2017 / March 2, 2018 ~ squarishbracket

There’s an Isaac Asimov quote that I love. It goes:

When people thought the Earth was flat, they were wrong. When people thought the Earth was spherical, they were wrong. But if you think that thinking the Earth is spherical is just as wrong as thinking the Earth is flat, then your view is wronger than both of them put together.

I was recently reminded of this because I’m at an ashram this week, and in one of the talks, a swami brought up his beef with science.

He talked about how science is just another form of faith, and that therefore our intuition is a perfectly valid guide to understanding the universe. After all, all of our past scientific theories have turned out to be wrong, so we should expect that our current theories will also turn out to be wrong.

Thus the Asimov.

***

For various reasons, I’m often in spiritual places surrounded by spiritual people. These are the types of people that say “I believe in all religions” and go to yoga retreats and read books about sacred healing and ancient wisdom. When I’m at these places, people sometimes find out that I’m a physics student who is interested in things like Science and Rationality. The types of responses I get are interesting.

Usually the people I talk to are enthusiastic and eager to talk about the most recent scientific discoveries they’ve heard of. They’re also quick to point out that Science can’t tell us everything, and after all there are the virtues of faith to be considered. Other times I feel a subtle shift in attitude. This might be paranoia, but it’s as if I’ve been registered as somebody belonging to the Other Team.

And after all, important swamis declare that science is just another form of faith, and spiritual people nod knowingly. And the Deepak Chopras of the world declare with relish that science cannot tell us objective truths, and that scientists are arrogant and dogmatic.

This is all very weird to me. Science is our best systematized attempt to understand the world we live in and to unearth the general principles that guide this world. Great scientists are guided by a fascination with the order of the universe and wonder at its comprehensibility. At their root they want to understand, in Einstein’s words, the mind of God.

And the spiritual tell me that “spiritual” means something like “interested in pondering the nature of reality at a deep level and appreciating the awe-inspiring and profound aspects of existence.”

If this is how I should understand these terms, then spirituality and science are two things that should definitely definitely not be enemies. In fact, if “spiritual” meant what the spiritual claim it means, then the best spiritual seekers should be the same people as the best scientists.

Look at this quote from Carl Sagan:

Science is not only compatible with spirituality; it is a profound source of spirituality. When we recognize our place in an immensity of light years and in the passage of ages, when we grasp the intricacy, beauty and subtlety of life, then that soaring feeling, that sense of elation and humility combined, is surely spiritual.

And from Neil deGrasse Tyson:

It’s quite literally true that we are star dust, in the highest exalted way one can use that phrase. I bask in the majesty of the cosmos.

Not only are we in the universe, the universe is in us. I don’t know of any deeper spiritual feeling than what that brings upon me.

Are these not expressions of an utmost appreciation for the spiritual, as defined above? Why don’t the spiritual embrace Neil deGrasse Tyson and his scientific colleagues with open arms as fellow earnest truth-seekers, and marvel at the beauty of the universe together? I mean, just look at the man – he’s practically overflowing with the type of joy and curiosity that the spiritual should love!

The spiritual will tell me: “Yes, some of the greatest scientists are very spiritual. Look at Einstein! He said that science without religion is lame, and that all serious scientists recognize a Spirit in the laws of nature! Science at its best can be and should be a deeply spiritual enterprise. But unfortunately, a lot of scientists out there are just too close-minded. This is why the spiritual can sometimes sound anti-science, because the scientists of the world dogmatically reject our reasonable beliefs, like that the spiritually enlightened can read minds and make objects levitate, or that the stars are sending us secret messages about our romantic prospects and whether we should change jobs, or that playing cards thrown randomly onto the ground can accurately tell us our future!”

Yes, scientists can be dogmatic, because scientists are humans. But it strikes me that part of the reason the spiritual claim that scientists are especially dogmatic may be that scientists have repeatedly studied and disproved common spiritual beliefs and practices. More importantly, many of these beliefs are in direct conflict with the known laws of nature. As the saying goes: keep your mind open – but not so open that your brains fall out.

The spiritual: “But science too often tries to go too far and dismiss those things which it doesn’t understand!”

What, like the possible physical effects that the stars could have on the paths that our lives take? Or like the effects of diluting a chemical compound until not a single molecule remains on the potency of the final product as a medical instrument? Or the ways that the lines on your palm form, that really really have nothing to do with how rich you’ll be or how many kids you’ll end up having?

No, this won’t do. Science does not understand everything. There are plenty of mysteries out there, and we love that there are. They give scientists employment! But scientists are certainly not in the business of blindly dismissing those things that they actually do not understand.

Besides, are scientists really all that dogmatic? Look at the history of the scientific worldview. Consensus theories are constantly recycled as we make the long march towards understanding reality. Some of the strongest scientific consensuses are only a few decades old! Scientists are constantly updating and refurnishing their view of reality as the evidence changes.

Perfectly? No! But I’d hazard a guess that they do so better than the average person. Why? For one thing, they have a career incentive to do so. A scientist that sticks to the old phlogiston-theory of combustion can’t get published, and a scientist that discovers damning evidence of the falsity of an important consensus gets tenure, pay raises, and respect from their colleagues. The incentive structure of science is set up to reward those that can avoid becoming stuck in dogmatic patterns of belief.

***

Physicist and philosopher Tim Maudlin described a feature of truth-seeking enterprises: they tend to be uniform across space and to vary across time. Ask a biologist in Bengal what they think about the structure of DNA, and you’ll get pretty much the same answer as a biologist at Oxford. And when new evidence comes in, the beliefs of scientists shift fairly uniformly.

Ask a spiritual seeker in India what they think about Shamanic healing, and you’ll likely get a different answer from a spiritual seeker in the UK.

Yes, science has problems and is definitely not perfect. But we’re not comparing it to an ideal perfected version of science conducted by perfect Bayesian epistemologists with infinite computing power, we’re comparing it to humanity’s status quo. With rampant climate change denial, young Earth creationism, disbelief in evolution and anti-vaccination conspiracies, it’d be hard to convince me that scientists are much worse than the average Joe at avoiding patterns of dogmatic thought.

I just don’t buy that the high epistemic standards and regard for truth held by the spiritual are the reason that they dismiss science. I’ve met too many spiritual people eager to have their charts read by astrologers or obtain homeopathic sugar pills or communicate with invisible spirits. And I don’t buy that scientists are not actually honest truth-seekers trying to understand the world.

Which is why I think that the word spiritual doesn’t actually mean what the spiritual claim it means. I’m not being a linguistic prescriptivist here; I’m saying that the definition that spiritual people provide of spirituality is the motte, and the bailey is something else, something that is apparently hostile to science and friendly to all sorts of pseudoscientific ideas.

The bailey is where the fertile and valuable ideological land is, and the motte is the easily defensible position that spiritual people can retreat to when their beliefs are questioned. The bailey is not actually fundamentally about the urge to understand nature. It’s not actually about the same type of wonder and joy that a scientist gets when they understand some important piece of how the world works. Based off of many of the interactions I’ve had with self-identified spiritual people, I would define it as something like “belief in the existence of some phenomenon for which there is no evidence, or evidence against, like Reiki, crystal healing, tarot cards, etc.”

***

Looking at what I’ve written so far, it sounds like I see nothing but conflict between spirituality and science. This is not so. I have focused on the aspects of spirituality that do come into conflict with science, mostly because I think that these play a large role in the anti-scientific attitudes among the spiritual. The spiritual are quite friendly towards science when it supports their beliefs.

And it often does! There are spiritual practices that science has found to be genuinely beneficial, more than predicted by placebo effects, and beneficial in many of the ways that the spiritual claim them to be. Meditation and yoga come to mind. Mindfulness practices also have an impressive evidence base. And things like a belief in a higher power and spiritual experiences can be genuinely uplifting and transformative.

I’ve talked about spiritual people as if they were all the same, harboring irrational beliefs and anti-scientific attitudes. But plenty of spiritual people I meet are genuinely appreciative of the sciences, and want their world-view to be as fully supported by the scientific evidence as possible. Some are even scientists themselves!

And anti-science attitudes are not at all ubiquitous across spiritual traditions. Buddhism is often praised for its friendliness towards the sciences, and its scientific approach to belief formation. The Dalai Lama says things like:

If scientific analysis were conclusively to demonstrate certain claims in Buddhism to be false, then we must accept the findings of science and abandon those claims.

I don’t know enough about the Dalai Lama’s personal epistemic habits to be confident that this is more than nice-sounding words. How does he think that this attitude affects Buddhist views on karma and reincarnation, for instance?

It is much easier to proclaim a science-friendly attitude than it is to actually accept the tough implications of such an attitude on beliefs central to one’s ideology. But attitudes like this seem like the right way forward in reconciling the actual meaning of spirituality with the meaning that the spiritual seem to want it to have.

Comprehensibility of the Complex

December 22, 2017 / March 2, 2018 ~ squarishbracket

(Some speculative rambling about stuff I’ve been thinking about recently.)

There’s a fallacy that I have committed hundreds of times, and that I have only really recently internalized as a fallacy. Perhaps it is not a fallacy, but a confused pattern of thought. In any case, I’ll call it “the incomprehensibility of the complex.”

Here’s the context in which I would make the mistake:

Somebody brings up some political or economic question, say “Should we have left Iraq?” or “Should we raise the minimum wage?”

This sparks a fierce debate. Somebody says that removing the troops left the region defenseless against takeover by extremist groups, or that extra wages given to workers go back into the economy and stimulate it. Another objects that our troops were ultimately the source of the instability, or cites the broken-window fallacy.

And I would think: “The world is crazily complicated. Physicists can barely understand complex atoms. Now scale that complexity up to interactions between hundreds of millions of humans, each one a system of a hundred trillion trillion atoms. This should put into perspective the proper degree of epistemic humility we should hold when discussing the minimum wage.”

Basically: If we can’t understand atoms, then we sure as hell can’t understand economic systems or international relations.

Observing that this is a bad argument is not too profound or interesting.

What’s interesting to me is the fact that this is a bad argument. That is, the fact that we can scale up the complexity of the system we are studying by a factor of 10^30, squint our eyes, and then get to work at creating fantastically simple and accurate models of the system. This is absolutely insane, and tells us something about the type of universe that we live in.

***

Recently I watched a lecture on Marginal Revolution University about gun buyback programs and slave redemption policies. The gist of it is this:

Starting in 1993, some humanitarian groups got it into their heads that they could save Sudanese slaves by buying them from their owners and then freeing them. This maybe sounds like a good idea, until you learn about supply and demand curves.

In truth, what the slave redeemers ended up doing was increasing demand for slaves, resulting in new slaves being captured and tens of thousands of dollars ending up in the hands of slave-owners. Fresh revenue funded weapons purchases, further enabling slave traders to raid villages and capture new slaves.

(By the way, some charity groups still do this)

A similar thing can happen with gun buyback programs. These programs involve the buying of guns in large quantities from gun owners in order to melt them down, the thought being that this will get the guns off of the street. The effect of this?

Well, the gun producers thank their new customers for the money and start manufacturing more guns to supply their larger customer base. In some cases violent crime rates jumped, and a study measuring whether these programs actually decrease violent crime rates overall found no statistically significant effects.

Now, I’m ashamed to say that these programs actually initially seemed like fine ideas to me. This is really a statement of my failure to have internalized how supply and demand curves work. In my defense, this is not always a totally horrible policy idea. When demand is much more elastic than supply, the price of the good will jump and many of the original buyers will be priced out of the market. In other words, if the producers have a harder time scaling up their operations than the consumers have buying less of the good, then the world will actually end up freer of slaves/guns.

But that is not how these markets actually work. Demand for guns is in fact less elastic than supply of guns, so the gun nuts are barely affected and the ungun-nuts are handing over free money to the gun manufacturers.

Gun Buybacks

And one more example from Marginal Revolution. Sorry, but we’re on the topic of unintuitive basic econ and it’s just too good to leave out.

In 1990 the United States passed a policy that applied a tax on luxury goods like yachts. The idea, it seems, was, “The federal budget deficit is too high, and if we tax the rich on their fancy luxury goods, we can reduce the deficit without really hurting anybody.” Sounds good, yes?

But what actually happened was that as the price of yachts increased, rich people bought less, and thousands of laborers in the yacht industry lost their jobs. When all was said and done, the government ended up paying more in increased unemployment benefits than they gained in tax revenue from the policy! The government quickly wised up and repealed the tax a few years after it was put in place.

How to understand this? Easy! Draw a graph of supply and demand. Which one has a steeper slope? Well, rich people can fairly easily just spend their money differently if yacht prices increase. Going without one more yacht matters less to them than the wage from building that yacht matters to the workers who live off it.

So the yacht-buyers will more easily leave the market than the yacht-producers, which means the demand for yachts is more elastic than the supply, which means that the producers are hurt more by the tax.

Luxury Tax

The point is, the model works! It makes weird-sounding and unintuitive predictions, and it turns out to be right. Literally just draw two lines and assess their relative slopes, and you can understand why a tax will sometimes burn consumers and other times burn producers. (You can also do better than the US government in 1990 apparently, but maybe this shouldn’t be surprising)
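Here is a minimal sketch of the "two lines" calculation, assuming straight-line supply and demand curves and a per-unit tax. The function name `tax_incidence` and the parameters `a`, `b`, `c`, `d` are made up for illustration, and the numbers are not real yacht-market data. A larger `b` means buyers respond more strongly to price (more elastic demand); a larger `d` means the same for sellers.

```python
def tax_incidence(a, b, c, d, tax):
    """Split a per-unit tax between buyers and sellers for linear curves.

    Demand: Qd = a - b * (price paid by buyers)
    Supply: Qs = c + d * (price received by sellers)
    All parameters are hypothetical illustration values, with b > 0 and d > 0.
    """
    # Equilibrium price with no tax: a - b*p = c + d*p
    p0 = (a - c) / (b + d)
    # With the tax, buyers pay (seller price + tax): a - b*(p + tax) = c + d*p
    seller_price = (a - c - b * tax) / (b + d)
    buyer_price = seller_price + tax
    return {
        "buyer_burden": buyer_price - p0,    # how much more buyers pay per unit
        "seller_burden": p0 - seller_price,  # how much less sellers keep per unit
    }


# Buyers leave the market easily, sellers are stuck (the yacht story).
yachts = tax_incidence(a=100, b=4, c=0, d=1, tax=10)
# The reverse: buyers are stuck, sellers adjust easily.
reverse = tax_incidence(a=100, b=1, c=0, d=4, tax=10)

print("elastic demand:", yachts)
print("elastic supply:", reverse)
```

In the first case the sellers absorb most of the tax, and in the second the buyers do. The two burdens always add up to the full tax; the relative slopes only decide how it is split.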

A simple model of our economy as a bunch of supply and demand curves with varying elasticities has enormous explanatory power. This is a breathtakingly simple model of a breathtakingly complex system. And it tells us something important about the world that it works at all.

Okay, enough fun with econ. All of this was just to say that I feel thoroughly rebutted in my old view that things like interactions of humans are too complex to be understood by anybody. So we have our mystery: how does simplicity arise out of complexity?

Here’s my attempt at an answer: simplicity arises when the universe is playing an optimization game with a simple target.

If every few seconds God scanned the universe, erased the least macroscopically circular shapes, and duplicated the rest, then you would quickly expect the universe to consist of only circles. More to the point, it would quickly become possible to accurately model the universe as a bunch of circles of various sizes at various locations.

The clearest real world example of something like this is natural selection. Natural selection is a process that is optimizing biological systems for a simple target – reproductive fitness. It kills off variation and only lets those few forms that are able to reproduce successfully survive into the next generation.

In this sense, natural selection prunes down the complexity of the world, replacing the incomprehensible with the comprehensible. What was initially a high-entropy system, describable only at the level of fundamental physics, becomes a low-entropy system, describable by a few simple biological principles. Instead of having to describe the organism in full glorious detail at the level of quarks and electrons, we just need to explain how it won the optimization game of natural selection.

Gravity gives us another example of an optimization game our universe plays. Once you get enough mass in one place, gravity will crush it inward towards the center of mass, gradually inching diverse macroscopic shapes towards sphericity.

Gravity

Which is why every large object you’ll see in the sky looks perfectly spherical. Any large objects that started off clunky and non-spherical were ruthlessly optimized into sphericity. (Actually they are oblate spheroids, but that’s because technically the optimization game they’re playing is gravity + angular momentum)

So why do supply and demand curves do a great job at predicting interactions between massive numbers of humans? The implied answer is that humans are the result of an optimization game that has made our behaviors simply describable in terms of supply and demand curves.

What exactly does this mean? Perhaps a trait that enhances reproductive fitness in organisms like us is the cognitive skill to make tradeoffs between different desires, and this gives rise to some type of universal comparison metric between very different goods. Now we can sensibly say things like “I want ice cream less than I want to enjoy a beautiful sunset. Except orange custard chocolate chip ice cream. I’d trade off the sunset for orange custard chocolate chip ice cream any day.”

Then somebody comes along with a bright idea called ‘money’, and suddenly we have a great generalization about human behavior: “Everybody wants more money.” From this, some basic notions like a downward-sloping demand curve, an upward-sloping supply curve, and a push towards equilibrium follow quite nicely. And we have a crazily simple high-level explanation of the crazily complex phenomenon of human interaction.

Correlation and causation

December 22, 2017February 8, 2018 ~ squarishbracket ~ 4 Comments

Previous: Causal intervention

I’m feeling a bit uninspired today, so what I am going to do is take the path of least resistance. Instead of giving a thoughtful discussion of the merits and faults of the slogan “Correlation does not imply causation”, I’ll just disprove it with a counterexample.

We have some condition C. This condition affects some members of our population. We want to know if gender (A) and race (B) play a causal role in the incidence of this condition.

Some starting causal assumptions: Gender does not cause race. Race does not cause gender. And the condition does not cause either gender or race.

First we go search for numbers to determine possible correlations between gender and the condition or race and the condition. Here’s what we find:

P(A & B & C) = 2%
P(A & B & ~C) = 3%
P(A & ~B & C) = 18%
P(A & ~B & ~C) = 27%
P(~A & B & C) = 0.5%
P(~A & B & ~C) = 4.5%
P(~A & ~B & C) = 4.5%
P(~A & ~B & ~C) = 40.5%

Alright, now what are the possible causal structures of race, gender, and condition consistent with our starting assumptions? There are 4: neither A nor B causes C, only B causes C, only A causes C, and both cause C.

ABC all models

Each of these causal models makes precise, empirical predictions about what sort of correlations we should expect to find. The first model tells us not to expect any correlations whatsoever – each of the variables should vary independently in the population. The second says that A and C will be independent, and B and C will not be. Etc.

We can test all of these straightforwardly: Is it true that P(A & C) = P(A) * P(C)? And is it true that P(B & C) = P(B) * P(C)? We calculate:

P(A & C) = 2% + 18% = 20%
P(B & C) = 2% + .5% = 2.5%

P(A) = 2% + 3% + 18% + 27% = 50%
P(B) = 2% + 3% + .5% + 4.5% = 10%
P(C) = 2% + 18% + .5% + 4.5% = 25%

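P(A) * P(C) is 12.5%, which does not match P(A & C) = 20%, so A and C are dependent. P(B) * P(C) is 2.5%, which matches P(B & C) exactly, so B and C are independent.

So… our third model (only A causes C) is correct! We have determined causation from correlation! So much for the famous slogan.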
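If you'd rather not do the arithmetic by hand, here is a small sketch that runs the same independence tests on the table above. The helper names `prob` and `independent` are my own, and the tolerance is only there to absorb floating-point rounding.

```python
# Joint distribution from the table above, keyed by (A, B, C) truth values.
joint = {
    (True, True, True): 0.02,
    (True, True, False): 0.03,
    (True, False, True): 0.18,
    (True, False, False): 0.27,
    (False, True, True): 0.005,
    (False, True, False): 0.045,
    (False, False, True): 0.045,
    (False, False, False): 0.405,
}
NAMES = ("A", "B", "C")


def prob(**fixed):
    """Sum the joint over every world that matches, e.g. prob(A=True, C=True)."""
    return sum(
        p
        for world, p in joint.items()
        if all(world[NAMES.index(name)] == value for name, value in fixed.items())
    )


def independent(x, y, tol=1e-9):
    """Check P(x & y) = P(x) * P(y).

    For two binary variables, checking the both-true cell is enough:
    the other three cells follow from the marginals.
    """
    both = prob(**{x: True, y: True})
    return abs(both - prob(**{x: True}) * prob(**{y: True})) < tol


a_matters = not independent("A", "C")
b_matters = not independent("B", "C")
print("A and C dependent:", a_matters)
print("B and C dependent:", b_matters)
```

Under the starting causal assumptions, "A and C dependent, B and C independent" picks out the model in which only A causes C.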

***

The studious one will object that the only way that we have determined causation from correlation in this case is because we started with causal assumptions. This is correct, at least in part. If we had started with no causal assumptions, we still would have found that race and gender are independent. But we would not have been able to determine the direction of our causal arrows.

Here’s a general principle: Purely observational data (read: correlations) cannot tell you on its own the direction of causation. Even this is not actually fully correct: in fact there are special situations called natural experiments in which purely observational data can tell you the direction of causation. We’ll save this discussion for later.

Another studious reader will object: But this is a threadbare notion of causation! On this view, causation is really just statistical dependence!

They are wrong. A causal diagram tells you two things. First, it tells you what correlations you should expect to observe in observational data. But second, it tells you what to expect when you intervene and perform experiments on your variables. This second feature packs in the rest of the intuitive substance of causality.

One final skeptic will point out: Even if we accept your causal assumptions, we cannot truly say that we have ruled out all other causal models. For instance, what if gender does not actually cause the condition, but both gender and the condition are the result of some hidden common cause? This new causal diagram is not ruled out by the data, as one still expects to see a correlation between gender and condition.

They are correct. I am being a little sly in ignoring these subtleties, but this is because they avoid the main point. Which is that causal diagrams are empirically falsifiable, even from purely correlational data. The sense in which the slogan “Correlation does not imply causation” is correct is the sense in which not literally every possible causal model can be eliminated just by observations of correlation. Some causal diagrams truly are empirically indistinguishable. But this doesn’t make causality any more mysterious or un-probeable with the scientific method. We can simply run experiments to deal with the remaining possibilities.

Here are three general ways that you can falsify causal diagrams:

Through observations of correlation or lack of correlation between variables.
Through relevant background information (like temporal order or impossibility of physical interaction between variables)
Through experimental interventions, in which you fix some variables and observe what happens to the others.

Next we’ll discuss some of the useful conceptual tools that arise from this notion of causality.

Previous: Causal intervention

Next: Screening off and explaining away

Causal Intervention

December 21, 2017February 8, 2018 ~ squarishbracket ~ 12 Comments

Previous post: Causal arrows

Let’s quickly review the last post. A causal diagram for two variables A and B tells us how to factor the joint probability distribution P(A & B). The rule we use is that for each variable, we calculate its probability conditional upon all of its parent nodes. This can easily be generalized to any number of variables.

Quick exercises: See if you understand why the following are true.

1. If the causal relationships between three variables A, B, and C are: A>B>C

Then P(A & B & C) = P(A) · P(B | A) · P(C | B).

2. If the causal relationships are: A<B>C

Then P(A & B & ~C) = P(A | B) · P(B) · P(~C | B).

3. If the causal relationships are:

A>B<C

Then P(~A & ~B & C) = P(~A) · P(~B | ~A & C) · P(C)

Got it? Then you’re ready to move on!
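If you want to check your answers mechanically, here is a rough sketch of the factoring rule as code. The function `factorization` and its dictionary format are my own invention for illustration. It only writes out the formula as text and does not compute any numbers.

```python
def factorization(parents, world):
    """Write out the factored form of a joint probability.

    parents: dict mapping each variable to a list of its parent variables
    world:   dict mapping each variable to True/False (False is written with ~)
    """
    def literal(var):
        return var if world[var] else "~" + var

    factors = []
    for var in sorted(parents):
        if parents[var]:
            # Node with incoming arrows: condition on its parents.
            given = " & ".join(literal(p) for p in sorted(parents[var]))
            factors.append(f"P({literal(var)} | {given})")
        else:
            # Node with no incoming arrows: unconditional probability.
            factors.append(f"P({literal(var)})")
    return " · ".join(factors)


chain = {"A": [], "B": ["A"], "C": ["B"]}        # A>B>C
fork = {"A": ["B"], "B": [], "C": ["B"]}         # B is the only parent of A and of C,
                                                 # which is what exercise 2's formula implies
collider = {"A": [], "B": ["A", "C"], "C": []}   # A>B<C

print(factorization(chain, {"A": True, "B": True, "C": True}))
print(factorization(fork, {"A": True, "B": True, "C": False}))
print(factorization(collider, {"A": False, "B": False, "C": True}))
```

The three printed lines should match the three exercise answers term for term.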

***

Two people are debating a causal question. One of them says that the rain causes the sidewalk to get wet. The other one says that the sidewalk being wet causes the rain. We can express their debate as:

A > B   OR   A < B

We’ve already seen that the probability distributions that correspond to these causal models are empirically indistinguishable. So how do we tell who’s right?

Easy! We go outside with a bucket of water and splash it on the sidewalk. Then we check and see if it’s raining. Another day, we apply a high-powered blow-drier to the sidewalk and check if it’s raining.

We repeat this a bunch of times at random intervals, and see if we find that splashing the sidewalk makes it any more likely to rain than blow-drying the sidewalk. If so, then we know that sidewalk-wetness causes rain, not the other way around.

This is the process of intervention. When we intervene on a variable, we set it to some desired value and see what happens. Let’s express this with our diagrams.

When we splash the sidewalk with water, what we are in essence doing is setting the variable B (“The sidewalk is wet”) to true. And when we blow-dry the sidewalk, we are setting the variable B to false. Since we are now the complete determinant of the value of B, all causal arrows pointing towards B must be erased. So:

A>B becomes A B

and

A<B stays A<B

And now our intervened-upon distributions are empirically distinguishable!

The person who thinks that sidewalk-wetness causes rain expects to find a probabilistic dependence between A and B when we intervene. In particular, they expect that it will be more likely to rain when you splash the sidewalk than when you blow-dry it.

And the person who thinks that rain causes sidewalk-wetness expects to find no probabilistic dependence between A and B. They’ll expect that it is equally likely to be raining if you’re splashing the sidewalk as if you’re blow-drying it.

***

This is how to determine the direction of causal arrows using causal models. The key insight here is that a causal model tells you what happens when you perform interventions.

The rule is: Causal intervention on a variable X is represented by erasing all incoming arrows to X and setting its value to its intervened value.

I’ll introduce one last concept here before we move on to the next post: the causal conditional probability.

In our previous example, we talked about the probability that it rains, given that you splash the sidewalk. This is clearly different than the probability that it rains, given that the sidewalk is wet. So we give it a new name.

Normal conditional probability = P(A | B) = probability that it rains given that the sidewalk is wet

Causal conditional probability = P(A | do B) = probability that it rains given that you splash the sidewalk.

The causal conditional probability of A given B is just the probability of A given that you intervene on B and set it to “True”. And P(A | do ~B) is the probability of A given that you intervene on B and set it to “False”.

If we find that P(A | do B) = P(A | do ~B), then we have ruled out A<B.
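To make the difference between P(A | B) and P(A | do B) concrete, here is a small sketch that computes both under each of the two rival diagrams. The numbers are the rain/sidewalk distribution from the Causal Arrows post. The function names and the string labels "A>B" and "A<B" are mine, and this only handles the two-variable case.

```python
# Rain (A) and wet sidewalk (B); keys are (A, B) truth values.
joint = {
    (True, True): 0.49,
    (True, False): 0.01,
    (False, True): 0.05,
    (False, False): 0.45,
}


def p_a_given_b(b):
    """Ordinary conditional P(A | B=b), read straight off the joint."""
    return joint[(True, b)] / (joint[(True, b)] + joint[(False, b)])


def p_a_do_b(b, model):
    """Causal conditional P(A | do B=b) under a two-variable diagram.

    "A>B": the arrow into B is erased, so A just keeps its marginal P(A).
    "A<B": B has no incoming arrows to erase, so A still depends on B.
    """
    if model == "A>B":
        return joint[(True, True)] + joint[(True, False)]
    if model == "A<B":
        return p_a_given_b(b)
    raise ValueError(f"unknown model: {model}")


for model in ("A>B", "A<B"):
    splash = p_a_do_b(True, model)
    blow_dry = p_a_do_b(False, model)
    print(model, "P(A | do B) =", round(splash, 3), " P(A | do ~B) =", round(blow_dry, 3))
```

The two diagrams agree on every ordinary conditional probability, but they disagree about what happens when you splash or blow-dry the sidewalk. That disagreement is what the experiment tests.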

Previous: Causal arrows

Next: Correlation and causation

Causal Arrows

December 21, 2017February 8, 2018 ~ squarishbracket ~ 4 Comments

Previous post: Preliminaries

Let’s start discussing causality. The first thing I want to get across is that causal models tell us how to factor joint probability distributions.

Let’s say that we want to express a causal relationship between some variable A and another variable B. We’ll draw it this way:

A > B

Let’s say that A = “It is raining”, and B = “The sidewalk is wet.”

Let’s assign probabilities to the various possibilities.

P(A & B) = 49%
P(A & ~B) = 1%
P(~A & B) = 5%
P(~A & ~B) = 45%

This is the joint probability distribution for our variables A and B. It tells us that it rains about half the time, that the sidewalk is almost always wet when it rains, and the sidewalk is rarely wet when it doesn’t rain.

Factorizations of a joint probability distribution express the joint probabilities in terms of a product of probabilities for each variable. Any given probability distribution may have multiple equivalent factorizations. So, for instance, we can factor our distribution like this:

Factorization 1:
P(A) = 50%
P(B | A) = 98%
P(B | ~A) = 10%

And we can also factor our distribution like this:

Factorization 2:
P(B) = 54%
P(A | B) = 90.741%
P(A | ~B) = 2.174%

You can check for yourself that these factorizations are equivalent to our starting joint probability distribution by using the relationship between joint probabilities and conditional probabilities. For example, using Factorization 1:

P(A & ~B)
= P(A) · P(~B | A)
= 50% · 2%
= 1%

Just as expected! If any of this is confusing to you, go back to my last post.
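Here is a quick sketch that derives both factorizations from the joint distribution and rebuilds one of the joint probabilities from them. The function `factor` and its `first` argument are names I made up for this illustration.

```python
# Keys are (A, B) truth values: A = "It is raining", B = "The sidewalk is wet".
joint = {
    (True, True): 0.49,
    (True, False): 0.01,
    (False, True): 0.05,
    (False, False): 0.45,
}


def factor(joint, first):
    """Factor a two-variable joint as P(first) · P(other | first).

    first=0 conditions B on A (Factorization 1).
    first=1 conditions A on B (Factorization 2).
    Returns (P(first), P(other | first), P(other | ~first)).
    """
    def key(x, y):
        # x is the value of the `first` variable, y the value of the other one.
        return (x, y) if first == 0 else (y, x)

    p_first = joint[key(True, True)] + joint[key(True, False)]
    p_other_given_first = joint[key(True, True)] / p_first
    p_other_given_not_first = joint[key(False, True)] / (1 - p_first)
    return p_first, p_other_given_first, p_other_given_not_first


p_a, p_b_given_a, p_b_given_not_a = factor(joint, first=0)
p_b, p_a_given_b, p_a_given_not_b = factor(joint, first=1)

print("Factorization 1:", p_a, p_b_given_a, p_b_given_not_a)
print("Factorization 2:", p_b, round(p_a_given_b, 5), round(p_a_given_not_b, 5))

# Rebuild P(A & ~B) from each factorization.
print(p_a * (1 - p_b_given_a), (1 - p_b) * p_a_given_not_b)
```

Both rebuilt values should come out at 1% (up to floating-point rounding), which is the sense in which the two factorizations are equivalent.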

***

Let’s rewind. What does any of this have to do with causality? Well, the diagram we drew above, in which rain causes sidewalk-wetness, instructs us as to how we should factor our joint probability distribution.

Here are the rules:

If node X has no incoming arrows, you express its probability as P(X).
If a node does have incoming arrows, you express its probability as conditional upon the values of its parent nodes – those from which the arrows originate.

Let’s look back at our diagram for rain and sidewalk-wetness.

A > B

Which representation do we use?

A has no incoming arrows, so we express its probability unconditionally: P(A).

B has one incoming arrow from A, so we express its probability as conditional upon the possible values of A. That is, we use P(B | A) and P(B | ~A).

Which means that we use Factorization 1!

Say that instead somebody tells you that they think the causal relationship between rain and sidewalk-wetness goes the other way. I.e., they believe that the correct diagram is:

A < B

Which factorization would they use?

***

So causal diagrams tell us how to factor a probability distribution over multiple variables. But why does this matter? After all, two different factorizations of a single probability distribution are empirically equivalent. Doesn’t this mean that “A causes B” and “B causes A” are empirically indistinguishable?

Two responses: First, this is only one component of causal models. Other uses of causal models that we will see in the next post will allow us to empirically determine the direction of causation.

And second: in fact, some causal diagrams can be empirically distinguished.

Say that somebody proclaims that there are no causal links between rain and sidewalk-wetness. We represent this as follows:

A X B

What does this tell us about how to express our probability distribution?

Well, A has no incoming arrows, so we use P(A). B has no incoming arrows, so we use P(B).

So let’s say we want to know the chance that it’s raining AND the sidewalk is wet. According to the diagram, we’ll calculate this in the following way:

P(A & B) = P(A) · P(B)

But wait! Let’s look back at our initial distribution:

P(A & B) = 49%
P(A & ~B) = 1%
P(~A & B) = 5%
P(~A & ~B) = 45%

Is it possible to get all of these values from just our two values P(A) and P(B)? No! (proof below)

In other words, our data rules out this causal model.

A X B crossed

***

To summarize: a causal diagram over two variables A and B tells you how to calculate things like P(A & B). It says that you break it into the probabilities of the individual propositions, and that the probability for each variable should be conditional on the possible values of its parents.

Next we’ll look at how we can empirically distinguish between A > B and A < B.

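Proof of dependence: P(A) and P(B) are fixed by the distribution itself. P(A) = 49% + 1% = 50% and P(B) = 49% + 5% = 54%, so the no-arrows model predicts P(A & B) = 50% · 54% = 27%. The data says 49%, so no product of P(A) and P(B) can reproduce it.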

Previous post: Preliminaries

Next post: Causal intervention

Causality: Preliminaries

December 21, 2017February 8, 2018 ~ squarishbracket ~ 6 Comments


One revolution in my thinking was Bayesianism – applying probability theory to beliefs. This has been thoroughly covered in self-contained series at all levels of accessibility elsewhere.

A more recent revolution in my thinking is causal modeling – using graphical networks to model causal relationships. There appears to be a lack of good online explanations of these tools for reasoning, so it seems worthwhile to create one.

My goal here is not to make you an expert in all things causal, but to pass on the key insights that have modified my thinking. Let’s get started!

***

Much of the framework of causal modeling relies on an understanding of probability theory. So in this first post, I’ll establish the basics that will be used in later posts. If you know how to factor a joint probability distribution, then you can safely skip this.

We’ll label propositions like “The movie has started” with the letters A, B, C, etc. Probability theory is about assigning probabilities to these propositions. A probability is a value between 0 and 1, where 0 is complete confidence that the statement is false and 1 is complete confidence that it is true.

Some notation:

The probability of A = P(A)
The negation of A = ~A
The joint probability that both A and B are true = P(A & B)
The conditional probability of A, given that B is true = P(A | B)

There are just five important things you need to know in order to understand the following posts:

1. P(A & B) = P(A | B) · P(B)
2. P(A) + P(~A) = 1
3. A and B are independent if and only if P(A | B) = P(A). Otherwise, A and B are called dependent.
4. A joint probability distribution over statements is an assignment of probabilities to all possible truth-values of those statements.
5. A factorization of a joint probability distribution is a way to break down the joint probabilities into products of probabilities of individual statements.

#1 should make some sense. To see how likely it is that A and B are both true, you can first calculate how likely it is that A is true given that B is true, then multiply by the chance that B is true. You can think of this as breaking a question about the probability of both A and B into two questions:

1. In a world in which B is true, how likely is it that A is true?
and 2. How likely is it that we are in that world where B is true?

#2 is just the idea that a proposition must be either true or false, and not both. This is the type of thing that sounds trivial, but ends up being extremely important for manipulating probabilities. For instance, it is also true that a proposition must be true or false and not both, given some other proposition. This means that the conditional probabilities P(B | A) and P(~B | A) must sum to 1 as well. From this we find that P(A) = P(A & B) + P(A & ~B). We’ll use this last identity often.

#3 is a definition of the terms dependence and independence. If two statements are independent, then the truth of one makes no difference to the probability of the other. It also follows from #1 that if A and B are independent, then P(A & B) = P(A) · P(B). A lot of analysis of causality will be done by looking at probabilistic dependencies, so make sure that this makes sense.

I’ll explain #4 with a simple example. The possible truth-values of two variables A and B are the following:

Both are true: A & B
A is true, and B is false: A & ~B
A is false and B is true: ~A & B
Both are false: ~A & ~B

To specify the joint distribution, we assign probabilities to each of these. For instance:

P(A & B) = .25
P(A & ~B) = .25
P(~A & B) = .30
P(~A & ~B) = .20

In this case, the joint distribution is a set of four different joint probabilities.

And finally, #5 is a definition of factorization. We turn joint distributions into products of individual probabilities by using #1. For instance, one factorization of the joint distribution over A and B uses:

P(A & B) = P(A) · P(B | A)
P(A & ~B) = P(A) · P(~B | A)
P(~A & B) = P(~A) · P(B | ~A)
P(~A & ~B) = P(~A) · P(~B | ~A)

We can see that in order to express all four joint probabilities, we need to know the values of six probabilities. But as a result of #2, we only need to know three of them to find all six. If we specify P(A), P(B | A), and P(B | ~A), then we know the values of P(~A), P(~B | A) and P(~B | ~A). These three probabilities are the factors in our factorization.

P(A)
P(B | A)
P(B | ~A)

One last thing to notice is that our joint distribution of A and B could have been factored in another way. This comes from the fact that we could use #1 to break down P(A & B), or equivalently to break down P(B & A). If we had done the second, then our factors would be P(B), P(A | B), and P(A | ~B).

And that’s everything!

***

Examples!

We’ll apply all this by looking at one factorization of a joint probability distribution over three statements. With three statements, there are eight possible worlds:

A & B & C        A & B & ~C
A & ~B & C       A & ~B & ~C
~A & B & C       ~A & B & ~C
~A & ~B & C      ~A & ~B & ~C

The joint distribution over A, B and C is an assignment of probabilities to each of these worlds.

P(A & B & C)        P(A & B & ~C)
P(A & ~B & C)       P(A & ~B & ~C)
P(~A & B & C)       P(~A & B & ~C)
P(~A & ~B & C)      P(~A & ~B & ~C)

To factor our joint distribution, we just use Idea #1 twice, treating “B & C” as a single statement the first time:

P(A & B & C)
= P(A | B&C) · P(B&C)
= P(A | B&C) · P(B | C) · P(C)

This tells us that the factors we need to specify are:

P(C),
P(B | C), P(B | ~C),
P(A | B & C), P(A | B & ~C), P(A | ~B & C), and P(A | ~B & ~C)
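As a sanity check, here is a sketch that computes these seven factors for a three-statement joint distribution and confirms that multiplying them back together recovers all eight joint probabilities. The joint distribution is made up purely for illustration, and the helper names are mine.

```python
from itertools import product

# Hypothetical joint over (A, B, C): arbitrary weights, normalized to sum to 1.
weights = [1, 2, 3, 4, 5, 6, 7, 8]
worlds = list(product([True, False], repeat=3))
joint = {world: w / sum(weights) for world, w in zip(worlds, weights)}


def prob(a=None, b=None, c=None):
    """Probability of the stated values; None means 'sum over both values'."""
    return sum(
        p
        for (wa, wb, wc), p in joint.items()
        if a in (None, wa) and b in (None, wb) and c in (None, wc)
    )


# The seven factors: P(C), P(B | C), P(B | ~C), and the four P(A | B, C) values.
p_c = prob(c=True)
p_b_given_c = {c: prob(b=True, c=c) / prob(c=c) for c in (True, False)}
p_a_given_bc = {
    (b, c): prob(a=True, b=b, c=c) / prob(b=b, c=c)
    for b, c in product([True, False], repeat=2)
}


def rebuilt(a, b, c):
    """Recover a joint probability from the factors, using P(~X | ...) = 1 - P(X | ...)."""
    pa = p_a_given_bc[(b, c)] if a else 1 - p_a_given_bc[(b, c)]
    pb = p_b_given_c[c] if b else 1 - p_b_given_c[c]
    pc = p_c if c else 1 - p_c
    return pa * pb * pc


n_factors = 1 + len(p_b_given_c) + len(p_a_given_bc)
all_match = all(abs(rebuilt(*w) - joint[w]) < 1e-12 for w in worlds)
print(n_factors, "factors; rebuilds the joint:", all_match)
```

Seven numbers are enough because the eight joint probabilities must sum to 1, which leaves seven degrees of freedom.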

***

One last application, this time with actual numbers. Let’s revisit our earlier distribution:

P(A & B) = .25
P(A & ~B) = .25
P(~A & B) = .3
P(~A & ~B) = .2

To factor this distribution, we must find P(A), P(B | A), and P(B | ~A).

We’ll start by finding P(A) using #2.

Since P(B | A) + P(~B | A) = 1, P(A & B) + P(A & ~B) = P(A).

This means that P(A) = .5

We can now use #1 to find our remaining two numbers.

Plugging in values to P(A & B) = P(A) · P(B | A) and P(~A & B) = P(~A) · P(B | ~A), we have:

.25 = .5 · P(B | A)
.3 = .5 · P(B | ~A)

Therefore, P(B | A) = .5 and P(B | ~A) = .6

Next: Causal arrows

Low-hanging policy fruit

December 20, 2017March 2, 2018 ~ squarishbracket ~ 8 Comments

(Note: none of this is original, just a repackaging of others’ ideas)

A question of great practical importance is “What can I do to improve the world?”

In this post I want to talk about a different but related question: How confident should I be that my ideas for how to improve the world are actually good ideas?

The cynic says something like: “Don’t be naive. The real world is complicated, and there’s almost surely some complex reason that you don’t understand that would make your idea fail spectacularly upon attempted implementation. Besides, there are millions of people out there that are smarter and more knowledgeable than you, and some of them have most likely already thought of your idea. Maybe, if you’re really lucky or really really bright, you might have one or two truly original and not terrible ideas in your life, but I wouldn’t bet on it.”

I don’t want to straw man this perspective, because I think it is right in some really important ways. There is a sense in which perceived low hanging policy fruit is similar to perceived $100 bills lying in the middle of the sidewalk – if it wasn’t some type of trick, you’d better believe that somebody else would have picked them up by now.

And yet…

***

Overfishing removes tens of billions of dollars from global GDP every year. It permanently destroys fish stocks, the livelihoods of fishermen, and seaside communities. And, well… we’ve known how to solve this problem for decades. It’s a classic tragedy of the commons. The standard solutions that you’ll find in an introductory economics textbook are: privatize the common resource, regulate the market through legally enforceable agreements, or tax/subsidize the market to incentivize sustainable fishing.

These are not just good in theory – they actually work. Catch share programs like individual transferable quotas (ITQs) are clever combinations of privatization and regulation – there is a legally enforced fishing quota and individual fishermen own percentages of this quota. These policies have been tried in about a hundred fisheries, and when they are tried they not only stop the trend of overfishing but even reverse it.

Regulation through marine protection programs that temporarily halt activity in heavily fished areas to let them recover could save up to $920 billion of otherwise lost value by 2050. And in 2010, researchers studying subsidies to fisheries found that “the single action of eliminating fuel subsidies could potentially be the most influential factor in stemming the trend of overfishing”.

All of these solutions are perfectly obvious and commonsensical. Want to end overfishing? Stop subsidizing the overfishers and tax them instead, enforce sustainable fishing practices, and protect overfished areas. So if you thought that by applying some basic economics and common sense, you could do better than most of the world’s governments and fisheries over the last century, you would be completely correct.

But then we come back to our $100 bill thought experiment and the cynical argument. Surely the world consists of people that are plenty incentivized to save marine ecosystems, bring in billions of dollars to the country’s economy, and save the livelihoods of fishermen. And surely some of those people know about the policies I’ve just described and have the power to implement them. But overfishing continues as ever, destroying fish populations and draining money from the economy. So what gives?

***

It’s not that the cynic is wrong, it’s just that there is more to be said.

I want to develop the $100 bill analogy some more. When in fact should we expect that you could successfully discover a real $100 bill lying on the floor? Here are some questions whose answers would be important to know:

Are there other people that can see the $100 bill?
Do they realize what it is (that is, money)?
Do they want money?
Could they take the $100 bill?

If the answer is “yes” to all four questions, then the bill is probably a realistic sidewalk painting, or a hallucination, or a prank bill on a fishing line held by some impish teenagers in the nearby bushes. If others can see the $100 bill, know what it is, want it, and are capable of taking it, then it’s probably going to be picked up very quickly.

On the other hand, if the answer is “no” to any of the questions, then the bill is likely real and soon to be yours. All you need is one break in the chain of conditions for the conclusion to not obtain. So, for instance, if everybody is blind, it’s not too surprising that the bill is lying there. Similarly for a society in which nobody has ever seen paper money, or the people are all ascetics, or they are all incapable of bending over to pick it up.

***

Let’s bring this back to our starting question. Say that you’ve thought of an apparently brilliant policy P that solves an important issue I. When should we expect that P is actually a solution to I? Here are the analogous four questions you should ask yourself:

1. Could other people have thought of P?
2. Would they be able to tell if it were a solution to I?
3. Do they want to solve I?
4. Could they implement P?

We can call this our taxonomy of inadequacy, if we feel fancy. If all of these questions are answered in the affirmative, then we should expect that the policy would have already been implemented if it were actually a solution to I.

At the risk of being redundant, here’s an image:

Adequacy pic

The intersection of these four circles is the set of people that you’d expect to have implemented P, if it were actually a good solution to I. The larger this set is, the more suspicious you should be of your idea.

***

So let’s apply this!

Why is overfishing not solved? Probably because of #4.

People in positions of influence know how to stop overfishing and some would even like to do so. But they have to worry about the influence of the fishing lobby, as well as their approval ratings among the coastal communities that would be temporarily disadvantaged by policies like fishing quotas, marine protected areas, and higher taxes. Sure it’d be better for everybody in the long run, but voters have a hard time accepting short-term losses for long-term gains. So although we can all see ridiculously low-hanging policy fruit, and enough of us care about solving the problem, nobody in power is actually able to implement these policies due to the nature of the system they exist in.

Another example! Everybody agrees that first-past-the-post (FPTP) voting is about the worst voting system out there. It encourages gerrymandering, dooms third parties, and forces smart voters to vote against their preferences. And we know of more sane voting systems! So why are we still stuck with our horrible system?

Well, those that are currently in power are exactly those that have benefited from FPTP. And if a third party came into being that wanted to change the voting system… well FPTP dooms third parties. So we’re stuck. In terms of our taxonomy of inadequacy, this is a combination of #3 and #4 – those that are in power don’t want to change the voting system, and those that want to change the voting system are unable to get in power.

***

What I like about this way of thinking is that you can start with “Hey, this sounds like a good way to solve problem X!” and end up understanding the deep structure of our society, seeing the way that this gigantic beast we call civilization functions and the inadequacies that result.

There’s a lot more to be said about this, but I will leave it for future posts.