Six Case Studies in Consequentialist Reasoning

Consequentialism is a family of moral theories that say that an act is moral or immoral based on its consequences. If an act has overall good consequences then it is moral, and if it has bad consequences then it is immoral. What precisely counts as a “good” or “bad” consequence is what distinguishes one consequentialist theory from another. For instance, act utilitarians say that the only morally relevant feature of the consequences of our actions is the aggregate happiness and suffering produced, while preference utilitarians say that the relevant feature of the consequences is the number and strength of desires satisfied. Another form of consequentialism might strike a balance between aggregate happiness and social equality.

What all these different consequentialist theories have in common is that the ultimate criterion used to evaluate the moral status of an action is only a function of the consequences of that action, as opposed to, say, the intentions behind the action, or whether the action is an instance of a universalizable Kantian rule.

In this essay, we’ll explore some puzzles in consequentialist theories that force us to take a more nuanced and subtle view of consequentialism. These puzzles are all adapted from Derek Parfit’s Reasons and Persons, with very minor changes.

First, we’ll consider a simple puzzle regarding how exactly to evaluate the consequences of one’s actions, when one is part of a collective that jointly accomplishes some good.

Case 1: There are 100 miners stuck in a mineshaft with flood waters rising. These men can be brought to the surface in a lift raised by weights on long levers. The leverage is such that just four people can stand on a platform and provide sufficient weight to raise the lift and save the lives of the hundred men. But if any fewer than four people stand on the platform, it will not be enough to raise the lift. As it happens, you and three other people are standing there. The four of you stand on the platform, raising the lift and saving the lives of the hundred men.

The question for us to consider is, how many lives did you save by standing on the platform? The answer to this question matters, because to be a good consequentialist, each individual needs to be able to compare their contribution here to the contribution they might make by going elsewhere. As a first thought, we might say that you saved 100 lives by standing on the platform. But the other three people were in the same position as you, and it seems a little strange to say that all four of you saved 100 lives each (since there weren’t 400 lives saved total). So perhaps we want to say that each of you saved one quarter of the total: 25 lives each.

Parfit calls this the Share-of-the-Total View. We can characterize this view as saying that in general, if you are part of a collective of N people who jointly save M lives, then your share of lives saved is M/N.

There are some big problems with this view. To see this, let’s amend Case 1 slightly by adding an opportunity cost.

Case 2: Just as before, there are 100 miners stuck in a mineshaft with flood waters rising, and they can be saved by four or more people standing on a platform. This time though, you and four other people happen to be standing there. The other four are going to stand on the platform no matter what you do. Your choice is either to stand on the platform, or to go elsewhere to save 10 lives. What should you do?

The correct answer here is obviously that you should leave to save the 10 lives. The 100 miners will be saved whether you stay or leave, and the 10 lives will be lost if you stick around. But let’s consider what the Share-of-the-Total View says. According to this view, if you stand on the platform, your share of the lives saved is 100/5 = 20. And if you leave to go elsewhere, you only save 10 lives. So you save more lives by staying and standing on the platform!

This is a reductio of the Share-of-the-Total View. We must revise this view to get a sensible consequentialist theory. Parfit’s suggestion is that we say that when you join others who are doing good, the good that you do is not just your own share of the total benefit. You should also add to your share the change that your joining caused in the shares of each of the others. On their own, the four would each have a share of 25 lives. By joining, you get a share of 20 lives, but you also reduce the share of each of the other four by 5 lives. In other words, by joining, you have saved 20 – (4 × 5) = 0 lives. And of course, this is the right answer, because you have done nothing at all by stepping onto the platform!

Applying our revised view to Case 1, we see that if you hadn’t stepped onto the platform, zero lives would be saved. By stepping onto the platform, 100 lives are saved. So your share of those lives is 25, plus 25 lives for each of the others that would have had zero without you. So your share is actually 100 lives! The same applies to the others, so in our revised view, each of the four is responsible for saving all 100 lives. Perhaps on reflection this is not so unintuitive; after all, it’s true for each of them that if they change their behavior, 100 lives are lost.
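The arithmetic for the two views can be checked with a short sketch. The function names here are hypothetical, and the code assumes the setup of Cases 1 and 2 (100 miners, at least four people needed on the platform). Under these assumptions the revised share works out to the difference between the total saved with you and the total saved without you.

```python
def lives_saved(num_on_platform, miners=100, needed=4):
    """Miners rescued, given how many people stand on the platform."""
    return miners if num_on_platform >= needed else 0


def share_of_total(num_on_platform):
    """Share-of-the-Total View: the lives saved are split equally."""
    if num_on_platform == 0:
        return 0.0
    return lives_saved(num_on_platform) / num_on_platform


def revised_share(num_others):
    """Revised view: your own share, plus the change that your joining
    causes in the share of each of the others."""
    with_you = share_of_total(num_others + 1)
    without_you = share_of_total(num_others)
    return with_you + num_others * (with_you - without_you)


print(revised_share(3))  # Case 1: you and three others
print(revised_share(4))  # Case 2: you and four others
```

For Case 1 this gives 100 and for Case 2 it gives 0, matching the figures in the text.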

Case 3: Just as in Case 2, there are 100 miners stuck in a mineshaft. You and four others are standing on the platform while the miners are slowly being raised up. Each of you know of an opportunity to save 10 lives elsewhere (a different 10 lives for each of you), but to successfully save the lives you have to leave immediately, before the miners are rescued. The five of you have to make your decision right away, without communicating with each other.

We might think that if each of the five of you reasons as before, each of you will go off and save the other 10 lives (as by staying, each sees that they are saving zero lives). In the end, 50 lives will be saved and 100 lost. This is not good! But in fact, it’s not totally clear that this is the fault of our revised view. The problem here is lack of information. If each of the five knew what the other four planned on doing, then they would make the best decision (if all four planned to stay then the fifth would leave, and if exactly one of the other four planned to leave then the fifth would stay). As things stand, perhaps the best outcome would be that all five stay on the platform (losing the opportunity to save 10 extra lives, but ensuring the safety of the 100). If they can use a randomized strategy, then the optimal strategy is for each to stay on the platform with probability 97.2848% (saving an expected 100.66 lives).
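The randomized-strategy figure can be reproduced numerically. The sketch below is a reconstruction rather than the original calculation. It assumes that all five people use the same stay probability p and decide independently, that the 100 miners are saved exactly when at least four stay, and that each person who leaves saves 10 lives. The function names are hypothetical.

```python
from math import comb


def expected_lives(p, miners=100, elsewhere=10, people=5, needed=4):
    """Expected lives saved when each person independently stays with probability p."""
    # Probability that at least `needed` of the `people` stay on the platform.
    rescue = sum(comb(people, k) * p**k * (1 - p) ** (people - k)
                 for k in range(needed, people + 1))
    # Each person who leaves saves `elsewhere` lives.
    return miners * rescue + elsewhere * people * (1 - p)


def best_stay_probability():
    """Coarse grid search over [0, 1], then ternary search around the best grid point."""
    coarse = max((i / 1000 for i in range(1001)), key=expected_lives)
    lo, hi = max(0.0, coarse - 0.001), min(1.0, coarse + 0.001)
    for _ in range(100):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if expected_lives(m1) < expected_lives(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


p_star = best_stay_probability()
print(round(p_star, 6), round(expected_lives(p_star), 2))
```

Under these assumptions the maximum lies at roughly p = 0.9728 with about 100.66 expected lives, compared with exactly 100 if everyone stays and 50 if everyone leaves.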

Let’s move on to another type of scenario.

Case 4: X and Y simultaneously shoot and kill me. Either shot, by itself, would have killed.

The consequence of X’s action is not that I die, because if X had not shot, I would have died by Y’s bullet. And the same goes for Y. So if we’re evaluating the morality of X or Y’s action based on its consequences, it seems that we have to say that neither one did anything immoral. But of course, the two of them collectively did do something immoral by killing me. What this tells us is that the consequentialist’s creed cannot be “an act is immoral if its consequences are bad”, as an act can also be immoral if it is part of a set of acts whose collective consequences are bad.

Inheriting immorality from group membership has some problems, though. X and Y collectively did something immoral. But what about the group X, Y, and Barack Obama, who was napping at home when this happened? The collective consequences of their actions were bad as well. So did Obama do something immoral too? No. We need to restrict our claim to the following:

“When some group together harm or benefit other people, this group is the smallest group of whom it is true that, if they had all acted differently, the other people would not have been harmed, or benefited.” -Parfit
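Parfit’s criterion can be read as a small search problem. The sketch below is one illustrative way to apply it to Case 4. It simplifies “acting differently” to a single alternative action per person, and the action labels and function names are hypothetical.

```python
from itertools import combinations

# What each person actually did, and one alternative action each (an assumption).
ACTUAL = {"X": "shoot", "Y": "shoot", "Obama": "nap"}
ALTERNATIVE = {"X": "hold fire", "Y": "hold fire", "Obama": "stay awake"}


def victim_dies(actions):
    """Case 4: either shot, by itself, would have killed."""
    return actions["X"] == "shoot" or actions["Y"] == "shoot"


def smallest_responsible_groups(actual, alternative, harm):
    """Return the smallest groups such that, had all members acted
    differently, the harm would not have occurred."""
    agents = sorted(actual)
    for size in range(1, len(agents) + 1):
        groups = []
        for group in combinations(agents, size):
            changed = dict(actual)
            for agent in group:
                changed[agent] = alternative[agent]
            if not harm(changed):
                groups.append(group)
        if groups:
            return groups  # stop at the first size that works
    return []


print(smallest_responsible_groups(ACTUAL, ALTERNATIVE, victim_dies))
```

Neither X nor Y alone qualifies, since the other shot would still kill. The pair X and Y is the smallest group that does, and Obama never enters it.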

A final scenario involves the morality of actions that produce imperceptible consequences.

Case 5: One million torturers stand in front of one million buttons. Each button, if pushed, induces a tiny stretch in each of a million racks, each of which has a victim on it. The stretch induced by a single press of the button is so minuscule that it is imperceptible. But the stretch induced by a million button presses produces terrible pain in all the victims.

Clearly we want to say that each torturer is acting immorally. But the problem is that the consequences of each individual torturer’s action are imperceptible! It’s only when enough of the torturers press the button that the consequence becomes perceptible. So what we seem to be saying is that it’s possible to act immorally, even though your action produces no perceptible change in anybody’s conscious experience, if your action is part of a collection of actions that together produce negative changes in conscious experiences.

This is already unintuitive. But we can make it even worse.

Case 6: Consider the final torturer of the million. At the time that he pushes his button, the victims are all in terrible agony, and his press doesn’t make their pain any perceptibly worse. Now, imagine that instead of there being 999,999 other torturers, there are zero. There is just the one torturer, and the victims have all awoken this morning in immense pain, caused by nobody in particular. The torturer presses the button, causing no perceptible change in the victims’ conditions. Has the torturer done something wrong?

It seems like we have to say the same thing about the torturer in Case 6 as we did in Case 5. The only change is that Nature has done the rest of the harm instead of other human beings, but this can’t matter for the morality of the torturer’s action. But if we believe this, then the scope of our moral concerns is greatly expanded, to a point that seems nonsensical. My temptation here is to say “all the worse for consequentialism, then!” and move to a theory that inherently values intentions, but I am curious if there is a way to make a consequentialist theory workable in light of these problems.


Sapiens: How Shared Myths Change the World

I recently read Yuval Noah Harari’s book Sapiens and loved it. In addition to fascinating and disturbing details about the evolutionary history of Homo sapiens and a wonderful account of human history, he has a really interesting way of talking about the cognitive abilities that make humans distinct from other species. I’ll dive right into this latter topic in this post.

Imagine two people in a prisoner’s dilemma. To try to make it relevant to our ancestral environment, let’s say that they are strangers running into one another, and each sees that the other has some resources. There are four possible outcomes. First, they could both cooperate and team up to catch some food that neither would be able to get on their own, and then share the food. Second, they could both defect, attacking each other and both walking away badly injured. And third and fourth, one could cooperate while the other defects, corresponding to one of them stabbing the other in the back and taking their resources. (Let’s suppose that each of the two is currently holding resources of more value than they could obtain by teaming up and hunting.)

Now, the problem is that on standard accounts of rational decision making, the decision that maximizes expected reward for each individual is to defect. That’s bad! The best outcome for everybody is that the two team up and share the loot, and neither walks away injured!

You might just respond “Well, who cares about what our theory of rational decision making says? Humans aren’t rational.” We’ll come back to this in a bit. But for now I’ll say that the problem is not just that our theory of rationality says that we should defect. It’s that this line of reasoning implies that cooperating is an unstable strategy. Imagine a society fully populated with cooperators. Now suppose an individual appears with a mutation that causes them to defect. This defector outperforms the cooperators, because they get to keep stabbing people in the back and stealing their loot and never have to worry about anybody doing the same to them. The result is then that the “gene for defecting” (speaking very metaphorically at this point; the behavior doesn’t necessarily have to be transmitted genetically) spreads like a virus through the population, eventually transforming our society of cooperators to a society of defectors. And everybody’s worse off.

On the other hand, imagine a society full of defectors. What if a cooperator is born into this society? Well, they pretty much right away get stabbed in the back and die out. So a society of defectors stays a society of defectors, and a society of cooperators degenerates into a society of defectors. The technical way of speaking about this is to say that in prisoner’s dilemmas, universal cooperation is not a Nash equilibrium – a set of strategies from which nobody can do better by unilaterally switching – and is therefore not stable against mutations when universally adopted. The only Nash equilibrium is universal defection.
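The invasion story can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the payoff values (R for mutual cooperation, S for being betrayed, T for betraying, P for mutual defection) are hypothetical, the population is well mixed, and each strategy’s share grows in proportion to its average payoff (a discrete replicator update).

```python
def next_cooperator_share(x, R=3.0, S=0.0, T=5.0, P=1.0):
    """One generation of a discrete replicator update.

    x is the current fraction of cooperators; the payoffs are hypothetical.
    """
    fitness_c = x * R + (1 - x) * S  # average payoff of a cooperator
    fitness_d = x * T + (1 - x) * P  # average payoff of a defector
    mean_fitness = x * fitness_c + (1 - x) * fitness_d
    return x * fitness_c / mean_fitness


def simulate(x0, generations):
    """Track the cooperator share over time, starting from x0."""
    history = [x0]
    for _ in range(generations):
        history.append(next_cooperator_share(history[-1]))
    return history


# Start with 99% cooperators and a small group of defecting mutants.
history = simulate(0.99, 50)
print(history[0], history[10], history[-1])
```

With these numbers the cooperator share falls every generation and ends up effectively at zero, while a population with no defectors at all (x0 = 1.0) stays put.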

Okay, so this is all bad news. We have good game theoretic reasons to expect society to degenerate into a bunch of people stabbing each other in the back. But mysteriously, the record of history has humans coming together to form larger and larger cooperative institutions. What Yuval Noah Harari and many others argue is that the distinctively human force that saves us from these game theoretic traps and creates civilizations is the power of shared myths.

For instance, suppose that the two strangers happened to share a belief in a powerful all-knowing God that punishes defectors in the afterlife and rewards cooperators. Think about how this shifts the reasoning. Now each person thinks “Even if I successfully defect and loot this other person’s resources, I still will have hell to pay in the afterlife. It’s just not worth it to risk incurring God’s wrath! I’ll cooperate.” And thus we get a cooperative equilibrium!

Still you might object “Okay, but what if an atheist is born into this society of God-fearing cooperative people? They’ll begin defecting and successfully spread through the population, right? And then so much for your cooperative equilibrium.”

The superbly powerful thing about these shared myths is the way in which they can restructure society around them. So for instance, it would make sense for a society with the defector-punishing God myth to develop social norms around punishing defectors. The mythical punishment becomes an actual real-world punishment by the myth’s adherents. And this is enough to tilt the game-theoretic balance even for atheists.
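To see how a real-world punishment can tilt the balance, here is a minimal sketch that enumerates the pure-strategy Nash equilibria of a two-player prisoner’s dilemma, with and without a fixed penalty on defectors. The payoff numbers and the size of the penalty are hypothetical, chosen only so that the unpunished game is a prisoner’s dilemma.

```python
from itertools import product

# Hypothetical payoffs (row player, column player); "C" = cooperate, "D" = defect.
BASE = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
        ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def payoffs_with_punishment(punishment):
    """Subtract a fixed penalty from the payoff of anyone who defects."""
    table = {}
    for (a, b), (pay_a, pay_b) in BASE.items():
        table[(a, b)] = (pay_a - punishment * (a == "D"),
                         pay_b - punishment * (b == "D"))
    return table


def pure_nash_equilibria(table):
    """Strategy pairs where neither player gains by switching unilaterally."""
    equilibria = []
    for a, b in product("CD", repeat=2):
        pay_a, pay_b = table[(a, b)]
        a_is_best = all(table[(alt, b)][0] <= pay_a for alt in "CD")
        b_is_best = all(table[(a, alt)][1] <= pay_b for alt in "CD")
        if a_is_best and b_is_best:
            equilibria.append((a, b))
    return equilibria


print(pure_nash_equilibria(payoffs_with_punishment(0)))
print(pure_nash_equilibria(payoffs_with_punishment(3)))
```

With no punishment the only equilibrium is mutual defection. With a penalty of 3 the only equilibrium is mutual cooperation, and this holds whether or not the players believe the myth that motivated the punishment.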

The point being: The spreading of a powerful shared myth can shift the game theoretic structure of the world, altering the landscape of possible social structures. What’s more, such myths can increase the overall fitness of a society. And we need not rely on group selection arguments here; the presence of the shared myth increases the fitness of every individual.

A deeper point is that the specific way in which the landscape is altered depends on the details of the shared myth. So if we contrast the God myth above to a God that punishes defectors but also punishes mortals who punish defectors, we lose the stability property that we sought. The suggestion being: different ideas alter the game theoretic balance of the world in different ways, and sometimes subtle differences can be hugely important.

Another take-away from this simple example is that shared myths can become embodied within us, both in our behavior and in our physiology. Thus we come back to the “humans aren’t rational” point: The cooperator equilibrium becomes more stable if the God myth somehow becomes hardwired into our brains. These ideas take hold of us and shape us in their image.

Let’s go further into this. In our sophisticated secular society, it’s not too controversial to refer to the belief in all-good and all-knowing gods as a myth. But Yuval Noah Harari goes further. To him, the concept of the shared myth goes much deeper than just our ideas about the supernatural. In fact, most of our native way of viewing the world consists of a network of shared myths and stories that we tell one another.

After all, the universe is just physics. We’re atoms bumping into one another. There are no particles of fairness or human rights, no quantum fields for human meaning or karmic debts. These are all shared myths. Economic systems consist of mostly shared stories that we tell each other, stories about how much a dollar bill is worth and what the stock price of Amazon is. None of these things are really out there in the world. They are in our brains, and they are there for an important reason: they open up the possibility for societal structures that would otherwise be completely impossible. Imagine having a global trade network without the shared myth of the value of money. Or a group of millions of humans living packed together in a city that didn’t all on some level believe in the myths of human value and respect.

Just think about this for a minute. Humans have this remarkable ability to radically change our way of interacting with one another and our environments by just changing the stories that we tell one another. We are able to do this because of two features of our brains. First, we are extraordinarily creative. We can come up with ideas like money and God and law and democracy and whole-heartedly believe in them, to the point that we are willing to sacrifice our lives for them. Second, we are able to communicate these ideas to one another. This allows the ideas to spread and become shared myths. And most remarkably, all of these ideas (capitalism and communism, democracy and fascism) are running on essentially the same hardware! In Harari’s words:

While the behaviour patterns of archaic humans remained fixed for tens of thousands of years, Sapiens could transform their social structures, the nature of their interpersonal relations, their economic activities and a host of other behaviours within a decade or two. Consider a resident of Berlin, born in 1900 and living to the ripe age of one hundred. She spent her childhood in the Hohenzollern Empire of Wilhelm II; her adult years in the Weimar Republic, the Nazi Third Reich and Communist East Germany; and she died a citizen of a democratic and reunited Germany. She had managed to be a part of five very different sociopolitical systems, though her DNA remained exactly the same.

Against moral realism

Here’s my primary problem with moral realism: I can’t think of any acceptable epistemic framework that would give us a way to justifiably update our beliefs in the objective truth of moral claims. I.e. I can’t think of any reasonable account of how we could have justified beliefs in objectively true moral principles.

Here’s a sketch of a plausible-seeming account of epistemology. Broad-strokes, there are two sources of justified belief: deduction and induction.

Deduction refers to the process by which we define some axioms and then see what logically follows from them. So, for instance, the axioms of Peano Arithmetic entail the theorem that 1+1=2 – or, in Peano’s language, S(0) + S(0) = S(S(0)). The central reason why reasoning by deduction is reliable is that the truths established are true by definition – they are made true by the way we have constructed our terms, and are thus true in every possible world.
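As a worked example, the theorem follows in two steps from the usual recursive definition of addition, a + 0 = a and a + S(b) = S(a + b). The labels (A1) and (A2) are added here only for reference.

```latex
\begin{align*}
\text{(A1)}\quad & a + 0 = a \\
\text{(A2)}\quad & a + S(b) = S(a + b) \\[4pt]
S(0) + S(0) &= S\bigl(S(0) + 0\bigr) && \text{by (A2) with } a = S(0),\ b = 0 \\
            &= S\bigl(S(0)\bigr)     && \text{by (A1) with } a = S(0)
\end{align*}
```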

Induction is scientific reasoning – it is the process of taking prior beliefs, observing evidence, and then updating these beliefs (via Bayes’ rule, for instance). The central reason why induction is reliable comes from the notion of causal entanglement. When we make an observation and update our beliefs based upon this observation, the brain state “believes X” has become causally entangled with the truth of the statement X. So, for instance, if I observe a positive result on a pregnancy test, then my belief in the statement “I am pregnant” has become causally entangled with the truth of the statement “I am pregnant.” It is exactly this that justifies our use of induction in reasoning about the world.
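The pregnancy-test update can be made concrete with Bayes’ rule. The numbers below (a 10% prior, a test that comes back positive 99% of the time when pregnant and 5% of the time when not) are made up purely for illustration, and the function name is hypothetical.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule: P(X | evidence) from P(X) and the two likelihoods."""
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1 - prior))


# Hypothetical numbers: prior of 0.10, positive rate of 0.99 if pregnant,
# positive rate of 0.05 if not pregnant.
print(posterior(0.10, 0.99, 0.05))
```

With these numbers the belief rises from 0.10 to about 0.69. The update is only as large as the gap between the two likelihoods: a test that was equally likely to read positive either way would leave the prior untouched, which is the sense in which the belief must be entangled with the fact.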

Now, where do moral claims fall? They are not derived from deductive reasoning… that is, we cannot just arbitrarily define right and wrong however we like, and then derive morality from these definitions.

And they are also not truths that can be established through inductive reasoning; after all, objective moral truths are not the types of things that have any causal effects on the world.

In other words, even if there are objective moral truths, we would have no way of forming justified beliefs about them. To my mind, this is a pretty devastating situation for a moral realist. Think about it like this: a moral realist who doesn’t think that moral truths have causal power over the world must accept that all of their beliefs about morality are completely causally independent of their truth. If we imagine keeping all the descriptive truths about the world fixed, and only altering the normative truths, then none of the moral realist’s moral beliefs would change.

So how do they know that they’re in the world where their moral beliefs actually do align with the moral reality? Can they point to any reason why their moral beliefs are more likely to be true than any other moral statements? As far as I can tell, no, they can’t!

Now, you might just object to the particular epistemology I’ve offered up, and suggest some new principle by which we can become acquainted with moral truth. This is the path of many professional philosophers I have talked to.

But every attempt that I’ve heard of for doing this begs the question or resorts to just gesturing at really deeply held intuitions of objectivity. If you talk to philosophers, you’ll hear appeals to a mysterious cognitive ability to reflect on concepts and “detect their intrinsic properties”, even if these properties have no way of interacting with the world, or elaborate descriptions of the nature of “self-evident truths.”

None of this deals with the central issue in moral epistemology, as I see it. This central issue is: How can a moral realist think that their beliefs about morality are any more likely to be true than any random choice of a moral framework?

Constructing the world

In this six and a half hour lecture series by David Chalmers, he describes the concept of a minimal set of statements from which all other truths are a priori “scrutable” (meaning, basically, in-principle knowable or derivable).

What are the types of statements in this minimal set required to construct the world? Chalmers offers up four categories, and abbreviates this theory PQIT.


P is the set of physical facts (for instance, everything that would be accessible to a Laplacean demon). It can be thought of as essentially the initial conditions of the universe and the laws governing their changes over time.


Q is the set of facts about qualitative experience. We can see Chalmers’ rejection of physicalism here, as he doesn’t think that Q is eclipsed within P. Example of a type of statement that cannot be derived from P without Q: “There is a beige region in the bottom right of my visual field.”


Here’s a true statement: “I’m in the United States.” Could this be derivable from P and Q? Presumably not; we need another set of indexical truths that allows us to have “self-locating” beliefs and to engage in anthropic reasoning.


Suppose that P, Q, and I really are able to capture all the true statements there are to be captured. Well then, the statement “P, Q, and I really are able to capture all the true statements there are to be captured” is a true statement, and it is presumably not captured by P, Q, and I! In other words, we need some final negative statements that tell us that what we have is enough, and that there are no more truths out there. These “that’s all”-type statements are put into the set T.


So this is a basic sketch of Chalmers’ construction. I like that we can use these tags like PQIT or PT or QIT as a sort of philosophical zip-code indicating the core features of a person’s philosophical worldview. I also want to think about developing this further. What other possible types of statements are there out there that may not be captured in PQIT? Here is a suggestion for a more complete taxonomy:

p    microphysics
P    macrophysics (by which I mean all of science besides fundamental physics)
Q    consciousness
R    normative rationality
E    normative ethics
C    counterfactuals
L    mathematical / logical truths
I    indexicals
T    “that’s-all” statements

I’ve split P into big-P (macrophysics) and little-p (microphysics) to account for the disagreements about emergence and reductionism. Normativity here is broad enough to include both normative epistemic statements (e.g. “You should increase your credence in the next coin toss landing H after observing it land H one hundred times in a row”) and ethical statements. The others are fairly self-explanatory.

The most ontologically extravagant philosophical worldview would then be characterized as pPQRECLIT.

My philosophical address is pRLIT (with the addendum that I think C comes from p, and am really confused about Q). What’s yours?

Moving Naturalism Forward: Eliminating the macroscopic

Sean Carroll, one of my favorite physicists and armchair philosophers, hosted a fantastic conference on philosophical naturalism and science, and did the world a great favor by recording the whole thing and posting it online. It was a three-day long discussion on topics like the nature of reality, emergence, morality, free will, meaning, and consciousness. Here are the videos for the first two discussion sections, and the rest can be found by following YouTube links.


Having watched through the entire thing, I have updated a few of my beliefs, plan to rework some of my conceptual schema, and am puzzled about a few things.

A few of my reflections and take-aways:

  1. I am much more convinced than before that there is a good case to be made for compatibilism about free will.
  2. I think there is a set of interesting and challenging issues around the concept of representation and intentionality (about-ness) that I need to look into.
  3. I am more comfortable with intense reductionism claims, like “All facts about the macroscopic world are entailed by the fundamental laws of physics.”
  4. I am really interested in hearing Dan Dennett talk more about grounding morality, because what he said was starting to make a lot of sense to me.
  5. I am confused about the majority attitude in the room that there’s not any really serious reason to take an eliminativist stance about macroscopic objects.
  6. I want to find more details about the argument that Simon DeDeo was making for the undecidability of questions about the relationship between macroscopic theories and microscopic theories (!!!).
  7. There’s a good way to express the distinction between the type of design human architects engage in and the type of design that natural selection produces, which is about foresight and representations of reasons. I’m not going to say more about this, and will just refer you to the videos.
  8. There are reasons to suspect that animal intelligence and capacity to suffer are inversely correlated (that is, the more intelligent an animal, the less capacity to suffer it likely has). This really flips some of our moral judgements on their head. (You must deliver a painful electric shock to either a human or to a bird. Which one will you choose?)

Let me say a little more about number 5.

I think that questions about whether macroscopic objects like chairs or plants really REALLY exist, or whether there are really only just fermions and bosons are ultimately just questions about how we should use the word “exist.” In the language of our common sense intuitions, obviously chairs exist, and if you claim otherwise, you’re just playing complicated semantic games. I get this argument, and I don’t want to be that person that clings to bizarre philosophical theses that rest on a strange choice of definitions.

But at the same time, I see a deep problem with relying on our commonsense intuitions about the existence of the macro world. This is that as soon as we start optimizing for consistency, even a teeny tiny bit, these macroscopic concepts fall to pieces.

For example, here is a trilemma (three statements that can’t all be correct):

  1. The thing I am sitting on is a chair.
  2. If you subtract a single atom from a chair, it is still a chair.
  3. Empty space is not a chair.

These seem to me to be some of the most obvious things we could say about chairs. And yet they are subtly incoherent!

Number 1 is really shorthand for something like “there are chairs.” And the reason why the second premise is correct is that denying it requires that there be a chair such that if you remove a single atom, it is no longer a chair. I take it to be obvious that such things don’t exist. But accepting the first two requires us to admit that as we keep shedding atoms from a chair, it stays a chair, even down to the very last atom. And once that last atom is gone, what remains is empty space that we are still committed to calling a chair, contradicting number 3. (By the way, some philosophers do actually deny number 2. They take a stance called epistemicism, which says that concepts like “chair” and “heap” are actually precise and unambiguous, and there exists a precise point at which a chair becomes a non-chair. This is the type of thing that makes me giggle nervously when reflecting on the adequacy of philosophy as a field.)

As I’ve pointed out in the past, these kinds of arguments can be applied to basically everything in the macroscopic world. They wreak havoc on our common sense intuitions and, to my mind, demand rejection of the entire macroscopic world. And of course, they don’t apply to the microscopic world. “If X is an electron, and you change its electric charge a tiny bit, is it still an electron?” No! Electrons are physical substances with precise and well-defined properties, and if something doesn’t have these properties, it is not an electron! So the Standard Model is safe from this class of arguments.

Anyway, this is all just to make the case that upon close examination, our commonsense intuitions about the macroscopic world turn out to be subtly incoherent. What this means is that we can’t make true statements like “There are two cars in the garage”. Why? Just start removing atoms from the cars until you get to a completely empty garage. Since no single-atom change can make the relevant difference to “car-ness”, at each stage, you’ll still have two cars!

As soon as you start taking these macroscopic concepts seriously, you find yourself stuck in a ditch. This, to me, is an incredibly powerful argument for eliminativism, and I was surprised to find that arguments like these weren’t stressed at the conference. This makes me wonder if this argument is as powerful as I think.

Defining racism

How would you define racism?

I’ve been thinking about this lately in light of some of the scandal around research into race and IQ. It’s a harder question than I initially thought; many of the definitions that pop to mind end up being either too strong or too weak. The term also functions differently in different contexts (e.g. personal racism, institutional racism, racist policies). In this post, I’m specifically talking about personal racism – that term we use to refer to the beliefs and attitudes of those like Nazis or Ku Klux Klan members (at the extreme end).

I’m going to walk through a few possible definitions. This will be fairly stream-of-consciousness, so I apologize if it’s not incredibly profound or well-structured.

Definition 1 Racism is the belief in the existence of inherent differences between the races.

‘Inherent’ is important, because we don’t want to say that somebody is racist for acknowledging differences that can ultimately be traced back to causes like societal oppression. The problem with this definition is that, well, there are inherent differences between the races.

The Chinese are significantly shorter than the Dutch. Raising a Chinese person in a Dutch household won’t do much to equalize this difference. What’s important, it seems, is not the belief in the existence of inherent differences, but instead the belief in the existence of inherent inferiorities and superiorities. So let’s try again.

Definition 2: Racism is the belief in the existence of inherent racial differences that are normatively significant.

This is pretty much the dictionary definition of the term “racism”. While it’s better, there are still some serious problems. Let’s say that somebody discovered that the Slavs are more inherently prone to violence than, say, Arabs. Suppose that somebody ran across this fact, and that this person also held the ethical view that violent tendencies are normatively important. That is, they think that peaceful people are ethically superior to violent people.

If they combine this factual belief with this seemingly reasonable normative belief, they’ll end up being branded as a racist, by our second definition. This is clearly undesirable… given that the word ‘racism’ is highly normatively loaded, we don’t want it to be the case that somebody is racist for believing true things. In other words, we probably don’t want our definition of racism to ever allow it to be the right attitude to take, or even a reasonable attitude to take.

Maybe the missing step is the generalization of attitudes about Slavs and Arabs to individuals. This is a sentiment that I’ve heard fairly often… racism is about applying generalizations about groups to individuals (for instance, racial profiling). Let’s formalize this:

Definition 3: Racism is about forming normative judgments about individuals’ characteristics on the basis of beliefs about normative group-level differences.

This sounds nice and all, but… you know what another term for “applying facts about groups to individuals” is? Good statistical reasoning.

If you live in a town composed of two distinct populations, the Hebbeberans and the Klabaskians, and you know that Klabaskians are on average twenty times more likely than Hebbeberans to be fatally allergic to cod, then you should be more cautious with serving your extra special cod sandwich to a Klabaskian friend than to a Hebbeberan.

Facts about populations do give you evidence about individuals within those populations, and the mere acknowledgement of this evidence is not racist, for the same reason that rationality is not racist.
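The cod example can be made concrete with a small sketch. Only the 20x ratio comes from the example above; the absolute base rate, the function names, and the likelihood numbers in the usage are hypothetical, chosen purely for illustration. The second function shows the other half of the point: once you have evidence about the individual, the group-level prior can be largely or entirely overridden.

```python
# Hypothetical base rate: the example only gives the ratio (20x), not absolute numbers.
HEBBEBERAN_ALLERGY_RATE = 0.001
RELATIVE_RISK = 20  # Klabaskians are twenty times more likely to be allergic


def allergy_prior(group: str) -> float:
    """Prior probability that a random member of `group` is fatally allergic to cod."""
    if group == "Hebbeberan":
        return HEBBEBERAN_ALLERGY_RATE
    if group == "Klabaskian":
        return HEBBEBERAN_ALLERGY_RATE * RELATIVE_RISK
    raise ValueError(f"unknown group: {group}")


def update(prior: float, p_evidence_if_allergic: float, p_evidence_if_not: float) -> float:
    """Bayes' rule: posterior probability of allergy after seeing individual-level evidence."""
    numerator = prior * p_evidence_if_allergic
    denominator = numerator + (1.0 - prior) * p_evidence_if_not
    return numerator / denominator


if __name__ == "__main__":
    prior = allergy_prior("Klabaskian")
    print("prior for a Klabaskian friend:", prior)
    # Hypothetical evidence: your friend mentions eating cod last week without trouble.
    # Someone with a fatal allergy could not report that, so the likelihood is 0.
    print("after hearing they ate cod safely:", update(prior, 0.0, 0.5))
```

Before you know anything else about your friend, the group statistic is the best information you have. After the individual evidence arrives, it is the evidence about the person that does the work.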

So if we don’t want to call rationality racist, then maybe our way out of this is to identify racism with irrationality.

Definition 4: Racism is the holding of irrational beliefs about normative racial differences.

Say you meet somebody from Malawi (a region with an extremely low average IQ). Your first rational instinct might be to not expect too much from them in the way of cognitive abilities. But now you learn that they’re a theoretical physicist who’s recently been nominated for a Nobel prize for their work in quantum information theory. If the average IQ of Malawians is still factoring in at all to your belief about this person’s intelligence, then you’re being racist.

I like this definition a lot better than our previous ones. It combines the belief in racial superiority with irrationality. On the other hand, it has problems as well. One major issue is that there are plenty of cases of benign irrationality, where somebody is just a bad statistical reasoner, but not motivated by any racial hatred. Maybe they over-updated on some piece of information, because they failed to take into account an important base-rate.

Well, the base-rate fallacy is one of the most common cognitive biases out there. Surely this isn’t enough to make them a racist? What we want is to capture the non-benign brand of irrational normative beliefs about race – those that are motivated by hatred or prejudice.

Definition 5: Racism is the holding of irrational normative beliefs about racial differences, motivated by racial hatred or prejudice.

I think this does the best at avoiding making the category too large, but it may be too strong and keep out some plausible cases of racism. I’d like to hear suggestions for improvements on this definition, but for now I’ll leave it there. One potential take-away is that the word ‘racism’ is a nasty combination of highly negatively charged and ambiguous, and that such words are best treated with caution, especially when applying them to edge cases.

The Scourge of Our Time

Human life must be respected and protected absolutely from the moment of conception. From the first moment of his existence, a human being must be recognized as having the rights of a person – among which is the inviolable right of every innocent being to life.

Since it must be treated from conception as a person, the embryo must be defended in its integrity, cared for, and healed, as far as possible, like any other human being.

Catechism of the Catholic Church, #2270, 2274

In this paper, Toby Ord advances a strong reductio ad absurdum of the standard pro-life position that life begins at conception. I’ve heard versions of this argument before, but hadn’t seen it laid out so clearly.

Here’s the argument:

  1. The majority (~62%) of embryos die within a few weeks of conception (mostly from failure to implant in the lining of the uterus wall). A mother of three children could be expected to also have had five spontaneous abortions.
  2. The Catholic Church promotes the premise that an embryo at conception has the same moral worth as a developed human. On this view, more than 60% of the world population dies in their first month of life, making this a more deadly condition than anything else in human history. Saving even 5% of embryos would save more lives than a cure for cancer.

  3. Given the 200 million lives per year at stake, those that think life begins at conception should be directing massive amounts of resources towards ending spontaneous abortion and see it as the Scourge of our time.
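The arithmetic behind these figures can be checked with a short sketch. The 62% loss rate is the one quoted above. The annual birth count is an assumed round number, not a figure from the paper, and treating "survives the first few weeks" as "is eventually born" is a simplification.

```python
EMBRYO_LOSS_RATE = 0.62  # share of embryos lost within weeks of conception (from the argument)

# Assumption for illustration only: a round figure for worldwide live births per year.
ASSUMED_BIRTHS_PER_YEAR = 130e6


def expected_losses(live_births: float, loss_rate: float = EMBRYO_LOSS_RATE) -> float:
    """Expected number of embryos lost for every `live_births` that survive."""
    conceptions = live_births / (1.0 - loss_rate)
    return conceptions - live_births


if __name__ == "__main__":
    # A mother of three: roughly five losses, matching premise 1.
    print(round(expected_losses(3), 1))
    # Worldwide, under the assumed birth count: on the order of 200 million per year.
    print(round(expected_losses(ASSUMED_BIRTHS_PER_YEAR) / 1e6), "million")
```

Under these assumptions, the "five spontaneous abortions" and "200 million lives per year" figures both fall out of the same 62% rate.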

Here are two graphs of the US survival curve: first, as we ordinarily see it, and second, as the pro-lifer is obligated to see it:

[Two figures: the US survival curve as ordinarily drawn, and the same curve as the pro-lifer is obligated to see it]

This is of course a really hard bullet for the pro-life camp to bite. If you’re like me, you see spontaneous abortions as morally neutral. Most of the time they happen before a pregnancy has been detected, leaving the mother unaware that anything even happened. It’s hard then to make a distinction between the enormous number of spontaneous abortions naturally occurring and the comparatively minuscule number of intentional abortions.

I have previously had mixed feelings about abortion (after all, if our moral decision making ultimately comes down to trying to maximize some complicated expected value, it will likely be blind to whether something is a real living being or just a “potential” living being), but this argument pretty much clinches the deal for me.

Where I am with utilitarianism

Morality is one of those weird areas where I have an urge to systematize my intuitions, despite believing that these intuitions don’t reflect any objective features of the world.

In the language of model selection, it feels like I’m trying to fit the data of my moral intuitions to some simple underlying model, and not overfit to the noise in the data. But the concept of “noise” here makes little sense… if I were really a moral nihilist, then I would see the only sensible task with respect to ethics as a descriptive task: describe my moral psychology and the moral psychology of others. If ethics is like aesthetics, fundamentally a matter of complex individual preferences, then there is no reality to be found by paring down your moral framework into a tight neat package.

You can do a good job at analyzing how your moral cognitive system works and trying to understand the reasons that it is the way it is. But once you’ve managed a sufficiently detailed description of your moral intuitions, then it seems like you’ve exhausted the realm of interesting ethical thinking. Any other tasks seem to rely on some notion of an actual moral truth out there that you’re trying to fit your intuitions to, or at least a notion of your “true” moral beliefs as a simple set of principles from which your moral feelings and intuitions arise.

Despite the fact that I struggle to see any rational reason for systematizing ethics, I find myself doing so fairly often. The strongest systematizing urge I feel in analyzing ethics is the urge towards generality. A simple general description that successfully captures many of my moral intuitions feels much better than a complicated high-order description of many disconnected intuitions.

This naturally leads to issues with consistency. If you are satisfied with just describing your moral intuitions in every situation, then you can never really be faced with accusations of inconsistency. Inconsistency arises when you claim to agree with a general moral principle, and yet have moral feelings that contradict this principle.

It’s the difference between “It was unjust when X shot Y the other day in location Z” and “It is unjust for people to shoot each other”. The first doesn’t entail any conclusions about other similar scenarios, while the second entails an infinity of moral beliefs about similar scenarios.

Now, getting to utilitarianism. Utilitarianism is the (initially nice-sounding) moral principle that moral action is that which maximizes happiness (/ well-being / sentient flourishing / positive conscious experiences). In any situation, the moral calculation done by a utilitarian is to impartially consider the consequences of all possible actions on the happiness of all other conscious beings, and then take the action that maximizes your expected value.

While the basic premise seems obviously correct upon first consideration, a lot of the conclusions that this style of thinking ends up endorsing seem horrifically immoral. A hard-line utilitarian approach to ethics yields prescriptions for actions that are highly unintuitive to me. Here’s one of the strongest intuition-pumps I know of for why utilitarianism is wrong:

Suppose that there is a doctor that has decided to place one of his patients under anesthesia and then rape them. This doctor has never done anything like this before, and would never do anything like it again afterwards. He is incredibly careful to not leave any evidence, or any noticeable after-effects on the patient whatsoever (neither physical nor mental). In addition, he will forget that he ever did this soon after the patient leaves. In short, the world will be exactly the same one day down the line whether he rapes his patient or not. The only difference in terms of states of consciousness between the world in which he commits the violation and the world in which he does not, will be a momentary pleasurable orgasm that the doctor will experience.

In front of you sits a button. If you press this button, then a nurse assistant will enter the room, preventing the doctor from being alone with the patient and thus preventing the rape. If you don’t, then the doctor will rape his patient just as he has planned. Whether or not you press the button has no other consequences on anybody, including yourself (e.g., if knowing that you hadn’t prevented the rape would make you feel bad, then you will instantly forget that you had anything to do with it immediately after pressing the button.)

Two questions:

1. Is it wrong for the doctor to commit the rape?

2. Should you press the button to stop the doctor?

The utilitarian is committed to answering ‘No’ to both questions.

As far as I can tell, there is no way out of this conclusion for Question 1. Question 2 allows a little more wiggle room; one might say that it is impossible that whether or not you press the button has no effect on your own mental state as you press it, unless you are completely without conscience. A follow-up question might then be whether you should temporarily disable your conscience, if you could, in order to neutralize the negative mental consequences of pressing the button. Again, the utilitarian seems to give the wrong answer.

This thought experiment is pushing on our intuitions about autonomy and consent, which are only considered as instrumentally valuable by the utilitarian, rather than intrinsically so. If you feel pretty icky about utilitarianism right now, then, well… I said it was the strongest anti-utilitarian intuition pump I know.

With that said, how can we formalize a system of ethics that takes into account not just happiness, but also the intrinsic importance of things like autonomy and consent? As far as I’ve seen, every such attempt ends up looking really shabby and accepting unintuitive moral conclusions of its own. And among all of the ethical systems that I’ve seen, none does as good a job as utilitarianism at capturing so many of my ethical intuitions in such a simple formalization.

So this is where I am at with utilitarianism. I intrinsically value a bunch of things besides happiness. If I am simply engaging in the purely descriptive project of ethics, then I am far from a utilitarian. But the more I systematize my ethical framework, the more utilitarian I become. If I heavily optimize for consistency, I end up a hard-line utilitarian, biting all of the nasty bullets in favor of the simplicity and generality of the utilitarian framework. I’m just not sure why I should spend so much mental effort systematizing my ethical framework.

This puts me in a strange position when it comes to actually making decisions in my life. While I don’t find myself in positions in which the utilitarian option is as horrifically immoral as in the thought experiment I’ve presented here, I still am sometimes in situations where maximizing net happiness looks like it involves behaving in ways that seem intuitively immoral. I tend to default towards the non-utilitarian option in these situations, but don’t have any great principled reason for doing so.

Value beyond ethics

There is a certain type of value in our existence that transcends ethical value. It is beautifully captured in this quote from Richard Feynman:

It is a great adventure to contemplate the universe, beyond man, to contemplate what it would be like without man, as it was in a great part of its long history and as it is in a great majority of places. When this objective view is finally attained, and the mystery and majesty of matter are fully appreciated, to then turn the objective eye back on man viewed as matter, to view life as part of this universal mystery of greatest depth, is to sense an experience which is very rare, and very exciting. It usually ends in laughter and a delight in the futility of trying to understand what this atom in the universe is, this thing—atoms with curiosity—that looks at itself and wonders why it wonders.

Well, these scientific views end in awe and mystery, lost at the edge in uncertainty, but they appear to be so deep and so impressive that the theory that it is all arranged as a stage for God to watch man’s struggle for good and evil seems inadequate.

The Meaning Of It All

Carl Sagan beautifully expressed the same sentiment.

We are the local embodiment of a Cosmos grown to self-awareness. We have begun to contemplate our origins: starstuff pondering the stars; organized assemblages of ten billion billion billion atoms considering the evolution of atoms; tracing the long journey by which, here at least, consciousness arose. Our loyalties are to the species and the planet. We speak for Earth. Our obligation to survive is owed not just to ourselves but also to that Cosmos, ancient and vast, from which we spring.


The ideas expressed in these quotes feel a thousand times deeper and more profound than anything offered in ethics. Trolley problems seem trivial by comparison. If somebody argued that the universe would be better off without us on the basis of, say, a utilitarian calculation of net happiness, I would feel like there is an entire dimension of value that they are completely missing out on. This type of value, a type of raw aesthetic sense of the profound strangeness and beauty of reality, is tremendously subtle and easily slips out of grasp, but is crucially important. My blog header serves as a reminder: We are atoms contemplating atoms.

Timeless ethics and Kant

The more I think about timeless decision theory, the more it seems obviously correct to me.

The key idea is that sometimes there is a certain type of non-causal logical dependency (called a subjunctive dependence) between agents that must be taken into account by those agents in order to make rational decisions. The class of cases in which subjunctive dependences become relevant involves agents in environments that contain other agents trying to predict their actions, and also environments that contain other agents that are similar to them.

Here’s my favorite motivating thought experiment for TDT: Imagine that you encounter a perfect clone of yourself. You have lived identical lives, and are structurally identical in every way. Now you are placed in a prisoner’s dilemma together. Should you cooperate or defect?

A non-TDTist might see no good reason to cooperate – after all, defecting dominates cooperation as a strategy, and your decision doesn’t affect your clone’s decision. If the two of you share no common cause explanation for your similarity, then this conclusion is even stronger – both evidential and causal decision theory would defect. So both you and your clone defect and you both walk away unhappy.

TDT is just the admission that there is an additional non-causal dependence between your decision and your clone’s decision that must be taken into account. This dependence comes from the fact that you and your clone have a shared input-output structure. That is, no matter what you end up doing, you know that your clone must do the same thing, because your clone is operating identically to you.

In a deterministic world, it is logically impossible that you choose to do X and your clone does Y. The initial conditions are the same, so the final conditions must be the same. So you end up cooperating, as does your clone, and everybody walks away happy.

With an imperfect clone, it is no longer logically impossible, but there still exists a subjunctive dependence between your actions and your clone’s.

This is a natural and necessary modification to decision theory. We take into account not only the causal effects of our actions, but the evidential effects of our actions. Even if your action does not causally affect a given outcome, it might still make it more or less likely, and subjunctive dependence is one of the ways that this can happen.
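Here is a minimal sketch of the clone prisoner’s dilemma, covering both the perfect and the imperfect clone. It collapses subjunctive dependence into a single number, the probability that your clone ends up making the same move you do, which is a simplification and not a formal statement of TDT. The payoff values and function names are hypothetical, picked only to give the usual prisoner’s dilemma ordering.

```python
# Hypothetical payoffs to "me", indexed by (my_move, clone_move).
# Any values with temptation > reward > punishment > sucker would do.
PAYOFF = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # I cooperate, clone defects
    ("D", "C"): 5,  # I defect, clone cooperates
    ("D", "D"): 1,  # mutual defection
}


def expected_payoff(my_move: str, match_prob: float) -> float:
    """Expected payoff if the clone mirrors my move with probability `match_prob`."""
    other_move = "D" if my_move == "C" else "C"
    return (match_prob * PAYOFF[(my_move, my_move)]
            + (1.0 - match_prob) * PAYOFF[(my_move, other_move)])


def best_move(match_prob: float) -> str:
    """The move with the higher expected payoff, given the strength of the dependence."""
    return max("CD", key=lambda move: expected_payoff(move, match_prob))


if __name__ == "__main__":
    for p in (1.0, 0.9, 0.5):
        print(p, best_move(p), expected_payoff("C", p), expected_payoff("D", p))
```

With a perfect clone (match probability 1), the only reachable outcomes are mutual cooperation and mutual defection, so cooperating wins. As the dependence weakens, defection eventually takes over again; with these particular payoffs the crossover is at a match probability of 5/7.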

TDTists interacting with each other would get along really nicely. They wouldn’t fall victim to coordination problems, because they wouldn’t see their decisions as isolated and disconnected from the decisions of the others. They wouldn’t undercut each other in bargaining problems in which one side gets to make the deals and the other can only accept or reject.

In general, they would behave in a lot of ways that are standardly depicted as irrational (like one-boxing in Newcomb’s problem and cooperating in the prisoner’s dilemma), and end up much better off as a result. Such a society seems potentially much nicer and subject to fewer of the common failure modes of standard decision theory.

In particular, in a society in which it is common knowledge that everybody is a perfect TDTist, there can be strong subjunctive dependencies between the actions of causally disconnected agents. If a TDTist is considering whether or not to vote for their preferred candidate, they aren’t comparing outcomes that differ by a single vote. They are considering outcomes that differ by the size of the entire class of individuals that would be reasoning similarly to them.

In simple enough cases, this could mean that your decision about whether to vote is really a decision about whether millions of people will vote or not. This may sound weird, but it follows from the exact same type of reasoning as in the clone prisoner’s dilemma.

Imagine that the society consisted entirely of 10 million exact clones of you, each deciding whether or not to vote. In such a world, each individual’s choice is perfectly subjunctively dependent upon every other individual’s choice. If one of them decides not to vote, then all of them decide not to vote.
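As a rough illustration of why the size of the reference class matters, here is a toy model. It assumes perfect dependence within the class (everybody in it makes the same choice) and adds a hypothetical baseline margin from voters outside the class. The numbers and function names are made up for the sketch.

```python
def election_margin(baseline_margin: int, class_size: int, class_votes: bool) -> int:
    """Final margin for your candidate once everyone who reasons like you has decided."""
    return baseline_margin + (class_size if class_votes else 0)


def decision_matters(baseline_margin: int, class_size: int) -> bool:
    """True if the shared decision to vote flips the result of the election."""
    wins_if_abstain = election_margin(baseline_margin, class_size, False) > 0
    wins_if_vote = election_margin(baseline_margin, class_size, True) > 0
    return wins_if_abstain != wins_if_vote


if __name__ == "__main__":
    behind_by = -1_000_000  # hypothetical: your candidate trails by a million votes
    print(decision_matters(behind_by, 1))           # reasoning as an isolated voter
    print(decision_matters(behind_by, 10_000_000))  # reasoning as one of 10 million clones
```

Seen as a single isolated vote, the decision changes nothing. Seen as the decision of the whole class, it determines the outcome.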

In a more general case, perfect clones of you don’t exist in your environment. But in any given context, there is still a large class of individuals that reason similarly to you as a result of a similar input-output process.

For example, all humans are very similar in certain ways. If I notice that my blood is red, and I had previously never heard about or seen the blood color of anybody else, then I should now strongly update on the redness of the blood of other humans. This is obviously not because my blood being red causes others to have red blood. It is also not because of a common cause – in principle, any such cause could be screened off, and we would expect the same dependence to exist solely in virtue of the similarity of structure. We would expect red blood in an alien whose evolutionary history has been entirely causally separated from ours but who by a wild coincidence has the same DNA structure as humans.

Our similarities in structure can be less salient to us when we think about our minds and the way we make decisions, but they still are there. If you notice that you have a strong inclination to decide to take action X, then this actually does serve as evidence that a large class of other people will take action X. The size of this class and the strength of this evidence depend on which particular X is being analyzed.

Ethics and TDT

It is natural to wonder: what sort of ethical systems naturally arise out of TDT?

We can turn a decision theory into an ethical framework by choosing a utility function that encodes the values associated with that ethical framework. The utility function for hedonic utilitarianism assigns utility according to the total well-being in the universe. The utility function for egoism assigns utility only to your own happiness, apathetic to the well-being of others.

Virtue ethics and deontological ethics are harder to encode. We could do the first by assigning utility to virtuous character traits and disutility to vices. The second could potentially be achieved by assigning negative infinities to violations of the moral rules.
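To make the idea of "encoding an ethical framework as a utility function" a bit more concrete, here is a toy sketch. The `Outcome` fields, the example outcomes, and the choice to give rule-abiding outcomes a utility of zero in the deontological case are all assumptions made for illustration; none of them is a claim about how these theories must be formalized.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Outcome:
    """A hypothetical, highly simplified description of the result of an action."""
    name: str
    my_wellbeing: float
    others_wellbeing: List[float] = field(default_factory=list)
    rule_violations: int = 0


def hedonic_utility(o: Outcome) -> float:
    """Hedonic utilitarianism: total well-being, counted impartially."""
    return o.my_wellbeing + sum(o.others_wellbeing)


def egoist_utility(o: Outcome) -> float:
    """Egoism: only my own well-being counts."""
    return o.my_wellbeing


def deontic_utility(o: Outcome) -> float:
    """Deontology, crudely: negative infinity for any rule violation, zero otherwise."""
    return -math.inf if o.rule_violations > 0 else 0.0


def choose(outcomes: List[Outcome], utility: Callable[[Outcome], float]) -> Outcome:
    """The decision-theory part: pick the outcome that maximizes the given utility."""
    return max(outcomes, key=utility)


if __name__ == "__main__":
    steal = Outcome("steal", my_wellbeing=5, others_wellbeing=[-1, -1], rule_violations=1)
    share = Outcome("share", my_wellbeing=1, others_wellbeing=[2, 2])
    for utility in (hedonic_utility, egoist_utility, deontic_utility):
        print(utility.__name__, "->", choose([steal, share], utility).name)
```

The decision procedure (`choose`) stays fixed while the utility function is swapped out, which is the sense in which the ethics lives entirely in the utility function.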

Let’s brush aside the fact that some of these assignments are less feasible than others. Pretend that your favorite ethical system has a nice neat way of being formalized as a utility function. Now, the distinctive feature of TDT-based ethics is that when we are trying to decide on the most ethical course of action, TDT says that we must imagine that our decision would also be taken by anybody else that is sufficiently similar to us in a sufficiently similar context.

In other words, in contemplating what the right action to take is, you imagine a world in which these actions are universalized! This sounds very Kantian. One of his more famous descriptions of the categorical imperative was:

Act only according to that maxim whereby you can at the same time will that it should become a universal law.

This could be a tagline for ethics in a world of TDTists! The maxim for your action resembles the notion of similarity in motivation and situational context that generates subjunctive dependence, and the categorical imperative is the demand that you must take into account this subjunctive dependence if you are to reason consistently.

But actually, I think that the resemblance between Kantian ethical reasoning and timeless decision theory begins to fade away when you look closer. I’ll list three main points of difference:

  1. Consistency vs expected utility
  2. Maxim vs subjunctive dependence
  3. Differences in application

1. Consistency vs expected utility

Kantian universalizability is not about expected utility, it is about consistency. The categorical imperative forbids acts that, when universalized, become self-undermining. If an act is consistently universalizable, then it is not a violation of the categorical imperative, even if it ends up with everybody in horrible misery.

Timeless decision theory looks at a world in which everybody acts according to the same maxim that you are acting under, and then asks whether this world looks nice or not. “Looks nice” refers to your utility function, not any notion of consistency or non-self-underminingness (not a word, I know).

So this is the first major difference: A timeless ethical theorist cares ultimately about optimizing their moral values, not about making sure that their values are consistently applicable in the limit of universal instantiation. This puts TDT-based ethics closer to a rule-based consequentialism than to Kantian ethics, although this comparison is also flawed.

Is this a bug or a feature of TDT?

I’m tempted to say it’s a feature. My favorite example of why Kantian consistency is not a desirable meta-ethical principle is that if everybody were to give to charity, then all the problems that could be solved by giving to charity would be solved, and the opportunity to give to charity would disappear. So the act of giving to charity becomes self-undermining upon universalization.

To which I think the right response is: “So what?”

If a world in which everybody gives to charity is a world in which there are no more problems to be solved by charity-giving, then that sounds pretty great to me. If this consistency requirement prevents you from solving the problems you set out to solve, then it seems like a pretty useless requirement for ethical reasoning.

If your values can be encoded into an expected utility function, then the goal of your ethics should be to maximize that function. The antecedent of this conditional could be reasonably disputed, but I think the conditional as a whole is fairly unobjectionable.

2. Maxim vs subjunctive dependence

One of the most common retorts to Kant’s formulation of the categorical imperative rests on the ambiguity of the term ‘maxim’.

For Kant, your maxim is supposed to be the motivating principle behind your action. It can be thought of as a general rule that determines the contexts in which you would take this action.

If your action is to donate money, then your maxim might be to give 10% of your income to charity every year. If your action is to lie to your boss about why you missed work, then your maxim might be to be dishonest whenever doing otherwise will damage your career prospects.

Now, the maxim is the thing that is universalized, not the action itself. So you don’t suppose that everybody suddenly starts lying to their boss. Instead, you imagine that anybody in a situation where being honest would hurt their career prospects begins lying.

In this situation, Kant would argue that if nobody was honest in these situations, then their bosses would just assume dishonesty, in which case, the employees would never even get the chance to lie in the first place. This is self-undermining; hence, forbidden!

I actually like this line of reasoning a lot. Scott Alexander describes it as similar to the following rule:

Don’t do things that undermine the possibility to offer positive-sum bargains.

Coordination problems arise because individuals decide to defect from optimal equilibriums. If these defectors were reasoning from the Kantian principle of universalizability, they would realize that if everybody behaved similarly then the ability to defect might be undermined.

But the problem largely lies in how one specifies the maxim. For example, compare the following two maxims:

Maxim 1: Lie to your boss whenever being honest would hurt your career opportunities.

Maxim 2: Lie to your boss about why you missed work whenever the real reason is that you went on an all-night bar-hopping marathon with your friends Jackie and Khloe and then stayed up all night watching Breaking Bad highlight clips on your Apple TV.

If Maxim 2 is the true motivating principle of your action, then it seems a lot less obvious that the action is a violation of the categorical imperative. If only people in this precisely specified context lied to their bosses, then bosses would overall probably not become less trusting of their employees (unless your boss knows an unusual amount about your personal life). So the maxim is not self-undermining under universalization, and is therefore not forbidden.

Under Maxim 1, lying is forbidden, and under Maxim 2, it is not. But what is the true maxim? There’s no clear answer to this question. Any given action can be truthfully described as arising from numerous different motivational schema, and in general these choices will result in a variety of different moral guidelines.

In TDT, the analog to the concept of a maxim is subjunctive dependence, and this can be defined fairly precisely, without ambiguity. Subjunctive dependence between agents in a given context is just the degree of evidence you get about the actions of an agent given information about the actions of the other agents in that context.

More precisely, it is the degree of non-causal dependence between the actions of agents. It essentially arises from the fact that in a lawful physical universe, similar initial conditions will result in similar final conditions. This can be worded as similarity in initial conditions, in input-output structure, in computational structure, or in logical structure, but the basic idea is the same.

Not only is this precisely defined, it is a real dependence. You don’t have to imagine a fictional universe in which your action makes it more likely that others will act similarly; the claim of TDT is that this is actually the case!

In this sense, TDT is rooted in a simple acknowledgement of dependencies that really do exist and that can be precisely defined, while Kant’s categorical imperative relies on the ambiguous notion of a maxim, as well as a seemingly arbitrary hypothetical consideration. One might be tempted to ask: “Who cares what would happen if hypothetically everybody else acted according to a similar maxim? What we should care about is what will actually happen in the real world; we shouldn’t be basing our decisions off of absurd hypothetical worlds!”

3. Differences in application

These two theoretical reasons are fairly convincing to me that Kantianism and TDT ethics are only superficially similar, and are theoretically quite different. But there still remains a question of how much the actual “outputs” of the two frameworks converge. Do they just end up giving similar ethical advice?

I don’t think so. First, consider the issue I touched on previously. I said that defectors that paid attention to the categorical imperative would rethink their decision, because it is not universalizable. But this is not in general true.

If defectors are always free to defect, regardless of how many others defect as well, then defecting will still be universalizable! It is only in special cases that Kantians will not defect, like when a mob boss will come in and necessitate cooperation if enough people defect, or where universal defection depletes an expendable resource that would otherwise be renewable.

The set of coordination problems in which defecting automatically becomes impossible at a certain point are the easiest cases of coordination problems. It’s much harder to get individuals to coordinate if there is no mob boss to step in and set everybody right. These are the cases where Kantianism fails, and TDT succeeds.

TDTists with shared goals for whom cooperation would be more effective for achieving these goals would always cooperate, even if “each would individually be better off” if they defected. (I put scare quotes because you only come to this conclusion by ignoring important dependencies in the problem).

The key difference here comes down again to #1: timeless decision theorists maximize expected utility, not Kantian consistency.

In addition, TDTists don’t necessarily have Kantian hangups about using people as means to an end: if doing so ends up producing a higher expected utility than not, then they’ll go for it without hesitation.

A TDTist that can save two people’s lives by causing a little harm to one person would probably do it if their utility function was relatively impartial and placed a positive value on life. A Kantian would forbid this act.

(Why? Well, Kant thought that this principle of treating people as ends in themselves rather than means to an end was equivalent to the universalizability principle, and as far as I know, pretty much nobody was convinced by his argument for why this was the case. As such, a lot of Kantian ethics looks like it doesn’t actually follow from the universalizability principle.)

An application that might be similar for Kantian ethics and TDT ethics is the treatment of dishonesty and deception. Kant famously forbade any lying of any kind, regardless of the consequences, on the basis that universal lying would undermine the trust that is necessary to make lying a possibility.

One can imagine a similar case made for honesty in TDT ethics. In a society of TDTists, a decision to lie is a decision to produce a society that is overall less honest and less trusting. In situations where the individual benefits of dishonesty are zero-sum, only the negative effects of dishonesty are amplified. This could plausibly make dishonesty on the whole a net negative policy.