Where I am with utilitarianism

Morality is one of those weird areas where I have an urge to systematize my intuitions, despite believing that these intuitions don’t reflect any objective features of the world.

In the language of model selection, it feels like I’m trying to fit the data of my moral intuitions to some simple underlying model, and not overfit to the noise in the data. But the concept of “noise” here makes little sense… if I were really a moral nihilist, then I would see the only sensible task with respect to ethics as a descriptive task: describe my moral psychology and the moral psychology of others. If ethics is like aesthetics, fundamentally a matter of complex individual preferences, then there is no reality to be found by paring down your moral framework into a tight neat package.

You can do a good job at analyzing how your moral cognitive system works and trying to understand the reasons that it is the way it is. But once you’ve managed a sufficiently detailed description of your moral intuitions, then it seems like you’ve exhausted the realm of interesting ethical thinking. Any other tasks seem to rely on some notion of an actual moral truth out there that you’re trying to fit your intuitions to, or at least a notion of your “true” moral beliefs as a simple set of principles from which your moral feelings and intuitions arise.

Despite the fact that I struggle to see any rational reason for systematizing ethics, I find myself doing so fairly often. The strongest systematizing urge I feel in analyzing ethics is the urge towards generality. A simple general description that successfully captures many of my moral intuitions feels much better than a complicated high-order description of many disconnected intuitions.

This naturally leads to issues with consistency. If you are satisfied with just describing your moral intuitions in every situation, then you can never really be faced with accusations of inconsistency. Inconsistency arises when you claim to agree with a general moral principle, and yet have moral feelings that contradict this principle.

It’s the difference between “It was unjust when X shot Y the other day in location Z” and “It is unjust for people to shoot each other”. The first doesn’t entail any conclusions about other similar scenarios, while the second entails an infinity of moral beliefs about similar scenarios.

Now, getting to utilitarianism. Utilitarianism is the (initially nice-sounding) moral principle that moral action is that which maximizes happiness (/ well-being / sentient flourishing / positive conscious experiences). In any situation, the moral calculation done by a utilitarian is to impartially consider the consequences of all possible actions on the happiness of all other conscious beings, and then take the action that maximizes your expected value.
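To make the calculation concrete, here is a minimal sketch of that procedure, assuming happiness can be put on a single numeric scale and that each action has a known set of possible outcomes. The action names, probabilities, and happiness numbers are all made up for illustration.

```python
from typing import Dict, List, Tuple

# One possible outcome: (probability, happiness change for each affected being).
Outcome = Tuple[float, List[float]]


def expected_total_happiness(outcomes: List[Outcome]) -> float:
    """Probability-weighted total happiness, counting every being equally."""
    return sum(p * sum(changes) for p, changes in outcomes)


def utilitarian_choice(actions: Dict[str, List[Outcome]]) -> str:
    """Return the name of the action with the highest expected total happiness."""
    return max(actions, key=lambda name: expected_total_happiness(actions[name]))


# Hypothetical actions and numbers, chosen only to show the mechanics.
actions = {
    "keep_promise": [(1.0, [2.0, 1.0])],
    "break_promise": [(0.5, [5.0, -1.0]), (0.5, [0.0, -3.0])],
}

for name, outcomes in actions.items():
    print(name, expected_total_happiness(outcomes))
print("chosen:", utilitarian_choice(actions))
```

Nothing in the calculation cares whose happiness is whose, or how an outcome came about. That impartiality is the appealing part, and it is also where the trouble starts.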

While the basic premise seems obviously correct upon first consideration, a lot of the conclusions that this style of thinking ends up endorsing seem horrifically immoral. A hard-line utilitarian approach to ethics yields prescriptions for actions that are highly unintuitive to me. Here’s one of the strongest intuition-pumps I know of for why utilitarianism is wrong:

Suppose that there is a doctor that has decided to place one of his patients under anesthesia and then rape them. This doctor has never done anything like this before, and would never do anything like it again afterwards. He is incredibly careful to not leave any evidence, or any noticeable after-effects on the patient whatsoever (neither physical nor mental). In addition, he will forget that he ever did this soon after the patient leaves. In short, the world will be exactly the same one day down the line whether he rapes his patient or not. The only difference in terms of states of consciousness between the world in which he commits the violation and the world in which he does not, will be a momentary pleasurable orgasm that the doctor will experience.

In front of you sits a button. If you press this button, then a nurse assistant will enter the room, preventing the doctor from being alone with the patient and thus preventing the rape. If you don’t, then the doctor will rape his patient just as he has planned. Whether or not you press the button has no other consequences on anybody, including yourself (e.g., if knowing that you hadn’t prevented the rape would make you feel bad, then you will forget that you had anything to do with it immediately after making your choice.)

Two questions:

1. Is it wrong for the doctor to commit the rape?

2. Should you press the button to stop the doctor?

The utilitarian is committed to answering ‘No’ to both questions: the rape adds a moment of pleasure to the world and subtracts nothing, so it is not wrong, and you should not prevent it.

As far as I can tell, there is no way out of this conclusion for Question 1. Question 2 allows a little more wiggle room; one might say that it is impossible that whether or not you press the button has no effect on your own mental state as you make the choice, unless you are completely without conscience. A follow-up question might then be whether you should temporarily disable your conscience, if you could, in order to neutralize the negative mental consequences of leaving the button unpressed. Again, the utilitarian seems to give the wrong answer.

This thought experiment is pushing on our intuitions about autonomy and consent, which are only considered as instrumentally valuable by the utilitarian, rather than intrinsically so. If you feel pretty icky about utilitarianism right now, then, well… I said it was the strongest anti-utilitarian intuition pump I know.

With that said, how can we formalize a system of ethics that takes into account not just happiness, but also the intrinsic importance of things like autonomy and consent? As far as I’ve seen, every such attempt ends up looking really shabby and accepting unintuitive moral conclusions of its own. And among all of the ethical systems that I’ve seen, none does as good a job as utilitarianism at capturing so many of my ethical intuitions in such a simple formalization.

So this is where I am at with utilitarianism. I intrinsically value a bunch of things besides happiness. If I am simply engaging in the purely descriptive project of ethics, then I am far from a utilitarian. But the more I systematize my ethical framework, the more utilitarian I become. If I heavily optimize for consistency, I end up a hard-line utilitarian, biting all of the nasty bullets in favor of the simplicity and generality of the utilitarian framework. I’m just not sure why I should spend so much mental effort systematizing my ethical framework.

This puts me in a strange position when it comes to actually making decisions in my life. While I don’t find myself in positions in which the utilitarian option is as horrifically immoral as in the thought experiment I’ve presented here, I still am sometimes in situations where maximizing net happiness looks like it involves behaving in ways that seem intuitively immoral. I tend to default towards the non-utilitarian option in these situations, but don’t have any great principled reason for doing so.

Value beyond ethics

There is a certain type of value in our existence that transcends ethical value. It is beautifully captured in this quote from Richard Feynman:

It is a great adventure to contemplate the universe, beyond man, to contemplate what it would be like without man, as it was in a great part of its long history and as it is in a great majority of places. When this objective view is finally attained, and the mystery and majesty of matter are fully appreciated, to then turn the objective eye back on man viewed as matter, to view life as part of this universal mystery of greatest depth, is to sense an experience which is very rare, and very exciting. It usually ends in laughter and a delight in the futility of trying to understand what this atom in the universe is, this thing—atoms with curiosity—that looks at itself and wonders why it wonders.

Well, these scientific views end in awe and mystery, lost at the edge in uncertainty, but they appear to be so deep and so impressive that the theory that it is all arranged as a stage for God to watch man’s struggle for good and evil seems inadequate.

The Meaning Of It All

Carl Sagan beautifully expressed the same sentiment.

We are the local embodiment of a Cosmos grown to self-awareness. We have begun to contemplate our origins: starstuff pondering the stars; organized assemblages of ten billion billion billion atoms considering the evolution of atoms; tracing the long journey by which, here at least, consciousness arose. Our loyalties are to the species and the planet. We speak for Earth. Our obligation to survive is owed not just to ourselves but also to that Cosmos, ancient and vast, from which we spring.


The ideas expressed in these quotes feel a thousand times deeper and more profound than anything offered in ethics. Trolley problems seem trivial by comparison. If somebody argued that the universe would be better off without us on the basis of, say, a utilitarian calculation of net happiness, I would feel like there is an entire dimension of value that they are completely missing out on. This type of value, a type of raw aesthetic sense of the profound strangeness and beauty of reality, is tremendously subtle and easily slips out of grasp, but is crucially important. My blog header serves as a reminder: We are atoms contemplating atoms.

Timeless ethics and Kant

The more I think about timeless decision theory, the more it seems obviously correct to me.

The key idea is that sometimes there is a certain type of non-causal logical dependency (called a subjunctive dependence) between agents that must be taken into account by those agents in order to make rational decisions. The class of cases in which subjunctive dependences become relevant involves agents in environments that contain other agents trying to predict their actions, and also environments that contain other agents that are similar to them.

Here’s my favorite motivating thought experiment for TDT: Imagine that you encounter a perfect clone of yourself. You have lived identical lives, and are structurally identical in every way. Now you are placed in a prisoner’s dilemma together. Should you cooperate or defect?

A non-TDTist might see no good reason to cooperate – after all, defecting dominates cooperation as a strategy, and your decision doesn’t affect your clone’s decision. If the two of you share no common cause explanation for your similarity, then this conclusion is even stronger – both evidential and causal decision theory would defect. So both you and your clone defect and you both walk away unhappy.

TDT is just the admission that there is an additional non-causal dependence between your decision and your clone’s decision that must be taken into account. This dependence comes from the fact that you and your clone have a shared input-output structure. That is, no matter what you end up doing, you know that your clone must do the same thing, because your clone is operating identically to you.

In a deterministic world, it is logically impossible that you choose to do X and your clone does Y. The initial conditions are the same, so the final conditions must be the same. So you end up cooperating, as does your clone, and everybody walks away happy.

With an imperfect clone, it is no longer logically impossible for the two of you to choose differently, but there still exists a subjunctive dependence between your actions and your clone’s.
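Here is a rough sketch of how the two lines of reasoning come apart in the clone prisoner’s dilemma. It assumes a standard payoff table with made-up numbers, and it models the dependence between you and your clone as a single probability that the clone ends up making the same move as you. That `match_prob` parameter is my simplification for illustration, not part of any formal statement of TDT.

```python
# Payoffs to you for (your move, clone's move). Hypothetical numbers with the
# usual prisoner's dilemma ordering: 5 > 3 > 1 > 0.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
MOVES = ("C", "D")


def flip(move: str) -> str:
    """Return the opposite move."""
    return "D" if move == "C" else "C"


def dominant_move() -> str:
    """Hold the clone's move fixed and find the move that is better either way."""
    for mine in MOVES:
        if all(PAYOFF[(mine, theirs)] > PAYOFF[(flip(mine), theirs)] for theirs in MOVES):
            return mine
    raise ValueError("no strictly dominant move")


def expected_payoff(mine: str, match_prob: float) -> float:
    """Expected payoff if the clone makes the same move as you with probability match_prob."""
    same = PAYOFF[(mine, mine)]
    different = PAYOFF[(mine, flip(mine))]
    return match_prob * same + (1 - match_prob) * different


def best_move(match_prob: float) -> str:
    """Move with the highest expected payoff once the dependence is taken into account."""
    return max(MOVES, key=lambda move: expected_payoff(move, match_prob))


print("dominance reasoning:", dominant_move())
for p in (1.0, 0.9, 0.5):
    print("match probability", p, "->", best_move(p))
```

With these particular payoffs, cooperating comes out ahead whenever the match probability is above 5/7, so the clone does not need to be perfect for the dependence to change the answer. A match probability of 0.5 corresponds to no dependence at all, and there the sketch agrees with the dominance argument.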

This is a natural and necessary modification to decision theory. We take into account not only the causal effects of our actions, but the evidential effects of our actions. Even if your action does not causally affect a given outcome, it might still make it more or less likely, and subjunctive dependence is one of the ways that this can happen.

TDTists interacting with each other would get along really nicely. They wouldn’t fall victim to coordination problems, because they wouldn’t see their decisions as isolated and disconnected from the decisions of the others. They wouldn’t undercut each other in bargaining problems in which one side gets to make the deals and the other can only accept or reject.

In general, they would behave in a lot of ways that are standardly depicted as irrational (like one-boxing in Newcomb’s problem and cooperating in the prisoner’s dilemma), and end up much better off as a result. Such a society seems potentially much nicer and subject to fewer of the common failure modes of standard decision theory.

In particular, in a society in which it is common knowledge that everybody is a perfect TDTist, there can be strong subjunctive dependencies between the actions of causally disconnected agents. If a TDTist is considering whether or not to vote for their preferred candidate, they aren’t comparing outcomes that differ by a single vote. They are considering outcomes that differ by the size of the entire class of individuals that would be reasoning similarly to them.

In simple enough cases, this could mean that your decision about whether to vote is really a decision about whether millions of people will vote or not. This may sound weird, but it follows from the exact same type of reasoning as in the clone prisoner’s dilemma.

Imagine that the society consisted entirely of 10 million exact clones of you, each deciding whether or not to vote. In such a world, each individual’s choice is perfectly subjunctively dependent upon every other individual’s choice. If one of them decides not to vote, then all of them decide not to vote.
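A toy version of the clone election shows how much the comparison changes. Everything numeric here is an assumption made up for the example: the size of the opposing vote, the benefit each clone gets if their candidate wins, and the cost of going to the polls.

```python
N_CLONES = 10_000_000        # the clones from the thought experiment
OPPOSING_VOTES = 4_000_000   # hypothetical votes for the other candidate
BENEFIT_IF_WIN = 100.0       # hypothetical value to each clone of winning
COST_OF_VOTING = 1.0         # hypothetical cost of going to the polls


def utility(i_vote: bool, other_clones_voting: int) -> float:
    """One clone's utility given their own choice and the other clones' turnout."""
    votes_for = other_clones_voting + (1 if i_vote else 0)
    wins = votes_for > OPPOSING_VOTES
    return (BENEFIT_IF_WIN if wins else 0.0) - (COST_OF_VOTING if i_vote else 0.0)


def isolated_gain(other_clones_voting: int) -> float:
    """Gain from voting when the other clones' turnout is treated as fixed."""
    return utility(True, other_clones_voting) - utility(False, other_clones_voting)


def linked_gain() -> float:
    """Gain from voting when every clone necessarily makes the same choice."""
    return utility(True, N_CLONES - 1) - utility(False, 0)


print("others all vote, my vote alone:", isolated_gain(N_CLONES - 1))
print("others all stay home, my vote alone:", isolated_gain(0))
print("all clones decide together:", linked_gain())
```

Treated as an isolated choice, voting just costs you the trip to the polls whichever way the others go. Treated as a choice that all ten million clones make together, it decides the election.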

In a more general case, perfect clones of you don’t exist in your environment. But in any given context, there is still a large class of individuals that reason similarly to you as a result of a similar input-output process.

For example, all humans are very similar in certain ways. If I notice that my blood is red, and I had previously never heard about or seen the blood color of anybody else, then I should now strongly update on the redness of the blood of other humans. This is obviously not because my blood being red causes others to have red blood. It is also not because of a common cause – in principle, any such cause could be screened off, and we would expect the same dependence to exist solely in virtue of the similarity of structure. We would expect red blood in an alien whose evolutionary history has been entirely causally separated from ours but who by a wild coincidence has the same DNA structure as humans.

Our similarities in structure can be less salient to us when we think about our minds and the way we make decisions, but they still are there. If you notice that you have a strong inclination to decide to take action X, then this actually does serve as evidence that a large class of other people will take action X. The size of this class and the strength of this evidence depend on which particular X is being analyzed.

Ethics and TDT

It is natural to wonder: what sort of ethical systems naturally arise out of TDT?

We can turn a decision theory into an ethical framework by choosing a utility function that encodes the values associated with that ethical framework. The utility function for hedonic utilitarianism assigns utility according to the total well-being in the universe. The utility function for egoism assigns utility only to your own happiness, apathetic to the well-being of others.

Virtue ethics and deontological ethics are harder to encode. We could do the first by assigning utility to virtuous character traits and disutility to vices. The second could potentially be achieved by assigning negative infinities to violations of the moral rules.
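As a rough sketch of what these encodings might look like, here are three of them written as functions of a toy world description. The `World` fields and the three candidate worlds are hypothetical. Virtue ethics is left out because it would need some way of scoring character traits, which this toy description doesn’t have.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class World:
    wellbeing: List[float]  # one entry per conscious being; index 0 is "you"
    rules_broken: int       # number of moral-rule violations in this world


def hedonic_utilitarian(world: World) -> float:
    """Total well-being, with every being counted equally."""
    return sum(world.wellbeing)


def egoist(world: World) -> float:
    """Only your own well-being counts."""
    return world.wellbeing[0]


def deontologist(world: World) -> float:
    """Any rule violation is infinitely bad; otherwise indifferent."""
    return -math.inf if world.rules_broken > 0 else 0.0


def best(worlds: Dict[str, World], utility: Callable[[World], float]) -> str:
    """Name of the world that a given utility function ranks highest."""
    return max(worlds, key=lambda name: utility(worlds[name]))


# Hypothetical worlds, chosen so that the three frameworks disagree.
worlds = {
    "a": World(wellbeing=[1.0, 9.0, 9.0], rules_broken=1),
    "b": World(wellbeing=[5.0, 2.0, 2.0], rules_broken=0),
    "c": World(wellbeing=[8.0, 0.0, 0.0], rules_broken=1),
}

for utility in (hedonic_utilitarian, egoist, deontologist):
    print(utility.__name__, "prefers world", best(worlds, utility))
```

The negative-infinity trick has an obvious weakness that shows up even here: the deontologist function cannot tell one violation from a thousand, or rank two worlds that both contain a violation.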

Let’s brush aside the fact that some of these assignments are less feasible than others. Pretend that your favorite ethical system has a nice neat way of being formalized as a utility function. Now, the distinctive feature of TDT-based ethics is that when we are trying to decide on the most ethical course of action, TDT says that we must imagine that our decision would also be taken by anybody else that is sufficiently similar to us in a sufficiently similar context.

In other words, in contemplating what the right action to take is, you imagine a world in which these actions are universalized! This sounds very Kantian. One of his more famous descriptions of the categorical imperative was:

Act only according to that maxim whereby you can at the same time will that it should become a universal law.

This could be a tagline for ethics in a world of TDTists! The maxim for your action resembles the notion of similarity in motivation and situational context that generates subjunctive dependence, and the categorical imperative is the demand that you must take into account this subjunctive dependence if you are to reason consistently.

But actually, I think that the resemblance between Kantian ethical reasoning and timeless decision theory begins to fade away when you look closer. I’ll list three main points of difference:

  1. Consistency vs expected utility
  2. Maxim vs subjunctive dependence
  3. Differences in application

1. Consistency vs expected utility

Kantian universalizability is not about expected utility, it is about consistency. The categorical imperative forbids acts that, when universalized, become self-undermining. If an act is consistently universalizable, then it is not a violation of the categorical imperative, even if it ends up with everybody in horrible misery.

Timeless decision theory looks at a world in which everybody acts according to the same maxim that you are acting under, and then asks whether this world looks nice or not. “Looks nice” refers to your utility function, not any notion of consistency or non-self-underminingness (not a word, I know).

So this is the first major difference: A timeless ethical theorist cares ultimately about optimizing their moral values, not about making sure that their values are consistently applicable in the limit of universal instantiation. This puts TDT-based ethics closer to a rule-based consequentialism than to Kantian ethics, although this comparison is also flawed.

Is this a bug or a feature of TDT?

I’m tempted to say it’s a feature. My favorite example of why Kantian consistency is not a desirable meta-ethical principle is that if everybody were to give to charity, then all the problems that could be solved by giving to charity would be solved, and the opportunity to give to charity would disappear. So the act of giving to charity becomes self-undermining upon universalization.

To which I think the right response is: “So what?”

If a world in which everybody gives to charity is a world in which there are no more problems to be solved by charity-giving, then that sounds pretty great to me. If this consistency requirement prevents you from solving the problems you set out to solve, then it seems like a pretty useless requirement for ethical reasoning.

If your values can be encoded into an expected utility function, then the goal of your ethics should be to maximize that function. The antecedent of this conditional could be reasonably disputed, but I think the conditional as a whole is fairly unobjectionable.

2. Maxim vs subjunctive dependence

One of the most common retorts to Kant’s formulation of the categorical imperative rests on the ambiguity of the term ‘maxim’.

For Kant, your maxim is supposed to be the motivating principle behind your action. It can be thought of as a general rule that determines the contexts in which you would take this action.

If your action is to donate money, then your maxim might be to give 10% of your income to charity every year. If your action is to lie to your boss about why you missed work, then your maxim might be to be dishonest whenever doing otherwise will damage your career prospects.

Now, the maxim is the thing that is universalized, not the action itself. So you don’t suppose that everybody suddenly starts lying to their boss. Instead, you imagine that anybody in a situation where being honest would hurt their career prospects begins lying.

In this situation, Kant would argue that if nobody was honest in these situations, then their bosses would just assume dishonesty, in which case, the employees would never even get the chance to lie in the first place. This is self-undermining; hence, forbidden!

I actually like this line of reasoning a lot. Scott Alexander describes it as similar to the following rule:

Don’t do things that undermine the possibility to offer positive-sum bargains.

Coordination problems arise because individuals decide to defect from optimal equilibriums. If these defectors were reasoning from the Kantian principle of universalizability, they would realize that if everybody behaved similarly then the ability to defect might be undermined.

But the problem largely lies in how one specifies the maxim. For example, compare the following two maxims:

Maxim 1: Lie to your boss whenever being honest would hurt your career opportunities.

Maxim 2: Lie to your boss about why you missed work whenever the real reason is that you went on an all-night bar-hopping marathon with your friends Jackie and Khloe and then stayed up all night watching Breaking Bad highlight clips on your Apple TV.

If Maxim 2 is the true motivating principle of your action, then it seems a lot less obvious that the action is a violation of the categorical imperative. If only people in this precisely specified context lied to their bosses, then bosses would overall probably not become less trusting of their employees (unless your boss knows an unusual amount about your personal life). So the maxim is not self-undermining under universalization, and is therefore not forbidden.

Under Maxim 1, lying is forbidden, and under Maxim 2, it is not. But what is the true maxim? There’s no clear answer to this question. Any given action can be truthfully described as arising from numerous different motivational schema, and in general these choices will result in a variety of different moral guidelines.

In TDT, the analog to the concept of a maxim is subjunctive dependence, and this can be defined fairly precisely, without ambiguity. Subjunctive dependence between agents in a given context is just the degree of evidence you get about the actions of an agent given information about the actions of the other agents in that context.

More precisely, it is the degree of non-causal dependence between the actions of agents. It essentially arises from the fact that in a lawful physical universe, similar initial conditions will result in similar final conditions. This can be worded as similarity in initial conditions, in input-output structure, in computational structure, or in logical structure, but the basic idea is the same.

Not only is this precisely defined, it is a real dependence. You don’t have to imagine a fictional universe in which your action makes it more likely that others will act similarly; the claim of TDT is that this is actually the case!

In this sense, TDT is rooted in a simple acknowledgement of dependencies that really do exist and that can be precisely defined, while Kant’s categorical imperative relies on the ambiguous notion of a maxim, as well as a seemingly arbitrary hypothetical consideration. One might be tempted to ask: “Who cares what would happen if hypothetically everybody else acted according to a similar maxim? What we should care about is what will actually happen in the real world; we shouldn’t be basing our decisions off of absurd hypothetical worlds!”

3. Differences in application

These two theoretical reasons are fairly convincing to me that Kantianism and TDT ethics are only superficially similar, and are theoretically quite different. But there still remains a question of how much the actual “outputs” of the two frameworks converge. Do they just end up giving similar ethical advice?

I don’t think so. First, consider the issue I touched on previously. I said that defectors that paid attention to the categorical imperative would rethink their decision, because it is not universalizable. But this is not in general true.

If defectors are always free to defect, regardless of how many others defect as well, then defecting will still be universalizable! It is only in special cases that Kantians will not defect, like when a mob boss will come in and necessitate cooperation if enough people defect, or where universal defection depletes an expendable resource that would otherwise be renewable.

The coordination problems in which defecting automatically becomes impossible at a certain point are the easiest ones. It’s much harder to get individuals to coordinate if there is no mob boss to step in and set everybody right. These are the cases where Kantianism fails, and TDT succeeds.

TDTists with shared goals for whom cooperation would be more effective for achieving these goals would always cooperate, even if “each would individually be better off” if they defected. (I put scare quotes because you only come to this conclusion by ignoring important dependencies in the problem).

The key difference here comes down again to #1: timeless decision theorists maximize expected utility, not Kantian consistency.

In addition, TDTists don’t necessarily have Kantian hangups about using people as means to an end: if doing so ends up producing a higher expected utility than not, then they’ll go for it without hesitation.

A TDTist that can save two people’s lives by causing a little harm to one person would probably do it if their utility function was relatively impartial and placed a positive value on life. A Kantian would forbid this act.

(Why? Well, Kant thought that this principle of treating people as ends in themselves rather than means to an end was equivalent to the universalizability principle, and as far as I know, pretty much nobody was convinced by his argument for why this was the case. As such, a lot of Kantian ethics looks like it doesn’t actually follow from the universalizability principle.)

An application that might be similar for Kantian ethics and TDT ethics is the treatment of dishonesty and deception. Kant famously forbade any lying of any kind, regardless of the consequences, on the basis that universal lying would undermine the trust that is necessary to make lying a possibility.

One can imagine a similar case made for honesty in TDT ethics. In a society of TDTists, a decision to lie is a decision to produce a society that is overall less honest and less trusting. In situations where the individual benefits of dishonesty are zero-sum, only the negative effects of dishonesty are amplified. This could plausibly make dishonesty on the whole a net negative policy.

Infinite ethics

There are a whole bunch of ways in which infinities make decision theory go screwy. I’ve written about some of those ways here. This post is about a thought experiment in which infinities make ethics go screwy.

WaitButWhy has a great description of the thought experiment, and I recommend you check out the post. I’ll briefly describe it here anyway:

Imagine two worlds, World A and World B. Each is an infinite two-dimensional checkerboard, and on each square sits a conscious being that can either be very happy or very sad. At the birth of time, World A is entirely populated by happy beings, and World B entirely by sad beings.

From that moment forwards, World A gradually becomes infected with sadness in a growing bubble, while World B gradually becomes infected with happiness in a growing bubble. Both universes exist forever, so the bubble continues to grow forever.

Picture from WaitButWhy

The decision theory question is: if you could choose to be placed in one of these two worlds in a random square, which should you choose?

The ethical question is: which of the universes is morally preferable? Said another way: if you had to bring one of the two worlds into existence, which would you choose?

On spatial dominance

At every moment of time, World A contains an infinity of happiness and a finite amount of sadness. On the other hand, World B always contains an infinity of sadness and a finite amount of happiness.

This suggests the answer to both the ethical question and the decision theory question: World A is better. Ethically, it seems obvious that infinite happiness minus finite sadness is infinitely better than infinite sadness minus finite happiness. And rationally, given that there are always infinitely more people outside the bubble than inside, at any given moment in time you can be sure that you are on the outside.

A plot of the bubble radius over time in each world would look like this:

[Image: Infinite Ethics Plots]

In this image, we can see that no matter what moment of time you’re looking at, World A dominates World B as a choice.

On temporal dominance

But there’s another argument.

Let’s look at a person at any given square on the checkerboard. In World A, they start out happy and stay that way for some finite amount of time. But eventually, they are caught by the expanding sadness bubble, and then stay sad forever. In World B, they start out sad for a finite amount of time, but eventually are caught by the expanding happiness bubble and are happy forever.

Plotted, this looks like:

[Image: Infinite Ethics Plots 2]

So which do you prefer? Well, clearly it’s better to be sad for a finite amount of time and happy for an infinite amount of time than vice versa. And ethically, choosing World A amounts to dooming every individual to a lifetime of finite happiness and then infinite sadness, while World B is the reverse.

So no matter which position on the checkerboard you’re looking at, World B dominates World A as a choice!

An impossible synthesis

Let’s summarize: if you look at the spatial distribution for any given moment of time, you see that World A is infinitely preferable to World B. And if you look at the temporal distribution for any given position in space, you find that B is infinitely preferable to A.
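Both limits are easy to check numerically on a finite piece of the checkerboard. The sketch below makes two simplifying assumptions that are mine rather than part of the thought experiment: the bubble is a square centered on the origin, and it grows by one square in every direction per time step.

```python
def fraction_inside_bubble(t: int, half_width: int) -> float:
    """Share of squares in a (2W+1) x (2W+1) window that the bubble has reached at time t."""
    radius = min(t, half_width)
    return (2 * radius + 1) ** 2 / (2 * half_width + 1) ** 2


def fraction_of_time_untouched(distance: int, horizon: int) -> float:
    """Share of time steps 0..horizon-1 in which a square at this distance is still outside the bubble."""
    return min(distance, horizon) / horizon


# Fix a moment in time and widen the window: almost nobody is inside the bubble.
for half_width in (10, 100, 1_000, 10_000):
    print("t=10, window half-width", half_width, "->", fraction_inside_bubble(10, half_width))

# Fix a square and lengthen the horizon: it spends almost no time outside the bubble.
for horizon in (10, 100, 1_000, 10_000):
    print("distance=10, horizon", horizon, "->", fraction_of_time_untouched(10, horizon))
```

Both fractions go to zero. Which world looks better depends entirely on whether you send the window to infinity first or the time horizon first.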

Interestingly, I find that the spatial argument seems more compelling when considering the ethical question, while the temporal argument seems more compelling when considering the decision theory question. But both of these arguments apply equally well to both questions. For instance, if you are wondering which world you should choose to be in, then you can think forward to any arbitrary moment of time, and consider your chances of being happy vs being sad in that moment. This will get you the conclusion that you should go with World A, as for any moment in the future, you have a 100% chance of being one of the happy people as opposed to the sad people.

I wonder if the difference is that when we are thinking about decision theory, we are imagining ourselves in the world at a fixed location with time flowing past us, and it is less intuitive to think of ourselves at a fixed time and ask where we likely are.

Regardless, what do we do in the face of these competing arguments? One reasonable thing is to try to combine the two approaches. Instead of just looking at a fixed position for all time, or a fixed time over all space, we look at all space and all time, summing up total happiness moments and subtracting total sadness moments.

But now we have a problem… how do we evaluate this? What we have in both worlds is essentially a +∞ and a -∞ added together, and no clear procedure for how to make sense of this addition.

In fact, it’s worse than this. By cleverly choosing a way of adding up the total amount of the quantity happiness – sadness, we can make the result turn out however we want! For instance, we can reach the conclusion that World A results in a net +33 happiness – sadness by first counting up 33 happy moments, and then ever afterwards counting one happy moment and one sad moment in alternation, so that each happy-sad pair cancels. This summation will eventually end up counting all the happy and sad moments, and will conclude that the total is +33.

But of course, there’s nothing special about +33; we could have chosen to reach any conclusion we wanted by just changing our procedure accordingly. This is unusual. It seems that both the total expected value and moral value are undefined for this problem.
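To make the reordering trick concrete, here is a minimal Python sketch. The function name `paired_count` and its parameters are my own illustrative choices, and the code only ever counts finitely many moments, so it illustrates the pattern rather than evaluating the infinite sum. It counts some number of unpaired moments first, and then counts every remaining happy moment together with a sad one, so that the running total after each pair sits at whatever value we picked in advance.

```python
def paired_count(offset, n_pairs):
    """Return the running totals of happiness - sadness for one counting order.

    Hypothetical helper for illustration: first count |offset| unpaired
    moments (happy if offset > 0, sad if offset < 0), then count n_pairs
    further moments in happy/sad pairs.
    """
    sign = 1 if offset >= 0 else -1
    total = 0

    # Step 1: count the unpaired moments, each worth +1 (happy) or -1 (sad).
    for _ in range(abs(offset)):
        total += sign
    totals = [total]

    # Step 2: count everything else in pairs, one happy (+1) and one sad (-1).
    # Each pair contributes zero, so the total never moves again.
    for _ in range(n_pairs):
        total += 1
        total -= 1
        totals.append(total)

    return totals


# Two counting orders over the same kinds of moments, two different "totals".
plus_33 = paired_count(33, 1000)
minus_7 = paired_count(-7, 1000)
print(plus_33[-1], minus_7[-1])
```

Note that the total only settles if we check it after each happy/sad pair. Checked after every single moment, it would flicker between 33 and 34 forever, which is one more sign that the sum has no well-defined value.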

The undefinability of the total happiness – sadness of this universe is a special case of the general rule that you can’t subtract infinity from infinity. This seems fairly harmless… maybe it keeps us from giving a satisfactory answer to this one thought experiment, but surely nothing like this could matter to real-world ethical or decision theory dilemmas?

Wrong! If in fact we live in an infinite universe, then we are faced with exactly this problem. If there are an infinite number of conscious experiencers out there, some suffering and some happy, then the total quantity of happiness – sadness in the universe is undefined! What’s more, a moral system that says that we ought to increase the total happiness of the universe will return an error if asked to evaluate what we ought to do in an infinite universe!

If you think that you should do your part to make the universe a happier place, then you must have some notion of a total amount of happiness that can be increased. And if the total amount of happiness is unbounded, then there is no sensible way to increase it. This seems like a serious problem for most brands of consequentialism, albeit a very unusual one.

Sam Harris and the is-ought distinction

Sam Harris puzzles me sometimes.

I recently saw a great video explaining Hume’s is-ought distinction and its relation to the orthogonality thesis in artificial intelligence. Even if you are familiar with these two, the video is a nice and clear exposition of the basic points, and I recommend it highly.

Quick summary: Statements about what one ought to do are logically distinct from ‘is’ statements – those that simply describe the world. Statements in the second category cannot be derived from statements in the first, and vice versa. Artificial intelligence researchers recognize this in the form of the orthogonality thesis – an AI can have any combination of intelligence level and terminal values, and learning more about the world or becoming more intelligent will not result in value convergence.

This is more than just pure theory. When you’re building an AI, you actually have to design its goals separately from its capacity to describe and model the world. And if you don’t do so, then you’ll have failed at the alignment task – ensuring that the goals of the AI are friendly to humanity (and most of the space of all possible goals looks fairly unfriendly).
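As a toy illustration of that separation (a sketch under my own assumptions, not a description of how any real AI system is built), here is a tiny Python agent. The names `predict`, `choose_action`, `maximize` and `minimize` are all hypothetical. The world model and the planning code are shared, and the goal is plugged in as a separate argument.

```python
def predict(state, action):
    """Hypothetical world model (the 'is' part): predicts the next state."""
    return state + action


def choose_action(state, actions, utility):
    """Pick the action whose predicted outcome the utility function scores highest.

    The utility function (the 'ought' part) is passed in from outside;
    nothing in the world model determines it.
    """
    return max(actions, key=lambda a: utility(predict(state, a)))


def maximize(state):
    """One possible terminal goal: make the number as large as possible."""
    return state


def minimize(state):
    """A different terminal goal: make the number as small as possible."""
    return -state


actions = [-1, 0, 1]

# Same world model, same planner, different goals, different behaviour.
up = choose_action(0, actions, maximize)
down = choose_action(0, actions, minimize)
print(up, down)
```

Making `predict` more accurate would improve both agents’ ability to get what they want, but it would not change what either of them wants.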

Said more succinctly: if you think that a sufficiently intelligent being that spends enough time observing the world and figuring out all of its “is” statements would naturally start to converge towards some set of “ought” statements, you’re wrong. While this may seem very intuitively compelling, it just isn’t the case. Values and beliefs are orthogonal, and if you think otherwise, you’d make a bad AI designer.

I am very confused by the existence of apparently reasonable people that have spent any significant amount of time thinking about this and have concluded that “ought”s can be derived from “is”s, or that “ought”s are really just some special type of “is”s.

Case in point: Sam Harris’s “argument” for getting an ought from an is.

Getting from “Is” to “Ought”

1/ Let’s assume that there are no ought’s or should’s in this universe. There is only what *is*—the totality of actual (and possible) facts.

2/ Among the myriad things that exist are conscious minds, susceptible to a vast range of actual (and possible) experiences.

3/ Unfortunately, many experiences suck. And they don’t just suck as a matter of cultural convention or personal bias—they really and truly suck. (If you doubt this, place your hand on a hot stove and report back.)

4/ Conscious minds are natural phenomena. Consequently, if we were to learn everything there is to know about physics, chemistry, biology, psychology, economics, etc., we would know everything there is to know about making our corner of the universe suck less.

5/ If we *should* to do anything in this life, we should avoid what really and truly sucks. (If you consider this question-begging, consult your stove, as above.)

6/ Of course, we can be confused or mistaken about experience. Something can suck for a while, only to reveal new experiences which don’t suck at all. On these occasions we say, “At first that sucked, but it was worth it!”

7/ We can also be selfish and shortsighted. Many solutions to our problems are zero-sum (my gain will be your loss). But *better* solutions aren’t. (By what measure of “better”? Fewer things suck.)

8/ So what is morality? What *ought* sentient beings like ourselves do? Understand how the world works (facts), so that we can avoid what sucks (values).

This is clearly a bad argument. Depending on what he intended it to be, step 5 is either a non sequitur or a restatement of his conclusion. It’s especially surprising because the mistake is so clearly visible.

I don’t know what to make of this, and really have no charitable interpretation. My least uncharitable interpretation is that maybe the root of the problem is an inability to let go of the longing to ground your moral convictions in objectivity. I’ve certainly had times where I felt so completely confident about a moral conviction that I convinced myself that it just had to be objectively true, although I always eventually came down from those highs.

I’m not sure, but this is something that I am very confused about and want to understand better.