Inference as a balance of accommodation, prediction, and simplicity

(This post is a soft intro to some of the many interesting aspects of model selection. I will inevitably skim over many nuances and leave out important details, but hopefully the final product is worth reading as a dive into the topic. A lot of the general framing I present here is picked up from Malcolm Forster’s writings.)

What is the optimal algorithm for discovering the truth? There are many different candidates out there, and it’s not totally clear how to adjudicate between them. One issue is that it is not obvious exactly how to measure correspondence to truth. There are several different criteria that we can use, and in this post, I want to talk about three big ones: accommodation, prediction, and simplicity.
The basic idea of accommodation is that we want our theories to do a good job at explaining the data that we have observed. Prediction is about doing well at predicting future data. Simplicity is, well, just exactly what it sounds like. Its value has been recognized in the form of Occam’s razor, or the law of parsimony, although it is famously difficult to formalize.
Let’s say that we want to model the relationship between the number of times we toss a fair coin and the number of times that it lands H. We might get a data set that looks something like this:
[Figure: the coin-toss data]

Now, our goal is to fit a curve to this data. How best to do this?

Consider the following two potential curves:

[Figure: Curve 1 and Curve 2 fit to the data]

Curve 1 is generated by Procedure 1: Find the lowest-order polynomial that perfectly matches the data.

Curve 2 is generated by Procedure 2: Find the straight line that best fits the data.

If we only cared about accommodation, then we’d prefer Curve 1 over Curve 2. After all, Curve 1 matches our data perfectly! Curve 2, on the other hand, is always close but never exactly right.

On the other hand, regardless of how well Curve 1 fits the data, it entirely misses the underlying pattern in the data captured by Curve 2! This demonstrates one of the failure modes of a single-minded focus on accommodation: the problem of overfitting.

We might try to solve this problem by noting that while Curve 1 matches the data better, it does so in virtue of its enormous complexity. Curve 2, on the other hand, matches the data pretty well, but does so simply. A combined focus on accommodation + simplicity might, therefore, favor Curve 2. Of course, this requires us to precisely specify what we mean by ‘simplicity’, which has been the subject of a lot of debate. For instance, some have argued that an individual curve cannot be said to be more or less simple than a different curve, as just rephrasing the data in a new coordinate system can flip the apparent simplicity relationship. This is a general version of the grue-bleen problem, which is a fantastic problem that deserves talking about in a separate post.

Another way to solve this problem is by optimizing for accommodation + prediction. The overfitted curve is likely to be far off if you ask it for predictions about future data, while the straight line will likely do better. This makes sense – a straight line makes better forecasts about future data because it has gotten at the true nature of the underlying relationship.

What if we want to ensure that our model does a good job at predicting future data, but are unable to gather future data? For example, suppose that we lost the coin that we were using to generate the data, but still want to know which model would have done best at predicting future flips. Cross-validation is a wonderful technique that can be used to deal with exactly this problem.

How does it work? The idea is that we randomly split up the data we have into two sets, the training set and the testing set. Then we train our models on the training set (see which curve each model ends up choosing as its best fit, given the training data), and test them on the testing set. For instance, if our training set is just the data from the early coin flips, we find the following:

[Figure: cross-validation, with Curve 1 and Curve 2 refit to the training data]

We can see that while the new Curve 2 does roughly as well as it did before, the new Curve 1 will do horribly on the testing set. We now do this for many different ways of splitting up our data set, and in the end accumulate a cross-validation “score”. This score represents the average success of the model at predicting points that it was not trained on.
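To make the procedure concrete, here is a minimal sketch in Python. The simulated coin-toss data, the 50/50 random splits, and the use of a fixed high-degree polynomial to stand in for Procedure 1 are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tosses = np.arange(1, 21)               # x: how many times the coin was tossed
n_heads = rng.binomial(n_tosses, 0.5)     # y: how many of those tosses landed heads

def cv_score(x, y, degree, n_splits=200):
    """Average squared error on held-out points, over many random train/test splits."""
    errors = []
    for _ in range(n_splits):
        idx = rng.permutation(len(x))
        train, test = idx[:len(x) // 2], idx[len(x) // 2:]
        coeffs = np.polyfit(x[train], y[train], degree)   # fit on the training half
        pred = np.polyval(coeffs, x[test])                # predict the held-out half
        errors.append(np.mean((pred - y[test]) ** 2))
    return np.mean(errors)

print("high-degree polynomial:", cv_score(n_tosses, n_heads, degree=7))  # overfits: large error
print("straight line:         ", cv_score(n_tosses, n_heads, degree=1))  # small error
```

The overfitted polynomial swings wildly between and beyond its training points, so its cross-validation score is far worse than the straight line’s, exactly as in the figures above.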

We expect that in general, models that overfit will tend to do horribly when asked to predict the testing data, while models that actually get at the true relationship will tend to do much better. This is a beautiful method for avoiding overfitting by getting at the deep underlying relationships, and optimizing for the value of predictive accuracy.

It seems like predictive accuracy and simplicity often go hand-in-hand. In our coin example, the simpler model (the straight line) was also the more predictively accurate one. And models that overfit tend to be both bad at making accurate predictions and enormously complicated. What is the explanation for this relationship?

One classic explanation says that simpler models tend to be more predictive because the universe just actually is relatively simple. For whatever reason, the actual relationships between different variables in the universe happen to be best modeled by simple equations, not complicated ones. Why? One reason that you could point to is the underlying simplicity of the laws of nature.

The Standard Model of particle physics, which gives rise to basically all of the complex behavior we see in the world, can be expressed in an equation that can be written on a t-shirt. In general, physicists have found that reality seems to obey very mathematically simple laws at its most fundamental level.

I think that this is somewhat of a non-explanation. It predicts simplicity in the results of particle physics experiments, but does not at all predict simple results for higher-level phenomena. In general, very complex phenomena can arise from very simple laws, and we get no guarantee that the world will obey simple laws when we’re talking about patterns involving 10^20 particles.

An explanation that I haven’t heard before references possible selection biases. The basic idea is that most variables out there that we could analyze are likely not connected by any simple relationships. Think of any two random variables, like the number of seals mating at any moment and the distance between Obama and Trump at that moment. Are these likely to be related by a simple equation? Of course!

(Kidding. Of course not.)

The only times when we do end up searching for patterns in variables is when we have already noticed that some pattern does plausibly seem to exist. And since we’re more likely to notice simpler patterns, we should expect a selection bias among those patterns we’re looking at. In other words, given that we’re looking for a pattern between two variables, it is fairly likely that there is a pattern that is simple enough for us to notice in the first place.

Regardless, it looks like an important general feature of inference systems is that they strike a good balance between accommodation and either prediction or simplicity. So what do actual systems of inference do?

I’ve already talked about cross validation as a tool for inference. It optimizes for accommodation (in the training set) + prediction (in the testing set), but not explicitly for simplicity.

Updating of beliefs via Bayes’ rule is purely an accommodation procedure. When you take your prior credence P(T) and update it with evidence E, you are ultimately just doing your best to accommodate the new information.

Bayes’ Rule: P(T | E) = P(T) ∙ P(E | T) / P(E)

The theory that receives the greatest credence bump is going to be the theory that maximizes P(E | T), or the likelihood of the evidence given the theory. This is all about accommodation, and entirely unrelated to the other virtues. Technically, the method of choosing the theory that maximizes the likelihood of your data is known as Maximum Likelihood Estimation (MLE).
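As a toy illustration of the difference, here is a minimal sketch for a coin of unknown bias; the grid of candidate biases, the observed counts, and the fair-coin-favoring prior are illustrative assumptions.

```python
import numpy as np

thetas = np.linspace(0.01, 0.99, 99)             # candidate theories T: the coin's bias
prior = thetas**4 * (1 - thetas)**4              # a prior that favors 'the coin is near-fair'
prior /= prior.sum()

heads, tails = 7, 3                              # evidence E: ten observed flips
likelihood = thetas**heads * (1 - thetas)**tails # P(E | T), up to a constant

posterior = prior * likelihood                   # numerator of Bayes' rule
posterior /= posterior.sum()                     # dividing by P(E) just normalizes

print("MLE pick:       ", thetas[np.argmax(likelihood)])  # pure accommodation: 0.7
print("posterior mode: ", thetas[np.argmax(posterior)])   # pulled toward the prior: ~0.61
```

Maximum likelihood cares only about how well each theory accommodates the flips; the prior is where any other consideration enters.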

On the other hand, the priors that you start with might be set in such a way as to favor simpler theories. Most frameworks for setting priors do this either explicitly or implicitly (principle of indifference, maximum entropy, minimum description length, Solomonoff induction).

Leaving Bayes, we can look to information theory as the foundation for another set of epistemological frameworks. These are focused mostly on minimizing the information gain from new evidence, which is equivalent to maximizing the relative entropy of your new distribution with respect to your old distribution.

Two approximations of this procedure are the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), each focusing on subtly different goals. Both of these explicitly take into account simplicity in their form, and are designed to optimize for both accommodation and prediction.
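Here is a minimal sketch of how AIC and BIC trade accommodation against complexity for the two curve-fitting procedures from earlier; the Gaussian noise model and the simulated data are illustrative assumptions.

```python
import numpy as np

def aic_bic(y, y_pred, k):
    """Gaussian-noise AIC and BIC for a fit with k free parameters."""
    n = len(y)
    rss = np.sum((y - y_pred) ** 2)
    log_lik = -n / 2 * (np.log(2 * np.pi * rss / n) + 1)   # maximized log-likelihood
    return 2 * k - 2 * log_lik, k * np.log(n) - 2 * log_lik

rng = np.random.default_rng(1)
x = np.arange(1.0, 21.0)
y = 0.5 * x + rng.normal(0, 1, 20)      # roughly 'half the tosses land heads', plus noise

for degree in (1, 7):                   # straight line vs. overfitting polynomial
    coeffs = np.polyfit(x, y, degree)
    aic, bic = aic_bic(y, np.polyval(coeffs, x), k=degree + 1)
    print(f"degree {degree}: AIC = {aic:.1f}, BIC = {bic:.1f}")
```

Both criteria reward fit through the likelihood term and penalize the extra parameters, so the straight line typically wins despite its slightly worse accommodation.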

Here’s a list of these different procedures, as well as others I haven’t mentioned yet, each of which can be evaluated on whether it optimizes for accommodation, prediction, and simplicity:

  • Maximum Likelihood Estimation
  • Minimize Sum of Squares
  • Bayesian Updating
  • Principle of Indifference
  • Maximum Entropy Priors
  • Minimum Message Length
  • Solomonoff Induction
  • P-Testing
  • Minimize Mallows’ Cp
  • Maximize Relative Entropy
  • Minimize Log Loss
  • Cross Validation
  • Minimize Akaike Information Criterion (AIC)
  • Minimize Bayesian Information Criterion (BIC)

Some of the procedures I’ve included are closely related to others, and in some cases they are in fact approximations of others (e.g. minimize log loss ≈ maximize relative entropy, minimize AIC ≈ minimize log loss).

Looking over these procedures, we can see that Bayesianism (Bayesian updating + a prior-setting procedure) does not explicitly optimize for predictive value. It optimizes for simplicity through the prior-setting procedure, and in doing so also happens to pick up predictive value by association, but doesn’t get the benefits of procedures like cross-validation.

This is one reason why Bayesianism might be seen as suboptimal – prediction is the great goal of science, and it is entirely missing from the equations of Bayes’ rule.

On the other hand, procedures like cross validation and maximization of relative entropy look like good candidates for optimizing for accommodation and predictive value, and picking up simplicity along the way.

Pascal’s mugging

  • You should make decisions by evaluating the expected utilities of your various options and choosing the largest one.

This is a pretty standard and uncontroversial idea. There is room for controversy about how to fill in the details about how to evaluate expected utilities, but this basic premise is hard to argue against. So let’s argue against it!

Suppose that a stranger walks up to you in the street and says to you “I have been wired in from outside the simulation to give you the following message: If you don’t hand over five dollars to me right now, your simulator will teleport you to a dungeon and torture you for all eternity.” What should you do?

The obviously correct answer is that you should chuckle, continue on with your day, and laugh about the incident later on with your friends.

The answer you get from a simple application of decision theory is that as long as you aren’t absolutely, 100% sure that they are wrong, you should give them the five dollars. And you should definitely not be 100% sure. Why?

Suppose that the stranger says next: “I know that you’re probably skeptical about the whole simulation business, so here’s some evidence. Say any word that you please, and I will instantly reshape the clouds in the sky into that word.” You do so, and sure enough the clouds reshape themselves. Would this push your credences around a little? If so, then you didn’t start at 100%. Truly certain beliefs are those that can’t be budged by any evidence whatsoever. You can never update downwards on truly certain beliefs, by the definition of ‘truly certain’.

To go more extreme, just suppose that they demonstrate to you that they’re telling you the truth by teleporting you to a dungeon for five minutes of torture, and then bringing you back to your starting spot. If you would even slightly update your beliefs about their credibility in this scenario, then you had a non-zero credence in their credibility from the start.

And after all, this makes sense. You should only have complete confidence in the falsity of logical contradictions, and it’s not literally logically impossible that we are in a simulation, or that the simulator decides to mess with our heads in this bizarre way.

Okay, so you have a nonzero credence in their ability to do what they say they can do. And any nonzero credence, no matter how tiny, will result in the rational choice being to hand over the $5. After all, if expected utility is just calculated by summing up utilities weighted by probabilities, then you have something like the following:

EU(keep $5) – EU(give $5) = U(keep $5) + ε ∙ U(infinite torture)
where ε = P(infinite torture | keep $5) – P(infinite torture | give $5)

Since ε is positive (however tiny) and U(infinite torture) is infinitely negative, the difference comes out negative. As long as losing $5 isn’t infinitely bad to you, you should hand over the money. This seems like a problem, either for our intuitions or for decision theory.
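To see the force of the argument numerically, here is a minimal sketch with a finite stand-in for the infinite disutility; both numbers are arbitrary illustrative assumptions.

```python
epsilon = 1e-20        # extra probability of torture if you keep the $5
u_torture = -1e50      # (finite stand-in for the) disutility of the threatened torture
u_keep_five = 5        # utility of holding on to the five dollars

eu_keep = u_keep_five + epsilon * u_torture   # the tiny probability dominates
eu_give = 0                                   # baseline: hand over the money

print("EU(keep $5):", eu_keep)    # astronomically negative
print("EU(give $5):", eu_give)
```

No matter how small you make epsilon, a large enough disutility flips the verdict back toward handing over the money, which is exactly the lever the mugger is pulling.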

***

So here are four propositions, and you must reject at least one of them:

  1. There is a nonzero chance of the stranger’s threat being credible.
  2. Infinite torture is infinitely worse than losing $5.
  3. The rational thing to do is that which maximizes expected utility.
  4. It is irrational to give the stranger $5.

I’ve already argued for (1), and (2) seems virtually definitional. So our choice is between (3) and (4). In other words, we either abandon the principle of maximizing expected utility as a guide to instrumental rationality, or we reject our intuitive confidence in the correctness of (4).

Maybe at this point you feel more willing to accept (4). After all, intuitions are just intuitions, and humans are known to be bad at reasoning about very small probabilities and very large numbers. Maybe it actually makes sense to hand over the $5.

But consider where this line of reasoning leads.

The exact same argument should lead you to give in to any demand that the stranger makes of you, as long as it doesn’t have a literal negative infinity utility value. So if the stranger tells you to hand over your car keys, to go dance around naked in a public square, or to commit heinous crimes… all of these behaviors would be apparently rationally mandated.

Maybe, maybe, you might be willing to bite the bullet and say that yes, these behaviors are all perfectly rational, because of the tiny chance that this stranger is telling the truth. I’d still be willing to bet that you wouldn’t actually behave in this self-professedly “rational” manner if I now made this threat to you.

Also, notice that this dilemma is almost identical to Pascal’s wager. If you buy the argument here, then you should also be doing all that you can to ensure that you stay out of Hell. If you’re queasy about the infinities and think decision theory shouldn’t be messing around with such things, then we can easily modify the problem.

Instead of “your simulator will teleport you to a dungeon and torture you for all eternity”, make it “your simulator will teleport you to a dungeon and torture you for 3↑↑↑↑3 years.” The negative utility of this is large enough as to outweigh any reasonable credence you could place in the credibility of the threat. And if it isn’t, we can just make the number of years even larger.

Maybe the probability of a given payout scales inversely with the size of the payout? But this seems fairly arbitrary. Is it really the case that the ability to torture you for 3↑↑↑↑3 years is twice as likely as the ability to torture you for 2 ∙ 3↑↑↑↑3 years? I can’t imagine why. It seems like the probabilities of these are going to be roughly equal – essentially, once you buy into the prospect of a simulator that is able to torture you for 3↑↑↑↑3 years, you’ve already basically bought into the prospect that they are able to torture you for twice that amount of time.

All we’re left with is to throw our hands up and say “I can’t explain why this argument is wrong, and I don’t know how decision theory has gone wrong here, but I just know that it’s wrong. There is no way that the actually rational thing to do is to allow myself to get mugged by anybody that has heard of Pascal’s wager.”

In other words, it seems like the correct response to Pascal’s mugging is to reject (3) and deny the expected-utility-maximizing approach to decision theory. The natural next question is: If expected utility maximization has failed us, then what should replace it? And how would it deal with Pascal’s mugging scenarios? I would love to see suggestions in the comments, but I suspect that this is a question that we are simply not advanced enough to satisfactorily answer yet.

Those who have forgotten words

The fish trap exists because of the fish. Once you’ve gotten the fish you can forget the trap. The rabbit snare exists because of the rabbit. Once you’ve gotten the rabbit, you can forget the snare. Words exist because of meaning. Once you’ve gotten the meaning, you can forget the words. Where can I find a man who has forgotten words so I can talk with him?

― Zhuangzi

Why systematize epistemology?

A general pattern I’ve noticed in meta-level thinking is a spectrum of systematizing. I’ll explain what this means with a personal example.

When I was first exposed to the idea of ethics as a serious discipline, I found it fairly silly. I mean, clearly our ethical beliefs are not the types of things that we should expect objectivity from. They form from a highly subjective and complex mix of factors involving the peer group we surround ourselves with, the type of parents we had, our religious background, our inbuilt deep moral intuitions, our life experiences, and so on. What’s the point in thinking hard about your ethical beliefs – they just are what they are, right?

What I found funny was the idea that people thought it made sense to spend serious time and effort trying to analyze their ethical intuitions and creating general frameworks that capture as much of these intuitions as they could. I would say that I, for whatever reason, had an initially highly non-systematizing attitude towards ethics.

In college, I fell in with a crowd that liked spending long hours debating abstract ethical principles, and eventually grew fond of it myself. It became intuitive to me that of course it is desirable to have a simple, precisely formalized, and vastly generalizable ethical framework to guide your beliefs and actions. This remained the case even though I never lost the intuitive sense of the obviousness of moral non-objectivity.

Frameworks like utilitarianism appealed to me as incredibly simple general “laws of morality” that were able to capture most of my ethical intuitions. When they contradicted strong ethical intuitions, I felt okay with overriding those intuitions for the sake of the more valuable synthesis that was the framework as a whole.

These types of cognitive patterns – taking complex disparate phenomena, analyzing patterns in them, looking for precise and simple descriptions of these patterns and trying to generalize them as far as possible – are what I mean by systematizing. Some people are very strong systematizers when it comes to their aesthetic tastes – they will spend hours arguing about what beauty is and analyzing their basic aesthetic reactions in order to form simple general Theories of Everything Beautiful. Others think that this is stupid and a waste of time and cognitive resources.

Philosophers tend to be systematizers about literally everything – I’d say systematization comes close to a general definition of philosophy as an intellectual field. Scientists tend to be systematizers about the field that they work in, where they work obsessively to cleanly and neatly describe vast realms of natural phenomena. In our daily lives, systematizing tendencies come out in arguments about the quality of a certain movie or the tastiness of a meal or the attractiveness of a celebrity. Some people will want to dive into these debates with an attitude towards forming general principles of what makes a quality movie, or a tasty meal, or an attractive person, while others will dismiss the general principles, arguing instead from their gut-level reactions to the movie. Which is to say, some people will feel a desire to systematize their thoughts/opinions/desires/tastes, and others will not.

Those that do not are perfectly content with a complicated and messy reality. They feel no inner urge pulling them towards de-cluttering their view of the world. From this perspective, it can be perplexing to see people working very hard to systematize their intuitions. Such efforts can seem fairly pointless, and downright absurd when the final product ends up contradicting some of the intuitions from which it was built.

About a lot of things, I am an extreme systematizer, relentlessly searching for concise, elegant, and powerful models to piece everything together. But there are plenty of other areas where I feel totally fine with messiness and complexity and am turned off by efforts to reduce or remove them. Aesthetics is one such area – I appreciate art on a gut level, and am weirded out by the prospect of trying to formulate a simple general theory of aesthetics.

One of the areas where I have the most extreme systematizing tendencies (as might be obvious from my writings on this blog) is formal epistemology. A single neat equation that summarizes the process of rational belief formation is just obviously desirable to me. This is not a desirability borne out of practical considerations. It is perhaps at its root a deeply aesthetic feeling about different structures of reasoning. I want to know not just what is practically useful for day-to-day reasoning, but also what is ultimately the best and most fundamental framework with which to describe my epistemological intuitions.

I choose the phrase ‘epistemological intuitions’ carefully and intentionally. We do not have any direct line to objective epistemic truth; we are not provided by Nature with a golden shining book in which the true nature of normative rational reasoning is laid out for us. What we do have, ultimately, is a set of deep intuitions about the way that good reasoning works. These intuitions are messy and complicated.

I say this all to make the point that strong enough systematizing intuitions can make the non-objective look objective, and I think it’s important to try to avoid that mistake. Maybe we think that if we extend our framework of reasoning enough, we can eventually find evolutionary justifications for why our patterns of reasoning should in general align with the truth. But this is simply an appeal to the value of reflective equilibrium – the criterion that multiple alternative perspectives on the same framework end up cohering and bolstering one another.

If we try to say something like “We can find out what framework works best by just seeing how they do at predicting future events,” then we are relying on the intuition that empiricism is an epistemic virtue. Similarly, if we appeal to Occam’s razor, we are relying on intuitions about simplicity. If we think that better frameworks take little for granted and are cautious about jumping to strong conclusions, then we are relying on intuitions about epistemic humility. Etc.

The best we can do, it seems to me, is to compile different arguments starting from our deepest intuitions and ending at a particular epistemic framework. Bayesianism has arguments like Cox’s theorem and Dutch Book arguments. The empirical case for Bayesianism can be made by convergence and consistency theorems, as well as case studies in which Bayesian methods result in great predictive power.

But I think that it’s important to keep in mind that these are not absolute proofs of the objective superiority of Bayesianism. Ultimately, arguments for any epistemic framework rest on some set of deep-seated epistemic intuitions, and are ineradicably tied to these intuitions.

Timeless decision theory and homogeneity

Something that seems difficult to me about timeless decision theory is how to reason in a world where most people are not TDTists. In such a world, it seems like the subjunctive dependence between you and others gets weaker and weaker the more TDT influences your decision process.

Suppose you are deciding whether or not to vote. You think through all of the standard arguments you know of: your single vote is virtually guaranteed not to swing the election, so the causal effect of your vote is essentially nothing; the cost to you of voting is tiny, and voting might even be net positive if you go with a friend and show off your “I Voted” sticker all day; if you vote, you might be able to persuade others to vote as well; etc. At the end of your pondering, you decide that it’s overall not worth it to vote.

Now a TDTist pops up behind your shoulder and says to you: “Look, think about all the other people out there reasoning similarly to you. If you end up not voting as a result of this reasoning, then it’s pretty likely that they’ll all not vote as well. On the other hand, if you do end up voting, then they probably will vote too! So instead of treating your decision as if it only makes the world 1 vote different, you should treat it is if it influences all the votes of those sufficiently similar to you.”

 Maybe you instantly find this convincing, and decide to go to the voting booth right away. But the problem is that in taking into account this extra argument, you have radically reduced the set of people whose overall reasoning process is similar to you!

This set was initially everybody that had thought through all the similar arguments and felt similarly to you about them, and most of these people ended up not voting. But as soon as the TDTist popped up and presented their argument, the set of people that were subjunctively dependent upon you shrunk to just those in the initial set that had also heard this argument.

In a world in which only a single person ever had thought about subjunctive dependence, and this person was not going to vote before thinking about it, the evidential effect of not voting is basically zero. Given this, the argument would have no sway on them.

This seems like it would weaken the TDTist’s case that TDTists do better in real world problems. At the same time, it seems actually right. In a case where very few people follow the same reasoning processes as you, your decisions tell you very little about the decisions of others, for the same reason that a highly neuro-atypical person should be hesitant to generalize information about their brain to other people.

Another conclusion of this is that timeless decision theory is most powerful in a community where there is homogeneity of thought and information. Propagation of the idea of timeless decision theory would amplify the coordination-inducing power of the procedure.

I’m not sure if this implies that a TDTist is motivated to spread the idea and homogenize their society, as doing so increases subjunctive dependence and thus enhances their influence. I’d guess that they would only reason this way if they thought themselves to be above average in decision-making, or to have information that others don’t, so that the expected utility of them having increased decision-making ability would outweigh the costs of homogeneity.

Timeless ethics and Kant

The more I think about timeless decision theory, the more it seems obviously correct to me.

The key idea is that sometimes there is a certain type of non-causal logical dependency (called a subjunctive dependence) between agents that must be taken into account by those agents in order to make rational decisions. The class of cases in which subjunctive dependences become relevant involves agents in environments that contain other agents trying to predict their actions, and also environments that contain other agents that are similar to them.

Here’s my favorite motivating thought experiment for TDT: Imagine that you encounter a perfect clone of yourself. You have lived identical lives, and are structurally identical in every way. Now you are placed in a prisoner’s dilemma together. Should you cooperate or defect?

A non-TDTist might see no good reason to cooperate – after all, defecting dominates cooperation as a strategy, and your decision doesn’t affect your clone’s decision. If the two of you share no common cause explanation for your similarity, then this conclusion is even stronger – both evidential and causal decision theory would defect. So both you and your clone defect and you both walk away unhappy.

TDT is just the admission that there is an additional non-causal dependence between your decision and your clone’s decision that must be taken into account. This dependence comes from the fact that you and your clone have a shared input-output structure. That is, no matter what you end up doing, you know that your clone must do the same thing, because your clone is operating identically to you.

In a deterministic world, it is logically impossible that you choose to do X and your clone does Y. The initial conditions are the same, so the final conditions must be the same. So you end up cooperating, as does your clone, and everybody walks away happy.

With an imperfect clone, it is no longer logically impossible, but there still exists a subjunctive dependence between your actions and your clone’s.
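Here is a minimal sketch of that calculation for the clone prisoner’s dilemma; the payoff numbers and the single ‘correlation’ parameter standing in for the strength of subjunctive dependence are illustrative assumptions.

```python
PAYOFF = {                      # my payoff given (my action, clone's action)
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def expected_payoff(my_action, correlation):
    """Expected payoff when the clone mirrors my action with probability `correlation`."""
    other_action = "D" if my_action == "C" else "C"
    return (correlation * PAYOFF[(my_action, my_action)]
            + (1 - correlation) * PAYOFF[(my_action, other_action)])

for corr in (0.5, 0.9, 1.0):    # 0.5 ≈ no dependence, 1.0 = perfect clone
    print(f"correlation {corr}: "
          f"cooperate = {expected_payoff('C', corr)}, defect = {expected_payoff('D', corr)}")
```

With no dependence, defecting dominates; as the correlation approaches 1, cooperation becomes the clear winner, which is the intuition behind the TDT recommendation here.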

This is a natural and necessary modification to decision theory. We take into account not only the causal effects of our actions, but the evidential effects of our actions. Even if your action does not causally affect a given outcome, it might still make it more or less likely, and subjunctive dependence is one of the ways that this can happen.

TDTists interacting with each other would get along really nicely. They wouldn’t fall victim to coordination problems, because they wouldn’t see their decisions as isolated and disconnected from the decisions of the others. They wouldn’t undercut each other in bargaining problems in which one side gets to make the deals and the other can only accept or reject.

In general, they would behave in a lot of ways that are standardly depicted as irrational (like one-boxing in Newcomb’s problem and cooperating in the prisoner’s dilemma), and end up much better off as a result. Such a society seems potentially much nicer and subject to fewer of the common failure modes of standard decision theory.

In particular, in a society in which it is common knowledge that everybody is a perfect TDTist, there can be strong subjunctive dependencies between the actions of causally disconnected agents. If a TDTist is considering whether or not to vote for their preferred candidate, they aren’t comparing outcomes that differ by a single vote. They are considering outcomes that differ by the size of the entire class of individuals that would be reasoning similarly to them.

In simple enough cases, this could mean that your decision about whether to vote is really a decision about if millions of people will vote or not. This may sound weird, but it follows from the exact same type of reasoning as in the clone prisoner’s dilemma.

Imagine that the society consisted entirely of 10 million exact clones of you, each deciding whether or not to vote. In such a world, each individual’s choice is perfectly subjunctively dependent upon every other individual’s choice. If one of them decides not to vote, then all of them decide not to vote.

In a more general case, perfect clones of you don’t exist in your environment. But in any given context, there is still a large class of individuals that reason similarly to you as a result of a similar input-output process.

For example, all humans are very similar in certain ways. If I notice that my blood is red, and I had previously never heard about or seen the blood color of anybody else, then I should now strongly update on the redness of the blood of other humans. This is obviously not because my blood being red causes others to have red blood. It is also not because of a common cause – in principle, any such cause could be screened off, and we would expect the same dependence to exist solely in virtue of the similarity of structure. We would expect red blood in an alien whose evolutionary history has been entirely causally separated from ours but who by a wild coincidence has the same DNA structure as humans.

Our similarities in structure can be less salient to us when we think about our minds and the way we make decisions, but they still are there. If you notice that you have a strong inclination to decide to take action X, then this actually does serve as evidence that a large class of other people will take action X. The size of this class and the strength of this evidence depends on which particular X is being analyzed.

Ethics and TDT

It is natural to wonder: what sort of ethical systems naturally arise out of TDT?

We can turn a decision theory into an ethical framework by choosing a utility function that encodes the values associated with that ethical framework. The utility function for hedonic utilitarianism assigns utility according to the total well-being in the universe. The utility function for egoism assigns utility only to your own happiness, apathetic to the well-being of others.

Virtue ethics and deontological ethics are harder to encode. We could do the first by assigning utility to virtuous character traits and disutility to vices. The second could potentially be achieved by assigning negative infinities to violations of the moral rules.
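Here is a minimal sketch of what such encodings might look like; the outcome representation and the particular numbers are illustrative assumptions, not a serious metaethical proposal.

```python
import math

def hedonic_utilitarian(outcome):
    """Total well-being of everyone affected."""
    return sum(outcome["wellbeing"].values())

def egoist(outcome, me="me"):
    """Only my own well-being counts."""
    return outcome["wellbeing"][me]

def deontologist(outcome):
    """Negative infinity for any rule violation, otherwise fall back on total well-being."""
    if outcome["rule_violations"]:
        return -math.inf
    return hedonic_utilitarian(outcome)

outcome = {"wellbeing": {"me": 5, "you": -2}, "rule_violations": ["lied"]}
print(hedonic_utilitarian(outcome), egoist(outcome), deontologist(outcome))   # 3, 5, -inf
```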

Let’s brush aside the fact that some of these assignments are less feasible than others. Pretend that your favorite ethical system has a nice neat way of being formalized as a utility function. Now, the distinctive feature of TDT-based ethics is that when we are trying to decide on the most ethical course of action, TDT says that we must imagine that our decision would also be taken by anybody else who is sufficiently similar to us in a sufficiently similar context.

In other words, in contemplating what the right action to take is, you imagine a world in which these actions are universalized! This sounds very Kantian. One of his more famous descriptions of the categorical imperative was:

Act only according to that maxim whereby you can at the same time will that it should become a universal law.

This could be a tagline for ethics in a world of TDTists! The maxim for your action resembles the notion of similarity in motivation and situational context that generates subjunctive dependence, and the categorical imperative is the demand that you must take into account this subjunctive dependence if you are to reason consistently.

But actually, I think that the resemblance between Kantian ethical reasoning and timeless decision theory begins to fade away when you look closer. I’ll list three main points of difference:

  1. Consistency vs expected utility
  2. Maxim vs subjunctive dependence
  3. Differences in application

1. Consistency vs expected utility

Kantian universalizability is not about expected utility, it is about consistency. The categorical imperative forbids acts that, when universalized, become self-undermining. If an act is consistently universalizable, then it is not a violation of the categorical imperative, even if it ends up with everybody in horrible misery.

Timeless decision theory looks at a world in which everybody acts according to the same maxim that you are acting under, and then asks whether this world looks nice or not. “Looks nice” refers to your utility function, not any notion of consistency or non-self-underminingness (not a word, I know).

So this is the first major difference: A timeless ethical theorist cares ultimately about optimizing their moral values, not about making sure that their values are consistently applicable in the limit of universal instantiation. This puts TDT-based ethics closer to a rule-based consequentialism than to Kantian ethics, although this comparison is also flawed.

Is this a bug or a feature of TDT?

I’m tempted to say it’s a feature. My favorite example of why Kantian consistency is not a desirable meta-ethical principle is that if everybody were to give to charity, then all the problems that could be solved by giving to charity would be solved, and the opportunity to give to charity would disappear. So the act of giving to charity becomes self-undermining upon universalization.

To which I think the right response is: “So what?”

If a world in which everybody gives to charity is a world in which there are no more problems left to be solved by charity-giving, then that sounds pretty great to me. If this consistency requirement prevents you from solving the problems you set out to solve, then it seems like a pretty useless requirement for ethical reasoning.

If your values can be encoded into an expected utility function, then the goal of your ethics should be to maximize that function. The antecedent of this conditional could be reasonably disputed, but I think the conditional as a whole is fairly unobjectionable.

2. Maxim versus subjunctive dependence

One of the most common retorts to Kant’s formulation of the categorical imperative rests on the ambiguity of the term ‘maxim’.

For Kant, your maxim is supposed to be the motivating principle behind your action. It can be thought of as a general rule that determines the contexts in which you would take this action.

If your action is to donate money, then your maxim might be to give 10% of your income to charity every year. If your action is to lie to your boss about why you missed work, then your maxim might be to be dishonest whenever doing otherwise will damage your career prospects.

Now, the maxim is the thing that is universalized, not the action itself. So you don’t suppose that everybody suddenly starts lying to their boss. Instead, you imagine that anybody in a situation where being honest would hurt their career prospects begins lying.

In this situation, Kant would argue that if nobody was honest in these situations, then their bosses would just assume dishonesty, in which case, the employees would never even get the chance to lie in the first place. This is self-undermining; hence, forbidden!

I actually like this line of reasoning a lot. Scott Alexander describes it as similar to the following rule:

Don’t do things that undermine the possibility to offer positive-sum bargains.

Coordination problems arise because individuals decide to defect from optimal equilibria. If these defectors were reasoning from the Kantian principle of universalizability, they would realize that if everybody behaved similarly then the ability to defect might be undermined.

But the problem largely lies in how one specifies the maxim. For example, compare the following two maxims:

Maxim 1: Lie to your boss whenever being honest would hurt your career opportunities.

Maxim 2: Lie to your boss about why you missed work whenever the real reason is that you went on an all-night bar-hopping marathon with your friends Jackie and Khloe and then stayed up watching Breaking Bad highlight clips on your Apple TV.

If Maxim 2 is the true motivating principle of your action, then it seems a lot less obvious that the action is a violation of the categorical imperative. If only people in this precisely specified context lied to their bosses, then bosses would overall probably not become less trusting of their employees (unless your boss knows an unusual amount about your personal life). So the maxim is not self-undermining under universalization, and is therefore not forbidden.

Under Maxim 1, lying is forbidden, and under Maxim 2, it is not. But what is the true maxim? There’s no clear answer to this question. Any given action can be truthfully described as arising from numerous different motivational schema, and in general these choices will result in a variety of different moral guidelines.

In TDT, the analog to the concept of a maxim is subjunctive dependence, and this can be defined fairly precisely, without ambiguity. Subjunctive dependence between agents in a given context is just the degree of evidence you get about the actions of an agent given information about the actions of the other agents in that context.

More precisely, it is the degree of non-causal dependence between the actions of agents. It essentially arises from the fact that in a lawful physical universe, similar initial conditions will result in similar final conditions. This can be worded as similarity in initial conditions, in input-output structure, in computational structure, or in logical structure, but the basic idea is the same.

Not only is this precisely defined, it is a real dependence. You don’t have to imagine a fictional universe in which your action makes it more likely that others will act similarly; the claim of TDT is that this is actually the case!

In this sense, TDT is rooted in a simple acknowledgement of dependencies that really do exist and that can be precisely defined, while Kant’s categorical imperative relies on the ambiguous notion of a maxim, as well as a seemingly arbitrary hypothetical consideration. One might be tempted to ask: “Who cares what would happen if hypothetically everybody else acted according to a similar maxim? What we should care about is what will actually happen in the real world; we shouldn’t be basing our decisions off of absurd hypothetical worlds!”

3. Difference in application

These two theoretical reasons are fairly convincing to me that Kantianism and TDT ethics are only superficially similar, and are theoretically quite different. But there still remains a question of how much the actual “outputs” of the two frameworks converge. Do they just end up giving similar ethical advice?

I don’t think so. First, consider the issue I touched on previously. I said that defectors that paid attention to the categorical imperative would rethink their decision, because it is not universalizable. But this is not in general true.

If defectors are always free to defect, regardless of how many others defect as well, then defecting will still be universalizable! It is only in special cases that Kantians will not defect, like when a mob boss will come in and necessitate cooperation if enough people defect, or where universal defection depletes an expendable resource that would otherwise be renewable.

Coordination problems in which defecting automatically becomes impossible at a certain point are the easiest cases. It’s much harder to get individuals to coordinate if there is no mob boss to step in and set everybody right. These are the cases where Kantianism fails, and TDT succeeds.

TDTists who share goals, and for whom cooperation is the more effective way of achieving those goals, would always cooperate, even if “each would individually be better off” defecting. (I put scare quotes because you only come to this conclusion by ignoring important dependencies in the problem.)

The key difference here comes down again to #1: timeless decision theorists maximize expected utility, not Kantian consistency.

In addition, TDTists don’t necessarily have Kantian hangups about using people as means to an end: if doing so ends up producing a higher expected utility than not, then they’ll go for it without hesitation.

A TDTist that can save two people’s lives by causing a little harm to one person would probably do it if their utility function was relatively impartial and placed a positive value on life. A Kantian would forbid this act.

(Why? Well, Kant thought that this principle of treating people as ends in themselves rather than means to an end was equivalent to the universalizability principle, and as far as I know, pretty much nobody was convinced by his argument for why this was the case. As such, a lot of Kantian ethics looks like it doesn’t actually follow from the universalizability principle.)

An application that might be similar for Kantian ethics and TDT ethics is the treatment of dishonesty and deception. Kant famously forbade any lying of any kind, regardless of the consequences, on the basis that universal lying would undermine the trust that is necessary to make lying a possibility.

One can imagine a similar case made for honesty in TDT ethics. In a society of TDTs, a decision to lie is a decision to produce a society that is overall less honest and less trusting. In situations where the individual benefits of dishonesty are zero-sum, only the negative effects of dishonesty are amplified. This could plausibly make dishonesty on the whole a net negative policy.

Against falsifiability

What if time suddenly stopped everywhere for 5 seconds?

Your first instinct might be to laugh at the question and write it off as meaningless, given that such questions are by their nature unfalsifiable. I think this is a mistaken impulse, and that we can in general have justified beliefs about such questions. Doing so requires moving beyond outdated philosophies of science, and exploring the nature of evidence and probability. Let me present two thought experiments.

The Cyclic Universe

Imagine that the universe evolves forward in time in such a way that at one time t1 its state is exactly identical to an earlier state at time t0. I mean exactly identical – the wave function of the universe at time t1 is quantitatively identical to the wave function at time t0.

By construction, we have two states of the universe that cannot be distinguished in any way whatsoever – no observation or measurement that you could make of the one will distinguish it from the other. And yet we still want to say that they are different from one another, in that one was earlier than the other.

But then we are allowing the universe to have a quantity (the ‘time-position’ of events) that is completely undetectable and makes no measurable difference in the universe. This should certainly make anybody that’s read a little Popper uneasy, and should call into question the notion that a question is meaningless if it refers to unfalsifiable events. But let’s leave this there for the moment and consider a stronger reason to take such questions seriously.

The Freezing Rooms

The point of this next thought experiment will be that we can be justified in our beliefs about unobservable and undetectable events. It’s a little subtler, but here we go.

Let’s imagine a bizarre building in which we have three rooms with an unusual property: at regular intervals, everything in a room seems to completely freeze. By everything I mean everything – a complete cessation of change in every part of the room, as if time has halted within.

Let’s further imagine that you are inside the building and can freely pass from one room to the other. From your observations, you conclude that Room 1 freezes every other day, Room 2 every fourth day, and Room 3 every third day. You also notice that when you are in any of the rooms, the other two rooms occasionally seem to suddenly “jump forward” in time by a day, exactly when you expect that your room would be frozen.

[Figure: the three freezing rooms]

So you construct this model of how these bizarre rooms work, and suddenly you come to a frightening conclusion – once every twelve days, all three rooms will be frozen at the same time! So no matter what room you are in, there will be a full day that passes without anybody noticing it in the building, and with no observable consequences in any of the rooms.

Sure, you can just step outside the building and observe it for yourself. But let’s expand our thought experiment: instead of a building with three rooms, let’s imagine that the entire universe is partitioned into three regions of space, in which the same strange temporal features exist. You can go from one region of the universe to another, allowing you to construct an equivalent model of how things work. And you will come to a justified belief that there are periods of time in which absolutely NOTHING is changing in the universe, and yet time is still passing.

Let’s just go a tiny bit further with this line of thought – imagine that suddenly somehow the other two rooms are destroyed (or the other two regions of space become causally disconnected in the extended case). Now the beings in one region will truly have no ability to do the experiments that allowed them to conclude that time is frozen on occasion in their own universe – and yet they are still justified in this belief. They are justified in the same way that somebody who observed a beam of light heading towards the event horizon of the universe is justified in continuing to believe in the existence of the beam of light, even though it is entirely impossible to ‘catch up’ to the light and do an experiment that verifies that no, it hasn’t gone out of existence.

This thought experiment demonstrates that questions that refer to empirically indistinguishable states of the universe can be meaningful. This is a case that is not easy for Popperian falsifiability or old logical positivists to handle, but can be analyzed through the lens of modern epistemology.

Compare the following two theories of the time patterns of the building, where the brackets indicate a repeating pattern (✓ marks a day on which the room runs normally, ✗ a day on which it is frozen):

Theory 1
Room 1: [ ✓, ✗ ]
Room 2: [ ✓, ✓, ✓, ✗ ]
Room 3: [ ✓, ✓, ✗ ]

Theory 2
Room 1: [ ✓, ✗, ✓, ✗, ✓, ✗, ✓, ✗, ✓, ✗, ✓ ]
Room 2: [ ✓, ✓, ✓, ✗, ✓, ✓, ✓, ✗, ✓, ✓, ✓ ]
Room 3: [ ✓, ✓, ✗, ✓, ✓, ✗, ✓, ✓, ✗, ✓, ✓ ]

Notice that these two theories make all the same predictions about what everybody in each room will observe. But Theory 2 denies the existence of the total freeze every 12 days, while Theory 1 accepts it.

Notice also that Theory 2 requires a much more complicated description to describe the pattern that it postulates. In Theory 1, you only need 9 bits to specify the pattern, and the days of total freeze are entailed as natural consequences of the pattern.

In Theory 2, you need 33 bits to be able to match the predictions of Theory 1 while also removing the total freeze!

Since observational evidence does not distinguish between these theories, this difference in complexity must be accounted for in the prior probabilities for Theory 1 and Theory 2, and would give us a rational reason to prefer Theory 1, even given the impossibility of falsification of Theory 2. This preference wouldn’t go away even in the limit of infinite evidence, and could in fact become stronger.

For instance, suppose that the ratio of the priors is set by the ratio of information required to specify each theory, so that Theory 1 is favored over Theory 2 by 33 to 9. In addition, suppose that all other theories of the universe that are empirically distinguishable from Theory 1 and Theory 2 start with a total prior of 50%. If in the limit of infinite evidence we find that all other theories have been empirically ruled out, then we’ll see:

Initially
P(Theory 1) = 39.29%
P(Theory 2) = 10.71%
P(All else) = 50%

Infinite evidence limit
P(Theory 1) = 78.57%
P(Theory 2) = 21.43%
P(All else) = 0%

The initial epistemic tax levied on Theory 2 due to its complexity has functionally doubled: the absolute gap in credence between the two theories has grown from about 29 percentage points to about 57, and Theory 2 remains several times less likely than Theory 1. Notice how careful probabilistic thinking does a great job of dealing with philosophical subtleties that are too much for obsolete frameworks of philosophy of science based on the concept of falsifiability. The powers of Bayesian reasoning are on full display here.
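As a sanity check on the arithmetic, here is a minimal sketch of the bookkeeping; the 33:9 prior ratio and the 50% lump of probability for ‘all else’ are the assumptions stated above.

```python
bits_t1, bits_t2 = 9, 33          # description lengths of the two repeating patterns
p_else = 0.50                     # total prior on all empirically distinguishable rivals

# Theory 1 and Theory 2 split the remaining 50% in inverse proportion to their lengths.
p_t1 = (1 - p_else) * bits_t2 / (bits_t1 + bits_t2)
p_t2 = (1 - p_else) * bits_t1 / (bits_t1 + bits_t2)
print(f"initially:    P(T1) = {p_t1:.2%}, P(T2) = {p_t2:.2%}, P(else) = {p_else:.2%}")

# In the infinite-evidence limit the rivals are ruled out and the two theories renormalize;
# their ratio, and hence the complexity penalty, is untouched by the evidence.
total = p_t1 + p_t2
print(f"in the limit: P(T1) = {p_t1 / total:.2%}, P(T2) = {p_t2 / total:.2%}")
```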

Metaphysics and fuzziness: Why tables don’t exist and nobody’s tall

  • The tallest man in the world is tall.
  • If somebody is one nanometer shorter than a tall person, then they are themselves tall.

If the word ‘tall’ is to mean anything, then it must imply at least these two premises. But from the two it follows by mathematical induction that a two-foot infant is tall, that a one-inch bug is tall, and worst, that a zero-inch tall person is tall. Why? If the tallest man in the world is tall (let’s name him Fred), then he would still be tall if he were shrunk by a single nanometer. We can call this new person ‘Fred – 1 nm’. And since ‘Fred – 1 nm’ is tall, so is ‘Fred – 2 nm’. And then so is ‘Fred – 3 nm’. Et cetera until absurdity ensues.

So what went wrong? Surely the first premise can’t be wrong – who could the word apply to if not the tallest man in the world?

The second seems to be the only candidate for denial. But this should make us deeply uneasy; the implication of such a denial is that there is a one-nanometer wide range of heights, during which somebody makes the transition from being completely not tall to being completely tall. Somebody exactly at this line could be wavering back and forth between being tall and not every time a cell dies or divides, and every time a tiny draft rearranges the tips of their hairs.

Let’s be clear just how tiny a nanometer really is: A sheet of paper is about a hundred thousand nanometers thick. That’s more than the number of inches that make up a mile. If the word ‘tall’ means anything at all, this height difference just can’t make a difference in our evaluation of tallness.


So we are led to the conclusion: Fred is not tall. And if the tallest man on the planet isn’t tall, then nobody is tall. Our concept of tallness is just a useful idea that falls apart on any close examination.

This is the infamous Sorites paradox. What else is vulnerable to versions of the Sorites paradox? Almost every concept that we use in our day to day life! Adulthood, intelligence, obesity, being cold, personhood, wealthiness, and on and on. It’s harder to look for concepts that aren’t affected than those that are!

The Sorites paradox is usually seen in discussions of properties, but it can equally well be applied to discussions of objects. This application leads us to a view of the world that differs wildly from our common sense view. Let’s take a standard philosophical case study: the table. What is it for something to be a table? What changes to a table make it no longer a table?

Whatever the answers to these questions are, they will hopefully embody our common sense notions about tables and allow us to make the statements that we ordinarily want to make about tables. One such common sense notion involves what it takes for a table to cease being a table; presumably little changes in the table are allowed, while big changes (cleaving it into small pieces) are not. But here we run into the problem of vagueness.

If X is a table, then X would still be a table if it lost a tiny bit of the matter constituting it. Like before, we’ll take this to the extreme to maximize its intuitive plausibility: If a single atom is shed from a table, it’s still a table. Denial of this is even worse than it was before; if changes by single atoms could change table-hood, we would be in a position where we should be constantly skeptical of whether objects are tables, given the microscopic changes that are happening to ordinary tables all the time.


And so we are led inevitably to the conclusion that single atoms are tables, and even that empty space is a table. (Iteratively remove single atoms from a table until it has become arbitrarily small.) Either that, or there are no tables. I take this second option to be preferable.


How far do these arguments reach? It seems like most or all macroscopic objects are vulnerable to them. After all, we don’t change our view of macroscopic objects that undergo arbitrarily small losses of constituent material. And this leads us to a worldview in which the things that actually exist match up with almost none of the things that our common-sense intuitions tell us exist: tables, buildings, trees, planets, computers, people, and so on.

But is everything eliminated? Plausibly not. What can be said about a single electron, for instance, that would lead to a continuity premise? Probably nothing; electrons are defined by a set of intrinsic properties, none of which can differ to any degree while the particle still remains an electron. In general, all of the microscopic entities that are thought to fundamentally compose everything else in our macroscopic world will be (seemingly) invulnerable to attack by a version of the Sorites paradox.

The conclusion is that some form of eliminativism is true (objects don’t exist, but their lowest-level constituents do). I think that this is actually the right way to look at the world, and is supported by a host of other considerations besides those in this post.

Closing comments

  • The subjectivity of ‘tall’ doesn’t remove the paradox. What’s in question isn’t the agreement between multiple people about what tall means, but the coherency of the concept as used by a single person. If a single person agrees that Fred is tall, and that arbitrarily small height differences can’t make somebody go from not tall to tall, then they are led straight into the paradox.
  • The most common response to this puzzle I’ve noticed is just to balk and laugh it off as absurd, while not actually addressing the argument. Yes, the conclusion is absurd, which is exactly why the paradox is powerful! If you can resolve the paradox and erase the absurdity, you’ll be doing more than 2000 years of philosophers and mathematicians have been able to do!

More on random sampling from Nature’s Urn

In a previous post, I developed an analogy between patterns of reasoning and sampling procedures. I want to go a little further with two expansions on this idea.

Scientific laws and domains of validity

First, different sampling procedures can focus on sampling from different regions of the urn. This is analogous to how scientific theories have specific domains of validity that they were built to explain, and in general their conclusions do not spread beyond this domain.

Classical Newtonian mechanics is a great theory for explaining slowly swinging pendulums and large gravitating bodies, but apply it to things that are too small, too fast, or too massive, and you’ll get bad results. In general, any scientific law is known to work only within a certain range of energies, sizes, or speeds.

By analogy, the Super Persuader was not a good source of evidence, because its sampling procedure was to scour the urn for any black balls it could find, and ignore all white balls. Ideally, we want our truth-seeking enterprises to function like random sampling of balls from an urn. But of course, the way that scientists seek out evidence is not analogous to randomly sampling from the entire urn consisting of all pieces of evidence as to the structure of reality. Instead, a psychologist will focus on one region of the urn, a biologist another, and a physicist another.

In this way, a psychologist can say that the evidence they receive is representative of the general state of evidence in a certain region of the urn. The region of the urn being sampled by the scientist represents the domain of validity of the laws they develop.
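To make this concrete, here is a small simulation with entirely invented numbers: an urn whose regions have different black/white compositions, and a sampler restricted to one region. The within-region estimate is trustworthy; extrapolating it to the whole urn is not.

    import random

    random.seed(0)

    # A hypothetical urn split into regions with different compositions.
    # 'B' is a black ball, 'W' is a white ball; all numbers are invented.
    urn = {
        "psychology": ["B"] * 700 + ["W"] * 300,   # 70% black in this region
        "physics":    ["B"] * 200 + ["W"] * 800,   # 20% black in this region
    }

    def fraction_black(balls):
        return sum(ball == "B" for ball in balls) / len(balls)

    # A scientist samples randomly, but only from their own region of the urn.
    region_sample = random.sample(urn["psychology"], 100)

    print(fraction_black(region_sample))                        # close to 0.70
    print(fraction_black(urn["psychology"]))                    # 0.70: the region's true ratio
    print(fraction_black(urn["psychology"] + urn["physics"]))   # 0.45: the whole urn's ratio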

Developing this line further, we might imagine that pieces of evidence and good arguments have a general positioning in terms of how accessible they are to humans. Some arguments, ideas, or pieces of evidence about reality lie near the top of the urn and are low-hanging fruit for any investigator. (Mixing metaphors!) Others lie deeper down, requiring more serious thought and dedicated investigation to come across.

Advances in technology allow scientists to dig deeper into Nature’s urn, expanding the domains of validity of their theories and becoming better acquainted with the structure of reality.

Cognitive biases and generalized distortions of reasoning

Second, a taxonomy of different ways in which reasoning can go wrong naturally arises from the metaphor. Some of these correspond nicely to well-known cognitive biases.

For instance, the sampling procedure used by the Super Persuader involved selectively choosing evidence to support a certain hypothesis. In general, this corresponds to selection biases. A special case is motivated reasoning: when we strongly desire a hypothesis to be true, we are more likely to find, remember, and favorably judge evidence for it than evidence against it. Selection biases are, in general, just non-random sampling procedures.
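As a hedged sketch of the difference, with invented numbers, compare a random sampler to a Super-Persuader-style sampler drawing from the same urn; only the first one’s estimate of the black/white ratio can be trusted.

    import random

    random.seed(1)

    # Hypothetical urn: 30% black balls, 70% white balls.
    urn = ["B"] * 300 + ["W"] * 700

    def fraction_black(balls):
        return sum(ball == "B" for ball in balls) / len(balls)

    # Random sampling: a fair look at the evidence.
    random_sample = random.sample(urn, 50)

    # Selection bias: scour the urn for black balls and report only those.
    # ('B' sorts before 'W', so taking the first 50 after sorting grabs black balls.)
    biased_sample = sorted(urn)[:50]

    print(fraction_black(random_sample))   # close to the true 0.30
    print(fraction_black(biased_sample))   # 1.0: pure cherry-picking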

Another class of error is misjudgment, where we draw a black ball, but see it as a white ball. This would correspond to things like the backfire effect (LINK), where evidence against a proposition we favor serves to strengthen our belief in it, or just failure to understand an argument or a piece of evidence.

A third class of error is bad extrapolation, where we are sampling randomly from one region of the urn, but then act as if we are sampling from some other region. This would include hasty generalizations and all forms of irrational stereotyping.

Generalizing argument strength

Finally, a weakness of the urn analogy is that it treats all arguments as equally strong. We can fix this by imagining that some balls come clustered together as a single, stronger argument. Additionally, we could represent argument strength as ball density and suppose that what we actually want to estimate is the ratio of the mass of black balls to the mass of white balls. In this way, denser balls affect our judgment of the ratio more strongly than less dense ones.
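One way to sketch this extension, again with made-up numbers, is to give each ball a mass standing in for argument strength and estimate the ratio of black mass to white mass rather than a simple count.

    import random

    random.seed(2)

    # Hypothetical urn of (color, mass) pairs, where mass stands in for
    # argument strength: a few dense black balls, many light white balls.
    urn = [("B", 5.0)] * 100 + [("B", 1.0)] * 200 + [("W", 1.0)] * 700

    def count_ratio(balls):
        black = sum(1 for color, _ in balls if color == "B")
        white = sum(1 for color, _ in balls if color == "W")
        return black / white

    def mass_ratio(balls):
        black = sum(mass for color, mass in balls if color == "B")
        white = sum(mass for color, mass in balls if color == "W")
        return black / white

    sample = random.sample(urn, 200)
    print(count_ratio(sample))   # roughly 0.43: black balls are outnumbered
    print(mass_ratio(sample))    # roughly 1.0: but they carry as much total mass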

Free will and decision theory (Part 2)

In a previous post, I talked about something that has been confusing me regarding free will and decision theory. I want to revisit this topic and express a different way to frame the issue.

Here goes:

A decision theory is an algorithm for calculating the expected utilities of the different possible actions you could take, EU(A). It returns a recommendation to take the action that maximizes expected utility: A* = argmax_A EU(A).
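In code the recipe is short. Here is a minimal sketch with invented actions, outcome probabilities, and utilities, just to pin down what EU(A) and the argmax are doing.

    # Minimal expected-utility sketch. The actions, outcome probabilities,
    # and utilities below are all invented for illustration.
    actions = {
        "take umbrella":  {"rain": 0.3, "no rain": 0.7},
        "leave umbrella": {"rain": 0.3, "no rain": 0.7},
    }

    utility = {
        ("take umbrella", "rain"):      5,   # dry, at the cost of carrying an umbrella
        ("take umbrella", "no rain"):  -1,   # carried it for nothing
        ("leave umbrella", "rain"):   -10,   # soaked
        ("leave umbrella", "no rain"):  2,   # unencumbered and dry
    }

    def expected_utility(action):
        return sum(p * utility[(action, outcome)]
                   for outcome, p in actions[action].items())

    # A* = argmax_A EU(A): recommend the action with the highest expected utility.
    best_action = max(actions, key=expected_utility)
    print({a: expected_utility(a) for a in actions}, best_action)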

The word ‘possible’ is the source of the confusion. How can we make sense of the notion of possible actions given determinism? If we take determinism seriously, then the set of possible actions has exactly one member: the action that you actually end up taking. There’s an intuitive sense of possibility at play here that looks benign enough, but it becomes problematic upon closer examination.

For instance, we obviously want our set of actions to be restricted to some degree – we don’t want our decision theory telling us to snap our fingers and magically turn the world into a utopia. One seemingly clear line we could draw is to say that possibility here just means physical possibility. Actions that require us to exceed the speed of light, violate conservation of energy, or other such physical impossibilities are not allowed to be included in the set of possible actions.

But this is no solution at all! After all, if the physical laws are the things uniquely generating our actual actions, then all other actions must be violations! Determinism dictates that there can’t be multiple different answers to the question of “what happens next?”. We have an intuitive notion of physical possibility that includes things like “wave my hand through the air” and “take a nap”. But upon close examination, these seem to really just be the product of our ignorance of the true workings of the laws of nature. If we could deeply internalize the way that physics generates behaviors like hand-waving and napping, then we would be able to see why in a particular case hand-waving is possible (and thus happens), and why in other cases it cannot happen.

In other words, the claim I am making is that there is no clear distinction on the level of physics between the claim that I can jump to the moon and the claim that I could have waved my hand around in front of me even though I didn’t. The only difference between these two, it seems to me, is in terms of the intuitive obviousness of the impossibility of the first, and the lack of intuitive obviousness for the second.

Let’s say that physicists eventually reduce all of fundamental physics to a single principle, for example the principle of least action. Then for any given action, either it minimizes the Action or it doesn’t. (Sorry for the two meanings of the word ‘action’; it couldn’t be helped.) If it does, the action is physically possible and will in fact happen. If it doesn’t, the action is physically impossible and will not happen. We can explicitly lay out why my jumping to the moon does not minimize the Action, but it is far harder to lay out explicitly why my waving my hand in front of my face right now does not minimize the Action. The key point is that the only difference here is an epistemic one: some actions are easier for us to diagnose as non-Action-minimizing than others, but in reality, they either are or are not.

If this is all true, then physical possibility is hopeless as a source to ground a choice of the set of possible actions, and any formal decision theory will ultimately rest on an unreal distinction between possible and impossible actions. This distinction will not be represented in any real features of the physical world, and will be vulnerable to future discoveries or increases in computational power that expand our knowledge of the causal determinants of our actions.

Are there other notions of possibility that might be more fruitful for grounding the choice of the set of possible actions? I think not. Here’s a general argument for why not.

Ought implies can

(1) If you should do something, then you can do it.
(2) There is only a single thing that you can do.
(3) Therefore, there is at most a single thing that you should do.

This is an argument that I initially saw in the context of morality. I regarded it as a mere intellectual curiosity, fun to ponder but fairly unimportant (given that I didn’t expect much out of ethics in the first place).

But I think that the exact same argument applies for any theory of normative instrumental rationality. This is much more troubling to me! Unlike morality, I actually feel fairly strongly that there are objective facts about instrumental rationality – that is, facts about how an agent should act in order to optimize their values. (This is no longer an ethical should, but an epistemic one)

But I also feel strongly tempted to endorse both premises (1) and (2) with regard to this epistemic should, and want to reject the conclusion. Let’s lay out our options.

Reject (1): But then there are some actions that you should do even though you can’t do them. Do we really want a theory of instrumental rationality that tells us the most rational course of action is one we definitely cannot take? This seems obviously undesirable, for the same reason that a decision theory which says the optimal action is to snap your fingers and turn the world into a utopia is undesirable. If this premise is not true of our decision theory, then we may sometimes have to accept that the action we should take is physically impossible, and what’s the use of a decision theory like that?

Reject (2): But this entails abandoning our best understanding of physical reality. Even in standard formulations of quantum mechanics, the wave function that describes the state of the universe evolves completely deterministically under the Schrödinger equation. (You might now wonder why quantum mechanics is thought of as a fundamentally probabilistic theory, but that is too big a topic to go into here.) So it seems likely that this premise is just empirically correct.

Accept (3): But then our theory of rationality is useless, as it tells us nothing besides “Just do what you are going to do”!

This is the puzzle. Do you see any way out?