Bayesians believe in treating belief probabilistically, and updating credences via Bayes’ rule. They face the problem of how to set priors – while probability theory gives a clear prescription for how to update beliefs, it doesn’t tell you what credences you should start with before getting any evidence.
Bayesians are thus split into two camps: objective Bayesians and subjective Bayesians. Subjective Bayesians think that there are no objectively correct priors. A corollary to this is that there are no correct answers to what somebody should believe, given their evidence.
Objective Bayesians disagree. Different variants specify different procedures for determining priors. For instance, the principle of indifference (POI) prescribes that the proper priors are those that are indifferent between all possibilities. If you have N possibilities, then according to the POI, you should distribute your prior credence evenly (1/N each). If you are considering a continuum of hypotheses (say, about the mass of an object), then the principle of indifference says that your probability density function should be uniform over all possible masses.
Now, here’s a problem for objective Bayesians.
You are going to be handed a cube, and all that you know about it is that its volume is less than 1 cm³. What should your prior distribution over the possible cubes you might be handed look like?
Naively applying the POI, you might distribute your credence evenly across all volumes from 0 cm³ to 1 cm³ (so that there is a 50% chance that the cube’s volume is less than 0.50 cm³ and a 50% chance that it is greater than 0.50 cm³).
But instead of choosing to be indifferent over possible volumes, we could equally well have chosen to be indifferent over possible face areas, or side lengths. The key point is that these are all different distributions. If we spread our credence evenly across possible side lengths from 0 cm to 1 cm, then there is a 50% chance that the side length is less than 0.5 cm, and hence a 50% chance that the volume is less than 0.5³ = 0.125 cm³ (and a 50% chance that it is greater).
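The disagreement between the two priors is easy to check numerically. Here is a minimal Monte Carlo sketch (variable names are my own) that samples from a volume-uniform prior and a length-uniform prior and compares the probability they assign to the same event:

```python
import random

random.seed(0)
N = 100_000

# Prior 1: indifferent over volume -- V uniform on (0, 1) cm^3.
volumes_uniform_v = [random.random() for _ in range(N)]

# Prior 2: indifferent over side length -- s uniform on (0, 1) cm,
# so the induced volume is s**3.
volumes_uniform_s = [random.random() ** 3 for _ in range(N)]

# Probability that the cube's volume is below 0.125 cm^3 under each prior:
p_v = sum(v < 0.125 for v in volumes_uniform_v) / N  # ~0.125
p_s = sum(v < 0.125 for v in volumes_uniform_s) / N  # ~0.5, since 0.5**3 = 0.125

print(p_v, p_s)
```

Same event, same ignorance, two very different probabilities, depending only on which quantity we chose to be indifferent over.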
In other words, our choice of concepts (edge length vs side area vs volume) ends up determining the shape of our prior. Insofar as there is no objectively correct choice of concepts to be using, there is no objectively correct prior distribution.
I’ve known about this thought experiment for a while, but only recently internalized how serious a problem it presents. It says, in effect, that your choice of priors is hostage to your choice of concepts, which is a pretty unsavory idea. In some cases, which concept to choose is very non-obvious (e.g. length vs area vs volume). In others, there are strong intuitions about some concepts being better than others.
The most famous example of this is contained in Nelson Goodman’s “new riddle of induction.” He proposes a new concept, grue, which applies to exactly those objects that are either observed before 2100 and green, or observed after 2100 and blue. So if you spot an emerald before 2100, it is grue. So is a blue ball that you spot after 2100. But if you see an emerald after 2100, it will not be grue (since emeralds are green, not blue).
To characterize objects like this post-2100 emerald, Goodman also introduces a second concept, bleen, which is the inverse of grue: the set of bleen objects is composed of blue objects observed before 2100 and green objects observed after 2100.
Now, if we run ordinary induction using the concepts grue and bleen, we end up making bizarre predictions. For instance, say we observe many emeralds before 2100, and always find them to be green. By induction, we should infer that the next emerald we observe after 2100 will very likely be green as well. But if we thought in terms of the concepts grue and bleen, then we would say that all our observations of emeralds so far have provided inductive support for the claim “All emeralds are grue.” The implication is that the emeralds we observe after 2100 will very likely also be grue (and thus blue).
In other words, by simply choosing a different set of fundamental concepts to work with, we end up getting an entirely different prediction about the future.
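The point can be made mechanical. The following sketch (the cutoff year and all names are my own encoding) shows that both generalizations fit the pre-2100 evidence equally well, yet predict different colors afterward:

```python
CUTOFF = 2100  # Goodman's transition time

def is_green(color: str) -> bool:
    return color == "green"

def is_grue(color: str, year: int) -> bool:
    # Grue: green if observed before the cutoff, blue if observed after.
    return color == ("green" if year < CUTOFF else "blue")

# Pre-2100 data: every observed emerald is green.
observations = [("green", year) for year in range(2020, 2030)]

# Both generalizations fit the data perfectly...
assert all(is_green(color) for color, _ in observations)
assert all(is_grue(color, year) for color, year in observations)

# ...but they predict different colors for an emerald observed in 2150:
color_if_all_green = "green"
color_if_all_grue = "blue"  # only "blue" keeps is_grue true once year >= CUTOFF
assert is_grue(color_if_all_grue, 2150)
```

Nothing in the evidence itself favors one predicate over the other; the divergence comes entirely from which predicate we project.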
Here’s one response that you’ve probably already thought of: “But grue and bleen are such weird, artificial choices of concepts! Surely we can prefer green/blue over grue/bleen on the basis of the additional complexity required to specify the transition time 2100?”
The problem with this is that we could equally well define green and blue in terms of grue and bleen:
Green = grue before 2100 or bleen after 2100
Blue = bleen before 2100 or grue after 2100
If for whatever reason somebody had grue and bleen as their primitive concepts, they would see green and blue as the concepts that required the additional complexity of the time specification.
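This symmetry can be verified directly. A small sketch (again using my own encoding of the predicates) takes grue and bleen as primitive, defines green and blue from them exactly as above, and checks that the derived predicates agree with the ordinary ones in every case:

```python
CUTOFF = 2100

# Take grue and bleen as the primitive concepts:
def is_grue(color: str, year: int) -> bool:
    return color == ("green" if year < CUTOFF else "blue")

def is_bleen(color: str, year: int) -> bool:
    return color == ("blue" if year < CUTOFF else "green")

# Define green and blue *from* grue and bleen, mirroring the text:
# green = grue before 2100 or bleen after 2100, and vice versa for blue.
def is_green(color: str, year: int) -> bool:
    return is_grue(color, year) if year < CUTOFF else is_bleen(color, year)

def is_blue(color: str, year: int) -> bool:
    return is_bleen(color, year) if year < CUTOFF else is_grue(color, year)

# The derived predicates recover the ordinary ones in every case:
for color in ("green", "blue"):
    for year in (2050, 2150):
        assert is_green(color, year) == (color == "green")
        assert is_blue(color, year) == (color == "blue")
```

Each pair of concepts is exactly as definable from the other, so the time-dependence complaint cuts both ways.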
“Okay, sure, but this is only if we pretend that color is something that doesn’t emerge from lower physical levels. If we tried specifying the set of grue objects in terms of properties of atoms, we’d have a lot harder time than if we tried specifying the set of green or blue objects in terms of properties of atoms.”
This is right, and I think it’s a good response to this particular problem. But it doesn’t work as a response to a more generic form of the dilemma. In particular, you can construct a grue/bleen-style pair of concepts for whatever you take to be the fundamental level of reality. If you think electrons and neutrinos are not decomposable into smaller components, then you can imagine “electrinos” and “neuctrons,” defined by analogy: an electrino is an electron observed before 2100 or a neutrino observed after 2100, and a neuctron is the reverse. And now we have the same issue as before: thinking in terms of electrinos and neuctrons would lead us to conclude that all electrons will suddenly transform into neutrinos in 2100.
The type of response I want to give is that concepts like “electron” and “neutrino” are preferable to concepts like “electrinos” and “neuctrons” because they mirror the structure of reality. Nature herself computes electrons, not electrinos.
But the problem is that we’re saying that in order to determine which concepts we should use, we need to first understand the broad structure of reality. After which we can run some formal inductive schema to, y’know, figure out the broad structure of reality.
Said differently, we can’t really appeal to “the structure of reality” to determine our choices of concepts, since our choices of concepts end up determining the results of our inductive algorithms, which are what we’re relying on to tell us the structure of reality in the first place!
This seems like a big problem to me, and I don’t know how to solve it.