In a couple of previous posts, I’ve said some things about Bayesianism that I now think might not be right. Specifically, I claimed a few times that Bayesians will have trouble with overfitting data. Having looked into it more and seen some complicated arguments on either side, I’m less sure about this. I’m currently just confused about it, so I’m writing up my confusion here.
The reasoning behind my initial claim was something like this:
- Overfitting arises from an excessive focus on accommodation. (If your only epistemic priority is accommodating the data you receive, then you will over-accommodate the data, by fitting the noise in the data instead of just the underlying trend.)
- We can deal with overfitting by optimizing for other epistemic virtues like simplicity, predictive accuracy, or some measure of distance to truth. (For example, minimum description length and maximum entropy optimize for simplicity, and cross-validation optimizes for predictive accuracy.)
- Bayesianism is an epistemological procedure with two steps: setting priors and then updating them.
- Priors are updated via Bayes’ rule, which rewards theories according to how well they accommodate the data (creating the potential for overfitting).
- Bayesian priors can be set in ways that optimize for other epistemic virtues, like simplicity or humility.
- In the limit of infinite evidence, differences in priors between empirically distinguishable theories are washed away.
- Thus, in the limit, Bayesianism becomes a primarily accommodating procedure, as the strength of the evidential update swamps your initial differences in priors.
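To make the swamping step concrete, here is a small sketch in Python. It is my own illustration, not a standard routine from any library. Writing Bayes’ rule in odds form, the posterior log-odds of theory A over theory B is the prior log-odds plus the sum of the log likelihood ratios of the observations. This assumes each observation is independent given either theory. The function name `posterior_log_odds` and all the numbers are hypothetical.

```python
import math


def posterior_log_odds(prior_odds, likelihood_ratios):
    """Log posterior odds of theory A over theory B.

    Assumes observations are independent given each theory, so Bayes' rule
    in odds form becomes: log prior odds + sum of log likelihood ratios.
    """
    return math.log(prior_odds) + sum(math.log(r) for r in likelihood_ratios)


# Hypothetical numbers: A starts at 1:1000 odds (a heavy prior penalty),
# but each observation is twice as likely under A as under B.
prior_odds = 1 / 1000
for n in (0, 5, 9, 10, 20):
    odds = math.exp(posterior_log_odds(prior_odds, [2.0] * n))
    print(f"after {n:2d} observations: odds for A = {odds:.3f}")
```

However large the prior penalty, a constant per-observation likelihood advantage eventually overturns it. Here the odds for A pass 1 at the tenth observation, because 2^10 = 1024 exceeds 1000.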
In other words, the model best supported by the data will be the one that fits it perfectly (i.e., an overfitting model). We get out of this by giving overfitting models low priors… but we should expect that even this won’t be sufficient if the evidence is strong enough.
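The first claim can be checked in a toy case. The Python sketch below rests on assumptions of my own. The data are five points generated from the line y = 2x + 1 plus fixed, made-up noise values, and the noise is Gaussian with a known standard deviation sigma = 0.5. A degree-4 polynomial, built here by Lagrange interpolation, passes through all five points exactly, so it earns a higher likelihood than the true line. The interpolant is constructed from the same points it is scored on.

```python
import math


def gaussian_log_likelihood(residuals, sigma):
    # Log-probability of the residuals under i.i.d. Normal(0, sigma^2) noise.
    n = len(residuals)
    return (-n * math.log(sigma * math.sqrt(2 * math.pi))
            - sum(r * r for r in residuals) / (2 * sigma ** 2))


def lagrange_interpolant(xs, ys):
    # Return the polynomial of degree len(xs) - 1 through every (x, y) point.
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p


def true_line(x):
    return 2 * x + 1


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
noise = [0.3, -0.2, 0.1, -0.4, 0.2]  # hypothetical fixed noise, for reproducibility
ys = [true_line(x) + e for x, e in zip(xs, noise)]
sigma = 0.5  # assumed known noise level

overfit = lagrange_interpolant(xs, ys)

ll_true = gaussian_log_likelihood([y - true_line(x) for x, y in zip(xs, ys)], sigma)
ll_overfit = gaussian_log_likelihood([y - overfit(x) for x, y in zip(xs, ys)], sigma)

print(f"log-likelihood, true line:  {ll_true:.3f}")
print(f"log-likelihood, degree-4 fit: {ll_overfit:.3f}")
```

With these numbers, the residuals under the true line have a sum of squares of 0.34. The log-likelihood gap is therefore 0.34 / (2 · 0.5²) = 0.68 nats, a likelihood ratio of about 2 in favor of the overfitting polynomial.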
Is this wrong? And why?