There’s a beautiful parallel between Bayesian updating of beliefs and evolutionary dynamics of a population that I want to present.
Let’s start by deriving some basic evolutionary game theory! We’ll describe a population as made up of N different genotypes:
(1, 2, 3, …, N)
Each of these genotypes is represented in some proportion of the population, which we’ll label with an X.
Distribution of genotypes in the population X = (X1, X2, X3, …, XN)
Each of these fractions will in general change with time. For example, if some ecosystem change occurs that favors genotype 1 over the other genotypes, then we expect X1 to grow. So we’ll write:
Distribution of genotypes over time = (X1(t), X2(t), X3(t), …, XN(t))
Each genotype has a particular fitness that represents how well suited it is to survive into the next generation of the population.
Fitness of genotypes = (f1, f2, f3, …, fN)
Now, if Genotype 1 corresponds to a predator and Genotype 2 to its prey, then the fitness of Genotype 2 very much depends on the population of Genotype 1 organisms as well as its own population. In general, the fitness function for a particular genotype is going to depend on the distribution of all the genotypes, not just that one. This means that we should write each fitness as a function of all the Xi:
Fitness of genotypes = (f1(X), f2(X), f3(X), …, fN(X))
Now, what is relevant to the change of any Xi is not the absolute value of the fitness function fi, but the comparison of fi to the average fitness of the entire population. This reflects the fact that natural selection is competitive. It’s not enough to just be fit; you need to be more fit than your neighbors to successfully pass on your genes.
We can find the average fitness of the population by the standard method of summing over each fitness weighted by the proportion of the population that has that fitness:
favg = X1 f1 + X2 f2 + … + XN fN
And since what matters is a genotype’s fitness relative to the population average, the growth of Xi should scale with the ratio fi / favg. In addition, the change of Xi at time t should be proportional to the size of Xi at time t (larger populations grow faster than small populations). Here is the simplest equation we could write with these properties:
Xi(t + 1) = Xi(t) · fi / favg
This is the discrete replicator equation. Each genotype either grows or shrinks over time according to the ratio of its fitness to the average population fitness. If the fitness of a given genotype is exactly the same as the average fitness, then the proportion of the population that has that genotype stays the same.
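To make the update rule concrete, here is a minimal Python sketch of one step of the discrete replicator equation. The function names and the constant fitness values are made up for illustration; the fitness is passed as a function of the whole distribution X so that frequency-dependent fitness would fit the same interface.

```python
def replicator_step(x, fitness):
    """Apply one step of the discrete replicator equation.

    x: list of population fractions X1..XN (assumed to sum to 1).
    fitness: function taking the whole distribution x and returning
             the list of fitnesses f1(X)..fN(X).
    """
    f = fitness(x)
    # Average fitness: each fitness weighted by its population fraction.
    f_avg = sum(xi * fi for xi, fi in zip(x, f))
    # Xi(t + 1) = Xi(t) * fi / favg
    return [xi * fi / f_avg for xi, fi in zip(x, f)]


# Hypothetical example: three genotypes with constant fitnesses.
def constant_fitness(x):
    return [1.0, 2.0, 3.0]


x = [0.5, 0.3, 0.2]
x_next = replicator_step(x, constant_fitness)
print(x_next)
```

Here favg = 1.7, so genotype 1 (fitness below average) shrinks while genotypes 2 and 3 (fitness above average) grow. Dividing by favg also guarantees that the new fractions still sum to 1.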
Now, how does this relate to Bayesian inference? Instead of a population composed of different genotypes, we have a population composed of beliefs in different theories. The fitness function for each theory corresponds to how well it predicts new evidence. And the evolution over time corresponds to the updating of these beliefs upon receiving new evidence.
Xi(t + 1) → P(Ti | E)
Xi(t) → P(Ti)
fi → P(E | Ti)
What does favg become?
favg = X1 f1 + … + XN fN
P(E) = P(T1) P(E | T1) + … + P(TN) P(E | TN)
But now our equation describing evolutionary dynamics just becomes identical to Bayes’ rule!
Xi(t + 1) = Xi(t) · fi / favg
P(Ti | E) = P(Ti) P(E | Ti) / P(E)
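As a quick illustration of the correspondence, here is a small Python sketch in which a Bayesian update is written exactly as a replicator step: the prior plays the role of Xi(t), the likelihood plays the role of fi, and P(E) plays the role of favg. The coin-bias theories and the flip sequence are invented for the example.

```python
def bayes_update(prior, likelihood):
    """Bayes' rule written in replicator form.

    prior: list of P(Ti), playing the role of Xi(t).
    likelihood: list of P(E | Ti), playing the role of fi.
    """
    # P(E) is the prior-weighted average likelihood, i.e. favg.
    p_e = sum(p * l for p, l in zip(prior, likelihood))
    # P(Ti | E) = P(Ti) * P(E | Ti) / P(E)
    return [p * l / p_e for p, l in zip(prior, likelihood)]


# Hypothetical theories: the coin lands heads with probability 0.25, 0.5, or 0.75.
biases = [0.25, 0.5, 0.75]
belief = [1 / 3, 1 / 3, 1 / 3]  # uniform prior over the three theories

# Each observed flip is one "generation" of selection on the beliefs.
for flip in "HHTH":
    likelihood = [b if flip == "H" else 1 - b for b in biases]
    belief = bayes_update(belief, likelihood)

print(belief)
```

After mostly heads, the belief in the heads-biased theory has grown at the expense of the other two, just as a fitter genotype takes over a larger share of the population.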
This is pretty fantastic. It means that we can quite literally think of Bayesian reasoning as a form of natural selection, where only the best ideas survive and all others are outcompeted. A Bayesian treats their beliefs as if they are organisms in an ecosystem that punishes those that fail to accurately predict what will happen next. It is evolution towards maximum predictive power.
There are some intriguing hints here of further directions for study. For example, the Bayesian fitness function only depended on the particular theory whose fitness was being evaluated, but it could have more generally depended on all of the different theories as in the original replicator equation.
Plus, the discrete replicator equation is only one simple idealized model of patterns of evolutionary change in populations. There is a continuous replicator equation, where populations evolve smoothly as analytic functions of time. There are also generalizations that introduce mutation, allowing a population to spontaneously generate new genotypes and transition back and forth between similar genotypes. Evolutionary graph theory incorporates population structure into the model, allowing for subtleties regarding complex spatial population interactions.
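For reference, here are the standard textbook forms of two of these generalizations, written in the notation used above. The mutation matrix Q is a symbol introduced here, not part of the discussion so far: Q_{ji} is assumed to be the probability that a parent of genotype j produces offspring of genotype i.

```latex
% Continuous replicator equation
\frac{dX_i}{dt} = X_i \left( f_i(X) - f_{\text{avg}}(X) \right),
\qquad
f_{\text{avg}}(X) = \sum_{j=1}^{N} X_j \, f_j(X)

% Replicator-mutator equation (Q_{ji}: probability that j produces i)
\frac{dX_i}{dt} = \sum_{j=1}^{N} X_j \, f_j(X) \, Q_{ji} - f_{\text{avg}}(X) \, X_i
```

When Q is the identity matrix (no mutation), the second equation reduces to the first.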
What would an inference system based on these more general evolutionary dynamics look like? How would it compare to Bayesianism?