There’s a beautiful parallel between Bayesian updating of beliefs and evolutionary dynamics of a population that I want to present.

Let’s start by deriving some basic evolutionary game theory! We’ll describe a population as made up of N different genotypes:

(1, 2, 3, …, N)

Each of these genotypes is represented in some proportion of the population, which we’ll label with an X.

Distribution of genotypes in the population X = (X_{1}, X_{2}, X_{3}, …, X_{N})

Each of these fractions will in general change with time. For example, if some ecosystem change occurs that favors genotype 1 over the other genotypes, then we expect X_{1} to grow. So we’ll write:

Distribution of genotypes over time = (X_{1}(t), X_{2}(t), X_{3}(t), …, X_{N}(t))

Each genotype has a particular *fitness* that represents how well-adjusted it is to survive onto the next generation in a population.

Fitness of genotypes = (f_{1}, f_{2}, f_{3}, …, f_{N})

Now, if Genotype 1 corresponds to a predator, and Genotype 2 to its prey, then the fitness of Genotype 2 very much depends on the population of Genotype 1 organisms as well as its own population. In general, the fitness function for a particular genotype is going to depend on the distribution of *all* the genotypes, not just that one. This means that we should write each fitness as a function of *all* the X_{i}s.

Fitness of genotypes = (f_{1}(X), f_{2}(X), f_{3}(X), …, f_{N}(X))

Now, what is relevant to the change of any X_{i} is not the absolute value of the fitness function f_{i}, but the comparison of f_{i} to the average fitness of the entire population. This reflects the fact that natural selection is *competitive*. It’s not enough to just be fit, you need to be *more fit than your neighbors* to successfully pass on your genes.

We can find the average fitness of the population by the standard method of summing over each fitness weighted by the proportion of the population that has that fitness:

f_{avg} = X_{1} f_{1} + X_{2} f_{2} + … + X_{N} f_{N}

And since what matters is the fitness of a genotype relative to the population average, the change of X_{i} is proportional to the ratio f_{i} / f_{avg}. In addition, the change of X_{i} at time t should be proportional to the size of X_{i} at time t (larger populations grow faster than small ones). Here is the simplest equation we could write with these properties:

X_{i}(t + 1) = X_{i}(t) · f_{i} / f_{avg}

This is the *discrete replicator equation*. Each genotype either grows or shrinks over time according to the ratio of its fitness to the average population fitness. If the fitness of a given genotype is exactly the same as the average fitness, then the proportion of the population that has that genotype stays the same.
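To make this concrete, here is a minimal sketch of the discrete replicator equation in Python. The genotype fractions and fitness values are made up for illustration, and the fitnesses are held constant, though in general each f_{i} may depend on the whole distribution X:

```python
def replicator_step(x, fitness):
    """Advance the genotype distribution x by one generation.

    x       -- list of population fractions X_i (summing to 1)
    fitness -- list of fitness values f_i (constants here, though in
               general each f_i may be a function of all of x)
    """
    # Average fitness: each f_i weighted by its population share
    f_avg = sum(xi * fi for xi, fi in zip(x, fitness))
    # Each genotype grows or shrinks by the ratio f_i / f_avg
    return [xi * fi / f_avg for xi, fi in zip(x, fitness)]


x = [0.5, 0.3, 0.2]   # initial distribution of three genotypes
f = [1.2, 1.0, 0.8]   # genotype 1 is the fittest
for _ in range(10):
    x = replicator_step(x, f)
print(x)  # genotype 1's share grows toward 1 over the generations
```

Note that dividing by f_{avg} keeps the fractions normalized: the updated distribution still sums to 1 at every step.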

Now, how does this relate to Bayesian inference? Instead of a population composed of different genotypes, we have a population composed of beliefs in different theories. The fitness function for each theory corresponds to how well it predicts new evidence. And the evolution over time corresponds to the updating of these beliefs upon receiving new evidence.

X_{i}(t + 1) → P(T_{i} | E)

X_{i}(t) → P(T_{i})

f_{i} → P(E | T_{i})

What does f_{avg} become?

f_{avg} = X_{1} f_{1} + … + X_{N} f_{N}

*becomes*

P(E) = P(T_{1}) P(E | T_{1}) + … + P(T_{N}) P(E | T_{N})

But now our equation describing evolutionary dynamics just becomes identical to Bayes’ rule!

X_{i}(t + 1) = X_{i}(t) · f_{i} / f_{avg}

*becomes*

P(T_{i} | E) = P(T_{i}) P(E | T_{i}) / P(E)
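We can check the correspondence numerically. The sketch below (with made-up priors and likelihoods) runs one replicator step using P(T_{i}) as the population fractions and P(E | T_{i}) as the fitnesses, which is exactly a Bayesian update:

```python
priors = [0.5, 0.3, 0.2]        # X_i(t)  <->  P(T_i)
likelihoods = [0.9, 0.5, 0.1]   # f_i     <->  P(E | T_i)

# f_avg <-> P(E), the total probability of the evidence
p_e = sum(p * l for p, l in zip(priors, likelihoods))

# One replicator step = one application of Bayes' rule
posteriors = [p * l / p_e for p, l in zip(priors, likelihoods)]
print(posteriors)  # approximately [0.726, 0.242, 0.032]
```

The theory that predicted the evidence best (the highest likelihood) gains population share at the expense of the others, just as the fittest genotype does.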

This is pretty fantastic. It means that we can quite literally think of Bayesian reasoning as a form of natural selection, where only the best ideas survive and all others are outcompeted. A Bayesian treats their beliefs as if they are organisms in an ecosystem that punishes those that fail to accurately predict what will happen next. It is evolution towards maximum predictive power.

There are some intriguing hints here of further directions for study. For example, the Bayesian fitness function P(E | T_{i}) depended only on the particular theory whose fitness was being evaluated, but it could more generally have depended on *all* of the different theories, as in the original replicator equation.

Plus, the discrete replicator equation is only one simple idealized model of patterns of evolutionary change in populations. There is a *continuous* replicator equation, where populations evolve smoothly as analytic functions of time. There are also generalizations that introduce mutation, allowing a population to spontaneously generate new genotypes and transition back and forth between similar genotypes. Evolutionary graph theory incorporates population structure into the model, allowing for subtleties regarding complex spatial population interactions.
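As one illustration of these generalizations, here is a hedged sketch of replicator-mutator dynamics: after selection, a mutation matrix Q redistributes some probability mass between genotypes. The matrix entries are invented for the example; Q[j][i] is the probability that an offspring of genotype j is of genotype i:

```python
def replicator_mutator_step(x, fitness, Q):
    """One generation of selection followed by mutation.

    Q[j][i] is the probability that genotype j produces genotype i;
    each row of Q must sum to 1 so the distribution stays normalized.
    """
    n = len(x)
    f_avg = sum(xi * fi for xi, fi in zip(x, fitness))
    # Genotype i receives mutated offspring from every genotype j
    return [
        sum(x[j] * fitness[j] * Q[j][i] for j in range(n)) / f_avg
        for i in range(n)
    ]


x = [0.5, 0.5]
f = [1.2, 0.8]
Q = [[0.99, 0.01],   # genotype 0 mostly breeds true
     [0.01, 0.99]]   # genotype 1 mostly breeds true
x = replicator_mutator_step(x, f, Q)
print(x)
```

With Q equal to the identity matrix this reduces to the plain replicator equation; with off-diagonal entries, even a genotype driven to near-extinction by selection keeps being regenerated by mutation.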

What would an inference system based on these more general evolutionary dynamics look like? How would it compare to Bayesianism?