Consistency and priors

The method of reasoning illustrated here is somewhat reminiscent of Laplace’s “principle of indifference.” However, we are concerned here with indifference between problems, rather than indifference between events. The distinction is essential, for indifference between events is a matter of intuitive judgment on which our intuition often fails even when there is some obvious geometrical symmetry (as Bertrand’s paradox shows).

E. T. Jaynes
Prior Probabilities

I’ve previously written in praise of the principle of maximum entropy as a prior-setting method, one that is justified on the basis of a very minimal and highly intuitive set of epistemic features.

But there’s an even better technique for prior-setting, one that is justified on even more fundamental grounds. It can only be used in rare circumstances, but when it applies it is immensely powerful. It’s the principle of transformation groups.

Here is the single assumption from which the principle arises:

“In problems where we have the same prior information, we should assign the same prior probabilities.” (Jaynes’ wording)

This is simple to the point of seeming almost tautological. So what can we do with it?

We’ll start with one of the simplest applications of transformation groups. Suppose that somebody gives you the following information:

I = “This coin will land either tails or heads.”

Now you want to say what the following probabilities should be:

P(This coin will land tails | I) = p
P(This coin will land heads | I) = q

Intuitively, it seems obvious to us that absent any other information, we should assign equal probabilities to these. But why? Is there a principled reason for assuming that the coin is a fair coin? Or is this just a presumption that imports our background knowledge about most coins being fair into the problem?

The method of transformation groups gives us a principled reason. It says to rephrase the problem as follows:

I’ = “This coin will land either heads or tails.”

Now, our initial problem has been changed into our new problem only by replacing every “heads” with “tails” and every “tails” with “heads”. Since our prior-setting procedure found that P(This coin will land tails | I) = p in the first problem, it should now find that P(This coin will land heads | I’) = p in this new one. This is required of any consistent prior-setting procedure: if the problem changes just by swapping labels, then the priors should change in exactly the same way. This means that:

P(This coin will land heads | I’) = p
P(This coin will land tails | I’) = q

But clearly, I = I’; the logical operator “OR” is symmetric! Which means that:

P(This coin will land heads | I’) = P(This coin will land heads | I)

And this is only possible if p = q = ½!

This is simple, but beautiful. The principle tells us that the only logical way to set our priors in this case is evenly – anything else would be either logically inconsistent, or assuming extra information that breaks the symmetry between heads and tails. It goes from logical symmetry to probability symmetry!

Finding these symmetries is what the method of transformation groups is all about. More generally, one can represent a choice between N different possibilities as the statement:

I = “Possibility 1 or possibility 2 or … possibility N”

But this is symmetric with:

I’ = “Possibility 2 or possibility 1 or … possibility N”

As well as all other orderings.

By the exact same argument as above, your prior-setting procedure is required by logical consistency to distribute credence evenly across the N possibilities. So for each k from 1 to N, P(Possibility k | I) = 1/N.
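Here’s a minimal sketch of this consistency requirement in Python (the function name `symmetrize` is mine, just for illustration): averaging any prior over every relabeling of the possibilities produces the unique permutation-invariant prior, which is the uniform one.

```python
import itertools
from fractions import Fraction

def symmetrize(prior):
    """Average a prior over every relabeling of the possibilities.

    A prior consistent with the label symmetry must be unchanged by
    this averaging, and the only such prior is the uniform one.
    """
    n = len(prior)
    perms = list(itertools.permutations(range(n)))
    return [sum(prior[p[i]] for p in perms) / len(perms) for i in range(n)]

# An asymmetric prior gets forced back to uniformity:
print(symmetrize([Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]))
# [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
```

Exact rational arithmetic (`Fraction`) is used so the 1/N result comes out exactly rather than as a float approximation.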

The method of transformation groups can also be applied to continuous variables, where finding the right set of priors can be a lot less intuitive. You do so by noting different types of symmetries for different types of parameters.

For instance, a location parameter is one that serves to merely shift a probability distribution over an observable quantity, without reshaping the distribution. We can formally express this by saying that for a location parameter µ, the distribution over x depends only on the difference x – µ:

p(x | µ) = f(x – µ)

For such parameters, it must be the case that the prior distribution over them is similarly symmetric over translational shifts:

For all ∆, p(µ) = p(µ + ∆)
So p(µ) = c, for some constant c (a flat, improper prior)
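As a quick numerical check (a sketch, using a unit-width Gaussian as the example location family; the function name is just for illustration): shifting the observation and the location parameter together leaves the likelihood unchanged, which is exactly the p(x | µ) = f(x – µ) property.

```python
import math

def gaussian_likelihood(x, mu):
    # A Gaussian with fixed width is a location family in its mean:
    # p(x | mu) depends only on the difference x - mu.
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

# Translating x and mu by the same amount changes nothing:
for delta in (-2.0, 0.5, 3.7):
    assert math.isclose(gaussian_likelihood(1.2, 0.4),
                        gaussian_likelihood(1.2 + delta, 0.4 + delta))
```

The matching consistency requirement on the prior, p(µ) = p(µ + ∆) for every ∆, can only be satisfied by a constant function.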

Another common category of parameters are scale parameters. These are parameters that serve to rescale probability distributions without reshaping them. Formally:

p(x | σ) = 1/σ g(x / σ)

For this symmetry, the requirement for consistency is:

For all s, p(σ) = 1/s · p(σ / s)
So, p(σ) ∝ 1/σ (the Jeffreys prior for a scale parameter)
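The same kind of check works here (again a sketch; the exponential distribution stands in as the example scale family, and the function names are mine). The 1/σ prior is the one that satisfies the rescaling consistency condition:

```python
import math

def exponential_likelihood(x, sigma):
    # The exponential distribution is a scale family:
    # p(x | sigma) = (1/sigma) * g(x / sigma), with g(u) = exp(-u).
    return math.exp(-x / sigma) / sigma

def scale_prior(sigma):
    # Candidate prior p(sigma) = 1/sigma.
    return 1.0 / sigma

# It satisfies p(sigma) = (1/s) * p(sigma / s) for every rescaling s:
for s in (0.5, 2.0, 7.3):
    assert math.isclose(scale_prior(4.0), scale_prior(4.0 / s) / s)
```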

In summation, by carefully analyzing the symmetries of the background information you have, you can extract out requirements for how to set your prior distribution that are mandated on threat of logical inconsistency!

Dutch book arguments

These are the laws of probability, which we have proved to be necessarily true of any consistent set of degrees of belief. Any definite set of degrees of belief which broke them would be inconsistent in the sense that it violated the laws of preference between options … If anyone’s mental condition violated these laws, his choice would depend on the precise form in which the options were offered him, which would be absurd. He could have a book made against him by a cunning better and would then stand to lose in any event.

We find, therefore, that a precise account of the nature of partial belief reveals that the laws of probability are laws of consistency, an extension to partial beliefs of formal logic, the logic of consistency.

Frank Ramsey
The Foundations of Mathematics and Other Logical Essays, Volume 5


In this post, I’m going to describe one of the more famous arguments for Bayesianism.

These arguments are about how different types of epistemological frameworks will handle different series of wagers. Let me just lay out clearly what exactly we mean by a wager, so as to remove any ambiguity.

A wager on proposition A is a betting opportunity. It involves a payoff amount S and a buy-in. In general, the buy-in will be some fraction f of the payoff amount, so we’ll write it as fS. If you bet on A and it turns out true, you receive the payout S, but have still paid the buy-in, for a net gain of S – fS. If you bet on A and it turns out false, you get no payout and lose the fS you already spent.

A        Net payout
True     S – fS
False    -fS

From this payout table, you can calculate that an agent will find the wager to be favorable to them exactly in the case that P(A) is greater than f. That is, the agent will want to take the bet whenever the chance of a payout is greater than the proportion of the payout that is required to buy into the bet.
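That calculation can be written out directly (a small sketch; the function name is mine): the expected net payout is P(A)·(S – fS) + (1 – P(A))·(–fS) = S·(P(A) – f), which is positive exactly when P(A) > f.

```python
def expected_net_payout(p_A, S, f):
    # Win with probability p_A (net S - f*S), lose otherwise (net -f*S).
    # Algebraically this simplifies to S * (p_A - f).
    return p_A * (S - f * S) + (1 - p_A) * (-f * S)

assert expected_net_payout(0.6, 100, 0.5) > 0   # favorable: P(A) > f
assert expected_net_payout(0.4, 100, 0.5) < 0   # unfavorable: P(A) < f
assert expected_net_payout(0.5, 100, 0.5) == 0  # break-even at P(A) = f
```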

Now, imagine that somebody has a credence of 52% in a proposition A and a credence of 52% in the proposition ~A. How will they evaluate the following set of bets?

B1: pays out $100 if A is true, buy-in of $51
B2: pays out $100 if ~A is true, buy-in of $51
B3: pays out a guaranteed $100, buy-in of $102

They will see both B1 and B2 as favorable bets, since the buy-in is a smaller fraction of the payout than the chance of payout. And they will see B3 as an unfavorable bet, since clearly the buy-in is a larger proportion of the payout than the chance of a payout.

But B3 is just the same as the combination of bets B1 and B2!

Why? Well, if you bet on both B1 and B2, then you are guaranteed to win exactly one of the two (since A and ~A cannot both be true, but one of the two must be). Then you will have paid in a net sum of $102, and gotten back only $100.
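The guaranteed loss can be checked in a few lines (a sketch; `net` is a hypothetical helper): whichever way A turns out, taking both B1 and B2 loses exactly $2.

```python
def net(payout, buy_in, won):
    # Net result of a single wager: receive the payout only if it won,
    # but pay the buy-in either way.
    return (payout if won else 0) - buy_in

for A_true in (True, False):
    total = net(100, 51, A_true) + net(100, 51, not A_true)  # B1 + B2
    assert total == -2  # a sure loss of $2, whatever happens
```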

A similar argument can be made for any levels of credence C(A) and C(~A) that don’t sum up to 100%. And all of the usual axioms of probability theory can be argued for in the same way. Such arguments are called Dutch book arguments.

Dutch book arguments are standardly presented as revealing that if one does not form beliefs according to the laws of probability theory, then they will be able to be juiced for money by clever bookies.

This is true; somebody with beliefs like those described above can be endlessly exploited for profit. But it is much less impressive than the real conclusion of the Dutch book argument.

Recall that our agent above was found to believe a logical contradiction as a result of not having their beliefs align with probability theory: they had to regard one and the same bet as simultaneously favorable and unfavorable to them.

Said another way, an agent not following the probability calculus may evaluate the same proposition differently if presented in a different form.

This is what Dutch book arguments really say: if you want your beliefs to be logically consistent, then you are required to reason according to probability theory!